Reminder, mainly to myself: a short list of useful wget options for recursively downloading dynamic (PHP, ASP, ...) webpages (because wget's man page is too long):
--no-clobber: do not redownload pages that already exist locally.
--html-extension: append the extension .html to webpages whose URL does not end in .html or .htm but in something like .php or .php?q=boo&bar=4.
--recursive: turn on recursive downloading.
--level=3: set the recursion depth.
--convert-links: make the links in downloaded documents point to local files where possible.
--page-requisites: download embedded images and stylesheets for each downloaded HTML document.
--relative: only follow relative links, not absolute links (even if they point to the same domain).
--no-parent: do not ascend to the parent directory of the given URL while retrieving recursively.
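A minimal sketch that strings most of these options together into one reusable shell function. The function name mirror_site, the WGET override variable (set it to echo to print the command instead of running it) and the example URL are hypothetical. --no-clobber is left out on purpose: recent wget releases warn when it is combined with --convert-links and then use only --convert-links. Also, since wget 1.12 --html-extension is called --adjust-extension (-E); the old name is still accepted.

```sh
#!/bin/sh
# mirror_site URL [DEPTH]
# Hypothetical helper: recursively download a dynamic (PHP, ASP, ...) site
# for offline browsing. Set WGET=echo to print the command instead.
mirror_site() {
  url=$1
  depth=${2:-3}   # recursion depth, same default as --level=3 above
  ${WGET:-wget} \
    --recursive \
    --level="$depth" \
    --html-extension \
    --convert-links \
    --page-requisites \
    --relative \
    --no-parent \
    "$url"
}
```

Usage: mirror_site 'http://www.example.com/index.php?q=boo&bar=4' 2. Quote URLs that contain ? or &, otherwise the shell interprets those characters before wget ever sees them.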
Thanks Stefaan for your blog // Mr Mizzen is awesome.
I used the script provided by Mr Mizzen, and all I can say is this: it gives wget a neat face and makes offline browsing easy; even better than WebHTTrack.
Great job Mr Mizzen!
wget works on http://www.decasasyautos.com
Thanks, we use wget to test http://www.decasasyautos.com and it works.
Mark
http://www.decasasyautos.com
Wget recursive
I am using wget's recursive mode to download content from websites. However, I want every file to be saved with its absolute URL as the file name.
For example: http://www.whatever.com/whatever1/whatever2
Can someone help me with this?
Thanks,
M
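A file name cannot contain "/", so a full URL cannot be used verbatim as a name. One possible workaround, sketched here under that assumption: let wget -r save its usual host/path directory tree, then copy every file into a flat directory under a name built from its path with "/" replaced by "_". flatten_mirror and its arguments are hypothetical, and query strings that wget kept in file names are left untouched.

```sh
#!/bin/sh
# flatten_mirror HOSTDIR DEST
# Hypothetical helper: copy every file under HOSTDIR (the directory wget -r
# created, e.g. www.whatever.com) into DEST, named after its URL path with
# "/" replaced by "_":
#   www.whatever.com/whatever1/whatever2 -> www.whatever.com_whatever1_whatever2
flatten_mirror() {
  parent=$(dirname "$1")
  host=$(basename "$1")
  dest=$2
  mkdir -p "$dest" || return 1
  # List files relative to the parent so each path starts with the host name
  ( cd "$parent" && find "$host" -type f ) |
  while IFS= read -r rel; do
    name=$(printf '%s' "$rel" | tr '/' '_')
    cp "$parent/$rel" "$dest/$name"
  done
}
```

Usage: after wget -r http://www.whatever.com/whatever1/ has created www.whatever.com/, run flatten_mirror www.whatever.com flat. File names containing newlines are not handled by this sketch.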
Have you ever wondered how to download photos from a page like..
Have you ever tried to download photos from pages like http://dermatlas.med.jhmi.edu/derm/ using wget? :)
If so, please give me a tip :)
take care
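A common approach for a page like that is to recurse one level and keep only image files with wget's accept list (--accept, short form -A). wget still fetches the HTML pages it needs to find links, but deletes the ones that do not match the list afterwards. In this sketch, fetch_images, the WGET override and the extension list are assumptions; images served from a different host would additionally need --span-hosts (-H), and it is worth checking first whether the site allows this kind of bulk download.

```sh
#!/bin/sh
# fetch_images URL
# Hypothetical helper: download the images linked or embedded one level
# below URL into the current directory, without recreating the site's
# directory tree. Set WGET=echo for a dry run.
fetch_images() {
  ${WGET:-wget} \
    --recursive \
    --level=1 \
    --no-directories \
    --no-parent \
    --accept=jpg,jpeg,png,gif \
    "$1"
}
```

Usage: fetch_images http://dermatlas.med.jhmi.edu/derm/. The accept list is matched case-sensitively by default, so files ending in .JPG would need to be added to it.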
Easy when you find it.....
Many thanks for the solution for PHP links.
I have added your suggestions to my site-ripper script and it works very well.
As a thank you, here is the script with a zenity dialog, plus the desktop file. It is very handy: click it, check off what you want, enter the URL and let it go....
#!/bin/sh
# export FCBASE=`pwd`
STDOUT=$(mktemp)
# Remove the temporary file whenever the script exits
trap 'rm -f "$STDOUT"' EXIT
#
# Place site-ripper.desktop file in /usr/share/applications
# Place site-ripper in /usr/bin
#
################ Begin intro #################################################
zenity --title "Welcome to: Mr. Mizzen's Site Ripper Script" \
--width=700 \
--height=370 \
--list \
--checklist \
--column " " \
--column " Item " \
--column " Description " \
--multiple \
TRUE recurse " -r recursively get files from page(s)" \
TRUE noclobber " -nc Use the No Clobber option" \
TRUE noparent " -np Do not ascend to the parent directory" \
TRUE robots " -e robots=off Ignore the robots instructions" \
FALSE span " -H Span hosts" \
FALSE conver " --convert-links: make the links in downloaded documents point to local files if possible." \
TRUE html " --html-extension: append extension .html to webpages like .php or .php?q=boo&bar=4." \
TRUE page " --page-requisites: download embedded images and stylesheets for each downloaded html document." \
TRUE relative " --relative: only follow relative links, not absolute links (even if in the same domain)." \
> $STDOUT
####################### Test for exit #################################
# True = 1, False =0
if [ $? -eq 0 ] ; then
cancelsetup=0 # false
cancelyesno="no do not cancel, continue"
else
cancelsetup=1 # true
cancelyesno="yes cancel"
echo "You selected Cancel"
exit 0
fi
echo "<------------- Here we go! --------------> "
starts=`date +%s`
#####################################################################
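levels=$(zenity --entry --text "Levels to drill down? (Default of 1 gets the page and the pages it links to; 0 is endless) " --entry-text "1")
site=$(zenity --entry --text "Site URL " --entry-text "")
# Stop if either dialog was cancelled or left empty
if [ -z "$levels" ] || [ -z "$site" ] ; then
echo "No depth or URL given"
exit 1
fi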
# Setup all the variables now..............
############################### Update data section #################
if grep -q recurse "$STDOUT" ; then
recurse="-r"
else
recurse=""
fi
if grep -q noclobber "$STDOUT" ; then
noclobber="-nc"
else
noclobber=""
fi
if grep -q noparent "$STDOUT" ; then
noparent="-np"
else
noparent=""
fi
if grep -q robots "$STDOUT" ; then
robots="-e robots=off"
else
robots=""
fi
if grep -q span "$STDOUT" ; then
span="-H"
else
span=""
fi
if grep -q conver "$STDOUT" ; then
conver="--convert-links"
else
conver=""
fi
if grep -q html "$STDOUT" ; then
html="--html-extension"
else
html=""
fi
if grep -q page "$STDOUT" ; then
page="--page-requisites"
else
page=""
fi
if grep -q relative "$STDOUT" ; then
relative="--relative"
else
relative=""
fi
#####################################################################
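echo " This is the command line"
# Option variables stay unquoted so empty ones disappear and "-e robots=off" splits into two words;
# the user agent is not wrapped in single quotes, so wget gets it without literal double quotes
echo wget $recurse $noclobber $noparent $conver $html $page $relative --tries=2 $span -l "$levels" $robots --user-agent="Microsoft Internet Explorer" "$site"
wget $recurse $noclobber $noparent $conver $html $page $relative --tries=2 $span -l "$levels" $robots --user-agent="Microsoft Internet Explorer" "$site"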
exit 0
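The script tests each choice with grep, which matches a substring anywhere in zenity's output. zenity --list prints the checked items on one line separated by "|" by default, so an alternative is to split on that character and match whole item names. A sketch of that idea; selection_to_flags is a hypothetical name and only knows the item names used in the dialog of the script.

```sh
#!/bin/sh
# selection_to_flags "recurse|noclobber|..."
# Hypothetical helper: turn zenity's "|"-separated checklist output into
# wget options by matching whole item names instead of substrings.
selection_to_flags() {
  old_ifs=$IFS
  IFS='|'
  set -f          # no globbing while splitting the selection
  set -- $1       # split the selection into positional parameters
  set +f
  IFS=$old_ifs
  flags=""
  for item in "$@"; do
    case $item in
      recurse)   flags="$flags -r" ;;
      noclobber) flags="$flags -nc" ;;
      noparent)  flags="$flags -np" ;;
      robots)    flags="$flags -e robots=off" ;;
      span)      flags="$flags -H" ;;
      conver)    flags="$flags --convert-links" ;;
      html)      flags="$flags --html-extension" ;;
      page)      flags="$flags --page-requisites" ;;
      relative)  flags="$flags --relative" ;;
    esac
  done
  printf '%s\n' "${flags# }"   # drop the leading space
}
```

Usage: flags=$(selection_to_flags "$(cat "$STDOUT")"), then pass $flags unquoted to wget, just like the individual variables in the script. The command substitution strips zenity's trailing newline, so the last item still matches.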
The desktop file data (save as site-ripper.desktop):
[Desktop Entry]
Encoding=UTF-8
Name=Site-Ripper
Comment=Rip Web Sites
Exec=site-ripper
Icon=gnome-color-browser
OnlyShowIn=GNOME;XFCE;
Terminal=true
Type=Application
StartupNotify=true
Categories=GNOME;GTK;Utility;X-Site-Ripper;