Monday, September 25, 2006

Workaround for globbing or pattern matching on http with Wget

Don't you love wget? You don't even need a browser to save a file... hmmm... nice.

You might have noticed that you can do something like
$ wget "ftp://ftpserver.com/*.pdf"
and it retrieves all the *.pdf files for you (the quotes keep the shell from expanding the * itself).

However (you must have expected a catch, there always is), you cannot do the same with HTTP.
i.e.
$ wget "http://httpserver.com/*.pdf"
will not work: wget warns that wildcards are not supported in HTTP and simply requests the literal URL.

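This is because HTTP retrieval does not support globbing. (What is globbing? It is just matching file names with shell-style wildcards such as *, ? and [ ], as in *.pdf. These are not regular expressions.) An FTP server can hand wget a directory listing to match against; plain HTTP has no such listing.
What is your workaround? Use the command below: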

$wget -r -l1 --no-parent -A.pdf http://httpserver.com/

-r -l1 means to retrieve recursively, with a maximum depth of 1.
--no-parent means that wget never ascends to the parent directory while recursing.
-A.pdf means to download only the PDF files.
-A "*.pdf" would have worked too.
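
Why do both forms work? As the wget manual describes it, an -A entry with no wildcard characters is treated as a file name suffix, while an entry containing *, ?, [ or ] is treated as a glob pattern. Here is a small sketch of that rule in plain shell. The accepts function is a hypothetical helper made up for illustration; it is not part of wget and only approximates its matching.

```shell
# Hypothetical helper, not part of wget: mimics how a single -A entry
# is applied to a file name.
# Usage: accepts FILENAME RULE   (exit status 0 = accepted, 1 = rejected)
accepts() {
    accepts_name=$1
    accepts_rule=$2

    case $accepts_rule in
        *[][*?]*) accepts_pattern=$accepts_rule ;;     # has a wildcard: use it as a glob
        *)        accepts_pattern="*$accepts_rule" ;;  # no wildcard: treat it as a suffix
    esac

    # An unquoted variable in a case pattern is matched as a glob.
    case $accepts_name in
        $accepts_pattern) return 0 ;;
        *)                return 1 ;;
    esac
}
```

With this helper, accepts report.pdf .pdf and accepts report.pdf '*.pdf' both succeed, while accepts image.gif .pdf fails.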

Happy globbing with wget on http servers too from today .... :)
