PDS Geosciences Node Community

Using Wget for downloading PDS Geosciences Node archive directories or ODE cart requests


Dan Scholes


Below I have included example Wget commands for downloading files from the PDS Geosciences Node. The first example demonstrates downloading a PDS data set from the PDS Geosciences Node archive. The second example demonstrates using Wget to download an Orbital Data Explorer (ODE) cart request.

Dan Scholes 2/20/18
Example of downloading data files using links from the HTTP PDS Geosciences Node Data Archive or an Orbital Data Explorer (ODE) cart location

Note: In the future, changes to the PDS Geosciences Node website and Orbital Data Explorer website may cause these examples to stop working.
Disclaimer: This sample code is provided "as is", without warranty of any kind, express or implied. In no event shall the author be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the sample code or the use or other dealings with the sample code.

Wget website: https://www.gnu.org/software/wget/

Example of downloading a PDS Geosciences Node archive subdirectory
wget -rkpN -P c:\temp\data -nH --cut-dirs 2 --level=15 --no-parent --reject "index.html*" -e robots=off http://pds-geosciences.wustl.edu/mro/mro-m-crism-4-typespec-v1/mrocr_8001/

Example of downloading an ODE cart request
wget -rkpN -P c:\temp\data -nH --cut-dirs 2 --level=15 --no-parent --reject "index.html*" -e robots=off http://ode.rsl.wustl.edu/cartdownload/data/sample
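If you want to script these downloads (for example, to loop over several cart or archive URLs), the two commands above can be assembled from Python. This is a minimal sketch, assuming wget is installed and on your PATH. The helper names build_wget_command and run_wget are hypothetical, and the optional -c (resume) switch is an addition that is not part of the original commands.

```python
import shutil
import subprocess


def build_wget_command(url, dest, resume=False):
    """Return the wget argument list used in the examples above (hypothetical helper)."""
    cmd = [
        "wget", "-rkpN",
        "-P", dest,                 # local destination directory
        "-nH",                      # no host-named directory
        "--cut-dirs", "2",          # drop the first 2 remote directory components
        "--level=15",               # recursion depth
        "--no-parent",              # never ascend above the requested directory
        "--reject", "index.html*",  # delete directory-listing pages after scanning them
        "-e", "robots=off",         # ignore robots.txt
    ]
    if resume:
        cmd.append("-c")  # assumption: add -c to resume after a dropped connection
    cmd.append(url)
    return cmd


def run_wget(url, dest, resume=False):
    """Run wget (must be on PATH) and return its exit code."""
    if shutil.which("wget") is None:
        raise FileNotFoundError("wget was not found on PATH")
    # A list of arguments is passed without a shell, so no shell can expand
    # the asterisk in "index.html*" and no extra quoting is needed.
    return subprocess.run(build_wget_command(url, dest, resume)).returncode


if __name__ == "__main__":
    url = "http://pds-geosciences.wustl.edu/mro/mro-m-crism-4-typespec-v1/mrocr_8001/"
    print(" ".join(build_wget_command(url, r"c:\temp\data")))
```

Calling run_wget(url, r"c:\temp\data") with the PDS URL would run the same download as the first example. A non-zero return code means wget reported a problem, so it is worth checking before processing the files.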

-r means recursively download files

-k means convert links. After the download finishes, links in the downloaded HTML pages are rewritten to point to the local copies instead of back to the remote site (example.com/path).

-p means get all page requisites, so wget also downloads the images, stylesheets, and JavaScript files needed to display each downloaded HTML page properly.

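-N turns on timestamping. A remote file is downloaded only if there is no local copy, the file sizes differ, or the remote file is newer than the local one, so re-running the command skips files that are already up to date.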

-P sets the local destination directory for the downloaded files.

-e executes a command as if it were in the .wgetrc startup file; it is needed here to pass robots=off.

robots=off means ignore the site's robots.txt file.

-c (not used in the example commands above) allows wget to pick up where it left off if the connection is dropped and the command is re-run.

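--no-parent keeps wget from ascending into the directories above the requested URL, so only files at or below that level are downloaded. Note that wget treats the last part of a URL as a directory only if the URL ends with a slash (/).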

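--reject "index.html*" keeps every directory's index.html listing page (and any other file whose name begins with index.html) out of the local copy. Wget still fetches these pages to find the links they contain, then deletes them.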

-nH disables the generation of host-prefixed directories.
	In the examples above, directories named ode.rsl.wustl.edu or pds-geosciences.wustl.edu will not be created locally.

--cut-dirs 2  ignores the first 2 directory components of the remote path when creating local directories.
	Example: http://ode.rsl.wustl.edu/cartdownload/data/sample  With -nH, cartdownload/data is dropped, so the first directory in the destination directory will be "sample".
	Likewise, for the PDS example, mro/mro-m-crism-4-typespec-v1 is dropped and the first local directory will be "mrocr_8001".

--level=depth sets the maximum recursion depth; the examples use --level=15. The default is 5, but we need to go deeper with ODE carts and the PDS Geosciences Archive.
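The -N rule described above can be summarized as a small decision function. This is a simplified sketch of the behavior described in the wget manual, not wget's actual code; the function name and its arguments are hypothetical, and file sizes are assumed to be known on both sides.

```python
from datetime import datetime, timezone


def should_download(local_exists, local_mtime, local_size, remote_mtime, remote_size):
    """Simplified model of the wget -N (timestamping) decision for one file.

    Hypothetical helper: returns True if wget would fetch the remote file.
    """
    if not local_exists:
        return True  # no local copy yet
    if local_size != remote_size:
        return True  # sizes differ: download no matter what the timestamps say
    return remote_mtime > local_mtime  # same size: download only if the server copy is newer


if __name__ == "__main__":
    older = datetime(2018, 2, 1, tzinfo=timezone.utc)
    newer = datetime(2018, 2, 20, tzinfo=timezone.utc)
    print(should_download(True, newer, 1024, older, 1024))  # prints False: local copy is current
```

One consequence worth knowing: because -k rewrites links inside the downloaded HTML pages, those local pages no longer match the server's file size, so -N may fetch them again on the next run. The wget manual's -K (--backup-converted) option is intended for this combination.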
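The --reject "index.html*" pattern is a shell-style wildcard, so the trailing * also catches variants whose names begin with index.html (for example, sorted directory-listing pages, depending on the server and platform). The sketch below approximates the check with Python's fnmatch module on local file names; the helper names are hypothetical and this is not wget's own matching code.

```python
from fnmatch import fnmatchcase

# Same pattern as the examples above. The quotes in the wget command only
# keep the shell from expanding the asterisk itself.
REJECT_PATTERNS = ("index.html*",)


def is_rejected(filename, patterns=REJECT_PATTERNS):
    """True if a local file name matches any --reject pattern (case-sensitive)."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


def kept_files(filenames, patterns=REJECT_PATTERNS):
    """File names that would remain after the rejected pages are deleted."""
    return [name for name in filenames if not is_rejected(name, patterns)]


if __name__ == "__main__":
    listing = ["index.html", "index.html?C=N;O=D", "sample.img", "sample.lbl"]
    print(kept_files(listing))  # ['sample.img', 'sample.lbl']
```

The file names in the usage example are made up for illustration.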
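To predict where files will land under the -P directory, the combined effect of -nH and --cut-dirs can be modeled as simple path arithmetic. This is a rough sketch under stated assumptions: local_path is a hypothetical helper, query strings and port numbers are ignored, "/" is used as the separator even though Windows uses "\", and readme.txt is a made-up file name.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def local_path(url, dest, no_host=True, cut_dirs=2):
    """Approximate where wget -r saves `url`, given -P dest, -nH and --cut-dirs.

    Hypothetical helper; ignores query strings and ports.
    """
    parts = urlsplit(url)
    segments = [unquote(s) for s in parts.path.split("/")]
    directories = segments[1:-1]             # components between the host and the file name
    filename = segments[-1] or "index.html"  # a URL ending in '/' is saved as index.html
    directories = directories[cut_dirs:]     # --cut-dirs N drops the first N components
    host = [] if no_host else [parts.hostname]  # -nH drops the host-named directory
    return PurePosixPath(dest, *host, *directories, filename)


if __name__ == "__main__":
    print(local_path("http://ode.rsl.wustl.edu/cartdownload/data/sample/readme.txt",
                     "c:/temp/data"))  # c:/temp/data/sample/readme.txt
```

If --cut-dirs is larger than the number of directory components, the file simply lands at the top of the destination directory.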

---------------------------------------------------------------------------------------------------------------------------------------------------------
-nd or --no-directories puts all the requested files in one directory instead of recreating the remote directory tree. We are not using this option, but some users may prefer it.
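With -nd, files that share a name in different remote directories end up in the same local folder. The wget manual says repeated names get numeric .n extensions rather than overwriting each other; the sketch below models that naming. It is a hypothetical helper that assumes an empty destination directory, and the behavior when -nd is combined with -N (used in the examples above) may differ, so treat it as an approximation.

```python
from posixpath import basename
from urllib.parse import unquote, urlsplit


def flattened_names(urls):
    """Approximate -nd naming: one directory, repeated names get .1, .2, ...

    Hypothetical helper; it ignores files that already exist on disk.
    """
    seen = {}
    names = []
    for url in urls:
        name = unquote(basename(urlsplit(url).path)) or "index.html"
        count = seen.get(name, 0)
        names.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return names


if __name__ == "__main__":
    urls = [
        "http://example.invalid/volume/data/readme.txt",
        "http://example.invalid/volume/document/readme.txt",
    ]
    print(flattened_names(urls))  # ['readme.txt', 'readme.txt.1']
```

If keeping track of which file came from which remote directory matters, leaving -nd off (as in the examples) avoids this renaming entirely.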

