PDS Geosciences Node Community

Forums

  1. PDS Geosciences Node

    1. Announcements

      The latest news from the Geosciences Node

      157
      posts
    2. For data providers

      Discussions pertaining to preparing data for archiving with PDS

      4
      posts
    3. For data users

      Questions and comments related to locating, accessing, and using PDS Geo data

      295
      posts
  2. PDS Geo Tools

    1. Analyst's Notebook

      Posts relating to the Analyst's Notebook

      50
      posts
    2. ODE - Orbital Data Explorer

      Posts related to the Orbital Data Explorer

      316
      posts
    3. 4
      posts
    4. Spectral Library

      Posts relating to the PDS Geosciences Node Spectral Library

      1
      post
  3. Workshops

    1. MRO/CRISM Data Users' Workshop 2012

      Posts relating to the 2012 MRO/CRISM Data Users' Workshop.

      2
      posts
  • Recent Topics

    • Below I have included a Python 3.6 script sample, which I hope will help some PDS Geosciences Node users. It demonstrates downloading a PDS data set or an Orbital Data Explorer (ODE) cart request. The user will need Python 3.6 installed. This sample is available for download in a zip file under the downloads section of the forum. The beginning of the script includes variables that should be set by the user: the local destination directory, target URL, base URL, recursive action (True/False), and verbose reporting. The values currently in the script are functional as-is.

      # PDSGeosciencesNode_FileDownload.py
      # Dan Scholes 2/19/18
      # Python 3.6 compatible version
      # Example of downloading data files using links from the HTTP
      # PDS Geosciences Node Data Archive or an Orbital Data Explorer (ODE) cart location.
      # Note: One drawback of this script is that it downloads one file at a time,
      # rather than multiple streams.
      # Additional Note: In the future, changes to the PDS Geosciences Node website and
      # Orbital Data Explorer website may cause this example to no longer function.
      # Disclaimer: This sample code is provided "as is", without warranty of any kind,
      # express or implied. In no event shall the author be liable for any claim,
      # damages or other liability, whether in an action of contract, tort or
      # otherwise, arising from, out of or in connection with the sample code or the
      # use or other dealings with the sample code.
      # Python download website: https://www.python.org/downloads/

      import time
      import urllib.error
      import urllib.request
      from pathlib import Path

      # Variables for user to populate ----------
      saveFilesToThisDirectory = 'c:/temp/data/'  # local destination path to save the files

      # Next two lines are for downloading from the PDS Geosciences Node archive.
      url = "http://pds-geosciences.wustl.edu/mro/mro-m-rss-5-sdp-v1/mrors_1xxx/"  # enter the directory you would like to download
      relativeLinkPathBase = "http://pds-geosciences.wustl.edu"  # default base for the website's relative paths (just leave this value)

      # Next two lines are for downloading an ODE cart request.
      # url = "http://ode.rsl.wustl.edu/cartdownload/data/sample/"  # enter the directory you would like to download
      # relativeLinkPathBase = "http://ode.rsl.wustl.edu/"  # default base for the ODE cart website's relative paths (just leave this value)

      recursiveVal = True  # True/False whether to download files in subdirectories of the location in the url variable
      verboseMessages = False  # True/False whether to display verbose messages during processing
      # End of variables for user to populate ----------

      relativeLinkPathBase = relativeLinkPathBase.rstrip('/')
      maxDownloadAttempts = 3
      filesToDownloadList = []


      def get_pageLinks(inUrl, inRecursive):
          if verboseMessages:
              print("Cataloging Directory: ", inUrl)  # directory to process
          myURLReader = urllib.request.urlopen(inUrl.rstrip('/'))
          myResults = myURLReader.read().decode('utf-8').replace("<a href=", "<A HREF=").replace("</a>", "</A>")
          myURLReader.close()
          data = myResults.split("</A>")
          tag = "<A HREF=\""
          endtag = "\">"
          for item in data:
              if "<A HREF" in item:
                  try:
                      ind = item.index(tag)
                      item = item[ind + len(tag):]
                      end = item.index(endtag)
                  except ValueError:
                      pass
                  else:
                      # The link was found.
                      itemToDownload = item[:end]
                      if "." in itemToDownload:
                          # The link is to a file.
                          if relativeLinkPathBase not in itemToDownload:
                              # The path is relative, so add the base URL.
                              itemToDownload = relativeLinkPathBase + itemToDownload
                          filesToDownloadList.append(itemToDownload)
                      elif inRecursive:
                          # It's a directory, so go into it if recursive is chosen.
                          if itemToDownload not in inUrl:
                              # Make sure it isn't a link to the parent directory.
                              if relativeLinkPathBase not in itemToDownload:
                                  itemToDownload = relativeLinkPathBase + itemToDownload
                              # The directory is a subdirectory, so follow it.
                              if verboseMessages:
                                  print("subdirectory to process ", itemToDownload)
                              get_pageLinks(itemToDownload, inRecursive)


      def download_files():
          # Download the files that were identified
          # (this refers to the global list of files to download).
          localSuccessfulDownloads = 0
          print("==Downloads starting ==============")
          for link in filesToDownloadList:
              downloadAttempts = 0
              fileDownloaded = False
              if verboseMessages:
                  print("downloading file: ", link)
              local_link = saveFilesToThisDirectory + link.replace(relativeLinkPathBase, "")
              local_filename = link.split('/')[-1]
              # Make sure the local directory structure has been created.
              path = Path(local_link.replace(local_filename, ""))
              path.mkdir(parents=True, exist_ok=True)
              while not fileDownloaded and downloadAttempts < maxDownloadAttempts:
                  try:
                      urllib.request.urlretrieve(link, local_link)
                      localSuccessfulDownloads += 1
                      fileDownloaded = True
                  except urllib.error.URLError as e:
                      # Retry the download up to maxDownloadAttempts times.
                      downloadAttempts += 1
                      if verboseMessages:
                          print("downloadError: ", e.reason)
                          print("downloadErrorFile: ", link, " attempt:", downloadAttempts)
                      if downloadAttempts < maxDownloadAttempts:
                          time.sleep(15)  # wait 15 seconds before the next attempt
                      else:
                          print("Could not successfully download: ", link, " after ", downloadAttempts, " download attempts")
          print("==Downloads complete ==============")
          print("SuccessfulDownloads: ", localSuccessfulDownloads, " out of ", len(filesToDownloadList))


      print('==Process is starting ===================')
      # Get the file links.
      get_pageLinks(url, recursiveVal)
      print("==Collected ", len(filesToDownloadList), " file links ======")
      # Now download the files.
      download_files()
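      The one-file-at-a-time drawback noted in the script header can be eased with the standard library's concurrent.futures module. The sketch below is a minimal illustration, not part of the original sample: the destination directory and the URL in the file list are placeholder assumptions, and in practice the list would be filled by get_pageLinks().

      import concurrent.futures
      import urllib.request
      from pathlib import Path

      saveFilesToThisDirectory = Path('c:/temp/data/')  # assumed destination, as in the sample above
      filesToDownloadList = [
          # Placeholder URL; in practice this list would come from get_pageLinks().
          "http://pds-geosciences.wustl.edu/mro/mro-m-rss-5-sdp-v1/mrors_1xxx/aareadme.txt",
      ]

      def fetch(link):
          # Save each file under the destination directory by its remote file name.
          local_path = saveFilesToThisDirectory / link.split('/')[-1]
          local_path.parent.mkdir(parents=True, exist_ok=True)
          urllib.request.urlretrieve(link, str(local_path))
          return link

      # Four worker threads; each one blocks on network I/O, so threads are
      # enough to keep several downloads in flight at once.
      with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
          for finished in pool.map(fetch, filesToDownloadList):
              print("downloaded:", finished)

      Keeping max_workers small is a polite default, so several parallel streams do not overload the archive servers.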
    • Below I have listed two Wget examples, which I hope will help some PDS Geosciences Node users. The first example demonstrates downloading a PDS data set. The second demonstrates downloading a sample Orbital Data Explorer (ODE) cart request. Wget website: https://www.gnu.org/software/wget/

      Example of downloading a PDS Geosciences Node archive subdirectory:

      wget -rkpN -P c:\temp\data -nH --cut-dirs 2 --level=15 --no-parent --reject "index.html*" -e robots=off http://pds-geosciences.wustl.edu/mro/mro-m-crism-4-typespec-v1/mrocr_8001/

      Example of downloading an ODE cart request:

      wget -rkpN -P c:\temp\data -nH --cut-dirs 2 --level=15 --no-parent --reject "index.html*" -e robots=off http://ode.rsl.wustl.edu/cartdownload/data/sample

      -r recursively downloads files.
      -k converts links, so links on the downloaded pages point to the local copies instead of example.com/path.
      -p gets all webpage resources, so wget will obtain the images and JavaScript files needed to make a page work properly.
      -N enables timestamping, so if the local files are newer than the files on the remote website, the remote files will be skipped.
      -P sets the local destination directory for the downloaded files.
      -e is a flag option that must be set for robots=off to work; robots=off means ignore the robots file.
      -c (not used in the examples above) allows the command to pick up where it left off if the connection is dropped and the command is re-run.
      --no-parent keeps the command from downloading all the files in the directories above the requested level.
      --reject "index.html*" keeps wget from downloading every directory's default index.html.
      -nH disables the generation of host-prefixed directories. In the example above, a local directory named ode.rsl.wustl.edu will not be created.
      --cut-dirs 2 omits the given number of directory components; this example omits the first 2 directory levels from the local path it creates for downloaded files. For http://ode.rsl.wustl.edu/cartdownload/data/sample, the first directory created under the destination directory will be "sample".
      --level=15 sets the depth to recursively search. The default is 5, but we need to go deeper for ODE cart and PDS Geosciences Archive downloads.
      -nd (or --no-directories) puts all the requested files in one directory. The examples do not use this option, but some users may prefer it.

      Disclaimer: This sample code is provided "as is", without warranty of any kind, express or implied. In no event shall the author be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the sample code or the use or other dealings with the sample code.
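      For users who would rather drive these commands from a script, the sketch below shows one way to run the first example through Python's standard subprocess module. It assumes wget is installed and on the PATH; the flags and target URL are taken verbatim from the archive-subdirectory example above.

      import subprocess

      target = "http://pds-geosciences.wustl.edu/mro/mro-m-crism-4-typespec-v1/mrocr_8001/"

      cmd = [
          "wget", "-rkpN",
          "-P", r"c:\temp\data",       # local destination directory
          "-nH", "--cut-dirs", "2",    # drop the host name and the first two path levels
          "--level=15", "--no-parent",
          "--reject", "index.html*",
          "-e", "robots=off",
          target,
      ]

      # check=True raises CalledProcessError if wget exits with a non-zero status.
      subprocess.run(cmd, check=True)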
    • Feb. 20, 2018 - Revised and new MARSIS EDR and Subsurface RDR data have been loaded into ODE, with coverage from May 2005 through December 2016. See https://wufs.wustl.edu/ode/odeholdings/Mars_holdings.html
    • February 15, 2018 - MRO SHARAD EDRs from ASI for previous MRO releases 39-42 are now loaded into ODE. Data coverage: 2016-052 through 2017-049. Please see https://wufs.wustl.edu/ode/odeholdings/Mars_holdings.html and ERRATA.TXT (http://pds-geosciences.wustl.edu/mro/mro-m-sharad-3-edr-v1/mrosh_0004/errata.txt) for more details.
    • SHARAD EDR data from the ASI team members have been posted, covering previous MRO releases 39-42 (through Feb. 18, 2017). The team is recovering data from deliveries that were missed due to a hiatus in ground operations. See ERRATA.TXT for details. The data are available on the PDS Geosciences Node's SHARAD page.