If you download data from NCBI (National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov/), you probably use the prefetch command.
This article shows how to use prefetch (with ascp as the transport layer) in a batch job. Using ascp is advised because it is more efficient (shorter download times) than the default prefetch protocol (https).
Anthill's computing nodes offer prefetch version 2.8.2 (from the sra-tools package). If you would like to use a newer version, it is available at https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software .
The default path where prefetch downloads files is ~/ncbi/. Files you download from NCBI tend to be large, so it is a good idea to change this path to a location under /workspace/${USER}, where more space is usually available. To do this, start an interactive session, create a new folder, and then use the vdb-config command to change the download folder (you can use vdb-config -i to inspect other configuration options in TUI mode).
jsmith@anthill23:~$ srun --pty bash -l
jsmith@ant009:~$ mkdir -p /workspace/${USER}/ncbi
jsmith@ant009:~$ vdb-config --set default-path=/workspace/${USER}/ncbi
jsmith@ant009:~$ exit
jsmith@anthill23:~$
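To confirm that the workspace really offers more room than the home directory, the free space on both filesystems can be compared. The following is only a sketch: the helper name avail_gib is made up for illustration, and it assumes a df that supports the POSIX -P and -k options.

```shell
# avail_gib DIR
# Print the space available (whole GiB, rounded down) on the filesystem holding DIR.
avail_gib() {
    # -P: one output line per filesystem, -k: sizes in 1 KiB blocks
    kib=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ -n "$kib" ] || return 1
    echo $(( kib / 1024 / 1024 ))
}

# Compare the default location (home) with the workspace used in this article.
for dir in "${HOME}" "/workspace/${USER}"; do
    if [ -d "$dir" ]; then
        echo "${dir}: $(avail_gib "$dir") GiB available"
    else
        echo "${dir}: directory not found"
    fi
done
```

The numbers are rounded down, so a nearly full filesystem may report 0 GiB.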
The Aspera CLI tool needs to be installed by the user. Log in to anthill (ssh) and install Aspera CLI with the following commands:
jsmith@anthill23:~$ mkdir -p ~/downloads/aspera
jsmith@anthill23:~$ cd ~/downloads/aspera
jsmith@anthill23:~$ wget https://download.asperasoft.com/download/sw/cli/3.9.6/ibm-aspera-cli-3.9.6.1467.159c5b1-linux-64-release.sh
jsmith@anthill23:~$ chmod +x ibm-aspera-cli-3.9.6.1467.159c5b1-linux-64-release.sh
jsmith@anthill23:~$ ./ibm-aspera-cli-3.9.6.1467.159c5b1-linux-64-release.sh
After installing Aspera CLI, test it with the commands below:
jsmith@anthill23:~$ export PATH=/home/users/${USER}/.aspera/cli/bin:$PATH
jsmith@anthill23:~$ aspera --version
IBM Aspera CLI version 3.9.6.1467.159c5b1
jsmith@anthill23:~$ ascp --version
IBM Aspera CLI version 3.9.6.1467.159c5b1
ascp version 3.9.1.168954
Operating System: Linux
FIPS 140-2-validated crypto ready to configure
AES-NI Supported
License max rate=(unlimited), account no.=1, license no.=2000
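prefetch is later given two files from this installation: the ascp binary and the asperaweb_id_dsa.openssh key. A small pre-flight check along the following lines could catch a broken installation before a job is queued. It is a sketch only: the function name check_aspera and the variable ASPERA_HOME are invented here, and the paths are the ones used in this article.

```shell
# Location of the Aspera CLI installation (the path used in this article).
ASPERA_HOME="/home/users/${USER}/.aspera/cli"

# check_aspera BASE_DIR
# Succeeds only if the ascp binary and the key file exist under BASE_DIR.
check_aspera() {
    base="$1"
    if [ ! -x "${base}/bin/ascp" ]; then
        echo "ascp binary not found or not executable: ${base}/bin/ascp" >&2
        return 1
    fi
    if [ ! -r "${base}/etc/asperaweb_id_dsa.openssh" ]; then
        echo "key file not readable: ${base}/etc/asperaweb_id_dsa.openssh" >&2
        return 1
    fi
    echo "aspera installation looks complete: ${base}"
}

if check_aspera "${ASPERA_HOME}"; then
    echo "ready to submit the download job"
else
    echo "fix the installation before submitting the download job"
fi
```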
With this sbatch file you can use one of anthill's computing nodes to download SRR390728 with prefetch, using ascp as the transport protocol.
#!/bin/bash -l
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --mem 3G
#SBATCH --job-name SRR390728

export PATH=/home/users/${USER}/.aspera/cli/bin:$PATH
FILENAME="SRR390728"

echo "date: `date -R`"
echo "host: `hostname`"
echo "will download ${FILENAME}"
echo "download started"
prefetch -t ascp -a "/home/users/${USER}/.aspera/cli/bin/ascp|/home/users/${USER}/.aspera/cli/etc/asperaweb_id_dsa.openssh" ${FILENAME}
echo "download stopped"
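Because a mistyped accession only shows up once the job has already waited in the queue, it can be worth validating FILENAME first. The sketch below assumes that only SRA run accessions are downloaded, meaning the prefix SRR, ERR or DRR followed by digits. The function name is_run_accession is hypothetical, and prefetch itself may accept other kinds of identifiers that this check would reject.

```shell
# is_run_accession NAME
# Succeeds if NAME looks like an SRA run accession (SRR/ERR/DRR plus digits).
is_run_accession() {
    printf '%s\n' "$1" | grep -Eq '^[SED]RR[0-9]+$'
}

FILENAME="SRR390728"
if is_run_accession "${FILENAME}"; then
    echo "will download ${FILENAME}"
else
    echo "not a run accession: ${FILENAME}" >&2
    exit 1
fi
```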
Please remember not to overuse the Internet connections of CeNT or NCBI when downloading files. Do not start dozens of download processes at once; this stresses the Internet connections of both CeNT and NCBI. Be aware that some organizations treat massive downloads as a DoS attack and will block access (temporarily or permanently) to their resources.
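One way to keep the load low is to fetch several accessions one after another inside a single job, so that only one transfer is active at any time. The following is a sketch under assumptions: download_all, PREFETCH_CMD and ASPERA_HOME are names invented for this example, and the list file (one accession per line, with blank lines and lines starting with # ignored) is a hypothetical input. The prefetch options are the ones used in the sbatch file above.

```shell
# Command used for each download; set PREFETCH_CMD="echo prefetch" for a dry run.
PREFETCH_CMD="${PREFETCH_CMD:-prefetch}"
ASPERA_HOME="/home/users/${USER}/.aspera/cli"

# download_all LISTFILE
# Download every accession listed in LISTFILE, strictly one at a time.
download_all() {
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Skip blank lines and comment lines.
        case "$acc" in ''|'#'*) continue ;; esac
        echo "download started: ${acc}"
        # stdin is redirected so the command cannot swallow the rest of the list.
        ${PREFETCH_CMD} -t ascp \
            -a "${ASPERA_HOME}/bin/ascp|${ASPERA_HOME}/etc/asperaweb_id_dsa.openssh" \
            "${acc}" < /dev/null || echo "download FAILED: ${acc}"
        echo "download stopped: ${acc}"
    done < "$1"
}

# Run only when a list file is passed as the first argument.
if [ -n "${1:-}" ]; then
    download_all "$1"
fi
```

Saved as, say, download_list.sh with the same #SBATCH header as the single-file job, it could be submitted with sbatch download_list.sh accessions.txt. A long list may need a partition with a longer time limit than the one used above.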
*) https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=prefetch
*) http://download.asperasoft.com/download/docs/cli/3.9.6/user_linux/pdf2/Aspera_CLI_Admin_3.9.6_Linux.pdf
*) https://www.ncbi.nlm.nih.gov/