description

If you download data from NCBI (National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov/), you probably use the prefetch command. This article shows how to use prefetch in a batch job, with ascp as the transport layer. Using ascp is advised because it is more efficient (shorter download times) than prefetch's default protocol (HTTPS).

prefetch 2.8.2

Anthill's computing nodes provide prefetch version 2.8.2 (from the sra-tools package). If you would like to use a newer version, it is available at https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software .

prefetch configuration

The default path where prefetch downloads files is ~/ncbi/. Files downloaded from NCBI tend to be large, so it is a good idea to change this path to a folder under /workspace/${USER}, where more space is usually available. To do this, start an interactive session, create a new folder, and then use the vdb-config command to change the download folder (you can use vdb-config -i to inspect other configuration options in TUI mode).

jsmith@anthill23:~$ srun --pty bash -l
jsmith@ant009:~$ mkdir -p /workspace/${USER}/ncbi
jsmith@ant009:~$ vdb-config --set default-path=/workspace/${USER}/ncbi
jsmith@ant009:~$ exit
jsmith@anthill23:~$   
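Before switching, you may want to compare how much space is free in your home directory and in /workspace. The sketch below is only an illustration, not part of the Anthill setup: the helper name `free_gib` is hypothetical, and it assumes a POSIX-compatible `df -P`, whose fourth column on the second output line is the available space in 1K blocks.

```shell
#!/bin/bash
# Hypothetical helper (not part of sra-tools): print the free space, in whole GiB,
# of the filesystem that holds the given path.
# `df -Pk` prints POSIX output in 1K blocks; line 2, column 4 is "Available".
free_gib() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# Compare the home directory (default ~/ncbi) with the suggested workspace folder.
for dir in "${HOME}" "/workspace/${USER}"; do
    if [ -d "${dir}" ]; then
        echo "${dir}: $(free_gib "${dir}") GiB free"
    else
        echo "${dir}: not found"
    fi
done
```

Run it in the interactive session (before exit) so it reports the filesystems as seen from a computing node.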

aspera 3.9.6

The Aspera CLI tool needs to be installed by each user. Log in to Anthill (ssh) and install the Aspera CLI with the following commands:

jsmith@anthill23:~$ mkdir -p ~/downloads/aspera
jsmith@anthill23:~$ cd ~/downloads/aspera
jsmith@anthill23:~/downloads/aspera$ wget https://download.asperasoft.com/download/sw/cli/3.9.6/ibm-aspera-cli-3.9.6.1467.159c5b1-linux-64-release.sh
jsmith@anthill23:~/downloads/aspera$ chmod +x ibm-aspera-cli-3.9.6.1467.159c5b1-linux-64-release.sh
jsmith@anthill23:~/downloads/aspera$ ./ibm-aspera-cli-3.9.6.1467.159c5b1-linux-64-release.sh

aspera 3.9.6 (test)

After installing the Aspera CLI, test it with the commands below:

jsmith@anthill23:~$ export PATH=/home/users/${USER}/.aspera/cli/bin:$PATH

jsmith@anthill23:~$ aspera --version
IBM Aspera CLI version 3.9.6.1467.159c5b1

jsmith@anthill23:~$ ascp --version
IBM Aspera CLI version 3.9.6.1467.159c5b1
ascp version 3.9.1.168954
Operating System: Linux
FIPS 140-2-validated crypto ready to configure
AES-NI Supported
License max rate=(unlimited), account no.=1, license no.=2000
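Instead of typing the export command above in every session, you could add the Aspera directory to PATH from ~/.bashrc. This is a minimal sketch, assuming you are comfortable editing ~/.bashrc; the function name `path_prepend_once` is hypothetical. It prepends the directory only if it exists and is not already listed, so running it more than once does not add duplicate PATH entries.

```shell
# Hypothetical helper for ~/.bashrc: prepend a directory to PATH
# only if it is an existing directory not already listed in PATH.
path_prepend_once() {
    local dir="$1"
    [ -d "${dir}" ] || return 0
    case ":${PATH}:" in
        *":${dir}:"*) ;;                  # already present, nothing to do
        *) PATH="${dir}:${PATH}" ;;       # not present, prepend it
    esac
    export PATH
}

# Same directory as in the export command above.
path_prepend_once "/home/users/${USER}/.aspera/cli/bin"
```

The sbatch example keeps its own export line, so the batch job does not depend on what your ~/.bashrc contains.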

sbatch example

With this sbatch file you can use one of Anthill's computing nodes to download SRR390728 with prefetch, using ascp as the transport protocol.

#!/bin/bash -l
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --mem 3G
#SBATCH --job-name SRR390728

# Aspera CLI installed as described above
ASPERA="/home/users/${USER}/.aspera/cli"
export PATH="${ASPERA}/bin:${PATH}"

FILENAME="SRR390728"

echo "date: $(date -R)"
echo "host: $(hostname)"
echo "will download ${FILENAME}"
echo "download started"
# -t ascp selects the Aspera transport; -a takes "<ascp binary>|<private key file>"
prefetch -t ascp -a "${ASPERA}/bin/ascp|${ASPERA}/etc/asperaweb_id_dsa.openssh" "${FILENAME}"
status=$?
echo "download stopped (prefetch exit status: ${status})"
exit ${status}
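The example above hardcodes the accession. To reuse one batch file for different runs, the accession can instead be passed as an argument, since sbatch forwards arguments that follow the script name to the script. This is a sketch under stated assumptions: the file name download.sh and the helper `is_sra_run_accession` are hypothetical, and the check only accepts run accessions of the form SRR, ERR or DRR followed by digits.

```shell
#!/bin/bash -l
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --mem 3G
#SBATCH --job-name sra-download

# Hypothetical check: SRA run accessions look like SRR390728, ERR..., DRR...
is_sra_run_accession() {
    [[ "$1" =~ ^[SED]RR[0-9]+$ ]]
}

ASPERA="/home/users/${USER}/.aspera/cli"
export PATH="${ASPERA}/bin:${PATH}"

# First script argument, e.g. `sbatch download.sh SRR390728`
FILENAME="${1:-}"

if is_sra_run_accession "${FILENAME}"; then
    echo "date: $(date -R)"
    echo "host: $(hostname)"
    echo "will download ${FILENAME}"
    prefetch -t ascp -a "${ASPERA}/bin/ascp|${ASPERA}/etc/asperaweb_id_dsa.openssh" "${FILENAME}"
    echo "download stopped (prefetch exit status: $?)"
else
    echo "usage: sbatch download.sh <run accession, e.g. SRR390728>" >&2
fi
```

Submit it with `sbatch download.sh SRR390728`. Options given on the sbatch command line override #SBATCH lines, so `sbatch --job-name SRR390728 download.sh SRR390728` also names the job after the run.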

other

Please remember not to overuse the Internet connections of CeNT or NCBI when downloading files. Do not start dozens of download processes at once; this stresses the Internet connections of both CeNT and NCBI. Be aware that some organizations treat massive downloads as a DoS attack and will block access to their resources, temporarily or permanently.
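One way to follow this advice when you have many accessions is a Slurm job array with a concurrency limit: the `%` suffix of `--array` caps how many array tasks run at the same time. This is a sketch, assuming a hypothetical file accessions.txt with one run accession per line (20 lines here) in the directory you submit from; the limit of 2 concurrent downloads is an arbitrary example, not a CeNT or NCBI rule.

```shell
#!/bin/bash -l
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --mem 3G
#SBATCH --job-name sra-array
# tasks 1-20, at most 2 of them running at the same time
#SBATCH --array=1-20%2

# Hypothetical list: one accession per line; array task N downloads line N.
LIST="accessions.txt"

# Print line number $2 of file $1 (empty if the file is shorter).
accession_for_task() {
    sed -n "${2}p" "$1"
}

ASPERA="/home/users/${USER}/.aspera/cli"
export PATH="${ASPERA}/bin:${PATH}"

# SLURM_ARRAY_TASK_ID is set by Slurm only inside array tasks.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    FILENAME="$(accession_for_task "${LIST}" "${SLURM_ARRAY_TASK_ID}")"
    echo "task ${SLURM_ARRAY_TASK_ID}: will download ${FILENAME}"
    prefetch -t ascp -a "${ASPERA}/bin/ascp|${ASPERA}/etc/asperaweb_id_dsa.openssh" "${FILENAME}"
fi
```

Adjust the array range to the number of lines in your list. Lowering the number after `%` trades total download time for a lighter load on the shared connections.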

*) https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=prefetch
*) http://download.asperasoft.com/download/docs/cli/3.9.6/user_linux/pdf2/Aspera_CLI_Admin_3.9.6_Linux.pdf
*) https://www.ncbi.nlm.nih.gov/