anthill_sri013

Differences

This shows you the differences between two versions of the page.

anthill_sri013 [2019/09/22 14:30] – created tmatejuk
anthill_sri013 [2023/08/01 01:08] (current) – external edit 127.0.0.1
Line 1: Line 1:
 ==== description ====
-This article shows how to parallel download hundreds of BAM files using anthill cluster.
+This article shows how to download in parallel hundreds of BAM files using Anthill cluster. \\
+It can also be used as a template when using the Anthill cluster for similar tasks .
  
 ==== file list ====
Line 9: Line 10:
  
 ==== where to download ====
-Login to pier23 or anthill23. Go to ''W01_NFS'' storage with ''cd /workspace/${USER}''. And create new folder for your ''download.list'' file. Remember to check with ''quota -sg'' how much free space you have. If not enough compress/remove some old files, or request for quota increase. \\
+Login to anthill23. Go to ''W01_NFS'' storage with ''cd /workspace/${USER}''. And create new folder for your ''download.list'' file. Remember to check with ''quota -sg'' how much free space you have. If not enough compress/remove some old files, or request for quota increase. \\
 Lets assume your ''download.list'' file was located in new folder ''/workspace/${USER}/anthill23_wget'' at anthill23 host.
  
Line 49: Line 50:
  
 ==== run sbatch example ====
-To run ''/workspace/${USER}/anthill23_wget/batch/download_list.batch'' we need to provide additional argument of how many lines there is in ''download.list'' file as ''--array=1-999''. Numer of lines in ''download.list'' file is counted automatically ( with ''wc'' conmand ).
+To run ''/workspace/${USER}/anthill23_wget/batch/download_list.batch'' we need to provide additional argument of how many lines there is in ''download.list'' file as ''--array=1-999'' argument. In our case number of lines in ''download.list'' file is counted automatically ( with ''wc'' conmand ).
   cd /workspace/${USER}/anthill23_wget/batch/
   sbatch --array=1-`cat ../download.list | wc -l` download_list.batch
Line 101: Line 102:
  
 ==== other / limits ====
-Usually 1 wget process uses only few % of CPU core. And ussualy download speed for single file from public ftp servers is about ~1MB/s . So more efficient way would be subbmiting more than one wget proccess to one core or one wget process with more than one URL. That change however, would complicate sagnifically batch file and clarity of whole process.
+Usually 1 wget process uses only few % of CPU core. And usually download speed for single file from public ftp servers is about ~1MB/s . So more efficient way would be submitting more than one wget process to one core or one wget process with more than one URL. That change however, would complicate significantly batch file and clarity of whole process and it is hard to say how much efficient download process would be.
+
+It is very unlikely but currently ( 2019.09 ) on Anthill cluster described download process has potential to generate about 5GB/s total transfer. Such transfers still would be handled by CeNTs network/storage infrastructure but it is more than possible latency increase of CeNTs storage / network would be noticeable for all users.
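
The "one wget process with more than one URL" variant mentioned in this hunk could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names ''make_chunks'' and ''chunk_for_task'' and the ''chunks'' directory are hypothetical, and the real ''download_list.batch'' is not shown in this comparison. The sketch relies on ''split -l'' to cut the list into pieces and on ''wget -i'' to read URLs from a file.

```shell
# Hypothetical helper: split a URL list into chunk files of at most
# per_task lines each (chunk.aa, chunk.ab, ...) inside out_dir.
make_chunks() {
  list_file=$1
  per_task=$2
  out_dir=$3
  mkdir -p "$out_dir"
  split -l "$per_task" "$list_file" "$out_dir/chunk."
}

# Hypothetical helper: print the path of the K-th chunk (1-based).
# Glob expansion is sorted, so the order matches the order split created.
chunk_for_task() {
  printf '%s\n' "$1"/chunk.* | sed -n "${2}p"
}

# Inside a batch script each array task would fetch every URL of its chunk
# with a single wget process. Assumes make_chunks was run beforehand and
# wrote its output to ./chunks.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ] && [ -d chunks ]; then
  wget -i "$(chunk_for_task chunks "$SLURM_ARRAY_TASK_ID")"
fi
```

In this layout the array would be sized by the number of chunk files rather than by the number of URLs, so fewer tasks are submitted while each one downloads several files in sequence. Whether that is actually faster depends on the remote servers, as the text in the diff already notes.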
  
  
  
  
anthill_sri013.1569155436.txt.gz · Last modified: 2023/08/01 06:38 (external edit)