This shows you the differences between two versions of the page.
anthill_sri001 [2019/05/25 17:30] – tmatejuk | anthill_sri001 [2023/08/01 01:08] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | FIXME TODO UNDERCONSTRUCTION | + | ==== description |
- | + | ||
- | ==== software | + | |
CD-HIT is a fast program for clustering and comparing large sets of protein or nucleotide sequences. It is also accelerated for clustering next-generation sequencing data. Software page: [[http:// | CD-HIT is a fast program for clustering and comparing large sets of protein or nucleotide sequences. It is also accelerated for clustering next-generation sequencing data. Software page: [[http:// | ||
==== software version ==== | ==== software version ==== | ||
- | |||
CD-HIT from the Ubuntu 18.04 repository (CD-HIT version 4.7, built on Jul 1 2017). | CD-HIT from the Ubuntu 18.04 repository (CD-HIT version 4.7, built on Jul 1 2017). | ||
Line 40: | Line 36: | ||
==== performance tests ==== | ==== performance tests ==== | ||
- | The results below show how efficiently cd-hit scales with the number of cores used. Each computation was run 3 times separately for node types ant6nn and ant00n. | + | The results below show how efficiently cd-hit scales with the number of cores used. Each computation was run 3 times separately for each node type, ant6nn and ant00n.
- | ^ node ^ cores ^ successful executions | + | {{ : |
- | | ant6nn | 1 | 6 | min | avg | median | max | 100% | | + | ^ node ^ cores ^ min[s] ^ avg [s] ^ median [s] ^ max[s] ^ efficiency [%] ^ |
- | | ant6nn | 2 | 6 | min | avg | median | + | | ant6nn | 1 | 107.66 |
- | | ant6nn | 4 | 6 | min | avg | median | + | | ant6nn | 2 | 103.79 |
- | | ant6nn | 8 | 6 | min | avg | median | + | | ant6nn | 4 | 75.78 | 76.11 | 76.21 | 76.29 | 35.50% | |
- | | ant6nn | 10 | 6 | min | avg | median | + | | ant6nn | 8 | 62.15 | 62.18 | 62.17 | 62.26 | 21.76% | |
- | + | | ant6nn | 10 | 57.41 | 58.53 | 58.92 | 59.10 | 18.36% | | |
- | < | + | ^ node ^ cores ^ min[s] ^ avg [s] ^ median [s] ^ max[s] ^ efficiency [%] ^ |
+ | | ant00n | 1 | 150.84 | 160.29 | 151.97 | 178.35 | 100.00% | | ||
+ | | ant00n | 2 | 154.55 | 158.30 | 155.96 | 170.05 | 48.72% | | ||
+ | | ant00n | 4 | 114.23 | 117.90 | 116.52 | 127.71 | 32.61% | | ||
+ | | ant00n | 8 | 92.79 | 108.49 | 99.84 | 139.51 | 19.03% | | ||
+ | | ant00n | 10 | 90.37 | 96.00 | 96.08 | 104.37 | 15.82% | | ||
+ | | ant00n | 14 | 84.71 | 88.76 | 88.17 | 94.25 | 12.31% | | ||
+ | | ant00n | 16 | 82.85 | 88.40 | 90.33 | 92.43 | 10.52% | | ||
+ | | ant00n | 21 | 80.98 | 86.11 | 88.26 | 89.28 | 8.20% | | ||
+ | | ant00n | 28 | 78.94 | 89.15 | 91.17 | 93.79 | 5.95% | | ||
+ | *) ant6nn = ant602 and ant604, ant00n = ant007 and ant008 \\ | ||
+ | *) efficiency as '' | ||
+ | *) command run '' | ||
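The efficiency values in the tables are consistent with the usual parallel-efficiency definition applied to the median times, that is, the median time on 1 core divided by the product of the core count and the median time on that many cores. A minimal sketch under that assumption follows; the function name `efficiency` is hypothetical and is not part of the original test scripts.

```shell
#!/bin/sh
# Hypothetical helper: parallel efficiency in percent.
# Assumption: efficiency = T1 / (n * Tn) * 100, computed from median wall times.
efficiency() {
    # $1 = median time on 1 core [s]
    # $2 = number of cores
    # $3 = median time on $2 cores [s]
    awk -v t1="$1" -v n="$2" -v tn="$3" \
        'BEGIN { printf "%.2f\n", 100 * t1 / (n * tn) }'
}

# ant00n medians taken from the table above
efficiency 151.97 2 155.96
efficiency 151.97 4 116.52
```

With the ant00n medians this prints 48.72 and 32.61, which matches the efficiency column for 2 and 4 cores.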
- | Files come from the UniProt webpage (UK mirror: ftp:// | ||
- | |||
- | batch file | ||
- | #!/bin/bash -l | ||
- | #SBATCH --partition=long | ||
- | #SBATCH --nodelist=ant006 | ||
- | #SBATCH --ntasks=8 | ||
- | #SBATCH --mem 14G | ||
- | |||
- | #test filename | ||
- | FASTAFILE=uniprot_sprot.fasta | ||
- | |||
- | #dir with input file and dir for results | ||
- | INPUTFILEDIR="/ | ||
- | |||
- | #download input file if not present | ||
- | if [ ! -f ${INPUTFILEDIR}/ | ||
- | echo "File ${FASTAFILE} not found!" | ||
- | echo " | ||
- | exit 1 | ||
- | fi | ||
- | |||
- | #printout to output file some info | ||
- | echo " | ||
- | echo " | ||
- | echo "date: " `date` | ||
- | echo " | ||
- | echo " | ||
- | |||
- | #create temporary folder on local disk for input and output data | ||
- | TESTDIR=`mktemp -d` | ||
- | mkdir -p ${TESTDIR} | ||
- | cp ${INPUTFILEDIR}/ | ||
- | |||
- | #run cdhit computation and remove output files | ||
- | cd ${TESTDIR} | ||
- | time -p cdhit -i ${FASTAFILE} -o out -c 0.9 -n 5 -T ${SLURM_NPROCS} -M 14000 | ||
- | |||
- | #remove data after test: | ||
- | cd ~ | ||
- | rm -rf ${TESTDIR} | ||
- | |||
- | |||
- | |||
- | test conditions | ||
- | |||
- | Each job was run alone, that is, only one job ran on the whole cluster at a time, so jobs did not compete for NFS shares, the memory bus, etc. To automate the whole test process, the cd-hit job batch file was modified. All jobs were submitted with a bash script. To make sure that jobs do not run simultaneously slurm' | ||
- | |||
- | Auxiliary batch script (empty.batch): | ||
- | |||
- | #!/bin/bash -l | ||
- | #SBATCH --partition=short | ||
- | #SBATCH --ntasks=1 | ||
- | |||
- | sleep 1; | ||
- | |||
- | cd-hit job script (cdhit_test.batch) : | ||
- | |||
- | #!/bin/bash -l | ||
- | #SBATCH --partition=short | ||
- | #SBATCH --mem 2048M | ||
- | |||
- | #test filename | ||
- | FASTAFILE=uniprot_sprot.fasta | ||
- | |||
- | #dir with input file and dir for results | ||
- | INPUTFILEDIR="/ | ||
- | CDHITTESTDIR="/ | ||
- | |||
- | #download input file if not present | ||
- | if [ ! -f ${INPUTFILEDIR}/ | ||
- | echo "File ${FASTAFILE} not found!" | ||
- | echo " | ||
- | exit 1 | ||
- | fi | ||
- | |||
- | #printout to output file some info | ||
- | echo " | ||
- | echo "input file : " ${FASTAFILE} | ||
- | echo "date: " `date` | ||
- | echo " | ||
- | echo " | ||
- | |||
- | #copy uniprot_sprot.fasta to result directory if not present | ||
- | mkdir -p ${CDHITTESTDIR} | ||
- | cd ${CDHITTESTDIR} | ||
- | if [ ! -f ${FASTAFILE} ]; then | ||
- | echo "File uniprot_sprot.fasta not present in results dir!" | ||
- | cp ${INPUTFILEDIR}/ | ||
- | fi | ||
- | |||
- | #run cdhit computation and remove output files | ||
- | cd ${CDHITTESTDIR} | ||
- | time -p cdhit -i ${FASTAFILE} -o out -c 0.9 -n 5 -T ${SLURM_NPROCS} -M 2048 | ||
- | rm out out.clstr | ||
- | Bash script used to submit all jobs (run_cdhit_test_on_anthill.sh) : | ||
- | |||
- | #!/bin/sh | ||
- | |||
- | BATCHNAME=cdhit_test.batch | ||
- | |||
- | # first job - no dependencies, | ||
- | previd=$(sbatch empty.batch | awk ' | ||
- | |||
- | for i in $(seq 1 3) #repeat each test 3 times | ||
- | do | ||
- | #nodes : ant001, ant002 | ||
- | for nodename in ant001 ant002 | ||
- | do | ||
- | for ntasksvalue in 1 2 | ||
- | do | ||
- | echo "node : ${nodename}" | ||
- | echo " | ||
- | |||
- | nextid=$(sbatch --dependency=afterany: | ||
- | previd=${nextid} | ||
- | |||
- | done | ||
- | done | ||
- | |||
- | #nodes : ant003, ant004 | ||
- | for nodename in ant003 ant004 | ||
- | do | ||
- | for ntasksvalue in 1 2 4 | ||
- | do | ||
- | echo "node : ${nodename}" | ||
- | echo " | ||
- | |||
- | nextid=$(sbatch --dependency=afterany: | ||
- | previd=${nextid} | ||
- | |||
- | done | ||
- | done | ||
- | |||
- | #nodes : ant005, ant006 | ||
- | for nodename in ant005 ant006 | ||
- | do | ||
- | for ntasksvalue in 1 2 4 8 | ||
- | do | ||
- | echo "node : ${nodename}" | ||
- | echo " | ||
- | |||
- | nextid=$(sbatch --dependency=afterany: | ||
- | previd=${nextid} | ||
- | |||
- | done | ||
- | done | ||
- | |||
- | #nodes : ant007, ant008 | ||
- | for nodename in ant007 ant008 | ||
- | do | ||
- | for ntasksvalue in 1 2 4 8 14 16 28 | ||
- | do | ||
- | echo "node : ${nodename}" | ||
- | echo " | ||
- | |||
- | nextid=$(sbatch --dependency=afterany: | ||
- | previd=${nextid} | ||
- | |||
- | done | ||
- | done | ||
- | |||
- | done # repeat each test 3 times | ||
- | |||
- | # show dependencies in squeue output: | ||
- | squeue -u $USER -o "%.8A %.4C %.10m %.20E" | ||
- | results | ||
- | |||
- | In this test each node was occupied only by one job. File : uniprot_sprot.fasta . Memory allocated for each job : 2048 MB (cd-hit reported memory usage lower than 1400 MB). | ||
- | node name cores used result1 [s] result2 [s] result3 [s] average [s] | ||
- | ant001 1 365.94 371.68 367.26 368.29 | ||
- | ant001 2 244.42 240.67 242.41 242.50 | ||
- | ant002 1 409.78 369.57 368.46 382.60 | ||
- | ant002 2 243.23 238.74 244.39 242.12 | ||
- | ant003 1 360.88 361.21 365.22 362.44 | ||
- | ant003 2 245.87 251.52 245.33 247.57 | ||
- | ant003 4 202.77 205.5 203.77 204.01 | ||
- | ant004 1 357.43 364.11 372.73 364.76 | ||
- | ant004 2 239.34 239.92 236.58 238.61 | ||
- | ant004 4 189.77 194 194.24 192.67 | ||
- | ant005 1 334.39 331.71 318.99 328.36 | ||
- | ant005 2 238.52 234.26 239.69 237.49 | ||
- | ant005 4 196.94 199.28 198.59 198.27 | ||
- | ant005 8 175.24 178.24 173.24 175.57 | ||
- | ant006 1 318.6 330.38 330.18 326.39 | ||
- | ant006 2 239.27 238.46 240.01 239.25 | ||
- | ant006 4 192.16 191.58 196.91 193.55 | ||
- | ant006 8 156.69 176.7 178.24 170.54 | ||
- | ant007 1 149.8 140.85 147.45 146.03 | ||
- | ant007 2 114.65 112.76 118.23 115.21 | ||
- | ant007 4 98.15 96.56 98.67 97.79 | ||
- | ant007 8 90.63 89.64 88.38 89.55 | ||
- | ant007 14 85.28 87.12 83.37 85.26 | ||
- | ant007 16 82.46 86.69 86.67 85.27 | ||
- | ant007 28 75.35 87.45 85.33 82.71 | ||
- | ant008 1 140.42 147.09 142.25 143.25 | ||
- | ant008 2 113.35 116.83 118.27 116.15 | ||
- | ant008 4 99.1 91.73 90.12 93.65 | ||
- | ant008 8 88.89 80.2 89.31 86.13 | ||
- | ant008 14 76.03 85.79 86.91 82.91 | ||
- | ant008 16 86.97 82.77 85.11 84.95 | ||
- | ant008 28 83.19 89.83 89.45 87.49 | ||
- | ant009 1 135.40 135.35 140.31 137.02 | ||
- | ant009 2 108.71 111.89 108.27 109.62 | ||
- | ant009 4 85.06 90.44 84.27 86.59 | ||
- | ant009 8 72.98 78.29 77.61 76.29 | ||
- | ant009 14 71.93 72.95 67.78 70.89 | ||
- | ant009 16 67.35 71.71 72.00 70.35 | ||
- | ant011 1 104.71 105.06 104.5 104.76 | ||
- | ant011 2 89.57 90.33 89.65 89.85 | ||
- | ant011 4 72.20 71.92 72.22 72.11 | ||
- | ant012 1 106.92 106.50 107.06 106.83 | ||
- | ant012 2 93.41 93.48 93.27 93.39 | ||
- | ant012 4 75.24 75.16 75.23 75.21 | ||
- | ant100 1 498.72 499.5 486.61 494.94 | ||
- | ant100 2 452.51 422.12 447.15 440.59 | ||
- | ant100 4 389.68 388.35 407.17 395.07 | ||
- | ant100 8 388.59 382.68 396.2 389.16 | ||
- | ant100 14 368.64 351.9 401.13 373.89 | ||
- | ant100 16 391.47 396.53 357.93 381.98 | ||
- | ant101 1 503.45 452.31 487.26 481.01 | ||
- | ant101 2 458.42 435.75 451.08 448.42 | ||
- | ant101 4 417.15 382.07 394.57 397.93 | ||
- | ant101 8 369.29 383.49 397.68 383.49 | ||
- | ant101 14 367.95 379.02 377.11 374.69 | ||
- | ant101 16 359.85 377.17 383.34 373.45 | ||
- | ant102 1 458.29 493.13 550.64 500.69 | ||
- | ant102 2 426.48 431.78 425.33 427.86 | ||
- | ant102 4 389.84 397.64 400.21 395.90 | ||
- | ant102 8 376.8 386.24 384.44 382.49 | ||
- | ant102 14 378.7 380.39 363.19 374.09 | ||
- | ant102 16 386.91 377.53 362.45 375.63 | ||
- | ant103 1 542.24 512.03 536.04 530.10 | ||
- | ant103 2 419.03 421.15 439.53 426.57 | ||
- | ant103 4 403.98 401.51 413.29 406.26 | ||
- | ant103 8 419.37 401.93 400.54 407.28 | ||
- | ant103 14 382.57 390.96 399.59 391.04 | ||
- | ant103 16 381.75 352.94 408.68 381.12 | ||
- | ant104 1 506.13 491.18 498.93 498.75 | ||
- | ant104 2 436.75 457.8 460.15 451.57 | ||
- | ant104 4 392.18 399.32 426.82 406.11 | ||
- | ant104 8 398.33 376.98 410.96 395.42 | ||
- | ant104 14 410.99 407.83 361.08 393.30 | ||
- | ant104 16 379.62 446.03 387.65 404.43 | ||
- | ant105 1 531.36 496.54 497.53 508.48 | ||
- | ant105 2 427.13 436.78 420.61 428.17 | ||
- | ant105 4 390.03 421.38 402.52 404.64 | ||
- | ant105 8 405.04 381.83 398.82 395.23 | ||
- | ant105 14 392.35 381.58 400.12 391.35 | ||
- | ant105 16 394.61 372.8 391.66 386.36 | ||
- | ant106 1 571.33 521.11 501.05 531.16 | ||
- | ant106 2 426.67 459.36 420.83 435.62 | ||
- | ant106 4 373.5 389.3 409.29 390.70 | ||
- | ant106 8 406.36 419.5 387.7 404.52 | ||
- | ant106 14 404.83 403.86 365.56 391.42 | ||
- | ant106 16 389.17 390.04 387.46 388.89 | ||
- | ant107 1 499.09 556.72 503.26 519.69 | ||
- | ant107 2 440.45 447.31 433.92 440.56 | ||
- | ant107 4 400.1 396.56 401.83 399.50 | ||
- | ant107 8 390.98 396.28 389.81 392.36 | ||
- | ant107 14 372.23 348.51 378.69 366.48 | ||
- | ant107 16 388.68 389.47 380.07 386.07 | ||
- | ant108 1 495.9 556.77 548.07 533.58 | ||
- | ant108 2 444.71 412.32 425.88 427.64 | ||
- | ant108 4 391.17 400.76 388.44 393.46 | ||
- | ant108 8 412.31 402.63 394.59 403.18 | ||
- | ant108 14 362.67 398.34 394.56 385.19 | ||
- | ant108 16 386.53 388.5 362.56 379.20 | ||
- | ant109 1 545.91 556.75 494.27 532.31 | ||
- | ant109 2 461.51 444.07 450.6 452.06 | ||
- | ant109 4 403.82 399.62 397.57 400.34 | ||
- | ant109 8 386.67 372.51 382.94 380.71 | ||
- | ant109 14 394.64 398.26 403.91 398.94 | ||
- | ant109 16 393.43 380.63 381.21 385.09 | ||
- | ant110 1 483.4 504.87 565.43 517.90 | ||
- | ant110 2 436.53 445.72 440.23 440.83 | ||
- | ant110 4 386.86 376.82 397.58 387.09 | ||
- | ant110 8 406.49 400.98 404.7 404.06 | ||
- | ant110 14 377.74 376.83 387.77 380.78 | ||
- | ant110 16 388.67 414.8 338.77 380.75 | ||
- | ant300 1 505.57 508.71 507.09 507.12 | ||
- | ant300 2 465.51 469.14 443.56 459.40 | ||
- | ant300 4 391.08 387.43 423.27 400.59 | ||
- | ant300 8 375.34 391.55 384.94 383.94 | ||
- | ant300 14 370 385.67 392.79 382.82 | ||
- | ant300 16 417.78 371.86 394 394.55 | ||
- | ant301 1 529.53 555.58 554.29 546.47 | ||
- | ant301 2 446.13 428.45 433.15 435.91 | ||
- | ant301 4 391.19 400.25 381.3 390.91 | ||
- | ant301 8 378.6 403.8 398.7 393.70 | ||
- | ant301 14 387.28 404.41 416.27 402.65 | ||
- | ant301 16 407.38 384.49 393.4 395.09 | ||
- | ant302 1 487.93 540.43 509.09 512.48 | ||
- | ant302 2 455.83 416.38 422.88 431.70 | ||
- | ant302 4 402.54 402.98 411.03 405.52 | ||
- | ant302 8 410.27 396.98 387.37 398.21 | ||
- | ant302 14 388.08 361.4 404.88 384.79 | ||
- | ant302 16 393.81 388.89 350.89 377.86 | ||
- | ant303 1 543 496.53 497.46 512.33 | ||
- | ant303 2 438.08 431.19 450.57 439.95 | ||
- | ant303 4 374.69 412.4 415.05 400.71 | ||
- | ant303 8 392.73 402.06 371.14 388.64 | ||
- | ant303 14 385.49 403.59 377.39 388.82 | ||
- | ant303 16 385.74 380.52 368.13 378.13 | ||
- | ant304 1 488.44 530.69 505.74 508.29 | ||
- | ant304 2 426.9 429.9 416.27 424.36 | ||
- | ant304 4 400.71 411.94 409.4 407.35 | ||
- | ant304 8 397.68 390.12 395.15 394.32 | ||
- | ant304 14 400.9 399.97 395.25 398.71 | ||
- | ant304 16 376.71 411.95 391.9 393.52 | ||
- | ant305 1 493.29 457.06 499.45 483.27 | ||
- | ant305 2 435.51 416.19 441.24 430.98 | ||
- | ant305 4 383.56 383.94 374.93 380.81 | ||
- | ant305 8 390.53 374.54 381.38 382.15 | ||
- | ant305 14 383.11 378.42 366.69 376.07 | ||
- | ant305 16 339.12 371.78 384.37 365.09 | ||
- | ant306 1 525.12 562.76 554.43 547.44 | ||
- | ant306 2 422.76 428.61 455.4 435.59 | ||
- | ant306 4 399.04 418.49 393.59 403.71 | ||
- | ant306 8 401.78 398.08 398.79 399.55 | ||
- | ant306 14 372.71 344.34 387.41 368.15 | ||
- | ant306 16 402.07 397.08 363.71 387.62 | ||
- | ant307 1 529.49 491.09 551.61 524.06 | ||
- | ant307 2 436.61 450.36 410.17 432.38 | ||
- | ant307 4 369.26 382.06 372.96 374.76 | ||
- | ant307 8 399.93 368.99 381.69 383.54 | ||
- | ant307 14 393.04 402.61 392.46 396.04 | ||
- | ant307 16 397.69 366.65 368.11 377.48 | ||
- | ant308 1 493.37 485 495.46 491.28 | ||
- | ant308 2 424.94 407.66 440.15 424.25 | ||
- | ant308 4 379.14 381.28 370.09 376.84 | ||
- | ant308 8 393.84 385.62 379.88 386.45 | ||
- | ant308 14 386.46 367.32 382.12 378.63 | ||
- | ant308 16 374.33 381.08 394.61 383.34 | ||
- | ant309 1 497.76 454.63 534.66 495.68 | ||
- | ant309 2 425.73 411.48 429.8 422.34 | ||
- | ant309 4 397.27 402.02 390.32 396.54 | ||
- | ant309 8 401.44 391.34 365.22 386.00 | ||
- | ant309 14 344.78 387.31 390.56 374.22 | ||
- | ant309 16 381.41 370.21 381.87 377.83 | ||
- | ant310 1 530.09 535.96 484.61 516.89 | ||
- | ant310 2 452.09 452.9 439.9 448.30 | ||
- | ant310 4 394.19 387.42 411.44 397.68 | ||
- | ant310 8 390.54 401.96 396.51 396.34 | ||
- | ant310 14 358.21 397.18 377.33 377.57 | ||
- | ant310 16 379.86 399.12 386.91 388.63 | ||
- | ant311 1 460.55 491.7 497.99 483.41 | ||
- | ant311 2 442.68 419.38 411.68 424.58 | ||
- | ant311 4 365.68 411.03 386.79 387.83 | ||
- | ant311 8 400.72 381.16 358.73 380.20 | ||
- | ant311 14 399.81 385.22 377.79 387.61 | ||
- | ant311 16 399.33 400.72 371.35 390.47 | ||
- | </ |
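The average column in the per-node results can be recomputed from the three timings. A minimal sketch, assuming whitespace-separated rows of the form `node cores result1 result2 result3`; the function name `average_runs` is hypothetical and is not part of the original test scripts.

```shell
#!/bin/sh
# Hypothetical helper: read rows "node cores result1 result2 result3 [...]"
# on stdin and print "node cores average", with the average rounded to 2 decimals.
average_runs() {
    awk 'NF >= 5 { printf "%s %s %.2f\n", $1, $2, ($3 + $4 + $5) / 3 }'
}

# one row from the results listing
printf '%s\n' "ant001 1 365.94 371.68 367.26" | average_runs
```

For the ant001 single-core row this prints `ant001 1 368.29`, the same average as reported in the listing.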