Note: rMATS-turbo-xxx-UCS4 is the build to use; for details see the section 'Which version to use' in http://rnaseq-mats.sourceforge.net/user_guide.htm
Multivariate Analysis of Transcript Splicing (MATS). MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data. Software page: http://rnaseq-mats.sourceforge.net/
The rMATS 4.0.2 release (04/25/2018) is described and used below.
Log in to anthill23 and download rMATS along with some test datasets and GTF files:
@anthill23:~$ mkdir -p ~/anthill23_rmats/{testData,gtf,STARindex}
cd ~/anthill23_rmats
wget https://sourceforge.net/projects/rnaseq-mats/files/MATS/rMATS.4.0.2.tgz
tar xzf rMATS.4.0.2.tgz
cd ~/anthill23_rmats/testData
wget https://sourceforge.net/projects/rnaseq-mats/files/MATS/testData.tgz
tar xzf testData.tgz
cd ~/anthill23_rmats/gtf
wget ftp://ftp.ensembl.org/pub/grch37/current/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.gtf.gz
gunzip Homo_sapiens.GRCh37.87.gtf.gz
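Before moving on, it can help to confirm that the download step produced the files the later commands rely on. The following is a minimal sketch, not part of rMATS: the function name check_rmats_layout is hypothetical, and the two paths it checks are the ones used by the test command further down this page.

```shell
# Hypothetical helper: report which expected files are missing under a base
# directory. Returns 0 when everything is present, 1 otherwise.
check_rmats_layout() {
    base="$1"
    missing=0
    for f in \
        "rMATS.4.0.2/rMATS-turbo-Linux-UCS4/rmats.py" \
        "gtf/Homo_sapiens.GRCh37.87.gtf"
    do
        if [ ! -e "${base}/${f}" ]; then
            echo "missing: ${f}"
            missing=$((missing + 1))
        fi
    done
    if [ "${missing}" -gt 0 ]; then
        return 1
    fi
    return 0
}
```

A typical call would be: check_rmats_layout ~/anthill23_rmats && echo "layout OK"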
rMATS uses the GSL library, which needs to be prepared separately. The library has to be compiled on a computing node, so this is done in interactive mode (see SLURM, batch and interactive mode):
@anthill23:~$ srun -J gsl_compile -N1 -n4 --pty bash -l
cd ~/anthill23_rmats/
wget http://gnu.mirror.vexxhost.com/gsl/gsl-2.4.tar.gz
tar xzf gsl-2.4.tar.gz
cd gsl-2.4
./configure --prefix="/home/users/${USER}/anthill23_rmats/gsl-2.4"
make -j ${SLURM_NTASKS}
make install
cd ~/anthill23_rmats/gsl-2.4/lib
ln -s libgsl.so libgsl.so.0
exit
The file http://rmaps.cecsresearch.org/STAR/STARindex.tgz referenced by rMATS is no longer available. To be able to test the FASTQ examples, we therefore need to generate an index of the reference genome ourselves (see http://people.duke.edu/~ccc14/duke-hts-2018/bioinformatics/genome_prep.html).
Use the commands below to install STAR:
@anthill23:~$ srun -J star_compile --pty bash -l
cd ~/anthill23_rmats/
wget https://github.com/alexdobin/STAR/archive/2.7.1a.tar.gz
tar xzf 2.7.1a.tar.gz
cd STAR-2.7.1a
cd source
make STAR
exit
Use the commands below to generate the index required to run the sbatch example with FASTQ input (more info on how to use STAR).
@anthill23:~$ srun -J create_index -N1 -n28 --mem 100G --pty bash -l
cd ~/anthill23_rmats/gtf
wget ftp://ftp.ensembl.org/pub/release-91/gtf/homo_sapiens/Homo_sapiens.GRCh38.91.gtf.gz
gunzip Homo_sapiens.GRCh38.91.gtf.gz
wget ftp://ftp.ensembl.org/pub/release-91/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
gunzip Homo_sapiens.GRCh38.dna.toplevel.fa.gz
BASE_DIR="/home/users/${USER}/anthill23_rmats/"
STAR_BIN="${BASE_DIR}/STAR-2.7.1a/source/"
GENOME_DIR="${BASE_DIR}/gtf/"
FASTA="${BASE_DIR}/gtf/Homo_sapiens.GRCh38.dna.toplevel.fa"
GTF="${BASE_DIR}/gtf/Homo_sapiens.GRCh38.91.gtf"
STAR_OUT="${BASE_DIR}/STARindex/"
${BASE_DIR}/STAR-2.7.1a/source/STAR \
  --runThreadN 4 \
  --runMode genomeGenerate \
  --genomeDir $GENOME_DIR \
  --genomeFastaFiles ${FASTA} \
  --sjdbGTFfile ${GTF} \
  --outFileNamePrefix ${STAR_OUT}/genome_ \
  --genomeSAindexNbases 11
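Index generation takes a long time and can fail part-way (for example when memory runs out), so it may be worth checking the result before using it. The sketch below assumes the usual STAR index file names (Genome, SA, SAindex, genomeParameters.txt); the function name star_index_complete is hypothetical. Note that STAR writes the index into the directory passed as --genomeDir, which in the command above is the gtf directory, while --outFileNamePrefix only controls where log files go.

```shell
# Hypothetical helper: succeed only if the core STAR index files exist and
# are non-empty in the given directory.
star_index_complete() {
    index_dir="$1"
    for f in Genome SA SAindex genomeParameters.txt; do
        if [ ! -s "${index_dir}/${f}" ]; then
            echo "missing or empty: ${index_dir}/${f}"
            return 1
        fi
    done
    return 0
}
```

For the setup on this page the call would be: star_index_complete "/home/users/${USER}/anthill23_rmats/gtf"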
If the steps above (preparing the software and preparing the GSL library) finished successfully, this simple test should also succeed. It is likewise run in interactive mode (see SLURM, batch and interactive mode):
@anthill23:~$ srun -J rMATS_test --pty bash -l
cd /home/users/${USER}/anthill23_rmats
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/users/${USER}/anthill23_rmats/gsl-2.4/lib
python rMATS.4.0.2/rMATS-turbo-Linux-UCS4/rmats.py \
  --b1 testData/b1.txt --b2 testData/b2.txt \
  --gtf gtf/Homo_sapiens.GRCh37.87.gtf --od bam_test \
  -t paired --readLength 50 --cstat 0.0001 --libType fr-unstranded
...
# read the output and exit when finished
exit
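The test relies on two things from the GSL step: the lib directory being listed in LD_LIBRARY_PATH and the libgsl.so.0 symlink created there. A minimal pre-flight check might look like the sketch below; the function name gsl_lib_ready is hypothetical and the check only looks at the path and the file, it does not load the library.

```shell
# Hypothetical helper: succeed if directory $1 is listed in LD_LIBRARY_PATH
# and contains libgsl.so.0 (a dangling symlink counts as missing).
gsl_lib_ready() {
    libdir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":${libdir}:"*) ;;
        *) echo "not in LD_LIBRARY_PATH: ${libdir}"; return 1 ;;
    esac
    if [ ! -e "${libdir}/libgsl.so.0" ]; then
        echo "missing: ${libdir}/libgsl.so.0"
        return 1
    fi
    return 0
}
```

It would be called after the export line, for example: gsl_lib_ready "/home/users/${USER}/anthill23_rmats/gsl-2.4/lib"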
The example below was taken from the rMATS User Guide. This description assumes that the steps above (preparing the software, preparing the GSL library, and the simple test) were done beforehand; if not, remember to modify the paths accordingly.
File rmats_bam.batch:
#!/bin/bash -l
#SBATCH --partition=test
#SBATCH --ntasks=4
#SBATCH --mem 4G
cd /home/users/${USER}/anthill23_rmats
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/users/${USER}/anthill23_rmats/gsl-2.4/lib
python rMATS.4.0.2/rMATS-turbo-Linux-UCS4/rmats.py \
  --b1 b1.txt --b2 b2.txt \
  --gtf gtf/Homo_sapiens.GRCh38.91.gtf --od bam_test \
  -t paired --readLength 50 --cstat 0.0001 --libType fr-unstranded
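The batch script expects b1.txt and b2.txt in the working directory. According to the rMATS User Guide, each of these is a text file holding a comma-separated list of BAM files for one sample group. The sketch below shows one way to write such a file; the function name write_bam_list and the BAM file names in the usage line are hypothetical.

```shell
# Hypothetical helper: join the given BAM paths with commas and write them
# as a single line to the output file given as the first argument.
write_bam_list() {
    out="$1"
    shift
    list=""
    for bam in "$@"; do
        if [ -z "${list}" ]; then
            list="${bam}"
        else
            list="${list},${bam}"
        fi
    done
    printf '%s\n' "${list}" > "${out}"
}
```

Example call with placeholder names: write_bam_list b1.txt sample1_rep1.bam sample1_rep2.bam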
Run the computation with sbatch rmats_bam.batch. You will find the results in the /home/users/${USER}/anthill23_rmats/bam_test folder.
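Once the job has finished, a quick look at the result tables shows whether rMATS produced anything. The sketch below assumes the per-event output files are named SE.MATS.JC.txt, MXE.MATS.JC.txt, A3SS.MATS.JC.txt, A5SS.MATS.JC.txt and RI.MATS.JC.txt, each with one header line; the function name count_events is hypothetical.

```shell
# Hypothetical helper: print the number of data rows (lines after the
# header) in each per-event JC result table of an rMATS output folder.
count_events() {
    outdir="$1"
    for ev in SE MXE A3SS A5SS RI; do
        f="${outdir}/${ev}.MATS.JC.txt"
        if [ -f "${f}" ]; then
            n=$(awk 'NR > 1 { c++ } END { print c + 0 }' "${f}")
            echo "${ev} ${n}"
        else
            echo "${ev} missing"
        fi
    done
}
```

For the batch example above the call would be: count_events "/home/users/${USER}/anthill23_rmats/bam_test"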
…
...