Differential RNA splicing analysis with bulk RNA-seq data
Before you start
- Map RNA-seq reads to the reference genome with an aligner such as STAR or HISAT2, and generate BAM files together with their index files (.bai).
- You can download test RNA-seq BAM files with their index files (two replicates each for the reference and alternative groups), mapped by STAR to the mouse genome, from here.
- Download a gene annotation file for your organism of interest in GTF format.
Installation
- Shiba:
- MameShiba, a lightweight version of Shiba:
Shiba
1. Prepare inputs
experiment.tsv: A tab-separated text file that gives the sample ID, the path to the BAM file, and the group for differential analysis for each sample.
Make sure to use tabs
If you copy and paste the example above, your experiment.tsv file may contain spaces instead of tabs, which will cause an error when you run Shiba. Make sure that the columns are separated by tab characters.
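Tabs and spaces look identical in most editors, so it helps to check the file before running Shiba. The Python sketch below is not part of Shiba. It assumes that each non-empty row should have three tab-separated columns plus an optional fourth, as this page describes, and it reports any row that does not. The sample names and paths are hypothetical.

```python
def check_experiment_tsv(text: str) -> list:
    """Return a list of problems found in the contents of an experiment.tsv file."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        n_cols = len(line.split("\t"))
        if n_cols < 3:
            # A row that does not split on tabs was probably typed with spaces.
            hint = " (spaces instead of tabs?)" if " " in line else ""
            problems.append(
                f"line {lineno}: expected 3 or 4 tab-separated columns, got {n_cols}{hint}"
            )
        elif n_cols > 4:
            problems.append(
                f"line {lineno}: expected 3 or 4 tab-separated columns, got {n_cols}"
            )
    return problems


# Hypothetical contents: the first row uses tabs, the second uses spaces.
example = "sample_1\t/path/to/sample_1.bam\tRef\nsample_2 /path/to/sample_2.bam Alt\n"
for problem in check_experiment_tsv(example):
    print(problem)
```

To check a real file, pass it the file's contents, for example `check_experiment_tsv(open("experiment.tsv").read())`. An empty list means every row split into 3 or 4 columns.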
Shiba supports long-read RNA-seq data
If you have long-read RNA-seq data (e.g., PacBio or ONT), add a 4th column to the experiment.tsv file. Use the value long for long-read data and short for short-read data. For example:
The 4th column is optional. If you do not have long-read data, you can omit it. Blank values are also accepted and are treated as short.
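The rule for the optional 4th column can be sketched as a small Python function. It only illustrates the behavior described above and is not Shiba's own parser. It assumes the values are written exactly as long or short in lowercase.

```python
VALID_READ_TYPES = {"long", "short"}


def read_type(line: str) -> str:
    """Return the read type of one experiment.tsv row (sketch of the documented rule)."""
    fields = line.rstrip("\n").split("\t")
    value = fields[3].strip() if len(fields) >= 4 else ""
    if value == "":
        return "short"  # a missing or blank 4th column is treated as short-read data
    if value not in VALID_READ_TYPES:
        raise ValueError(f"unexpected read type {value!r}; use 'long' or 'short'")
    return value


# Hypothetical rows: omitted, blank, and explicit 4th columns.
for row in ["s1\ta.bam\tRef", "s2\tb.bam\tAlt\t", "s3\tc.bam\tRef\tlong"]:
    print(row.split("\t")[0], read_type(row))
```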
Use long-read data only for discovery of alternative RNA splicing events
If you want to use long-read RNA-seq data only to discover alternative RNA splicing events and NOT for differential analysis, give the long-read samples group labels (3rd column) that differ from those of the short-read samples. For example, to perform differential splicing analysis between the Ref and Alt groups using short-read data, label the short-read samples Ref and Alt and the long-read samples Ref_long and Alt_long. The long-read data will then be used only for transcript assembly and not for differential analysis.
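The simplified Python model below shows which samples would enter the Ref-versus-Alt comparison under this labeling scheme. It is a hypothetical helper, not Shiba code. It assumes that the file has no header row, that only rows whose group matches the reference or alternative group set in config.yaml are compared, and that every row can still contribute to event discovery.

```python
import csv
import io


def split_samples(tsv_text: str, ref: str = "Ref", alt: str = "Alt"):
    """Return (samples compared between ref and alt, samples used for discovery only)."""
    compared, discovery_only = [], []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        sample, group = row[0], row[2]
        if group in (ref, alt):
            compared.append(sample)
        else:
            discovery_only.append(sample)
    return compared, discovery_only


# Hypothetical experiment.tsv contents following the labeling scheme above.
experiment = (
    "short_1\t/path/to/short_1.bam\tRef\tshort\n"
    "short_2\t/path/to/short_2.bam\tAlt\tshort\n"
    "long_1\t/path/to/long_1.bam\tRef_long\tlong\n"
    "long_2\t/path/to/long_2.bam\tAlt_long\tlong\n"
)
compared, discovery_only = split_samples(experiment)
print("compared:", compared)
print("discovery only:", discovery_only)
```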
config.yaml: A YAML configuration file.
- The path to the working directory. This is where the output files will be saved. Please make sure that you have write permission to this directory.
- The path to the gene annotation file in GTF format.
- The path to the experiment.tsv file.
- True if you want to include unannotated splicing events in the analysis. If False, only annotated events are considered.
- Minimum overlap (anchor length) required on both ends of a junction. Only junctions that meet it are reported.
- Minimum length of the intron sequence.
- Maximum length of the intron sequence.
- Strand specificity of the RNA library preparation. Options: XS, use the XS tags provided by the aligner; RF, first-strand; FR, second-strand.
- True if you want to skip the differential analysis and only calculate PSI values for each sample.
- True if you want to skip the differential analysis and only calculate PSI values for each group.
- Significance threshold for differential splicing analysis.
- Minimum difference in PSI values between groups to be considered significant.
- Reference group for differential splicing analysis.
- Alternative group for differential splicing analysis.
- Minimum number of reads required to calculate PSI values.
- True if you want to print PSI values for each sample in the output file.
- True if you want to perform a t-test for differential splicing analysis.
- True if you want to write the splicing analysis results to a file in Excel format.
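Several of the settings above act as simple cutoffs. The Python sketch below shows one plausible way they combine:

- a junction filter on anchor overlap and intron length;
- a PSI value computed as inclusion reads divided by inclusion plus exclusion reads, left undefined when the read total is below the minimum;
- a differential call that must pass both the significance threshold and the minimum PSI difference.

The field names and default values are hypothetical illustrations, not Shiba's configuration keys or defaults. Shiba's actual read counting and statistics are likely more involved.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Hypothetical names and illustrative values, not Shiba's config.yaml keys or defaults.
    min_anchor: int = 6            # minimum overlap required on both ends of a junction
    min_intron: int = 70           # minimum intron length
    max_intron: int = 500_000      # maximum intron length
    min_reads: int = 10            # minimum reads required to calculate a PSI value
    significance: float = 0.05     # significance threshold
    min_dpsi: float = 0.1          # minimum PSI difference between groups


def keep_junction(left_overlap: int, right_overlap: int, intron_length: int, t: Thresholds) -> bool:
    """Keep a junction if both ends overlap enough and the intron length is in range."""
    return (min(left_overlap, right_overlap) >= t.min_anchor
            and t.min_intron <= intron_length <= t.max_intron)


def psi(inclusion_reads: int, exclusion_reads: int, t: Thresholds):
    """Percent spliced-in as a fraction, or None when coverage is below min_reads."""
    total = inclusion_reads + exclusion_reads
    if total < t.min_reads:
        return None
    return inclusion_reads / total


def is_differential(psi_ref, psi_alt, p_value: float, t: Thresholds) -> bool:
    """Call an event differential only if it passes both the significance and dPSI cutoffs."""
    if psi_ref is None or psi_alt is None:
        return False  # PSI is undefined in at least one group
    return p_value < t.significance and abs(psi_alt - psi_ref) >= t.min_dpsi


t = Thresholds()
print(keep_junction(10, 8, 1_000, t), psi(30, 10, t), is_differential(0.2, 0.5, 0.01, t))
```

Whether the real cutoffs are strict (`<`) or inclusive (`<=`) is an assumption here.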
2. Run
This example uses 4 threads for parallelization. You can change the number of threads with the -p option.
Did you encounter any problems?
You can run Shiba with the --verbose option to print the debug log, which can help you locate the problem.
MameShiba
MameShiba is a lightweight version of Shiba that can be run on a local machine without Docker or Singularity. It is designed for users who want to perform splicing analysis only and do not need the full functionality of Shiba.
1. Prepare inputs
experiment.tsv: A tab-separated text file of sample IDs, paths to BAM files, and groups for differential analysis. This is the same as the input for Shiba.
config.yaml: A YAML configuration file. This is the same as the configuration for Shiba.
2. Run
Make sure to run with the --mame option.
SnakeShiba
A Snakemake-based workflow for Shiba, useful for running Shiba on a cluster. Snakemake automatically parallelizes the jobs and manages the dependencies between them.
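To illustrate what managing dependencies means, the Python sketch below uses the standard-library graphlib module to group a toy set of jobs into waves that could run in parallel. The rule names are hypothetical and are not SnakeShiba's actual rules. The model is also simplified. Snakemake infers dependencies from each rule's input and output files, and it starts a job as soon as that job's inputs exist (subject to the available cores) rather than strictly in waves.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs; each maps to the set of jobs that must finish first.
dependencies = {
    "junctions_sample1": set(),
    "junctions_sample2": set(),
    "assemble_events": {"junctions_sample1", "junctions_sample2"},
    "psi_sample1": {"assemble_events"},
    "psi_sample2": {"assemble_events"},
    "differential": {"psi_sample1", "psi_sample2"},
}


def parallel_batches(graph: dict) -> list:
    """Group jobs into waves; jobs in the same wave have no unfinished dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all jobs whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)                 # mark them finished to unlock the next wave
    return batches


for wave, jobs in enumerate(parallel_batches(dependencies), start=1):
    print(f"wave {wave}: {jobs}")
```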
1. Prepare inputs
experiment.tsv: A tab-separated text file of sample IDs, paths to BAM files, and groups for differential analysis. This is the same as the input for Shiba.
config.yaml: A YAML configuration file. This is the same as the configuration for Shiba, except that it adds the container field and omits the only_psi and only_psi_group fields, which SnakeShiba does not support.
- The path to the working directory. This is where the output files will be saved. Please make sure that you have write permission to this directory.
- The Docker image of Shiba.
- The path to the gene annotation file in GTF format.
- The path to the experiment.tsv file.
- Minimum overlap (anchor length) required on both ends of a junction. Only junctions that meet it are reported.
- Minimum length of the intron sequence.
- Maximum length of the intron sequence.
- Strand specificity of the RNA library preparation. Options: XS, use the XS tags provided by the aligner; RF, first-strand; FR, second-strand.
- Significance threshold for differential splicing analysis.
- Minimum difference in PSI values between groups to be considered significant.
- Reference group for differential splicing analysis.
- Alternative group for differential splicing analysis.
- Minimum number of reads required to calculate PSI values.
- True if you want to print PSI values for each sample in the output file.
- True if you want to perform a t-test for differential splicing analysis.
- True if you want to write the splicing analysis results to a file in Excel format.
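The difference between the Shiba and SnakeShiba configuration files can be written as a small transformation. This Python sketch works on a configuration that has already been loaded into a dictionary, for example with a YAML library. The keys container, only_psi, and only_psi_group come from the description above. The other key and the image name in the usage example are hypothetical placeholders.

```python
# Fields that, per the documentation, SnakeShiba does not support.
UNSUPPORTED_IN_SNAKESHIBA = ("only_psi", "only_psi_group")


def to_snakeshiba_config(shiba_config: dict, container: str) -> dict:
    """Return a new dict without the unsupported fields and with the container field added."""
    config = {
        key: value
        for key, value in shiba_config.items()
        if key not in UNSUPPORTED_IN_SNAKESHIBA
    }
    config["container"] = container  # the Docker image of Shiba
    return config


# Hypothetical example; "workdir" and the image name are placeholders.
shiba_config = {
    "workdir": "/path/to/workdir",
    "only_psi": False,
    "only_psi_group": False,
}
snake_config = to_snakeshiba_config(shiba_config, "docker://<shiba-image>")
print(snake_config)
```

The function builds a new dictionary, so the original Shiba configuration is left unchanged.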
2. Run
Please make sure that you have installed Snakemake and Singularity and cloned the Shiba repository on your system.