Commit 9e27da33 authored by Jonas Weber

updated README and journal.md (-> workflow.md)
parent de12ce05
# RNA_velocity
The RNA velocity analysis done for the department of Molecular Cardiology (Professor Jürgen Schrader)
The RNA velocity analysis done for the department of Molecular Cardiology (Prof. Schrader) as my Projektarbeit.
## Current progress
Find the current TODOs and the things done previously in `documentation/journal.md`.
## Data
- `data/52_NGS_EPDC_aggr/`: Contains the aggregated samples of EPD cells.
- `data/52_NGS_EPDC_reanalyze/`: Contains the cellranger analysis of the above dataset.
- `data/52_NGS_EPDC_merged.loom`: spliced and unspliced counts merged from the loom files of the 3 datasets
- `data/52_NGS_EPDC_reanalyze/` contains the cellranger analysis
- `data/52_NGS_EPDC_merged.loom` spliced and unspliced counts merged from the loom files of the 3 datasets
- `data/52_NGS_EPDC_merged_filtered_renamed.loom` filtered and renamed loom file; used for the velocity analysis
- `data/EPDC.SCTransform.integrated_minus9_13_6_CellNames.csv` cell names
- `data/Seurat_data/` contains cluster, cell & gene names, pca and umap data from seurat
## Code
- `join_loom_files.py` small script to join loom files
- `velocity.py` contains the velocyto workflow
- `filter_rename_loom.py` filters and renames the cells in the loom file according to the Seurat data
- `velocity.py` Python notebook containing the velocyto workflow
## Documentation
## Oberseminar
- `documentation/` contains slides and reports
- `documentation/graphics` collection of all graphics
- `workflow.md`
`documentation/oberseminar` contains slides from the different dates.
All figures are located in `documentation/oberseminar/graphics`.
# Data
The `52_NGS_EPDC_..._count` datasets are not included because they are too large.
`52_NGS_EPDC_aggr/`
`52_NGS_EPDC_reanalyze/`
`52_NGS_EPDC_merged.loom`
# Things that I did
## 1. cellranger
Got cellranger aggr and count output from Tobias after failing to run `cellranger count` due to lack of knowledge and permissions.
### 1.1 cellranger aggr
Was used to aggregate MI1, MI2 and MI3 into one dataset: [cellranger aggr tutorial](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/aggregate)
    screen
    # qsub or hilbert3
    module load CellRanger/3.1.0
    cd /gpfs/scratch/joweb106/singelcell_data/cellranger_count
    cellranger aggr --id=52_NGS_EPDC_aggr --csv=/gpfs/scratch/joweb106/singlecell_data/cellranger_count/52_NGS_EPDC_aggre.csv |& tee -a log_aggr.txt
### 1.2 cellranger reanalyze
I ran `cellranger reanalyze` on the `52_NGS_EPDC_aggr` data without any parameters. [cellranger reanalyze tutorial](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/tutorial_ra)
    ssh hilbert
    screen
    qsub -A singlecellseq -I -l select=1:ncpus=4:mem=10G -l walltime=3:00:00
    cd /gpfs/scratch/joweb106/singelcell_data/
    module load CellRanger/3.1.0
    cellranger reanalyze --id=52_NGS_EPDC_reanalyze \
        --matrix=52_NGS_EPDC_aggr/outs/filtered_feature_bc_matrix.h5 |& tee -a log_renalyze.txt
Result in `/data/52_NGS_EPDC_reanalyze/outs/web_summary.html`:
![](oberseminar/graphics/clusters.png)
## 2. velocyto
[velocyto.py tutorial](https://velocyto.org/velocyto.py/index.html)
### 2.1 Installation
Installing velocyto on the HPC failed because pysam could not be installed. The software can now be loaded with the module `Velocyto/0.17.17`.
### 2.2 run10x
Run `velocyto run10x ...` to create a .loom file, which can be imported into Python. [tutorial](https://velocyto.org/velocyto.py/tutorial/cli.html)
#### 2.2.1 Prepare gtf
I got `Mus_musculus.GRCm38.93.filtered.gtf` by following [the 10x Genomics build notes for the mm10 3.0.0 reference](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#mm10_3.0.0).
#### 2.2.3 Run run10x
First, I ran into FileNotFoundErrors because the bam file is not contained in the aggregated dataset. After getting the 3 count datasets from Tobias it worked.
    screen
    # qsub or hilbert3
    module load Velocyto/0.17.17
    module load SamTools/1.6
    cd /gpfs/scratch/joweb106/singlecell_data/cellranger_count/
    velocyto run10x 52_NGS_MI1_EPDC/ ../../mousedata/Mus_musculus.GRCm38.93.filtered.gtf |& tee -a log_run10x_MI1.txt
    velocyto run10x 52_NGS_MI2_EPDC/ ../../mousedata/Mus_musculus.GRCm38.93.filtered.gtf |& tee -a log_run10x_MI2.txt
    velocyto run10x 52_NGS_MI3_EPDC/ ../../mousedata/Mus_musculus.GRCm38.93.filtered.gtf |& tee -a log_run10x_MI3.txt
#### 2.2.4 combine loom files
`code/join_loom_files.py` combines the 3 loom files into `data/52_NGS_EPDC_merged.loom`
### 2.3 velocyto.py (TODO)
[Example notebooks of the velocyto.py workflow](https://github.com/velocyto-team/velocyto-notebooks/tree/master/python)
Get graphical session [here](https://view-2018.hpc.rz.uni-duesseldorf.de/enginframe/vdi).
Then use:
    cd /gpfs/scratch/joweb106/rna_velocity
    module load velocyto
    module load Jupyter
    jupyter notebook
The notebooks are in `code/`. This Jupyter module caused the Jupyter notebook kernel to die multiple times.
# Workflow
## 1. Cellranger
Started workflow with output directories from `cellranger count`. When working on the HPC use the module `CellRanger/3.1.0`. I only used cellranger before I got the clusters and embedding from Seurat.
[Cellranger Tutorial](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger)
### 1.1 aggr
Combine the datasets in order to run `cellranger reanalyze`.
[Cellranger aggr tutorial](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/aggregate)
    cellranger aggr --id=52_NGS_EPDC_aggr --csv=<path>/52_NGS_EPDC_aggr.csv |& tee -a log_aggr.txt

`52_NGS_EPDC_aggr.csv`:

    library_id,molecule_h5
    52_NGS_MI1_EPDC,<path>/52_NGS_MI1_EPDC/outs/molecule_info.h5
    ...
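
The aggregation CSV can also be generated instead of written by hand. The following is a minimal sketch, assuming all `cellranger count` output directories sit next to each other in one parent directory; the function name `write_aggr_csv` and its parameters are hypothetical and not part of this repository.

```python
import csv
from pathlib import PurePosixPath


def write_aggr_csv(csv_path, count_dir, samples):
    """Write a `cellranger aggr` CSV with one row per `cellranger count` run.

    count_dir is the (assumed) parent directory of all sample directories.
    """
    count_dir = PurePosixPath(count_dir)
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Header as shown in the example CSV above.
        writer.writerow(["library_id", "molecule_h5"])
        for sample in samples:
            molecule_h5 = count_dir / sample / "outs" / "molecule_info.h5"
            writer.writerow([sample, str(molecule_h5)])
```

For the three samples used here, the call would look like `write_aggr_csv("52_NGS_EPDC_aggr.csv", "<path>", ["52_NGS_MI1_EPDC", "52_NGS_MI2_EPDC", "52_NGS_MI3_EPDC"])`, with `<path>` replaced by the actual location of the count directories.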
### 1.2 reanalyze
I ran `cellranger reanalyze` on the `52_NGS_EPDC_aggr` data with standard parameters. [cellranger reanalyze tutorial](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/tutorial_ra)
    cellranger reanalyze --id=52_NGS_EPDC_reanalyze --matrix=52_NGS_EPDC_aggr/outs/filtered_feature_bc_matrix.h5 |& tee -a log_renalyze.txt

Result in `/data/52_NGS_EPDC_reanalyze/outs/web_summary.html`. This was only a test and was not used for the final analysis.
## 2. Velocyto.py
When working on the HPC use the module `Velocyto/0.17.17` (probably needs to be updated). With velocyto I created loom files, joined them, filtered and renamed them according to the Seurat data and then calculated and plotted the velocity.
[velocyto.py tutorial](https://velocyto.org/velocyto.py/index.html)
### 2.1 Loom file
Run `velocyto run10x ...` to create .loom files, which can be imported into Python. [tutorial](https://velocyto.org/velocyto.py/tutorial/cli.html)
#### 2.1.1 Prepare gtf
I got the genome annotation file `Mus_musculus.GRCm38.93.filtered.gtf` by following [the 10x Genomics build notes for the mm10 3.0.0 reference](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#mm10_3.0.0).
#### 2.1.3 Run run10x
Run `velocyto run10x` on every `cellranger count` output directory to create one loom file per sample. Each finished loom file is written to the `velocyto/` subdirectory of the corresponding sample directory.
    module load Velocyto/0.17.17
    module load SamTools/1.6
    velocyto run10x 52_NGS_MI1_EPDC/ Mus_musculus.GRCm38.93.filtered.gtf |& tee -a log_run10x_MI1.txt
    velocyto run10x 52_NGS_MI2_EPDC/ Mus_musculus.GRCm38.93.filtered.gtf |& tee -a log_run10x_MI2.txt
    velocyto run10x 52_NGS_MI3_EPDC/ Mus_musculus.GRCm38.93.filtered.gtf |& tee -a log_run10x_MI3.txt
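
With more samples, the three near-identical calls can be driven from a small script. This is only a sketch: the helper names `run10x_command` and `run_all` are made up for illustration, the modules above still have to be loaded first, and the `tee` logging of the shell version is not reproduced.

```python
import subprocess


def run10x_command(sample_dir, gtf):
    """Build the argument list for one `velocyto run10x` call."""
    return ["velocyto", "run10x", sample_dir, gtf]


def run_all(sample_dirs, gtf, dry_run=True):
    """Run `velocyto run10x` for every sample directory, one after another.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = [run10x_command(sample_dir, gtf) for sample_dir in sample_dirs]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first failing sample.
            subprocess.run(command, check=True)
    return commands
```

Calling `run_all(["52_NGS_MI1_EPDC/", "52_NGS_MI2_EPDC/", "52_NGS_MI3_EPDC/"], "Mus_musculus.GRCm38.93.filtered.gtf")` prints the three commands shown above; pass `dry_run=False` to execute them.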
#### 2.1.4 Combine loom files
`code/join_loom_files.py` combines the loom files.
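
The script itself is not reproduced here. As a rough sketch of what such a join step can look like, the following collects the loom files from the `velocyto/` subdirectories and merges them with `loompy.combine`, as in the velocyto documentation. The function names are hypothetical, and the actual `code/join_loom_files.py` may work differently.

```python
from pathlib import Path


def find_loom_files(sample_dirs):
    """Collect the loom files written by `velocyto run10x` to <sample>/velocyto/."""
    files = []
    for sample_dir in sample_dirs:
        loom_dir = Path(sample_dir, "velocyto")
        files.extend(sorted(str(path) for path in loom_dir.glob("*.loom")))
    return files


def join_loom_files(sample_dirs, output_file):
    """Merge all per-sample loom files into one file (requires loompy)."""
    # Imported here so that find_loom_files can be used without loompy installed.
    import loompy

    files = find_loom_files(sample_dirs)
    if not files:
        raise FileNotFoundError("no loom files found in the given sample directories")
    # key="Accession" aligns the genes (rows) of the files by accession.
    loompy.combine(files, output_file, key="Accession")
    return files
```

Under these assumptions, `join_loom_files(["52_NGS_MI1_EPDC", "52_NGS_MI2_EPDC", "52_NGS_MI3_EPDC"], "data/52_NGS_EPDC_merged.loom")` would produce the merged file listed in the README.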
### 2.2 Filter and rename
When working with clusters and embeddings from another tool (for example Seurat), you might need to filter and rename the cells in the loom file with `code/filter_rename_loom.py` so that the cell names match.
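
To illustrate the idea, here is a minimal sketch of the matching logic in plain Python. It relies on `velocyto run10x` naming cells as `<sample>:<barcode>x`. The Seurat-side format `<barcode>-1_<sample number>` is an assumption for illustration only and has to be checked against the actual cell names CSV; all function names are hypothetical.

```python
def velocyto_to_seurat(cell_id, sample_numbers):
    """Translate a velocyto cell ID into the (assumed) Seurat cell name."""
    sample, barcode = cell_id.split(":")
    if barcode.endswith("x"):
        barcode = barcode[:-1]  # velocyto appends an "x" to every barcode
    # Assumed Seurat format: "<barcode>-1_<sample number>"
    return f"{barcode}-1_{sample_numbers[sample]}"


def filter_and_rename(cell_ids, sample_numbers, seurat_names):
    """Return the positions of the cells to keep and their new names."""
    keep = set(seurat_names)
    positions, new_names = [], []
    for position, cell_id in enumerate(cell_ids):
        name = velocyto_to_seurat(cell_id, sample_numbers)
        if name in keep:
            positions.append(position)
            new_names.append(name)
    return positions, new_names
```

The returned positions can then be used to subset the columns of the loom file, and the new names replace the original cell IDs.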
### 2.3 velocyto.py
[Example notebooks of the velocyto.py workflow](https://github.com/velocyto-team/velocyto-notebooks/tree/master/python)
Run the `code/velocity.py` script. You might need to adjust some parameters (and code) for your data.
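
For orientation, the quantity behind the velocity estimate can be written down in a few lines. Under the steady-state assumption, each gene gets a ratio gamma between unspliced and spliced counts, and the velocity of a cell for that gene is the residual `u - gamma * s`. The sketch below is a deliberately simplified stand-in: it fits gamma by ordinary least squares through the origin over all cells, whereas velocyto's own fit is more elaborate. The function names are hypothetical.

```python
def fit_gamma(spliced, unspliced):
    """Least-squares slope through the origin of unspliced vs. spliced counts."""
    numerator = sum(s * u for s, u in zip(spliced, unspliced))
    denominator = sum(s * s for s in spliced)
    if denominator == 0:
        raise ValueError("gene has no spliced counts")
    return numerator / denominator


def velocities(spliced, unspliced, gamma):
    """Per-cell velocity for one gene: u - gamma * s."""
    return [u - gamma * s for s, u in zip(spliced, unspliced)]
```

A cell with more unspliced RNA than the fitted ratio predicts gets a positive velocity (the gene is being upregulated), a cell with less gets a negative one.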