Commit 245cc6a3 authored by joweb106

cleanup

parent d0b8b83e
data/300_NGS*
data/52_NGS_EPDC_aggr/*
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import velocyto as vcy\n",
"\n",
"vlm = vcy.VelocytoLoom(\"../data/52_NGS_EPDC_merged.loom\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"vlm.plot_fractions()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'CellID': array(['52_NGS_MI1_EPDC:AAAGATGCACTCGACGx',\n",
" '52_NGS_MI1_EPDC:AAACGGGCACCGTTGGx',\n",
" '52_NGS_MI1_EPDC:AAACCTGAGTGCAAGCx', ...,\n",
" '52_NGS_MI3_EPDC:TTGCGTCCAGAGTGTGx',\n",
" '52_NGS_MI3_EPDC:TTTCCTCCACACCGACx',\n",
" '52_NGS_MI3_EPDC:TTCTACAGTAAAGTCAx'], dtype=object)}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vlm.ca"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"vlm.filter_cells(bool_array=vlm.initial_Ucell_size > \n",
" np.percentile(vlm.initial_Ucell_size, 0.5))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
import loompy
# Schrader data
#files = ["../../singlecell_data/cellranger_count/52_NGS_MI1_EPDC/velocyto/52_NGS_MI1_EPDC.loom",
# "../../singlecell_data/cellranger_count/52_NGS_MI2_EPDC/velocyto/52_NGS_MI2_EPDC.loom",
# "../../singlecell_data/cellranger_count/52_NGS_MI3_EPDC/velocyto/52_NGS_MI3_EPDC.loom"]
#
#ds = loompy.combine(files, "../data/52_NGS_EPDC_merged.loom")
files = ["../../singlecell_data/cellranger_count/52_NGS_MI1_EPDC/velocyto/52_NGS_MI1_EPDC.loom",
"../../singlecell_data/cellranger_count/52_NGS_MI2_EPDC/velocyto/52_NGS_MI2_EPDC.loom",
"../../singlecell_data/cellranger_count/52_NGS_MI3_EPDC/velocyto/52_NGS_MI3_EPDC.loom"]
# daniel data
files = ["../data/300_NGS/300_NGS_Blau4KO.loom",
"../data/300_NGS/300_NGS_Rot2Control.loom",
"../data/300_NGS/300_NGS_Rot5KO.loom",
"../data/300_NGS/300_NGS_Blau5_Control.loom",
"../data/300_NGS/300_NGS_Rot4KO.loom"]
ds = loompy.combine(files, "../data/52_NGS_EPDC_merged.loom")
#ds = loompy.connect("../data/52_NGS_EPDC_merged.loom")
#for fn in files[1:]:
# ds.add_loom(fn, batch_size=1000)
ds = loompy.combine(files, "../data/300_NGS_merged.loom")
load loom file ../data/52_NGS_EPDC_merged.loom ... Finished
set clusters from 10x reanalyze ... Finished
normalize S ... Finished
normalize U ... Finished
Perform PCA ... Finished
plot PCA ... Finished
did knn
did tsne
saved fig of tsne
save analysis state to ../data/after_PCA.hdf5 ... Traceback (most recent call last):
File "velocity.py", line 125, in <module>
save_state(vlm, "../data/after_PCA.hdf5")
File "velocity.py", line 22, in save_state
vlm.to_hdf5(name) # z.b. "my_velocyto_analysis"
File "/software/velocyto/0.17.17/ivybridge/lib/python3.6/site-packages/velocyto/analysis.py", line 94, in to_hdf5
dump_hdf5(self, filename, **kwargs)
File "/software/velocyto/0.17.17/ivybridge/lib/python3.6/site-packages/velocyto/serialization.py", line 83, in dump_hdf5
fletcher32=False, shuffle=False)
File "/software/velocyto/0.17.17/ivybridge/lib/python3.6/site-packages/h5py/_hl/group.py", line 136, in create_dataset
dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
File "/software/velocyto/0.17.17/ivybridge/lib/python3.6/site-packages/h5py/_hl/dataset.py", line 118, in make_new_dset
tid = h5t.py_create(dtype, logical=1)
File "h5py/h5t.pyx", line 1634, in h5py.h5t.py_create
File "h5py/h5t.pyx", line 1656, in h5py.h5t.py_create
File "h5py/h5t.pyx", line 1717, in h5py.h5t.py_create
TypeError: No conversion path for dtype: dtype('<U2')
WARNING:root:Nans encountered in corrcoef and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:Nans encountered in corrcoef_random and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
load loom file ../data/52_NGS_EPDC_merged.loom ... Finished
set clusters from 10x reanalyze ... Finished
filter dataset ... Finished
feature selection (score_cv_vs_mean) ... Finished
normalize S ... Finished
normalize U ... Finished
Perform PCA ... Finished
did knn
fit gamma
extrapolated cells
did tsne
saved fig of tsne
finished projection
Traceback (most recent call last):
File "velocity.py", line 176, in <module>
plt.figure(None,(20,10))
File "/software/velocyto/0.17.17/ivybridge/lib/python3.6/site-packages/velocyto/analysis.py", line 2043, in plot_grid_arrows
arrows_scale = np.percentile(np.linalg.norm(self.flow_rndm[self.total_p_mass >= min_mass, :], 2, 1), 90) # Tipical lenght of an arrow
File "/software/python/3.6.5/ivybridge/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4291, in percentile
interpolation=interpolation)
File "/software/python/3.6.5/ivybridge/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4033, in _ureduce
r = func(a, **kwargs)
File "/software/python/3.6.5/ivybridge/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4405, in _percentile
x1 = take(ap, indices_below, axis=axis) * weights_below
File "/software/python/3.6.5/ivybridge/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 159, in take
return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
File "/software/python/3.6.5/ivybridge/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 52, in _wrapfunc
return getattr(obj, method)(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.
WARNING:root:Nans encountered in corrcoef and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:Nans encountered in corrcoef_random and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:The arrow scale was set to be 'absolute' make sure you know how to properly interpret the plots
load loom file ../data/52_NGS_EPDC_merged.loom ... Finished
set clusters from 10x reanalyze ... Finished
filter dataset ... Finished
feature selection (score_cv_vs_mean) ... Finished
normalize S ... Finished
normalize U ... Finished
Perform PCA ... Finished
did knn
fit gamma
extrapolated cells
did tsne
saved fig of tsne
finished projection
calculated grid arrows
WARNING:root:Nans encountered in corrcoef and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:Nans encountered in corrcoef_random and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:The arrow scale was set to be 'absolute' make sure you know how to properly interpret the plots
load loom file ../data/52_NGS_EPDC_merged.loom ... Finished
set clusters from 10x reanalyze ... Finished
filter dataset ... Finished
feature selection (score_cv_vs_mean) ... Finished
normalize S ... Finished
normalize U ... Finished
Perform PCA ... Finished
did knn
fit gamma
extrapolated cells
did tsne
saved fig of tsne
finished projection
calculated grid arrows
WARNING:root:Nans encountered in corrcoef and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:Nans encountered in corrcoef_random and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:The arrow scale was set to be 'absolute' make sure you know how to properly interpret the plots
load loom file ../data/52_NGS_EPDC_merged.loom ... Finished
set clusters from 10x reanalyze ... Finished
filter dataset ... Finished
feature selection (score_cv_vs_mean) ... Finished
normalize S ... Finished
normalize U ... Finished
Perform PCA ... Finished
did knn
fit gamma
extrapolated cells
did tsne
saved fig of tsne
finished projection
calculated grid arrows
/software/velocyto/0.17.17/ivybridge/lib/python3.6/site-packages/velocyto/analysis.py:1411: RuntimeWarning: invalid value encountered in true_divide
self.delta_S = self.Sx_sz * egt + (1 - egt) * Ux_szo / self.gammas[:, None] - self.Sx_sz
WARNING:root:Nans encountered in corrcoef and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:Nans encountered in corrcoef_random and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
/software/python/3.6.5/ivybridge/lib/python3.6/site-packages/numpy/lib/function_base.py:4291: RuntimeWarning: Invalid value encountered in percentile
interpolation=interpolation)
WARNING:root:The arrow scale was set to be 'absolute' make sure you know how to properly interpret the plots
load loom file ../data/52_NGS_EPDC_merged.loom ... Finished
set clusters from 10x reanalyze ... Finished
filter dataset ... Finished
feature selection (score_cv_vs_mean) ... Finished
normalize S ... Finished
normalize U ... Finished
Perform PCA ... Finished
did knn
fit gamma
extrapolated cells
did tsne
saved fig of tsne
finished projection
calculated grid arrows
WARNING:root:Nans encountered in corrcoef and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:Nans encountered in corrcoef_random and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
load loom file ../data/52_NGS_EPDC_merged.loom ... Finished
set clusters from 10x reanalyze ... Finished
filter dataset ... Finished
feature selection (score_cv_vs_mean) ... Finished
normalize S ... Finished
normalize U ... Finished
Perform PCA ... Finished
did knn
fit gamma
extrapolated cells
did tsne
saved fig of tsne
finished projection
Traceback (most recent call last):
File "velocity.py", line 183, in <module>
vlm.calculate_grid_arrows(smooth=0.5, steps=(40, 40), n_neighbors= N_cells / 40)
NameError: name 'N_cells' is not defined
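The `NameError` above means that `N_cells` was never assigned before `calculate_grid_arrows` was called. A neighbor count is also normally expected to be an integer, while `/` yields a float in Python 3. A minimal sketch of deriving the value, assuming the cell count is available as an integer; the helper name `grid_neighbors` is made up for illustration, and reading the count from `vlm.S.shape[1]` is an assumption about the matrix layout:

```python
def grid_neighbors(n_cells: int, divisor: int = 40) -> int:
    """Return an integer neighbor count of roughly n_cells / divisor, but at least 1."""
    return max(1, n_cells // divisor)


if __name__ == "__main__":
    # Hypothetical cell count; in the script it might come from vlm.S.shape[1].
    print(grid_neighbors(5000))
```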
WARNING:root:Nans encountered in corrcoef and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:Nans encountered in corrcoef_random and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:The arrow scale was set to be 'absolute' make sure you know how to properly interpret the plots
load loom file ../data/52_NGS_EPDC_merged.loom ... Finished
set clusters from 10x reanalyze ... Finished
filter dataset ... Finished
feature selection (score_cv_vs_mean) ... Finished
normalize S ... Finished
normalize U ... Finished
Perform PCA ... Finished
did knn
fit gamma
extrapolated cells
did tsne
saved fig of tsne
finished projection
calculated grid arrows
load loom file ../data/52_NGS_EPDC_merged.loom ... Finished
set clusters from 10x reanalyze ... Traceback (most recent call last):
File "velocity.py", line 100, in <module>
set_clusters_from_10x(vlm, "../data/52_NGS_EPDC_reanalyze/outs/analysis/clustering/graphclust/clusters.csv")
File "velocity.py", line 60, in set_clusters_from_10x
vlm.set_clusters(clusters, cluster_colors_dict=colors_dict)
File "/software/velocyto/0.17.17/ivybridge/lib/python3.6/site-packages/velocyto/analysis.py", line 189, in set_clusters
self.colorandum = np.array([cluster_colors_dict[i] for i in cluster_labels])
File "/software/velocyto/0.17.17/ivybridge/lib/python3.6/site-packages/velocyto/analysis.py", line 189, in <listcomp>
self.colorandum = np.array([cluster_colors_dict[i] for i in cluster_labels])
KeyError: 6
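The `KeyError: 6` above is raised by `cluster_colors_dict[i]`, so the color dictionary passed to `set_clusters` had no entry for cluster label 6. One way to avoid this is to build the dictionary from the labels that actually occur. A minimal sketch, assuming RGB triplets are an acceptable color format; the helper name `colors_for_labels` is hypothetical and not part of `velocity.py`:

```python
import colorsys


def colors_for_labels(cluster_labels) -> dict:
    """Map every distinct cluster label to an RGB tuple with evenly spaced hues."""
    unique_labels = sorted(set(cluster_labels))
    n = len(unique_labels)
    return {
        label: colorsys.hsv_to_rgb(i / n, 0.8, 0.9)
        for i, label in enumerate(unique_labels)
    }


if __name__ == "__main__":
    # Hypothetical labels as they might be read from clusters.csv.
    labels = [1, 2, 6, 6, 3, 1]
    colors = colors_for_labels(labels)
    # Every label now has a color, so the lookup cannot fail.
    print(all(label in colors for label in labels))
```

Because the dictionary is derived from the labels themselves, adding or removing clusters in a later `cellranger reanalyze` run does not require touching the color definitions.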
load loom file ../data/52_NGS_EPDC_merged.loom ... Finished
set clusters from 10x reanalyze ... Finished
normalize S ... Finished
normalize U ... Finished
Perform PCA ... Finished
plot PCA ... Finished
did knn
did tsne
saved fig of tsne
save analysis state to ../data/after_PCA.hdf5 ... Traceback (most recent call last):
File "velocity.py", line 146, in <module>
File "velocity.py", line 22, in save_state
vlm.to_hdf5(name) # z.b. "my_velocyto_analysis"
File "/software/velocyto/0.17.17/ivybridge/lib/python3.6/site-packages/velocyto/analysis.py", line 94, in to_hdf5
dump_hdf5(self, filename, **kwargs)
File "/software/velocyto/0.17.17/ivybridge/lib/python3.6/site-packages/velocyto/serialization.py", line 74, in dump_hdf5
serialized = _obj2uint(attribute, compression=noarray_compression, protocol=pickle_protocol)
File "/software/velocyto/0.17.17/ivybridge/lib/python3.6/site-packages/velocyto/serialization.py", line 25, in _obj2uint
zstr = zlib.compress(pickle.dumps(obj, protocol=protocol), compression)
OverflowError: cannot serialize a string larger than 4GiB
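The `OverflowError` above comes from `pickle.dumps`, which the traceback shows being called with `protocol=pickle_protocol`. Pickle protocols below 4 cannot serialize objects larger than 4 GiB, while protocol 4 (available since Python 3.4) can. The sketch below mirrors the pickle-then-compress step on small data; the function names are made up for illustration, and it does not use velocyto itself:

```python
import pickle
import zlib


def compress_object(obj, protocol: int = 4, compression: int = 9) -> bytes:
    """Pickle and zlib-compress obj; protocol 4 or higher is needed for objects over 4 GiB."""
    if protocol < 4:
        raise ValueError("pickle protocols below 4 cannot serialize objects larger than 4 GiB")
    return zlib.compress(pickle.dumps(obj, protocol=protocol), compression)


def decompress_object(blob: bytes):
    """Inverse of compress_object."""
    return pickle.loads(zlib.decompress(blob))


if __name__ == "__main__":
    state = {"step": "after_PCA", "values": list(range(10))}
    # Round trip: the restored object equals the original.
    print(decompress_object(compress_object(state)) == state)
```

Since `to_hdf5` forwards its keyword arguments to `dump_hdf5` in the traceback, a call such as `vlm.to_hdf5(name, pickle_protocol=4)` may be enough to get past this error. That is an assumption based on the traceback and should be checked against the installed velocyto version.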
WARNING:root:Nans encountered in corrcoef and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:Nans encountered in corrcoef_random and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:The arrow scale was set to be 'absolute' make sure you know how to properly interpret the plots
load loom file ../data/52_NGS_EPDC_merged.loom ... Finished
set clusters from 10x reanalyze ... Finished
filter dataset ... Finished
feature selection (score_cv_vs_mean) ... Finished
normalize S ... Finished
normalize U ... Finished
Perform PCA ... Finished
Do knn pooling ... Finished
Fit gammas ... Finished
predict velocity ... Finished
Calculate TSNE ... Finished
TSNE time: 41.995718240737915
Plot TSNE ... Finished
Projection into embedding ... Finished
calculate_grid_arrows ... Finished
Plot velocity full arrows ... Finished
Plot velocity grid arrows ... Finished
overall time: 1013.6554074287415
WARNING:root:Nans encountered in corrcoef and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:Nans encountered in corrcoef_random and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:The arrow scale was set to be 'absolute' make sure you know how to properly interpret the plots
load loom file ../data/52_NGS_EPDC_merged.loom ... Finished
set clusters from 10x reanalyze ... Finished
filter dataset ... Finished
feature selection (score_cv_vs_mean) ... Finished
normalize S ... Finished
normalize U ... Finished
Perform PCA ... Finished
Do knn pooling ... Finished
Fit gammas ... Finished
predict velocity ... Finished
Calculate TSNE ... Finished
TSNE time: 42.173816204071045
Plot TSNE ... Finished
Projection into embedding ... Finished
calculate_grid_arrows ... Finished
Plot velocity full arrows ... Finished
Plot velocity grid arrows ... Finished
overall time: 1016.4283459186554
# HPC usage

by Jonas Weber

## Useful links:

- [HPC application (Antrag)](https://www.zim.hhu.de/high-performance-computing.html)
- [HPC wiki](https://wiki.hhu.de/display/HPC/Wissenschaftliches+Hochleistungs-Rechnen+am+ZIM)
- [screen wiki][screen]
- [myJam website][myJam]

## How to connect via terminal

Log in from a terminal (replace `joweb106` with your username):

    ssh joweb106@hpc.rz.uni-duesseldorf.de

On Linux it is convenient to define a host alias so that you don't have to type the full HPC address every time you connect. Open the following file with an editor of your choice:

    ~/.ssh/config

and add the lines

    Host hilbert
        HostName hpc.rz.uni-duesseldorf.de
        User [yourUserName]

After that you can connect with just

    ssh hilbert
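The same entry can be generated programmatically, for example when several people need the alias. A minimal sketch in Python; the function name `ssh_config_block` is made up for illustration, and the snippet only builds the text rather than modifying `~/.ssh/config`:

```python
def ssh_config_block(alias: str, hostname: str, user: str) -> str:
    """Return an OpenSSH client config entry for one host alias."""
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    User {user}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the block so it can be reviewed before pasting it into ~/.ssh/config.
    print(ssh_config_block("hilbert", "hpc.rz.uni-duesseldorf.de", "joweb106"), end="")
```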

## Other ways of connecting

You can mount an HPC directory with `sshfs`:

    sshfs username@hpc.rz.uni-duesseldorf.de:some/path/on/the/HPC some/path/on/your/system

Example:

    sshfs joweb106@hpc.rz.uni-duesseldorf.de:/gpfs/scratch/joweb106 Documents/hilbert3/

Or use [Filezilla](https://filezilla-project.org/).

Also look at [this HPC wiki post](https://wiki.hhu.de/display/HPC/Filesysteme+mounten).

## screen

Use [screen] to start a session that keeps running after you disconnect:

    screen # without name
    screen -S sitzung1 # with name

Press `Ctrl+a`, then `d`, to detach from the session.

Reattach with:

    screen -r
    screen -r sitzung1 # with specific name

## Which directory should be used

Work in `/gpfs/scratch/username`:

    cd /gpfs/scratch/joweb106/

## Modules

You can load specific software with modules:

    module load module-name

List all available modules:

    module avail

Some of the modules used:

    CellRanger/3.1.0
    Velocyto/0.17.17
    SamTools/1.6
    Jupyter

## Interactive session

To get an interactive session, use the `qsub` command once you are connected to the HPC. Call `screen` first so that you can detach from and reattach to your session. You can monitor your jobs with [myJam]. To keep the output, write it to a log file; see the section "Handling stdout and stderr" for how.

Template:

    screen
    qsub -A project-name -I -l select=1:ncpus=number:mem=numberG -l walltime=h:mm:ss
    cd /gpfs/scratch/username
    module load ...
    somecommand |& tee -a log.txt

Example:

    screen
    qsub -A singlecellseq -I -l select=1:ncpus=4:mem=10G -l walltime=3:00:00
    cd /gpfs/scratch/joweb106/
    module load Python/3.6.5
    python my_script.py |& tee -a log.txt

Also look at [this HPC wiki post](https://wiki.hhu.de/display/HPC/Entwicklungs-Server).
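When the same request is issued often, or later from a workflow tool, the `qsub` line from the template can be assembled from its parameters. A minimal sketch, not an official interface: the function name `build_qsub_command` and its arguments are made up for illustration, and only the flags shown in the template above are used:

```python
import shlex


def build_qsub_command(project: str, ncpus: int, mem_gb: int, walltime: str) -> list:
    """Assemble the interactive qsub call from the template as an argument list."""
    resources = f"select=1:ncpus={ncpus}:mem={mem_gb}G"
    return [
        "qsub",
        "-A", project,
        "-I",
        "-l", resources,
        "-l", f"walltime={walltime}",
    ]


if __name__ == "__main__":
    cmd = build_qsub_command("singlecellseq", ncpus=4, mem_gb=10, walltime="3:00:00")
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list avoids quoting problems if it is later passed to `subprocess.run`.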

## Other useful stuff

### Display sizes of all files and dirs

    du -bsh *

### Find all files matching a pattern

    find . -name "foo*"
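Inside a Python script the same two tasks can be done with `pathlib`. A minimal sketch; the function names are made up for illustration, and `total_size` sums file sizes only, so its numbers can differ slightly from `du`, which also counts the directory entries themselves:

```python
from pathlib import Path


def find_by_pattern(root: str, pattern: str) -> list:
    """Return all paths below root whose name matches the glob pattern."""
    return sorted(Path(root).rglob(pattern))


def total_size(path: Path) -> int:
    """Sum the sizes in bytes of a file, or of all regular files below a directory."""
    if path.is_file():
        return path.stat().st_size
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Size of every entry in the current directory, similar to `du -bs *`.
    for entry in sorted(Path(".").iterdir()):
        print(total_size(entry), entry)
    # Everything named foo*, similar to `find . -name "foo*"`.
    for match in find_by_pattern(".", "foo*"):
        print(match)
```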

### Show current path

    pwd

### Handling stdout and stderr

    echo "hi" >> log.txt           # stdout -> log
    echo "hi" | tee -a log.txt     # stdout -> log & stdout
    echo "hi" &>> log.txt          # stdout & stderr -> log
    echo "hi" |& tee -a log.txt    # stdout & stderr -> log & stdout
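The last variant (`|& tee -a log.txt`) can also be reproduced from within Python, which is handy when a driver script starts the analysis steps itself. A minimal sketch using only the standard library; the function name `run_and_tee` is made up for illustration. A call such as `run_and_tee([sys.executable, "my_script.py"], "log.txt")` would then correspond to `python my_script.py |& tee -a log.txt`:

```python
import subprocess
import sys


def run_and_tee(cmd: list, log_path: str) -> int:
    """Run cmd, echo its stdout and stderr to the terminal and append both to log_path."""
    with open(log_path, "a") as log, subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like `|&`
        universal_newlines=True,   # decode bytes to str line by line
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # still visible in the terminal
            log.write(line)         # appended to the log, like `tee -a`
    # Leaving the with-block waits for the process, so returncode is set here.
    return proc.returncode
```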
[screen]: https://wiki.ubuntuusers.de/Screen/
[myJam]: myjam3.hhu.de

# Todos:

- clusters:
  1. `cellranger reanalyze` with better parameters?
  2. Ask Tobias about clusters done with Seurat
- velocyto:
  1. Is it OK to simply combine the loom files of the 3 count datasets? There is a [github issue](https://github.com/velocyto-team/velocyto.py/issues/2) about this, and [another one](https://github.com/velocyto-team/velocyto.R/issues/63).
  2. Implement the velocyto.py workflow for RNA velocity (started in a Python notebook).
- snakemake:
  - Implement the workflow in snakemake (also with qsub).

# Data
......