Commit f219849

changed order of readme
1 parent 2efd87b commit f219849

1 file changed, +74 -74 lines changed
README.md (+74 -74)
@@ -67,6 +67,80 @@ chr1 60 61 89.0 115.0 chr1 60 61 92.0 117.0
```

## Code

### EM Script

After preparing the data as described above, you can run the EM script as follows:

```bash
python EM/em.py <input_path> <output_directory> <num_samples> [--max_iterations MAX_ITERATIONS] [--unknowns UNKNOWNS] [--parallel_job_id PARALLEL_JOB_ID] [--convergence CONVERGENCE] [--random_restarts RANDOM_RESTARTS]
```

CelFiE takes several parameters. `input_path`, `output_directory`, and `num_samples` are the only mandatory parameters.

```bash
usage: em.py [-h] [-m MAX_ITERATIONS] [-u UNKNOWNS] [-p PARALLEL_JOB_ID]
             [-c CONVERGENCE] [-r RANDOM_RESTARTS]
             input_path output_directory num_samples

CelFiE - Cell-free DNA decomposition. CelFie estimated the cell type of origin
proportions of a cell-free DNA sample.

positional arguments:
  input_path            The path to the input file
  output_directory      The path to the output directory
  num_samples           Number of cfdna samples

optional arguments:
  -h, --help            show this help message and exit
  -m MAX_ITERATIONS, --max_iterations MAX_ITERATIONS
                        How long the EM should iterate before stopping, unless
                        convergence criteria is met. Default 1000.
  -u UNKNOWNS, --unknowns UNKNOWNS
                        Number of unknown categories to be estimated along
                        with the reference data. Default 1. Can be increased to 2+ for large samples.
  -p PARALLEL_JOB_ID, --parallel_job_id PARALLEL_JOB_ID
                        Replicate number in a simulation experiment. Default
                        1.
  -c CONVERGENCE, --convergence CONVERGENCE
                        Convergence criteria for EM. Default 0.001.
  -r RANDOM_RESTARTS, --random_restarts RANDOM_RESTARTS
                        CelFiE will perform several random restarts and select
                        the one with the highest log-likelihood. Default 10.
```
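
The help text above is enough to mirror the command-line interface with `argparse`, which can be handy when wrapping the script in a larger pipeline. The sketch below is a reconstruction from that help output only, not the contents of `EM/em.py`: the function name `build_parser` and the argument types (`int`, `float`) are assumptions, while the option names and defaults are taken from the help text.

```python
import argparse


def build_parser():
    """Rebuild the CLI described in the help text (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="em.py",
        description="CelFiE - Cell-free DNA decomposition.",
    )
    # Mandatory positional arguments, in the order shown in the usage line.
    parser.add_argument("input_path", help="The path to the input file")
    parser.add_argument("output_directory", help="The path to the output directory")
    # Types are assumed; the help text only lists names and defaults.
    parser.add_argument("num_samples", type=int, help="Number of cfDNA samples")
    # Optional arguments with the defaults stated in the help text.
    parser.add_argument("-m", "--max_iterations", type=int, default=1000)
    parser.add_argument("-u", "--unknowns", type=int, default=1)
    parser.add_argument("-p", "--parallel_job_id", type=int, default=1)
    parser.add_argument("-c", "--convergence", type=float, default=0.001)
    parser.add_argument("-r", "--random_restarts", type=int, default=10)
    return parser


# Placeholder file names; only the three positionals are required.
args = build_parser().parse_args(["input.txt", "out", "15", "-u", "2"])
print(args.num_samples, args.unknowns, args.max_iterations, args.convergence)
```

Omitted options fall back to their defaults, so only the positionals and `--unknowns` need to be supplied in this call.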

### Output

CelFiE outputs the tissue estimates for each sample in your input, i.e. the proportion of each reference tissue making up the cfDNA sample. See `celfie_demo/sample_output/1_tissue_proportions.txt` for an example of this output.

```
         tissue1  tissue2  ...  unknown
sample1  0.05     0.08     ...  0.1
sample2  0.7      0.12     ...  0.2
```

CelFiE also outputs the methylation proportions for each of the tissues, plus however many unknowns were estimated. This output looks like this:

```
      tissue1  tissue2  ...  unknown
CpG1  0.99     1.0      ...  0.3
CpG2  0.45     0.88     ...  0.1
```

Sample code for processing both of these outputs can be found in `demo.ipynb`.
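
Both outputs share the same layout: a header row of tissue names followed by one labeled row per sample or CpG. As a minimal sketch independent of the notebook, the function below parses such a table using only the standard library. It assumes the file is whitespace- or tab-delimited and that labels contain no spaces; `parse_table` is a hypothetical helper, and the numbers in `example` are made-up illustrative values, not real CelFiE output.

```python
def parse_table(text):
    """Parse a whitespace-delimited table with a header row and row labels."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    columns, rows = lines[0], lines[1:]
    table = {}
    for label, *values in rows:
        if len(values) != len(columns):
            raise ValueError(
                f"row {label!r} has {len(values)} values, expected {len(columns)}"
            )
        table[label] = dict(zip(columns, map(float, values)))
    return table


# Illustrative values only (assumed layout, not real output).
example = """
         tissue1  tissue2  unknown
sample1  0.05     0.85     0.10
sample2  0.70     0.10     0.20
"""

proportions = parse_table(example)
for sample, row in proportions.items():
    # Report the dominant tissue and check that the proportions sum to about 1.
    top = max(row, key=row.get)
    print(sample, top, round(sum(row.values()), 6))
```

For the real output file, the same function could be applied to the text read from `1_tissue_proportions.txt`, provided the delimiter assumption holds.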

### L1 projection method

We also developed a method to project estimates onto the L1 ball, based on Duchi et al. (2008). The code for this method is available at `EM/projection.py`. It can be run as follows:

```bash
python projection.py <output_dir> <replicate> <number of tissues> <number of sites> <number of individuals> <input depth> <reference depth> <tissue_proportions.pkl>
```

Sample tissue proportions are included at `EM/simulations/unknown_sim_0201_10people.pkl`.
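
For readers unfamiliar with the projection itself, here is a minimal pure-Python sketch of the sort-based algorithm from Duchi et al. (2008): Euclidean projection onto the simplex, and from it onto the L1 ball. This is not the code in `EM/projection.py`, and how that script applies the projection to the simulated data is not described here; the function names `project_simplex` and `project_l1_ball` and the radius parameter `z` are assumptions made for illustration.

```python
def project_simplex(v, z=1.0):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = z}."""
    mu = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, mu_j in enumerate(mu, start=1):
        cumulative += mu_j
        candidate = (cumulative - z) / j
        # Keep the threshold from the largest j with mu_j - candidate > 0.
        if mu_j - candidate > 0:
            theta = candidate
    # Shift every entry by theta and clip negatives to zero.
    return [max(x - theta, 0.0) for x in v]


def project_l1_ball(v, z=1.0):
    """Euclidean projection of v onto {w : sum(|w_i|) <= z}."""
    if sum(abs(x) for x in v) <= z:
        return list(v)  # already inside the ball
    # Project the magnitudes onto the simplex, then restore the signs.
    w = project_simplex([abs(x) for x in v], z)
    return [w_i if x >= 0 else -w_i for x, w_i in zip(v, w)]


print(project_simplex([2.0, 0.0]))
print(project_l1_ball([0.2, -0.3]))
```

Projecting onto the simplex yields non-negative entries that sum to `z`, which is the constraint that mixing proportions must satisfy when `z` is 1.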

## Tissue Informative Markers

In our paper, we identified a set of tissue informative markers (TIMs). We claim that these are a good set of CpGs to use for decomposition.
@@ -143,80 +217,6 @@ The pipeline can then be ran as
./tim.sh
```

## Figures

Jupyter notebooks to reproduce the figures and statistical analyses for the final version of this manuscript can be found in the `paper_figures` directory.
