Command Line Interface

netZooPy also provides a command line interface that lets you run the animal methods directly from the shell, for example:

netzoopy panda -e expression.txt -m motif.txt -p ppi.txt -o output_panda.txt

The command above runs the PANDA message-passing method on the expression, motif and PPI data passed as input, and writes an output file containing the PANDA weighted edges.

While we expect to provide most animals from the CLI, so far you can use the commands documented below: panda, lioness, condor and bonobo.

If you want to use the previous command line calls (python run_panda.py), see the Run functions (legacy) section.
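When the CLI has to be driven from a Python script, for example inside a pipeline, the call shown at the top of this page can be wrapped with the standard library. The following is a minimal sketch and not part of netZooPy: the helper names `panda_command` and `run_panda` are hypothetical, and the file names are the placeholders used in the example above.

```python
import shutil
import subprocess


def panda_command(expression, motif, ppi, out):
    """Return the argument list for a basic `netzoopy panda` call."""
    return ["netzoopy", "panda",
            "-e", expression, "-m", motif, "-p", ppi, "-o", out]


def run_panda(expression, motif, ppi, out):
    """Run the CLI, failing early if the executable is not installed."""
    if shutil.which("netzoopy") is None:
        raise RuntimeError("netzoopy executable not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status
    return subprocess.run(panda_command(expression, motif, ppi, out), check=True)


# Build (but do not execute) the command from the example above
cmd = panda_command("expression.txt", "motif.txt", "ppi.txt", "output_panda.txt")
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file paths that contain spaces.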

Commands

netzoopy

netzoopy [OPTIONS] COMMAND [ARGS]...

panda

Run panda using expression, motif and ppi data. Use flags to modify the function behavior. By default, boolean flags are false. The output is a text file with the columns TF, Gene, Motif and Force, where TF and Gene are the nodes of the network, Motif is the prior and Force is the actual panda score.

Warning: To keep the command line call clean, all booleans default to false. To set a boolean to True from the command line, just add the respective flag:

>>> netzoopy panda ....  --rm_missing --save_tmp

In this case rm_missing and save_tmp are set to True. To replicate the Panda class default behavior, pass --save_memory and --save_tmp.
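The flag convention described above (a boolean becomes True only when its flag is present) can be mirrored when the command is assembled programmatically. This is a small sketch under that assumption; `boolean_flags` is a hypothetical helper, not a netZooPy function, and the file names are placeholders.

```python
def boolean_flags(**options):
    """Map keyword booleans to CLI flags: True adds `--name`, False adds nothing."""
    return [f"--{name}" for name, enabled in options.items() if enabled]


base = ["netzoopy", "panda",
        "-e", "expression.txt", "-m", "motif.txt",
        "-p", "ppi.txt", "-o", "output_panda.txt"]

# Replicates the Panda class defaults mentioned above; rm_missing stays off
cmd = base + boolean_flags(save_memory=True, save_tmp=True, rm_missing=False)
print(" ".join(cmd))
```

A False value simply omits the flag, since the CLI has no way to pass an explicit "false" for these options.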

Example:

netzoopy panda -e tests/puma/ToyData/ToyExpressionData.txt -m tests/puma/ToyData/ToyMotifData.txt -p tests/puma/ToyData/ToyPPIData.txt -o test_panda.txt
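Once a run like the one above has finished, the edge list can be inspected with a few lines of Python. The sketch below assumes that the four documented columns (TF, Gene, Motif, Force) are separated by whitespace and that a header line may or may not be present (see --old_compatible); both points should be checked against your own output file. The function name `read_panda_edges` and the toy values are made up for illustration.

```python
import io

COLUMNS = ["tf", "gene", "motif", "force"]


def read_panda_edges(handle):
    """Parse PANDA edges into (tf, gene, motif, force) tuples."""
    edges = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        if [f.lower() for f in fields] == COLUMNS:
            continue  # skip a header line if there is one
        tf, gene, motif, force = fields
        edges.append((tf, gene, float(motif), float(force)))
    return edges


# Toy content standing in for test_panda.txt (values are invented)
toy = io.StringIO("tf gene motif force\nTF1 GeneA 1.0 2.5\nTF1 GeneB 0.0 -0.3\n")
edges = read_panda_edges(toy)

# Edge with the highest PANDA score
top = max(edges, key=lambda edge: edge[3])
print(top)
```

For a real file, replace the `io.StringIO` object with `open("test_panda.txt")`.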

Reference:

Glass, Kimberly, et al. “Passing messages between biological networks to refine predicted interactions.” PloS one 8.5 (2013): e64832.

netzoopy panda [OPTIONS]

Options

-e, --expression <expression>: Required Path to file containing the gene expression data. By default, the expression file does not have a header, and the cells are separated by a tab.

-m, --motif <motif>: Required Path to pair file containing the transcription factor DNA binding motif edges in the form of TF-gene-weight(0/1). If not provided, the gene coexpression matrix is returned as a result network.

-p, --ppi <ppi>: Required Path to pair file containing the PPI edges. The PPI can be symmetrical, if not, it will be transformed into a symmetrical adjacency matrix.

-o, --out <output>: Required Output panda file. Format as txt

--computing <computing>

computing option, choose one between cpu and gpu

Default:: 'cpu'

--precision <precision>

precision option

Default:: 'double'

--with_header: Pass if the expression file has a header. It will be used to save samples with the correct name.

--save_memory

panda option. When true, the result network is a weighted adjacency matrix of size (nTFs, nGenes). When false, the result network has 4 columns in the form gene - TF - weight in motif prior - PANDA edge.

Default:: False

--as_adjacency

If true, the final PANDA is saved as an adjacency matrix. Works only when save_memory is false

Default:: False

--old_compatible

If true, PANDA is saved without headers. Pass this if you want the same results of netzoopy before v0.9.11

Default:: False

--save_tmp

panda option

Default:: False

--rm_missing: Removes the genes and TFs that are not present in one of the priors. Works only if modeProcess=legacy

--keep_expr

Keeps the input expression matrix in the result

Default:: False

--mode_process <mode_process>

panda option for input data processing. Choose between union(default), legacy and intersection

Default:: 'union'

--alpha <alpha>

panda and lioness learning rate (the alpha update parameter of the message-passing step)

Default:: 0.1

--start <start>

panda first sample

Default:: 1

--end <end>: panda last sample
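To try the options above without real data, small input files in the described formats can be generated with the standard library. This sketch rests on several assumptions that are not stated on this page: rows of the expression file are genes with the gene name in the first column, the motif and PPI pair files are tab-separated, and the PPI file has three columns (protein, protein, weight). All names and values are invented.

```python
import csv
import tempfile
from pathlib import Path


def write_tsv(path, rows):
    """Write rows as tab-separated text without a header line."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)


workdir = Path(tempfile.mkdtemp())

# Expression: no header, tab-separated (gene name, then one value per sample)
expression = [["GeneA", 1.2, 0.4, 3.1],
              ["GeneB", 0.7, 2.2, 1.9]]
# Motif prior: TF, gene, weight (0/1)
motif = [["TF1", "GeneA", 1],
         ["TF1", "GeneB", 0],
         ["TF2", "GeneB", 1]]
# PPI: one edge per line
ppi = [["TF1", "TF2", 1]]

write_tsv(workdir / "expression.txt", expression)
write_tsv(workdir / "motif.txt", motif)
write_tsv(workdir / "ppi.txt", ppi)

print(sorted(path.name for path in workdir.iterdir()))
```

The toy files shipped with the repository (tests/puma/ToyData/) remain the authoritative examples of the expected layout.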

lioness

Run Lioness to extract single-sample networks. First runs panda using expression, motif and ppi data. Then runs lioness and puts results in the output_lioness folder. Use flags to modify the function behavior. By default, boolean flags are false.
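The single-sample extraction follows the linear interpolation of Kuijjer et al. (2019): the network for sample q is estimated as N * (network on all samples - network without q) + network without q. The toy sketch below illustrates that formula on a single edge weight and replaces PANDA with a plain mean as the aggregate "network", an assumption made purely for illustration; `aggregate` and `lioness_edge` are hypothetical names.

```python
def aggregate(samples):
    """Toy aggregate 'network': the mean of one edge weight across samples."""
    return sum(samples) / len(samples)


def lioness_edge(samples, q):
    """Estimate the sample-specific edge weight for sample index q."""
    n = len(samples)
    with_all = aggregate(samples)
    without_q = aggregate(samples[:q] + samples[q + 1:])
    # LIONESS interpolation: N * (all - without q) + without q
    return n * (with_all - without_q) + without_q


samples = [1.0, 2.0, 6.0]
estimates = [lioness_edge(samples, q) for q in range(len(samples))]
print(estimates)
```

With a linear aggregate such as the mean, the interpolation recovers each sample's own value, which is a convenient sanity check. With PANDA as the aggregate the estimate is no longer trivial, and each sample requires one additional PANDA run on the remaining samples.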

Example:

netzoopy lioness -e tests/puma/ToyData/ToyExpressionData.txt -m tests/puma/ToyData/ToyMotifData.txt -p tests/puma/ToyData/ToyPPIData.txt -op test_panda.txt -ol lioness/

Example for GPU computing. Here the biggest limitation is GPU size, hence we need to optimize the precision and make sure we don’t use all genes, but only the intersection with the PANDA priors. We also save the single lioness networks as they are computed:

netzoopy lioness -e <expression_file> -m <motif_file> -p <ppi_file> -op output_panda.txt -ol output_lioness_folder/ --computing gpu --precision single --mode_process intersection --save_single_lioness --ignore_final

LIONESS on a subset of samples. This is especially useful if you need to test whether your data is suitable for LIONESS or whether you have enough resources. Specifying --panda_start and --panda_end restricts the samples, and only the samples in that subset are used for LIONESS. The background is the one specified by panda_start and panda_end. Alternatively, you can use the --subset_numbers and --subset_names flags to specify the samples to use. In this case PANDA is computed on all samples, but LIONESS is computed and saved only for the subset of samples:

netzoopy lioness -e <expression_file> -m <motif_file> -p <ppi_file> -op output_panda.txt -ol output_lioness_folder/ --computing <gpu|cpu> --panda_start 1 --panda_end 5 --precision single --mode_process intersection --save_single_lioness --ignore_final

Reference:: Kuijjer, Marieke Lydia, et al. “Estimating sample-specific regulatory networks.” Iscience 14 (2019): 226-240.

netzoopy lioness [OPTIONS]

Options

-e, --expression <expression>: Required Path to file containing the gene expression data. By default, the expression file does not have a header, and the cells are separated by a tab.

-m, --motif <motif>: Required Path to pair file containing the transcription factor DNA binding motif edges in the form of TF-gene-weight(0/1). If not provided, the gene coexpression matrix is returned as a result network.

-p, --ppi <ppi>: Required Path to pair file containing the PPI edges. The PPI can be symmetrical, if not, it will be transformed into a symmetrical adjacency matrix.

-op, --out-panda <output_panda>: Required Output panda file. Format as txt

-ol, --out-lioness <output_lioness>: Required Output lioness folder

--el <el>

Lioness output export. If a file is passed, the final output will be saved as a complete table, with indices and column names, using the format specified here. Otherwise a standard lioness.fmt file with no annotation is saved

Default:: 'None'

--fmt <fmt>

Lioness network files output format. Choose one between .npy,.txt,.mat

Default:: 'npy'

--computing <computing>

computing option, choose one between cpu and gpu

Default:: 'cpu'

--precision <precision>

precision option

Default:: 'double'

--ncores <ncores>

Number of cores. Lioness CPU parallelizes over ncores

Default:: 1

--save_tmp

panda option

Default:: False

--rm_missing: Removes the genes and TFs that are not present in one of the priors. Works only if modeProcess=legacy

--mode_process <mode_process>

panda option for input data processing. Choose between union(default), legacy and intersection

Default:: 'union'

--output_type <output_type>

lioness option for output format. Choose one between network, gene_targeting, tf_targeting

Default:: 'network'

--alpha <alpha>

panda and lioness learning rate (the alpha update parameter of the message-passing step)

Default:: 0.1

--panda_start <panda_start>

panda first sample

Default:: 1

--panda_end <panda_end>: panda last sample

--start <start>

lioness first sample

Default:: 1

--end <end>: lioness last sample

--subset_numbers <subset_numbers>

Specify a list of samples (numbers,comma separated) to run lioness on. Background is the one specified by panda_start and panda_end

Default:: ''

--subset_names <subset_names>

Specify a list of samples (sample names,comma separated) to run lioness on. Background is the one specified by panda_start and panda_end

Default:: ''

--with_header: Pass if the expression file has a header. It will be used to save samples with the correct name.

--save_single_lioness: Pass this flag to save all single lioness networks generated.

--ignore_final: The whole lioness data is not kept in memory. Always use save_single_lioness for this

--as_adjacency

If true, the final PANDA is saved as an adjacency matrix. Works only when save_memory is false

Default:: False

--old_compatible

If true, PANDA is saved without headers. Pass this if you want the same results of netzoopy before v0.9.11

Default:: False

condor

Computation of the whole condor process. It creates a condor object and runs all the steps of BRIM on it. The tar and reg final memberships are saved to csv.

Note: The edgelist is assumed to contain a bipartite network. The program will relabel the nodes so that the edgelist represents a bipartite network anyway. It is up to the user to make sure that the network is suitable for the method.
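Because the nodes are relabelled regardless of the input, it can be worth checking beforehand that no label appears on both sides of the edgelist. The sketch below assumes a file layout matching the CLI defaults (comma separator, header in row 0, index in column 0) followed by the two node columns and a weight. The column names `tar`, `reg` and `weight`, the helper `shared_labels` and the toy data are assumptions for illustration only.

```python
import csv
import io


def shared_labels(handle, sep=","):
    """Return the node labels that occur in both node columns of an edgelist."""
    reader = csv.reader(handle, delimiter=sep)
    next(reader)  # skip the header row
    first, second = set(), set()
    for row in reader:
        if not row:
            continue
        # row[0] is the index column; row[1] and row[2] are the edge endpoints
        first.add(row[1])
        second.add(row[2])
    return first & second


toy = ",tar,reg,weight\n0,g1,tfA,1.0\n1,g2,tfA,0.5\n2,g2,tfB,0.7\n"
overlap = shared_labels(io.StringIO(toy))
print(overlap if overlap else "no label is shared between the two node sets")
```

An empty result means the two node sets are disjoint, which is what a bipartite edgelist should look like. A non-empty result points to labels that the program would silently treat as two different nodes.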

netzoopy condor [OPTIONS]

Options

-n, --network_file <network>: Required Path to file encoding an edgelist.

--sep <sep>

network file separator

Default:: ','

--index_col <index_col>

Column that stores the index of the edgelist. E.g. None, 0…

Default:: 0

--header <header>

Row that stores the header of the edgelist. E.g. None, 0…

Default:: 0

--initial_method <initial_method>

Method to determine initial community assignment. (By default Multilevel method)

Default:: 'LCS'

--initial_project

Whether to project the network onto one of the bipartite sets for the initial community detection.

Default:: False

--com_num <com_num>

Max number of communities. It is recommended to leave this at the default; otherwise, if the initial community assignment is bigger, the program will crash.

Default:: 'def'

--delta_qmin <delta_qmin>

Difference modularity threshold for stopping the iterative process.

Default:: 'def'

--resolution <resolution>

Not yet implemented

Default:: 1

--tar_output <tar_output>

Filename for saving the tar node final membership.

Default:: 'tar_memb.txt'

--reg_output <reg_output>

Filename for saving the reg node final membership.

Default:: 'reg_memb.txt'

bonobo

Compute BONOBOs from an expression file.

Parameters the user cannot access from the CLI:

- computing: for now it is only CPU
- cores: number of cores to use, for now there is no parallelization
- online_coexpression: we have not implemented the online coexpression yet

netzoopy bonobo [OPTIONS]

Options

-e, --expression_file <expression_file>: Required Path to file containing the gene expression data or pandas dataframe. By default, the expression file does not have a header, and the cells are separated by a tab.

--output_folder <output_folder>

Output folder for the bonobo files. If not specified, the bonobo files are saved in the current directory, in the bonobo subdirectory.

Default:: 'bonobo/'

--output_format <output_format>

format of output bonobo matrix. By default it is an hdf file, can be a txt or csv.

Default:: '.h5'

--keep_in_memory

if True, the bonobo coexpression matrix is kept in memory, otherwise it is discarded after saving

Default:: False

--delta <delta>: delta parameter. If default (None), delta is trained; otherwise pass a value. The recommended value is 0.3.

--sparsify

if True, bonobo gets sparsified and relative pvalues are returned

Default:: False

--confidence <confidence>

if sparsify is True, this is the CI for the approximate zscore.

Default:: 0.05

--save_pvals

if True, bonobo gets sparsified and relative pvalues are saved in the same format and folder of bonobo

Default:: False

--precision <precision>

matrix precision (single or double), defaults to single precision.

Default:: 'single'

--sample_names <sample_names>

Compute BONOBO only on a subset of samples. Pass a comma separated list of sample names. If not specified, all samples are used.

Default:: ''

Run functions (legacy)

Panda

netZooPy.panda.run_panda.main(argv)[source]

Description:: Run PANDA algorithm from the command line.
Inputs::
-h, --help: help
-e, --expression: expression_file: Path to file containing the gene expression data. By default, the expression file does not have a header, and the cells are separated by a tab.
-m, --motif: Path to pair file containing the transcription factor DNA binding motif edges in the form of TF-gene-weight(0/1). If not provided, the gene coexpression matrix is returned as a result network.
-p, --ppi: Path to pair file containing the PPI edges. The PPI can be symmetrical, if not, it will be transformed into a symmetrical adjacency matrix.
-o, --out: output file.
-r, --rm_missing.
-q, --lioness: output for Lioness single sample networks.

Example

python run_panda.py -e ../../tests/ToyData/ToyExpressionData.txt -m ../../tests/ToyData/ToyMotifData.txt -p ../../tests/ToyData/ToyPPIData.txt -o test_panda.txt -q output_panda.txt

Reference:: Glass, Kimberly, et al. “Passing messages between biological networks to refine predicted interactions.” PloS one 8.5 (2013): e64832.

Lioness

netZooPy.lioness.run_lioness.main(argv)[source]

Description:: Run LIONESS algorithm from the command line.
Usage::
-h, --help: help
-e, --expression: expression matrix (.npy)
-m, --motif: motif matrix, normalized (.npy)
-p, --ppi: ppi matrix, normalized (.npy)
-g, --comp: use cpu (default) or gpu
-r, --pre: number of digits to calculate
-c, --ncores: number cores
-n, --npy: PANDA network (.npy)
-o, --out: output folder
-f, --format: output format (txt, npy, or mat)
start: to start from nth sample (optional)
end: to end at nth sample (optional, must be used with start)

Example

python3 run_lioness.py -e ../../tests/ToyData/ToyExpressionData.txt -m ../../tests/ToyData/ToyMotifData.txt -p ../../tests/ToyData/ToyPPIData.txt -g cpu -r single -c 2 -o /tmp -f npy 1 2

Reference:: Kuijjer, Marieke Lydia, et al. “Estimating sample-specific regulatory networks.” Iscience 14 (2019): 226-240.

Common usage