cellmaps_pipeline package
Submodules
cellmaps_pipeline.cellmaps_pipelinecmd module
- cellmaps_pipeline.cellmaps_pipelinecmd.main(args)[source]
Main entry point for the program. The Cell Maps Pipeline takes immunofluorescent images from the Human Protein Atlas along with Affinity Purification Mass Spectrometry data from one or more sources and converts them into embeddings. Those embeddings are then co-embedded and converted into an integrated interaction network, from which a hierarchical model is derived.
- Parameters:
args (list) – arguments passed on the command line, usually sys.argv[1:]
- Returns:
return value of cellmaps_pipeline.runner.CellmapsPipeline.run(), or 2 if an exception is raised
- Return type:
int
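The return-value contract described for main(args) can be illustrated with a small sketch. This is not the package's source: run_pipeline and its --fail flag are hypothetical stand-ins for the call to CellmapsPipeline.run(), used only to show how a status is passed through on success and replaced by 2 when an exception escapes.

```python
import sys


def run_pipeline(args):
    """Hypothetical stand-in for CellmapsPipeline.run(); not the real implementation."""
    if "--fail" in args:  # made-up flag, used only to trigger the error path
        raise RuntimeError("simulated pipeline failure")
    return 0


def main(args):
    """Return the pipeline's status, or 2 if an exception is raised."""
    try:
        return run_pipeline(args)
    except Exception as e:
        # Report the problem on stderr and map it to exit status 2.
        sys.stderr.write(f"Caught exception: {e}\n")
        return 2
```

A script built this way would typically end with sys.exit(main(sys.argv[1:])) so that the shell sees the same status.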
cellmaps_pipeline.exceptions module
cellmaps_pipeline.runner module
- class cellmaps_pipeline.runner.CellmapsPipeline(outdir=None, runner=None, input_data_dict=None)[source]
Bases:
object
Manages the execution of the Cellmaps pipeline. This class is responsible for setting up the environment, executing the runner, and handling the logging and cleanup tasks associated with the pipeline execution.
Constructor
- Parameters:
outdir – The directory where the pipeline’s output will be stored.
runner – The runner object responsible for executing the pipeline steps.
input_data_dict – A dictionary of input data settings that may affect pipeline execution.
- Raises:
CellmapsPipelineError – If the output directory is not provided.
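The constructor contract above (an output directory is required, and a runner object does the actual work) might look roughly like the following. SketchPipeline, SketchPipelineError and AlwaysOkRunner are hypothetical names invented for this illustration; the delegation inside run() is an assumption based on the class description, not a copy of the library code.

```python
class SketchPipelineError(Exception):
    """Stand-in for CellmapsPipelineError (hypothetical, for illustration only)."""


class SketchPipeline:
    """Mimics the documented constructor contract; not the library class."""

    def __init__(self, outdir=None, runner=None, input_data_dict=None):
        # The documentation states that a missing output directory is an error.
        if outdir is None:
            raise SketchPipelineError("outdir is None")
        self.outdir = outdir
        self.runner = runner
        self.input_data_dict = input_data_dict

    def run(self):
        # Assumption: the pipeline hands execution to whichever runner it was given.
        return self.runner.run()


class AlwaysOkRunner:
    """Hypothetical runner that reports success without doing any work."""

    def run(self):
        return 0
```

Passing the runner in as an argument keeps the setup and cleanup logic independent of where the steps actually execute.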
- class cellmaps_pipeline.runner.PipelineRunner(outdir)[source]
Bases:
object
Base class for running pipeline commands in a generic execution environment. This class should be subclassed to provide specific implementations for different execution environments such as local or SLURM-based clusters.
Constructor
- Parameters:
outdir – The output directory where all pipeline generated files will be stored.
- run()[source]
Abstract method to run the pipeline. This method should be implemented by subclasses.
- Raises:
NotImplementedError – If the subclass does not implement this method.
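The subclassing pattern described for PipelineRunner can be sketched as follows. BaseRunner is a stand-in that copies only the documented behaviour (a constructor taking outdir and a run() that raises NotImplementedError); RecordingRunner is a hypothetical subclass that exists purely for this example.

```python
class BaseRunner:
    """Stand-in mirroring the documented PipelineRunner(outdir) contract."""

    def __init__(self, outdir):
        self._outdir = outdir

    def run(self):
        # The base class offers no execution strategy of its own.
        raise NotImplementedError("subclasses should implement run()")


class RecordingRunner(BaseRunner):
    """Hypothetical subclass that only records which directory it would use."""

    def __init__(self, outdir):
        super().__init__(outdir)
        self.visited = []

    def run(self):
        # A real subclass would launch the pipeline steps here.
        self.visited.append(self._outdir)
        return 0
```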
- class cellmaps_pipeline.runner.ProgrammaticPipelineRunner(outdir=None, cm4ai_apms=None, cm4ai_image=None, samples=None, unique=None, edgelist=None, baitlist=None, model_path=None, proteinatlasxml=None, ppi_cutoffs=None, fake=None, provenance=None, skip_logging=False, provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>, fold=[1], input_data_dict=None)[source]
Bases:
PipelineRunner
Runs the pipeline programmatically in a serial fashion.
Constructor
- Parameters:
outdir – Output directory for results and logs.
cm4ai_apms – Path to CM4AI AP-MS data.
cm4ai_image – Path to CM4AI image data.
samples – Path to samples data.
unique – Path to unique data.
edgelist – Path to the network edge list.
baitlist – Path to the bait list.
model_path – Path to the model used for embedding.
proteinatlasxml – Path to the ProteinAtlas XML data.
ppi_cutoffs – Cutoff thresholds for PPI data processing.
fake – Uses fake embeddings for testing purposes.
provenance – Provenance information for reproducibility.
skip_logging – Skips logging of pipeline steps if True.
provenance_utils – Utility for handling provenance data.
fold – List of image data folds to process.
input_data_dict – Dictionary containing input data configurations.
- run()[source]
Runs pipeline programmatically in serial steps. This would be the same as running the steps in a notebook.
- Raises:
CellmapsPipelineError – If any step in the pipeline fails, indicating the step and reason.
- Returns:
Exit code 0 if successful, other values indicate failure.
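The serial behaviour of run() (execute each step in order, stop at the first failure, and name the failing step in the error) can be sketched generically. The step names, the StepFailedError class and the run_serially helper are all hypothetical; the real runner's steps and messages are not shown in this reference.

```python
class StepFailedError(Exception):
    """Stand-in for CellmapsPipelineError (hypothetical, for illustration only)."""


def run_serially(steps):
    """Run (name, callable) pairs in order; each callable returns an exit code."""
    for name, step in steps:
        exit_code = step()
        if exit_code != 0:
            # Stop immediately and report which step failed and why.
            raise StepFailedError(f"{name} failed with exit code {exit_code}")
    return 0


# Usage with made-up step names; each lambda stands in for one pipeline stage.
completed = []
status = run_serially([
    ("download", lambda: completed.append("download") or 0),
    ("embed", lambda: completed.append("embed") or 0),
    ("coembed", lambda: completed.append("coembed") or 0),
])
```

Because the steps run one after another in the same process, a failure leaves earlier outputs in place and prevents later steps from starting.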
- class cellmaps_pipeline.runner.SLURMPipelineRunner(outdir=None, cm4ai_apms=None, cm4ai_image=None, samples=None, unique=None, edgelist=None, baitlist=None, model_path=None, proteinatlasxml=None, ppi_cutoffs=None, fake=None, provenance=None, fold=[1], input_data_dict=None, slurm_partition=None, slurm_account=None)[source]
Bases:
PipelineRunner
Generates SLURM batch files and a wrapper script to run the pipeline steps in a SLURM environment.
- Parameters:
outdir – Path to the output directory.
cm4ai_apms – Path to the CM4AI APMS data file.
cm4ai_image – Path to the CM4AI image data file.
samples – Path to the samples data file.
unique – Path to the unique data file.
edgelist – Path to the edge list file for PPI data.
baitlist – Path to the bait list file for PPI data.
model_path – Path to the pre-trained model for embedding generation.
proteinatlasxml – Path to the Protein Atlas XML data.
ppi_cutoffs – Cutoff thresholds for PPI data filtering.
fake – Boolean indicating whether to use fake data embedding for testing.
provenance – Path to the provenance data file.
fold – Data folds to process.
input_data_dict – Dictionary of input data configurations.
slurm_partition – Name of the SLURM partition to submit jobs to.
slurm_account – SLURM account name for job submission.
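To show how slurm_partition and slurm_account could end up in a batch file, here is a minimal sketch that renders one job script as text. The render_batch_script function, the job name and the command are hypothetical; only the #SBATCH directives themselves (--job-name, --output, --partition, --account) are standard sbatch options. The files that SLURMPipelineRunner actually writes may differ.

```python
def render_batch_script(job_name, command, partition=None, account=None):
    """Build the text of a SLURM batch script for a single pipeline step."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --output={job_name}.out",
    ]
    # Partition and account are optional, so emit them only when supplied.
    if partition is not None:
        lines.append(f"#SBATCH --partition={partition}")
    if account is not None:
        lines.append(f"#SBATCH --account={account}")
    lines.append("")
    lines.append(command)
    return "\n".join(lines) + "\n"


# Example values below are placeholders, not names used by the package.
script = render_batch_script(
    "example_step", "echo running example step",
    partition="example-partition", account="example-account",
)
```

Writing the returned text to a file and submitting it with sbatch would queue that one step; a wrapper script would submit the steps in dependency order.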
Module contents
Top-level package for Cell Maps Pipeline.