cellmaps_pipeline package

Submodules

cellmaps_pipeline.cellmaps_pipelinecmd module

cellmaps_pipeline.cellmaps_pipelinecmd.main(args)

Main entry point for the program. The Cell Maps Pipeline takes immunofluorescence images from the Human Protein Atlas, along with affinity purification mass spectrometry (AP-MS) data from one or more sources, and converts them into embeddings. These embeddings are co-embedded and converted into an integrated interaction network, from which a hierarchical model is derived.

Parameters:

args (list) – Command-line arguments, usually sys.argv[1:]

Returns:

Return value of cellmaps_pipeline.runner.CellmapsPipeline.run(), or 2 if an exception is raised.

Return type:

int
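The documented contract of main() (return whatever CellmapsPipeline.run() returns, or 2 if an exception escapes) can be sketched as follows. This is a hypothetical illustration, not the module's source: main_sketch, FakePipeline, and the pipeline_factory argument are invented names, and the local exception class only mirrors cellmaps_pipeline.exceptions.CellmapsPipelineError so the snippet runs without the package installed.

```python
import sys


class CellmapsPipelineError(Exception):
    """Local stand-in mirroring cellmaps_pipeline.exceptions.CellmapsPipelineError."""


class FakePipeline:
    """Hypothetical stand-in for CellmapsPipeline; run() returns an exit code."""

    def __init__(self, fail=False):
        self.fail = fail

    def run(self):
        if self.fail:
            raise CellmapsPipelineError("a pipeline step failed")
        return 0


def main_sketch(args, pipeline_factory):
    """Hypothetical sketch of main(): build a pipeline from the command-line
    arguments and return its exit code, or 2 if any exception is raised."""
    try:
        pipeline = pipeline_factory(args)
        return pipeline.run()
    except Exception as e:
        # Report the failure and map it to the documented exit code 2.
        print(f"Caught exception: {e}", file=sys.stderr)
        return 2
```

In this sketch a successful run propagates the pipeline's own exit code unchanged, so callers can pass the result straight to sys.exit().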

cellmaps_pipeline.exceptions module

exception cellmaps_pipeline.exceptions.CellmapsPipelineError

Bases: Exception

Base exception for cellmaps_pipeline.

cellmaps_pipeline.runner module

class cellmaps_pipeline.runner.CellmapsPipeline(outdir=None, runner=None, input_data_dict=None)

Bases: object

Manages the execution of the Cellmaps pipeline. This class is responsible for setting up the environment, executing the runner, and handling the logging and cleanup tasks associated with the pipeline execution.

Constructor

Parameters:
  • outdir – The directory where the pipeline’s output will be stored.

  • runner – The runner object responsible for executing the pipeline steps.

  • input_data_dict – A dictionary of input data settings that may affect pipeline execution.

Raises:

CellmapsPipelineError – If no output directory is provided (outdir is None).

run()

Runs the CM4AI pipeline. Logs all steps, catches any exceptions raised during execution, and returns the final exit status.

Returns:

The exit status of the pipeline run. Returns 0 if successful, otherwise returns an error code.

Return type:

int
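A minimal sketch of the CellmapsPipeline contract described above, assuming the runner exposes a run() method that returns an integer exit code. PipelineSketch, EchoRunner, and FailingRunner are hypothetical names, and the code 1 returned when the runner raises is an assumption, since the documentation only says that an error code is returned.

```python
import logging

logger = logging.getLogger(__name__)


class CellmapsPipelineError(Exception):
    """Local stand-in mirroring cellmaps_pipeline.exceptions.CellmapsPipelineError."""


class PipelineSketch:
    """Hypothetical sketch of the CellmapsPipeline contract, not its source."""

    def __init__(self, outdir=None, runner=None, input_data_dict=None):
        # Documented behavior: a missing output directory is an error.
        if outdir is None:
            raise CellmapsPipelineError("outdir is None")
        self.outdir = outdir
        self.runner = runner
        self.input_data_dict = input_data_dict

    def run(self):
        exitcode = 1  # assumed nonzero code if runner.run() raises
        try:
            logger.info("Starting pipeline; output in %s", self.outdir)
            exitcode = self.runner.run()
        except Exception:
            # Log the traceback instead of letting the exception escape.
            logger.exception("Pipeline run failed")
        finally:
            logger.info("Pipeline finished with exit code %s", exitcode)
        return exitcode


class EchoRunner:
    """Hypothetical runner that returns a fixed exit code."""

    def __init__(self, code=0):
        self.code = code

    def run(self):
        return self.code


class FailingRunner:
    """Hypothetical runner whose run() always raises."""

    def run(self):
        raise RuntimeError("step failed")
```

Keeping the runner behind a single run() method is what lets the same wrapper drive either a serial or a SLURM-based runner.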

class cellmaps_pipeline.runner.PipelineRunner(outdir)

Bases: object

Base class for running pipeline commands in a generic execution environment. Subclass it to provide implementations for specific execution environments, such as a local machine or a SLURM-based cluster.

Constructor

Parameters:

outdir – The output directory where all pipeline-generated files will be stored.

run()

Abstract method to run the pipeline. This method should be implemented by subclasses.

Raises:

NotImplementedError – If the subclass does not implement this method.
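Subclassing follows the usual abstract-method pattern: the base class stores the output directory and its run() raises NotImplementedError until a subclass overrides it. The sketch below uses a local stand-in for PipelineRunner so it runs without cellmaps_pipeline installed; RecordingRunner is a hypothetical subclass that records step names instead of invoking real tools.

```python
class PipelineRunnerSketch:
    """Local stand-in mirroring the documented PipelineRunner base class."""

    def __init__(self, outdir):
        self._outdir = outdir

    def run(self):
        # Documented behavior: subclasses must override this method.
        raise NotImplementedError("Subclasses should implement run()")


class RecordingRunner(PipelineRunnerSketch):
    """Hypothetical subclass that records each step it would execute."""

    def __init__(self, outdir, steps):
        super().__init__(outdir)
        self.steps = list(steps)
        self.completed = []

    def run(self):
        for step in self.steps:
            # A real runner would launch the step here; we only record it.
            self.completed.append(step)
        return 0
```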

class cellmaps_pipeline.runner.ProgrammaticPipelineRunner(outdir=None, cm4ai_apms=None, cm4ai_image=None, samples=None, unique=None, edgelist=None, baitlist=None, model_path=None, proteinatlasxml=None, ppi_cutoffs=None, fake=None, provenance=None, skip_logging=False, provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>, fold=[1], input_data_dict=None)

Bases: PipelineRunner

Runs the pipeline programmatically, executing the steps serially.

Constructor

Parameters:
  • outdir – Output directory for results and logs.

  • cm4ai_apms – Path to CM4AI AP-MS data.

  • cm4ai_image – Path to CM4AI image data.

  • samples – Path to samples data.

  • unique – Path to unique data.

  • edgelist – Path to the network edge list.

  • baitlist – Path to the bait list.

  • model_path – Path to the model used for embedding.

  • proteinatlasxml – Path to the ProteinAtlas XML data.

  • ppi_cutoffs – Cutoff thresholds for PPI data processing.

  • fake – Uses fake embeddings for testing purposes.

  • provenance – Provenance information for reproducibility.

  • skip_logging – Skips logging of pipeline steps if True.

  • provenance_utils – Utility for handling provenance data.

  • fold – List of image data folds to process.

  • input_data_dict – Dictionary containing input data configurations.

run()

Runs the pipeline programmatically, one step after another. This is equivalent to running the steps in sequence in a notebook.

Raises:

CellmapsPipelineError – If any step in the pipeline fails, indicating the step and reason.

Returns:

Exit code 0 if successful; any other value indicates failure.
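The serial, fail-fast behavior described for run() can be illustrated with a small loop that stops at the first failing step and raises an error naming it. run_serially and the stage names in STEPS are hypothetical, loosely following the stages in the package overview (image and AP-MS inputs, embeddings, co-embedding, and hierarchy generation); each lambda stands in for a real step that returns an exit code.

```python
class CellmapsPipelineError(Exception):
    """Local stand-in mirroring cellmaps_pipeline.exceptions.CellmapsPipelineError."""


def run_serially(steps):
    """Run (name, callable) pairs in order; each callable returns an exit code.

    Raises CellmapsPipelineError naming the first step that raises or
    returns a nonzero code. Returns 0 if every step succeeds.
    """
    for name, step in steps:
        try:
            status = step()
        except Exception as e:
            raise CellmapsPipelineError(f"{name} failed: {e}") from e
        if status != 0:
            raise CellmapsPipelineError(f"{name} failed with exit code {status}")
    return 0


# Hypothetical stage names; each lambda is a placeholder for a real step.
STEPS = [
    ("image download", lambda: 0),
    ("AP-MS download", lambda: 0),
    ("image embedding", lambda: 0),
    ("AP-MS embedding", lambda: 0),
    ("co-embedding", lambda: 0),
    ("hierarchy generation", lambda: 0),
]
```

Stopping at the first failure matters here because each stage consumes the previous stage's output.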

class cellmaps_pipeline.runner.SLURMPipelineRunner(outdir=None, cm4ai_apms=None, cm4ai_image=None, samples=None, unique=None, edgelist=None, baitlist=None, model_path=None, proteinatlasxml=None, ppi_cutoffs=None, fake=None, provenance=None, fold=[1], input_data_dict=None, slurm_partition=None, slurm_account=None)

Bases: PipelineRunner

Generates SLURM batch files and a wrapper script to run the pipeline steps in a SLURM environment.

Parameters:
  • outdir – Path to the output directory.

  • cm4ai_apms – Path to the CM4AI AP-MS data file.

  • cm4ai_image – Path to the CM4AI image data file.

  • samples – Path to the samples data file.

  • unique – Path to the unique data file.

  • edgelist – Path to the edge list file for PPI data.

  • baitlist – Path to the bait list file for PPI data.

  • model_path – Path to the pre-trained model for embedding generation.

  • proteinatlasxml – Path to the Protein Atlas XML data.

  • ppi_cutoffs – Cutoff thresholds for PPI data filtering.

  • fake – Boolean indicating whether to use fake embeddings for testing.

  • provenance – Path to the provenance data file.

  • fold – Data folds to process.

  • input_data_dict – Dictionary of input data configurations.

  • slurm_partition – Name of the SLURM partition to submit jobs to.

  • slurm_account – SLURM account name for job submission.

run()

Runs the pipeline.
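One way to produce per-step SLURM batch files and a wrapper script like those described above is sketched below. The #SBATCH options (--job-name, --output, --partition, --account) and the sbatch flags --parsable and --dependency=afterok are standard SLURM; the function names, file names, and the choice to chain jobs through dependencies are assumptions, not the actual output of SLURMPipelineRunner.

```python
import os


def write_sbatch_file(outdir, job_name, command, partition=None, account=None):
    """Write a minimal SLURM batch script for one step and return its path."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --output={job_name}.%j.out",  # %j expands to the job ID
    ]
    # Mirrors the slurm_partition / slurm_account parameters; omitted if unset.
    if partition:
        lines.append(f"#SBATCH --partition={partition}")
    if account:
        lines.append(f"#SBATCH --account={account}")
    lines += ["", command, ""]
    path = os.path.join(outdir, f"{job_name}.sh")
    with open(path, "w") as f:
        f.write("\n".join(lines))
    return path


def write_wrapper(outdir, batch_files):
    """Write a hypothetical wrapper that submits the batch files in order.

    Each job after the first waits for the previous one to finish
    successfully (--dependency=afterok). On a single-cluster setup,
    sbatch --parsable prints just the job ID.
    """
    lines = ["#!/bin/bash", "set -e"]
    for i, batch_file in enumerate(batch_files):
        dep = f"--dependency=afterok:$job{i - 1} " if i > 0 else ""
        lines.append(f"job{i}=$(sbatch --parsable {dep}{batch_file})")
    lines.append("")
    path = os.path.join(outdir, "run_pipeline.sh")
    with open(path, "w") as f:
        f.write("\n".join(lines))
    os.chmod(path, 0o755)  # make the wrapper executable
    return path
```

Running the generated wrapper (for example, bash run_pipeline.sh) would submit every step at once, with each job held by SLURM until the previous one completes successfully.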

Module contents

Top-level package for Cell Maps Pipeline.