What a provenance file is and how to create one
A provenance file is a JSON-formatted file that describes the sources of the input data fed into the cellmaps_pipeline. It is required to maintain a chain of history for FAIRSCAPE.
Template provenance file:
{
    "name": "Name for pipeline run",
    "organization-name": "Name of lab or group. Ex: Ideker",
    "project-name": "Name of funding source or project",
    "cell-line": "Name of cell line. Ex: U2OS",
    "treatment": "Name of treatment, Ex: untreated",
    "release": "Name of release. Example: 0.1 alpha",
    "gene-set": "Name of gene set. Example chromatin",
    "edgelist": {
        "name": "Name of dataset",
        "author": "Author of dataset",
        "version": "Version of dataset",
        "date-published": "Date dataset was published",
        "description": "Description of dataset",
        "data-format": "Format of data"
    },
    "baitlist": {
        "name": "Name of dataset",
        "author": "Author of dataset",
        "version": "Version of dataset",
        "date-published": "Date dataset was published",
        "description": "Description of dataset",
        "data-format": "Format of data"
    },
    "samples": {
        "name": "Name of dataset",
        "author": "Author of dataset",
        "version": "Version of dataset",
        "date-published": "Date dataset was published",
        "description": "Description of dataset",
        "data-format": "Format of data"
    },
    "unique": {
        "name": "Name of dataset",
        "author": "Author of dataset",
        "version": "Version of dataset",
        "date-published": "Date dataset was published",
        "description": "Description of dataset",
        "data-format": "Format of data"
    }
}
The above template provenance file can be created in a few ways:
By copying the JSON text from the help output of cellmaps_pipelinecmd.py, like so:
cellmaps_pipelinecmd.py -h
Or by writing the JSON directly to a file (provenance.json in the example below) with this command-line invocation:
cellmaps_pipelinecmd.py . --example_provenance > provenance.json
Or, if input datasets are already registered with FAIRSCAPE
TODO
Note
FAIRSCAPE registration documentation is coming soon…
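After editing a provenance file by hand, it can be useful to confirm that it still parses as JSON and still has the keys shown in the template. The following is a minimal sketch of such a check, not part of cellmaps_pipeline: the function names find_missing_keys and load_provenance are hypothetical, and the key lists are copied from the template in this section on the assumption that every key in it is expected. The pipeline itself may accept or require a different set of keys.

```python
import json

# Key names copied from the template provenance file in this section.
# Assumption: every key in the template is expected to be present.
TOP_LEVEL_KEYS = [
    "name", "organization-name", "project-name",
    "cell-line", "treatment", "release", "gene-set",
]
DATASET_SECTIONS = ["edgelist", "baitlist", "samples", "unique"]
DATASET_KEYS = [
    "name", "author", "version",
    "date-published", "description", "data-format",
]


def find_missing_keys(provenance):
    """Return the template keys that are absent from a provenance dict.

    Missing dataset fields are reported as "section.key"; a dataset
    section that is absent or not a JSON object is reported by name.
    """
    missing = [key for key in TOP_LEVEL_KEYS if key not in provenance]
    for section in DATASET_SECTIONS:
        dataset = provenance.get(section)
        if not isinstance(dataset, dict):
            missing.append(section)
            continue
        missing.extend(
            f"{section}.{key}" for key in DATASET_KEYS if key not in dataset
        )
    return missing


def load_provenance(path):
    """Parse a provenance file; json.load raises an error on invalid JSON."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, find_missing_keys(load_provenance("provenance.json")) returns an empty list when every template key is present, and otherwise lists what is missing so the file can be corrected before it is passed to the pipeline.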