Usage
Command Line Interface
PhenoQC provides a command-line interface for batch processing of phenotypic data files.
Basic Usage
Process a single file:
```bash
phenoqc \
  --input examples/samples/sample_data.json \
  --output ./reports/ \
  --schema examples/schemas/pheno_schema.json \
  --config config.yaml \
  --impute mice \
  --unique_identifiers SampleID \
  --phenotype_columns '{"PrimaryPhenotype": ["HPO"], "DiseaseCode": ["DO"]}' \
  --ontologies HPO DO
```
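Quoting `--phenotype_columns` by hand is error-prone when the command is generated from a script. The sketch below assumes PhenoQC is launched as an ordinary subprocess. It uses a hypothetical helper, `build_phenoqc_command` (not part of PhenoQC), to assemble the same arguments as the single-file example. `json.dumps` produces the JSON object, and `shlex.join` shows the equivalent shell line.

```python
import json
import shlex


def build_phenoqc_command(inputs, output, schema, config, impute,
                          unique_ids, phenotype_columns, ontologies):
    """Assemble a phenoqc argument list (hypothetical helper, not part of PhenoQC)."""
    cmd = ["phenoqc", "--input", *inputs,
           "--output", output,
           "--schema", schema,
           "--config", config,
           "--impute", impute,
           "--unique_identifiers", *unique_ids]
    if phenotype_columns:
        # json.dumps emits the double-quoted JSON object the flag expects
        cmd += ["--phenotype_columns", json.dumps(phenotype_columns)]
    cmd += ["--ontologies", *ontologies]
    return cmd


cmd = build_phenoqc_command(
    inputs=["examples/samples/sample_data.json"],
    output="./reports/",
    schema="examples/schemas/pheno_schema.json",
    config="config.yaml",
    impute="mice",
    unique_ids=["SampleID"],
    phenotype_columns={"PrimaryPhenotype": ["HPO"], "DiseaseCode": ["DO"]},
    ontologies=["HPO", "DO"],
)
# shlex.join single-quotes the JSON argument, mirroring the shell example
print(shlex.join(cmd))
```

You can pass the list directly to `subprocess.run(cmd, check=True)` to avoid shell quoting entirely, because each element reaches `phenoqc` as one argument.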
Batch Processing
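Process multiple files in one run. Each ID passed to --ontologies is expected to be defined under ontologies in config.yaml. The example configuration below defines only HPO and DO, so MPO needs its own entry there for this command: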
```bash
phenoqc \
  --input examples/samples/sample_data.csv examples/samples/sample_data.json \
  --output ./reports/ \
  --schema examples/schemas/pheno_schema.json \
  --config config.yaml \
  --impute none \
  --unique_identifiers SampleID \
  --ontologies HPO DO MPO
```
Parameters
--input: One or more data files or directories (.csv, .tsv, .json, .zip)
--output: Directory for saving processed data and reports
--schema: Path to the JSON schema used for data validation
--config: YAML configuration file defining ontologies and settings
--custom_mappings: Path to a custom term-mapping JSON file (optional)
--impute: Strategy for missing data (mean, median, mode, knn, mice, svd, none)
--unique_identifiers: One or more columns that uniquely identify each record
--phenotype_columns: JSON object mapping each phenotype column to a list of ontology IDs
--ontologies: Space-separated list of ontology IDs (for example HPO DO)
--recursive: Scan input directories recursively
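--input accepts directories as well as files, and --recursive controls whether subdirectories are searched. The following sketch is only a rough illustration of that discovery step, written with `pathlib`. The helper name `collect_input_files` and the rule of matching purely on file extension are assumptions, not PhenoQC's code. ZIP archives are listed here, not extracted.

```python
import tempfile
from pathlib import Path

# Extensions listed for --input; matching on suffix is an assumption of this sketch
SUPPORTED_EXTENSIONS = {".csv", ".tsv", ".json", ".zip"}


def collect_input_files(paths, recursive=False):
    """Expand files and directories into a sorted list of supported data files.

    Hypothetical helper approximating --input/--recursive; ZIP archives are
    listed, not extracted.
    """
    found = []
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # rglob descends into subdirectories; glob lists only the top level
            candidates = path.rglob("*") if recursive else path.glob("*")
            found.extend(p for p in candidates
                         if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS)
        elif path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            found.append(path)
    return sorted(found)


# Demo on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    for name in ["a.csv", "notes.txt", "nested/b.json"]:
        (root / name).write_text("")
    top_only = [p.name for p in collect_input_files([root])]
    with_subdirs = [p.name for p in collect_input_files([root], recursive=True)]

print(top_only, with_subdirs)  # ['a.csv'] ['a.csv', 'b.json']
```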
Graphical User Interface
Launch the GUI:
```bash
python run_gui.py
```
The GUI provides an interactive interface for:
Uploading configuration and schema files
Uploading data files
Selecting unique identifiers and ontologies
Choosing missing data strategies
Running QC and viewing results
Configuration
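PhenoQC reads its settings from a YAML configuration file. These settings cover ontology sources, the default ontologies, the fuzzy-matching threshold, cache expiry, and per-column imputation strategies. Example config.yaml: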
```yaml
ontologies:
  HPO:
    name: Human Phenotype Ontology
    source: url
    url: http://purl.obolibrary.org/obo/hp.obo
    format: obo
  DO:
    name: Disease Ontology
    source: url
    url: http://purl.obolibrary.org/obo/doid.obo
    format: obo
default_ontologies:
  - HPO
  - DO
fuzzy_threshold: 80
cache_expiry_days: 30
imputation_strategies:
  Age: mean
  Gender: mode
  Height: median
```
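fuzzy_threshold sets how similar a free-text term must be to an ontology label before it counts as a match. The sketch below assumes a 0 to 100 similarity scale, which fits the value 80 but is not confirmed here. It uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer, and PhenoQC's own scorer may differ. The two labels are hand-written examples, not terms loaded from hp.obo.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 80  # value of fuzzy_threshold in the example config.yaml


def best_match(term, labels, threshold=FUZZY_THRESHOLD):
    """Return (label, score) for the best label scoring >= threshold, else None.

    SequenceMatcher.ratio() (0.0 to 1.0) is scaled to 0-100 as a stand-in scorer.
    """
    term_norm = term.strip().lower()
    best = None
    for label in labels:
        score = round(SequenceMatcher(None, term_norm, label.lower()).ratio() * 100)
        if score >= threshold and (best is None or score > best[1]):
            best = (label, score)
    return best


# Hand-written example labels, not loaded from hp.obo
labels = {"Seizure": "HP:0001250", "Microcephaly": "HP:0000252"}

match = best_match("Siezure", labels)  # misspelled input term
print(match, labels[match[0]] if match else None)
print(best_match("Tall stature", labels))  # no label reaches the threshold
```

Raising the threshold produces fewer false matches but leaves more terms unmapped.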
Output
PhenoQC generates:
Validated and processed data files
Quality control reports (PDF/Markdown)
Visual summaries of data quality
Detailed logs of the QC process
Troubleshooting
Common issues:
Ontology Mapping Failures: Check that config.yaml points to valid ontology URLs
Missing Required Columns: Ensure the specified columns exist in the dataset
Imputation Errors: Verify that each column's data type suits its imputation strategy (mean and median require numeric values)
Logs: Check phenoqc_*.log for detailed error messages
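You can often catch the missing-column and data-type problems listed above before a run. The following stdlib-only sketch uses hypothetical `preflight` and `impute` helpers and a tiny made-up CSV, with the per-column strategies from the example config.yaml. It covers only mean, median, and mode, skips knn, mice, and svd, and is not PhenoQC's implementation.

```python
import csv
import io
import statistics

# Per-column strategies copied from the example config.yaml
IMPUTATION_STRATEGIES = {"Age": "mean", "Gender": "mode", "Height": "median"}
NUMERIC_STRATEGIES = {"mean", "median"}


def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def preflight(rows, required, strategies):
    """Return a list of problems to fix before running QC (hypothetical helper)."""
    header = set(rows[0]) if rows else set()
    problems = [f"missing required column: {col}" for col in required if col not in header]
    for col, strategy in strategies.items():
        if col not in header:
            problems.append(f"imputation target not in data: {col}")
        elif strategy in NUMERIC_STRATEGIES:
            # mean/median need numbers; collect any non-empty non-numeric cells
            bad = [row[col] for row in rows if row[col].strip() and not is_number(row[col])]
            if bad:
                problems.append(f"{col} uses {strategy} but has non-numeric values {bad}")
    return problems


def impute(rows, strategies):
    """Fill empty cells in place with the column mean, median, or mode."""
    for col, strategy in strategies.items():
        present = [row[col] for row in rows if row[col].strip()]
        if not present:
            continue
        if strategy == "mode":
            fill = statistics.mode(present)
        elif strategy in NUMERIC_STRATEGIES:
            numbers = [float(v) for v in present]
            func = statistics.mean if strategy == "mean" else statistics.median
            fill = str(func(numbers))
        else:
            continue  # knn, mice, svd, and none are outside this sketch
        for row in rows:
            if not row[col].strip():
                row[col] = fill


DATA = """SampleID,Age,Gender,Height
S1,30,F,160
S2,,M,
S3,40,F,180
"""
rows = list(csv.DictReader(io.StringIO(DATA)))
issues = preflight(rows, ["SampleID"], IMPUTATION_STRATEGIES)
if not issues:
    impute(rows, IMPUTATION_STRATEGIES)
print(issues, rows[1])
```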