10 changes: 7 additions & 3 deletions AGENTS.md
@@ -5,6 +5,11 @@
Keep first-party production Rust source files at or below 500 lines. This applies
to files under `rust/bioscript-*/src/**/*.rs`.

When editing BioScript Rust, prefer adding behavior to a small, named module
whose filename describes the responsibility. If a file is approaching 500 lines,
split it along a real domain boundary before adding more code. Do not satisfy
the guard by creating arbitrary numbered chunks or `*_part_*` files.
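
The 500-line limit can be checked mechanically. The following is a minimal sketch of such a check, not the repository's actual source-size guard: the function name `oversized_sources` is hypothetical, and the sketch does not handle the exemptions listed in this file (test modules and similar), which a real guard would need to skip.

```python
from pathlib import Path

MAX_LINES = 500  # limit stated in AGENTS.md


def oversized_sources(repo_root, max_lines=MAX_LINES):
    """Return (relative_path, line_count) for Rust files over the limit.

    Hypothetical helper: scans rust/bioscript-*/src/**/*.rs and counts
    physical lines. Exempt files are NOT filtered out here.
    """
    root = Path(repo_root)
    offenders = []
    for path in sorted(root.glob("rust/bioscript-*/src/**/*.rs")):
        with path.open(encoding="utf-8") as handle:
            count = sum(1 for _ in handle)
        if count > max_lines:
            offenders.append((path.relative_to(root).as_posix(), count))
    return offenders
```

Calling `oversized_sources(".")` from the repository root would list candidate files to split along a domain boundary.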

The 500-line rule does not apply to:

- integration tests and unit-test modules
@@ -16,6 +21,5 @@ production limit measures production code, not test scaffolding. Test files
should still be split when they mix unrelated behavior or become hard to scan.

When a production file grows past 500 lines, split it before adding more
-behavior. Temporary exceptions must be listed in this file under
-`Current Refactor Backlog`; the source-size guard reads that list and fails when
-it drifts from the code.
+behavior. Keep the include list in the parent file short and logical, and leave
+file names meaningful enough that future agents can find the right place to edit.
138 changes: 138 additions & 0 deletions docs/assay-schema.md
@@ -0,0 +1,138 @@
# Assay Schema

Use an assay when a named test observes one or more variants and emits custom derived report fields.

An assay is different from a panel: a panel is a collection of mostly independent observations, while an assay has its own interpretation logic. APOL1 is an assay because it observes G1/G2 sites and reports one derived APOL1 status.

## Schema Identity

```yaml
schema: "bioscript:assay:1.0"
version: "1.0"
```

## Minimal Shape

```yaml
schema: "bioscript:assay:1.0"
version: "1.0"
name: "APOL1"
label: "APOL1 Risk Assay"
tags:
  - "type:risk"
  - "gene:APOL1"

members:
  - kind: "variant"
    path: "g1-site-1.yaml"
    version: "1.0"
  - kind: "variant"
    path: "g1-site-2.yaml"
    version: "1.0"
  - kind: "variant"
    path: "g2-site.yaml"
    version: "1.0"

analyses:
  - id: "apol1_status"
    kind: "bioscript"
    path: "apol1.py"
    output_format: "tsv"
    label: "APOL1 risk genotype"
    derived_from:
      - "g1-site-1.yaml"
      - "g1-site-2.yaml"
      - "g2-site.yaml"
    emits:
      - key: "apol1_status"
        label: "APOL1 status"
        value_type: "string"
        format: "badge"
    logic:
      source:
        name: "Example derivation source"
        url: "https://example.org/assay-logic"
      description: >
        Optional human-readable description of the derivation logic implemented by the analysis script.
```

## Members

Assay members are currently local variant YAML files:

```yaml
- kind: "variant"
  path: "g1-site-1.yaml"
  version: "1.0"
```

Rules:

- `kind` is required and currently must be `variant`
- `path` is required
- `version` is recommended
- keep variant identity, coordinates, alleles, findings, and provenance in the variant YAML files
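
As an illustration, the member rules above could be checked like this. This is a sketch only: `check_assay_member` is a hypothetical helper, not part of BioScript, and it assumes the member has already been parsed from YAML into a plain dict.

```python
def check_assay_member(member):
    """Check one assay `members` entry (a dict parsed from YAML).

    Hypothetical helper. Returns (errors, warnings): errors for the
    required fields, warnings for the recommended ones.
    """
    errors = []
    warnings = []
    if member.get("kind") != "variant":
        errors.append("kind is required and must be 'variant'")
    if not member.get("path"):
        errors.append("path is required")
    if not member.get("version"):
        warnings.append("version is recommended")
    return errors, warnings
```

A member with all three keys returns two empty lists; a member missing `version` still passes but carries a warning.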

## Analyses

Use `analyses` for custom output derived from the member variants. The older `interpretations` key is accepted for compatibility, but new manifests should use `analyses`.

Rules:

- `id`, `kind`, `path`, and `derived_from` are required
- `kind` is currently `bioscript`
- `path` points to a BioScript-compatible Python file
- `output_format` is optional and defaults to `tsv`; use `json` or `jsonl` when the script writes structured JSON output
- `derived_from` lists the variant YAML files used by the analysis
- `emits` is optional but recommended so report generators know which output columns to display and how to label them
- `logic` is optional; use `logic.description` and `logic.source.url` to document where the script's derivation rules came from
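
The required keys, the `tsv` default, and the legacy `interpretations` key can be handled in a few lines. The sketch below assumes the manifest is already a parsed dict; `declared_analyses` and `normalize_analysis` are hypothetical names, not BioScript APIs.

```python
REQUIRED_ANALYSIS_KEYS = ("id", "kind", "path", "derived_from")


def declared_analyses(manifest):
    """Return the analysis entries, preferring `analyses` over the
    older `interpretations` key accepted for compatibility."""
    return manifest.get("analyses") or manifest.get("interpretations") or []


def normalize_analysis(entry):
    """Validate one analysis entry and fill in the documented default."""
    missing = [key for key in REQUIRED_ANALYSIS_KEYS if not entry.get(key)]
    if missing:
        raise ValueError(f"analysis is missing required keys: {missing}")
    if entry["kind"] != "bioscript":
        raise ValueError(f"unsupported analysis kind: {entry['kind']!r}")
    normalized = dict(entry)  # leave the caller's dict untouched
    normalized.setdefault("output_format", "tsv")
    return normalized
```

Returning a copy keeps the parsed manifest unchanged, so the same manifest can be re-validated or serialized as written.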

## Findings

Use `findings` for evidence that binds either to a variant observation or an emitted analysis value. Keep the executable logic in `analyses`; keep PGx evidence and reporting semantics in YAML.

```yaml
findings:
  - schema: "bioscript:pgx-label:1.0"
    id: "clinpgx_PA166313401"
    label: "ClinPGx drug label annotation PA166313401"
    authority_type: "regulatory_label"
    binding:
      source: "analysis"
      analysis_id: "apoe_epsilon"
      key: "apoe_status"
      operator: "equals"
      value: "e4/e4"
    drugs:
      - name: "lecanemab"
        aliases:
          - "LEQEMBI"
    evidence:
      source: "ClinPGx"
      kind: "label_annotation"
      id: "PA166313401"
      url: "https://www.clinpgx.org/labelAnnotation/PA166313401"
    notes: "Drug label annotation applies when APOE status is e4/e4."
```

Binding rules:

- `source` is `analysis` or `variant`
- `analysis` bindings require `analysis_id`, `key`, and either `operator: equals` with `value` or `operator: in` with `values`
- `variant` bindings require `variant` or `path`, `key`, and either `equals`/`value` or `in`/`values`
- PGx label findings use `schema: "bioscript:pgx-label:1.0"` and should include `regulatory_sources`, `pgx_action_level` or `prescribing_actions` when known
- PGx summary findings use `schema: "bioscript:pgx-summary:1.0"` and should include `evidence_level`, `phenotype_categories`, and genotype-specific `effects` when known
- PGx findings should include `drugs` and should link to the exact ClinPGx/PharmGKB/ClinVar evidence page
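
To make the `equals` and `in` operators concrete, here is one way an analysis-bound finding could be matched against emitted values. It is a sketch under assumptions: `analysis_binding_matches` is a hypothetical helper, and `emitted_by_analysis` (a dict of analysis id to emitted key/value pairs) is an assumed in-memory shape, not a documented BioScript structure.

```python
def analysis_binding_matches(binding, emitted_by_analysis):
    """Return True when an analysis-bound finding applies.

    binding: the finding's `binding` mapping with `source: analysis`.
    emitted_by_analysis: assumed shape {analysis_id: {key: value}}.
    """
    if binding.get("source") != "analysis":
        raise ValueError("this sketch only handles analysis bindings")
    emitted = emitted_by_analysis.get(binding["analysis_id"], {})
    actual = emitted.get(binding["key"])
    operator = binding.get("operator")
    if operator == "equals":
        return actual == binding["value"]
    if operator == "in":
        return actual in binding["values"]
    raise ValueError(f"unsupported operator: {operator!r}")
```

With the example finding above, an emitted `apoe_status` of `e4/e4` from the `apoe_epsilon` analysis matches, while any other value, or a missing analysis, does not.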

## Inclusion In Panels

A larger panel may include an assay as a member:

```yaml
members:
  - kind: "assay"
    path: "../risk/APOL1/assay.yaml"
    version: "1.0"
```

When a panel includes an assay, the assay's variant observations can be expanded into the panel output, and report tooling can also run the assay's analyses and include their emitted fields.
66 changes: 66 additions & 0 deletions docs/assay-schema.yaml
@@ -0,0 +1,66 @@
schema: "bioscript:assay:1.0"
version: "1.0"
name: "APOL1"
label: "APOL1 Risk Assay"
summary: "APOL1 assay that observes G1 and G2 sites and emits the derived APOL1 risk genotype."
tags:
  - "type:risk"
  - "gene:APOL1"

members:
  - kind: "variant"
    path: "g1-site-1.yaml"
    version: "1.0"
  - kind: "variant"
    path: "g1-site-2.yaml"
    version: "1.0"
  - kind: "variant"
    path: "g2-site.yaml"
    version: "1.0"

analyses:
  - id: "apol1_status"
    kind: "bioscript"
    path: "apol1.py"
    output_format: "tsv"
    label: "APOL1 risk genotype"
    derived_from:
      - "g1-site-1.yaml"
      - "g1-site-2.yaml"
      - "g2-site.yaml"
    emits:
      - key: "apol1_status"
        label: "APOL1 status"
        value_type: "string"
        format: "badge"
    logic:
      source:
        name: "Example derivation source"
        url: "https://example.org/assay-logic"
      description: >
        Optional human-readable description of the derivation logic implemented by the analysis script.

findings:
  - schema: "bioscript:pgx-label:1.0"
    id: "example_analysis_bound_pgx_finding"
    label: "Example analysis-bound PGx finding"
    authority_type: "regulatory_label"
    binding:
      source: "analysis"
      analysis_id: "apol1_status"
      key: "apol1_status"
      operator: "equals"
      value: "G2/G2"
    drugs:
      - name: "example drug"
        aliases:
          - "example brand"
    regulatory_sources:
      - "FDA"
    pgx_action_level: "Actionable PGx"
    evidence:
      source: "ClinPGx"
      kind: "label_annotation"
      id: "PA..."
      url: "https://www.clinpgx.org/labelAnnotation/PA..."
    notes: "Findings can bind to emitted analysis keys using equals or in."
69 changes: 65 additions & 4 deletions docs/panel-schema.md
@@ -1,8 +1,8 @@
# Panel Schema

-Use a panel when you want one manifest that points to a curated set of runnable variant records.
+Use a panel when you want one manifest that points to a curated set of runnable variant records, assay manifests, and optional interpretation scripts derived from those records.

-Right now the Rust runner supports variant members directly. Keep the shape simple.
+The Rust runner supports variant members directly. Test tooling can also run declared interpretation scripts and add their emitted fields to the generated report.

## Schema Identity

@@ -25,9 +25,26 @@ members:
  - kind: "variant"
    path: "variants/rs671.yaml"
    version: "1.0"
  - kind: "assay"
    path: "../risk/APOL1/assay.yaml"
    version: "1.0"
  - kind: "variant"
    path: "variants/rs713598.yaml"
    version: "1.0"

analyses:
  - id: "taste_status"
    kind: "bioscript"
    path: "interpretations/taste.py"
    output_format: "tsv"
    label: "Taste status"
    derived_from:
      - "variants/rs713598.yaml"
    emits:
      - key: "taste_status"
        label: "Taste status"
        value_type: "string"
        format: "badge"
```

## Purpose
@@ -37,30 +54,74 @@ A panel is:
- a selection manifest
- a stable name for a bundle of variants
- something the Rust `bioscript` command can run directly
- a way to include smaller assay manifests in a broader bundle
- a place to declare interpretation chunks that derive custom report fields from member variants

It is not:

- a full remote package manager
- a replacement for richer assay manifests
- a place to hide variant metadata inside Python when YAML can describe it

## Members

-Each member must currently be:
+Each member must currently be a local variant or assay:

```yaml
- kind: "variant"
  path: "variants/rs671.yaml"
  version: "1.0"
- kind: "assay"
  path: "../risk/APOL1/assay.yaml"
  version: "1.0"
```

Rules:

- `kind` is required
- exactly one of `path` or `download` is required
-- current runner support is `variant` members only
+- current runner support is local `variant` and `assay` members
- `version` is recommended for local members
- `sha256` is optional for local members
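
The "exactly one of `path` or `download`" rule and the current local-only runner support can be expressed as a small check. This is a sketch, not the Rust runner's logic: `check_panel_member` is a hypothetical helper operating on a member dict parsed from YAML.

```python
RUNNABLE_KINDS = ("variant", "assay")  # local kinds the doc lists as supported


def check_panel_member(member):
    """Return (errors, runnable) for one panel `members` entry.

    Hypothetical helper. `runnable` is True only for a valid local
    `variant` or `assay` member, mirroring the rules listed above.
    """
    errors = []
    if not member.get("kind"):
        errors.append("kind is required")
    has_path = bool(member.get("path"))
    has_download = bool(member.get("download"))
    if has_path == has_download:  # both present or both absent
        errors.append("exactly one of path or download is required")
    runnable = not errors and has_path and member["kind"] in RUNNABLE_KINDS
    return errors, runnable
```

A member that declares `download` instead of `path` is valid under these rules but is reported as not runnable, since only local members are currently executed.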

## Analyses

Use `analyses` when a panel needs custom derived output that is not the same thing as a single variant observation. Examples include APOE epsilon genotype from rs429358/rs7412 or APOL1 G0/G1/G2 status from three sites. The older `interpretations` key is accepted for compatibility, but new manifests should use `analyses`.

```yaml
analyses:
  - id: "apoe_epsilon"
    kind: "bioscript"
    path: "variants/APOE/apoe.py"
    output_format: "tsv"
    label: "APOE epsilon genotype"
    derived_from:
      - "variants/APOE/rs429358.yaml"
      - "variants/APOE/rs7412.yaml"
    emits:
      - key: "apoe_status"
        label: "APOE status"
        value_type: "string"
        format: "badge"
    logic:
      source:
        name: "ClinPGx / PharmGKB"
        url: "https://www.clinpgx.org/variant/PA166155341/overview"
      description: >
        Optional human-readable description of the derivation logic implemented by the analysis script.
```

Rules:

- `id`, `kind`, `path`, and `derived_from` are required
- `kind` is currently `bioscript`
- `path` points to a BioScript-compatible Python file
- `output_format` is optional and defaults to `tsv`; use `json` or `jsonl` when the script writes structured JSON output
- `derived_from` lists the variant YAML files used by the analysis
- `emits` is optional but recommended so report generators know which output columns to display and how to label them
- `logic` is optional; use `logic.description` and `logic.source.url` to document where the script's derivation rules came from
- keep variant identity, coordinates, alleles, findings, and provenance in YAML; keep cross-variant logic in the analysis script

## Permissions And Downloads

Panels may declare remote downloads up front even if the current runner only executes local members.
23 changes: 23 additions & 0 deletions docs/panel-schema.yaml
@@ -9,6 +9,29 @@ members:
  - kind: "variant"
    path: "variants/rs671.yaml"
    version: "1.0"
  - kind: "assay"
    path: "../risk/APOL1/assay.yaml"
    version: "1.0"
  - kind: "variant"
    path: "variants/rs713598.yaml"
    version: "1.0"

analyses:
  - id: "taste_status"
    kind: "bioscript"
    path: "interpretations/taste.py"
    output_format: "tsv"
    label: "Taste status"
    derived_from:
      - "variants/rs713598.yaml"
    emits:
      - key: "taste_status"
        label: "Taste status"
        value_type: "string"
        format: "badge"
    logic:
      source:
        name: "Example derivation source"
        url: "https://example.org/panel-analysis-logic"
      description: >
        Optional human-readable description of the derivation logic implemented by the analysis script.