Skip to content

Use NEMAR as the default dataset download source#1085

Merged
bruAristimunha merged 19 commits into
developfrom
copilot/update-moabb-nemar-py
Jun 24, 2026
Merged

Use NEMAR as the default dataset download source#1085
bruAristimunha merged 19 commits into
developfrom
copilot/update-moabb-nemar-py

Conversation

Copilot AI commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

MOABB datasets need to download from re-hosted NEMAR datasets by default where an equivalent NEMAR ID exists, while preserving each dataset’s existing downloader as a fallback.

  • NEMAR integration

    • Adds nemar-py as a runtime dependency.
    • Assigns NEMAR IDs directly on each dataset class via nemar_id.
    • Adds a nemar_dl() helper that downloads into MOABB’s existing dataset cache layout.
  • Download behavior

    • BaseDataset.download() now attempts NEMAR first when dataset.nemar_id is available.
    • Existing data_path() downloaders remain the fallback path on NEMAR download failure.
    • Subject IDs are formatted for BIDS/NEMAR using nemar_subject_template.
  • Coverage

    • Adds non-CI download-equivalence coverage gated behind --dl-data.
    • Adds CI-safe coverage ensuring all dataset classes expose nemar_id.
    • Adds unit coverage for NEMAR IDs, NEMAR-first behavior, and fallback behavior.

Example:

from moabb.datasets import BNCI2014_001

dataset = BNCI2014_001()
print(dataset.nemar_id)  # "nm000139"

dataset.download(subject_list=[1])  # tries NEMAR first, then existing fallback

@bruAristimunha

Copy link
Copy Markdown
Collaborator

Each nemar ID should be in each dataset class, rather than centralized in moabb/datasets/nemar.py. And one test that should be executed in the CI should ensure that all datasets have the nemar_id.

Copilot AI changed the title [WIP] Update moabb to use nemar-py for dataset downloads Use NEMAR as the default dataset download source Jun 18, 2026
Copilot AI requested a review from bruAristimunha June 18, 2026 12:20
@bruAristimunha

Copy link
Copy Markdown
Collaborator

please address my comments and the pre-commit @copilot

Copilot AI commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

please address my comments and the pre-commit @copilot

Addressed in fc908ee: NEMAR IDs now live on dataset classes, the central mapping module was removed, CI-safe nemar_id coverage was added, and pre-commit now passes.

Set nemar_id (plus subject templates / BIDS filters where needed) on each
dataset class and load subjects from NEMAR via mne-bids with session/run
normalization. IDs verified against the live NEMAR catalog
(https://data.nemar.org). Updates download.py helper and tests accordingly.
@bruAristimunha bruAristimunha marked this pull request as ready for review June 20, 2026 17:59

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8effc0d0be

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread moabb/datasets/download.py Outdated
Comment on lines +270 to +272
if force_update and target_dir.exists():
try:
shutil.rmtree(target_dir)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid deleting earlier subjects during forced NEMAR downloads

When BaseDataset.download() is called with multiple subjects and force_update=True, it invokes nemar_dl() once per subject, so this block removes the whole NEMAR dataset directory before every subject download. The first subject is downloaded, then the second iteration deletes it, and the final cache contains only the last subject instead of the requested list; this breaks the documented “download all data” path whenever users force-refresh a multi-subject dataset.

Useful? React with 👍 / 👎.

Comment thread moabb/tests/test_datasets.py
Keep each NEMAR id as a single class attribute line instead of an
attribute plus a trailing blank line; consolidate PhysionetMI's
nemar_subject_template next to its nemar_id. No id values changed.
Add nemar_id to the 18 datasets still missing one, with ids verified
against the NEMAR catalog (nemar-cli --mine), including the private
deposits Beetl2021_B (nm000274), Lee2019_SSVEP (nm000273) and
Mainsah2025_A (nm000269), plus Nieto2022 (OpenNeuro ds003626). Exempt
the test fixtures and Schrag2026Pediatric (no NEMAR deposit) in
test_all_datasets_have_valid_nemar_id.
@PierreGtch

Copy link
Copy Markdown
Collaborator

Each nemar ID should be in each dataset class, rather than centralized in moabb/datasets/nemar.py. And one test that should be executed in the CI should ensure that all datasets have the nemar_id.

What should be the workflow when adding a new dataset to MOABB?
As I understand, you need to first create and merge the MOABB PR to trigger the nemar dataset creation and get a nemar id.
So the MOABB PR would have to be merged with a red test, no ?

For datasets whose NEMAR deposit is assigned but not yet public, keep
the id in a commented-out `# nemar_id = "..."` line (so it is ready to
enable on publication) and list them in NEMAR_ID_PENDING. These skip
test_all_datasets_have_valid_nemar_id, alongside the no-deposit exempt
datasets. Nieto2022 (public OpenNeuro ds003626) stays active.
Lock in that download() skips NEMAR entirely and calls the dataset's
own data_path when nemar_id is None (e.g. the pending, commented-out
deposits).
Add a "Publishing your dataset on NEMAR" section to the adding-a-dataset
tutorial: the nemar-cli upload/publish workflow, wiring nemar_id (and
nemar_subject_template), and the data_path fallback when nemar_id is unset
or a NEMAR download fails.

@PierreGtch PierreGtch left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I made 3 comments, feel free to ignore if you don't agree

Comment thread examples/tutorials/tutorial_4_adding_a_dataset.py
Comment thread examples/tutorials/tutorial_4_adding_a_dataset.py Outdated
Comment thread moabb/datasets/base.py

_summary_table: dict[str, Any]
nemar_id: str | None = None
nemar_subject_template: str | None = "{subject}"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nemar_subject_template is not flexible enough. In BIDS, the subject label can be any string: https://bids-specification.readthedocs.io/en/stable/glossary.html#subject-entities
Do we really need nemar_subject_template? Can't we so something smarter to automatically infer the mapping?
The simplest solution would be if subjects in moabb were also strings and not ints.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have some details that needs this, like 001, 01, 1... this is way i went to string, but we can do better.... maybe modify at the nemar-py... thinking

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and what about using strings as subject ID in MOABB too?
This way, we don’t need to convert anything. Tt would be simpler and more aligned with BIDS

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i would prefer to address this later.

bruAristimunha and others added 2 commits June 22, 2026 14:42
Co-authored-by: Pierre Guetschel <25532709+PierreGtch@users.noreply.github.com>
Signed-off-by: Bru <b.aristimunha@gmail.com>
nemar_dl removed the whole shared BIDS root on force_update, but
download() calls it once per subject, so refreshing a multi-subject
dataset wiped every subject except the last. Rely on
trust_existing=not force_update to re-fetch the requested subject and
drop the rmtree (and the now-unused shutil import).

Addresses Codex review comment on PR #1085.
@bruAristimunha bruAristimunha merged commit 4a259f5 into develop Jun 24, 2026
14 checks passed
@bruAristimunha bruAristimunha deleted the copilot/update-moabb-nemar-py branch June 24, 2026 11:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants