Use NEMAR as the default dataset download source#1085
Conversation
|
Each nemar ID should be in each dataset class, rather than centralized in moabb/datasets/nemar.py. And one test that should be executed in the CI should ensure that all datasets have the nemar_id. |
|
please address my comments and the pre-commit @copilot |
Addressed in |
Set nemar_id (plus subject templates / BIDS filters where needed) on each dataset class and load subjects from NEMAR via mne-bids with session/run normalization. IDs verified against the live NEMAR catalog (https://data.nemar.org). Updates download.py helper and tests accordingly.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8effc0d0be
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if force_update and target_dir.exists(): | ||
| try: | ||
| shutil.rmtree(target_dir) |
There was a problem hiding this comment.
Avoid deleting earlier subjects during forced NEMAR downloads
When BaseDataset.download() is called with multiple subjects and force_update=True, it invokes nemar_dl() once per subject, so this block removes the whole NEMAR dataset directory before every subject download. The first subject is downloaded, then the second iteration deletes it, and the final cache contains only the last subject instead of the requested list; this breaks the documented “download all data” path whenever users force-refresh a multi-subject dataset.
Useful? React with 👍 / 👎.
Keep each NEMAR id as a single class attribute line instead of an attribute plus a trailing blank line; consolidate PhysionetMI's nemar_subject_template next to its nemar_id. No id values changed.
Add nemar_id to the 18 datasets still missing one, with ids verified against the NEMAR catalog (nemar-cli --mine), including the private deposits Beetl2021_B (nm000274), Lee2019_SSVEP (nm000273) and Mainsah2025_A (nm000269), plus Nieto2022 (OpenNeuro ds003626). Exempt the test fixtures and Schrag2026Pediatric (no NEMAR deposit) in test_all_datasets_have_valid_nemar_id.
What should be the workflow when adding a new dataset to MOABB? |
For datasets whose NEMAR deposit is assigned but not yet public, keep the id in a commented-out `# nemar_id = "..."` line (so it is ready to enable on publication) and list them in NEMAR_ID_PENDING. These skip test_all_datasets_have_valid_nemar_id, alongside the no-deposit exempt datasets. Nieto2022 (public OpenNeuro ds003626) stays active.
Lock in that download() skips NEMAR entirely and calls the dataset's own data_path when nemar_id is None (e.g. the pending, commented-out deposits).
Add a "Publishing your dataset on NEMAR" section to the adding-a-dataset tutorial: the nemar-cli upload/publish workflow, wiring nemar_id (and nemar_subject_template), and the data_path fallback when nemar_id is unset or a NEMAR download fails.
PierreGtch
left a comment
There was a problem hiding this comment.
LGTM. I made 3 comments, feel free to ignore if you don't agree
|
|
||
| _summary_table: dict[str, Any] | ||
| nemar_id: str | None = None | ||
| nemar_subject_template: str | None = "{subject}" |
There was a problem hiding this comment.
nemar_subject_template is not flexible enough. In BIDS, the subject label can be any string: https://bids-specification.readthedocs.io/en/stable/glossary.html#subject-entities
Do we really need nemar_subject_template? Can't we so something smarter to automatically infer the mapping?
The simplest solution would be if subjects in moabb were also strings and not ints.
There was a problem hiding this comment.
I think we have some details that needs this, like 001, 01, 1... this is way i went to string, but we can do better.... maybe modify at the nemar-py... thinking
There was a problem hiding this comment.
and what about using strings as subject ID in MOABB too?
This way, we don’t need to convert anything. Tt would be simpler and more aligned with BIDS
There was a problem hiding this comment.
i think i would prefer to address this later.
Co-authored-by: Pierre Guetschel <25532709+PierreGtch@users.noreply.github.com> Signed-off-by: Bru <b.aristimunha@gmail.com>
nemar_dl removed the whole shared BIDS root on force_update, but download() calls it once per subject, so refreshing a multi-subject dataset wiped every subject except the last. Rely on trust_existing=not force_update to re-fetch the requested subject and drop the rmtree (and the now-unused shutil import). Addresses Codex review comment on PR #1085.
MOABB datasets need to download from re-hosted NEMAR datasets by default where an equivalent NEMAR ID exists, while preserving each dataset’s existing downloader as a fallback.
NEMAR integration
nemar-pyas a runtime dependency.nemar_id.nemar_dl()helper that downloads into MOABB’s existing dataset cache layout.Download behavior
BaseDataset.download()now attempts NEMAR first whendataset.nemar_idis available.data_path()downloaders remain the fallback path on NEMAR download failure.nemar_subject_template.Coverage
--dl-data.nemar_id.Example: