Consolidate lm-eval scripts: merge AnyModel auto-detection into lm_eval_hf.py #1206
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (5)
💤 Files with no reviewable changes (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough

Consolidates Puzzletron AnyModel support into the main HuggingFace lm-eval wrapper: removes the Puzzletron-specific entrypoint and adds conditional Puzzletron patching inside `lm_eval_hf.py`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant LM_HF as lm_eval_hf.py
    participant Puzzletron
    participant HF as HuggingFace Loader
    User->>LM_HF: call create_from_arg_obj(arg_dict) / create_from_arg_string(arg_string)
    LM_HF->>LM_HF: extract pretrained, trust_remote_code
    LM_HF->>Puzzletron: resolve_descriptor_from_pretrained(pretrained)
    alt descriptor found
        Puzzletron-->>LM_HF: descriptor
        LM_HF->>LM_HF: enter deci_x_patcher context
        LM_HF->>HF: load model (patched)
        HF-->>LM_HF: patched model instance
        LM_HF->>LM_HF: exit patcher context
    else descriptor not found or Puzzletron unavailable
        Puzzletron-->>LM_HF: error / none
        LM_HF->>HF: load model normally
        HF-->>LM_HF: standard model instance
    end
    LM_HF-->>User: return configured model
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed
Comment
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/llm_eval/lm_eval_hf.py (1)
142-155: Feature gap: `create_from_arg_string` lacks quantization/sparsity support.

Unlike `create_from_arg_obj` (which applies `quantize_model` and `sparsify_model`), this method only enables HuggingFace checkpointing and Puzzletron patching. If users invoke `lm_eval` through its standard CLI (bypassing this script's `__main__`), quantization arguments would be silently ignored.

If this is intentional (the string-based path is for simpler use cases), consider adding a docstring note. Otherwise, consider extracting the quantization/sparsity logic into a shared helper.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_eval/lm_eval_hf.py` around lines 142 - 155, create_from_arg_string currently only enables HF checkpointing and Puzzletron patching but ignores quantization/sparsity steps applied by create_from_arg_obj; update create_from_arg_string to call the same quantize_model and sparsify_model logic (or extract that logic into a shared helper used by both) after constructing the model but before returning it so that quantization/sparsity CLI args in args (e.g., any quantize_* or sparsify_* flags) are honored; reference create_from_arg_string, create_from_arg_obj, quantize_model, and sparsify_model when locating and reusing the existing implementation, ensuring the helper is invoked inside the _anymodel_patcher_context block (after cls(**args, **args2)) and preserving mto.enable_huggingface_checkpointing behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/llm_eval/lm_eval_hf.py`:
- Around line 74-78: The check for "descriptor is None" is unreachable because
resolve_descriptor_from_pretrained raises ValueError on failure (which is
already caught), so remove the redundant guard and directly return
deci_x_patcher; specifically, in the block where
resolve_descriptor_from_pretrained is called, delete the "if descriptor is None:
return contextlib.nullcontext()" lines and leave "return
deci_x_patcher(model_descriptor=descriptor)" (referencing
resolve_descriptor_from_pretrained and deci_x_patcher to locate the code).
---
Nitpick comments:
In `@examples/llm_eval/lm_eval_hf.py`:
- Around line 142-155: create_from_arg_string currently only enables HF
checkpointing and Puzzletron patching but ignores quantization/sparsity steps
applied by create_from_arg_obj; update create_from_arg_string to call the same
quantize_model and sparsify_model logic (or extract that logic into a shared
helper used by both) after constructing the model but before returning it so
that quantization/sparsity CLI args in args (e.g., any quantize_* or sparsify_*
flags) are honored; reference create_from_arg_string, create_from_arg_obj,
quantize_model, and sparsify_model when locating and reusing the existing
implementation, ensuring the helper is invoked inside the
_anymodel_patcher_context block (after cls(**args, **args2)) and preserving
mto.enable_huggingface_checkpointing behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 90dcb473-d82b-43c9-86b9-031ab5743b41
📒 Files selected for processing (5)
- .pre-commit-config.yaml
- examples/llm_eval/README.md
- examples/llm_eval/lm_eval_hf.py
- examples/puzzletron/README.md
- examples/puzzletron/evaluation/lm_eval_anymodel.py
💤 Files with no reviewable changes (2)
- .pre-commit-config.yaml
- examples/puzzletron/evaluation/lm_eval_anymodel.py
What is examples/puzzletron/evaluation/hf_deployable_anymodel.py used for? Is it for evaluation?
This file enables a secondary evaluation path with NeMo Evaluator. NeMo Evaluator, as is, builds on Ray deployment via NeMo Export-Deploy, and the Export-Deploy deployment script doesn't have the AnyModel patcher built in at the moment.
…al_hf.py
Signed-off-by: jrausch <jrausch@nvidia.com>
Force-pushed d378155 to bf5cbb1 (Compare)
🧹 Nitpick comments (2)
examples/llm_eval/lm_eval_hf.py (2)
70-75: Consider broadening exception handling for robustness.

Per the relevant code snippet, `resolve_descriptor_from_pretrained` calls `AutoConfig.from_pretrained()`, which can raise `OSError`, `FileNotFoundError`, or network-related exceptions beyond `ValueError`/`AttributeError`. If the intent is to silently fall back to standard HF loading when Puzzletron detection fails for any reason (including transient issues), consider catching `Exception` or at least `OSError`. If the intent is to fail loudly on I/O errors, the current behavior is fine.

♻️ Optional: broader exception handling

```diff
     try:
         descriptor = resolve_descriptor_from_pretrained(
             pretrained, trust_remote_code=trust_remote_code
         )
-    except (ValueError, AttributeError):
+    except (ValueError, AttributeError, OSError):
         return contextlib.nullcontext()
     return deci_x_patcher(model_descriptor=descriptor)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_eval/lm_eval_hf.py` around lines 70 - 75, The current except block around resolve_descriptor_from_pretrained only catches ValueError and AttributeError but AutoConfig.from_pretrained (called by resolve_descriptor_from_pretrained) can raise OSError/FileNotFoundError or other I/O/network exceptions; update the handler to include those exceptions (e.g., catch OSError or Exception as appropriate) so transient I/O/network errors fall back to contextlib.nullcontext() as intended, while preserving the existing behavior for the Puzzletron-detection path.
140-154: Asymmetric behavior: missing `padding_side` and quantization/sparsity logic.

Unlike `create_from_arg_obj`, this function:

- Does not set `model_obj.tokenizer.padding_side = "left"`
- Does not apply quantization (`quant_cfg`) or sparsity (`sparse_cfg`)

If this is intentional (e.g., string-based creation is only for Puzzletron checkpoints that don't need post-load processing), please add a docstring clarification. Otherwise, consider aligning the behavior.

♻️ Minimal fix: add padding_side for consistency

```diff
     with _anymodel_patcher_context(args.get("pretrained"), args.get("trust_remote_code", False)):
         model_obj = cls(**args, **args2)
+        model_obj.tokenizer.padding_side = "left"
     return model_obj
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_eval/lm_eval_hf.py` around lines 140 - 154, create_from_arg_string currently omits the post-load steps present in create_from_arg_obj: it doesn't set model_obj.tokenizer.padding_side = "left" nor apply quantization/sparsity (quant_cfg / sparse_cfg), causing asymmetric behavior; inside the same with _anymodel_patcher_context after instantiating model_obj (the cls(**args, **args2) call), set model_obj.tokenizer.padding_side = "left" (if tokenizer exists) and invoke the same quantization and sparsity application logic used by create_from_arg_obj (or factor that logic into a shared helper and call it here) so string-based creation mirrors object-based creation—if the omission was intentional, instead update the function docstring to explicitly state that padding_side and quant/sparse post-processing are intentionally skipped for Puzzletron checkpoints.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@examples/llm_eval/lm_eval_hf.py`:
- Around line 70-75: The current except block around
resolve_descriptor_from_pretrained only catches ValueError and AttributeError
but AutoConfig.from_pretrained (called by resolve_descriptor_from_pretrained)
can raise OSError/FileNotFoundError or other I/O/network exceptions; update the
handler to include those exceptions (e.g., catch OSError or Exception as
appropriate) so transient I/O/network errors fall back to
contextlib.nullcontext() as intended, while preserving the existing behavior for
the Puzzletron-detection path.
- Around line 140-154: create_from_arg_string currently omits the post-load
steps present in create_from_arg_obj: it doesn't set
model_obj.tokenizer.padding_side = "left" nor apply quantization/sparsity
(quant_cfg / sparse_cfg), causing asymmetric behavior; inside the same with
_anymodel_patcher_context after instantiating model_obj (the cls(**args,
**args2) call), set model_obj.tokenizer.padding_side = "left" (if tokenizer
exists) and invoke the same quantization and sparsity application logic used by
create_from_arg_obj (or factor that logic into a shared helper and call it here)
so string-based creation mirrors object-based creation—if the omission was
intentional, instead update the function docstring to explicitly state that
padding_side and quant/sparse post-processing are intentionally skipped for
Puzzletron checkpoints.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 48f2ed1d-c9e8-493d-b3cd-e10780c0c4b4
📒 Files selected for processing (2)
- examples/llm_eval/README.md
- examples/llm_eval/lm_eval_hf.py
✅ Files skipped from review due to trivial changes (1)
- examples/llm_eval/README.md
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##  feature/puzzletron   #1206    +/-  ##
======================================================
- Coverage     75.78%    75.71%   -0.08%
======================================================
  Files           446       446
  Lines         47684     47684
======================================================
- Hits          36139     36102      -37
- Misses        11545     11582      +37
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Summary
- Merges `examples/puzzletron/evaluation/lm_eval_anymodel.py` into the existing `examples/llm_eval/lm_eval_hf.py` so there is a single evaluation entry point for both standard HF and AnyModel/Puzzletron checkpoints.
- AnyModel checkpoints are auto-detected via `resolve_descriptor_from_pretrained`; the puzzletron extra is optional.
Notes
AnyModel auto-detection uses `resolve_descriptor_from_pretrained`, which currently relies on a hardcoded `_MODEL_TYPE_TO_DESCRIPTOR` dict that must be kept in sync manually with descriptor registrations. This should be addressed in the future.
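One possible direction for removing that manual sync burden is a decorator-based registry, where registering a descriptor class also records its `model_type` mapping. All names below are hypothetical; this is only a sketch of the pattern, not the project's API.

```python
_DESCRIPTOR_REGISTRY: dict = {}


def register_descriptor(model_type):
    """Class decorator: record the descriptor for a HF model_type at
    definition time, so no separate mapping can drift out of sync."""
    def wrap(cls):
        _DESCRIPTOR_REGISTRY[model_type] = cls
        return cls
    return wrap


@register_descriptor("decilm")  # hypothetical model_type / descriptor pair
class DeciLMDescriptor:
    pass


def resolve_descriptor(model_type):
    """Look up the descriptor registered for model_type, mirroring the
    ValueError-on-failure behavior the current code relies on."""
    try:
        return _DESCRIPTOR_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"no descriptor registered for model_type {model_type!r}")
```

With this shape, adding a new descriptor is a single decorated class definition, and `resolve_descriptor_from_pretrained` could consult the registry instead of a hand-maintained dict.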