fix: comprehensive multi_patch improvements preventing byte offset corruption #3355

Open
Mustaqeem66 wants to merge 5 commits into tailcallhq:main from Mustaqeem66:fix/multi-patch-offset-corruption

Conversation

@Mustaqeem66

Summary

Comprehensive multi_patch improvements that fix issue #3249 and six related issues. This PR adds safety-critical features, including unique match validation, overlap detection, and atomic writes, along with more robust matching layers.

Changes

Phase 1 - Safety Critical

  • Unique match validation: Count all matches, error if > 1 (unless replace_all)
  • Overlap detection: Validate no overlapping edits before applying
  • Atomic write: Temp file + rename for crash safety
  • Memory rollback: Restore original if verification fails
  • Better errors: Clear messages with file path and search context
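The first two checks can be illustrated with a minimal sketch. It is not the PR's code: the names `find_unique` and `find_overlap` are hypothetical, and the sketch assumes edits are represented as byte ranges into the original file content.

```rust
use std::ops::Range;

/// Returns the byte range of the only occurrence of `needle` in `haystack`.
/// Fails when there is no match or more than one match.
/// Note: `match_indices` counts non-overlapping occurrences only.
fn find_unique(haystack: &str, needle: &str) -> Result<Range<usize>, String> {
    if needle.is_empty() {
        return Err("search string is empty".to_string());
    }
    let mut matches = haystack.match_indices(needle);
    let (start, _) = matches
        .next()
        .ok_or_else(|| format!("no match for {:?}", needle))?;
    let extra = matches.count();
    if extra > 0 {
        return Err(format!(
            "{} matches for {:?}; expected exactly one",
            extra + 1,
            needle
        ));
    }
    Ok(start..start + needle.len())
}

/// Returns the first pair of overlapping ranges (ordered by start), if any.
fn find_overlap(ranges: &[Range<usize>]) -> Option<(Range<usize>, Range<usize>)> {
    let mut sorted: Vec<Range<usize>> = ranges.to_vec();
    sorted.sort_by_key(|r| r.start);
    sorted
        .windows(2)
        .find(|pair| pair[1].start < pair[0].end)
        .map(|pair| (pair[0].clone(), pair[1].clone()))
}
```

Running both checks against the original content, before any edit is applied, means a rejected batch leaves the file untouched.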

Phase 2 - Robustness

  • Whitespace normalization: Line-based mapping handles LLM whitespace differences
  • Fuzzy matching: 0.90 threshold catches near-matches without false positives
  • 3-layer fallback: exact → whitespace → fuzzy
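A rough sketch of how such a fallback chain might look is given below. The PR does not state which similarity metric it uses, so normalized Levenshtein distance is an assumption here, and `locate`, `MatchKind`, `normalize_ws`, and `similarity` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum MatchKind {
    Exact,
    Whitespace,
    Fuzzy,
}

/// Collapses runs of whitespace into single spaces and trims both ends.
fn normalize_ws(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Levenshtein edit distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1);
            cur.push(best);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Similarity in [0, 1]; 1.0 means identical (assumed metric).
fn similarity(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / max_len as f64
}

/// Tries exact, then whitespace-normalized, then fuzzy matching.
/// Returns the 0-based line where the match starts and the layer that matched.
fn locate(content: &str, search: &str, threshold: f64) -> Option<(usize, MatchKind)> {
    let wanted: Vec<&str> = search.lines().collect();
    if wanted.is_empty() {
        return None;
    }
    // Layer 1: exact substring match.
    if let Some(byte) = content.find(search) {
        return Some((content[..byte].matches('\n').count(), MatchKind::Exact));
    }
    // Layer 2: compare line windows after normalizing whitespace per line.
    let norm_lines: Vec<String> = content.lines().map(normalize_ws).collect();
    let norm_wanted: Vec<String> = wanted.iter().map(|l| normalize_ws(l)).collect();
    if let Some(i) = norm_lines
        .windows(norm_wanted.len())
        .position(|w| w == norm_wanted.as_slice())
    {
        return Some((i, MatchKind::Whitespace));
    }
    // Layer 3: best-scoring line window at or above the threshold.
    let target = norm_wanted.join("\n");
    norm_lines
        .windows(norm_wanted.len())
        .enumerate()
        .map(|(i, w)| (i, similarity(&w.join("\n"), &target)))
        .filter(|(_, score)| *score >= threshold)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| (i, MatchKind::Fuzzy))
}
```

The sketch returns only the best window and does not check that a whitespace or fuzzy match is unique; a real implementation would presumably apply the unique match validation at every layer.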

Issues Fixed

| Issue | Title |
|-------|-------|
| #3249 | multi_patch corrupts file when sequential edits shift byte offsets |
| #3182 | Apply edits in reverse order |
| #2815 | Duplicate old_string matches cause unpredictable behavior |
| #2773 | Silent edit drops on failure |
| #2997 | Patches applied to wrong location |
| #3115 | Fuzzy matching for old_string |
| #3291 | Whitespace differences cause match failure |

Testing

30+ new tests covering:

  • Overlap detection
  • Unique match validation
  • Whitespace normalization
  • Fuzzy matching
  • Atomic writes
  • Edge cases

Algorithm

  1. Read original file content
  2. Find all edit positions in ORIGINAL content (3-layer matching)
  3. Validate: no overlaps, unique matches
  4. Sort edits by position DESCENDING (bottom-to-top)
  5. Apply edits in reverse order
  6. Write atomically (temp + rename + verify)
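Step 6 of the list above could look roughly like the following sketch, which uses only the Rust standard library. The function name `write_atomic`, the `.tmp` suffix, and the exact order of verification and rollback are assumptions, not details taken from the PR.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Writes `new_content` to a sibling temp file, renames it over `path`,
/// then reads the result back. On a mismatch the original content, kept
/// in memory, is written back and an error is returned.
fn write_atomic(path: &Path, original: &str, new_content: &str) -> io::Result<()> {
    let mut tmp_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp"); // hypothetical suffix
    let tmp = path.with_file_name(tmp_name);

    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(new_content.as_bytes())?;
        file.sync_all()?; // flush to disk before the rename
    }

    // The rename replaces the target in a single step on the same filesystem.
    fs::rename(&tmp, path)?;

    // Verify, and roll back from memory if the file does not match.
    if fs::read(path)?.as_slice() != new_content.as_bytes() {
        fs::write(path, original)?;
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "verification failed; original content restored",
        ));
    }
    Ok(())
}
```

Keeping the temp file in the same directory as the target matters because a rename across filesystems is generally not atomic.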

This fix addresses .forge.db corruption issues in ForgeCode by:

1. Startup WAL Recovery:
   - Checkpoints any leftover WAL from previous crashed sessions
   - Runs database integrity check on startup
   - Ensures data is recovered before new session starts

2. Auto-Checkpoint Threshold Reduced:
   - Changed from 1000 to 100 frames (~5MB max instead of ~50MB)
   - Prevents massive WAL files during long sessions

3. Async Checkpoint Method:
   - Added checkpoint_async() for graceful shutdown scenarios
   - Uses pool-based connection (async-safe)

4. Drop Checkpoint:
   - Checkpoints WAL when DatabasePool is dropped
   - Logs a warning if the checkpoint fails (expected on force-kill)

5. Comprehensive Tests:
   - test_checkpoint_method_exists
   - test_drop_calls_checkpoint
   - test_in_memory_pool_has_checkpoint
   - test_checkpoint_truncates_wal
   - test_wal_recovery_on_startup
   - test_async_checkpoint_method
   - test_autocheckpoint_threshold_reduced

Fixes the corruption issues related to tailcallhq#3260 by preventing WAL accumulation
and ensuring data integrity on startup.

Co-authored-by: Mustaqeem66 <ageisnode@gmail.com>
Phase 1 - Safety Critical:
- Add unique match validation (count all matches, error if > 1)
- Add overlap detection with validation
- Add atomic write with temp file + rename
- Add verification and memory-based rollback
- Add better error messages with file path

Phase 2 - Robustness:
- Add line-based whitespace normalization
- Add line-window fuzzy matching with 0.90 threshold
- Add 3-layer fallback chain (exact -> whitespace -> fuzzy)

Key improvements:
- Reverse-order application (already done)
- Unique match validation prevents silent wrong replacements
- Overlap detection rejects logically impossible edits
- Atomic write prevents half-written files
- Whitespace normalization handles LLM whitespace differences
- Fuzzy matching catches near-matches
- Better error messages with file path

Tests added:
- 30+ new tests covering all features

Fixes: tailcallhq#3249, tailcallhq#3182, tailcallhq#2815, tailcallhq#2773, tailcallhq#2997, tailcallhq#3115, tailcallhq#3291

Co-authored-by: Mustaqeem66 <ageisnode@gmail.com>
github-actions bot added the label "type: fix" (Iterations on existing features or infrastructure.) on May 18, 2026
Added line:column information to overlap error messages for better debugging.
This helps users identify exactly where overlapping edits occur in their files.

Co-authored-by: Mustaqeem66 <ageisnode@gmail.com>
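As an illustration of the line:column reporting described in this commit, a byte offset can be converted to a position as in the sketch below. The helper names and the message wording are hypothetical; the PR's actual error format is not shown on this page.

```rust
use std::ops::Range;

/// Converts a byte offset into a 1-based (line, column) pair.
/// Columns count characters, not bytes. Panics if `offset` is past the
/// end of `content` or not on a character boundary.
fn line_col(content: &str, offset: usize) -> (usize, usize) {
    let prefix = &content[..offset];
    let line = prefix.matches('\n').count() + 1;
    let line_start = prefix.rfind('\n').map_or(0, |i| i + 1);
    let col = prefix[line_start..].chars().count() + 1;
    (line, col)
}

/// Formats an overlap error with the line:column span of both edits.
fn overlap_message(content: &str, a: Range<usize>, b: Range<usize>) -> String {
    let (a_start, a_end) = (line_col(content, a.start), line_col(content, a.end));
    let (b_start, b_end) = (line_col(content, b.start), line_col(content, b.end));
    format!(
        "edits overlap: {}:{}..{}:{} and {}:{}..{}:{}",
        a_start.0, a_start.1, a_end.0, a_end.1, b_start.0, b_start.1, b_end.0, b_end.1
    )
}
```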
Changed multi_patch edit application to use pre-computed positions
(plan.position and plan.old_len) instead of re-searching in modified content.

This ensures byte offset corruption cannot happen since we're using
exact positions from the original content rather than fresh searches.

Co-authored-by: Mustaqeem66 <ageisnode@gmail.com>
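The idea in this commit can be sketched as follows. Only `position` and `old_len` are named in the commit message; the `PlannedEdit` struct, its `new_text` field, and `apply_plans` are hypothetical, and the sketch assumes the plans were already validated as non-overlapping.

```rust
/// One planned edit, with its position computed against the ORIGINAL content.
struct PlannedEdit {
    position: usize,  // byte offset of the text to replace
    old_len: usize,   // byte length of the text to replace
    new_text: String, // replacement text (assumed field)
}

/// Applies every edit using its pre-computed position, bottom to top.
/// Because later positions are rewritten first, the offsets of the edits
/// still pending (all earlier in the file) stay valid.
/// Panics if a range is out of bounds or splits a character.
fn apply_plans(original: &str, mut plans: Vec<PlannedEdit>) -> String {
    plans.sort_by(|a, b| b.position.cmp(&a.position));
    let mut out = original.to_string();
    for plan in &plans {
        out.replace_range(plan.position..plan.position + plan.old_len, &plan.new_text);
    }
    out
}
```

Applying the same plans top to bottom would shift every later offset by the length difference of each earlier replacement, which is the kind of corruption described in #3249.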
@github-actions

Action required: PR inactive for 5 days.
Status update or closure in 10 days.

github-actions bot added the label "state: inactive" (No current action needed/possible; issue fixed, out of scope, or superseded.) on May 24, 2026
@Mustaqeem66 (Author) left a comment

Hi! I found these issues in this repo and have fixed them. Could you please review this PR and merge it so that the issues are fixed in the repo?
Regards
Muhammad Mustaqeem

