Skip to content

C++ search#210

Draft
ms609 wants to merge 593 commits intomainfrom
cpp-search
Draft

C++ search#210
ms609 wants to merge 593 commits intomainfrom
cpp-search

Conversation

@ms609
Copy link
Copy Markdown
Owner

@ms609 ms609 commented Mar 19, 2026

  • other optimizations + features

Manual testing underway; shiny app in particular has some usability issues.

@ms609 ms609 marked this pull request as draft March 25, 2026 14:21
ms609 added 29 commits March 26, 2026 09:32
Strategy tuning tasks:
- T-254: drift MPT diversity experiment (validate before reducing)
- T-255: reduce drift in default preset (blocked on T-254)
- T-256: increase sectorial search intensity
- T-257: post-ratchet sectorial search pass (blocked on T-256)
- T-258: intra-replicate fusing
- T-259: ratchet cycle count experiment
- T-260: per-evaluation overhead profiling
Compared driftCycles=0 vs driftCycles=2 on Wortley2006 (37t), Zhu2013
(75t), Geisler2001 (68t). 3 seeds × 30s+120s budgets, with and without
consensus stopping.

Key findings:
- Zero score benefit (identical best scores on all datasets/seeds)
- Zero MPT benefit (no-drift finds 4 MPTs on Wortley2006 vs 1-3 with drift)
- Zero diversity benefit (mean RF identical: 7.3 vs 7.4 on Geisler2001)
- 10-22% replicate loss from drift time consumption (15-19% of wall time)
- With consensus stopping, drift delays convergence without improving answer

Unblocks T-255 (reduce/eliminate drift in default and thorough presets).
Analysis: dev/benchmarks/drift_mpt_analysis.md
T-254 confirmed drift provides zero benefit on all metrics (score, MPT
count, topological diversity) while consuming 15-19% of wall time and
reducing replicates by 10-22%.

Changes:
- SearchControl() default: driftCycles 2 -> 0
- default preset: driftCycles 2 -> 0
- thorough preset: driftCycles 12 -> 0
- sprint and large presets: already 0, no change
- Updated roxygen docs: removed drift from strategy descriptions
- Updated test-SearchControl.R: expected default 0L

Drift search infrastructure (ts_drift.h/.cpp, SearchControl params)
remains available for explicit use via SearchControl(driftCycles = N).
devtools::check_man() loaded the old installed package via
load_installed, writing stale n_mc=5000L into PrepareDataProfile.Rd
and StepInformation.Rd. Corrected to match source (100000L).

Updated spelling wordlist: added LCM, TREE's, speedup; removed
28 stale entries.
test-ts-parallel.R 'Parallel search respects timeout' used Vinther2008
(23 tips) which completes all replicates in <1s on fast ARM64 hardware,
so timed_out was never set. Switched to Agnarsson2004 (62 tips) where
each replicate takes ~100ms, making timeout reliable.
T-243: FlatBlock metadata, flat EW indirect functions, TBR prefetch
Add intraFuse parameter to SearchControl/DrivenParams. When enabled,
each outer cycle fuses the current replicate tree against pool donors
after TBR polish, approximating TNT's within-replicate fusing pattern.

Implementation:
- run_single_replicate() now accepts const TreePool* for read-only
  pool access during intra-replicate fusing
- Fuse fires after TBR polish (step 6b) with max_rounds=3
- State arrays rebuilt after fuse via build_postorder()+reset_states()
- Disabled in parallel mode (nullptr pool; between-replicate fusing
  via ThreadSafePool is already active)

Fixed segfault: tree_fuse() internal TBR leaves state arrays in a
form incompatible with subsequent pipeline phases. Explicit
reset_states() after fuse resolves this.

Tests: 5 new tests including dataset-size-change regression test.
2 configs × 5 datasets × 5 seeds × 30s = 50 runs (~30 min).
T-248 reduced annealCycles from 3 to 1 in the large preset but missed
this test assertion. GHA 23590522833 caught it.
Hamilton HPC and r-package-profiling skills at ~/.positai/skills/
should be loaded via skill() tool before relevant work.
Top 3 hotspots at 88 tips (Dikow2009, EW, 1000 TBR passes):
1. StateSnapshot save/restore: 14.6% (memcpy ~190KB/candidate)
2. reset_states zeroing + tip reload: 9.1%
3. fitch_na_score: 29.2% (expected, core algorithm)

Non-scoring overhead = 37.8% of TBR time.
Combined fix potential ~16-19%.

Write-up: dev/benchmarks/vtune_tbr_analysis.md
Driver: dev/vtune-tbr-driver.R
T-261: Eliminate std::fill zeroing in reset_states (~3.9% savings)
T-262: Reduce load_tip_states scope (~2-3% savings)
T-263: Selective StateSnapshot save/restore (~10-12% savings)

Combined potential: ~16-19% TBR wall time reduction.
Move state_snap.save() and save_topology() from the per-candidate
path (called ~5000 times per 30s run) to the top of each while-loop
iteration (called ~1+n_accepted times, typically ~50).

After a rejected move, state_snap.restore() returns the tree to exactly
the state at the start of the pass. The per-candidate save was redundant:
consecutive rejections all restore to the same snapshot.

The while loop restarts after each acceptance, triggering a fresh save
with the new state. No behavioral change to accept/reject logic.

VTune showed save at 2.15s/30s (7% of TBR time). This reduces save calls
by ~99%, estimated savings ~2s = ~10% of TBR time on 88-tip datasets.

1424 targeted tests pass (TBR search, driven, symmetry, fuse, constraint,
parallel, anneal).
T-261: Remove all 5 std::fill(0) calls from reset_states().
Every array entry read by score_tree/fitch_na_score is written before
read: prelim by pass 1, final_ by pass 2, down2 by pass 3,
subtree_actives by pass 1+3, local_cost by pass 1.

T-262: Replace element-by-element tip copy loop with bulk memcpy
for prelim and final_ arrays (contiguous in memory).

Interleaved A/B benchmark (Dikow2009, 88t, 5 pairs):
  Baseline median: 14.92s
  Optimized median: 13.64s
  Speedup: 8.6%

1059 targeted tests pass (469 scoring/search + 590 fuse/parallel/etc).
Score verification: Vinther2008=83(NJ), Longrich2010=131, DeAssis2011=64.
T-258 added intraFuse but didn't re-run roxygenise.
Fixes codoc mismatch WARNING in R CMD check.
T-258 added intraFuse but didn't re-run roxygenise.
Fixes codoc mismatch WARNING in R CMD check.
ms609 and others added 30 commits March 29, 2026 14:23
 TBR clip-ordering strategy (SearchControl clipOrder)
When concordance mode uses the dataset (qc/mcc/spc/clc/phc) and the
plotted tree's taxa don't match the dataset's names, return NULL rather
than passing mismatched objects to QuartetConcordance/LabelSplits etc.
This prevents the `mat[i, j, ..., drop=FALSE]: subscript out of bounds`
crash when loading trees with non-overlapping taxa (T-293) or reloading
a dataset with removed taxa (T-300).

Remove T-292, T-293, T-300 from to-do.md (all fixed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant