Harden precipitation fetch against partial ACIS responses#82

Open
michaelmfoley wants to merge 1 commit into main from fix/precip-partial-year-coverage


@michaelmfoley

Collaborator

Summary

Two defensive fixes to the precipitation pipeline, found while investigating the dry-weather CSO cohort. Neither changes current outputs — see verification below — but both remove latent failure modes that would silently corrupt the rainfall data feeding the dashboard and any dry-weather analysis.

Changes

get_data/get_MA_precipitation.py — The station loop silently dropped any station whose row count didn't match the full-calendar-year date range. ACIS currently pads every station's response to the requested range (filling future/missing days with 'M'), so the guard never fires today — but if ACIS ever returns a short array (partial response, mid-year truncation), the old code would discard those stations wholesale with no warning, silently degrading the statewide average. Now:

  • edate is capped at today, so the current year doesn't request (or write) future-dated rows. The cached CSV currently carries 228 all-NaN rows for the rest of 2026; those go away on the next fetch.
  • Each station is reindexed to the requested range, so a partial response contributes its valid days instead of being dropped entirely.
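A minimal sketch of the two changes above, not the code in this PR. The helper names `cap_end_date` and `station_series`, their signatures, and the sample values are hypothetical. It assumes each station's response arrives as parallel lists of date strings and value strings, and it coerces every non-numeric marker (including 'M') to NaN as a simplification.

```python
from datetime import date

import pandas as pd


def cap_end_date(year: int, today: date) -> date:
    """Dec 31 of `year`, or `today` when the year is still in progress."""
    return min(date(year, 12, 31), today)


def station_series(dates, values, sdate: date, edate: date) -> pd.Series:
    """Align one station's (possibly short) response to the requested range."""
    full_range = pd.date_range(sdate, edate, freq="D")
    # 'M' and any other non-numeric marker is coerced to NaN (a simplification).
    numeric = pd.to_numeric(list(values), errors="coerce")
    s = pd.Series(numeric, index=pd.to_datetime(list(dates)))
    # Days absent from the response become NaN instead of disqualifying the station.
    return s.reindex(full_range)


sdate = date(2026, 1, 1)
edate = cap_end_date(2026, today=date(2026, 1, 5))  # capped at "today"

# Hypothetical short response: only three of the five requested days came back.
short = station_series(
    ["2026-01-01", "2026-01-02", "2026-01-03"],
    ["0.10", "M", "0.25"],
    sdate,
    edate,
)
full = station_series(
    ["2026-01-01", "2026-01-02", "2026-01-03", "2026-01-04", "2026-01-05"],
    ["0.00", "0.20", "0.05", "0.40", "M"],
    sdate,
    edate,
)

# The row-wise mean skips NaN, so the short station still counts on its valid days.
statewide = pd.concat([short, full], axis=1).mean(axis=1)
```

Under the old length guard, `short` would have been dropped entirely. Here it contributes on Jan 1 and Jan 3, and Jan 5 stays NaN because no station reported a value.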

analysis/EEA_DP_CSO_map.py — Monthly precipitation aggregation used .sum(), which returns 0 (not NaN) for a month with no data, rendering a data gap as "zero rainfall" on the dashboard's monthly volume + rainfall chart. min_count=1 makes such months honest gaps. No currently shipped month is affected (verified all months in the live chart have real values); this guards the display if the precip fetch ever falls behind the CSO data window.
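The difference is easy to reproduce on toy data. The sketch below uses a hypothetical daily series and groups by calendar month; the actual aggregation in EEA_DP_CSO_map.py may be structured differently.

```python
import numpy as np
import pandas as pd

# Hypothetical daily precipitation: January has data, February is all missing.
daily = pd.Series(
    [0.2, 0.0, 0.5, np.nan, np.nan],
    index=pd.to_datetime(
        ["2026-01-10", "2026-01-20", "2026-01-30", "2026-02-10", "2026-02-20"]
    ),
)

by_month = daily.groupby(daily.index.to_period("M"))

old_monthly = by_month.sum()             # February comes out as 0.0
new_monthly = by_month.sum(min_count=1)  # February comes out as NaN
```

With `min_count=1`, a month needs at least one non-NaN day to produce a total, so an all-missing month stays NaN rather than being reported as zero rainfall.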

Verification (no regression)

Fetched live ACIS data (2026-06-10) and processed the same responses under old and new logic, for 2022–2026:

  • Daily statewide averages: identical where both are defined (max |diff| = 0.000").
  • Station coverage: identical per year (the length guard never fires against current ACIS responses).
  • Statewide-dry CSO cohort (precip_48h < 0.05" across all 17,732 incidents): unchanged at 1,955 events / 1,987 Mgal; zero events flip classification.
  • Cached CSV values match a fresh fetch to within 0.008" on overlapping 2026 days.
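For reference, the first and third checks can be expressed roughly as follows. This is a sketch with hypothetical helper names (`max_abs_diff`, `count_flips`) and toy data, not the script used for the numbers above. It assumes the old and new `precip_48h` series share the same incident index.

```python
import numpy as np
import pandas as pd


def max_abs_diff(old: pd.Series, new: pd.Series) -> float:
    """Largest absolute difference on days where both series are defined."""
    old_a, new_a = old.align(new, join="inner")
    both = old_a.notna() & new_a.notna()
    return float((old_a[both] - new_a[both]).abs().max())


def count_flips(old_p48: pd.Series, new_p48: pd.Series, threshold: float = 0.05) -> int:
    """Number of incidents whose dry/not-dry classification changes."""
    return int(((old_p48 < threshold) != (new_p48 < threshold)).sum())


# Toy data: the new logic fills one day the old logic left undefined.
idx = pd.date_range("2026-01-01", periods=4, freq="D")
old = pd.Series([0.10, 0.00, np.nan, 0.30], index=idx)
new = pd.Series([0.10, 0.00, 0.20, 0.30], index=idx)

diff = max_abs_diff(old, new)
flips = count_flips(old, new)
```

Note that a NaN compared against the threshold evaluates to False, so an undefined value is treated as "not dry" in this sketch.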

Out of scope / follow-up

The larger dry-weather question is methodological, not pipeline: the statewide average smooths localized storms to near-zero, so most "dry" events have substantial operator-reported rainfall at the outfall (MAEEADP_CSO.rainfallData). Redefining the dry cohort using local precipitation is being pursued separately.
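A toy calculation shows how the smoothing arises. The station count and rainfall totals are hypothetical and chosen only to illustrate the arithmetic.

```python
DRY_THRESHOLD_IN = 0.05  # same threshold as the precip_48h < 0.05" cohort

n_stations = 25  # hypothetical station count
# One station under a localized storm records 1.0"; the rest record nothing.
station_totals = [1.0] + [0.0] * (n_stations - 1)

statewide_avg = sum(station_totals) / n_stations  # 1.0 / 25 = 0.04"
is_statewide_dry = statewide_avg < DRY_THRESHOLD_IN
local_at_outfall = station_totals[0]
```

The event is classified as statewide-dry even though the station nearest the storm recorded an inch of rain.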

Also out of scope: the fetch script only re-fetches from the most recent cached year onward, so correcting any earlier year would require forcing a full re-fetch.

🤖 Generated with Claude Code

Two defensive fixes to the precipitation pipeline. Verified against live
ACIS data (2026-06-10) that neither changes current outputs — daily
averages are identical to within rounding, and the statewide-dry CSO
cohort (precip_48h < 0.05") is unchanged at 1,955 events / 1,987 Mgal.

1. get_MA_precipitation.py: the station loop silently dropped any
   station whose row count didn't match the full-calendar-year range.
   ACIS currently pads every station to the requested range, so the
   guard never fires today — but if ACIS ever returns a short array
   (partial response, mid-year truncation), the old code would silently
   discard those stations wholesale. Now edate is capped at today and
   each station is reindexed to the requested range, so partial
   responses contribute their valid days. Also stops writing
   future-dated all-NaN rows for the remainder of the current year
   (the cached CSV carried 228 such rows).

2. EEA_DP_CSO_map.py: monthly aggregation used .sum(), which returns 0
   (not NaN) for a month with no precipitation data — rendering a
   data gap as "zero rainfall" on the dashboard. min_count=1 makes such
   months honest gaps. No currently shipped month is affected (all
   months in the live chart have data); this guards the display if the
   precip fetch ever falls behind the CSO data window.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>