Harden precipitation fetch against partial ACIS responses #82
Open
michaelmfoley wants to merge 1 commit into
Two defensive fixes to the precipitation pipeline. Verified against live ACIS data (2026-06-10) that neither changes current outputs: daily averages are identical to within rounding, and the statewide-dry CSO cohort (precip_48h < 0.05") is unchanged at 1,955 events / 1,987 Mgal.

1. get_MA_precipitation.py: the station loop silently dropped any station whose row count didn't match the full-calendar-year range. ACIS currently pads every station to the requested range, so the guard never fires today. If ACIS ever returns a short array (partial response, mid-year truncation), however, the old code would discard those stations wholesale. Now edate is capped at today and each station is reindexed to the requested range, so partial responses contribute their valid days. This also stops writing future-dated all-NaN rows for the remainder of the current year (the cached CSV carried 228 such rows).

2. EEA_DP_CSO_map.py: monthly aggregation used .sum(), which returns 0 (not NaN) for a month with no precipitation data, rendering a data gap as "zero rainfall" on the dashboard. min_count=1 makes such months honest gaps. No currently shipped month is affected (all months in the live chart have data); this guards the display if the precip fetch ever falls behind the CSO data window.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
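The first fix can be sketched roughly as follows. This is not the PR's code: the helper names capped_range and align_station are hypothetical, the three-day response is made up, and the sketch assumes a truncated array still starts on the first requested day (the description does not say how the real script aligns short arrays).

```python
import datetime as dt

import pandas as pd


def capped_range(year, today):
    """Daily dates for `year`, ending no later than `today`."""
    sdate = dt.date(year, 1, 1)
    edate = min(dt.date(year, 12, 31), today)  # never request future days
    return pd.date_range(pd.Timestamp(sdate), pd.Timestamp(edate), freq="D")


def align_station(values, requested):
    """Reindex one station's values onto the requested dates.

    Assumption: a short array still starts on the first requested day.
    Non-numeric codes such as 'M' become NaN in this sketch.
    """
    idx = pd.date_range(requested[0], periods=len(values), freq="D")
    series = pd.to_numeric(pd.Series(values, index=idx), errors="coerce")
    return series.reindex(requested)  # days not returned become NaN


today = dt.date(2026, 6, 10)
requested = capped_range(2026, today)

# Hypothetical truncated response: three days instead of the full range.
aligned = align_station(["0.10", "M", "0.00"], requested)
```

Under these assumptions the short response is kept at the length of the requested range, with its valid days intact and the remainder as NaN, instead of being dropped for having the wrong row count.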
Summary
Two defensive fixes to the precipitation pipeline, found while investigating the dry-weather CSO cohort. Neither changes current outputs — see verification below — but both remove latent failure modes that would silently corrupt the rainfall data feeding the dashboard and any dry-weather analysis.
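To make the monthly-aggregation failure mode concrete, here is a minimal pandas sketch with made-up values. It assumes monthly totals are built with resample("MS"); the dashboard script may group differently, but sum() treats an all-NaN group the same way either way.

```python
import numpy as np
import pandas as pd

# Made-up daily precipitation: January has data, February is entirely missing.
days = pd.date_range("2026-01-01", "2026-02-28", freq="D")
precip = pd.Series(np.nan, index=days)
precip[days.month == 1] = 0.1

# Default sum(): an all-NaN month totals 0.0, which reads as "no rain".
old_monthly = precip.resample("MS").sum()

# min_count=1: a month with no valid observations stays NaN (a visible gap).
new_monthly = precip.resample("MS").sum(min_count=1)
```

January's total is the same in both results; only the month with no valid observations differs.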
Changes
get_data/get_MA_precipitation.py: The station loop silently dropped any station whose row count didn't match the full-calendar-year date range. ACIS currently pads every station's response to the requested range (filling future/missing days with 'M'), so the guard never fires today. If ACIS ever returns a short array (partial response, mid-year truncation), however, the old code would discard those stations wholesale with no warning, silently degrading the statewide average. Now each station is reindexed to the requested range, so partial responses contribute their valid days, and edate is capped at today, so the current year doesn't request (or write) future-dated rows. The cached CSV currently carries 228 all-NaN rows for the rest of 2026; those go away on the next fetch.

analysis/EEA_DP_CSO_map.py: Monthly precipitation aggregation used .sum(), which returns 0 (not NaN) for a month with no data, rendering a data gap as "zero rainfall" on the dashboard's monthly volume + rainfall chart. min_count=1 makes such months honest gaps. No currently shipped month is affected (verified all months in the live chart have real values); this guards the display if the precip fetch ever falls behind the CSO data window.

Verification (no regression)
Fetched live ACIS data (2026-06-10) and processed the same responses under old and new logic, for 2022–2026:
Daily averages: identical to within rounding.
Statewide-dry CSO cohort (precip_48h < 0.05" across all 17,732 incidents): unchanged at 1,955 events / 1,987 Mgal; zero events flip classification.

Out of scope / follow-up
The larger dry-weather question is methodological, not pipeline: the statewide average smooths localized storms to near-zero, so most "dry" events have substantial operator-reported rainfall at the outfall (MAEEADP_CSO.rainfallData). Redefining the dry cohort using local precipitation is being pursued separately.

Also out of scope: the fetch script only re-fetches from the most recent cached year, so any historical correction would require forcing a full re-fetch.
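The smoothing effect behind the methodological question can be shown with simple arithmetic. The station count and rainfall totals below are invented for illustration; only the 0.05" threshold comes from this PR.

```python
import numpy as np

DRY_THRESHOLD_IN = 0.05  # dry cutoff used in this PR: precip_48h < 0.05"

# Hypothetical 48-hour totals (inches) at 30 stations; one sits under a storm cell.
station_48h = np.zeros(30)
station_48h[0] = 1.2

# Averaging across all stations dilutes the single storm reading.
statewide_avg = station_48h.mean()

is_dry_statewide = bool(statewide_avg < DRY_THRESHOLD_IN)  # statewide view
is_dry_local = bool(station_48h[0] < DRY_THRESHOLD_IN)     # view at the storm
```

With these numbers the statewide average falls below the threshold even though the station under the storm recorded more than an inch, so an event near that station would land in the "dry" cohort under the statewide definition but not under a local one.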
🤖 Generated with Claude Code