add CalAdapt WRF download function for CMIP6 hourly met data #3967
divine7022 wants to merge 19 commits into
Conversation
This is a great first pass. I've made a few comments; a few key points:
It seems `download.CalAdaptWRF()` is doing both the download and the conversion/standardization (q2 transformation, derived wind_speed).
It isn't clear whether this is an intentional design choice, but the standard way to handle met ingest is to download first and then standardize, i.e. `download.CalAdaptWRF()` responsible for downloading and `met2CF.CalAdaptWRF()` responsible for conversion to PEcAn standard met. That is more consistent with PEcAn's pattern of deriving standardized variables in the met conversion layer rather than in the data interface itself.
Two other points:
1. It seems like it would be useful to put conversion functions for q = f(q2), wind = f(uwind, vwind), and precip = f(rainc, rainnc) into metutils.R and call those.
2. I think the conversion precip = f(rainc, rainnc) can be removed from caladaptaer in order to keep that package's scope to data access (with apologies for previously suggesting otherwise; that was before we were developing a dedicated package for the Cal-Adapt AE ecosystem).
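For concreteness, hypothetical sketches of what those metutils.R helpers could look like (function names are suggestions, not existing PEcAn functions; WRF's Q2 is the 2 m water vapor mixing ratio, and RAINC/RAINNC are assumed already differenced to per-timestep amounts):

```r
# Hypothetical helper sketches for metutils.R -- names are placeholders.

# specific humidity (kg/kg) from WRF Q2 (2 m water vapor mixing ratio, kg/kg)
qair_from_mixing_ratio <- function(q2) {
  q2 / (1 + q2)
}

# scalar wind speed (m/s) from U and V wind components
wind_speed_from_uv <- function(uwind, vwind) {
  sqrt(uwind^2 + vwind^2)
}

# total precipitation from convective (RAINC) + non-convective (RAINNC) parts
total_precip <- function(rainc, rainnc) {
  rainc + rainnc
}
```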
| Availability: 1980--2100 |
| Notes: CMIP6 dynamically downscaled projections from the Cal-Adapt Analytics Engine (WUS-D3 dataset, Rahimi et al. 2024). Eight GCMs are available under SSP3-7.0; CESM2 also has SSP2-4.5 and SSP5-8.5. Data are publicly available on AWS S3 (no authentication required). Requires the `caladaptaer` package from GitHub. To use this option, set `source` to `CalAdaptWRF` and specify `model` and `scenario` in the `met` section of `pecan.xml`. |
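Based on the Notes row above, the `met` block would presumably look something like this (tag layout and values inferred from the prose, not verified against the PR):

```xml
<met>
  <source>CalAdaptWRF</source>
  <model>CESM2</model>
  <scenario>ssp370</scenario>
</met>
```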
| Available GCMs: CESM2, CNRM-ESM2-1, EC-Earth3, EC-Earth3-Veg, FGOALS-g3, MPI-ESM1-2-HR, MIROC6, TaiESM1. See `caladaptaer::cae_models(activity = "WRF")` for the current list. |
- Be more specific, e.g. "See `caladaptaer::cae_models(activity = "WRF")` for the current list of available climate models."
- What does `activity` do?
- Add a reference to the availability of reanalysis data.
| #' WRF grids are cached in tempdir() so that when met.process calls this for |
| #' multiple sites in the same R session, each grid is only fetched from S3 once. |
| #' For 200 sites x 8 vars x 20 years that cuts S3 round trips from 32,000 to 160. |
- add references, including links to the docs and the Rahimi et al. (2024) paper
- document availability of the downscaled ERA5 reanalysis
| #' @param verbose Extra debug output? Default FALSE |
| #' @param ... further arguments, currently ignored |
| #' |
| #' @return invisible data.frame with file info for BETY registration |
| #' @return invisible data.frame with file info for BETY registration |

Suggested change:

| #' @return invisible data.frame with file information. |
I think it can return the standardized table without implying that the only intent is for BETY registration.
| #' @param outfolder Directory for storing output |
| #' @param start_date Start date for met data |
| #' @param end_date End date for met data |
| #' @param site_id BETY site id |
I think this param can be defined as a unique identifier for the site, without reference to BETYdb.
Is it still necessary/useful to explicitly support BETYdb with this?
| lat_dim <- ncdf4::ncdim_def("latitude", "degree_north", |
|   lat.in, create_dimvar = TRUE) |
| lon_dim <- ncdf4::ncdim_def("longitude", "degree_east", |
|   lon.in, create_dimvar = TRUE) |
These write the site lat/lon rather than the lat/lon from the source data. If we want to support potentially multiple sites mapping to the same met input, should this be handled differently here?
| lat.in <- as.numeric(lat.in) |
| lon.in <- as.numeric(lon.in) |
Is it worthwhile to add a check that these are inside the domain of the dataset? I'm not sure whether that is worth the computational cost of a point-in-polygon check, but if there is a bounding box it should be efficient to do `lat < maxlat & lat > minlat`-type checks. If there isn't a bounding box, perhaps caladaptaer should create one. Either way, out-of-domain lat/lon should be handled gracefully.
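If a bounding box is available, the guard could be as simple as this sketch (the WUS-D3 extent values here are placeholders and should come from the dataset or caladaptaer; `logger.severe` is PEcAn's usual way to fail loudly):

```r
# Placeholder bounding box -- in practice, query the dataset/caladaptaer
# for the real WUS-D3 extent rather than hard-coding it.
domain <- list(lat_min = 25, lat_max = 50, lon_min = -130, lon_max = -100)

if (lat.in < domain$lat_min || lat.in > domain$lat_max ||
    lon.in < domain$lon_min || lon.in > domain$lon_max) {
  PEcAn.logger::logger.severe(
    "Site (", lat.in, ", ", lon.in, ") is outside the CalAdapt WRF domain"
  )
}
```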
| saveRDS(grid, cache_file) |
| } |
| # grab time dimension and build the projected point once |
Does CalAdapt guarantee that, for a given model/scenario/resolution combination, time_vals and grid will be consistent? Is the grid the same across all data at the same resolution?
| if (is.null(model)) model <- "CESM2" |
| if (is.null(scenario)) scenario <- "ssp370" |
| if (is.null(resolution)) resolution <- "d01" |

Suggested change:

| model <- model %||% "CESM2" |
| scenario <- scenario %||% "ssp370" |
| resolution <- resolution %||% "d01" |
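One caveat: `%||%` only ships in base R as of 4.4.0; on older versions it would need to come from rlang or be defined locally, e.g.:

```r
# null-coalescing fallback for R < 4.4.0 (base R exports `%||%` from 4.4.0;
# rlang also provides it)
`%||%` <- function(x, y) if (is.null(x)) y else x

model <- NULL
model <- model %||% "CESM2"  # "CESM2"
```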
| #' |
| #' Fetches hourly WRF dynamically downscaled data from the Cal-Adapt Analytics |
| #' Engine (CADCAT S3 bucket) via caladaptaer, extracts the nearest grid cell to |
| #' the site, converts units to CF-1.8, and writes one NetCDF per year. |
To be strictly CF-1.8 compliant, files need a global metadata attribute `Conventions = "CF-1.8"`.
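With ncdf4 that should be a single extra call after the file is created; `varid = 0` targets global attributes (the file and variable-list names below are placeholders for the function's actual objects):

```r
# add the global Conventions attribute to a freshly created NetCDF file
nc <- ncdf4::nc_create(out_file, nc_vars)  # out_file / nc_vars: placeholders
ncdf4::ncatt_put(nc, varid = 0, attname = "Conventions", attval = "CF-1.8")
ncdf4::nc_close(nc)
```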
| start_time = year_start, |
| end_time = year_end |
| ) |
| saveRDS(grid, cache_file) |
Is this safe if run in parallel?
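If concurrent workers can share the cache path, a write-to-temp-then-rename pattern would keep readers from ever seeing a half-written file, since `file.rename` is atomic when both paths are on the same filesystem (sketch only, reusing `grid` and `cache_file` from the quoted code):

```r
# write to a sibling temp file, then atomically rename into place
tmp <- tempfile(tmpdir = dirname(cache_file), fileext = ".rds")
saveRDS(grid, tmp)
file.rename(tmp, cache_file)
```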
Description
Adds `download.CalAdaptWRF()` to `PEcAn.data.atmosphere`: a new met driver that pulls hourly WRF dynamically downscaled CMIP6 projections from the Cal-Adapt Analytics Engine (WUS-D3 dataset, Rahimi et al. 2024). The data are on public AWS S3, so no auth is needed. This is implemented as part of CCMMF, where we need future climate forcing at ~200 California sites for SIPNET runs under multiple GCMs and SSPs.

I looked at how PEcAn handles met downloads, and this follows the same pattern as CRUNCEP/GFDL: the download function does everything (fetch, extract, convert, write CF) in one shot, so we skip the met2CF and extract.nc stages. Main reason: WRF uses a Lambert Conformal grid, and `extract.nc`/`closest_xy` assume lat-lon grids with NARR-style bounds, so they can't handle this projection. Adding CalAdaptWRF to the skip list in met.process.R was the right direction.

The tricky part is that PEcAn's `papply` calls `download.CalAdaptWRF` once per site; it never sees the full site list. Naively that means re-reading the same WRF grid from S3 for every site. The 45 km grid is small (~20-30 MB per variable per year), so we cache the full grid as RDS in `tempdir()` on the first site. Sites 2 through N just do `readRDS()` and extract their grid cell locally. For 200 sites x 8 vars x 20 years, that cuts S3 round trips from 32,000 to 160. The cache auto-cleans when R exits.

Added a `caladapt_wrf` column to `pecan_standard_met_table`; 9 output variables total:

Data coverage:

The pipeline that orchestrates R (caladaptaer) and .sh scripts, along with the data, is at /projectnb/dietzelab/ccmmf/ensemble/CalAdapt_runs/ for 198 design points × 3 GCM/SSP scenarios (CESM2.ssp245, CESM2.ssp370, MPI-ESM1-2-HR.ssp370) × 2025-2045 hourly, if anyone wants to poke at the outputs.
Motivation and Context
Review Time Estimate
Types of changes
Checklist: