
add CalAdapt WRF download function for CMIP6 hourly met data #3967

Open: divine7022 wants to merge 19 commits into PecanProject:develop from divine7022:caladapt-wrf

Conversation

@divine7022 (Member) commented Apr 29, 2026

Description

Adds download.CalAdaptWRF() to PEcAn.data.atmosphere, a new met driver that pulls hourly WRF dynamically downscaled CMIP6 projections from the Cal-Adapt Analytics Engine (WUS-D3 dataset, Rahimi et al. 2024). Data sits on public AWS S3; no authentication needed.

This is implemented as part of CCMMF, where we need future climate forcing at ~200 California sites for SIPNET runs under multiple GCMs and SSPs.

  • Looked at how PEcAn handles met downloads; this follows the same pattern as CRUNCEP/GFDL: the download function does everything (fetch, extract, convert, write CF) in one shot, so we skip the met2CF and extract.nc stages. Main reason: WRF uses a Lambert Conformal grid, and extract.nc / closest_xy assume lat-lon grids with NARR-style bounds, so they can't handle this projection. Adding CalAdaptWRF to the skip list in met.process.R was the right direction.

  • The tricky part is that PEcAn's papply calls download.CalAdaptWRF once per site; it never sees the full site list. Naively that means re-reading the same WRF grid from S3 for every site. The 45 km grid is small (~20-30 MB per var per year), so we cache the full grid as an RDS file in tempdir() on the first site; sites 2 through N just readRDS() it and extract their grid cell locally. For 200 sites x 8 vars x 20 years, that cuts S3 round trips from 32,000 to 160. The cache is cleaned up automatically when R exits.

  • added caladapt_wrf column to pecan_standard_met_table

  • 9 output variables total:

    • direct (no conversion): air_temperature, air_pressure, shortwave, longwave, eastward_wind, northward_wind
    • converted: precipitation_flux (mm/hr -> kg/m2/s), specific_humidity (mixing ratio -> q/(1+q))
    • derived: wind_speed (sqrt(u10^2 + v10^2))
  • data coverage:

    • 8 GCMs under SSP3-7.0: CESM2, CNRM-ESM2-1, EC-Earth3, EC-Earth3-Veg, FGOALS-g3, MPI-ESM1-2-HR, MIROC6, TaiESM1
    • CESM2 also has SSP2-4.5 and SSP5-8.5
    • 1980-2100, hourly, 45 km (d01)
    • 3 models store precip as rainc + rainnc components instead of a single prec field, but caladaptaer handles that transparently
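The three conversions listed above can be sketched as small helpers (an illustrative sketch; the helper names are hypothetical, and the review below suggests housing functions like these in metutils.R):

```r
# Illustrative versions of the three unit conversions described above
# (hypothetical helper names, not the exact PR implementation).

# precipitation: mm/hr -> kg m-2 s-1 (1 mm of water over 1 m^2 is 1 kg)
mmhr2flux <- function(precip_mmhr) precip_mmhr / 3600

# specific humidity from mixing ratio w (kg water vapor / kg dry air)
mixing_ratio2qair <- function(w) w / (1 + w)

# wind speed from 10 m wind components
uv2speed <- function(u10, v10) sqrt(u10^2 + v10^2)
```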

The pipeline that orchestrates the R (caladaptaer) and .sh scripts, along with the data, is at /projectnb/dietzelab/ccmmf/ensemble/CalAdapt_runs/
for 198 design points × 3 GCM/SSP scenarios (CESM2.ssp245, CESM2.ssp370, MPI-ESM1-2-HR.ssp370) × 2025–2045 hourly, if anyone wants to poke at the outputs.
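The per-session caching scheme described above can be sketched as follows (a minimal sketch: the `fetch_grid_from_s3()` helper and the cache file-name scheme are hypothetical stand-ins, not the PR's actual code):

```r
# Sketch of the per-session grid cache. The first call for a given
# model/scenario/var/year fetches from S3 and writes an RDS; later calls
# (sites 2..N) read the cached copy from tempdir().
get_wrf_grid <- function(model, scenario, var, year) {
  cache_file <- file.path(
    tempdir(),
    sprintf("caladapt_%s_%s_%s_%d.rds", model, scenario, var, year)
  )
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))          # local read, no S3 round trip
  }
  grid <- fetch_grid_from_s3(model, scenario, var, year)  # one S3 read per grid
  saveRDS(grid, cache_file)              # tempdir() is deleted when R exits
  grid
}
```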

Motivation and Context

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I agree that PEcAn Project may distribute my contribution under any or all of
    • the same license as the existing code,
    • and/or the BSD 3-clause license.
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@dlebauer (Member) left a comment

This is a great first pass. I've made a few comments; a few key points:

It seems download.CalAdaptWRF() is doing both the download and the conversion/standardization (q2 transformation, derived wind_speed).

It isn't clear whether this is an intentional design choice, but the standard way to handle met ingest is to first download and then standardize, i.e. download.CalAdaptWRF() responsible for downloading and met2CF.CalAdaptWRF() responsible for conversion to PEcAn standard met. That seems more consistent with PEcAn's pattern of deriving standardized variables in the met conversion layer rather than in the data interface itself.

Two other points: 1) it seems like it would be useful to put conversion functions for q = f(q2), wind = f(uwind, vwind), and precip = f(rainc, rainnc) into metutils.R and call those, and 2) I think that the conversion precip = f(rainc, rainnc) can be removed from caladaptaer in order to keep that package's scope to data access (with apologies for previously suggesting otherwise; that was before we were developing a dedicated package intended for the Cal-Adapt AE ecosystem).


Availability: 1980--2100

Notes: CMIP6 dynamically downscaled projections from the Cal-Adapt Analytics Engine (WUS-D3 dataset, Rahimi et al. 2024). Eight GCMs available under SSP3-7.0; CESM2 also has SSP2-4.5 and SSP5-8.5. Data is publicly available on AWS S3 (no authentication required). Requires the `caladaptaer` package from GitHub. To use this option, set `source` as `CalAdaptWRF` and specify `model` and `scenario` in the `met` section of `pecan.xml`:
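A minimal `met` block might look like the following (a sketch: the `source`, `model`, and `scenario` tag names follow the sentence above, but the exact element layout inside `pecan.xml` is an assumption):

```xml
<met>
  <source>CalAdaptWRF</source>
  <model>CESM2</model>
  <scenario>ssp370</scenario>
</met>
```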
add historical reanalysis


Available GCMs: CESM2, CNRM-ESM2-1, EC-Earth3, EC-Earth3-Veg, FGOALS-g3, MPI-ESM1-2-HR, MIROC6, TaiESM1. See `caladaptaer::cae_models(activity = "WRF")` for the current list.

  1. Be more specific, e.g.

     Suggested change
     - Available GCMs: CESM2, CNRM-ESM2-1, EC-Earth3, EC-Earth3-Veg, FGOALS-g3, MPI-ESM1-2-HR, MIROC6, TaiESM1. See `caladaptaer::cae_models(activity = "WRF")` for the current list.
     + Available GCMs: CESM2, CNRM-ESM2-1, EC-Earth3, EC-Earth3-Veg, FGOALS-g3, MPI-ESM1-2-HR, MIROC6, TaiESM1. See `caladaptaer::cae_models(activity = "WRF")` for the current list of available climate models.
  2. What does 'activity' do?
  3. Add a reference to the availability of reanalysis data.

#' WRF grids are cached in tempdir() so that when met.process calls this for
#' multiple sites in the same R session, each grid is only fetched from S3 once.
#' For 200 sites x 8 vars x 20 years that cuts S3 round trips from 32,000 to 160.
#'

add references, including links to docs and Rahimi paper

document availability of downscaled ERA5 reanalysis

#' @param verbose Extra debug output? Default FALSE
#' @param ... further arguments, currently ignored
#'
#' @return invisible data.frame with file info for BETY registration

Suggested change
- #' @return invisible data.frame with file info for BETY registration
+ #' @return invisible data.frame with file information.

I think it can return the standardized table without implying that the only intent is for BETY registration.

#' @param outfolder Directory for storing output
#' @param start_date Start date for met data
#' @param end_date End date for met data
#' @param site_id BETY site id

I think this param can be defined as a unique identifier for the site, without reference to BETYdb.
Is it still necessary/useful to explicitly support BETYdb here?

Comment on lines +213 to +216
lat_dim <- ncdf4::ncdim_def("latitude", "degree_north", lat.in, create_dimvar = TRUE)
lon_dim <- ncdf4::ncdim_def("longitude", "degree_east", lon.in, create_dimvar = TRUE)

These write the site lat/lon rather than the lat/lon from the source data. If we want to support potentially multiple sites mapping to the same met input, should this be handled differently here?

Comment on lines +75 to +76
lat.in <- as.numeric(lat.in)
lon.in <- as.numeric(lon.in)

Is it worthwhile to add a check that these are inside the domain of the dataset? I'm not sure if this is worth the computational cost of a point-in-polygon check, but if there is a bounding box it should be efficient to do lat < maxlat & lat > minlat type checks. If there isn't a bounding box, perhaps caladaptaer should create one. Either way, out-of-domain lat/lon should be handled gracefully.
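A bounding-box check like the one suggested could be sketched as follows (the corner values are rough assumptions for the WUS-D3 d01 domain, not authoritative; the real bounds should come from the dataset or caladaptaer):

```r
# Cheap rectangular domain check before any S3 access. The default bbox
# values are ASSUMED placeholders for the WUS-D3 d01 domain.
check_in_domain <- function(lat, lon,
                            bbox = c(lat_min = 22, lat_max = 58,
                                     lon_min = -140, lon_max = -95)) {
  ok <- lat >= bbox["lat_min"] && lat <= bbox["lat_max"] &&
    lon >= bbox["lon_min"] && lon <= bbox["lon_max"]
  if (!ok) {
    stop(sprintf("Site (%.3f, %.3f) is outside the WRF domain", lat, lon))
  }
  invisible(TRUE)
}
```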

saveRDS(grid, cache_file)
}

# grab time dimension and build the projected point once

Does Cal-Adapt guarantee that, for a given model/scenario/resolution combination, time_vals and grid will be consistent? Is the grid the same across all data at the same resolution?

Comment on lines +58 to +60
if (is.null(model)) model <- "CESM2"
if (is.null(scenario)) scenario <- "ssp370"
if (is.null(resolution)) resolution <- "d01"

Suggested change
- if (is.null(model)) model <- "CESM2"
- if (is.null(scenario)) scenario <- "ssp370"
- if (is.null(resolution)) resolution <- "d01"
+ model <- model %||% "CESM2"
+ scenario <- scenario %||% "ssp370"
+ resolution <- resolution %||% "d01"

#'
#' Fetches hourly WRF dynamically downscaled data from the Cal-Adapt Analytics
#' Engine (CADCAT S3 bucket) via caladaptaer, extracts the nearest grid cell to
#' the site, converts units to CF-1.8, and writes one NetCDF per year.
@dlebauer commented May 14, 2026


To be strictly CF-1.8 compliant, files need a global metadata attribute `Conventions = "CF-1.8"`.
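With ncdf4 this is a one-liner via `ncatt_put()` with `varid = 0` (global attribute); shown here on a throwaway file for illustration, not the PR's actual writer code:

```r
# Write a global Conventions attribute with ncdf4 (varid = 0 means the
# attribute is global rather than attached to a variable).
library(ncdf4)

fn <- tempfile(fileext = ".nc")
time_dim <- ncdim_def("time", "hours since 2025-01-01 00:00:00", 0:23)
tair <- ncvar_def("air_temperature", "K", list(time_dim), missval = -999)
nc <- nc_create(fn, list(tair))
ncatt_put(nc, 0, "Conventions", "CF-1.8")
nc_close(nc)
```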

start_time = year_start,
end_time = year_end
)
saveRDS(grid, cache_file)

Is this safe if run in parallel?
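One common way to make the cache write safe under concurrent workers is write-then-rename, since a rename is atomic on the same filesystem (a sketch under that assumption; `safe_save_rds()` is a hypothetical helper, not code from this PR):

```r
# Write to a unique temp file in the same directory, then rename into
# place, so concurrent readers never see a half-written RDS. If two
# workers race, the last rename wins, but both wrote identical data.
safe_save_rds <- function(object, cache_file) {
  tmp <- tempfile(tmpdir = dirname(cache_file), fileext = ".rds.tmp")
  saveRDS(object, tmp)
  file.rename(tmp, cache_file)
}
```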
