A production-grade C implementation of the PCMCI+ algorithm with Python bindings. Sub-millisecond causal graph discovery with SIMD, OpenMP, and Intel MKL acceleration.
PCMCI+ (Runge, 2020) discovers causal relationships in multivariate time series by combining:
- PC algorithm for skeleton discovery (removes spurious correlations)
- MCI (Momentary Conditional Independence) for robust causal strength estimation
This implementation achieves a ~100x speedup over the reference Python implementation (Tigramite) while providing multiple independence tests for both linear and nonlinear dependencies.
| Test | Function | Detects | Speed | Use Case |
|---|---|---|---|---|
| Partial Correlation | parcorr_test() | Linear | < 1 ms | Default, fast screening |
| CMI | cmi_test() | Nonlinear | ~1 ms | KSG estimator, any dependency |
| Distance Correlation | dcor_test() | Any | ~10 ms | Zero iff independent |
| GPDC | gpdc_test() | Nonlinear conditional | ~10 s | GP regression + dCor |
- Intel MKL / OpenBLAS acceleration
- OpenMP parallelization (skeleton discovery, CI tests)
- Cholesky fast-path for residualization (2-4x faster than QR; see the sketch below)
- Lock-free parallel skeleton eliminates thread contention
- Lazy p-value computation skips expensive calculations for weak correlations
- 64-byte aligned memory for SIMD operations
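The Cholesky fast-path amounts to residualizing X and Y on the conditioning set Z by solving the normal equations with a single Cholesky factorization rather than a QR decomposition. Below is a minimal NumPy/SciPy sketch of that idea, not the library's C code; `parcorr_cholesky` is a hypothetical helper named only for illustration:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def parcorr_cholesky(x, y, Z):
    """Illustrative helper (not part of the library): partial correlation of
    x and y given the columns of Z (shape (T, k)), residualizing via the
    normal equations + Cholesky instead of QR."""
    Z = np.column_stack([np.ones(len(x)), Z])   # add an intercept column
    G = Z.T @ Z                                  # small (k+1, k+1) Gram matrix
    c, low = cho_factor(G)                       # one factorization, reused twice
    rx = x - Z @ cho_solve((c, low), Z.T @ x)    # residual of x on Z
    ry = y - Z @ cho_solve((c, low), Z.T @ y)    # residual of y on Z
    return np.corrcoef(rx, ry)[0, 1]
```

One factorization of the small Gram matrix is reused for both residualizations, which is the usual reason this route beats QR when the conditioning set is small.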
| Variables | Samples | Runtime (8 cores) |
|---|---|---|
| 5 | 500 | ~1 ms |
| 5 | 5000 | ~25 ms |
| 10 | 1000 | ~180 ms |
| 20 | 500 | ~110 ms |
Intel i9-14900KF, MKL 2025, τ_max=3
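These numbers can be sanity-checked on your own hardware with the documented run_pcmci call. Absolute timings depend on the CPU, BLAS backend, and thread settings, and the first call may include one-time initialization, so the sketch below times a second run:

```python
import time
import numpy as np
import pcmci

rng = np.random.default_rng(0)
data = rng.standard_normal((5, 500))            # (n_vars, T), matching the first table row

pcmci.run_pcmci(data, tau_max=3, alpha=0.05)    # warm-up run (library/thread init)
t0 = time.perf_counter()
pcmci.run_pcmci(data, tau_max=3, alpha=0.05)    # timed run
print(f"5 vars, 500 samples, tau_max=3: {(time.perf_counter() - t0) * 1e3:.1f} ms")
```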
- CMake 3.16+
- C11 compiler (GCC, Clang, MSVC, or Intel ICX)
- Intel oneAPI (MKL + Compiler) or OpenBLAS
- Python 3.8+ with NumPy (for bindings)
```
# Open Intel oneAPI Command Prompt
cd PCMCI-Plus
mkdir build && cd build
cmake .. -G "Ninja" -DCMAKE_BUILD_TYPE=Release
cmake --build .
copy pcmci.dll ..\python\
```

```bash
# With Intel MKL
source /opt/intel/oneapi/setvars.sh
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# With OpenBLAS
sudo apt install libopenblas-dev liblapacke-dev
cmake .. -DCMAKE_BUILD_TYPE=Release -DUSE_OPENBLAS=ON
make -j$(nproc)
```

```bash
# macOS (Homebrew)
brew install openblas libomp
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(sysctl -n hw.ncpu)
```

See python/market_analysis.ipynb for a complete walkthrough with real market data, or python/visual_examples.py for visual demonstrations.
```python
import numpy as np
import pcmci

# Causal discovery
data = np.random.randn(5, 1000)  # (n_vars, T)
result = pcmci.run_pcmci(data, tau_max=3, alpha=0.05)

for link in result.significant_links:
    print(f"X{link.source_var}(t-{link.tau}) → X{link.target_var}(t): r={link.val:.3f}")

# Independence tests (X, Y, Z are 1-D sample arrays)
r, p = pcmci.parcorr_test(X, Y, Z)            # Linear, conditional
result = pcmci.cmi_test(X, Y, Z, n_perm=100)  # Nonlinear, conditional
dc = pcmci.dcor(X, Y)                         # Distance correlation
```

See the examples/ folder for complete examples:
| Example | Description |
|---|---|
| example_basic.c | Minimal PCMCI+ usage |
| example_synthetic.c | Known ground truth validation |
| example_cmi.c | CMI nonlinear detection |
| example_gpdc.c | GPDC conditional independence |
#include "pcmci.h"
pcmci_dataframe_t* df = pcmci_dataframe_create(data, n_vars, T, tau_max);
pcmci_config_t config = pcmci_default_config();
config.alpha_level = 0.05;
pcmci_result_t* result = pcmci_run(df, &config);
// ... use result->graph, result->n_links ...
pcmci_result_free(result);
pcmci_dataframe_free(df);PCMCI+ answers the question: "Which variables causally influence which others, and at what time lag?"
Given a multivariate time series (e.g., 5 sensors measured over 1000 time points), PCMCI+ outputs a causal graph showing directed edges like:
Temperature(t-2) → Pressure(t) strength=0.45, p<0.001
Humidity(t-1) → Temperature(t) strength=0.32, p<0.01
This is fundamentally different from correlation matrices, which only show undirected associations and cannot distinguish:
- Causation from correlation
- Direct effects from indirect effects
- True relationships from confounded ones
Phase 1: Skeleton Discovery (PC-stable algorithm)
Start with a fully connected graph where every variable at every lag could potentially cause every other variable. Then systematically remove false links:
Initial: All possible links exist
X(t-1)→Y, X(t-2)→Y, X(t-3)→Y, Z(t-1)→Y, Z(t-2)→Y, ...
Step 1: Test each link unconditionally
Remove if X(t-τ) ⊥ Y(t) [p > α]
Step 2: Test remaining links conditioning on 1 variable
Remove if X(t-τ) ⊥ Y(t) | Z(t-1) [p > α]
Step 3: Condition on 2 variables, then 3, ...
Continue until no more links removed
Result: Sparse skeleton with only potentially causal links
The "PC-stable" variant ensures results don't depend on variable ordering.
Phase 2: Momentary Conditional Independence (MCI)
The skeleton may still contain some false positives. MCI refines each link by conditioning on the parents of both endpoints:
For link X(t-τ) → Y(t):
Parents of X: {W(t-1), V(t-2)} # Things that cause X
Parents of Y: {X(t-τ), Z(t-1)} # Things that cause Y (excluding X)
Test: X(t-τ) ⊥ Y(t) | {W(t-1), V(t-2), Z(t-1)}
This removes confounding from:
- Common causes of X and Y
- Indirect paths through other variables
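As a concrete illustration, the sketch below builds the conditioning set for one link X(t-2) → Y(t) by hand and runs the documented parcorr_test on it. The index bookkeeping and the (n, 1) shape assumed for the conditioning array are conventions of this example only:

```python
import numpy as np
import pcmci

# Ground truth: W(t-1) -> X(t) and X(t-2) -> Y(t).
# W is a parent of X, so the MCI conditioning set for this link contains W(t-3).
rng = np.random.default_rng(1)
T = 1000
W = rng.standard_normal(T)
X = 0.6 * np.concatenate(([0.0], W[:-1])) + 0.5 * rng.standard_normal(T)
Y = 0.5 * np.concatenate(([0.0, 0.0], X[:-2])) + 0.5 * rng.standard_normal(T)

t0 = 3                                  # drop the first tau_max samples
Xlag = X[t0 - 2 : T - 2]                # X(t-2)
Yt = Y[t0:]                             # Y(t)
Z = W[t0 - 3 : T - 3].reshape(-1, 1)    # W(t-3); (n, 1) shape assumed for Z

r, p = pcmci.parcorr_test(Xlag, Yt, Z)  # documented linear conditional test
print(f"X(t-2) -> Y(t) given W(t-3): r={r:.3f}, p={p:.3g}")
```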
Problem 1: Confounding
True structure: Z → X
Z → Y
(no X→Y link)
Correlation sees: X ↔ Y (spurious!)
PCMCI+ sees: X ⊥ Y | Z (correct: no direct link)
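This scenario can be reproduced with synthetic data and the documented run_pcmci API: Z drives both X and Y at lag 1 with no direct X → Y link. The plain correlation is clearly nonzero, and the expectation (not a guaranteed output) is that only the Z → X and Z → Y links are reported:

```python
import numpy as np
import pcmci

rng = np.random.default_rng(2)
T = 1000
Z = rng.standard_normal(T)
X = 0.7 * np.concatenate(([0.0], Z[:-1])) + 0.5 * rng.standard_normal(T)
Y = 0.7 * np.concatenate(([0.0], Z[:-1])) + 0.5 * rng.standard_normal(T)

print("corr(X, Y) =", round(float(np.corrcoef(X, Y)[0, 1]), 2))  # spurious, clearly nonzero

data = np.stack([X, Y, Z])              # (n_vars, T); vars: 0=X, 1=Y, 2=Z
result = pcmci.run_pcmci(data, tau_max=2, alpha=0.05)
for link in result.significant_links:
    print(f"X{link.source_var}(t-{link.tau}) -> X{link.target_var}(t): "
          f"r={link.val:.2f}, p={link.pval:.3g}")
# Expected (not guaranteed): links 2 -> 0 and 2 -> 1 at lag 1, no direct 0/1 link.
```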
Problem 2: Autocorrelation
Time series are serially correlated: X(t) ≈ X(t-1) ≈ X(t-2)
Naive correlation: Everything correlates with everything
PCMCI+: Conditions on past values, tests only "new" information
Problem 3: Indirect Effects
True structure: A(t-1) → B(t) → C(t+1)
Correlation sees: A correlates with C
PCMCI+ sees: A(t-1) → B(t) → C(t+1), no direct A→C
Problem 4: Multiple Lags
Question: Does X affect Y? At what delay?
PCMCI+ tests each lag separately:
X(t-1) → Y(t): p=0.73 (not significant)
X(t-2) → Y(t): p=0.002 (significant!)
X(t-3) → Y(t): p=0.45 (not significant)
Answer: X causes Y with a 2-step delay
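The same question can be checked numerically: in the sketch below X influences Y only at lag 2, and with the documented run_pcmci and link attributes the expectation (not a guaranteed output) is that only the τ = 2 link survives:

```python
import numpy as np
import pcmci

rng = np.random.default_rng(3)
T = 1000
X = rng.standard_normal(T)
Y = 0.5 * np.concatenate(([0.0, 0.0], X[:-2])) + rng.standard_normal(T)  # lag-2 effect only

data = np.stack([X, Y])                 # var 0 = X, var 1 = Y
result = pcmci.run_pcmci(data, tau_max=3, alpha=0.05)
for link in result.significant_links:
    if link.source_var == 0 and link.target_var == 1:
        print(f"X(t-{link.tau}) -> Y(t): r={link.val:.2f}, p={link.pval:.3g}")
# Expected (not guaranteed): only the tau = 2 link is reported.
```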
The core operation is testing conditional independence: X ⊥ Y | Z
| Test | Method | Strengths | Limitations |
|---|---|---|---|
| Partial Correlation | Residualize X,Y on Z, correlate residuals | Fast, well-understood | Linear only |
| CMI | k-NN entropy estimation (KSG) | Detects nonlinear, consistent | Slower, needs more data |
| Distance Correlation | Centered distance matrices | Detects any dependence | No conditional version |
| GPDC | GP regression + distance correlation | Nonlinear conditional | O(n³), slow |
When to use which:
- Start with partial correlation (default) — fast, works for most cases
- Use CMI if you suspect nonlinear relationships (Y = X², Y = sin(X)); see the sketch below
- Use GPDC for nonlinear confounding (rare, expensive)
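A quick check of this guidance, using NumPy and the documented pcmci.dcor: for Y = X² the Pearson correlation is near zero while the distance correlation clearly flags the dependence. cmi_test (listed above) would be the analogous choice when a conditioning set is needed:

```python
import numpy as np
import pcmci

rng = np.random.default_rng(4)
X = rng.standard_normal(2000)
Y = X**2 + 0.1 * rng.standard_normal(2000)     # purely nonlinear dependence

print("Pearson r:", round(float(np.corrcoef(X, Y)[0, 1]), 3))  # ~0: a linear test misses it
print("dCor     :", round(float(pcmci.dcor(X, Y)), 3))          # clearly > 0: dependence found
```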
tau_max (maximum lag)
- How far back in time to look for causes
- Higher = more thorough but slower (tests grow as τ²)
- Domain knowledge helps: daily data with weekly cycles → τ_max=7 (see the example below)
alpha (significance level)
- Threshold for keeping/removing links
- Lower α = fewer false positives, may miss weak true links
- Higher α = more links detected, more false positives
- Default 0.05 is usually reasonable; use 0.01 for stricter control
max_cond_dim (maximum conditioning set size)
- Limits how many variables to condition on simultaneously
- -1 = automatic (up to number of parents)
- Lower values = faster but may miss some confounders
- Usually not needed unless you have many variables (>20)
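Putting these knobs together, a small hedged example for daily data with a weekly cycle, using only the documented tau_max and alpha keyword arguments of run_pcmci:

```python
import numpy as np
import pcmci

data = np.random.randn(6, 2000)   # (n_vars, T) placeholder data
result = pcmci.run_pcmci(
    data,
    tau_max=7,    # weekly cycle in daily data
    alpha=0.01,   # stricter control, fewer false positives
)
n_links = sum(1 for _ in result.significant_links)
print(n_links, "significant links at alpha=0.01")
```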
Output interpretation:

```python
# A significant link means:
# "After controlling for all other potential causes,
#  X(t-τ) still predicts Y(t) beyond chance"
for link in result.significant_links:
    # link.val:  partial correlation in [-1, 1]
    #   +0.5 = strong positive effect
    #   -0.3 = moderate negative effect
    #    ~0  = no direct effect
    # link.pval: probability this is a false positive
    #   <0.001 = very confident
    #   <0.05  = significant
    #   >0.05  = not significant (link removed)
    ...
```

Finance & Economics
- Volatility spillovers between markets (does US stress predict EU stress?)
- Lead-lag relationships for trading signals
- Macroeconomic causal chains (rates → inflation → employment)
Neuroscience
- Effective connectivity between brain regions
- Neural information flow from fMRI/EEG
- Stimulus → response causal pathways
Climate Science
- Teleconnections (El Niño → regional weather patterns)
- Attribution of extreme events
- Ocean-atmosphere interactions
Engineering & IoT
- Root cause analysis in sensor networks
- Fault propagation in industrial systems
- Predictive maintenance (which sensor predicts failures?)
Biology
- Gene regulatory network inference
- Metabolic pathway discovery
- Ecological food web dynamics
- Assumes causal sufficiency: All common causes must be measured. Unmeasured confounders can create false links.
- Assumes stationarity: The causal structure shouldn't change over time. For regime changes, use rolling windows.
- Assumes no instantaneous effects at sub-sampling resolution: If X→Y happens faster than your sampling rate, it appears as lag-0 (contemporaneous).
- Faithfulness assumption: Statistical independence implies causal independence. Pathological cancellations can violate this (rare in practice).
- Sample size requirements: Need enough data for reliable conditional independence tests. Rule of thumb: T > 50 × (number of variables) × τ_max (e.g., 5 variables with τ_max=3 calls for T > 750 samples).
| Method | Handles Lags | Handles Confounding | Nonlinear | Speed |
|---|---|---|---|---|
| Correlation | ❌ | ❌ | ❌ | ⚡ |
| Granger Causality | ✅ | ❌ | ❌ | ⚡ |
| VAR models | ✅ | ❌ | ❌ | ⚡ |
| Transfer Entropy | ✅ | Partial | ✅ | 🐢 |
| CCM (Convergent Cross Mapping) | ✅ | ✅ | ✅ | 🐢 |
| PCMCI+ | ✅ | ✅ | Optional | ⚡ |
PCMCI+ combines the speed of linear methods with the robustness of constraint-based causal discovery.
| Problem | Correlation | PCMCI+ |
|---|---|---|
| Confounding (X ← Z → Y) | Shows spurious X↔Y | Correctly finds X ⊥ Y given Z |
| Autocorrelation | Inflated significance | Conditions on past values |
| Nonlinear dependencies | Misses Y = X² | CMI/dCor detect it |
| Multiple lags | Which lag matters? | Tests each τ separately |
PCMCI-Plus/
├── include/
│ ├── pcmci.h # Main API
│ ├── pcmci_cmi.h # CMI (Conditional Mutual Information)
│ ├── pcmci_gpdc.h # GPDC (GP Distance Correlation)
│ └── pcmci_internal.h # Internal functions
├── src/
│ ├── pcmci.c # Main algorithm
│ ├── parcorr.c # Partial correlation
│ ├── skeleton.c # PC-stable skeleton
│ ├── mci.c # MCI phase
│ ├── cmi.c # CMI with KD-tree
│ └── gpdc.c # GPDC with GP regression
├── python/
│ ├── pcmci.py # High-level API
│ ├── pcmci_bindings.py # ctypes bindings
│ ├── visualize.py # Plotting utilities
│ ├── market_analysis.ipynb # Real data demo
│ ├── extended_analysis.ipynb # Advanced analysis
│ └── visual_examples.py # Showcase visualizations
├── examples/
│ ├── example_basic.c
│ ├── example_synthetic.c
│ ├── example_cmi.c
│ └── example_gpdc.c
├── CMakeLists.txt
└── pcmci.def # Windows DLL exports
```bash
export MKL_NUM_THREADS=8
export OMP_NUM_THREADS=8
export KMP_AFFINITY=granularity=fine,compact,1,0
```

```python
result = pcmci.run_pcmci(
    data,
    tau_max=5,             # Maximum lag
    alpha=0.05,            # Significance level
    var_names=['A','B'],   # Variable names
    use_spearman=True,     # Robust to outliers
    n_threads=0,           # 0 = auto-detect
    fdr_method=1,          # Benjamini-Hochberg
)
```

PCMCI+ Algorithm:
Runge, J. (2020). Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets. UAI 2020, PMLR 124:1388-1397.
CMI Estimator (KSG):
Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6), 066138.
Distance Correlation:
Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769-2794.
Tigramite (Reference Implementation):
https://github.com/jakobrunge/tigramite
GNU General Public License v3.0 — see LICENSE.
- Jakob Runge for the PCMCI+ algorithm and Tigramite
- Intel MKL for high-performance linear algebra
- OpenMP for parallelization