Which historical dataset should I pick?

How to choose the equity and bond return datasets that drive Chance of Success — pick the one that matches what you actually hold.

Last updated May 2026

Why dataset choice matters

The default — US S&P 500 plus US intermediate Treasuries — bakes in 97 years of American market history. That's the most-studied equity market on the planet and a defensible default for US investors. But the US has been an outlier:

Higher real equity returns than the global average since 1900 (~7% real vs ~5% world ex-US, per the DMS Yearbook).
Lower bond volatility than Canada through the early-1980s Volcker disinflation.
A weak USD-CAD correlation that doesn't show up in a US-only model.

If your equity sleeve is in XIC, XIU, ZCN, or any broad-Canadian ETF, simulating it with S&P 500 numbers overstates expected return — Canadian equity has trailed US equity over the past 15 years. If your bond sleeve is in XBB or ZAG, simulating it with US Treasury data understates the Canadian rate regime's volatility.

What's at stake is honest reporting: a 92% success rate computed on the wrong dataset is not the same number as 92% computed on the right one.

The equity datasets

US Large Cap (S&P 500, 1928–2024) — Shiller's canonical US dataset. Pick this for any S&P 500-tracking ETF (VOO, IVV, VFV, or the CAD-hedged VSP). 97 years of monthly data deflated by CPI-U.
International Developed (MSCI EAFE, 1970–2024) — Europe, Australasia, Far East. Pick this for VEA, XEF, ZEA, or any developed-ex-US ETF. Less return-rich than the US, similar volatility.
Emerging Markets (MSCI EM, 1988–2024) — Pick this for VEE, XEC, ZEM. Higher expected return, materially higher volatility — emerging markets are a different risk profile, not a free-lunch return premium.
Global Developed (MSCI World, 1970–2024) — A 65% US / 35% developed-ex-US blend. The right pick if your equity sleeve is XEQT, the equity portion of VGRO (which also holds bonds), or any global-equity ETF.
Canadian Equity (S&P/TSX Composite, 1956–2024) — Pick this for XIC, XIU, ZCN. Materially less volatile than the S&P 500, with a lower real return since 2010.
World ex-US, very long run (DMS, 1900–2023) — Stress-test dataset. Spans two world wars, the Great Depression, and 1970s stagflation. Pick this when you want to know how a plan holds up under century-long worst-case sequences, not which dataset matches your portfolio today.

The bond datasets

US Bonds (Ibbotson intermediate Treasury, 1928–2024) — Default. The long-run academic benchmark, matching the engine's legacy bond parameters exactly.
Canadian Bonds (FTSE Canada Universe, 1980–2024) — Pick this for XBB, ZAG, VAB, or any Canadian bond ETF. Backed by an actual annual return series in the codebase, so its (μ, σ) move when the underlying data updates.
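
To make "its (μ, σ) move when the underlying data updates" concrete, here's a minimal Python sketch of deriving both parameters from an annual return series. The return values are made-up placeholders, not FTSE Canada data, and the estimator (arithmetic mean, sample standard deviation) is an assumption; the engine may compute its parameters differently.

```python
import statistics

# Hypothetical annual total returns as fractions (placeholders, not real data).
annual_returns = [0.08, -0.03, 0.12, 0.05, 0.01, 0.09]

def summarize(returns):
    """Return (mu, sigma): arithmetic mean and sample standard deviation."""
    mu = statistics.fmean(returns)
    sigma = statistics.stdev(returns)  # sample (n - 1) standard deviation
    return mu, sigma

mu, sigma = summarize(annual_returns)
print(f"before update: mu={mu:.4f}, sigma={sigma:.4f}")

# Appending one more (hypothetical) year of data shifts both parameters.
mu_updated, sigma_updated = summarize(annual_returns + [-0.10])
print(f"after update:  mu={mu_updated:.4f}, sigma={sigma_updated:.4f}")
```

A citation-only dataset has no such series behind it, so its parameters stay fixed until someone edits them by hand.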

Pairing equity and bonds

Match the geography of your portfolio:

TFSA / RRSP holding XEQT + XBB → Global Developed equity / Canadian bonds.
Couch potato portfolio (VAB + VCN + VXC) → Canadian equity / Canadian bonds (the VXC slice is global, but most users are bond-heavy and the bond choice dominates).
US-focused portfolio (VFV + XBB) → US Large Cap / Canadian bonds.
Stress test → DMS World ex-US / Canadian bonds. Uses the harshest equity sample on offer to see if the plan holds.
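
The pairings above amount to a lookup from ticker to dataset. Here's a rough Python sketch of that idea. The dataset labels (us_large_cap and so on) and the suggest_datasets helper are hypothetical names for illustration, not the app's identifiers, and the tables cover only some of the tickers named in this article.

```python
from collections import defaultdict

# Ticker-to-dataset tables transcribed from this article (partial).
# The label strings are hypothetical, not the app's real dataset keys.
EQUITY_DATASET = {
    "VOO": "us_large_cap", "IVV": "us_large_cap", "VFV": "us_large_cap",
    "VEA": "intl_developed", "XEF": "intl_developed", "ZEA": "intl_developed",
    "VEE": "emerging_markets", "XEC": "emerging_markets", "ZEM": "emerging_markets",
    "XEQT": "global_developed",
    "XIC": "canadian_equity", "XIU": "canadian_equity", "ZCN": "canadian_equity",
}
BOND_DATASET = {
    "XBB": "canadian_bonds", "ZAG": "canadian_bonds", "VAB": "canadian_bonds",
}

def suggest_datasets(holdings):
    """Pick the equity and bond dataset with the largest combined weight.

    holdings maps ticker -> portfolio weight. Unknown tickers are ignored;
    a side with no recognised holdings comes back as None.
    """
    equity_weight = defaultdict(float)
    bond_weight = defaultdict(float)
    for ticker, weight in holdings.items():
        if ticker in EQUITY_DATASET:
            equity_weight[EQUITY_DATASET[ticker]] += weight
        elif ticker in BOND_DATASET:
            bond_weight[BOND_DATASET[ticker]] += weight
    equity = max(equity_weight, key=equity_weight.get) if equity_weight else None
    bond = max(bond_weight, key=bond_weight.get) if bond_weight else None
    return equity, bond

print(suggest_datasets({"XEQT": 0.8, "XBB": 0.2}))
print(suggest_datasets({"VFV": 0.6, "XBB": 0.4}))
```

Picking by largest weight is a simplification: a portfolio split evenly across regions has no single "matching" dataset, and the choice is still yours to make.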

You can mix at will. The simulation correlates returns across equity and bonds using a single shared correlation matrix (-0.05 across all pairs today; the per-dataset correlation work is tracked separately).
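
For a feel of what a -0.05 correlation means in practice, here's a minimal Python sketch that draws one year of equity and bond returns with that correlation. It assumes normally distributed returns and uses a two-asset Cholesky step; the article doesn't say which distribution the engine samples from, and the (μ, σ) values below are placeholders.

```python
import math
import random

RHO = -0.05  # equity/bond correlation quoted in this article

def correlated_normals(rng, rho):
    """Draw two standard normals with correlation rho (2x2 Cholesky factor)."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2

def sample_year(rng, equity, bond, rho=RHO):
    """equity and bond are (mu, sigma) pairs; returns (equity_return, bond_return)."""
    e, b = correlated_normals(rng, rho)
    return equity[0] + equity[1] * e, bond[0] + bond[1] * b

def sample_correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(42)
# Placeholder (mu, sigma) pairs, not values from the dataset registry.
draws = [sample_year(rng, (0.06, 0.17), (0.02, 0.07)) for _ in range(100_000)]
equity_returns, bond_returns = zip(*draws)
estimated_rho = sample_correlation(equity_returns, bond_returns)
print(f"estimated correlation: {estimated_rho:.3f}")
```

At -0.05 the two series are close to independent, so bonds cushion an equity drawdown only slightly in this setup.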

What this changes about the success number

Switching from "default US/US" to "Canadian/Canadian" on the same plan typically:

Slightly lowers expected portfolio growth (TSX < S&P 500 since 1980).
Slightly widens the percentile bands (Canadian bonds are more volatile than US Treasuries in the 1980s sample).
Often shifts the success number by 2–5 percentage points. Sometimes more if the plan is fragile.

If the success number drops materially after switching datasets, that's the right signal: your prior estimate was crediting your holdings with US historical returns they did not earn.
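
To see how a small change in (μ, σ) moves the success number, here's a toy Monte Carlo in Python. It is not the engine's simulation: it assumes i.i.d. normal real returns on the whole portfolio and constant real spending, and every number in it (starting balance, spending, both parameter pairs) is a hypothetical placeholder.

```python
import random

def success_rate(mu, sigma, start=1_000_000.0, spend=45_000.0,
                 years=30, trials=5_000, seed=7):
    """Share of simulated paths that fund every year of spending.

    Toy model: real returns are i.i.d. normal(mu, sigma) and spending is
    constant in real terms. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed: every call sees the same shocks
    survived = 0
    for _ in range(trials):
        # Draw all shocks up front so a failed path consumes as many random
        # numbers as a surviving one, which keeps separate calls comparable.
        shocks = [rng.gauss(0.0, 1.0) for _ in range(years)]
        balance = start
        funded = True
        for z in shocks:
            if balance < spend:
                funded = False
                break
            growth = max(mu + sigma * z, -1.0)  # cannot lose more than 100%
            balance = (balance - spend) * (1.0 + growth)
        if funded:
            survived += 1
    return survived / trials

# Hypothetical portfolio-level (mu, sigma) pairs, NOT values from the registry.
us_us = success_rate(mu=0.045, sigma=0.10)
cad_cad = success_rate(mu=0.040, sigma=0.11)
print(f"US/US: {us_us:.1%}  Canadian/Canadian: {cad_cad:.1%}  "
      f"shift: {(cad_cad - us_us) * 100:+.1f} points")
```

Reusing the same seed for both runs means the difference comes from the parameters rather than from sampling noise.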

Where to find the numbers

Every dataset chip in the Methodology dropdown shows its source URL inline (Shiller, MSCI factsheets, FTSE Russell, UBS / DMS Yearbook). The Methodology page at /app/methodology lists every dataset with its period, resolution, and citation in one place.

When a source publishes updated data, the registry updates automatically for array-backed datasets (Shiller, FTSE Canada bonds). Citation-only datasets (MSCI, DMS) are updated on a manual cadence.

Inflation by region

The dataset choice covers the return side. The plan's inflation rate is a separate input (plan.globalInflationRate) that deflates nominal returns and grows expenses. The Methodology page lists the published long-run inflation rates by region so you can plug in the right number for the currency you spend in:

US CPI-U (1913–2024) — long-run arithmetic mean ~3.1%, post-1990 mean ~2.5%. Default if your expenses are in USD.
Canadian CPI (1915–2024) — long-run mean ~3.0%, post-1991 (Bank of Canada targeting era) mean ~1.9%. Use ~2.0% for forward-looking Canadian plans.
World inflation (DMS, 1900–2023) — long-run mean ~4.4%. Elevated by hyperinflation regimes (Weimar, 1990s former-Soviet, ongoing emerging-market shocks). Use for stress tests, or if your expenses are spread across many currencies.

Pick the inflation rate that matches the currency your expenses are denominated in, not the dataset region — a Canadian living in Canada holding XEQT should pair Canadian inflation (~2.0%) with the Global Developed equity dataset, because their bills land in CAD.

The engine treats inflation as a single scalar today. Per-year stochastic inflation sampling (paired with correlated real-return draws) is tracked as future work.
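
Here's a short Python sketch of what a single scalar inflation rate does on each side of the plan: it deflates a nominal return into a real one and compounds expenses forward. The function names are hypothetical, and the exact (1 + nominal) / (1 + inflation) form is an assumption; the article doesn't say whether the engine uses it or the simpler "nominal minus inflation" approximation.

```python
def real_return(nominal, inflation):
    """Deflate a nominal return by inflation using the exact Fisher relation."""
    return (1.0 + nominal) / (1.0 + inflation) - 1.0

def expense_in_year(base_expense, inflation, year):
    """Nominal expense after `year` years of compounding at a constant rate."""
    return base_expense * (1.0 + inflation) ** year

# Stands in for plan.globalInflationRate; 2.0% is the forward-looking
# Canadian figure suggested in this article.
global_inflation_rate = 0.02

# A hypothetical 7% nominal return deflated by the scalar inflation rate.
print(f"real return: {real_return(0.07, global_inflation_rate):.4%}")

# Hypothetical 50,000 of annual expenses in today's dollars, ten years out.
print(f"year-10 expense: {expense_in_year(50_000.0, global_inflation_rate, 10):,.0f}")
```

Because the rate is one number for the whole plan, every simulated year sees the same inflation; only the returns vary from path to path.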

Still stuck? Email support@havenfinance.app.