Combined Forecast + Threshold Backtest
Can 1,200+ suburb-level threshold features improve the existing capital growth forecast? Walk-forward backtested from 2013 to 2024.

1. Data Flow Diagram
The combined model merges three data systems. The existing Random Forest forecast produces suburb-level growth predictions. Univariate threshold scores from 1,221 suburb-level metrics and 137 synthetic composite indices provide additional features. All three are combined through walk-forward backtesting.

End-to-end data pipeline from raw sources through feature assembly to backtested results
Pipeline Stage Details
| Stage | Source | Input | Output | Description |
|---|---|---|---|---|
| 1. Forecast Data | fj.parquet | 4,268,395 | 1,366,457 | Filter to house 2yr, SAL-type markets, non-excluded suburbs. 6,492 unique suburbs across 216 months. |
| 2. Feature Selection | fields_ttests.parquet + selected_fields.parquet | 21,850 t-test rows | 179 features selected | 42 univariate fields from t-test significance (p < 0.05) and production selection. 137 synthetic composite fields. |
| 3. Feature Assembly | 1,221 thresh_ts + 274 synth_figures parquets | ~192K-2.1M per file | 1,614,841 x 221 | Load value and rank columns. Forward-fill sparse census features (3 dates) to monthly. Join with forecast predictions. |
| 4. Walk-Forward Backtest | Merged feature matrix | 1,366,457 x 226 | 674,868 test predictions | Two training windows (2011, 2016). Correlation filter, RF importance filter (top 50), then RandomForest(depth=6, n=200). |
The pipeline processes 4.3 million forecast rows, filtering to 1.37 million house 2-year predictions across 6,492 suburbs. It loads 42 univariate and 137 synthetic threshold features, forward-fills sparse census data (available at only 3 dates: 2011, 2016, 2021), and assembles a 221-column feature matrix for walk-forward backtesting.
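The forward-fill step in stage 3 can be sketched as follows. This is a minimal pandas illustration, not the pipeline's actual code: the long-format input and the column names `suburb`, `date`, and `census_val` are assumptions, and the census dates used in any example are placeholders for the three census years.

```python
import pandas as pd

def forward_fill_to_monthly(census: pd.DataFrame, months: pd.DatetimeIndex) -> pd.DataFrame:
    """Expand sparse per-suburb census observations onto a monthly grid.

    `census` is assumed to have columns suburb, date, census_val (hypothetical names).
    Each month takes the most recent census value at or before it. Months before a
    suburb's first census stay NaN, so no future value leaks backward in time.
    """
    # One column per suburb, one row per census date.
    wide = census.pivot(index="date", columns="suburb", values="census_val")
    # Union with the monthly grid so census dates off the grid still seed the fill.
    full_index = wide.index.union(months)
    # Forward-fill along time, then keep only the requested months.
    return wide.reindex(full_index).ffill().reindex(months)
```

Filling forward only (never backward) matters for the backtest: a 2016 census value should not be visible to a model trained on data up to 2011.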
2. Overall Performance Comparison

Top-N alpha: outperformance vs national average for both models
The original forecast model produces higher alpha at the top of the ranking (top 5 and top 20 picks). However, the combined model achieves a higher overall prediction correlation with actual growth (0.240 vs 0.213). This suggests that the threshold features improve prediction accuracy across the full set of suburbs, while the original model is better at identifying the extreme outperformers. The original forecast remains the superior tool for picking top suburbs.
3. Rolling 12-Month Alpha

Rolling 12-month top-20 alpha: original forecast vs combined model
The original forecast (blue) sits above the combined model (dark blue) in most periods. Both models track similar trends, suggesting that the threshold features capture the same underlying market dynamics but with less precision at the extreme tails.
4. Decile Performance

Mean actual growth vs national average by prediction decile (1 = best predicted)
| Decile | Original | Combined |
|---|---|---|
| 1 | +2.74% | +2.50% |
| 2 | +1.16% | +1.77% |
| 3 | +0.65% | +1.19% |
| 4 | +0.26% | +0.65% |
| 5 | -0.10% | +0.21% |
| 6 | -0.31% | -0.21% |
| 7 | -0.52% | -0.65% |
| 8 | -0.70% | -1.20% |
| 9 | -1.11% | -1.73% |
| 10 | -2.07% | -2.48% |
The decile chart reveals a key pattern. The combined model has a steeper gradient from top to bottom (+2.50% to -2.48%, a spread of 4.98 points), while the original model has a flatter but higher-at-top profile (+2.74% to -2.07%, a spread of 4.81 points). The combined model better separates good from bad suburbs in the middle deciles (for example +1.77% vs +1.16% in decile 2 and -1.73% vs -1.11% in decile 9), but the original model picks slightly better at the very top.
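A decile table of this kind could be computed along the following lines. This is a hedged sketch: it pools all observations into one ranking and reports the plain mean of actual relative growth per decile, both of which are assumptions, since the report does not say whether deciles are formed per month or how the per-decile mean is transformed. The names `pred` and `actual` are hypothetical.

```python
import numpy as np
import pandas as pd

def decile_table(pred, actual) -> pd.Series:
    """Mean actual growth by prediction decile (1 = highest predicted growth)."""
    df = pd.DataFrame({
        "pred": np.asarray(pred, dtype=float),
        "actual": np.asarray(actual, dtype=float),
    })
    # Rank descending so the best prediction gets rank 1; method="first"
    # breaks ties by order of appearance so the ten bins are equal-sized.
    rank = df["pred"].rank(ascending=False, method="first")
    df["decile"] = pd.qcut(rank, 10, labels=list(range(1, 11))).astype(int)
    return df.groupby("decile")["actual"].mean()
```

A model with useful ranking power should produce values that fall steadily from decile 1 to decile 10, as both columns in the table above do.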
5. Feature Importance (Residual Model)
The residual model predicts what the forecast gets wrong. These features explain the gap between forecast predictions and actual outcomes.

Top features by Random Forest importance in the residual prediction model
| Feature | Importance |
|---|---|
| census_ferry_rnk | 0.4625 |
| prop_list_price_0_5_growth_3yr_rent_house | 0.2089 |
| prop_list_price_0_5_rent_house | 0.1095 |
| prop_list_price_0_5_growth_10yr_buy_house | 0.0938 |
| prop_list_price_0_75_growth_1yr_buy_house | 0.0377 |
| census_overseas_born_parents_val | 0.0151 |
| census_public_housing | 0.0127 |
| census_males_labour_force_15_19 | 0.0105 |
| census_speaks_indo_aryan_languages | 0.0043 |
| prop_list_price_0_5_buy_house | 0.0033 |
The census ferry rank feature dominates (0.46 importance in window 2). This percentile rank of ferry usage within each capital city region likely proxies for harbour-adjacent suburbs in Sydney and Melbourne. The next most important features relate to rental growth (3-year rent growth, current rent levels) and long-term price history (10-year buy growth). These threshold features appear to capture fundamentals that the hedonic-based forecast model misses.
6. Cumulative Excess Growth

$100K invested in top-20 picks: excess growth above national average
7. Year-by-Year Comparison
| Year | Original Top 20 | Combined Top 20 | Original Top 5 | Combined Top 5 |
|---|---|---|---|---|
| 2013 | +5.3% | +5.7% | +7.0% | +5.9% |
| 2014 | +8.6% | +8.3% | +10.7% | +10.8% |
| 2015 | +7.9% | +5.8% | +7.9% | +6.9% |
| 2016 | +4.9% | +2.8% | +12.5% | +7.5% |
| 2018 | +8.1% | +5.0% | +11.0% | +7.4% |
| 2019 | +3.9% | +3.6% | +4.5% | +3.8% |
| 2020 | +4.0% | +4.2% | +5.8% | +1.9% |
| 2021 | +10.5% | +7.6% | +14.9% | +12.0% |
| 2022 | +5.9% | +6.2% | +5.8% | +6.9% |
| 2023 | +7.3% | +4.7% | +9.7% | +6.3% |
| 2024 | +6.1% | +4.9% | +8.6% | +8.1% |
8. Walk-Forward Window Results
| Window | Train rows | Test rows | Features | OOB R² | Corr (Orig) | Corr (Comb) | Alpha (Orig) | Alpha (Comb) |
|---|---|---|---|---|---|---|---|---|
| Train to 2011-03, test 2013-03 to 2016-02 | 233,137 | 227,481 | 24 | 0.1136 | 0.2046 | 0.2289 | 7.32% | 6.41% |
| Train to 2016-03, test 2018-03 to 2024-02 | 612,159 | 426,600 | 36 | 0.1796 | 0.2181 | 0.2474 | 6.88% | 5.23% |
The residual model predicts forecast errors using threshold features only (no access to the forecast prediction itself). It then adjusts the original forecast up or down based on what it learns. Out-of-bag (OOB) scores of 0.11-0.18 indicate that threshold features explain 11-18% of the variance in forecast errors. This is a meaningful signal that could be used to flag suburbs where the forecast may be too optimistic or too pessimistic.
Methodology
Walk-forward backtesting with two training windows (train to 2011-03 and 2016-03). Features selected within each training window to prevent look-ahead bias. Missing values imputed with training set medians only.
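The two windows can be expressed as date-based splits. The sketch below takes the train-end and test-period months from the window results table; treating the boundary months as inclusive and the `date` column name are assumptions. The two-year gap between the end of training and the start of testing presumably reflects the 2-year growth horizon, though the report does not state this.

```python
import pandas as pd

# Window boundaries as reported in the walk-forward results table.
WINDOWS = [
    {"train_end": "2011-03", "test_start": "2013-03", "test_end": "2016-02"},
    {"train_end": "2016-03", "test_start": "2018-03", "test_end": "2024-02"},
]

def walk_forward_splits(df: pd.DataFrame, date_col: str = "date"):
    """Yield (window, train, test) frames; boundaries are inclusive (assumption)."""
    months = df[date_col].dt.to_period("M")
    for w in WINDOWS:
        train = df[months <= pd.Period(w["train_end"], "M")]
        test = df[(months >= pd.Period(w["test_start"], "M"))
                  & (months <= pd.Period(w["test_end"], "M"))]
        yield w, train, test
```

Feature selection and median imputation would then be fitted on each `train` frame only and applied unchanged to the matching `test` frame.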
Feature selection: (1) drop features with more than 50% missing values, (2) correlation filter removing one feature from each pair with |r| greater than 0.95, (3) RF importance filter keeping the top 50 features.
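Steps (1) and (2) could look like the sketch below. Which member of a highly correlated pair is dropped is not specified in the report; here the earlier column is kept, which is an assumption. The function name `prefilter_features` is hypothetical, and step (3), the RF importance ranking, is left out to keep the example short.

```python
import pandas as pd

def prefilter_features(X: pd.DataFrame, max_missing: float = 0.5,
                       max_corr: float = 0.95) -> list[str]:
    """Return column names surviving the missing-value and correlation filters."""
    # Step 1: drop features with more than 50% missing values.
    keep = [c for c in X.columns if X[c].isna().mean() <= max_missing]
    # Step 2: greedy correlation filter. A column is kept only if its absolute
    # correlation with every already-selected column is not above the threshold.
    corr = X[keep].corr().abs()
    selected: list[str] = []
    for col in keep:
        # NaN correlations (e.g. constant columns) compare False and so pass.
        if not any(corr.loc[col, s] > max_corr for s in selected):
            selected.append(col)
    return selected
```

To avoid look-ahead bias, `X` should contain training-window rows only, matching the methodology above.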
Two model approaches tested per window: (A) Direct model predicting y using forecast pred + threshold features, (B) Residual model predicting (y - pred) using threshold features only. The residual (boosted) approach is used as the combined model.
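Approach (B) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the production code: `max_depth=6` and `n_estimators=200` are a reading of the "RandomForest(depth=6, n=200)" shorthand in the pipeline table, and the function and argument names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_residual_model(X_train, y_train, pred_train, seed=0):
    """Fit a forest on forecast errors (y - pred) using threshold features only."""
    X_train = np.asarray(X_train, dtype=float)
    # Impute with training-set medians only, so test data never informs the fill.
    medians = np.nanmedian(X_train, axis=0)
    X_imp = np.where(np.isnan(X_train), medians, X_train)
    model = RandomForestRegressor(n_estimators=200, max_depth=6,
                                  oob_score=True, random_state=seed)
    model.fit(X_imp, np.asarray(y_train, dtype=float) - np.asarray(pred_train, dtype=float))
    return model, medians

def combined_prediction(model, medians, X_test, pred_test):
    """Adjust the original forecast by the predicted error."""
    X_test = np.asarray(X_test, dtype=float)
    X_imp = np.where(np.isnan(X_test), medians, X_test)
    return np.asarray(pred_test, dtype=float) + model.predict(X_imp)
```

After fitting, `model.oob_score_` holds the out-of-bag R² of the residual fit, and `model.feature_importances_` gives the importances used to rank features.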
Growth values are log growth relative to the national average. Alpha = exp(mean(y)) - 1 for top-N picks. Backtest period: March 2013 to February 2024. 654,081 valid test observations across 6,492 suburbs.
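The alpha formula can be illustrated for a single ranking date. The sketch below assumes `preds` and `actuals` are log growth values relative to the national average for one cross-section of suburbs; how the report aggregates top-N picks across months is not stated, so that part is not shown. The function name `top_n_alpha` is hypothetical.

```python
import math

def top_n_alpha(preds, actuals, n=20):
    """Alpha = exp(mean(y)) - 1 over the n suburbs with the highest predictions."""
    if len(preds) != len(actuals):
        raise ValueError("preds and actuals must have the same length")
    # Sort by predicted growth, best first, and keep the realised outcomes.
    ranked = sorted(zip(preds, actuals), key=lambda pa: pa[0], reverse=True)
    top = [actual for _, actual in ranked[:n]]
    if not top:
        raise ValueError("no observations to rank")
    return math.exp(sum(top) / len(top)) - 1.0
```

Because `y` is in log terms, exponentiating the mean gives the geometric average growth of the picks relative to the national average, expressed as a simple percentage.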
Microburbs Suburb Forecasting
Capital growth predictions powered by 1,200+ suburb-level indicators