Combined Forecast + Threshold Backtest

Can 1,200+ suburb-level threshold features improve the existing capital growth forecast? Walk-forward backtested from 2013 to 2024.

Original Top-20 Alpha: 7.0%

Combined Top-20 Alpha: 5.6%

Correlation Improvement: +0.027

Threshold Features: 221

Luke Metcalfe

Founder & Chief Data Scientist

15+ years in property data analytics

1. Data Flow Diagram

The combined model merges three data systems. The existing Random Forest forecast produces suburb-level growth predictions. Univariate threshold scores drawn from 1,221 suburb-level metrics, together with 137 synthetic composite indices, provide additional features. The three are combined and evaluated through walk-forward backtesting.

End-to-end data pipeline from raw sources through feature assembly to backtested results

Pipeline Stage Details

Stage	Source	Input	Output	Description
1. Forecast Data	fj.parquet	4,268,395	1,366,457	Filter to house 2yr, SAL-type markets, non-excluded suburbs. 6,492 unique suburbs across 216 months.
2. Feature Selection	fields_ttests.parquet + selected_fields.parquet	21,850 t-test rows	179 features selected	42 univariate fields from t-test significance (p < 0.05) and production selection. 137 synthetic composite fields.
3. Feature Assembly	1,221 thresh_ts + 274 synth_figures parquets	~192K-2.1M per file	1,614,841 x 221	Load value and rank columns. Forward-fill sparse census features (3 dates) to monthly. Join with forecast predictions.
4. Walk-Forward Backtest	Merged feature matrix	1,366,457 x 226	674,868 test predictions	Two training windows (2011, 2016). Correlation filter, RF importance filter (top 50), then RandomForest(depth=6, n=200).

The pipeline processes 4.3 million forecast rows, filtering to 1.37 million house 2-year predictions across 6,492 suburbs. It loads 42 univariate and 137 synthetic threshold features, forward-fills sparse census data (available at only 3 dates: 2011, 2016, 2021), and assembles a 221-column feature matrix for walk-forward backtesting.
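The forward-fill step can be illustrated with a minimal sketch. The function names, the sample values, and the choice of August as the snapshot month are assumptions made for illustration; the source only states that census data exists at three dates (2011, 2016, 2021) and is forward-filled to monthly.

```python
from bisect import bisect_right


def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    y, m = start
    while (y, m) <= end:
        yield (y, m)
        m += 1
        if m > 12:
            y, m = y + 1, 1


def forward_fill(snapshots, months):
    """Map each month to the latest snapshot value at or before it.

    snapshots: sparse dict {(year, month): value}
    Months before the first snapshot stay None: nothing is back-filled,
    so no month ever sees a value from its own future.
    """
    keys = sorted(snapshots)
    filled = {}
    for ym in months:
        i = bisect_right(keys, ym)
        filled[ym] = snapshots[keys[i - 1]] if i else None
    return filled


# Hypothetical census values for one suburb (assumed August snapshots).
census = {(2011, 8): 0.12, (2016, 8): 0.15, (2021, 8): 0.19}
monthly = forward_fill(census, month_range((2011, 1), (2022, 12)))
```

Leaving pre-2011 months empty rather than back-filling is a design choice in this sketch; it keeps the filled series free of look-ahead.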

2. Overall Performance Comparison

Metric	Original	Combined
Top-5 Alpha	9.3%	6.9%
Top-20 Alpha	7.0%	5.6%
Hit Rate	81%	79%
Correlation	0.213	0.240

Top-N alpha: outperformance vs national average for both models

The original forecast model produces higher alpha at the top of the ranking: 9.3% vs 6.9% for the top 5 picks and 7.0% vs 5.6% for the top 20. The combined model, however, achieves a higher overall correlation between predicted and actual growth (0.240 vs 0.213). The threshold features therefore improve prediction accuracy across the full set of suburbs, while the original model is better at identifying the extreme outperformers. For selecting a short list of top suburbs, the original forecast remains the superior tool.

3. Rolling 12-Month Alpha

Rolling 12-month top-20 alpha: original forecast vs combined model

The original forecast (blue) sits above the combined model (dark blue) in most periods. Both models track similar trends, which suggests that the threshold features capture the same underlying market dynamics but with less precision at the extreme tails.

4. Decile Performance

Mean actual growth vs national average by prediction decile (1 = best predicted)

Decile	Original	Combined
1	+2.74%	+2.50%
2	+1.16%	+1.77%
3	+0.65%	+1.19%
4	+0.26%	+0.65%
5	-0.10%	+0.21%
6	-0.31%	-0.21%
7	-0.52%	-0.65%
8	-0.70%	-1.20%
9	-1.11%	-1.73%
10	-2.07%	-2.48%

The decile chart reveals a key pattern. The combined model has a steeper gradient from top to bottom (+2.50% to -2.48%), while the original model has a flatter but higher-at-top profile (+2.74% to -2.07%). The combined model better separates good from bad suburbs in the middle deciles (decile 2 returns +1.77% against +1.16% for the original, and decile 9 returns -1.73% against -1.11%), but the original model picks slightly better at the very top.
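The gradient comparison can be checked directly from the decile table. The sketch below copies the table values and computes two spreads; the helper name `spread` and the choice of deciles 2 and 9 as a "middle" comparison are illustrative assumptions, not part of the original analysis.

```python
# Mean actual growth vs national average (%), deciles 1..10, from the table.
original = [2.74, 1.16, 0.65, 0.26, -0.10, -0.31, -0.52, -0.70, -1.11, -2.07]
combined = [2.50, 1.77, 1.19, 0.65, 0.21, -0.21, -0.65, -1.20, -1.73, -2.48]


def spread(deciles, hi, lo):
    """Gap in percentage points between decile `hi` and decile `lo` (1-indexed)."""
    return round(deciles[hi - 1] - deciles[lo - 1], 2)


full_orig = spread(original, 1, 10)  # 4.81
full_comb = spread(combined, 1, 10)  # 4.98
mid_orig = spread(original, 2, 9)    # 2.27
mid_comb = spread(combined, 2, 9)    # 3.5

# Both models rank deciles in strictly decreasing order of realised growth.
monotonic = all(
    a > b
    for series in (original, combined)
    for a, b in zip(series, series[1:])
)
```

The full top-to-bottom spread differs only slightly (4.98 vs 4.81 points), while the decile 2 to decile 9 spread differs much more (3.50 vs 2.27 points), which is where the combined model's extra separation shows up.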

5. Feature Importance (Residual Model)

The residual model predicts what the forecast gets wrong. These features explain the gap between forecast predictions and actual outcomes.

Top features by Random Forest importance in the residual prediction model

Feature	Importance
census_ferry_rnk	0.4625
prop_list_price_0_5_growth_3yr_rent_house	0.2089
prop_list_price_0_5_rent_house	0.1095
prop_list_price_0_5_growth_10yr_buy_house	0.0938
prop_list_price_0_75_growth_1yr_buy_house	0.0377
census_overseas_born_parents_val	0.0151
census_public_housing	0.0127
census_males_labour_force_15_19	0.0105
census_speaks_indo_aryan_languages	0.0043
prop_list_price_0_5_buy_house	0.0033

The census ferry rank feature dominates (0.46 importance in window 2). This percentile rank of ferry usage within each capital city region proxies for harbour-adjacent suburbs in Sydney and Melbourne. The next most important features relate to rental growth (3-year rent growth, current rent levels) and long-term price history (10-year buy growth). These threshold features capture fundamentals that the hedonic-based forecast model misses.

6. Cumulative Excess Growth

$100K invested in top-20 picks: excess growth above national average

7. Year-by-Year Comparison

Year	Original Top 20	Combined Top 20	Original Top 5	Combined Top 5
2013	+5.3%	+5.7%	+7.0%	+5.9%
2014	+8.6%	+8.3%	+10.7%	+10.8%
2015	+7.9%	+5.8%	+7.9%	+6.9%
2016	+4.9%	+2.8%	+12.5%	+7.5%
2018	+8.1%	+5.0%	+11.0%	+7.4%
2019	+3.9%	+3.6%	+4.5%	+3.8%
2020	+4.0%	+4.2%	+5.8%	+1.9%
2021	+10.5%	+7.6%	+14.9%	+12.0%
2022	+5.9%	+6.2%	+5.8%	+6.9%
2023	+7.3%	+4.7%	+9.7%	+6.3%
2024	+6.1%	+4.9%	+8.6%	+8.1%

8. Walk-Forward Window Results

Window	Train Rows	Test Rows	Features	OOB	Corr (Orig)	Corr (Comb)	Alpha (Orig)	Alpha (Comb)
Train to 2011-03; test 2013-03 to 2016-02	233,137	227,481	24	0.1136	0.2046	0.2289	7.32%	6.41%
Train to 2016-03; test 2018-03 to 2024-02	612,159	426,600	36	0.1796	0.2181	0.2474	6.88%	5.23%

The residual model predicts forecast errors using threshold features only (it has no access to the forecast prediction itself). It then adjusts the original forecast up or down based on what it learns. Out-of-bag (OOB) scores of 0.11-0.18 indicate that threshold features explain 11-18% of the variance in forecast errors. This is a meaningful signal that could be used to flag suburbs where the forecast may be too optimistic or too pessimistic.
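A minimal sketch of the residual approach, assuming scikit-learn's `RandomForestRegressor` with the depth and tree count quoted in the pipeline table. All data here is synthetic and the variable names are hypothetical; this illustrates the idea rather than reproducing the production code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data: three hypothetical threshold features per row.
n_train, n_test = 600, 200
X_train = rng.normal(size=(n_train, 3))
X_test = rng.normal(size=(n_test, 3))

# Hypothetical forecast predictions and actual outcomes
# (log growth relative to the national average).
pred_train = rng.normal(scale=0.05, size=n_train)
pred_test = rng.normal(scale=0.05, size=n_test)
y_train = pred_train + 0.03 * X_train[:, 0] + rng.normal(scale=0.02, size=n_train)

# Fit on forecast errors only; the forecast itself is not a feature.
residual_model = RandomForestRegressor(
    n_estimators=200, max_depth=6, oob_score=True, random_state=0
)
residual_model.fit(X_train, y_train - pred_train)

# OOB R^2: share of residual variance explained on out-of-bag training rows.
oob_r2 = residual_model.oob_score_

# Combined prediction = original forecast + predicted forecast error.
combined_test = pred_test + residual_model.predict(X_test)
```

Because the model only ever sees threshold features, any lift in `combined_test` over `pred_test` comes from information the forecast did not already use.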

Methodology

Walk-forward backtesting with two training windows (train to 2011-03 and 2016-03). Features selected within each training window to prevent look-ahead bias. Missing values imputed with training set medians only.
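The split and imputation rules can be sketched as follows. The row layout (dicts with a `month` key in `YYYY-MM` form) and the function names are assumptions for illustration; the window boundaries in the example match the first walk-forward window.

```python
from statistics import median


def split_window(rows, train_end, test_start, test_end):
    """Split rows by month; YYYY-MM strings compare chronologically."""
    train = [r for r in rows if r["month"] <= train_end]
    test = [r for r in rows if test_start <= r["month"] <= test_end]
    return train, test


def impute_with_train_medians(train, test, features):
    """Fill None values in both sets using medians computed from train only.

    Assumes every feature has at least one observed training value
    (mostly-missing features are dropped earlier in the pipeline).
    """
    medians = {
        f: median(r[f] for r in train if r[f] is not None) for f in features
    }

    def fill(rows):
        return [
            {**r, **{f: medians[f] if r[f] is None else r[f] for f in features}}
            for r in rows
        ]

    return fill(train), fill(test), medians


# Hypothetical rows with a single feature "x".
rows = [
    {"month": "2010-01", "x": 1.0},
    {"month": "2010-06", "x": None},
    {"month": "2011-03", "x": 3.0},
    {"month": "2012-01", "x": 100.0},  # in the gap: used by neither set
    {"month": "2013-03", "x": None},
    {"month": "2014-01", "x": 5.0},
]
train, test = split_window(rows, "2011-03", "2013-03", "2016-02")
train_f, test_f, med = impute_with_train_medians(train, test, ["x"])
```

The test rows never contribute to the medians, so the imputed values carry no information from the evaluation period.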

Feature selection: (1) drop features with more than 50% missing, (2) correlation filter removing one of pairs with |r| greater than 0.95, (3) RF importance filter keeping top 50 features.
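The three filters can be sketched in plain Python. Two details are assumptions: the source does not say which member of a highly correlated pair is dropped (this sketch keeps the one seen first), and the importance scores are passed in as a ready-made mapping rather than computed from a Random Forest.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over positions where both values are present."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def select_features(columns, importance, max_missing=0.5, max_corr=0.95, top_k=50):
    """columns: {name: list of values or None}; importance: {name: score}."""
    # (1) Drop features with more than 50% missing values.
    kept = [
        name for name, vals in columns.items()
        if sum(v is None for v in vals) / len(vals) <= max_missing
    ]
    # (2) For each pair with |r| > 0.95, keep only the feature seen first.
    survivors = []
    for name in kept:
        if all(abs(pearson(columns[name], columns[s])) <= max_corr for s in survivors):
            survivors.append(name)
    # (3) Keep the top_k survivors by importance score.
    return sorted(survivors, key=lambda n: importance[n], reverse=True)[:top_k]


# Hypothetical training-window columns and importance scores.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "a_scaled": [2.0, 4.0, 6.0, 8.1],    # nearly collinear with "a"
    "sparse": [None, None, None, 1.0],   # 75% missing
    "b": [4.0, 1.0, 3.0, 2.0],
}
importance = {"a": 0.2, "a_scaled": 0.5, "sparse": 0.9, "b": 0.3}
selected = select_features(columns, importance, top_k=2)
```

In a walk-forward setting, `columns` would hold training-window rows only, so that no filter sees test-period data.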

Two model approaches tested per window: (A) Direct model predicting y using forecast pred + threshold features, (B) Residual model predicting (y - pred) using threshold features only. The residual (boosted) approach is used as the combined model.

Growth values are log growth relative to the national average. Alpha = exp(mean(y)) - 1 for top-N picks. Backtest period: March 2013 to February 2024. 654,081 valid test observations across 6,492 suburbs.
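The alpha formula can be shown as a short worked example. The row layout and function name are hypothetical, and pooling the picks across months before averaging is an assumption; the source only gives Alpha = exp(mean(y)) - 1 for the top-N picks.

```python
from collections import defaultdict
from math import exp, log


def top_n_alpha(rows, n):
    """rows: (month, predicted, actual) tuples, where actual is log growth
    relative to the national average.

    Each month, take the n suburbs with the highest prediction, pool their
    actual values, and return exp(mean) - 1.
    """
    by_month = defaultdict(list)
    for month, pred, y in rows:
        by_month[month].append((pred, y))
    picked = []
    for month_rows in by_month.values():
        month_rows.sort(key=lambda r: r[0], reverse=True)
        picked.extend(y for _, y in month_rows[:n])
    return exp(sum(picked) / len(picked)) - 1


# Hypothetical data: two months, three suburbs each.
rows = [
    ("2013-03", 0.9, log(1.10)),
    ("2013-03", 0.5, log(1.02)),
    ("2013-03", 0.1, log(0.95)),
    ("2013-04", 0.8, log(1.05)),
    ("2013-04", 0.7, log(1.00)),
    ("2013-04", 0.2, log(0.90)),
]
alpha = top_n_alpha(rows, 1)
```

Because the mean is taken over log growth, the result is a geometric rather than arithmetic average of the picks' relative growth. In this example the two picks outperform by 10% and 5%, giving an alpha of about 7.5%.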

Microburbs Suburb Forecasting

Capital growth predictions powered by 1,200+ suburb-level indicators

