ML Meta-Model Optimisation

Systematic hyperparameter and feature search using OOB (out-of-bag) predictions as the honest out-of-sample objective

120 HP combos, 18 feature subsets, 20-step forward selection. 1,178,824 training rows.

Luke Metcalfe

Founder & Chief Data Scientist

15+ years in property data analytics

OOB Top-20 Alpha

9.27%

Honest out-of-sample

OOB Top-5 Alpha

11.51%

Out-of-bag predictions

OOB Hit Rate

92.0%

Top-20 picks above national

Overfitting Gap

0.31pp

In-sample 9.58% vs OOB 9.27%

Why OOB matters. Random Forest's out-of-bag predictions give each sample a prediction from only trees that did not see it during training. This is a genuine out-of-sample metric, unlike the previous in-sample results (where RF Deep showed 10.81% alpha but was heavily overfitted). The optimised model achieves 9.27% OOB alpha-20 with only a 0.31 percentage point gap to in-sample, confirming minimal overfitting.

Improvement over baselines. Original forecast: 5.99% OOB top-20 alpha. Simple sum model (10 fields): 7.17%. Optimised RF: 9.27%. The ML model adds 3.28pp of genuine out-of-sample alpha over the original forecast and 2.10pp over the simple sum model.

Overfitting Gap

The gap between in-sample alpha and OOB alpha measures how much the model memorises training data rather than learning genuine patterns. A gap of 0.31pp (9.58% vs 9.27%) is very small, confirming the model generalises well. By contrast, the previous RF Deep model (depth 10) had an in-sample alpha of 10.81% but its OOB R squared of 0.20 suggests its true alpha would be much lower.

Hyperparameter Grid Search

120 combinations of max_depth (3-8), min_samples_leaf (20-500), and max_features (0.3-1.0) were tested. Depth 8 dominates the top positions. Using 70% of features per split (max_features=0.7) achieves the best OOB alpha.

Top 10 Configurations by OOB Alpha-20

Depth	Min Leaf	Max Feat	OOB R2	Top-5	Top-20	Hit Rate	Corr
8	20	0.7	0.1779	10.58%	8.64%	90.7%	0.4266
8	50	0.7	0.1769	10.23%	8.55%	90.5%	0.4252
8	20	0.5	0.1794	10.11%	8.50%	90.1%	0.4316
8	50	1.0	0.1699	10.25%	8.44%	90.5%	0.4136
8	20	1.0	0.1712	10.67%	8.43%	90.3%	0.4152
8	100	0.7	0.1756	9.93%	8.42%	90.2%	0.4236
7	20	0.7	0.1638	8.42%	7.87%	88.2%	0.4107
7	50	0.7	0.1636	8.30%	7.80%	87.8%	0.4104
6	20	0.7	0.1498	7.34%	7.28%	87.4%	0.3946
6	20	0.5	0.1498	7.17%	7.10%	86.4%	0.3944

Feature Subset Search

Tested 18 feature subsets: top-N by importance, domain-grouped (price, growth, signals), and exclusion sets. The top-10 features by importance achieve the best OOB alpha at 8.72%, slightly above all 20 features at 8.64%. Removing synthetic features drops alpha by 0.47pp, confirming they provide genuine orthogonal signal.

Subset	Features	OOB R2	OOB Alpha-20	Hit Rate
top-10 (by importance)	10	0.1714	8.72%	90.9%
top-20 (all)	20	0.1774	8.64%	90.6%
no_low_coverage	20	0.1779	8.64%	90.7%
top-12	12	0.1751	8.60%	90.4%
top-15	15	0.1775	8.56%	90.3%
no_census	18	0.1715	8.43%	90.4%
top-7	7	0.1616	8.19%	89.5%
no_synth / univariate_only	15	0.1735	8.17%	88.5%
top-5	5	0.1541	7.98%	88.6%
top-4	4	0.1419	7.11%	85.1%
signals+forecast	6	0.0922	7.00%	85.2%
price+forecast	7	0.1521	6.76%	83.6%
growth+forecast	5	0.1444	6.54%	83.1%
forecast_only	1	0.0512	5.82%	78.3%
top-3	3	0.1289	5.73%	79.5%

Greedy Forward Selection (OOB Alpha)

Starting from forecast_pred alone, features were added one at a time, always picking the feature that maximises OOB top-20 alpha. Peak performance at 13 features (9.24%). The forward selection chose very different features from the importance ranking: synthetic composites (transport, urban heat, cultural integration) were selected early despite low standalone importance, because they provide orthogonal signal that the forecast does not capture.

Step	Feature Added	Total	OOB R2	OOB Alpha-20	Hit Rate
0	(forecast_pred only)	1	0.0512	5.82%	78.3%
1	synth_transport_ecosystem	2	0.0860	6.90%	86.8%
2	synth_environment_urban_heat	3	0.0974	7.82%	89.7%
3	buy_3yr_growth	4	0.1027	8.31%	89.9%
4	buy_price	5	0.1098	8.73%	91.1%
5	census_public_housing	6	0.1155	8.84%	90.9%
6	house_vacancy_rate	7	0.1201	9.02%	91.1%
7	synth_cultural_integration	8	0.1206	9.16%	91.9%
8	stock_on_market	9	0.1209	9.19%	92.4%
9	rent_price	10	0.1227	9.20%	92.3%
10	census_overseas_born	11	0.1256	9.21%	91.8%
11	synth_business_innovation	12	0.1255	9.19%	91.8%
12	months_of_supply	13	0.1253	9.24%	92.0%
13	mib_perc_renters	14	0.1289	9.06%	91.3%
14	owner_occupied	15	0.1292	8.99%	90.9%
15	rent_3yr_growth	16	0.1411	8.79%	91.0%
16	pct_sold_at_loss	17	0.1561	8.78%	90.5%
17	buy_10yr_growth	18	0.1785	8.89%	90.9%
18	buy_1yr_growth_75	19	0.1778	8.76%	90.9%
19	synth_dev_infrastructure	20	0.1774	8.63%	90.5%

Note on OOB R squared vs OOB alpha divergence. After step 12 (13 features), OOB R squared keeps rising (from 0.13 to 0.18 at all 20 features) while OOB alpha declines (from 9.24% to 8.63%). This happens because the high-importance growth features (10yr growth, rent growth, distress) improve overall R squared across all suburbs but reduce the model's ability to identify the very best suburbs. They add noise to the top-N ranking even though they improve average prediction accuracy.

Optimised Model: Feature Importance

In the final 13-feature model, the forecast prediction accounts for 44% of importance. Synthetic transport ecosystem (20%) and urban heat (11%) are the second and third most important. These spatial composites capture suburb-level characteristics that the pure price/growth features miss.

Forecast

Univariate

Synthetic

forecast_pred

44.1%

transport_ecosystem_complete

19.9%

environment_urban_heat

10.6%

price_0_5_buy_house

5.4%

price_0_5_growth_3_year_buy_house

4.4%

house_vacancy_rate

4.2%

public_housing

3.5%

price_0_5_rent_house

2.9%

overseas_born_parents

2.2%

cultural_integration_ecosystem

1.2%

business_innovation_capacity

1.2%

stock_on_market

0.4%

months_of_supply

0.1%

Tree Count Sensitivity

Diminishing returns beyond 200 trees. OOB R squared plateaus at 0.178 and alpha-20 stabilises around 8.7%. 300 trees is a good balance of accuracy and training speed.

Trees	OOB R2	OOB Top-5	OOB Top-20	Hit Rate
50	0.1761	10.57%	8.44%	90.1%
100	0.1776	10.65%	8.64%	90.5%
150	0.1778	10.67%	8.63%	90.6%
200	0.1779	10.58%	8.64%	90.7%
300	0.1781	10.70%	8.74%	91.0%
400	0.1782	10.68%	8.74%	91.0%
500	0.1782	10.69%	8.76%	91.0%

Decile Performance (OOB)

Suburbs ranked by OOB predictions (predictions from trees that did not train on each suburb). A clear monotonic gradient from bottom to top decile confirms the model's out-of-sample ranking ability.

Rolling Alpha (OOB vs Original)

12-month rolling average of OOB top-20 alpha vs the original forecast. The optimised model consistently outperforms the original across all time periods.

Methodology

Optimisation Objective

Primary metric: OOB top-20 alpha (out-of-bag predictions ranked per date)
OOB: each sample predicted only by trees that did not see it in training
Genuinely out-of-sample, no separate validation set needed
Also tracked: OOB R squared, OOB hit rate, OOB correlation

Search Strategy

Search 1: 120-combo HP grid (depth x leaf x max_features)
Search 2: 18 feature subsets (importance, domain, exclusion)
Search 3: n_estimators sensitivity (50 to 500)
Search 4: Greedy forward feature selection (20 steps)

Optimised Model

Random Forest: depth=8, leaf=20, max_features=0.7, 500 trees
13 features (forward selection optimum)
OOB R squared: 0.1251 (on 13-feature subset)
Overfitting gap: 0.31pp (in-sample 9.58% vs OOB 9.27%)

Data

1,178,824 training rows (house, 2-year, SAL markets with actuals)
6,492 suburbs across 192 forecast dates
Target: log growth relative to national average
Missing values filled with column median

Next Step: Walk-Forward Backtesting

OOB provides a good estimate of generalisation. Walk-forward backtesting with true temporal holdout will give the definitive out-of-sample result.

View Simple Models View Backtested Model

ML Meta-Model Optimisation

Systematic hyperparameter and feature search using OOB (out-of-bag) predictions as the honest out-of-sample objective

120 HP combos, 18 feature subsets, 20-step forward selection. 1,178,824 training rows.

Luke Metcalfe

Founder & Chief Data Scientist

15+ years in property data analytics

OOB Top-20 Alpha

9.27%

Honest out-of-sample

OOB Top-5 Alpha

11.51%

Out-of-bag predictions

OOB Hit Rate

92.0%

Top-20 picks above national

Overfitting Gap

0.31pp

In-sample 9.58% vs OOB 9.27%

Overfitting Gap

Hyperparameter Grid Search

Top 10 Configurations by OOB Alpha-20

Depth	Min Leaf	Max Feat	OOB R2	Top-5	Top-20	Hit Rate	Corr
8	20	0.7	0.1779	10.58%	8.64%	90.7%	0.4266
8	50	0.7	0.1769	10.23%	8.55%	90.5%	0.4252
8	20	0.5	0.1794	10.11%	8.50%	90.1%	0.4316
8	50	1.0	0.1699	10.25%	8.44%	90.5%	0.4136
8	20	1.0	0.1712	10.67%	8.43%	90.3%	0.4152
8	100	0.7	0.1756	9.93%	8.42%	90.2%	0.4236
7	20	0.7	0.1638	8.42%	7.87%	88.2%	0.4107
7	50	0.7	0.1636	8.30%	7.80%	87.8%	0.4104
6	20	0.7	0.1498	7.34%	7.28%	87.4%	0.3946
6	20	0.5	0.1498	7.17%	7.10%	86.4%	0.3944

Feature Subset Search

Subset	Features	OOB R2	OOB Alpha-20	Hit Rate
top-10 (by importance)	10	0.1714	8.72%	90.9%
top-20 (all)	20	0.1774	8.64%	90.6%
no_low_coverage	20	0.1779	8.64%	90.7%
top-12	12	0.1751	8.60%	90.4%
top-15	15	0.1775	8.56%	90.3%
no_census	18	0.1715	8.43%	90.4%
top-7	7	0.1616	8.19%	89.5%
no_synth / univariate_only	15	0.1735	8.17%	88.5%
top-5	5	0.1541	7.98%	88.6%
top-4	4	0.1419	7.11%	85.1%
signals+forecast	6	0.0922	7.00%	85.2%
price+forecast	7	0.1521	6.76%	83.6%
growth+forecast	5	0.1444	6.54%	83.1%
forecast_only	1	0.0512	5.82%	78.3%
top-3	3	0.1289	5.73%	79.5%

Greedy Forward Selection (OOB Alpha)

Step	Feature Added	Total	OOB R2	OOB Alpha-20	Hit Rate
0	(forecast_pred only)	1	0.0512	5.82%	78.3%
1	synth_transport_ecosystem	2	0.0860	6.90%	86.8%
2	synth_environment_urban_heat	3	0.0974	7.82%	89.7%
3	buy_3yr_growth	4	0.1027	8.31%	89.9%
4	buy_price	5	0.1098	8.73%	91.1%
5	census_public_housing	6	0.1155	8.84%	90.9%
6	house_vacancy_rate	7	0.1201	9.02%	91.1%
7	synth_cultural_integration	8	0.1206	9.16%	91.9%
8	stock_on_market	9	0.1209	9.19%	92.4%
9	rent_price	10	0.1227	9.20%	92.3%
10	census_overseas_born	11	0.1256	9.21%	91.8%
11	synth_business_innovation	12	0.1255	9.19%	91.8%
12	months_of_supply	13	0.1253	9.24%	92.0%
13	mib_perc_renters	14	0.1289	9.06%	91.3%
14	owner_occupied	15	0.1292	8.99%	90.9%
15	rent_3yr_growth	16	0.1411	8.79%	91.0%
16	pct_sold_at_loss	17	0.1561	8.78%	90.5%
17	buy_10yr_growth	18	0.1785	8.89%	90.9%
18	buy_1yr_growth_75	19	0.1778	8.76%	90.9%
19	synth_dev_infrastructure	20	0.1774	8.63%	90.5%

Optimised Model: Feature Importance

Forecast

Univariate

Synthetic

forecast_pred

44.1%

transport_ecosystem_complete

19.9%

environment_urban_heat

10.6%

price_0_5_buy_house

5.4%

price_0_5_growth_3_year_buy_house

4.4%

house_vacancy_rate

4.2%

public_housing

3.5%

price_0_5_rent_house

2.9%

overseas_born_parents

2.2%

cultural_integration_ecosystem

1.2%

business_innovation_capacity

1.2%

stock_on_market

0.4%

months_of_supply

0.1%

Tree Count Sensitivity

Diminishing returns beyond 200 trees. OOB R squared plateaus at 0.178 and alpha-20 stabilises around 8.7%. 300 trees is a good balance of accuracy and training speed.

Trees	OOB R2	OOB Top-5	OOB Top-20	Hit Rate
50	0.1761	10.57%	8.44%	90.1%
100	0.1776	10.65%	8.64%	90.5%
150	0.1778	10.67%	8.63%	90.6%
200	0.1779	10.58%	8.64%	90.7%
300	0.1781	10.70%	8.74%	91.0%
400	0.1782	10.68%	8.74%	91.0%
500	0.1782	10.69%	8.76%	91.0%

Decile Performance (OOB)

Suburbs ranked by OOB predictions (predictions from trees that did not train on each suburb). A clear monotonic gradient from bottom to top decile confirms the model's out-of-sample ranking ability.

Rolling Alpha (OOB vs Original)

12-month rolling average of OOB top-20 alpha vs the original forecast. The optimised model consistently outperforms the original across all time periods.

Methodology

Optimisation Objective

Primary metric: OOB top-20 alpha (out-of-bag predictions ranked per date)
OOB: each sample predicted only by trees that did not see it in training
Genuinely out-of-sample, no separate validation set needed
Also tracked: OOB R squared, OOB hit rate, OOB correlation

Search Strategy

Search 1: 120-combo HP grid (depth x leaf x max_features)
Search 2: 18 feature subsets (importance, domain, exclusion)
Search 3: n_estimators sensitivity (50 to 500)
Search 4: Greedy forward feature selection (20 steps)

Optimised Model

Random Forest: depth=8, leaf=20, max_features=0.7, 500 trees
13 features (forward selection optimum)
OOB R squared: 0.1251 (on 13-feature subset)
Overfitting gap: 0.31pp (in-sample 9.58% vs OOB 9.27%)

Data

1,178,824 training rows (house, 2-year, SAL markets with actuals)
6,492 suburbs across 192 forecast dates
Target: log growth relative to national average
Missing values filled with column median

Next Step: Walk-Forward Backtesting

OOB provides a good estimate of generalisation. Walk-forward backtesting with true temporal holdout will give the definitive out-of-sample result.

View Simple Models View Backtested Model