Microburbs

ML Meta-Model Optimisation

Systematic hyperparameter and feature search using OOB (out-of-bag) predictions as the honest out-of-sample objective

120 HP combos, 18 feature subsets, 20-step forward selection. 1,178,824 training rows.

Luke Metcalfe
Founder & Chief Data Scientist
15+ years in property data analytics

  • OOB Top-20 Alpha: 9.27% (honest out-of-sample)
  • OOB Top-5 Alpha: 11.51% (out-of-bag predictions)
  • OOB Hit Rate: 92.0% (top-20 picks above national)
  • Overfitting Gap: 0.31pp (in-sample 9.58% vs OOB 9.27%)

Why OOB matters. Each tree in a Random Forest is trained on a bootstrap sample, so every training row is left out of roughly a third of the trees. The out-of-bag (OOB) prediction for a row averages only the trees that did not see it during training, which makes it an out-of-sample estimate. The previous results were in-sample: RF Deep showed 10.81% alpha but was heavily overfitted. The optimised model achieves 9.27% OOB top-20 alpha, only 0.31 percentage points below its in-sample figure, which indicates minimal overfitting.
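
As a rough illustration of the mechanism described above, the sketch below fits a scikit-learn RandomForestRegressor with oob_score=True and compares in-sample and out-of-bag R squared. The data is synthetic and the sample size and tree count are assumptions chosen for the demo; only depth 8, leaf 20 and max_features 0.7 mirror settings reported on this page.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Synthetic stand-in data (assumption): 5 features, noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=2000)

# With oob_score=True, scikit-learn stores for every training row the
# average prediction of only those trees whose bootstrap sample excluded it.
model = RandomForestRegressor(
    n_estimators=200,
    max_depth=8,
    min_samples_leaf=20,
    max_features=0.7,
    oob_score=True,
    random_state=0,
)
model.fit(X, y)

oob_pred = model.oob_prediction_     # one out-of-bag prediction per training row
in_sample_pred = model.predict(X)    # all trees vote, including those that saw the row

r2_in = r2_score(y, in_sample_pred)
r2_oob = r2_score(y, oob_pred)
print(f"in-sample R2 = {r2_in:.3f}, OOB R2 = {r2_oob:.3f}, gap = {r2_in - r2_oob:.3f}")
```

On noisy data the in-sample figure usually sits above the OOB one. That difference is the same kind of gap reported on this page, measured here in R squared rather than top-20 alpha.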

Improvement over baselines. Original forecast: 5.99% OOB top-20 alpha. Simple sum model (10 fields): 7.17%. Optimised RF: 9.27%. The ML model adds 3.28pp of genuine out-of-sample alpha over the original forecast and 2.10pp over the simple sum model.

Overfitting Gap

The gap between in-sample alpha and OOB alpha measures how much the model memorises training data rather than learning genuine patterns. A gap of 0.31pp (9.58% vs 9.27%) is very small, confirming the model generalises well. By contrast, the previous RF Deep model (depth 10) had an in-sample alpha of 10.81% but its OOB R squared of 0.20 suggests its true alpha would be much lower.

[Figure: Overfitting gap comparison]

Hyperparameter Grid Search

120 combinations of max_depth (3-8), min_samples_leaf (20-500), and max_features (0.3-1.0) were tested. Depth 8 takes the top six positions. The best configuration (depth 8, min_samples_leaf 20, max_features=0.7, i.e. 70% of features considered at each split) reaches 8.64% OOB top-20 alpha.

[Figure: Hyperparameter heatmap]

Top 10 Configurations by OOB Alpha-20

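| Depth | Min Leaf | Max Feat | OOB R2 | Top-5 | Top-20 | Hit Rate | Corr |
|---|---|---|---|---|---|---|---|
| 8 | 20 | 0.7 | 0.1779 | 10.58% | 8.64% | 90.7% | 0.4266 |
| 8 | 50 | 0.7 | 0.1769 | 10.23% | 8.55% | 90.5% | 0.4252 |
| 8 | 20 | 0.5 | 0.1794 | 10.11% | 8.50% | 90.1% | 0.4316 |
| 8 | 50 | 1.0 | 0.1699 | 10.25% | 8.44% | 90.5% | 0.4136 |
| 8 | 20 | 1.0 | 0.1712 | 10.67% | 8.43% | 90.3% | 0.4152 |
| 8 | 100 | 0.7 | 0.1756 | 9.93% | 8.42% | 90.2% | 0.4236 |
| 7 | 20 | 0.7 | 0.1638 | 8.42% | 7.87% | 88.2% | 0.4107 |
| 7 | 50 | 0.7 | 0.1636 | 8.30% | 7.80% | 87.8% | 0.4104 |
| 6 | 20 | 0.7 | 0.1498 | 7.34% | 7.28% | 87.4% | 0.3946 |
| 6 | 20 | 0.5 | 0.1498 | 7.17% | 7.10% | 86.4% | 0.3944 |

[Figure: max_features sensitivity]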
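
A grid of this shape can be sketched with itertools.product, as below. The exact grid levels are not listed on this page: the min_samples_leaf and max_features values here are assumptions that fit the stated ranges and give 6 x 5 x 4 = 120 combinations. For brevity the sketch scores each configuration by OOB R squared on synthetic data, not by the OOB top-20 alpha used in the actual search.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 6))
y = X[:, 0] * X[:, 1] + X[:, 2] + rng.normal(scale=1.0, size=600)

depths = range(3, 9)                  # 3..8, the range stated in the text
leaves = (20, 50, 100, 200, 500)      # assumed levels within the stated 20-500 range
max_feats = (0.3, 0.5, 0.7, 1.0)      # assumed levels within the stated 0.3-1.0 range
grid = list(product(depths, leaves, max_feats))   # 6 * 5 * 4 = 120 combinations


def oob_r2(depth, leaf, max_feat, n_trees=50):
    """Fit one forest and return its out-of-bag R squared."""
    rf = RandomForestRegressor(
        n_estimators=n_trees,
        max_depth=depth,
        min_samples_leaf=leaf,
        max_features=max_feat,
        oob_score=True,
        random_state=0,
    )
    rf.fit(X, y)
    return rf.oob_score_


# Score every 10th combination only, to keep the demo fast.
results = [(oob_r2(*combo), combo) for combo in grid[::10]]
best_score, best_combo = max(results)
print(f"best OOB R2 {best_score:.4f} at (depth, leaf, max_features) = {best_combo}")
```

Swapping oob_r2 for a function that ranks rf.oob_prediction_ per date and returns top-20 alpha would bring the sketch closer to the objective described on this page.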

Feature Subset Search

Eighteen feature subsets were tested: top-N by importance, domain-grouped (price, growth, signals), and exclusion sets. The top-10 features by importance achieve the best OOB alpha at 8.72%, slightly above all 20 features at 8.64%. Removing the synthetic features drops alpha by 0.47pp (from 8.64% to 8.17%), which suggests they provide genuine orthogonal signal.

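[Figure: Feature subset comparison]

| Subset | Features | OOB R2 | OOB Alpha-20 | Hit Rate |
|---|---|---|---|---|
| top-10 (by importance) | 10 | 0.1714 | 8.72% | 90.9% |
| top-20 (all) | 20 | 0.1774 | 8.64% | 90.6% |
| no_low_coverage | 20 | 0.1779 | 8.64% | 90.7% |
| top-12 | 12 | 0.1751 | 8.60% | 90.4% |
| top-15 | 15 | 0.1775 | 8.56% | 90.3% |
| no_census | 18 | 0.1715 | 8.43% | 90.4% |
| top-7 | 7 | 0.1616 | 8.19% | 89.5% |
| no_synth / univariate_only | 15 | 0.1735 | 8.17% | 88.5% |
| top-5 | 5 | 0.1541 | 7.98% | 88.6% |
| top-4 | 4 | 0.1419 | 7.11% | 85.1% |
| signals+forecast | 6 | 0.0922 | 7.00% | 85.2% |
| price+forecast | 7 | 0.1521 | 6.76% | 83.6% |
| growth+forecast | 5 | 0.1444 | 6.54% | 83.1% |
| forecast_only | 1 | 0.0512 | 5.82% | 78.3% |
| top-3 | 3 | 0.1289 | 5.73% | 79.5% |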
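
The top-N subsets in a comparison like this can be built from a full model's importance ranking. The sketch below is a minimal version under stated assumptions: the data and the eight features are synthetic, and subsets are compared on OOB R squared, not OOB top-20 alpha.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): columns 0-2 carry signal, 3-7 are noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(800, 8))
y = 2 * X[:, 0] + X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=800)


def fit_rf(cols):
    """Fit a forest on the given column indices, with OOB scoring enabled."""
    rf = RandomForestRegressor(
        n_estimators=100,
        max_depth=8,
        min_samples_leaf=20,
        max_features=0.7,
        oob_score=True,
        random_state=0,
    )
    return rf.fit(X[:, cols], y)


# Rank columns by impurity-based importance in the full model.
full = fit_rf(list(range(8)))
ranking = np.argsort(full.feature_importances_)[::-1]

# Build top-N subsets from that ranking, plus the full feature set.
subsets = {f"top-{k}": ranking[:k].tolist() for k in (1, 3, 5)}
subsets["top-8 (all)"] = list(range(8))

scores = {label: fit_rf(cols).oob_score_ for label, cols in subsets.items()}
for label, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{label:12s} OOB R2 = {score:.4f}")
```

Domain-grouped and exclusion subsets would be further entries in the same subsets dictionary, each a hand-picked list of columns.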

Greedy Forward Selection (OOB Alpha)

Starting from forecast_pred alone, features were added one at a time, at each step picking the feature that maximises OOB top-20 alpha. Performance peaks at 13 features (9.24%). Forward selection chose very different features from the importance ranking: synthetic composites (transport, urban heat, cultural integration) were selected early despite low standalone importance, because they provide orthogonal signal that the forecast does not capture.

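[Figure: Forward selection path]

| Step | Feature Added | Total | OOB R2 | OOB Alpha-20 | Hit Rate |
|---|---|---|---|---|---|
| 0 | (forecast_pred only) | 1 | 0.0512 | 5.82% | 78.3% |
| 1 | synth_transport_ecosystem | 2 | 0.0860 | 6.90% | 86.8% |
| 2 | synth_environment_urban_heat | 3 | 0.0974 | 7.82% | 89.7% |
| 3 | buy_3yr_growth | 4 | 0.1027 | 8.31% | 89.9% |
| 4 | buy_price | 5 | 0.1098 | 8.73% | 91.1% |
| 5 | census_public_housing | 6 | 0.1155 | 8.84% | 90.9% |
| 6 | house_vacancy_rate | 7 | 0.1201 | 9.02% | 91.1% |
| 7 | synth_cultural_integration | 8 | 0.1206 | 9.16% | 91.9% |
| 8 | stock_on_market | 9 | 0.1209 | 9.19% | 92.4% |
| 9 | rent_price | 10 | 0.1227 | 9.20% | 92.3% |
| 10 | census_overseas_born | 11 | 0.1256 | 9.21% | 91.8% |
| 11 | synth_business_innovation | 12 | 0.1255 | 9.19% | 91.8% |
| 12 | months_of_supply | 13 | 0.1253 | 9.24% | 92.0% |
| 13 | mib_perc_renters | 14 | 0.1289 | 9.06% | 91.3% |
| 14 | owner_occupied | 15 | 0.1292 | 8.99% | 90.9% |
| 15 | rent_3yr_growth | 16 | 0.1411 | 8.79% | 91.0% |
| 16 | pct_sold_at_loss | 17 | 0.1561 | 8.78% | 90.5% |
| 17 | buy_10yr_growth | 18 | 0.1785 | 8.89% | 90.9% |
| 18 | buy_1yr_growth_75 | 19 | 0.1778 | 8.76% | 90.9% |
| 19 | synth_dev_infrastructure | 20 | 0.1774 | 8.63% | 90.5% |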
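
The greedy procedure behind a selection path like this can be sketched as follows. This is an illustration under assumptions, not the code used for the results on this page: the data is synthetic, column 0 stands in for forecast_pred as the seed feature, and candidates are scored on OOB R squared where the report used OOB top-20 alpha.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): columns 0 and 3 are informative.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5))
y = X[:, 0] + 2 * X[:, 3] + rng.normal(scale=0.5, size=500)


def oob_score(cols):
    """OOB R squared of a forest trained on the given column indices."""
    rf = RandomForestRegressor(
        n_estimators=50,
        max_depth=6,
        min_samples_leaf=20,
        oob_score=True,
        random_state=0,
    )
    return rf.fit(X[:, cols], y).oob_score_


selected = [0]               # seed feature (forecast_pred in the report)
remaining = [1, 2, 3, 4]
path = [(tuple(selected), oob_score(selected))]

# Run through every feature, recording the score at each step, so the
# peak can be read off afterwards even if later additions hurt.
while remaining:
    trial = {c: oob_score(selected + [c]) for c in remaining}
    best = max(trial, key=trial.get)
    selected.append(best)
    remaining.remove(best)
    path.append((tuple(selected), trial[best]))

best_subset, best_score = max(path, key=lambda step: step[1])
print(f"peak at {len(best_subset)} features {best_subset}: OOB R2 = {best_score:.4f}")
```

Because every candidate is evaluated jointly with the features already chosen, a feature with low standalone importance can still be picked early if it complements the current set.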

Note on OOB R squared vs OOB alpha divergence. After step 12 (13 features), OOB R squared rises further (from 0.1253 to 0.1774 with all 20 features) while OOB alpha declines (from 9.24% to 8.63%). The high-importance growth features (10yr growth, rent growth, distress) improve overall R squared across all suburbs but reduce the model's ability to identify the very best suburbs: they add noise to the top-N ranking even though they improve average prediction accuracy.

Optimised Model: Feature Importance

In the final 13-feature model, the forecast prediction accounts for 44% of importance. Synthetic transport ecosystem (20%) and urban heat (11%) are the second and third most important. These spatial composites capture suburb-level characteristics that the pure price/growth features miss.

| Feature | Importance |
|---|---|
| forecast_pred | 44.1% |
| transport_ecosystem_complete | 19.9% |
| environment_urban_heat | 10.6% |
| price_0_5_buy_house | 5.4% |
| price_0_5_growth_3_year_buy_house | 4.4% |
| house_vacancy_rate | 4.2% |
| public_housing | 3.5% |
| price_0_5_rent_house | 2.9% |
| overseas_born_parents | 2.2% |
| cultural_integration_ecosystem | 1.2% |
| business_innovation_capacity | 1.2% |
| stock_on_market | 0.4% |
| months_of_supply | 0.1% |

[Figure: Feature importance (legend: Forecast, Univariate, Synthetic)]

Tree Count Sensitivity

Returns diminish beyond 200 trees: OOB R squared plateaus at about 0.178 and top-20 alpha stabilises around 8.7%. 300 trees is a good balance of accuracy and training speed.

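| Trees | OOB R2 | OOB Top-5 | OOB Top-20 | Hit Rate |
|---|---|---|---|---|
| 50 | 0.1761 | 10.57% | 8.44% | 90.1% |
| 100 | 0.1776 | 10.65% | 8.64% | 90.5% |
| 150 | 0.1778 | 10.67% | 8.63% | 90.6% |
| 200 | 0.1779 | 10.58% | 8.64% | 90.7% |
| 300 | 0.1781 | 10.70% | 8.74% | 91.0% |
| 400 | 0.1782 | 10.68% | 8.74% | 91.0% |
| 500 | 0.1782 | 10.69% | 8.76% | 91.0% |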
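
One way to run a tree-count sweep like this without refitting from scratch is scikit-learn's warm_start option, which adds trees to an existing forest and recomputes the OOB score over all of them. The sketch below assumes synthetic data and tracks OOB R squared only. Whether the results on this page were produced this way is not stated.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 5))
y = X[:, 0] - X[:, 1] + rng.normal(size=1000)

rf = RandomForestRegressor(
    max_depth=8,
    min_samples_leaf=20,
    max_features=0.7,
    oob_score=True,
    warm_start=True,    # keep existing trees when n_estimators is raised
    random_state=0,
)

oob_by_trees = {}
for n_trees in (50, 100, 200, 300):
    rf.set_params(n_estimators=n_trees)
    rf.fit(X, y)        # fits only the additional trees
    oob_by_trees[n_trees] = rf.oob_score_

for n_trees, score in oob_by_trees.items():
    print(f"{n_trees:4d} trees: OOB R2 = {score:.4f}")
```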

Decile Performance (OOB)

Suburbs are ranked by their OOB predictions (predictions from trees that did not train on each suburb) and grouped into deciles. A clear monotonic gradient from bottom to top decile confirms the model's out-of-sample ranking ability.

[Figure: Decile performance]
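
A decile breakdown of this kind can be computed with pandas by bucketing predictions within each date and averaging the realised outcome per bucket. The column names (date, pred, rel_growth) and the synthetic data below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in panel (assumption): 12 dates x 200 suburbs.
rng = np.random.default_rng(5)
n_dates, n_suburbs = 12, 200
df = pd.DataFrame({
    "date": np.repeat(np.arange(n_dates), n_suburbs),
    "pred": rng.normal(size=n_dates * n_suburbs),   # stand-in for OOB predictions
})
df["rel_growth"] = 0.5 * df["pred"] + rng.normal(size=len(df))

# Cut predictions into 10 equal-sized buckets within each date (1 = lowest).
df["decile"] = df.groupby("date")["pred"].transform(
    lambda s: pd.qcut(s, 10, labels=False) + 1
)

decile_means = df.groupby("decile")["rel_growth"].mean()
print(decile_means.round(3))
print("monotonic increasing:", decile_means.is_monotonic_increasing)
```

Bucketing per date, not over the pooled sample, keeps market-wide swings between dates from leaking into the ranking.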

Rolling Alpha (OOB vs Original)

12-month rolling average of OOB top-20 alpha vs the original forecast. The optimised model consistently outperforms the original across all time periods.

[Figure: Rolling alpha]
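
The rolling comparison can be sketched with pandas, assuming one alpha value per monthly forecast date for each model. The numbers below are random placeholders, not the series plotted on this page, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Placeholder per-date alphas (assumption): 36 monthly forecast dates.
dates = pd.date_range("2010-01-01", periods=36, freq="MS")
rng = np.random.default_rng(6)
monthly = pd.DataFrame(
    {
        "model_alpha": 0.09 + rng.normal(scale=0.02, size=36),
        "original_alpha": 0.06 + rng.normal(scale=0.02, size=36),
    },
    index=dates,
)

# 12-month rolling mean; the first 11 rows lack a full window and are NaN.
rolling = monthly.rolling(window=12).mean()
valid = rolling.dropna()

# Share of windows in which the model's rolling alpha exceeds the original's.
share_ahead = (valid["model_alpha"] > valid["original_alpha"]).mean()
print(f"model ahead in {share_ahead:.0%} of {len(valid)} rolling windows")
```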

Methodology

Optimisation Objective

  • Primary metric: OOB top-20 alpha (out-of-bag predictions ranked per date)
  • OOB: each sample predicted only by trees that did not see it in training
  • Genuinely out-of-sample, no separate validation set needed
  • Also tracked: OOB R squared, OOB hit rate, OOB correlation
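
The exact formula for top-N alpha and hit rate is not given on this page. The sketch below shows one plausible reading: on each date, take the N highest-scored suburbs, average their growth relative to national, then average across dates. Hit rate is taken as the share of picks with positive relative growth. The column names date, pred and rel_growth are hypothetical.

```python
import pandas as pd


def top_n_metrics(df: pd.DataFrame, n: int = 20) -> dict:
    """Top-n alpha and hit rate from per-date rankings.

    Assumes hypothetical columns: 'date', 'pred' (model score, e.g. the
    OOB prediction) and 'rel_growth' (realised growth minus national).
    """
    # Keep the n rows with the highest prediction within each date.
    picks = df.sort_values("pred", ascending=False).groupby("date").head(n)
    per_date_alpha = picks.groupby("date")["rel_growth"].mean()
    return {
        "alpha": float(per_date_alpha.mean()),                 # averaged over dates
        "hit_rate": float((picks["rel_growth"] > 0).mean()),   # picks beating national
    }


# Tiny worked example: two dates, three suburbs each, top-2 picks.
demo = pd.DataFrame({
    "date": ["2020-01"] * 3 + ["2020-02"] * 3,
    "pred": [0.9, 0.5, 0.1, 0.2, 0.8, 0.7],
    "rel_growth": [0.10, -0.02, 0.30, 0.05, 0.04, 0.06],
})
metrics = top_n_metrics(demo, n=2)
print(metrics)
```

In the demo, the first date's picks average 0.04 and the second's 0.05, giving an alpha of 0.045, and three of the four picks beat the national figure.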

Search Strategy

  • Search 1: 120-combo HP grid (depth x leaf x max_features)
  • Search 2: 18 feature subsets (importance, domain, exclusion)
  • Search 3: n_estimators sensitivity (50 to 500)
  • Search 4: Greedy forward feature selection (20 steps)

Optimised Model

  • Random Forest: max_depth=8, min_samples_leaf=20, max_features=0.7, 500 trees
  • 13 features (forward selection optimum)
  • OOB R squared: 0.1251 (on 13-feature subset)
  • Overfitting gap: 0.31pp (in-sample 9.58% vs OOB 9.27%)

Data

  • 1,178,824 training rows (house, 2-year, SAL markets with actuals)
  • 6,492 suburbs across 192 forecast dates
  • Target: log growth relative to national average
  • Missing values filled with column median
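
The target and imputation steps listed above might look like the following in pandas. This is a sketch under assumptions: the column names are hypothetical, and the national average is taken as the equal-weighted mean of suburb log growth on each date, which this page does not specify.

```python
import numpy as np
import pandas as pd

# Toy panel (assumption): 2 dates x 3 suburbs, one feature with gaps.
raw = pd.DataFrame({
    "date": ["2015-01"] * 3 + ["2015-02"] * 3,
    "suburb": ["A", "B", "C"] * 2,
    "price_now": [500.0, 800.0, 650.0, 510.0, 790.0, 640.0],
    "price_2y": [600.0, 880.0, 650.0, 561.0, 948.0, 704.0],
    "vacancy_rate": [0.02, np.nan, 0.04, 0.03, 0.01, np.nan],
})

# Two-year log growth per suburb.
raw["log_growth"] = np.log(raw["price_2y"] / raw["price_now"])

# Growth relative to national: subtract the per-date mean across suburbs
# (assumed definition of the national average).
national = raw.groupby("date")["log_growth"].transform("mean")
raw["rel_growth"] = raw["log_growth"] - national

# Fill missing feature values with the column median.
feature_cols = ["vacancy_rate"]
medians = raw[feature_cols].median()
features = raw[feature_cols].fillna(medians)
print(features["vacancy_rate"].tolist())
```

A median taken over the whole column uses values from every date. A stricter variant would compute it from past dates only.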

Next Step: Walk-Forward Backtesting

OOB provides a good estimate of generalisation. Walk-forward backtesting with true temporal holdout will give the definitive out-of-sample result.
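
A walk-forward backtest of the kind proposed here could be sketched as below. Everything in it is an assumption for illustration: the data is synthetic, forecast dates are taken to be monthly, and training rows are limited to dates at least 24 months before the test date so that their 2-year outcomes would already have been observed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in panel (assumption): 60 monthly dates x 50 suburbs.
rng = np.random.default_rng(7)
n_dates, n_suburbs = 60, 50
df = pd.DataFrame({
    "t": np.repeat(np.arange(n_dates), n_suburbs),   # month index of forecast date
    "x1": rng.normal(size=n_dates * n_suburbs),
    "x2": rng.normal(size=n_dates * n_suburbs),
})
df["rel_growth"] = 0.3 * df["x1"] + rng.normal(scale=0.5, size=len(df))

FEATURES = ["x1", "x2"]
HORIZON = 24    # months until a 2-year outcome becomes observable
N_PICK = 5      # top-N picks per test date (20 in the report)

alphas = {}
for t in range(48, n_dates, 2):
    # Train only on rows whose outcome was already known at time t.
    train = df[df["t"] <= t - HORIZON]
    test = df[df["t"] == t]
    rf = RandomForestRegressor(
        n_estimators=50, max_depth=6, min_samples_leaf=20, random_state=0
    )
    rf.fit(train[FEATURES], train["rel_growth"])
    scores = pd.Series(rf.predict(test[FEATURES]), index=test.index)
    picks = scores.nlargest(N_PICK).index
    alphas[t] = test.loc[picks, "rel_growth"].mean()

walk_forward_alpha = float(np.mean(list(alphas.values())))
print(f"walk-forward top-{N_PICK} alpha over {len(alphas)} dates: {walk_forward_alpha:.3f}")
```

Unlike OOB, this setup never lets a model see any row dated after its training cutoff, including other dates for the same suburb.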

© 2026 Microburbs. All rights reserved.