Microburbs

Simple Ensemble Models

Finding the optimal subset of threshold fields to combine with the suburb forecasting model. No machine learning. No backtesting. Sum of growth predictions, ranked per date.

Luke Metcalfe
Founder & Chief Data Scientist
15+ years in property data analytics

Optimal Combination: 10 Fields

Forward selection found that 10 of 23 threshold fields produce the highest top-20 alpha when summed with the forecast prediction. Adding more fields beyond 10 dilutes the signal.

  • Top-5 Alpha: 7.78% (vs 7.17% original)
  • Top-20 Alpha: 7.17% (vs 6.20% original)
  • Hit Rate (T20): 85.5% (vs 79.4% original)
  • Late Period T20: 7.88% (vs 6.97% original)

Seven search strategies were tested, including greedy forward selection, greedy backward elimination, 2,000 random subsets, and exhaustive leave-N-out. Forward selection and backward elimination both converged on 10 fields.

Performance Comparison

| Model | Top-5 | Top-20 | Top-50 | Hit Rate | Correlation |
| --- | --- | --- | --- | --- | --- |
| Original Forecast | 7.17% | 6.20% | 5.76% | 79.4% | 0.2247 |
| All 23 Fields (Sum) | 7.24% | 6.34% | 5.56% | 83.9% | 0.3303 |
| Optimal 10 Fields (Sum) | 7.78% | 7.17% | 6.16% | 85.5% | 0.2658 |

Key finding: The optimal 10-field combination beats the original forecast on every metric. Top-20 alpha improves from 6.20% to 7.17% (+0.97pp), and the hit rate rises from 79.4% to 85.5% (+6.1pp). Summing all 23 fields reaches only 6.34% top-20 alpha, because the 13 weaker fields dilute the signal.

Search Strategy Results

Best combination from each search strategy
| Strategy | Fields | Top-20 Alpha | Top-5 Alpha |
| --- | --- | --- | --- |
| Forward selection (10 fields) | 10 | 7.17% | 7.78% |
| Backward elimination (10 fields) | 10 | 7.13% | 7.67% |
| Best random (13 fields) | 13 | 6.93% | 7.63% |
| Leave-3-out (20 fields) | 20 | 6.62% | 7.71% |
| Leave-2-out (21 fields) | 21 | 6.59% | 7.66% |
| All 23 fields | 23 | 6.35% | 7.23% |
| Forecast only (no thresholds) | 0 | 5.88% | 6.33% |

Individual Field Value

Each field tested alone with the forecast. Green bars beat the forecast-only baseline (5.88%).

[Figure: Individual field value when combined with forecast]

Selection Paths

[Figure: Backward elimination and forward selection paths]
Convergence: Both greedy algorithms peak at 10 fields. Backward elimination, pruning down from all 23 fields, peaks at 7.13% top-20 alpha with 10 fields remaining. Forward selection, building up from the forecast alone, peaks at 7.17% with 10 fields added. On both paths the 11-field set scores lower than the 10-field set. The two approaches select slightly different field sets but agree on the optimal count.
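As a rough illustration of the backward path described above, the sketch below removes one field at a time, records the score at every subset size, and returns the subset at the peak. The `score` callable and the additive toy weights are assumptions made for this example only; the real metric is top-20 alpha, which is not additive across fields.

```python
def backward_eliminate(all_fields, score):
    """Greedy backward elimination that records the whole path.

    score: hypothetical callable, list of fields -> number to maximise.
    Returns (best_subset, path) where path is [(subset_size, score), ...].
    """
    current = list(all_fields)
    current_score = score(current)
    best_subset, best_score = list(current), current_score
    path = [(len(current), current_score)]
    while current:
        # Try dropping each field in turn; keep the drop that scores highest.
        trials = [[f for f in current if f != drop] for drop in current]
        current = max(trials, key=score)
        current_score = score(current)
        path.append((len(current), current_score))
        if current_score > best_score:
            best_subset, best_score = list(current), current_score
    return best_subset, path


# Toy weights, invented for illustration: two helpful fields, two harmful ones.
toy_weights = {"a": 0.3, "b": 0.2, "c": -0.1, "d": -0.2}

def toy_score(fields):
    return sum(toy_weights[f] for f in fields)

best_subset, path = backward_eliminate(list(toy_weights), toy_score)
```

Recording the full path, rather than stopping at the first drop that fails to help, is what makes it possible to plot alpha against field count and read off the peak.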

Forward Selection: Step by Step

Each step adds the single field that most improves top-20 alpha.

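| Step | Field Added | Total Fields | Top-20 | Top-5 | Hit Rate |
| --- | --- | --- | --- | --- | --- |
| 0 | (forecast only) | 0 | 5.88% | 6.33% | 79.4% |
| 1 | pct_sold_at_loss | 1 | 6.27% | 6.99% | 81.7% |
| 2 | 1yr_buy_growth_75 | 2 | 6.55% | 7.31% | 83.0% |
| 3 | distress | 3 | 6.72% | 7.39% | 83.5% |
| 4 | mib_perc_renters | 4 | 6.83% | 7.47% | 83.9% |
| 5 | dev_infrastructure | 5 | 6.91% | 7.54% | 84.3% |
| 6 | business_innovation | 6 | 7.03% | 7.62% | 84.7% |
| 7 | overseas_born_parents | 7 | 7.09% | 7.68% | 85.0% |
| 8 | buy_price | 8 | 7.14% | 7.73% | 85.2% |
| 9 | stock_on_market | 9 | 7.16% | 7.76% | 85.4% |
| 10 | months_of_supply | 10 | 7.17% | 7.78% | 85.5% |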
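The step-by-step procedure in the table above could be sketched roughly as follows. This page does not define top-N alpha, so the sketch assumes it is the average, across dates, of the mean realised growth of the N top-ranked suburbs minus the mean realised growth of all suburbs on that date. The data layout, the field names, and the rule of stopping at the first step that fails to improve are all assumptions for illustration.

```python
def top_n_alpha(data, fields, n):
    """Assumed metric: mean over dates of (avg actual growth of the n
    top-scored suburbs) minus (avg actual growth of all suburbs)."""
    alphas = []
    for suburbs in data.values():  # data: {date: [suburb dicts]}
        scored = sorted(
            suburbs,
            key=lambda s: s["forecast"] + sum(s["preds"][f] for f in fields),
            reverse=True,
        )
        top = scored[:n]
        top_mean = sum(s["actual"] for s in top) / len(top)
        all_mean = sum(s["actual"] for s in suburbs) / len(suburbs)
        alphas.append(top_mean - all_mean)
    return sum(alphas) / len(alphas)


def forward_select(data, candidates, n):
    """Greedy forward selection: add the best single field until none helps."""
    selected = []
    best = top_n_alpha(data, selected, n)
    path = [(None, best)]  # step 0 is the forecast alone
    remaining = list(candidates)
    while remaining:
        trial = {f: top_n_alpha(data, selected + [f], n) for f in remaining}
        field = max(trial, key=trial.get)
        if trial[field] <= best:
            break  # no remaining field improves the metric
        selected.append(field)
        remaining.remove(field)
        best = trial[field]
        path.append((field, best))
    return selected, path


# Toy data, invented for illustration: one date, four suburbs, top-1 picks.
data = {
    "2020-01": [
        {"forecast": 0.05, "preds": {"helpful_field": 0.00, "noisy_field": 0.03}, "actual": 0.02},
        {"forecast": 0.04, "preds": {"helpful_field": 0.03, "noisy_field": 0.00}, "actual": 0.10},
        {"forecast": 0.03, "preds": {"helpful_field": 0.00, "noisy_field": 0.00}, "actual": 0.04},
        {"forecast": 0.02, "preds": {"helpful_field": 0.01, "noisy_field": 0.01}, "actual": 0.04},
    ]
}
selected, path = forward_select(data, ["helpful_field", "noisy_field"], n=1)
```

In the toy data the helpful field lifts the best-performing suburb to the top of the ranking, while the noisy field would push a weak suburb back to first place, so the search keeps only the former.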

All 23 Fields: In vs Out

| Field | Source | In Optimal 10? |
| --- | --- | --- |
| prop_list_pct_of_sold_at_loss_house | Production | Yes |
| prop_list_price_0_75_growth_1_year_buy_house | Production | Yes |
| distress | Custom | Yes |
| mib_perc_renters | Production | Yes |
| development_infrastructure_investment | Synthetic | Yes |
| business_innovation_capacity | Synthetic | Yes |
| census_overseas_born_parents | Production | Yes |
| prop_list_price_0_5_buy_house | Production | Yes |
| prop_list_stock_on_market | Production | Yes |
| prop_list_months_of_supply | Production | Yes |
| census_public_housing | Production | No |
| prop_list_house_vacancy_rate | Production | No |
| prop_list_price_0_5_growth_10_year_buy_house | Production | No |
| prop_list_price_0_5_growth_3_year_buy_house | Production | No |
| prop_list_price_0_5_growth_3_year_rent_house | Production | No |
| prop_list_price_0_5_rent_house | Production | No |
| rental_growth | Custom | No |
| mean_reversion | Custom | No |
| mean_reversion_8y | Custom | No |
| owner_occupied | Custom | No |
| cultural_integration_ecosystem | Synthetic | No |
| transport_ecosystem_complete | Synthetic | No |
| environment_urban_heat | Synthetic | No |

Most dropped fields fall into three categories:

  • Price and growth-rate fields (3yr buy growth, 3yr rent growth, 10yr buy growth, rent price): these overlap with the forecast model, which already uses hedonic price features
  • Custom fields (mean_reversion, mean_reversion_8y, rental_growth, owner_occupied): these use the same underlying data as production fields already in the model
  • Weaker synthetics (cultural_integration, transport, urban_heat): these add noise without enough signal to offset it

Time Period Breakdown

| Period | Dates | Rows | Original | All 23 | Optimal 10 |
| --- | --- | --- | --- | --- | --- |
| Early (first half) | 96 | 605,839 | 5.52% | 5.52% | 6.61% |
| Late (second half) | 96 | 572,985 | 6.97% | 7.18% | 7.88% |

Recent performance: The optimal 10-field combination delivers 7.88% top-20 alpha in the late period (recent data), compared to 6.97% for the original forecast. This is a 0.91pp improvement in the most relevant time window, suggesting the threshold signals add genuine value in the current market.
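The early/late breakdown can be reproduced by splitting the ordered list of distinct dates at its midpoint. The sketch below is a minimal illustration, assuming the split is made by date count (the table shows 96 dates in each half); the monthly labels are invented placeholders, not the dataset's actual date range.

```python
def split_early_late(dates):
    """Split distinct dates into an early half and a late half, in order."""
    ordered = sorted(set(dates))
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


# 192 hypothetical monthly labels, used only to show the 96/96 split.
dates = [f"{year}-{month:02d}" for year in range(2008, 2024) for month in range(1, 13)]
early, late = split_early_late(dates)
```

Each half can then be scored separately with the same top-N metric used for the full period.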

Methodology

Model A (Sum)

For each suburb at each date, sum the growth predictions from all selected threshold fields plus the forecast model prediction. Rank suburbs within each date by this total score. Growth predictions come from value-based bin matching (production fields), direct threshold cutoffs (custom fields), or normalised percentile scores (synthetic composites).
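A minimal sketch of the sum-and-rank step is shown below. The row layout (`date`, `suburb`, `forecast`, `fields`) and the choice to treat a missing field prediction as zero are assumptions made for this example, and the numbers are toy values rather than real predictions.

```python
from collections import defaultdict


def rank_by_summed_score(rows, selected_fields):
    """Rank suburbs within each date by forecast + sum of selected field predictions.

    rows: list of dicts with hypothetical keys 'date', 'suburb', 'forecast',
          and 'fields' (a dict mapping field name -> growth prediction).
    Returns {date: [(suburb, score), ...]} sorted best-first.
    """
    by_date = defaultdict(list)
    for row in rows:
        # Missing field predictions are treated as 0.0 (an assumption).
        score = row["forecast"] + sum(
            row["fields"].get(name, 0.0) for name in selected_fields
        )
        by_date[row["date"]].append((row["suburb"], score))
    return {
        date: sorted(scored, key=lambda pair: pair[1], reverse=True)
        for date, scored in by_date.items()
    }


# Toy data, invented for illustration only.
rows = [
    {"date": "2020-01", "suburb": "A", "forecast": 0.05,
     "fields": {"distress": 0.01, "buy_price": 0.02}},
    {"date": "2020-01", "suburb": "B", "forecast": 0.06,
     "fields": {"distress": -0.01, "buy_price": 0.00}},
    {"date": "2020-02", "suburb": "A", "forecast": 0.04,
     "fields": {"distress": 0.00, "buy_price": 0.01}},
]
ranking = rank_by_summed_score(rows, ["distress", "buy_price"])
```

Here suburb B has the higher forecast on its own, but suburb A ranks first for 2020-01 once the two field predictions are added, which is the kind of reordering the threshold fields are meant to produce.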

Field Combination Search

  • 7 search strategies tested across 4,300+ combinations
  • Greedy forward selection: start with forecast, add the best field at each step
  • Greedy backward elimination: start with all 23, remove the worst field at each step
  • 2,000 random subsets of sizes 3-23
  • Exhaustive leave-2-out (253 combinations) and leave-3-out (1,771 combinations)
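The leave-N-out counts in the list above follow from choosing which fields to drop out of 23. The sketch below enumerates those subsets and checks the counts; the field names are placeholders.

```python
from itertools import combinations
from math import comb

N_FIELDS = 23
fields = [f"field_{i}" for i in range(N_FIELDS)]  # placeholder names


def leave_n_out_subsets(all_fields, n):
    """Yield every subset obtained by dropping exactly n fields."""
    for dropped in combinations(all_fields, n):
        dropped_set = set(dropped)
        yield [f for f in all_fields if f not in dropped_set]


leave_2 = sum(1 for _ in leave_n_out_subsets(fields, 2))
leave_3 = sum(1 for _ in leave_n_out_subsets(fields, 3))

# C(23, 2) = 253 and C(23, 3) = 1,771, matching the counts quoted above.
assert leave_2 == comb(N_FIELDS, 2) == 253
assert leave_3 == comb(N_FIELDS, 3) == 1771
```

Each yielded subset would then be scored with the same top-20 alpha metric as the other strategies.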

Important Caveats

  • No backtesting applied. Threshold bins use all available data, introducing potential look-ahead bias.
  • The field selection itself is not cross-validated. The optimal 10 fields were selected on the full dataset.
  • Correlation drops from 0.330 (all 23) to 0.266 (optimal 10) while alpha increases. The model sacrifices broad correlation for sharper top-N picks.

Explore the Forecasting Model

See the original backtested forecast model and the combined Random Forest approach.

© 2026 Microburbs. All rights reserved.