Simple Ensemble Models
Finding the optimal subset of threshold fields to combine with the suburb forecasting model. No machine learning. No backtesting. Sum of growth predictions, ranked per date.

Optimal Combination: 10 Fields
Forward selection found that 10 of 23 threshold fields produce the highest top-20 alpha when summed with the forecast prediction. Adding more fields beyond 10 dilutes the signal.
Seven search strategies were tested, including greedy forward selection, greedy backward elimination, 2,000 random subsets, and exhaustive leave-N-out. Forward selection and backward elimination both converged on 10-field subsets, with top-20 alpha of 7.17% and 7.13% respectively.
Performance Comparison
| Model | Top-5 | Top-20 | Top-50 | Hit Rate | Correlation |
|---|---|---|---|---|---|
| Original Forecast | 7.17% | 6.20% | 5.76% | 79.4% | 0.2247 |
| All 23 Fields (Sum) | 7.24% | 6.34% | 5.56% | 83.9% | 0.3303 |
| Optimal 10 Fields (Sum) | 7.78% | 7.17% | 6.16% | 85.5% | 0.2658 |
Search Strategy Results

| Strategy | Fields | Top-20 Alpha | Top-5 Alpha |
|---|---|---|---|
| Forward selection (10 fields) | 10 | 7.17% | 7.78% |
| Backward elimination (10 fields) | 10 | 7.13% | 7.67% |
| Best random (13 fields) | 13 | 6.93% | 7.63% |
| Leave-3-out (20 fields) | 20 | 6.62% | 7.71% |
| Leave-2-out (21 fields) | 21 | 6.59% | 7.66% |
| All 23 fields | 23 | 6.35% | 7.23% |
| Forecast only (no thresholds) | 0 | 5.88% | 6.33% |
Individual Field Value
Each field was tested alone with the forecast. In the chart, green bars mark fields that beat the forecast-only baseline (5.88% top-20 alpha).

Selection Paths

Forward Selection: Step by Step
Each step adds the single field that most improves top-20 alpha.
| Step | Field Added | Total | Top-20 | Top-5 | Hit Rate |
|---|---|---|---|---|---|
| 0 | (forecast only) | 0 | 5.88% | 6.33% | 79.4% |
| 1 | pct_sold_at_loss | 1 | 6.27% | 6.99% | 81.7% |
| 2 | 1yr_buy_growth_75 | 2 | 6.55% | 7.31% | 83.0% |
| 3 | distress | 3 | 6.72% | 7.39% | 83.5% |
| 4 | mib_perc_renters | 4 | 6.83% | 7.47% | 83.9% |
| 5 | dev_infrastructure | 5 | 6.91% | 7.54% | 84.3% |
| 6 | business_innovation | 6 | 7.03% | 7.62% | 84.7% |
| 7 | overseas_born_parents | 7 | 7.09% | 7.68% | 85.0% |
| 8 | buy_price | 8 | 7.14% | 7.73% | 85.2% |
| 9 | stock_on_market | 9 | 7.16% | 7.76% | 85.4% |
| 10 | months_of_supply | 10 | 7.17% | 7.78% | 85.5% |
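The step-by-step procedure above can be sketched as a generic greedy loop. This is a minimal illustration, not the code used for the report: `score_fn` is a hypothetical callable standing in for "top-20 alpha of the forecast plus these fields", the field names and gains in the toy example are invented, and stopping as soon as no field improves the score is an assumed stopping rule.

```python
def forward_select(candidates, score_fn):
    """Greedy forward selection: repeatedly add the field that most improves score_fn.

    score_fn(subset) is a hypothetical objective (e.g. top-20 alpha of the
    forecast summed with the fields in subset). Higher is better.
    """
    selected = []
    best_score = score_fn(selected)
    path = [(None, best_score)]  # step 0: forecast only
    remaining = list(candidates)
    while remaining:
        # Score every one-field extension of the current subset.
        trial_scores = {f: score_fn(selected + [f]) for f in remaining}
        best_field = max(trial_scores, key=trial_scores.get)
        if trial_scores[best_field] <= best_score:
            break  # assumed stopping rule: no remaining field improves the score
        selected.append(best_field)
        remaining.remove(best_field)
        best_score = trial_scores[best_field]
        path.append((best_field, best_score))
    return selected, path


# Toy objective with invented integer gains (not figures from the report).
toy_gains = {"field_a": 30, "field_b": 20, "field_c": -5}

def toy_score(subset):
    return 100 + sum(toy_gains[f] for f in subset)

selected, path = forward_select(list(toy_gains), toy_score)
print(selected)  # ['field_a', 'field_b']
```

Because each step only looks one field ahead, a greedy search like this can miss combinations whose value appears only jointly, which is one reason to compare it against backward elimination and random subsets.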
All 23 Fields: In vs Out
| Field | Source | In Optimal 10? |
|---|---|---|
| prop_list_pct_of_sold_at_loss_house | Production | Yes |
| prop_list_price_0_75_growth_1_year_buy_house | Production | Yes |
| distress | Custom | Yes |
| mib_perc_renters | Production | Yes |
| development_infrastructure_investment | Synthetic | Yes |
| business_innovation_capacity | Synthetic | Yes |
| census_overseas_born_parents | Production | Yes |
| prop_list_price_0_5_buy_house | Production | Yes |
| prop_list_stock_on_market | Production | Yes |
| prop_list_months_of_supply | Production | Yes |
| census_public_housing | Production | No |
| prop_list_house_vacancy_rate | Production | No |
| prop_list_price_0_5_growth_10_year_buy_house | Production | No |
| prop_list_price_0_5_growth_3_year_buy_house | Production | No |
| prop_list_price_0_5_growth_3_year_rent_house | Production | No |
| prop_list_price_0_5_rent_house | Production | No |
| rental_growth | Custom | No |
| mean_reversion | Custom | No |
| mean_reversion_8y | Custom | No |
| owner_occupied | Custom | No |
| cultural_integration_ecosystem | Synthetic | No |
| transport_ecosystem_complete | Synthetic | No |
| environment_urban_heat | Synthetic | No |
Most dropped fields fall into three categories:
- Price and growth rate fields (3yr buy growth, 3yr rent growth, 10yr buy growth, rent price): these overlap with the forecast model, which already uses hedonic price features
- Custom fields (mean_reversion, mean_reversion_8y, rental_growth, owner_occupied): these use the same underlying data as production fields already in the model
- Weaker synthetics (cultural_integration, transport, urban_heat): these add noise without enough signal to offset it
The remaining two dropped fields, census_public_housing and prop_list_house_vacancy_rate, do not fit any of these categories.
Time Period Breakdown
| Period | Dates | Rows | Original | All 23 | Optimal 10 |
|---|---|---|---|---|---|
| Early (first half) | 96 | 605,839 | 5.52% | 5.52% | 6.61% |
| Late (second half) | 96 | 572,985 | 6.97% | 7.18% | 7.88% |
Methodology
Model A (Sum)
For each suburb at each date, sum the growth predictions from all selected threshold fields plus the forecast model prediction. Rank suburbs within each date by this total score. Growth predictions come from value-based bin matching (production fields), direct threshold cutoffs (custom fields), or normalised percentile scores (synthetic composites).
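As a rough sketch of the sum-and-rank step described above, the snippet below adds the forecast prediction to the selected fields' growth predictions and orders suburbs within each date. The row layout (`date`, `suburb`, `forecast`, one key per field), the sample values, and the choice to treat a missing field prediction as 0.0 are all assumptions made for illustration.

```python
from collections import defaultdict

def rank_by_summed_score(rows, selected_fields):
    """Return {date: [suburb, ...]} ordered by total score, highest first.

    rows: dicts with 'date', 'suburb', 'forecast', and one growth
    prediction per threshold field (hypothetical layout).
    """
    by_date = defaultdict(list)
    for row in rows:
        # Assumption: a missing field prediction contributes 0.0.
        total = row["forecast"] + sum(row.get(f, 0.0) for f in selected_fields)
        by_date[row["date"]].append((total, row["suburb"]))
    return {
        date: [suburb for _, suburb in sorted(scored, key=lambda t: -t[0])]
        for date, scored in by_date.items()
    }


# Invented sample rows, not data from the report.
rows = [
    {"date": "2020-01", "suburb": "A", "forecast": 0.05, "distress": 0.01, "mib_perc_renters": 0.00},
    {"date": "2020-01", "suburb": "B", "forecast": 0.04, "distress": 0.03, "mib_perc_renters": 0.01},
    {"date": "2020-02", "suburb": "A", "forecast": 0.06, "distress": 0.00, "mib_perc_renters": 0.02},
    {"date": "2020-02", "suburb": "B", "forecast": 0.03, "distress": 0.01, "mib_perc_renters": 0.00},
]
ranking = rank_by_summed_score(rows, ["distress", "mib_perc_renters"])
print(ranking["2020-01"])  # ['B', 'A']
```

Ranking is done independently per date, so scores are only ever compared between suburbs observed at the same point in time.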
Field Combination Search
- 7 search strategies tested across 4,300+ combinations
- Greedy forward selection: start with forecast, add the best field at each step
- Greedy backward elimination: start with all 23, remove the worst field at each step
- 2,000 random subsets of sizes 3-23
- Exhaustive leave-2-out (253 combinations) and leave-3-out (1,771 combinations)
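The exhaustive leave-N-out counts follow directly from the number of ways to drop N of 23 fields. A short sketch that enumerates the subsets and checks the counts is shown here; the field names are placeholders.

```python
from itertools import combinations
from math import comb

N_FIELDS = 23
fields = [f"field_{i}" for i in range(N_FIELDS)]  # placeholder names

def leave_n_out_subsets(fields, n):
    """Yield every subset obtained by dropping exactly n fields."""
    for dropped in combinations(fields, n):
        dropped_set = set(dropped)
        yield [f for f in fields if f not in dropped_set]

leave_2 = list(leave_n_out_subsets(fields, 2))
leave_3 = list(leave_n_out_subsets(fields, 3))

# The enumerated counts match the binomial coefficients C(23, 2) and C(23, 3).
print(len(leave_2), comb(N_FIELDS, 2))  # 253 253
print(len(leave_3), comb(N_FIELDS, 3))  # 1771 1771
```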
Important Caveats
- No backtesting applied. Threshold bins use all available data, introducing potential look-ahead bias.
- The field selection itself is not cross-validated. The optimal 10 fields were selected on the full dataset.
- Correlation drops from 0.330 (all 23) to 0.266 (optimal 10) while alpha increases. The model sacrifices broad correlation for sharper top-N picks.
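The last caveat can look counterintuitive, so here is a toy illustration of how a score can correlate less with realised growth overall yet still pick a better top N. All numbers are invented, and the mean realised growth of the top-N picks is used as a simplified stand-in for the report's alpha metric, whose exact definition is not given here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def top_n_mean(scores, realised, n):
    """Mean realised growth of the n highest-scored items."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(realised[i] for i in order[:n]) / n

# Invented toy data for eight suburbs.
realised = [10, 9, 5, 4, 3, 2, 1, 0]
broad_score = [5, 5.5, 7, 6, 3, 2, 1, 0]  # roughly right everywhere, wrong at the top
sharp_score = [8, 7, 0, 1, 2, 3, 4, 5]    # right at the top, scrambled elsewhere

corr_broad = pearson(broad_score, realised)
corr_sharp = pearson(sharp_score, realised)
top_broad = top_n_mean(broad_score, realised, 2)  # 4.5
top_sharp = top_n_mean(sharp_score, realised, 2)  # 9.5

print(corr_broad > corr_sharp, top_sharp > top_broad)  # True True
```

Correlation rewards getting the whole ordering roughly right, while a top-N metric only looks at the head of the ranking, so the two can move in opposite directions.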
Explore the Forecasting Model
See the original backtested forecast model and the combined Random Forest approach.