Research Whitepaper

Transport Ecosystem Index: Technical Whitepaper

Full statistical methodology, bin performance analysis, temporal consistency testing, and regional robustness results across 272,958 property sales.

Annual Spread: +4.0% p.a.
Top Bin Significance: p = 3.2e-142
Sample Dates Positive: 147/163
Total Sales Tested: 272,958

Luke Metcalfe

Founder & Chief Data Scientist

15+ years in property data analytics

1. Abstract
2. Methodology
3. Bin Performance
4. Temporal Analysis
5. Regional Robustness
6. Defence of Method
7. Limitations

1. Abstract

This paper presents a composite index that captures how residents travel within Australian suburbs. The index combines six transport variables derived from government data into a single score. The counter-intuitive finding is that suburbs where residents rely more on private transport (spacious, car-friendly areas) outperform denser suburbs with extensive public transport networks.

We trained a model on historical property sales data to predict 4-year annualised growth rates. Evaluated out of sample, the model achieved an R-squared of 0.1761, against a training R-squared of 0.1694 and an overall R-squared of 0.213. The top-scoring suburbs outperformed the national median by 4.0 percentage points per annum, annualised over 4 years. The bottom-scoring suburbs underperformed by 3.3 percentage points per annum.

The signal was tested across 163 sample dates and 13 GCCSA regions. The top bin was statistically significant at 147 of 163 dates. The above-threshold suburbs beat the below-threshold suburbs at 122 of 163 dates (75%). The signal produced a positive spread in 8 of 13 regions. These results indicate a persistent relationship between travel mode patterns and subsequent property growth that holds in most, though not all, regions.

2. Methodology

2.1 Feature Construction

The Transport Ecosystem Index is built from six travel mode variables derived from government data. These variables capture commuting patterns and travel mode preferences at the suburb level. We do not disclose the specific census field names, as the combination itself represents proprietary research.

Each variable was standardised and combined into a single composite score using a weighted formula. The weights were derived from a model trained to predict 4-year annualised property growth relative to the national median.
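As a rough illustration of the standardise-and-combine step, the sketch below z-scores each variable and takes a weighted sum. The variable names, the weights, and the rank-based rescaling to a 0 to 100 score are all assumptions made for the example; the actual fields and model-derived weights are not disclosed in this paper.

```python
from statistics import mean, pstdev

def standardise(values):
    """Convert raw values to z-scores (mean 0, standard deviation 1)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def composite_scores(columns, weights):
    """Weighted sum of standardised columns, giving one raw score per suburb."""
    z_columns = [standardise(col) for col in columns]
    n_suburbs = len(columns[0])
    return [
        sum(w * z[i] for w, z in zip(weights, z_columns))
        for i in range(n_suburbs)
    ]

def to_percentile_scale(scores):
    """Rescale raw scores to 0-100 by rank (an assumed convention)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    scaled = [0.0] * len(scores)
    for rank, i in enumerate(order):
        scaled[i] = 100.0 * rank / (len(scores) - 1)
    return scaled

# Hypothetical inputs: two variables for four suburbs (the real index uses six).
car_share = [0.55, 0.70, 0.82, 0.91]
transit_share = [0.30, 0.18, 0.09, 0.03]
weights = [0.6, -0.4]  # illustrative only, not the trained weights

raw = composite_scores([car_share, transit_share], weights)
scores = to_percentile_scale(raw)
```

Standardising first keeps any one variable from dominating the composite simply because it is measured on a larger scale.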

2.2 Model Training

The model was trained on property sales data spanning 2008 to 2021. Each observation is a single property sale. The target variable is the annualised 4-year growth rate from the sale date. The features are the six transport-derived variables for the suburb where the sale occurred.
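The target variable can be sketched as a compound annual growth rate. This is a minimal example that assumes a start value and an end value are available for each sale; how the end value is obtained is not specified in this paper, and the function name is hypothetical.

```python
def annualised_growth(start_price, end_price, years=4.0):
    """Compound annual growth rate between two prices over `years` years."""
    if start_price <= 0 or end_price <= 0 or years <= 0:
        raise ValueError("prices and years must be positive")
    return (end_price / start_price) ** (1.0 / years) - 1.0

# Hypothetical sale: a 500,000 purchase that compounds at 4% a year for 4 years.
example_rate = annualised_growth(500_000, 500_000 * 1.04 ** 4)
```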

The model was trained using a strict out-of-sample framework. Training and test sets were split by time period to prevent data leakage. Properties sold in overlapping windows were excluded from the test set to ensure clean separation.
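One way to read the overlap rule is sketched below: a test sale is kept only if its 4-year growth window starts after the last training growth window has ended. This is an interpretation under stated assumptions, not the exact procedure used; the record layout, the cut-off date, and the function names are hypothetical.

```python
from datetime import date

HORIZON_YEARS = 4  # length of the growth window attached to each sale

def add_years(d, years):
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def time_split(sales, train_end):
    """Split sales by date, dropping those whose growth window overlaps training.

    Training windows run until train_end + HORIZON_YEARS, so test sales
    must be dated on or after that point.
    """
    test_start = add_years(train_end, HORIZON_YEARS)
    train = [s for s in sales if s["sale_date"] <= train_end]
    test = [s for s in sales if s["sale_date"] >= test_start]
    return train, test

# Hypothetical sales records.
sales = [
    {"id": 1, "sale_date": date(2010, 3, 1)},
    {"id": 2, "sale_date": date(2012, 6, 15)},
    {"id": 3, "sale_date": date(2014, 1, 10)},  # overlaps training windows, dropped
    {"id": 4, "sale_date": date(2016, 7, 1)},   # overlaps training windows, dropped
    {"id": 5, "sale_date": date(2017, 2, 1)},
]
train, test = time_split(sales, train_end=date(2012, 12, 31))
```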

2.3 Performance Metric

The primary metric is the difference in median annualised 4-year growth between each bin and the national median. Statistical significance is assessed using a two-sided t-test against the null hypothesis that the bin's mean growth equals the national mean.

diff = median_growth(bin) - median_growth(national)
t-statistic = (mean(bin) - mean(national)) / SE(bin)
p-value from two-sided t-test
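The three quantities above can be computed as in the sketch below. It assumes SE(bin) is the sample standard deviation of the bin divided by the square root of its size, and it uses a normal approximation for the two-sided p-value, which is reasonable for bins with tens of thousands of sales but only approximate for small samples. The sample figures are invented.

```python
import math
from statistics import mean, median, stdev

def bin_vs_national(bin_growth, national_growth):
    """Return (median diff, t-statistic, approximate two-sided p-value)."""
    diff = median(bin_growth) - median(national_growth)
    # Standard error of the bin mean (assumed definition of SE(bin)).
    se = stdev(bin_growth) / math.sqrt(len(bin_growth))
    t_stat = (mean(bin_growth) - mean(national_growth)) / se
    # Two-sided tail probability under a standard normal approximation.
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return diff, t_stat, p_value

# Invented annualised growth rates for illustration only.
top_bin = [0.06, 0.08, 0.07, 0.09, 0.05]
others = [0.01, 0.02, 0.03, 0.02, 0.04]
national = others + top_bin

diff, t_stat, p_value = bin_vs_national(top_bin, national)
```

Note that the headline difference is measured on medians while the significance test is run on means, so the two can diverge when growth rates are heavily skewed.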

2.4 Out-of-Sample R-squared

The model achieved an out-of-sample R-squared of 0.1761. The training R-squared was 0.1694 and the overall R-squared was 0.213. This means the six travel mode variables alone explain 17.6% of the variance in 4-year suburb-level growth on unseen data. The close alignment between train and test R-squared suggests the model is not overfitted.
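For reference, R-squared is one minus the ratio of residual variance to total variance. The sketch below uses that standard definition with invented numbers; computing it on held-out sales gives the out-of-sample figure, and on the training sales the training figure.

```python
from statistics import mean

def r_squared(actual, predicted):
    """Share of variance in `actual` explained by `predicted`."""
    mu = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mu) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Invented growth rates: a perfect prediction and a mean-only prediction.
actual = [0.02, 0.05, 0.03, 0.08]
perfect = r_squared(actual, actual)
mean_only = r_squared(actual, [mean(actual)] * len(actual))
```

A model that always predicts the average growth rate scores zero on this measure, which is the baseline any R-squared value should be read against.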

Note on R-squared: Property prices are influenced by hundreds of factors, including interest rates, infrastructure, zoning, supply, local employment, and macroeconomic conditions. No single thematic index will produce a very high R-squared. The relevant question is whether the signal is statistically significant and consistent, not whether it explains most of the variance. An out-of-sample R-squared of 0.176 is notably strong for a single thematic factor.

3. Bin Performance

The model sorts suburbs into three bins based on their Transport Ecosystem score. Each bin has a distinct growth profile. The table below shows the full results.

Bin	Score Range	Diff vs National	p-value	N (Sales)	Significant
Top	86 to 100	+4.0%	3.2e-142	38,548	Yes
Middle	32 to 86	+0.7%	4.2e-10	146,319	Yes
Bottom	0 to 32	-3.3%	8.3e-174	88,091	Yes

Key observation: All three bins produce statistically significant results. The spread between top and bottom is 7.3 percentage points per annum (+4.0 versus -3.3), annualised over 4 years. The monotonic ordering (top strongly positive, middle mildly positive, bottom strongly negative) indicates the index captures a genuine gradient in growth outcomes. The bottom bin's p-value of 8.3e-174 is one of the strongest signals across all Microburbs indices.
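The bin boundaries and the top-to-bottom spread can be expressed in a few lines. The published ranges share their endpoints (32 and 86), so the sketch assumes a score exactly on a boundary falls into the higher bin; that tie-breaking rule is an assumption, as is the function name.

```python
TOP_THRESHOLD = 86
BOTTOM_THRESHOLD = 32

def assign_bin(score):
    """Map a 0-100 Transport Ecosystem score to a bin label."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= TOP_THRESHOLD:
        return "Top"
    if score >= BOTTOM_THRESHOLD:  # boundary scores go to the higher bin (assumed)
        return "Middle"
    return "Bottom"

# Differences versus the national median, in percentage points, from the table.
diff_vs_national = {"Top": 4.0, "Middle": 0.7, "Bottom": -3.3}
spread = diff_vs_national["Top"] - diff_vs_national["Bottom"]
```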

4. Temporal Analysis

A signal that works at one point in time could be a fluke. We tested the Transport Ecosystem Index across every quarter from 2008-Q1 to 2021-Q3.

The above-threshold suburbs sit above the below-threshold suburbs at 122 of 163 sample dates (75%). The separation is strongest from 2012 to 2016, when the above-threshold suburbs consistently outperform by over 3 percentage points. In 2014-Q3, the gap peaked at 4.57 percentage points (+2.46% top vs -2.11% bottom). The signal narrows after 2017 as market conditions shift, but the above-threshold suburbs remain higher for most of the post-2017 period as well.

4.1 Date-by-Date Consistency

The top bin was statistically significant at 147 of 163 individual dates, and the above-threshold suburbs beat the below-threshold suburbs at 122 of them (75%). The table below summarises how the signal performed across different time windows.

Decade	Sample Window	Quarters in Period	Top > Bottom	Consistency
2000s	Jan 2008 → Oct 2013	8	3 of 8	38%
2010s	Jan 2010 → Oct 2015	8	4 of 8	50%
2010s	Jan 2012 → Oct 2017	8	8 of 8	100%
2010s	Jan 2014 → Oct 2019	8	8 of 8	100%
2010s	Jan 2016 → Oct 2021	8	8 of 8	100%
2010s	Jan 2018 → Oct 2023	8	8 of 8	100%
2020s	Jan 2020 → Jul 2025	7	5 of 7	71%
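The consistency column is the share of quarters in each window where the top tier beat the bottom tier. The sketch below recomputes it from the counts in the table, and applies the same ratio to the 163 date-level tests; the function and variable names are chosen for the example.

```python
def consistency(wins, total):
    """Percentage of periods in which the top tier beat the bottom tier."""
    return 100.0 * wins / total

# (window, quarters where top > bottom, quarters in period), from the table.
windows = [
    ("Jan 2008 → Oct 2013", 3, 8),
    ("Jan 2010 → Oct 2015", 4, 8),
    ("Jan 2012 → Oct 2017", 8, 8),
    ("Jan 2014 → Oct 2019", 8, 8),
    ("Jan 2016 → Oct 2021", 8, 8),
    ("Jan 2018 → Oct 2023", 8, 8),
    ("Jan 2020 → Jul 2025", 5, 7),
]
by_window = {name: consistency(wins, total) for name, wins, total in windows}

# Date-level figures quoted in the text.
dates_top_beats_bottom = consistency(122, 163)
dates_top_bin_significant = consistency(147, 163)
```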

Pattern: The signal is most consistent during the 2012 to 2019 core period. The early years (2008-2011) coincide with the post-GFC recovery, when distressed sales across all suburb types compressed growth differentials. The 2020-2021 tail weakened as the pandemic property boom lifted all suburbs regardless of travel mode profile. From 2012 onward, the above-threshold suburbs outperformed the below-threshold suburbs in every single quarter through to 2019-Q4.

5. Regional Robustness

A signal that works only in one city is less useful than one that works nationally. We tested the Transport Ecosystem Index across 13 GCCSA (Greater Capital City Statistical Area) regions in Australia.

The signal produces a positive spread in 8 of 13 regions. Darwin leads with a +3.23% spread. Melbourne follows at +3.02%. Five regions show a negative spread. The weakest region is Rest of NT at -4.24%, where the small sample of 147 sales limits reliability.

5.1 Full Regional Table

All growth rates are annualised over 4 years. The spread column shows the difference between the above-threshold and below-threshold growth rates.

Region (GCCSA)	City	Top Tier Growth	Bottom Tier Growth	Spread	N (Sales)
Darwin	Darwin	-3.42%	-6.65%	+3.23%	367
Melbourne	Melbourne	-0.13%	-3.15%	+3.02%	4,945
Rest of WA	Regional WA	-0.77%	-3.14%	+2.37%	3,603
Rest of Qld	Regional Qld	+0.86%	-1.20%	+2.06%	13,184
Rest of NSW	Regional NSW	+2.45%	+1.03%	+1.42%	14,351
Sydney	Sydney	-0.93%	-1.24%	+0.31%	4,576
Adelaide	Adelaide	+0.15%	-0.08%	+0.23%	3,216
Rest of Vic.	Regional Vic.	+2.30%	+2.19%	+0.11%	7,443
ACT	ACT	+0.15%	+0.70%	-0.55%	1,129
Brisbane	Brisbane	+0.07%	+0.71%	-0.64%	1,854
Perth	Perth	-3.35%	-2.41%	-0.94%	2,684
Rest of SA	Regional SA	-1.40%	-0.46%	-0.94%	2,342
Rest of NT	Regional NT	-6.97%	-2.73%	-4.24%	147

Strongest regions: Darwin (+3.23% spread across 367 sales) and Melbourne (+3.02% spread across 4,945 sales). Even in declining markets like Darwin and Rest of WA, top-scoring suburbs fell less than bottom-scoring ones. The five negative-spread regions are Rest of NT (147 sales, too small to be reliable), Perth, Rest of SA, Brisbane, and ACT. In the ACT, government employment patterns may dominate suburb-level variation.
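The spread column and the regional tally can be reproduced from the top and bottom tier growth rates in the table. The sketch below is a simple check using those published figures; the data structure and variable names are chosen for the example.

```python
# (region, top tier growth %, bottom tier growth %, sales), from the table.
regions = [
    ("Darwin", -3.42, -6.65, 367),
    ("Melbourne", -0.13, -3.15, 4945),
    ("Rest of WA", -0.77, -3.14, 3603),
    ("Rest of Qld", 0.86, -1.20, 13184),
    ("Rest of NSW", 2.45, 1.03, 14351),
    ("Sydney", -0.93, -1.24, 4576),
    ("Adelaide", 0.15, -0.08, 3216),
    ("Rest of Vic.", 2.30, 2.19, 7443),
    ("ACT", 0.15, 0.70, 1129),
    ("Brisbane", 0.07, 0.71, 1854),
    ("Perth", -3.35, -2.41, 2684),
    ("Rest of SA", -1.40, -0.46, 2342),
    ("Rest of NT", -6.97, -2.73, 147),
]

# Spread = top tier growth minus bottom tier growth, in percentage points.
spreads = {name: round(top - bottom, 2) for name, top, bottom, _ in regions}

positive = [name for name, s in spreads.items() if s > 0]
negative = [name for name, s in spreads.items() if s < 0]
strongest = max(spreads, key=spreads.get)
weakest = min(spreads, key=spreads.get)
```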

6. Defence of Method

6.1 Why the R-squared Is Noteworthy

An out-of-sample R-squared of 0.176 means the model explains 17.6% of variance in 4-year suburb-level growth. This is a strong result for a single thematic factor. Property prices are shaped by hundreds of variables. Interest rate cycles, infrastructure investment, zoning changes, population growth, local employment, and macroeconomic conditions all play a role. No single index will capture the majority of variance.

The close alignment between train R-squared (0.1694) and test R-squared (0.1761) suggests the model is not overfitted and that the signal generalises to unseen data. A model that explained 50% of property growth from six government data variables would be implausible and almost certainly overfitted.

6.2 Statistical Significance

The top bin's outperformance has a p-value of 3.2e-142. This is not borderline. If the top bin's true mean growth equalled the national mean, the probability of observing a difference this large across 38,548 sales would be effectively zero. For context, a p-value below 0.05 is the standard threshold for statistical significance. The Transport Ecosystem Index clears this threshold by roughly 140 orders of magnitude.

The bottom bin is even more significant at p = 8.3e-174 across 88,091 sales. Both tails of the distribution carry strong statistical weight.
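The orders-of-magnitude comparison is a difference of base-10 logarithms between the 0.05 threshold and each p-value. The short sketch below works it through for both tails; the function name is chosen for the example.

```python
import math

ALPHA = 0.05  # conventional significance threshold

def orders_below_threshold(p_value, alpha=ALPHA):
    """Number of powers of ten by which p_value sits below alpha."""
    return math.log10(alpha) - math.log10(p_value)

top_gap = orders_below_threshold(3.2e-142)
bottom_gap = orders_below_threshold(8.3e-174)
```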

6.3 Consistency Over Time

The top bin was significant at 147 of 163 dates. The above-threshold suburbs beat the below-threshold suburbs at 122 of 163 dates (75%). A signal that works three-quarters of the time across a full market cycle spanning booms, corrections, and the COVID pandemic is robust. The 25% of dates where the signal did not hold are clustered around the GFC recovery (2008-2011) and the pandemic boom (2020-2021), which are identifiable and understandable market conditions.

6.4 Geographic Breadth

The spread is positive in 8 of 13 GCCSA regions. It works in declining markets (Darwin at +3.23%, Rest of WA at +2.37%) and in growing markets (Rest of Qld at +2.06%, Rest of NSW at +1.42%). Of the five negative-spread regions, three have either a small sample (Rest of NT, 147 sales) or distinctive market dynamics (ACT, Brisbane).

6.5 Practical Use

Investors do not need a model to predict exact prices. They need a signal to tilt the odds in their favour across a portfolio of purchases. An annualised advantage of 4.0 percentage points per purchase, compounded over time, is highly meaningful. Combined with other Microburbs signals, the Transport Ecosystem Index forms one layer in a multi-factor approach to suburb selection.
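To give a sense of scale, the sketch below compounds a 4.0 percentage point annual edge over the 4-year horizon. The purchase price and the 3% baseline growth rate are hypothetical figures chosen for illustration, and the result describes a bin-level average, not any individual property.

```python
def value_after(price, annual_growth, years=4):
    """Value of a property after compounding annual growth for `years` years."""
    return price * (1.0 + annual_growth) ** years

PURCHASE_PRICE = 600_000  # hypothetical purchase price
BASELINE_GROWTH = 0.03    # hypothetical national median growth per annum
TOP_BIN_EDGE = 0.04       # top bin difference vs national median, per annum

baseline_value = value_after(PURCHASE_PRICE, BASELINE_GROWTH)
top_bin_value = value_after(PURCHASE_PRICE, BASELINE_GROWTH + TOP_BIN_EDGE)
gap = top_bin_value - baseline_value
```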

Analogy: A weather model that explains 17.6% of daily temperature variation would be a useful tool for identifying warm and cold regions. The Transport Ecosystem Index reliably identifies which suburbs grow faster on average, across 13 years of data and 13 geographic zones. It describes a property "climate" pattern. It is not a daily forecast for any single suburb.

7. Limitations

7.1 Government Data Is Point-in-Time

The index relies on government data sources including the Australian Census from 2016 and 2021. Census data is collected every five years. Transport usage patterns can shift between census dates. A suburb that scored highly in 2016 may have changed by 2021 due to new infrastructure or shifting commute patterns. The model cannot capture these shifts in real time.

7.2 Backward-Looking Model

The model was trained on historical data from 2008 to 2021. Past patterns do not guarantee future results. The relationship between travel mode patterns and property growth could weaken or reverse if underlying market dynamics change. New transport infrastructure or shifts in commuting preferences may alter suburb rankings in ways not captured by the current model.

7.3 Individual Suburb Variation

Even within the top bin, individual suburb outcomes vary widely. The index provides a statistical edge across large numbers of purchases, not a guarantee for any single suburb.

7.4 Five Negative-Spread Regions

The signal inverts in 5 of 13 regions. Brisbane (-0.64%), ACT (-0.55%), Perth (-0.94%), Rest of SA (-0.94%), and Rest of NT (-4.24%) all show negative spreads. Investors in these regions should treat the Transport Ecosystem Index with caution. The signal is strongest in Melbourne, Darwin, and regional areas of WA, Queensland, and NSW.

7.5 Sample Size Variability

Some regions have small sample sizes. Rest of NT contributed only 147 sales. Darwin contributed 367 sales. Results in small-sample regions carry wider confidence intervals and should be interpreted with caution.

7.6 No Causal Claim

This paper documents a correlation, not a causal mechanism. We hypothesise that spacious, car-friendly suburbs offer larger lots, lower density, and lifestyle appeal that sustains buyer demand over time. But the data does not prove causation. Other unmeasured variables (such as land supply constraints, household income, or proximity to employment centres) may explain part or all of the observed relationship.

Summary of limitations: The Transport Ecosystem Index is a statistical tool, not a crystal ball. It identifies a persistent pattern across 272,958 sales, 13 years, and 13 regions. But individual outcomes will vary. Census data updates infrequently. The model is backward-looking. The signal does not hold in all regions. Use this index as one factor in a broader investment framework.

Part of the Threshold Signals research programme