# HDB Price Prediction Model

An HDB resale price prediction model trained with XGBoost.
Singapore's public housing market is large, opaque, and data-rich. Over 227,000 HDB resale transactions have been registered since 2017 — but buyers and sellers still have to piece together comparable prices manually from listings and government portals. This project turns that data into an instant price estimate with a ±5% confidence range, served through a web interface anyone can use without an account.
## What It Does
You type a block number and street name. The form auto-fills the town, lease commencement year, and typical floor area from historical transaction data for that exact block. Fill in the remaining details — storey range, flat type, flat model, floor area, transaction month — and submit.
Within a second you get a predicted resale price alongside a breakdown of the key factors: distance to the CBD, nearest MRT, schools, remaining lease, building age, and estate maturity. Location access lets the browser pre-fill the town based on your coordinates.
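A submission might look like the sketch below. The field names and the `/predict` endpoint path are illustrative assumptions, not the API's documented contract:

```python
# Hypothetical request payload for the prediction endpoint.
# Endpoint path and field names are assumed for illustration only.
payload = {
    "block": "428",
    "street_name": "ANG MO KIO AVE 3",
    "town": "ANG MO KIO",          # auto-filled from the block's history
    "flat_type": "4 ROOM",
    "flat_model": "New Generation",
    "storey_range": "07 TO 09",
    "floor_area_sqm": 92.0,
    "lease_commence_date": 1978,   # auto-filled
    "month": "2024-06",            # transaction month
}
# import requests
# resp = requests.post("https://<your-render-app>/predict", json=payload)
```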
## Tech Stack
The backend is FastAPI in a Docker container deployed on Render, with XGBoost 2.1.0 as the model runtime. The frontend is a single HTML file with vanilla JS — no framework, no build step. Rate limiting is handled by slowapi (10 predictions/min per IP).
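In production slowapi enforces the limit; the underlying per-IP logic is roughly this sliding-window sketch (the class and method names are illustrative, not slowapi's API):

```python
import time
from collections import defaultdict

class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds, tracked per IP."""

    def __init__(self, limit=10, window=60):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(list)  # ip -> timestamps of recent requests

    def allow(self, ip: str) -> bool:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        recent = [t for t in self.hits[ip] if now - t < self.window]
        self.hits[ip] = recent
        if len(recent) >= self.limit:
            return False  # over the 10 predictions/min budget -> HTTP 429
        recent.append(now)
        return True
```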
## How It Predicts
When you submit, the API geocodes your address against a pre-built cache of 227,000+ HDB addresses — no live API call needed. It then computes distances to the CBD, regional MRT hubs, all 171 stations, schools, hawker centres, malls, and parks using the Haversine formula. These features, along with lease decay curves and market volume indicators, are fed into the model to produce a prediction.
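The Haversine distance the feature pipeline relies on fits in a few lines. The CBD anchor coordinates below are an assumption (roughly Raffles Place), not the project's exact constant:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Assumed CBD anchor (approx. Raffles Place), used as one distance feature.
CBD = (1.2840, 103.8515)
```

Each flat's cached coordinates yield a `haversine_km(flat_lat, flat_lon, *CBD)` feature, and a `min()` over candidate amenity coordinates gives the nearest-MRT, nearest-school, and similar features.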
One non-obvious decision: using 4 regional MRT hub distances instead of all 171 stations actually improved accuracy. With a dense transit network, distance to the nearest station collapses to near-zero for almost every flat — it loses all discriminating power. The four hubs (Ang Mo Kio, Woodlands, Jurong East, Tampines) act as North/South/East/West centrality anchors instead.
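As a sketch, the four hub distances become four separate features rather than one collapsed nearest-station distance. The hub coordinates below are approximate assumptions, not the project's exact values:

```python
# Approximate coordinates of the four regional MRT hubs (assumed values).
HUBS = {
    "ang_mo_kio": (1.3700, 103.8496),
    "woodlands": (1.4370, 103.7865),
    "jurong_east": (1.3330, 103.7422),
    "tampines": (1.3546, 103.9450),
}

def hub_features(lat, lon, distance_fn):
    """One distance feature per hub -- N/S/E/W centrality anchors."""
    return {f"dist_{name}_km": distance_fn(lat, lon, *coords)
            for name, coords in HUBS.items()}
```

Because every flat sits at a different point relative to all four anchors, the feature vector keeps discriminating power even where nearest-station distance is uniformly near zero.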
## Deployment Constraints
Running on Render's free tier means 512MB RAM. The full 5-fold ensemble consistently crashed with out-of-memory (OOM) errors on startup, so only the best-performing fold (fold 3, R² = 0.9799) is loaded in production. Switching from text JSON to XGBoost's binary .ubj format cut the per-model memory spike from ~313MB to ~269MB — the difference between booting and crashing.
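The format switch is a one-line change at save time, since xgboost infers the serialization format from the file extension. A sketch (the function names and the `fold3.ubj` path are illustrative):

```python
import xgboost as xgb

def export_for_serving(booster: xgb.Booster, path: str = "fold3.ubj") -> None:
    # xgboost picks the format from the extension: ".json" is text JSON,
    # ".ubj" is Universal Binary JSON -- same model, smaller load-time spike.
    booster.save_model(path)

def load_for_serving(path: str = "fold3.ubj") -> xgb.Booster:
    booster = xgb.Booster()
    booster.load_model(path)
    return booster
```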
## Results
| Metric | Value |
|---|---|
| Deployed model R² | 0.9799 |
| Test RMSE (full ensemble) | ~$33–35k |
| Baseline RMSE | $55,116 |
| RMSE reduction | 36–40% |
The ~$33–35k RMSE is roughly 5–7% of the median resale price, achieved through feature engineering alone — no additional data sources beyond publicly available amenity data and the geocoded address cache.