Signal Homes — Signal Alpha · Market Intelligence

— how it works

A four-stage pipeline from raw signal to ranked return.

Two classes of data flow in. Derived features are computed per segment. An ensemble of supervised and time-series models produces a risk-adjusted return score with confidence intervals and per-prediction attribution. Output ships as scorecards and an API.

— architecture · fig. 1

Signal in, ranked return out.

Simplified architecture. Public-record ingestion is operational from day one; proprietary telemetry compounds with usage. SHAP-style feature attribution accompanies each segment score so every prediction can be explained, not just consumed.
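What per-prediction attribution means can be shown on a toy model. The sketch below computes exact Shapley values by enumerating feature coalitions, which is only feasible for a handful of features; SHAP-style tooling approximates the same quantity at scale. The function names, feature names, and the toy scoring function are hypothetical and are not the production model.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for a single prediction.

    predict  : function taking a dict of feature -> value
    x        : the segment's actual feature values
    baseline : reference values used when a feature is treated as absent
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Coalition members take real values; everything else stays at baseline.
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_f = dict(without, **{name: x[name]})
                total += weight * (predict(with_f) - predict(without))
        phi[name] = total
    return phi

def toy_score(f):
    # Hypothetical scoring function with an interaction term.
    return 2 * f["absorption"] + 3 * f["intent"] + f["absorption"] * f["intent"]

x = {"absorption": 1.0, "intent": 2.0}
baseline = {"absorption": 0.0, "intent": 0.0}
phi = shapley_values(toy_score, x, baseline)
```

The attributions sum to the gap between the segment's score and the baseline score, which is the property that lets a driver breakdown account for the whole number rather than part of it.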

— 01 · ingestion

Public + proprietary.

Two classes of data, fused at the segment level. The public layer is competitive parity. The proprietary layer is the wedge.

Public

ACRIS: deeds, mortgages, transfer history
DOB: permits, violations, work-type density
311 complaints: per-building and per-block density
RLS · StreetEasy: closed and active comps
NYC Open Data: zoning and rezoning actions
Census · ACS: demographic shifts
MTA ridership: per-station weekly trends
Tax rolls: assessed value trajectories

Proprietary

Swipe telemetry: swipe · dwell · skip
Visit notes: ground-truth observations
Building sentiment: scored per-building
Off-market sourcing: seller-intent signals
Search-to-tour conversion: by segment
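Fusing at the segment level can be pictured as grouping rows from both data classes under one shared key. This is a minimal sketch, assuming each row already carries a neighborhood, a building type, and a price. The fixed price-band edges are illustrative only (the pipeline defines segments dynamically), and `SegmentKey`, `price_band`, and `fuse` are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple

class SegmentKey(NamedTuple):
    neighborhood: str
    building_type: str
    price_band: str

def price_band(price, edges=(750_000, 1_500_000, 3_000_000)):
    """Bucket a price into a band label. The edges are illustrative."""
    for i, edge in enumerate(edges):
        if price < edge:
            return f"band_{i}"
    return f"band_{len(edges)}"

def fuse(public_rows, proprietary_rows):
    """Group both data classes under a shared segment key."""
    segments = defaultdict(lambda: {"public": [], "proprietary": []})
    for source, rows in (("public", public_rows), ("proprietary", proprietary_rows)):
        for row in rows:
            key = SegmentKey(
                row["neighborhood"],
                row["building_type"],
                price_band(row["price"]),
            )
            segments[key][source].append(row)
    return dict(segments)
```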

— 02 · features

Derived per segment.

Raw inputs become structured features. Each segment carries a moving panel of metrics that feed both supervised and time-series models.

Standard metrics

Absorption rate: units cleared per period
Price velocity: list-to-close drift
Supply / demand imbalance: active inventory vs. tour volume
Days-on-market: full distribution, not just median
Concession-adjusted effective price: net of credits, buy-downs, included furnishings
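The standard metrics are simple arithmetic once the inputs are clean. A minimal sketch, assuming per-segment inputs are already aggregated; the function names and argument shapes are hypothetical, and quartiles stand in for the full days-on-market distribution.

```python
from statistics import quantiles

def absorption_rate(units_cleared, periods):
    """Units cleared per period."""
    return units_cleared / periods

def price_drift(list_price, close_price):
    """List-to-close drift as a fraction of list price."""
    return (close_price - list_price) / list_price

def effective_price(close_price, credits=0.0, buy_downs=0.0, furnishings=0.0):
    """Close price net of concessions."""
    return close_price - credits - buy_downs - furnishings

def dom_distribution(days_on_market):
    """Summarize the spread of days-on-market, not just the median."""
    p25, p50, p75 = quantiles(days_on_market, n=4)
    return {
        "min": min(days_on_market),
        "p25": p25,
        "p50": p50,
        "p75": p75,
        "max": max(days_on_market),
    }
```

Two segments with the same median days-on-market can have very different tails; keeping the quartiles and extremes makes that visible.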

Proprietary metric

Demand Intent Index: composite of swipe telemetry, dwell time, search-to-tour conversion, and visit-note sentiment. The signal that leads closed comps.
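One way to build a composite like this is to standardize each component across segments and take a weighted sum. The sketch below assumes exactly that; the weights, component names, and the choice of z-scoring are illustrative assumptions, not the actual index definition.

```python
from statistics import mean, pstdev

# Hypothetical weights; the real index weighting is not specified here.
WEIGHTS = {
    "swipe_rate": 0.3,
    "dwell_seconds": 0.2,
    "search_to_tour": 0.3,
    "note_sentiment": 0.2,
}

def zscores(values):
    """Standardize values; a constant series maps to all zeros."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def demand_intent_index(segments):
    """segments: dict of segment name -> dict of raw component values."""
    names = list(segments)
    index = {name: 0.0 for name in names}
    for component, weight in WEIGHTS.items():
        scaled = zscores([segments[name][component] for name in names])
        for name, z in zip(names, scaled):
            index[name] += weight * z
    return index
```

Standardizing first keeps a component measured in seconds from drowning out one measured as a rate.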

— 03 · modeling

Ensemble + leading indicator.

No single model carries the prediction. The output is an ensemble vote with a forecast horizon, a confidence interval, and a feature-attribution trace.
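The shape of that output can be sketched with a simple averaging ensemble. This assumes each member model emits one point forecast and uses member disagreement as a rough interval; it is a dispersion band, not a calibrated confidence interval, and `Forecast` and `combine` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass
class Forecast:
    horizon_weeks: int
    point: float
    low: float
    high: float

def combine(member_predictions, horizon_weeks, z=1.96):
    """Average member forecasts and band them by member disagreement."""
    point = mean(member_predictions)
    # With a single member there is no disagreement to measure.
    spread = stdev(member_predictions) if len(member_predictions) > 1 else 0.0
    return Forecast(horizon_weeks, point, point - z * spread, point + z * spread)
```

A wide band signals that the members disagree, which is itself worth surfacing next to the score.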

What runs

Gradient-boosted ensembles: XGBoost / LightGBM for segment-level price and appreciation
Time-series forecasting: absorption and demand trajectory
Demand leading-indicator model: swipe-level intent empirically leads closed comps, giving a forecast window measured in weeks
Geospatial clustering: segments defined dynamically by neighborhood × building type × price band, not fixed zip codes
Risk-adjusted return score: projected appreciation discounted by predicted volatility, carrying cost, and expected time-on-market
SHAP-style attribution: every segment score ships with the why, not just the number
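The risk-adjusted return score can be read as a Sharpe-like ratio. The sketch below is one plausible form, assuming a one-year hold, rates expressed as fractions of purchase price, and carry that keeps accruing while the unit waits to sell; the exact formula is an assumption and `risk_adjusted_return` is a hypothetical name.

```python
def risk_adjusted_return(appreciation, volatility, annual_carry, expected_dom_days):
    """Net projected appreciation per unit of predicted volatility.

    appreciation      : projected one-year appreciation (fraction)
    volatility        : predicted volatility of that appreciation (fraction)
    annual_carry      : carrying cost per year (fraction of price)
    expected_dom_days : expected time-on-market at exit, in days
    """
    if volatility <= 0:
        raise ValueError("volatility must be positive")
    # Carry accrues over the hold year plus the expected time to sell.
    carry = annual_carry * (1 + expected_dom_days / 365)
    return (appreciation - carry) / volatility
```

For example, 8% projected appreciation, 10% volatility, 2% annual carry, and 73 expected days on market gives a net of 5.6% over 10% volatility.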

— 04 · output

Ranked + explainable.

The output is built to be acted on by an operator who has to defend the call. No black box, no single number without context.

Surface

Segment scorecards: ranked, with confidence intervals
Opportunity alerts: triggered when leading indicators diverge from closed comps
Driver breakdown: top features moving each score, per-prediction
Dashboard + API: read access, programmatic queries, exports
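An opportunity alert reduces to a divergence check. A minimal sketch, assuming both the leading indicator and closed-comp movement are already standardized per segment; the field names `intent_z` and `comps_z` and the threshold of 1.0 are hypothetical.

```python
def opportunity_alerts(segments, threshold=1.0):
    """Flag segments where the leading indicator and closed comps diverge.

    segments: dict of segment name -> {"intent_z": float, "comps_z": float}
    """
    alerts = []
    for name, s in segments.items():
        gap = s["intent_z"] - s["comps_z"]
        if abs(gap) >= threshold:
            alerts.append({
                "segment": name,
                "gap": gap,
                "direction": "demand_ahead" if gap > 0 else "demand_behind",
            })
    # Largest divergence first.
    return sorted(alerts, key=lambda a: abs(a["gap"]), reverse=True)
```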

Built for review

Every score is auditable: inputs, model version, attribution trace, timestamp
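An audit record for a score might look like the sketch below: an immutable record holding the four audited items plus a content hash so later tampering is detectable. The record shape, `ScoreAudit`, `record_score`, and the use of SHA-256 are assumptions for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ScoreAudit:
    segment: str
    score: float
    inputs: dict
    model_version: str
    attribution: dict
    timestamp: str

    def fingerprint(self):
        """Stable hash of the full record."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def record_score(segment, score, inputs, model_version, attribution):
    """Stamp a score with the current UTC time and return its audit record."""
    stamp = datetime.now(timezone.utc).isoformat()
    return ScoreAudit(segment, score, inputs, model_version, attribution, stamp)
```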

— the moat

v1 is useful. The flywheel makes it unreplicable.

Public-data v1 is already a sharper read than what most NY operators run on. The proprietary demand layer is what makes the system get better the more it gets used — a compounding dataset no portal can replicate.

The asymmetry is structural. Listing portals are seller-side businesses. Their revenue comes from agents and brokerages paying to surface inventory. They have no operational reason to collect demand-side ground truth — and if they tried, their seller customers would object to the asymmetry it would create. Demand telemetry is not a feature they are choosing not to build. It is data their business model prevents them from having.

Signal is buyer-side from inception. Every user interaction produces signal. Every visit produces ground truth. Every swipe sharpens the Demand Intent Index. That history cannot be rebuilt retroactively: competitors arriving later face a dataset deficit that grows daily.

The flywheel. More users → more demand telemetry → better predictions → better outcomes for users → more users. The dataset is the moat. The moat compounds.

Don't just find a home.
Find the return.

Closed comps are a lagging indicator.