Methodology

How the remittance case study was built.

The story combines public remittance statistics, provider disclosures, classified customer pain signals, and a constrained semantic SQL layer. It is designed to be understandable, auditable, and modest about its limits.

01 / Sources

Official flows and prices

World Bank WDI, Remittance Prices Worldwide, KNOMAD bilateral remittance estimates, UN DESA migrant stock, and IMF WEO macro proxies.

Feeds: corridor flows, receive-market dependence, corridor cost, FX and inflation context

Provider and market evidence

Filings from Wise, Remitly, Western Union, MoneyGram, and Revolut; GSMA mobile money reports; Ripple public pages; and selected partner material.

Feeds: provider type, disclosures, licence footprint, digital revenue, market structure

Customer and regulatory text

CFPB complaints, app-store reviews, regulator releases, GDELT headlines, Chainalysis reports, and curated qualitative excerpts.

Feeds: pain classes, regulatory events, stablecoin adoption evidence, caveats

02 / L2 transformation

Step 1

Ingest

Pull public CSV/XLSX/API/PDF/text sources into source-specific staging files.

Step 2

Normalize

Standardize ISO3 country codes, provider aliases, instrument categories, time grains, currency amounts, and percentages.

Step 3

Classify

Use LLM-assisted, few-shot classification for complaint, review, and regulatory text.

Step 4

Validate

Keep confidence, hashes, source IDs, unmapped rows, and human-readable caveats.

Step 5

Aggregate

Build facts, dimensions, scorecard views, and semantic-layer metadata for SQL.

The L2 layer uses consistent table naming, ISO3 country keys, canonical provider aliases, frozen instrument categories, decimal percentages, full USD values, source IDs, confidence scores, and unmapped-row logs instead of silent drops.

03 / AI-assisted text pipeline

LLMs classify text; SQL does the counting.

Complaints, reviews, and regulatory releases are converted into structured labels such as delay, fraud, fee dispute, KYC hold, app UX, enforcement action, or AML rule change. The app uses aggregated labels and rates, not raw private narratives.

Temperature set low for classification-style tasks.
Few-shot labels use frozen pain and event schemas.
Raw complaint and review text is not exposed in the app.
Outputs keep hashes, confidence, source IDs, and aggregate rates.
The Text-to-SQL assistant can only query tables and joins listed in the semantic layer.
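The semantic-layer constraint on the Text-to-SQL assistant can be sketched as a table allowlist check. The table names and the regex-based extraction are assumptions for illustration; a production guard would use a real SQL parser rather than pattern matching.

```python
# Minimal sketch: generated SQL may only reference tables listed in the
# semantic layer. Table names here are hypothetical, not the real schema.

import re

ALLOWED_TABLES = {"fact_corridor_flows", "dim_country", "dim_provider"}


def referenced_tables(sql: str) -> set[str]:
    """Extract identifiers that follow FROM or JOIN (naive, for illustration)."""
    return set(re.findall(r"\b(?:from|join)\s+([a-z_][a-z0-9_]*)", sql, re.I))


def is_allowed(sql: str) -> bool:
    tables = referenced_tables(sql)
    return bool(tables) and tables <= ALLOWED_TABLES
```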

04 / Corridor score

Headroom

Recent corridor cost compared with a low-cost digital floor.

Volume

Bilateral flow, log-scaled so mega-corridors do not dominate.

Challengers

Digital MTO and neobank coverage. More saturation lowers the score.

Regulatory friction

Recent enforcement or rule-change signals in the receive market.

FX instability

Exchange-rate volatility where available, with CPI instability as a documented fallback.

Hard-currency demand

Small positive bonus where USD-like receive demand sharpens the wedge.

The v4 score normalizes inputs across corridors. Headroom and volume lift a corridor; challenger density, regulatory friction, and FX instability lower it; hard-currency demand adds a small bonus.
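The shape of the v4 score can be sketched as follows. All weights and the 0.05 bonus are illustrative placeholders, not the published coefficients; only the structure matches the description above: min-max normalization across corridors, headroom and log-scaled volume adding, challenger density, regulatory friction, and FX instability subtracting, and a small hard-currency bonus.

```python
# Hedged sketch of the v4 scoring shape. Component weights are
# placeholders (all 1.0, bonus 0.05), not the real coefficients.

import math


def minmax(values):
    """Normalize a component across corridors to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def score_corridors(corridors):
    """corridors: list of dicts with raw component values per corridor."""
    headroom = minmax([c["headroom"] for c in corridors])
    # log-scale flows so mega-corridors do not dominate
    volume = minmax([math.log1p(c["flow_usd"]) for c in corridors])
    challengers = minmax([c["challenger_density"] for c in corridors])
    friction = minmax([c["reg_friction"] for c in corridors])
    fx = minmax([c["fx_volatility"] for c in corridors])
    bonus = [0.05 if c["hard_currency_demand"] else 0.0 for c in corridors]
    return [
        h + v - ch - fr - f + b
        for h, v, ch, fr, f, b in zip(headroom, volume, challengers, friction, fx, bonus)
    ]
```

A corridor with high headroom, meaningful volume, few challengers, calm regulation, and stable FX rises to the top; the bonus only nudges ties.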

05 / Caveats

Public data is uneven

Coverage varies by country, corridor, provider, source, and time period.

Reviews are sampled

App review evidence is US-heavy and Android-heavy, so it supports the risk story rather than replacing the warehouse.

This is hypothetical

The case study is for discussion only, not a recommendation or investment advice.