Data engineering · Legal · PropTech

Connecticut Foreclosure Intelligence

Legal Tech · PropTech (United States)

B2B platform that turns Connecticut judicial e-filing data into actionable foreclosure intelligence — web app, Excel reports, and a production-grade Python ETL pipeline.

Data pipeline and critical backend evolution since mid-2025 (~9 months). ETL feeds a Django web app used daily by legal and investment analysts.

PythonPostgreSQLpandasPrefectETLDjangoOCRLegal Tech

2025–2026

30+

E-filing tables

PostgreSQL source

423

ZIP codes mapped

County crosswalk

28k+

Cases in pipeline

Typical validation volume

Delivery channels

Email · cloud · storage

The Problem

Teams tracking Connecticut foreclosure cases must reconcile e-filing records (parties, addresses, documents, key dates), equity figures buried in PDF worksheets, market valuations from listing sites, and a complex priority of procedural statuses — mediation, default, strict foreclosure, bankruptcy, withdrawn, and more. Manual Excel consolidation does not scale and is error-prone.

The Solution

We built the data backbone of an enterprise platform: a Python 3.12 pipeline that extracts from a 30+ table PostgreSQL e-filing warehouse, applies Connecticut-specific business rules (forensic business days, county/ZIP geography, status priority engine), enriches cases with PDF equity and property links, and loads production application tables via explicit-column PostgreSQL COPY. Parallel QA tables capture how each status was triggered for audit. Outputs include multi-sheet Excel with conditional formatting and a web UI (login, operational canvas, advanced filters) backed by a Django API.

Engineering Highlights

30+

Source tables

423

CT ZIP crosswalk

28k+

Typical case rows

~9 mo

Pipeline runtime

Key deliverables

Production ETL from 30+ e-filing tables with CTE-based extraction
Priority status classification with forensic-day change detection
PDF equity extraction and property listing enrichment with caching
Explicit-column PostgreSQL COPY into application tables
Parallel QA validation table for status audit evidence
Excel reports plus web app (filters, canvas, role-based login)
Prefect scheduling, dry-run flags, and automated regression tests

Stack

Python 3.12ETL orchestration and transforms

pandas + SQLAlchemyAnalytical SQL and DataFrames

PostgreSQLSource warehouse and app schema loads

PrefectScheduled production runs

openpyxl / xlsxwriterMulti-tab Excel reports

pdfplumber + TesseractPDF equity extraction with cache

usaddressNormalized property addresses

DjangoProduction API and web app (ecosystem)

Google / Microsoft / S3Drive, SharePoint, object storage delivery

Priority status engine under court rule changes

Each case maps to legal buckets (mediation ended, default filed, strict foreclosure, bankruptcy, no longer in foreclosure, etc.) with configurable overrides when new document codes appear in Connecticut courts.

Recent-change detection on forensic business days

Highlights use Connecticut forensic calendars — weekends plus federal and state holidays — not calendar days alone, so analysts see truly recent movement in Excel and the app.

Safe PostgreSQL COPY into existing app schemas

Loads use explicit column lists aligned to live tables, eliminating positional drift that had misaligned equity and defendant counts. In-memory DataFrame loads skip unnecessary Excel round-trips.

App vs. QA data separation

Audit metadata (trigger documents, dock codes, status sources) lands in a parallel validation table; the dashboard table stays lean for performance and UX.

Multi-channel delivery and resilient scraping

Reports reach analysts via email OAuth, cloud drives, and object storage. SOCKS proxy watchdogs and alerts keep PDF retrieval reliable behind anti-bot constraints.

Have a similar project?

Let's build it right.

Start a Project