Portfolio
Data engineering · Legal · PropTech

Connecticut Foreclosure Intelligence

Legal Tech · PropTech (United States)

B2B platform that turns Connecticut judicial e-filing data into actionable foreclosure intelligence — web app, Excel reports, and a production-grade Python ETL pipeline.

Data pipeline and critical backend evolution since mid-2025 (~9 months). ETL feeds a Django web app used daily by legal and investment analysts.

PythonPostgreSQLpandasPrefectETLDjangoOCRLegal Tech
2025–2026
30+
E-filing tables
PostgreSQL source
423
ZIP codes mapped
County crosswalk
28k+
Cases in pipeline
Typical validation volume
4+
Delivery channels
Email · cloud · storage
The Problem

Teams tracking Connecticut foreclosure cases must reconcile e-filing records (parties, addresses, documents, key dates), equity figures buried in PDF worksheets, market valuations from listing sites, and a complex priority of procedural statuses — mediation, default, strict foreclosure, bankruptcy, withdrawn, and more. Manual Excel consolidation does not scale and is error-prone.

The Solution

We built the data backbone of an enterprise platform: a Python 3.12 pipeline that extracts from a 30+ table PostgreSQL e-filing warehouse, applies Connecticut-specific business rules (forensic business days, county/ZIP geography, status priority engine), enriches cases with PDF equity and property links, and loads production application tables via explicit-column PostgreSQL COPY. Parallel QA tables capture how each status was triggered for audit. Outputs include multi-sheet Excel with conditional formatting and a web UI (login, operational canvas, advanced filters) backed by a Django API.

Engineering Highlights
30+
Source tables
423
CT ZIP crosswalk
28k+
Typical case rows
~9 mo
Pipeline runtime
Key deliverables
  • Production ETL from 30+ e-filing tables with CTE-based extraction
  • Priority status classification with forensic-day change detection
  • PDF equity extraction and property listing enrichment with caching
  • Explicit-column PostgreSQL COPY into application tables
  • Parallel QA validation table for status audit evidence
  • Excel reports plus web app (filters, canvas, role-based login)
  • Prefect scheduling, dry-run flags, and automated regression tests
Stack
Python 3.12ETL orchestration and transforms
pandas + SQLAlchemyAnalytical SQL and DataFrames
PostgreSQLSource warehouse and app schema loads
PrefectScheduled production runs
openpyxl / xlsxwriterMulti-tab Excel reports
pdfplumber + TesseractPDF equity extraction with cache
usaddressNormalized property addresses
DjangoProduction API and web app (ecosystem)
Google / Microsoft / S3Drive, SharePoint, object storage delivery

Priority status engine under court rule changes

Each case maps to legal buckets (mediation ended, default filed, strict foreclosure, bankruptcy, no longer in foreclosure, etc.) with configurable overrides when new document codes appear in Connecticut courts.

Recent-change detection on forensic business days

Highlights use Connecticut forensic calendars — weekends plus federal and state holidays — not calendar days alone, so analysts see truly recent movement in Excel and the app.

Safe PostgreSQL COPY into existing app schemas

Loads use explicit column lists aligned to live tables, eliminating positional drift that had misaligned equity and defendant counts. In-memory DataFrame loads skip unnecessary Excel round-trips.

App vs. QA data separation

Audit metadata (trigger documents, dock codes, status sources) lands in a parallel validation table; the dashboard table stays lean for performance and UX.

Multi-channel delivery and resilient scraping

Reports reach analysts via email OAuth, cloud drives, and object storage. SOCKS proxy watchdogs and alerts keep PDF retrieval reliable behind anti-bot constraints.

Have a similar project?

Let's build it right.

Start a Project