We design and run big-data pipelines that ingest, standardize, and harmonize high-volume healthcare datasets (EHR, claims, registries, imaging metadata, etc.) into analytics-ready structures, most commonly the OMOP Common Data Model (CDM). This work spans distributed ETL, scalable storage and compute design, deduplication, data-quality checks, and performance tuning, so that refreshes stay repeatable and multiple sources integrate consistently.