PostHog Tracks
Healthy
Degraded / under review
Broken
Frozen / historical
Consumer
🪣 Bucket / raw files (parquet)
🧭 DAG
🧱 dbt model
📊 BI / dashboard
⚠️ known issue / degraded
PostHog batch export drops hourly parquet files in GCS. The hourly
posthog-tracks DAG loads them into posthog.raw_tracks,
then a dbt incremental MERGE (posthog_tracks) folds new rows into
the consumer-facing posthog.tracks (~3.1B rows, all history).
flowchart LR
subgraph SRC["PostHog batch export"]
direction LR
PH["PostHog
product analytics SaaS"] BUCKET["🪣 GCS
gs://data-platform-dataflow/
{env}/posthog_batch/events/
{date}/{hour}/*.parquet.sz"] end subgraph DAGSG["🧭 posthog-tracks DAG (@hourly)"] direction LR SENSOR["wait_for_posthog_batch
GCS sensor, deferrable"] LOADER["raw_tracks
load_posthog_tracks.py
parquet → Pydantic → BQ"] FRESH["dbt source freshness
raw_tracks (2h warn / 4h err)"] MODEL["🧱 posthog_tracks
dbt incremental MERGE
unique_key=event_id"] GATE["dbt test
06:00 / 18:00 UTC"] end RAW["posthog.raw_tracks
BQ · partition created_at
2026-03-12 → present"] TRACKS["⚠️ posthog.tracks
BQ · partition created_at
~3.1B rows, all history"] subgraph BFSG["Backfill (manual, complete)"] direction LR BACKFILL["posthog.backfill_tracks
2025-05 → 2026-03-11"] MERGEBF["merge_backfill_to_tracks.py
insert-only, day-by-day"] end subgraph CONS["Consumers"] direction LR MKT["🧱 installs_subscribers_by_channel_platform
marketing · tag:marketing"] GROWTH["🧱 int_events_growth_union_events
growth events_growth"] end TABLEAU["📊 Product / Growth dashboards"] PH --> BUCKET BUCKET --> SENSOR SENSOR --> LOADER LOADER --> RAW RAW --> FRESH FRESH --> MODEL MODEL --> TRACKS MODEL --> GATE BACKFILL --> MERGEBF MERGEBF -.-> TRACKS TRACKS --> MKT TRACKS --> GROWTH MKT -.-> TABLEAU GROWTH -.-> TABLEAU classDef file fill:#f6f8fa,stroke:#6e7781,color:#000 classDef focus fill:#fff8c5,stroke:#9a6700,color:#000,stroke-width:2px classDef issue fill:#ffebe9,stroke:#cf222e,color:#000,stroke-width:2px classDef frozen fill:#eaeef2,stroke:#8c959f,color:#57606a classDef consumer fill:#d6e0ff,stroke:#0969da,color:#000,stroke-width:2px class BUCKET file class PH,SENSOR,LOADER,FRESH,MODEL,GATE,RAW focus class TRACKS issue class BACKFILL,MERGEBF frozen class MKT,GROWTH,TABLEAU consumer click BUCKET "tracks.html#pipeline" "Open posthog.tracks source page" click LOADER "tracks.html#pipeline" "Open posthog.tracks source page" click MODEL "tracks.html#pipeline" "Open posthog.tracks source page" click TRACKS "tracks.html#pipeline" "Open posthog.tracks source page" click MERGEBF "tracks_duplicate_event_id.html#root-cause" "Open 2026-06-12 posthog.tracks duplicate event_id" linkStyle default stroke:#d8dee4,stroke-width:1.2px,color:#8c959f linkStyle 0,1,2,3,4,5,6,7 stroke:#9a6700,stroke-width:2.5px,color:#000
product analytics SaaS"] BUCKET["🪣 GCS
gs://data-platform-dataflow/
{env}/posthog_batch/events/
{date}/{hour}/*.parquet.sz"] end subgraph DAGSG["🧭 posthog-tracks DAG (@hourly)"] direction LR SENSOR["wait_for_posthog_batch
GCS sensor, deferrable"] LOADER["raw_tracks
load_posthog_tracks.py
parquet → Pydantic → BQ"] FRESH["dbt source freshness
raw_tracks (2h warn / 4h err)"] MODEL["🧱 posthog_tracks
dbt incremental MERGE
unique_key=event_id"] GATE["dbt test
06:00 / 18:00 UTC"] end RAW["posthog.raw_tracks
BQ · partition created_at
2026-03-12 → present"] TRACKS["⚠️ posthog.tracks
BQ · partition created_at
~3.1B rows, all history"] subgraph BFSG["Backfill (manual, complete)"] direction LR BACKFILL["posthog.backfill_tracks
2025-05 → 2026-03-11"] MERGEBF["merge_backfill_to_tracks.py
insert-only, day-by-day"] end subgraph CONS["Consumers"] direction LR MKT["🧱 installs_subscribers_by_channel_platform
marketing · tag:marketing"] GROWTH["🧱 int_events_growth_union_events
growth events_growth"] end TABLEAU["📊 Product / Growth dashboards"] PH --> BUCKET BUCKET --> SENSOR SENSOR --> LOADER LOADER --> RAW RAW --> FRESH FRESH --> MODEL MODEL --> TRACKS MODEL --> GATE BACKFILL --> MERGEBF MERGEBF -.-> TRACKS TRACKS --> MKT TRACKS --> GROWTH MKT -.-> TABLEAU GROWTH -.-> TABLEAU classDef file fill:#f6f8fa,stroke:#6e7781,color:#000 classDef focus fill:#fff8c5,stroke:#9a6700,color:#000,stroke-width:2px classDef issue fill:#ffebe9,stroke:#cf222e,color:#000,stroke-width:2px classDef frozen fill:#eaeef2,stroke:#8c959f,color:#57606a classDef consumer fill:#d6e0ff,stroke:#0969da,color:#000,stroke-width:2px class BUCKET file class PH,SENSOR,LOADER,FRESH,MODEL,GATE,RAW focus class TRACKS issue class BACKFILL,MERGEBF frozen class MKT,GROWTH,TABLEAU consumer click BUCKET "tracks.html#pipeline" "Open posthog.tracks source page" click LOADER "tracks.html#pipeline" "Open posthog.tracks source page" click MODEL "tracks.html#pipeline" "Open posthog.tracks source page" click TRACKS "tracks.html#pipeline" "Open posthog.tracks source page" click MERGEBF "tracks_duplicate_event_id.html#root-cause" "Open 2026-06-12 posthog.tracks duplicate event_id" linkStyle default stroke:#d8dee4,stroke-width:1.2px,color:#8c959f linkStyle 0,1,2,3,4,5,6,7 stroke:#9a6700,stroke-width:2.5px,color:#000
Models & tables
| Object | Type | Partition / cluster | Range | Role |
|---|---|---|---|---|
posthog.raw_tracks |
BQ table (loader output) | part created_at (DAY) · cluster event_name, user_id |
2026-03-12 → present | Landing zone for hourly parquet, written by load_posthog_tracks.py (DELETE-before-APPEND on window_start). |
posthog_tracks → posthog.tracks |
dbt incremental MERGE | part created_at (DAY) · cluster event_name, user_id |
All history → present | Consumer table (~3.1B rows). unique_key=event_id, full_refresh=false; only processes created_at >= 2026-03-04. |
posthog.backfill_tracks |
BQ table (backfill output) | part created_at (DAY) · cluster event_name, user_id |
2025-05-01 → 2026-03-11 | Historical gap-fill (Dataflow outage). Merged into tracks once via merge_backfill_to_tracks.py; backfill complete. |
Key columns
| Column | Type | Notes |
|---|---|---|
event_id | STRING | PostHog event UUID, stable across retries. unique_key for the MERGE. Currently duplicated in tracks (see report). |
created_at | TIMESTAMP | Event timestamp (UTC). Partition key. Re-written on re-delivery by the MERGE today (secondary bug). |
event_name | STRING | PostHog event name, e.g. purchase_flow:subscription_purchase, library:page_view. Cluster key. |
user_id | STRING | PostHog distinct_id mapped to Moises user. Cluster key. |
person_id / session_id | STRING | PostHog person UUID / session identifier. |
client / lib_values | STRING / JSON | Platform marker (web / ios / android). Growth derives platform from lib_values.$lib. |
event_payload | JSON | Full PostHog properties payload, preserved for downstream analysis. |
bq_ingested_timestamp | TIMESTAMP | Loader stamp (one datetime.now() per run). MERGE dedup tiebreaker. A column value, not an insert time — see report pitfall. |
event_inserted_at | TIMESTAMP | Original upstream insert timestamp from the export payload. |
Consumers
- Marketing —
installs_subscribers_by_channel_platform(tag:marketing) readssource('posthog','tracks'). Adding this dependency on 2026-06-11 is what surfaced the duplicate-event_idissue (see report). - Growth —
int_events_growth_union_events(events_growth) readssource('posthog','tracks'), deriving platform fromlib_values.$lib.
Source status & changelog
| Source path | Status | Current role | Changelog / known issue | Detail |
|---|---|---|---|---|
| posthog.tracks | Degraded | Event-level product analytics, consumed by marketing & growth dbt models. | 2026-06-11: source unique test on event_id (scoped created_at >= 2026-03-04) fails — ~28.2M duplicate ids / ~28.8M excess rows (0.92%). Surfaced when the marketing model added a tracks dependency. Dedup in progress. |
posthog.tracks source · 2026-06-12 posthog.tracks duplicate event_id |
| posthog.raw_tracks | Healthy | Hourly landing table for PostHog batch parquet. | Source freshness checked hourly (warn 2h / error 4h on bq_ingested_timestamp). |
Loader page not curated yet |
| posthog.backfill_tracks | Frozen | Historical backfill (May 2025 → Mar 2026), already merged into tracks. |
Backfill complete. Contributed part of the legacy duplicate pool via day-scoped MERGE. | Not curated as a standalone page |
Report pages
- 2026-06-12 posthog.tracks duplicate event_id — ~28.8M excess rows; legacy backfill pool refreshed (not created) by the hourly MERGE; dedup plan.
Update rules
- For a source change, update the source row above first.
- If the status changes, update the Mermaid node color in the graph.
- If the change explains a downstream test/dashboard break, add a dated changelog note and link the detailed report page.
- Keep detailed per-model debugging on the source/report pages; keep this home page as graph + source status + changelog.