PostHog Tracks

Event-level product analytics from PostHog, landed in BigQuery as posthog.tracks. This home page is the curated map: pipeline, models, key columns, consumers, source status, and lineage.
Last update: 2026-06-12

Healthy Degraded / under review Broken Frozen / historical Consumer
🪣 Bucket / raw files (parquet) 🧭 DAG 🧱 dbt model 📊 BI / dashboard ⚠️ known issue / degraded

PostHog batch export drops hourly parquet files in GCS. The hourly posthog-tracks DAG loads them into posthog.raw_tracks, then a dbt incremental MERGE (posthog_tracks) folds new rows into the consumer-facing posthog.tracks (~3.1B rows, all history).

Scroll to zoom · drag to pan · click linked nodes for details
flowchart LR subgraph SRC["PostHog batch export"] direction LR PH["PostHog
product analytics SaaS"] BUCKET["🪣 GCS
gs://data-platform-dataflow/
{env}/posthog_batch/events/
{date}/{hour}/*.parquet.sz"] end subgraph DAGSG["🧭 posthog-tracks DAG (@hourly)"] direction LR SENSOR["wait_for_posthog_batch
GCS sensor, deferrable"] LOADER["raw_tracks
load_posthog_tracks.py
parquet → Pydantic → BQ
"] FRESH["dbt source freshness
raw_tracks (2h warn / 4h err)"] MODEL["🧱 posthog_tracks
dbt incremental MERGE
unique_key=event_id
"] GATE["dbt test
06:00 / 18:00 UTC"] end RAW["posthog.raw_tracks
BQ · partition created_at
2026-03-12 → present
"] TRACKS["⚠️ posthog.tracks
BQ · partition created_at
~3.1B rows, all history
"] subgraph BFSG["Backfill (manual, complete)"] direction LR BACKFILL["posthog.backfill_tracks
2025-05 → 2026-03-11"] MERGEBF["merge_backfill_to_tracks.py
insert-only, day-by-day"] end subgraph CONS["Consumers"] direction LR MKT["🧱 installs_subscribers_by_channel_platform
marketing · tag:marketing"] GROWTH["🧱 int_events_growth_union_events
growth events_growth"] end TABLEAU["📊 Product / Growth dashboards"] PH --> BUCKET BUCKET --> SENSOR SENSOR --> LOADER LOADER --> RAW RAW --> FRESH FRESH --> MODEL MODEL --> TRACKS MODEL --> GATE BACKFILL --> MERGEBF MERGEBF -.-> TRACKS TRACKS --> MKT TRACKS --> GROWTH MKT -.-> TABLEAU GROWTH -.-> TABLEAU classDef file fill:#f6f8fa,stroke:#6e7781,color:#000 classDef focus fill:#fff8c5,stroke:#9a6700,color:#000,stroke-width:2px classDef issue fill:#ffebe9,stroke:#cf222e,color:#000,stroke-width:2px classDef frozen fill:#eaeef2,stroke:#8c959f,color:#57606a classDef consumer fill:#d6e0ff,stroke:#0969da,color:#000,stroke-width:2px class BUCKET file class PH,SENSOR,LOADER,FRESH,MODEL,GATE,RAW focus class TRACKS issue class BACKFILL,MERGEBF frozen class MKT,GROWTH,TABLEAU consumer click BUCKET "tracks.html#pipeline" "Open posthog.tracks source page" click LOADER "tracks.html#pipeline" "Open posthog.tracks source page" click MODEL "tracks.html#pipeline" "Open posthog.tracks source page" click TRACKS "tracks.html#pipeline" "Open posthog.tracks source page" click MERGEBF "tracks_duplicate_event_id.html#root-cause" "Open 2026-06-12 posthog.tracks duplicate event_id" linkStyle default stroke:#d8dee4,stroke-width:1.2px,color:#8c959f linkStyle 0,1,2,3,4,5,6,7 stroke:#9a6700,stroke-width:2.5px,color:#000

Models & tables

ObjectTypePartition / clusterRangeRole
posthog.raw_tracks BQ table (loader output) part created_at (DAY) · cluster event_name, user_id 2026-03-12 → present Landing zone for hourly parquet, written by load_posthog_tracks.py (DELETE-before-APPEND on window_start).
posthog_tracksposthog.tracks dbt incremental MERGE part created_at (DAY) · cluster event_name, user_id All history → present Consumer table (~3.1B rows). unique_key=event_id, full_refresh=false; only processes created_at >= 2026-03-04.
posthog.backfill_tracks BQ table (backfill output) part created_at (DAY) · cluster event_name, user_id 2025-05-01 → 2026-03-11 Historical gap-fill (Dataflow outage). Merged into tracks once via merge_backfill_to_tracks.py; backfill complete.

Key columns

ColumnTypeNotes
event_idSTRINGPostHog event UUID, stable across retries. unique_key for the MERGE. Currently duplicated in tracks (see report).
created_atTIMESTAMPEvent timestamp (UTC). Partition key. Re-written on re-delivery by the MERGE today (secondary bug).
event_nameSTRINGPostHog event name, e.g. purchase_flow:subscription_purchase, library:page_view. Cluster key.
user_idSTRINGPostHog distinct_id mapped to Moises user. Cluster key.
person_id / session_idSTRINGPostHog person UUID / session identifier.
client / lib_valuesSTRING / JSONPlatform marker (web / ios / android). Growth derives platform from lib_values.$lib.
event_payloadJSONFull PostHog properties payload, preserved for downstream analysis.
bq_ingested_timestampTIMESTAMPLoader stamp (one datetime.now() per run). MERGE dedup tiebreaker. A column value, not an insert time — see report pitfall.
event_inserted_atTIMESTAMPOriginal upstream insert timestamp from the export payload.

Consumers

Source status & changelog

Source pathStatusCurrent roleChangelog / known issueDetail
posthog.tracks Degraded Event-level product analytics, consumed by marketing & growth dbt models. 2026-06-11: source unique test on event_id (scoped created_at >= 2026-03-04) fails — ~28.2M duplicate ids / ~28.8M excess rows (0.92%). Surfaced when the marketing model added a tracks dependency. Dedup in progress. posthog.tracks source · 2026-06-12 posthog.tracks duplicate event_id
posthog.raw_tracks Healthy Hourly landing table for PostHog batch parquet. Source freshness checked hourly (warn 2h / error 4h on bq_ingested_timestamp). Loader page not curated yet
posthog.backfill_tracks Frozen Historical backfill (May 2025 → Mar 2026), already merged into tracks. Backfill complete. Contributed part of the legacy duplicate pool via day-scoped MERGE. Not curated as a standalone page

Report pages

Update rules

  1. For a source change, update the source row above first.
  2. If the status changes, update the Mermaid node color in the graph.
  3. If the change explains a downstream test/dashboard break, add a dated changelog note and link the detailed report page.
  4. Keep detailed per-model debugging on the source/report pages; keep this home page as graph + source status + changelog.