Overview

Set up S3 Batch routing to export your first-party event data to your AWS S3 bucket on a scheduled 24-hour cycle. Choose between JSON (GZIP-compressed) and Parquet (Snappy-compressed) formats based on your data processing needs.
Prerequisites:
  • AWS account with S3 bucket creation permissions
  • IAM permissions to create users and policies
  • S3 bucket in the appropriate AWS region
  • Secure method to share AWS credentials with Permutive
  • Understanding of your data processing requirements (JSON vs Parquet)

When to Choose S3 Batch

Best for:
  • Organizations needing scheduled daily exports instead of real-time streaming
  • Teams ingesting data into data warehouses on a batch schedule
  • Publishers requiring Parquet format for efficient columnar storage
  • Organizations preferring predictable export schedules over continuous streaming
Consider alternatives if:
  • You need near real-time data access (use S3 Streaming, BigQuery, or Snowflake)
  • You require sub-hourly data updates
  • You prefer automatic schema management in a database

Setup Steps

S3 Batch routing requires coordination with Permutive support.

1. Choose Your Data Format

Decide between JSON and Parquet formats:
Format           | Best For
-----------------|-------------------------------------------------------------------
JSON (GZIP)      | Human-readable, easier debugging, broader tool compatibility
Parquet (Snappy) | Analytics workloads, better compression, data warehouse ingestion
Most organizations choose Parquet for production data pipelines due to better compression and query performance.
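As an illustration of the practical difference, here is a minimal sketch of reading one export file in each format with pandas. The file names are placeholders, and the JSON reader assumes newline-delimited JSON (one event per line), which is typical for batch exports:

import pandas as pd

# JSON (GZIP): assumed newline-delimited, one event per line;
# pandas infers GZIP compression from the .gz extension.
events_json = pd.read_json("data-000000000000.json.gz", lines=True)

# Parquet (Snappy): Snappy decompression is handled transparently
# by the Parquet reader, and column types are preserved.
events_parquet = pd.read_parquet("data-000000000000.parquet")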

2. Prepare Your AWS Environment

Ensure you have an S3 bucket ready in the appropriate AWS region, with all public access blocked.
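If you prefer to script this step, a minimal sketch with boto3 follows; the bucket name and region are placeholders:

import boto3

s3 = boto3.client("s3", region_name="eu-west-1")  # placeholder region

# Create the destination bucket (skip if it already exists).
s3.create_bucket(
    Bucket="my-permutive-exports",  # placeholder name
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)

# Block all public access. This does not block the Permutive-provided
# bucket policy, which grants access to a specific AWS principal
# rather than to the public.
s3.put_public_access_block(
    Bucket="my-permutive-exports",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)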

3. Contact Permutive Support

Email [email protected] with:
  • Bucket Name
  • Bucket Prefix (optional, e.g., permutive/)
  • Format: JSON or Parquet
Permutive will provide you with a bucket policy to attach to your S3 bucket.

4. Attach Bucket Policy

Attach the Permutive-provided bucket policy to your S3 bucket’s permissions, then notify Permutive support to confirm.
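If you would rather attach the policy programmatically than through the AWS console, a minimal boto3 sketch looks like this; policy.json stands for the document supplied by Permutive support, and the bucket name is a placeholder:

import json
import boto3

s3 = boto3.client("s3")

# Load the bucket policy document provided by Permutive support.
with open("policy.json") as f:
    policy = json.load(f)

# Attach it to the export bucket's permissions.
s3.put_bucket_policy(Bucket="my-permutive-exports", Policy=json.dumps(policy))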

5. Setup Completion

Permutive will complete the integration setup and notify you when data begins flowing on the next batch export cycle.

Understanding S3 Batch Data Structure

S3 Batch exports use Hive-style partitioning organized by event type and date:

Folder Structure

s3://bucket/prefix/data/
├── pageview_events/
│   ├── year=2026/month=1/day=15/
│   │   └── data-000000000000.json.gz
│   └── year=2026/month=1/day=16/
├── videoview_events/
│   └── year=2026/month=1/day=15/
├── aliases/
│   └── year=2026/month=1/day=15/
├── domains/
│   └── data-000000000000.json.gz
└── segment_metadata/
    └── data-000000000000.json.gz
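For example, to list the files in a single day's partition with boto3 (bucket name and prefix are placeholders; note that partition values are not zero-padded):

import boto3

s3 = boto3.client("s3")
bucket = "my-permutive-exports"  # placeholder
prefix = "prefix/data/pageview_events/year=2026/month=1/day=15/"  # placeholder

# Paginate in case a partition holds more than 1,000 objects.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])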

Data Types and Sync Modes

Data Type                      | Description                       | Sync Mode
-------------------------------|-----------------------------------|------------------------
Events (e.g., pageview_events) | User behavioral events            | Incremental (append)
aliases                        | Identity data and alias mappings  | Incremental (append)
domains                        | Domain-level metadata             | Snapshot (full replace)
segment_metadata               | Segment definitions and metadata  | Snapshot (full replace)
Incremental tables append new data each export cycle. Snapshot tables are fully replaced with each export to ensure the latest reference data.
Snapshot tables (segment_metadata, domains) are exported without date partitioning since they represent current state rather than time-series data.
See the S3 integration documentation for detailed schema information.
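The sync-mode distinction matters when loading exports downstream. As a minimal sketch, the loader below appends incremental tables and replaces snapshot tables, using sqlite3 as a stand-in for a real warehouse; table and file paths are illustrative, and nested event fields may need flattening before loading:

import sqlite3
import pandas as pd

SNAPSHOT_TABLES = {"segment_metadata", "domains"}  # fully replaced each cycle

def load_table(conn, table, df):
    if table in SNAPSHOT_TABLES:
        # Snapshot: drop previous contents, keep only the latest export.
        df.to_sql(table, conn, if_exists="replace", index=False)
    else:
        # Incremental: append the new partition to the existing history.
        df.to_sql(table, conn, if_exists="append", index=False)

conn = sqlite3.connect("permutive.db")
load_table(conn, "segment_metadata",
           pd.read_json("segment_metadata/data-000000000000.json.gz", lines=True))
load_table(conn, "pageview_events",
           pd.read_json("pageview_events/year=2026/month=1/day=15/data-000000000000.json.gz",
                        lines=True))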

Export Schedule and Timing

Batch Export Characteristics:
  • Frequency: 24-hour cycles
  • Scope: Organization-level (includes all workspaces)
  • Timing: Contact your Customer Success Manager for specific schedule
  • Partitioning: Daily partitions by event type
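When scheduling downstream jobs around this cycle, the prefix for a given day's partition can be derived from the date. A small sketch, assuming the data/ root and event type shown above and noting that partition values are not zero-padded:

from datetime import date, timedelta

# Process yesterday's partition once the daily export has landed.
d = date.today() - timedelta(days=1)
prefix = f"data/pageview_events/year={d.year}/month={d.month}/day={d.day}/"
print(prefix)  # e.g., data/pageview_events/year=2026/month=1/day=15/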

Common Considerations

Export Timing: Batch exports run on 24-hour cycles. The exact timing is configured during setup. Contact your Customer Success Manager for your specific schedule.
Incremental vs Snapshot: Understand the difference between incremental tables (events, aliases) that append data and snapshot tables (segment_metadata, domains) that replace data. Design your data pipelines accordingly.
Organization-Level Scope: S3 Batch routing operates at the organization level, exporting data for all workspaces in your organization, unlike streaming routing, which is workspace-specific.

What Happens After Setup

Once batch routing is active:
  1. Daily exports run automatically on the configured 24-hour schedule
  2. Event data is partitioned by event type and date
  3. Snapshot tables are replaced each export cycle
  4. Files are written in your chosen format (JSON or Parquet)

Next Steps