Overview

Set up S3 Batch routing to export your first-party event data to your AWS S3 bucket on a scheduled 24-hour cycle. Choose between JSON (GZIP-compressed) and Parquet (Snappy-compressed) formats based on your data processing needs.
Prerequisites:
  • AWS account with S3 bucket creation permissions
  • IAM permissions to create users and policies
  • S3 bucket in the appropriate AWS region
  • Secure method to share AWS credentials with Permutive
  • Understanding of your data processing requirements (JSON vs Parquet)

When to Choose S3 Batch

Best for:
  • Organizations needing scheduled daily exports instead of real-time streaming
  • Teams ingesting data into data warehouses on a batch schedule
  • Publishers requiring Parquet format for efficient columnar storage
  • Organizations preferring predictable export schedules over continuous streaming
Consider alternatives if:
  • You need near real-time data access (use S3 Streaming, BigQuery, or Snowflake)
  • You require sub-hourly data updates
  • You prefer automatic schema management in a database

Setup Steps

S3 Batch routing requires coordination with Permutive support.

1. Choose Your Data Format

Decide between JSON and Parquet formats:
Format           | Best For
-----------------|-------------------------------------------------------------------
JSON (GZIP)      | Human-readable, easier debugging, broader tool compatibility
Parquet (Snappy) | Analytics workloads, better compression, data warehouse ingestion
Most organizations choose Parquet for production data pipelines due to better compression and query performance.
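As an illustration of the practical difference, here is a minimal sketch of reading one export file in each format with pandas. The file names are placeholders, and the JSON reader assumes newline-delimited JSON (one event per line), which is typical for batch exports:

import pandas as pd

# JSON (GZIP): assumed newline-delimited, one event per line;
# pandas infers GZIP compression from the .gz extension.
events_json = pd.read_json("data-000000000000.json.gz", lines=True)

# Parquet (Snappy): Snappy decompression is handled transparently
# by the Parquet reader, and column types are preserved.
events_parquet = pd.read_parquet("data-000000000000.parquet")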

2. Prepare Your AWS Environment

Ensure you have an S3 bucket ready in the appropriate AWS region, with all public access blocked.
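If you prefer to script this step, a minimal sketch with boto3 follows; the bucket name and region are placeholders:

import boto3

s3 = boto3.client("s3", region_name="eu-west-1")  # placeholder region

# Create the destination bucket (skip if it already exists).
s3.create_bucket(
    Bucket="my-permutive-exports",  # placeholder name
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)

# Block all public access. This does not block the Permutive-provided
# bucket policy, which grants access to a specific AWS principal
# rather than to the public.
s3.put_public_access_block(
    Bucket="my-permutive-exports",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)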

3. Contact Permutive Support

Email [email protected] with:
  • Bucket Name
  • Bucket Prefix (optional, e.g., permutive/)
  • Format: JSON or Parquet
Permutive will provide you with a bucket policy to attach to your S3 bucket.

4. Attach Bucket Policy

Attach the Permutive-provided bucket policy to your S3 bucket’s permissions, then notify Permutive support to confirm.
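If you would rather attach the policy programmatically than through the AWS console, a minimal boto3 sketch looks like this; policy.json stands for the document supplied by Permutive support, and the bucket name is a placeholder:

import json
import boto3

s3 = boto3.client("s3")

# Load the bucket policy document provided by Permutive support.
with open("policy.json") as f:
    policy = json.load(f)

# Attach it to the export bucket's permissions.
s3.put_bucket_policy(Bucket="my-permutive-exports", Policy=json.dumps(policy))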

5. Setup Completion

Permutive will complete the integration setup and notify you when data begins flowing on the next batch export cycle.

Understanding S3 Batch Data Structure

S3 Batch exports use Hive-style partitioning organized by event type and date:

Folder Structure

s3://bucket/prefix/data/
├── pageview_events/
│   ├── year=2026/month=1/day=15/
│   │   └── data-000000000000.json.gz
│   └── year=2026/month=1/day=16/
├── videoview_events/
│   └── year=2026/month=1/day=15/
├── aliases/
│   └── year=2026/month=1/day=15/
├── domains/
│   └── data-000000000000.json.gz
└── segment_metadata/
    └── data-000000000000.json.gz
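For example, to list the files in a single day's partition with boto3 (bucket name and prefix are placeholders; note that partition values are not zero-padded):

import boto3

s3 = boto3.client("s3")
bucket = "my-permutive-exports"  # placeholder
prefix = "prefix/data/pageview_events/year=2026/month=1/day=15/"  # placeholder

# Paginate in case a partition holds more than 1,000 objects.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])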

Data Types and Sync Modes

Data Type                      | Description                       | Sync Mode
-------------------------------|-----------------------------------|------------------------
Events (e.g., pageview_events) | User behavioral events            | Incremental (append)
aliases                        | Identity data and alias mappings  | Incremental (append)
domains                        | Domain-level metadata             | Snapshot (full replace)
segment_metadata               | Segment definitions and metadata  | Snapshot (full replace)
Incremental tables append new data each export cycle. Snapshot tables are fully replaced with each export to ensure the latest reference data.
Snapshot tables (segment_metadata, domains) are exported without date partitioning since they represent current state rather than time-series data.
See the S3 integration documentation for detailed schema information.
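The sync-mode distinction matters when loading exports downstream. As a minimal sketch, the loader below appends incremental tables and replaces snapshot tables, using sqlite3 as a stand-in for a real warehouse; table and file paths are illustrative, and nested event fields may need flattening before loading:

import sqlite3
import pandas as pd

SNAPSHOT_TABLES = {"segment_metadata", "domains"}  # fully replaced each cycle

def load_table(conn, table, df):
    if table in SNAPSHOT_TABLES:
        # Snapshot: drop previous contents, keep only the latest export.
        df.to_sql(table, conn, if_exists="replace", index=False)
    else:
        # Incremental: append the new partition to the existing history.
        df.to_sql(table, conn, if_exists="append", index=False)

conn = sqlite3.connect("permutive.db")
load_table(conn, "segment_metadata",
           pd.read_json("segment_metadata/data-000000000000.json.gz", lines=True))
load_table(conn, "pageview_events",
           pd.read_json("pageview_events/year=2026/month=1/day=15/data-000000000000.json.gz",
                        lines=True))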

Export Schedule and Timing

Batch Export Characteristics:
  • Frequency: 24-hour cycles
  • Scope: Organization-level (includes all workspaces)
  • Timing: Contact your Customer Success Manager for specific schedule
  • Partitioning: Daily partitions by event type
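When scheduling downstream jobs around this cycle, the prefix for a given day's partition can be derived from the date. A small sketch, assuming the data/ root and event type shown above and noting that partition values are not zero-padded:

from datetime import date, timedelta

# Process yesterday's partition once the daily export has landed.
d = date.today() - timedelta(days=1)
prefix = f"data/pageview_events/year={d.year}/month={d.month}/day={d.day}/"
print(prefix)  # e.g., data/pageview_events/year=2026/month=1/day=15/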

Common Considerations

Export Timing: Batch exports run on 24-hour cycles. The exact timing is configured during setup. Contact your Customer Success Manager for your specific schedule.
Incremental vs Snapshot: Understand the difference between incremental tables (events, aliases) that append data and snapshot tables (segment_metadata, domains) that replace data. Design your data pipelines accordingly.
Organization-Level Scope: S3 Batch routing operates at the organization level, exporting data for all workspaces in your organization, unlike streaming routing, which is workspace-specific.

What Happens After Setup

Once batch routing is active:
  1. Daily exports run automatically on the configured 24-hour schedule
  2. Event data is partitioned by event type and date
  3. Snapshot tables are replaced each export cycle
  4. Files are written in your chosen format (JSON or Parquet)

Next Steps