Overview
Set up S3 Batch routing to export your first-party event data to your AWS S3 bucket on a scheduled 24-hour cycle. Choose between JSON (GZIP compressed) or Parquet (Snappy compressed) format based on your data processing needs.
Prerequisites:
- AWS account with S3 bucket creation permissions
- IAM permissions to create users and policies
- S3 bucket in the appropriate AWS region
- Secure method to share AWS credentials with Permutive
- Understanding of your data processing requirements (JSON vs Parquet)
When to Choose S3 Batch
Best for:
- Organizations needing scheduled daily exports instead of real-time streaming
- Teams ingesting data into data warehouses on a batch schedule
- Publishers requiring Parquet format for efficient columnar storage
- Organizations preferring predictable export schedules over continuous streaming
Not a good fit if:
- You need near real-time data access (use S3 Streaming, BigQuery, or Snowflake)
- You require sub-hourly data updates
- You prefer automatic schema management in a database
Setup Steps
S3 Batch routing requires coordination with Permutive support.
Choose Your Data Format
Decide between JSON and Parquet formats:
| Format | Best For |
|---|---|
| JSON (GZIP) | Human-readable, easier debugging, broader tool compatibility |
| Parquet (Snappy) | Analytics workloads, better compression, data warehouse ingestion |
Most organizations choose Parquet for production data pipelines due to better compression and query performance.
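If you choose JSON (GZIP), each exported object is typically a GZIP-compressed file of newline-delimited JSON records that Python's standard library can read directly. The sketch below writes a tiny local sample file to stand in for a downloaded S3 object; the file name and event fields are illustrative placeholders, not Permutive's actual schema.

```python
import gzip
import json

# Illustrative sample records standing in for a downloaded export object.
sample_events = [
    {"event": "Pageview", "user_id": "u-123", "time": "2024-01-15T00:00:00Z"},
    {"event": "Pageview", "user_id": "u-456", "time": "2024-01-15T00:01:00Z"},
]
path = "pageview_events_sample.json.gz"
with gzip.open(path, "wt", encoding="utf-8") as f:
    for event in sample_events:
        f.write(json.dumps(event) + "\n")

# Decompress and parse one JSON record per line.
with gzip.open(path, "rt", encoding="utf-8") as f:
    events = [json.loads(line) for line in f]

print(len(events))
```

Reading Parquet (Snappy) instead requires a columnar reader such as pyarrow or a warehouse's native Parquet ingestion, which is why it is the usual choice for analytics pipelines.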
Contact Permutive Support
Email [email protected] with:
- Bucket Name
- Bucket Prefix (optional, e.g., permutive/)
- Format: JSON or Parquet
Attach Bucket Policy
Attach the Permutive-provided bucket policy to your S3 bucket’s permissions, then notify Permutive support to confirm.
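For orientation, the policy you receive will be a standard S3 bucket policy JSON document granting Permutive's principal write access to your bucket. The sketch below builds a document of that general shape; the principal ARN, bucket name, and actions shown are placeholders, so always attach the exact policy document Permutive support provides rather than this example.

```python
import json

# Placeholder values -- replace with the details from Permutive support.
bucket = "my-export-bucket"
permutive_principal = "arn:aws:iam::000000000000:user/permutive-export"

# General shape of an S3 bucket policy allowing another account to write objects.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowPermutiveBatchWrites",
            "Effect": "Allow",
            "Principal": {"AWS": permutive_principal},
            "Action": ["s3:PutObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }
    ],
}

# Paste the JSON into the bucket's Permissions > Bucket policy editor.
print(json.dumps(policy, indent=2))
```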
Understanding S3 Batch Data Structure
S3 Batch exports use Hive-style partitioning organized by event type and date:
Folder Structure
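The layout can be sketched as Hive-style `date=` partitions under each table's prefix. The exact bucket prefix and file naming are configured during onboarding, so the helper below is a hypothetical illustration of the pattern, not a guaranteed path format.

```python
from datetime import date
from typing import Optional

def partition_prefix(table: str, day: Optional[date],
                     bucket_prefix: str = "permutive/") -> str:
    """Build an example prefix for one table's daily partition.

    Snapshot tables (segment_metadata, domains) carry no date partition,
    so pass day=None for those.
    """
    if day is None:
        return f"{bucket_prefix}{table}/"
    return f"{bucket_prefix}{table}/date={day.isoformat()}/"

print(partition_prefix("pageview_events", date(2024, 1, 15)))
# permutive/pageview_events/date=2024-01-15/
print(partition_prefix("segment_metadata", None))
# permutive/segment_metadata/
```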
Data Types and Sync Modes
| Data Type | Description | Sync Mode |
|---|---|---|
| Events (e.g., pageview_events) | User behavioral events | Incremental (append) |
| aliases | Identity data and alias mappings | Incremental (append) |
| domains | Domain-level metadata | Snapshot (full replace) |
| segment_metadata | Segment definitions and metadata | Snapshot (full replace) |
Snapshot tables (segment_metadata, domains) are exported without date partitioning since they represent current state rather than time-series data.
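The two sync modes imply different loading logic downstream: incremental tables should have each day's partition appended, while snapshot tables should be replaced wholesale each cycle. A toy sketch of that distinction, using an in-memory dict as a stand-in for your warehouse:

```python
# Illustrative loader: the table names come from the doc, but the
# in-memory "warehouse" and load() helper are hypothetical.
warehouse = {}

SNAPSHOT_TABLES = {"segment_metadata", "domains"}

def load(table, rows):
    if table in SNAPSHOT_TABLES:
        warehouse[table] = list(rows)                  # full replace each cycle
    else:
        warehouse.setdefault(table, []).extend(rows)   # append-only

load("pageview_events", [{"id": 1}])
load("pageview_events", [{"id": 2}])    # day 2: appends
load("domains", [{"domain": "a.com"}])
load("domains", [{"domain": "b.com"}])  # day 2: replaces

print(len(warehouse["pageview_events"]))  # 2
print(len(warehouse["domains"]))          # 1
```

Treating a snapshot table as incremental would accumulate stale duplicates, which is why the distinction matters when wiring up warehouse ingestion.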
Export Schedule and Timing
Batch Export Characteristics:
- Frequency: 24-hour cycles
- Scope: Organization-level (includes all workspaces)
- Timing: Contact your Customer Success Manager for specific schedule
- Partitioning: Daily partitions by event type
Common Considerations
Export Timing: Batch exports run on 24-hour cycles. The exact timing is configured during setup. Contact your Customer Success Manager for your specific schedule.
What Happens After Setup
Once batch routing is active:
- Daily exports run automatically on the configured 24-hour schedule
- Event data is partitioned by event type and date
- Snapshot tables are replaced each export cycle
- Files are written in your chosen format (JSON or Parquet)