Overview
Set up S3 Streaming routing to export your first-party event data to your AWS S3 bucket in near real-time as GZIP-compressed JSONL files. This guide provides context for choosing S3 Streaming and what to expect after setup.
Prerequisites:
- AWS account with S3 bucket creation permissions
- IAM permissions to create users and policies
- S3 bucket in the appropriate AWS region
- Secure method to share AWS credentials with Permutive
When to Choose S3 Streaming
Best for:
- Organizations using AWS as their primary cloud provider
- Teams needing raw event files for custom processing pipelines
- Publishers requiring data in S3 for ingestion into other AWS services (Athena, Redshift, EMR)
- Organizations preferring file-based data over database connections
Consider alternatives if:
- You prefer automatic schema management in a database (consider BigQuery or Snowflake)
- You need immediate SQL query access without additional setup
Setup Steps
S3 Streaming routing requires coordination with Permutive support.
Prepare Your AWS Environment
Before contacting Permutive, ensure you have:
- An S3 bucket with public access blocked
- A dedicated IAM user with programmatic access
- The IAM user granted `s3:List*`, `s3:Get*`, `s3:Delete*`, and `s3:Put*` permissions on the bucket
- Access Key credentials for the IAM user
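The permissions listed above could be expressed as an IAM policy document. The following is a minimal sketch, not Permutive's official policy: the function name `build_bucket_policy`, the `Sid` value, and the bucket name are hypothetical, and the statement assumes the four action groups should apply to both the bucket and every object in it.

```python
import json


def build_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document granting the four action groups on one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PermutiveRoutingAccess",  # hypothetical label
                "Effect": "Allow",
                "Action": ["s3:List*", "s3:Get*", "s3:Delete*", "s3:Put*"],
                # Bucket-level actions match the bucket ARN,
                # object-level actions match the "/*" ARN.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # "my-example-bucket" is a placeholder; substitute your own bucket name.
    print(json.dumps(build_bucket_policy("my-example-bucket"), indent=2))
```

The printed JSON can be pasted into the IAM console as a customer-managed policy and attached to the dedicated user.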
Use a region-specific location (e.g., `us-east-1`, `eu-west-1`) rather than generic regions.
Contact Permutive Support
Email [email protected] with:
- Bucket Name
- Bucket Region (e.g., `us-east-1`)
- Bucket Prefix (optional, e.g., `permutive/`)
- Access Key ID
- Secret Access Key (use encrypted channel)
- Routing Mode: Streaming
Use 1Password shared vaults or GPG-encrypted emails to securely transmit credentials.
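Before sharing the Access Key, it can be worth confirming that the key pair really has put, list, get, and delete access to the bucket. The sketch below is one possible check, under the assumption that `s3_client` is a boto3 S3 client created with the new IAM user's credentials. The helper name `check_bucket_access` and the probe object name are hypothetical.

```python
import uuid


def check_bucket_access(s3_client, bucket: str, prefix: str = "") -> list[str]:
    """Exercise put, list, get, and delete on a throwaway object.

    Returns the names of the operations that succeeded, in order.
    Any failed operation raises the underlying client exception.
    """
    key = f"{prefix}access-check-{uuid.uuid4().hex}.txt"
    body = b"access check"
    passed = []

    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    passed.append("put")
    try:
        listing = s3_client.list_objects_v2(Bucket=bucket, Prefix=key)
        if not any(obj["Key"] == key for obj in listing.get("Contents", [])):
            raise RuntimeError("uploaded object not visible in listing")
        passed.append("list")

        fetched = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        if fetched != body:
            raise RuntimeError("downloaded bytes differ from uploaded bytes")
        passed.append("get")
    finally:
        # Always try to remove the probe object, even if a check failed.
        s3_client.delete_object(Bucket=bucket, Key=key)
    passed.append("delete")
    return passed
```

With boto3 installed, a client could be created with `boto3.client("s3", region_name="us-east-1", aws_access_key_id=..., aws_secret_access_key=...)` and passed in as `s3_client`. A return value of `["put", "list", "get", "delete"]` suggests the permissions are in place.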
Understanding S3 Streaming Data Structure
S3 Streaming uses Hive-style partitioning to organize data efficiently.
Folder Structure
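Hive-style partitioning encodes partition columns as `key=value` path segments, so downstream jobs can recover them from an object key. A minimal sketch of that extraction follows. The partition names in the example key (`type`, `year`, `month`, `day`, `hour`) and the file name are hypothetical and chosen only for illustration, so check the keys in your own bucket for the actual layout.

```python
def parse_hive_partitions(object_key: str) -> dict[str, str]:
    """Return the key=value path segments of an S3 object key as a dict."""
    partitions = {}
    # The final segment is the file name, so only directory segments are inspected.
    for segment in object_key.split("/")[:-1]:
        name, separator, value = segment.partition("=")
        if separator and name:
            partitions[name] = value
    return partitions


if __name__ == "__main__":
    # Hypothetical key for illustration only; real partition names may differ.
    example_key = "permutive/type=events/year=2024/month=01/day=15/hour=13/part.jsonl.gz"
    print(parse_hive_partitions(example_key))
```

Segments without an `=` sign, such as a bucket prefix, are ignored rather than treated as errors.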
File Format
- Format: Newline-delimited JSON (JSONL)
- Compression: GZIP (`.gz`)
- Extension: `.jsonl.gz`
- Encoding: UTF-8
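Given the format above, a downloaded file can be read with the Python standard library alone. The sketch below assumes the file has already been copied to local disk. The helper name `read_jsonl_gz` is hypothetical, and the fields inside each record are left uninterpreted because they depend on your event schema.

```python
import gzip
import json
from typing import Iterator


def read_jsonl_gz(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a .jsonl.gz file."""
    # "rt" decompresses and decodes to text in one step.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because the function is a generator, large files are processed one line at a time instead of being loaded into memory at once.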
Data Types Exported
| Data Type | Description | Partitioned |
|---|---|---|
| `events` | User behavioral events | Yes (hourly) |
| `sync_aliases` | Identity synchronization data | Yes (hourly) |
| `segment` | Segment metadata snapshots | No |
Common Considerations
Latency: S3 Streaming has approximately 5-minute latency from event collection to file availability in S3.
What Happens After Setup
Once routing is active:
- Files stream to S3 in near real-time with approximately 5-minute latency
- Hive-style partitions are created automatically by hour
- Event data is written as GZIP-compressed JSONL files
- File naming follows the pattern `{timestamp}-{hash}-{worker_id}.jsonl.gz`
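If a pipeline needs the individual parts of a file name, the pattern above can be split from the right. This is a rough sketch under two assumptions that the pattern itself does not confirm: that the hash and worker ID contain no hyphens, and that the timestamp may. The names `ExportFileName` and `parse_export_file_name` are hypothetical.

```python
from typing import NamedTuple

SUFFIX = ".jsonl.gz"


class ExportFileName(NamedTuple):
    timestamp: str
    hash_part: str
    worker_id: str


def parse_export_file_name(file_name: str) -> ExportFileName:
    """Split '{timestamp}-{hash}-{worker_id}.jsonl.gz' into its three parts."""
    if not file_name.endswith(SUFFIX):
        raise ValueError(f"expected a {SUFFIX} file, got {file_name!r}")
    stem = file_name[: -len(SUFFIX)]
    # Split from the right so hyphens inside the timestamp are preserved.
    parts = stem.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected file name layout: {file_name!r}")
    return ExportFileName(*parts)
```

All three parts are kept as strings, since the timestamp format and the shape of the hash are not specified here.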