
Overview

Set up S3 Streaming routing to export your first-party event data to your AWS S3 bucket in near real-time as GZIP-compressed JSONL files. This guide explains when to choose S3 Streaming, how to set it up, and what to expect once it is live.
Prerequisites:
  • AWS account with S3 bucket creation permissions
  • IAM permissions to create users and policies
  • S3 bucket in the appropriate AWS region
  • Secure method to share AWS credentials with Permutive

When to Choose S3 Streaming

Best for:
  • Organizations using AWS as their primary cloud provider
  • Teams needing raw event files for custom processing pipelines
  • Publishers requiring data in S3 for ingestion into other AWS services (Athena, Redshift, EMR)
  • Organizations preferring file-based data over database connections
Consider alternatives if:
  • You prefer automatic schema management in a database (consider BigQuery or Snowflake)
  • You need immediate SQL query access without additional setup

Setup Steps

S3 Streaming routing requires coordination with Permutive support.
Step 1: Prepare Your AWS Environment

Before contacting Permutive, ensure you have:
  • An S3 bucket with public access blocked
  • A dedicated IAM user with programmatic access
  • The IAM user granted s3:List*, s3:Get*, s3:Delete*, and s3:Put* permissions on the bucket
  • Access Key credentials for the IAM user
Create the bucket in a specific AWS Region (e.g., us-east-1, eu-west-1) rather than a generic or multi-region location.
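As a sketch of the permissions listed above, the Python snippet below builds an IAM policy document for a single bucket. The bucket name my-permutive-bucket and the Sid value are hypothetical placeholders. Both the bucket ARN and the object ARN (bucket/*) are listed because bucket-level actions such as s3:ListBucket match the bucket itself, while object actions such as s3:GetObject match the objects inside it. Confirm the final policy with Permutive before attaching it.

```python
import json


def build_streaming_policy(bucket: str) -> dict:
    """Return an IAM policy granting s3:List*, s3:Get*, s3:Delete*, and s3:Put*
    on one bucket (sketch; the bucket name is supplied by the caller)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PermutiveStreamingAccess",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["s3:List*", "s3:Get*", "s3:Delete*", "s3:Put*"],
                # Bucket ARN covers bucket-level actions (e.g. s3:ListBucket);
                # bucket/* covers object-level actions (e.g. s3:GetObject).
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # "my-permutive-bucket" is a placeholder; substitute your own bucket name.
    print(json.dumps(build_streaming_policy("my-permutive-bucket"), indent=2))
```

You could save the output as policy.json and attach it with aws iam put-user-policy --user-name <user> --policy-name <name> --policy-document file://policy.json.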
Step 2: Contact Permutive Support

Email [email protected] with:
  • Bucket Name
  • Bucket Region (e.g., us-east-1)
  • Bucket Prefix (optional, e.g., permutive/)
  • Access Key ID
  • Secret Access Key (use encrypted channel)
  • Routing Mode: Streaming
Use 1Password shared vaults or GPG-encrypted emails to securely transmit credentials.
Step 3: Setup Completion

Permutive will configure your routing instance and notify you when the integration is live.

Understanding S3 Streaming Data Structure

S3 Streaming uses Hive-style partitioning, in which each folder level is a key=value pair (for example, type=events/ or hour=14/), to organize data by data type and hour:

Folder Structure

s3://bucket/prefix/
├── type=events/
│   ├── year=2026/
│   │   ├── month=01/
│   │   │   ├── day=15/
│   │   │   │   ├── hour=14/
│   │   │   │   │   └── 2026-01-15T14:00:00.000000Z-abc123-worker1.jsonl.gz
│   │   │   │   └── hour=15/
├── type=sync_aliases/
│   └── year=2026/month=01/day=15/hour=14/...
└── type=segment/
    └── timestamp-hash-worker.jsonl.gz
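To locate the files for a given hour, you can assemble the partition folder from a timestamp. This minimal sketch assumes partition values are in UTC (the example file name ends in Z) and zero-padded as shown in the tree; the function name partition_uri is hypothetical.

```python
from datetime import datetime, timezone


def partition_uri(bucket: str, prefix: str, data_type: str, ts: datetime) -> str:
    """Build the hourly Hive-style folder URI for a partitioned data type.

    Assumes partitions are keyed on UTC time, so ts must be timezone-aware.
    """
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    ts = ts.astimezone(timezone.utc)
    clean = prefix.strip("/")  # tolerate "permutive" or "permutive/"
    base = f"{bucket}/{clean}/" if clean else f"{bucket}/"
    return (
        f"s3://{base}type={data_type}/"
        f"year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/hour={ts.hour:02d}/"
    )


print(partition_uri("bucket", "permutive/", "events",
                    datetime(2026, 1, 15, 14, 30, tzinfo=timezone.utc)))
# → s3://bucket/permutive/type=events/year=2026/month=01/day=15/hour=14/
```

The same function works for sync_aliases; segment files are not partitioned, so they sit directly under type=segment/.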

File Format

  • Format: Newline-delimited JSON (JSONL)
  • Compression: GZIP (.gz)
  • Extension: .jsonl.gz
  • Encoding: UTF-8
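Each object is an ordinary GZIP stream of UTF-8 text with one JSON document per line, so standard-library tools are enough to read it. A minimal Python sketch, assuming the file has already been downloaded or fetched into memory; the function names are illustrative.

```python
import gzip
import io
import json
from typing import BinaryIO, Iterator, List, Union


def read_jsonl_gz(source: Union[str, BinaryIO]) -> Iterator[dict]:
    """Yield one JSON object per line of a GZIP-compressed, UTF-8 JSONL file.

    `source` may be a local path or a binary file object (gzip.open accepts both).
    """
    with gzip.open(source, mode="rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)


def read_jsonl_gz_bytes(data: bytes) -> List[dict]:
    """Decode an object body that has already been downloaded into memory."""
    return list(read_jsonl_gz(io.BytesIO(data)))
```

How you fetch the object (AWS CLI, an SDK, or a synced local folder) is up to you; the sketch only covers decoding.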

Data Types Exported

Data Type      Description                      Partitioned
events         User behavioral events           Yes (hourly)
sync_aliases   Identity synchronization data    Yes (hourly)
segment        Segment metadata snapshots       No
See the S3 integration documentation for detailed schema information.
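When a consumer lists objects under the prefix, it usually needs to know which data type (and, for partitioned types, which hour) each key belongs to. A minimal sketch that reads the key=value folder names back out of an object key; parse_partitions and the PARTITIONED mapping are illustrative, not part of any Permutive tooling.

```python
from typing import Dict

# Mirrors the table above: which data types use hourly partitions.
PARTITIONED = {"events": True, "sync_aliases": True, "segment": False}


def parse_partitions(key: str) -> Dict[str, str]:
    """Return the key=value folder segments of an S3 object key."""
    values: Dict[str, str] = {}
    for segment in key.split("/")[:-1]:  # the last segment is the file name
        name, sep, value = segment.partition("=")
        if sep:  # only folder names of the form key=value
            values[name] = value
    return values


key = ("permutive/type=events/year=2026/month=01/day=15/hour=14/"
       "2026-01-15T14:00:00.000000Z-abc123-worker1.jsonl.gz")
parts = parse_partitions(key)
print(parts["type"], PARTITIONED[parts["type"]])  # prints "events True"
```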

Common Considerations

Latency: S3 Streaming has approximately 5-minute latency from event collection to file availability in S3.
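Because files arrive with a delay, a batch job that processes one hourly partition at a time may want to wait for late files. One conservative heuristic, sketched below, waits until the hour has ended plus the ~5-minute latency plus an extra safety margin. The 10-minute margin and the function name are assumptions, not Permutive guarantees; tune them against the arrival times you observe.

```python
from datetime import datetime, timedelta, timezone


def hour_is_settled(
    hour_start: datetime,
    now: datetime,
    latency: timedelta = timedelta(minutes=5),  # approximate latency stated above
    margin: timedelta = timedelta(minutes=10),  # assumed extra safety buffer
) -> bool:
    """Heuristic: treat an hourly partition as complete once the hour has
    ended and the latency plus a safety margin has also elapsed."""
    return now >= hour_start + timedelta(hours=1) + latency + margin


hour = datetime(2026, 1, 15, 14, tzinfo=timezone.utc)
print(hour_is_settled(hour, datetime(2026, 1, 15, 15, 10, tzinfo=timezone.utc)))  # False
print(hour_is_settled(hour, datetime(2026, 1, 15, 15, 15, tzinfo=timezone.utc)))  # True
```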
Bucket Prefix: Use a bucket prefix (e.g., permutive/) to organize Permutive data separately from other data in your bucket. The prefix should NOT include a leading / or the bucket name.
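Before emailing the prefix to Permutive, you could check it against the two rules above. This hypothetical helper enforces only those rules (no leading /, no bucket name); it does not know about any other constraints Permutive may apply.

```python
def validate_prefix(prefix: str, bucket: str) -> None:
    """Raise ValueError if a bucket prefix breaks the rules stated above."""
    if prefix.startswith("/"):
        raise ValueError("prefix must not start with '/'")
    first_segment = prefix.split("/", 1)[0]
    # Reject both "s3://bucket/..." and "bucket/..." forms.
    if prefix.startswith("s3://") or first_segment == bucket:
        raise ValueError("prefix must not include the bucket name")


validate_prefix("permutive/", "my-permutive-bucket")  # passes silently
```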
KMS Encryption: If using customer-managed KMS keys for S3 encryption, ensure the Permutive IAM user has appropriate KMS permissions (kms:Encrypt, kms:Decrypt, kms:GenerateDataKey). Contact Technical Services for KMS requirements.
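If your bucket uses a customer-managed key, the KMS actions above can be added to the IAM user's policy as an extra statement. A minimal sketch; the key ARN is a placeholder and the Sid is hypothetical. The KMS key policy must also allow the IAM user to use the key, so confirm the full setup with Technical Services.

```python
import json


def kms_statement(key_arn: str) -> dict:
    """IAM policy statement granting the KMS actions listed above on one key."""
    return {
        "Sid": "PermutiveKmsAccess",  # hypothetical statement ID
        "Effect": "Allow",
        "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"],
        "Resource": key_arn,
    }


# Placeholder ARN in the standard KMS key ARN format.
example_arn = "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
print(json.dumps(kms_statement(example_arn), indent=2))
```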

What Happens After Setup

Once routing is active:
  1. Files stream to S3 in near real-time with approximately 5-minute latency
  2. Hourly Hive-style partitions are created automatically for events and sync_aliases (segment snapshots are not partitioned)
  3. Event data is written as GZIP-compressed JSONL files
  4. File naming follows the pattern {timestamp}-{hash}-{worker_id}.jsonl.gz
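Because the {timestamp} part itself contains hyphens, splitting a file name on - does not recover the three fields. A sketch using a regular expression, assuming the timestamp always has the microsecond-precision UTC form seen in the example file name (2026-01-15T14:00:00.000000Z-abc123-worker1.jsonl.gz) and that the hash contains no hyphens; both assumptions are inferred from that single example.

```python
import re
from datetime import datetime, timezone

# Assumed layout: <UTC timestamp with microseconds>-<hash without '-'>-<worker_id>.jsonl.gz
FILE_NAME_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}Z)"
    r"-(?P<hash>[^-]+)-(?P<worker_id>.+)\.jsonl\.gz$"
)


def parse_file_name(name: str) -> dict:
    """Split a streamed file name into timestamp, hash, and worker_id."""
    match = FILE_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    ts = datetime.strptime(match["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return {
        "timestamp": ts.replace(tzinfo=timezone.utc),  # trailing Z means UTC
        "hash": match["hash"],
        "worker_id": match["worker_id"],
    }


print(parse_file_name("2026-01-15T14:00:00.000000Z-abc123-worker1.jsonl.gz"))
```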

Next Steps