
AWS S3

  • Direction: Bidirectional
  • Environment: Web, iOS, Android, CTV, API Direct
  • Capability: Connectivity, Routing
  • SDK Required: No
  • Product(s) Required: Core Platform, Routing

AWS S3 allows publishers to securely store and manage large volumes of advertising and audience data in the cloud.

Overview

The AWS S3 integration enables publishers to leverage Permutive’s bi-directional data capabilities with their S3 storage. The integration operates in two modes:

Routing (Destination): Export first-party event data from Permutive to S3 buckets. Permutive offers two distinct routing modes:
  • S3 Streaming: near real-time streaming, ideal for low-latency data pipelines and analytics
  • S3 Batch: daily scheduled exports, suitable for data warehouse ingestion and batch processing workflows
Both routing modes support exporting event data, identity data (aliases), and segment metadata to customer-owned S3 buckets with Hive-style partitioning and compression. Read more in the Routing documentation. The Routing capability requires the Routing package in addition to Core Platform; contact your Customer Success Manager to enable Routing.

Connectivity (Source): Import audience data from your S3 storage into Permutive for cohort building and activation across your publisher inventory.

Environment Compatibility

  • Web: Yes
  • iOS: Yes
  • Android: Yes
  • CTV: Yes
  • API Direct: Yes

Prerequisites

For Routing (exporting data to S3):
  • AWS account with permissions to create S3 buckets and IAM users/policies
  • S3 bucket created in the appropriate AWS region
  • IAM user with programmatic access credentials (Access Key ID and Secret Access Key)
  • Ability to configure S3 bucket policies with specific permissions
  • Secure method to share AWS credentials with Permutive (1Password or GPG encryption recommended)
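
If you prefer to script the AWS side of these prerequisites, the following is a minimal sketch using boto3. The bucket name and region are placeholders, not values from this integration; adjust them to your environment.

import boto3

# Placeholder values; replace with your own bucket name and region.
BUCKET = "my-permutive-exports"
REGION = "eu-west-1"

s3 = boto3.client("s3", region_name=REGION)

# Create the bucket in the chosen region (us-east-1 omits the location constraint).
s3.create_bucket(
    Bucket=BUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},
)

# Block all public access on the export bucket.
s3.put_public_access_block(
    Bucket=BUCKET,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)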

Setup

S3 Streaming routing exports your first-party event data to an S3 bucket in near real-time as GZIP-compressed JSONL files with Hive-style partitioning. Data arrives with approximately 5-minute latency, making it ideal for low-latency data pipelines and ingestion into AWS services such as Athena, Redshift, or EMR.

Setup requires coordination with Permutive support. You will need to prepare your AWS environment (an S3 bucket and an IAM user with programmatic access) and then share your bucket details and credentials with the Permutive team.

Prerequisites

  • AWS account with permissions to create S3 buckets and IAM users/policies
  • An S3 bucket in a region-specific location (e.g., us-east-1, eu-west-1) with public access blocked
  • A dedicated IAM user with s3:List*, s3:Get*, s3:Delete*, and s3:Put* permissions on the bucket
  • A secure method to share AWS credentials with Permutive (1Password or GPG encryption recommended)
For complete setup steps, see Setting up S3 Streaming Routing.
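
As an illustration of the IAM permissions listed above, the sketch below attaches an inline policy to a dedicated IAM user with boto3. The user name, policy name, and bucket name are placeholders; the exact policy to apply is covered in the setup guide.

import json
import boto3

BUCKET = "my-permutive-exports"    # placeholder
IAM_USER = "permutive-s3-routing"  # placeholder

# Inline policy granting the List/Get/Put/Delete permissions listed above,
# scoped to the export bucket and its objects.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:List*", "s3:Get*", "s3:Put*", "s3:Delete*"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
        }
    ],
}

iam = boto3.client("iam")
iam.put_user_policy(
    UserName=IAM_USER,
    PolicyName="permutive-s3-routing-access",
    PolicyDocument=json.dumps(policy),
)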

What Happens After Setup

Once routing is active:
  1. Files stream to S3 in near real-time with approximately 5-minute latency
  2. Hive-style partitions are created automatically by hour
  3. Event data is written as GZIP-compressed JSONL files
  4. File naming follows the pattern {timestamp}-{hash}-{worker_id}.jsonl.gz
See the Streaming Schema section below for detailed schema information.
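
For example, a downstream consumer can list one hourly partition and read the GZIP-compressed JSONL files with boto3. The bucket name and prefix below are placeholders; the partition layout follows the Hive-style pattern described above.

import gzip
import json
import boto3

BUCKET = "my-permutive-exports"  # placeholder
# One hourly partition, following the Hive-style layout created by the integration.
PREFIX = "permutive/type=events/year=2026/month=01/day=15/hour=14/"

s3 = boto3.client("s3")
events = []

# List every .jsonl.gz file in the partition and parse its newline-delimited JSON.
# (For large partitions, use a paginator instead of a single list call.)
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
for obj in resp.get("Contents", []):
    body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
    for line in gzip.decompress(body).decode("utf-8").splitlines():
        if line:
            events.append(json.loads(line))

print(f"Loaded {len(events)} events from {PREFIX}")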

Data Types

Streaming Schema

Events in S3 Streaming are exported in newline-delimited JSON format with the following structure:
  • time (string, required): Unix timestamp in milliseconds, as a string
  • event_id (string, required): Unique identifier for this event
  • user_id (string, required): Permutive user ID
  • event_name (string, required): Name of the event (e.g., Pageview, slotclicked)
  • organization_id (string, required): Organization identifier
  • project_id (string, required): Workspace/project identifier
  • session_id (string, optional): Session identifier
  • view_id (string, optional): Page view identifier
  • source_url (string, optional): Source URL
  • segments (array[integer]): Array of segment IDs the user belongs to
  • properties (object): Custom event properties as key-value pairs

Example Event

{
  "time": "1665851625945",
  "event_id": "c0b8266d-3c4d-43d6-8855-6f42d657adda",
  "user_id": "87bcd76b-5eb6-4c46-afa8-017d1e7148ca",
  "event_name": "slotclicked",
  "organization_id": "be668577-07f5-444d-98e0-222b990951b1",
  "project_id": "72f6d4b5-1e85-4c79-b4f9-da2dd1f3be6d",
  "session_id": "4a96de87-f8b1-4240-a1a8-7b9c6cff569a",
  "view_id": "16f2af62-f38d-44d1-bcea-ba5b4da39be2",
  "source_url": null,
  "segments": [],
  "properties": {
    "campaign_id": 2387641642,
    "line_item_id": 4792767025
  }
}
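
A minimal sketch for working with this schema: the time field is a Unix timestamp in milliseconds stored as a string, so consumers typically convert it to a timezone-aware datetime before use.

from datetime import datetime, timezone

def event_time(event: dict) -> datetime:
    # "time" is a Unix timestamp in milliseconds, serialised as a string.
    return datetime.fromtimestamp(int(event["time"]) / 1000, tz=timezone.utc)

# Using the example event above:
# event_time({"time": "1665851625945"}) -> 2022-10-15 16:33:45.945000+00:00
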
Identity synchronization events contain cross-device and identity mapping data.
  • time (string, required): Unix timestamp in milliseconds, as a string
  • user_id (string, required): Permutive user ID
  • organization_id (string, required): Organization identifier
  • project_id (string, required): Workspace/project identifier
  • aliases (array, required): Array of alias objects, each containing:
    • id: The alias identifier value
    • tag: The alias type (e.g., email_sha256, device_id)

Example Sync Alias

{
  "time": "1665663771749",
  "user_id": "b5653712-26ee-41a8-8b30-c128092df93b",
  "organization_id": "be668577-07f5-444d-98e0-222b990951b1",
  "project_id": "be668577-07f5-444d-98e0-222b990951b1",
  "aliases": [
    {"id": "a1b2c3d4e5f6...", "tag": "email_sha256"},
    {"id": "device_12345", "tag": "device_id"}
  ]
}
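
As an illustration, sync alias records can be folded into an identity lookup keyed by Permutive user ID. This is a minimal sketch, assuming the records have already been parsed from the JSONL files into dictionaries.

from collections import defaultdict

def build_identity_map(sync_aliases):
    # Map each Permutive user_id to its known external identifiers, grouped by alias tag,
    # e.g. {"b5653712-...": {"email_sha256": ["a1b2..."], "device_id": ["device_12345"]}}
    identity_map = defaultdict(lambda: defaultdict(list))
    for record in sync_aliases:
        for alias in record["aliases"]:
            identity_map[record["user_id"]][alias["tag"]].append(alias["id"])
    return identity_map
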
Segment metadata snapshots containing segment definitions. These files are NOT date-partitioned.
  • id (string, required): Segment UUID
  • code (integer, required): Segment number/ID used in the segments array of events
  • name (string, required): Human-readable segment name
  • tags (array[string]): Array of tags associated with the segment
  • metadata (object): Additional segment metadata
  • workspace (string, required): Workspace identifier
  • ancestors (array[string]): Array of ancestor workspace/organization IDs
  • workspaceState (string): State of the workspace (e.g., “Active”, “Deleted”)
  • deleted (boolean): Whether the segment has been deleted

Example Segment

{
  "id": "5289b895-4ee7-44f8-81a6-1899142ed2d2",
  "code": 1057,
  "name": "High Value Users",
  "tags": [],
  "metadata": {},
  "workspace": "45582cb9-bb5c-4eb4-9c0d-7a2cebf4eeb1",
  "ancestors": ["45582cb9-bb5c-4eb4-9c0d-7a2cebf4eeb1", "be668577-07f5-444d-98e0-222b990951b1"],
  "workspaceState": "Active",
  "deleted": false
}
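
Because events reference segments by their numeric code, a common pattern is to join the segment snapshot against the segments array of each event. A minimal sketch, assuming events and segments are lists of parsed dictionaries:

def resolve_segments(events, segments):
    # Build a lookup from the numeric segment code to its human-readable name.
    names_by_code = {s["code"]: s["name"] for s in segments if not s.get("deleted")}
    # Annotate each event with the names of the segments it references.
    for event in events:
        event["segment_names"] = [
            names_by_code.get(code, f"unknown({code})")
            for code in event.get("segments", [])
        ]
    return events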

Batch Schema

Batch exports create separate tables for each event type (e.g., pageview_events, videoview_events). All event tables share a common structure:
  • time (timestamp): Timestamp for when the event was received by Permutive (in UTC)
  • event_id (string): Unique identifier for each individual event
  • user_id (string): Identifier unique to a particular user
  • session_id (string): Identifier unique to a user’s session. Sessions last 30 minutes unless a user stays on site
  • view_id (string): Identifier unique to a particular page or screen view
  • workspace_id (string): Identifier for the workspace which the event originated from
  • segments (array[integer]): A list of all segment IDs the user was in when the event fired
  • cohorts (array[string]): A list of all cohort codes the user was in when the event fired
  • properties (object): Event-specific properties as a nested object. Structure varies by event type.

Example Pageview Event

{
  "time": "2026-01-15T14:30:00Z",
  "event_id": "c0b8266d-3c4d-43d6-8855-6f42d657adda",
  "user_id": "87bcd76b-5eb6-4c46-afa8-017d1e7148ca",
  "session_id": "4a96de87-f8b1-4240-a1a8-7b9c6cff569a",
  "view_id": "16f2af62-f38d-44d1-bcea-ba5b4da39be2",
  "workspace_id": "72f6d4b5-1e85-4c79-b4f9-da2dd1f3be6d",
  "segments": [123, 456],
  "cohorts": ["abc123", "def456"],
  "properties": {
    "client": {
      "domain": "example.com",
      "type": "web",
      "url": "https://example.com/article",
      "referrer": "https://google.com",
      "title": "Example Article",
      "user_agent": "Mozilla/5.0..."
    }
  }
}
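
Because batch event properties are nested objects that vary by event type, analysts often flatten them before analysis. A minimal sketch using pandas (an assumption; any JSON-flattening approach works):

import pandas as pd

def events_to_dataframe(events):
    # Flatten nested properties (e.g. properties.client.domain) into dotted column names;
    # list-valued fields such as segments and cohorts are kept as-is.
    return pd.json_normalize(events)

# df = events_to_dataframe(parsed_events)
# df["properties.client.domain"].value_counts()
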
Identity data and alias mappings for cross-device tracking.
  • time (timestamp, required): Timestamp when the alias was captured
  • event_type (string): Type of alias event
  • permutive_id (string, required): Permutive user identifier
  • id (string, required): External identity value (e.g., hashed email, device ID)
  • tag (string, required): Identity tag or namespace (e.g., email_sha256, device_id)
  • workspace_id (string): Workspace identifier

Example Alias

{
  "time": "2026-01-15T14:30:00Z",
  "event_type": "alias_sync",
  "permutive_id": "87bcd76b-5eb6-4c46-afa8-017d1e7148ca",
  "id": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2",
  "tag": "email_sha256",
  "workspace_id": "72f6d4b5-1e85-4c79-b4f9-da2dd1f3be6d"
}
Domain-level metadata. This is a snapshot table that is fully replaced with each export.
  • name (string, required): Domain name
  • workspace_id (string): Workspace identifier

Example Domain

{
  "name": "example.com",
  "workspace_id": "72f6d4b5-1e85-4c79-b4f9-da2dd1f3be6d"
}
Segment definitions and metadata. This is a snapshot table that is fully replaced with each export.
  • number (integer, required): Segment ID number
  • name (string, required): Segment name
  • tags (array[string]): Array of tags associated with the segment
  • metadata (string): JSON string containing additional segment metadata
  • workspace_id (string): Workspace identifier

Example Segment Metadata

{
  "number": 123,
  "name": "High Value Users",
  "tags": ["advertising", "premium"],
  "metadata": "{\"description\": \"Users with high engagement\"}",
  "workspace_id": "72f6d4b5-1e85-4c79-b4f9-da2dd1f3be6d"
}
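
Note that, unlike the streaming segment snapshot, the batch metadata field is a JSON string rather than an object, so it needs an extra decoding step. A minimal sketch:

import json

def parse_segment_metadata(segment: dict) -> dict:
    # "metadata" is stored as a JSON string and may be empty.
    raw = segment.get("metadata") or "{}"
    return json.loads(raw)

# parse_segment_metadata({"metadata": "{\"description\": \"Users with high engagement\"}"})
# -> {"description": "Users with high engagement"}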

File Formats and Compression

S3 Streaming exports use the following file format:
  • Format: Newline-delimited JSON (.jsonl)
  • Compression: GZIP (.gz)
  • File Extension: .jsonl.gz
  • Character Encoding: UTF-8

For S3 Batch exports, two file formats are available.

JSON Format

  • Format: Newline-delimited JSON
  • Compression: GZIP
  • File Extension: .json.gz
  • Character Encoding: UTF-8

Parquet Format

  • Format: Apache Parquet columnar format
  • Compression: Snappy
  • File Extension: .snappy.parquet
  • Schema: Derived from BigQuery table structure
Parquet format is recommended for data warehouse ingestion and analytics workloads due to better compression and query performance.
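
For example, a Snappy-compressed Parquet file can be read with pandas and pyarrow (assumed to be installed). The bucket name and object key below are placeholders following the batch path layout.

import io
import boto3
import pandas as pd

BUCKET = "my-permutive-exports"  # placeholder
KEY = "data/pageview_events/year=2026/month=1/day=15/part-000.snappy.parquet"  # placeholder

s3 = boto3.client("s3")
body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()

# pandas uses pyarrow to decode the Snappy-compressed Parquet file.
df = pd.read_parquet(io.BytesIO(body))
print(df.dtypes)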

Troubleshooting

Symptom: Files are not appearing in the S3 bucket, or logs show permission errors.
Solution:
  1. Verify the IAM user has all required permissions:
    • s3:PutObject
    • s3:GetObject
    • s3:DeleteObject
    • s3:ListBucket
  2. Check that the bucket policy includes the correct bucket ARN:
    "Resource": [
      "arn:aws:s3:::YOUR_BUCKET_NAME/*",
      "arn:aws:s3:::YOUR_BUCKET_NAME"
    ]
    
  3. Verify the bucket-owner-full-control ACL condition is correctly configured
  4. Ensure the IAM user credentials (Access Key ID and Secret Access Key) are current and not expired
If you recently rotated AWS credentials, contact Permutive support at [email protected] to update the stored credentials.
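
To rule out permission problems on your side, you can run a quick smoke test with the same IAM credentials that were shared with Permutive. This is a sketch, not part of the official setup; the bucket name and keys are placeholders, and the bucket-owner-full-control ACL mirrors the condition mentioned above.

import boto3

BUCKET = "my-permutive-exports"  # placeholder

# Use the same access key pair that was shared with Permutive.
s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIA...",   # placeholder
    aws_secret_access_key="...",   # placeholder
)

# Write, list, and delete a test object, mirroring the permissions the integration needs.
s3.put_object(
    Bucket=BUCKET,
    Key="permutive/_permission_check.txt",
    Body=b"ok",
    ACL="bucket-owner-full-control",
)
s3.list_objects_v2(Bucket=BUCKET, Prefix="permutive/")
s3.delete_object(Bucket=BUCKET, Key="permutive/_permission_check.txt")
print("put/list/delete succeeded")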
Symptom: No files appear in the S3 bucket after setup, or files have stopped appearing.
Solution:
  1. Verify the Permutive SDK is properly deployed and events are being collected (check Event Inspector in the Dashboard)
  2. Low-traffic sites may see longer delays between files due to batch size thresholds
  3. Verify the bucket region matches the configured region:
    • Region must be specific (e.g., eu-central-1, not just EU)
  4. Verify bucket path structure is correct:
    s3://{bucket}/{prefix}/type=events/year=YYYY/month=MM/day=DD/hour=HH/
    
  5. If issues persist, contact Permutive support at [email protected] with your integration details
Symptom: Daily batch exports are missing or delayed.
Solution:
  1. Batch exports run on 24-hour cycles. Check if sufficient time has passed since the last export window.
  2. Verify the Permutive SDK is properly deployed and events are being collected
  3. Contact Permutive support at [email protected] to check batch export job logs and status
Symptom: Files appear in unexpected locations or with the wrong folder structure.
Solution:
  1. Verify the bucketPrefix configuration:
    • Should NOT include leading / unless intentional
    • Should NOT include bucket name
    • Example: permutive/ not /permutive/ or s3://bucket/permutive/
  2. For Streaming, data uses Hive-style partitioning:
    • type=events/year=2026/month=01/day=15/hour=14/
    • This is expected behavior and cannot be customized
  3. For Batch, data is organized by table name:
    • data/{table_name}/year=2026/month=1/day=15/
    • This is expected behavior and cannot be customized
Symptom: AWS returns validation errors when applying the bucket policy.
Solution:
  1. Ensure the bucket policy JSON is valid:
    • Check for missing commas, brackets, or quotes
    • Use AWS Policy Generator or an online JSON validator
  2. Verify ARN format is correct:
    • Bucket ARN: arn:aws:s3:::BUCKET_NAME
    • Object ARN: arn:aws:s3:::BUCKET_NAME/*
    • Note the three colons ::: before bucket name
  3. Confirm the StringEquals condition is correctly formatted:
    "Condition": {
      "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
    }
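    
To see how the ARNs and the ACL condition from this checklist fit together, the sketch below assembles a skeleton bucket policy and applies it with boto3. The bucket name is a placeholder and the principal ARN is hypothetical; the actual policy document and the principal to allow come from the setup guide and Permutive support.

import json
import boto3

BUCKET = "my-permutive-exports"  # placeholder
# Hypothetical principal; use the IAM user/role ARN specified in the setup guide.
PERMUTIVE_PRINCIPAL = "arn:aws:iam::111122223333:user/permutive-s3-routing"

# Skeleton policy assembling the object/bucket ARNs and the ACL condition from the checklist above.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowObjectWritesWithOwnerFullControl",
            "Effect": "Allow",
            "Principal": {"AWS": PERMUTIVE_PRINCIPAL},
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {
                "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
            },
        },
        {
            "Sid": "AllowReadListDelete",
            "Effect": "Allow",
            "Principal": {"AWS": PERMUTIVE_PRINCIPAL},
            "Action": ["s3:GetObject", "s3:DeleteObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
        },
    ],
}

boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))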
    
Symptom: Some event types or fields are not appearing in the exported data.
Solution:
  1. Verify events are being collected in Permutive:
    • Check Event Inspector in the Dashboard to confirm events are tracked
    • Use browser developer console to verify SDK is firing events
  2. Check event schema matches expected structure:
    • Events must include required fields: event_id, user_id, event_name, etc.
    • Custom properties are in the properties object
  3. Schema changes may require integration reconfiguration:
    • Contact Permutive support if you’ve made significant schema changes
Symptom: Errors related to KMS encryption occur when writing to S3.
Solution:
  1. If using customer-managed KMS keys, verify the Permutive IAM user has KMS permissions:
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:GenerateDataKey"
      ],
      "Resource": "arn:aws:kms:REGION:ACCOUNT_ID:key/KEY_ID"
    }
    
  2. Confirm the KMS key policy allows the Permutive IAM user to use the key
  3. Verify the S3 bucket’s default encryption settings are compatible
AWS-managed S3 encryption (SSE-S3) is supported by default. Customer-managed KMS keys require additional configuration. Contact Permutive support for KMS requirements.
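
To check which encryption mode the bucket currently uses, you can inspect its default encryption configuration; AES256 indicates SSE-S3, while aws:kms indicates a customer-managed or AWS-managed KMS key. A minimal sketch with boto3; the bucket name is a placeholder.

import boto3

BUCKET = "my-permutive-exports"  # placeholder

s3 = boto3.client("s3")
config = s3.get_bucket_encryption(Bucket=BUCKET)

for rule in config["ServerSideEncryptionConfiguration"]["Rules"]:
    default = rule["ApplyServerSideEncryptionByDefault"]
    # "AES256" means SSE-S3 (supported by default); "aws:kms" means a KMS key is in use.
    print(default["SSEAlgorithm"], default.get("KMSMasterKeyID", ""))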

Changelog

No changes listed yet. For detailed changelog information, visit our Changelog.