# Setting Up AWS S3 Batch Routing

> Configure scheduled daily exports to Amazon S3 in JSON or Parquet format

## Overview

Set up S3 Batch routing to export your first-party event data to your AWS S3 bucket on a scheduled 24-hour cycle. Choose between JSON (GZIP compressed) or Parquet (Snappy compressed) format based on your data processing needs.

<Info>
  **Prerequisites:**

  * AWS account with S3 bucket creation permissions
  * IAM permissions to create users and policies
  * S3 bucket in the appropriate AWS region
  * Secure method to share AWS credentials with Permutive
  * Understanding of your data processing requirements (JSON vs Parquet)
</Info>

## When to Choose S3 Batch

**Best for:**

* Organizations needing scheduled daily exports instead of real-time streaming
* Teams ingesting data into data warehouses on a batch schedule
* Publishers requiring Parquet format for efficient columnar storage
* Organizations preferring predictable export schedules over continuous streaming

**Consider alternatives if:**

* You need near real-time data access (use S3 Streaming, BigQuery, or Snowflake)
* You require sub-hourly data updates
* You prefer automatic schema management in a database

## Setup Steps

S3 Batch routing requires coordination with Permutive support.

<Steps>
  <Step title="Choose Your Data Format">
    Decide between JSON and Parquet formats:

    | Format               | Best For                                                          |
    | :------------------- | :---------------------------------------------------------------- |
    | **JSON** (GZIP)      | Human-readable, easier debugging, broader tool compatibility      |
    | **Parquet** (Snappy) | Analytics workloads, better compression, data warehouse ingestion |

    <Note>
      Most organizations choose Parquet for production data pipelines due to better compression and query performance.
    </Note>
  </Step>

  <Step title="Prepare Your AWS Environment">
    Ensure you have an S3 bucket ready with public access blocked.
  </Step>

  <Step title="Contact Permutive Support">
    Email [technical-services@permutive.com](mailto:technical-services@permutive.com) with:

    * **Bucket Name**
    * **Bucket Prefix** (optional, e.g., `permutive/`)
    * **Format:** JSON or Parquet

    Permutive will provide you with a bucket policy to attach to your S3 bucket.
  </Step>

  <Step title="Attach Bucket Policy">
    Attach the Permutive-provided bucket policy to your S3 bucket's permissions, then notify Permutive support to confirm.
  </Step>

  <Step title="Setup Completion">
    Permutive completes the integration setup and notifies you once it is active. Data begins flowing on the next batch export cycle.
  </Step>
</Steps>
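
The bucket preparation step above could be scripted along these lines. This is a minimal sketch, assuming the third-party `boto3` SDK is installed and AWS credentials are already configured. The helper names `public_access_block_config` and `block_public_access` are hypothetical and are not part of any Permutive tooling.

```python
def public_access_block_config() -> dict:
    """Return settings that turn on all four S3 Block Public Access options."""
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }


def block_public_access(bucket_name: str) -> None:
    """Apply the settings above to an existing bucket (makes a real AWS call)."""
    # Assumption: boto3 is installed. It is imported lazily so the helper
    # above can be used and inspected without the SDK.
    import boto3

    s3 = boto3.client("s3")
    s3.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration=public_access_block_config(),
    )
```

Calling `block_public_access("your-bucket-name")` applies the settings to a bucket that you own. The bucket policy itself comes from Permutive and is not reproduced here.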

## Understanding S3 Batch Data Structure

S3 Batch exports are organized by data type. Event and alias data additionally use Hive-style date partitioning (`year=/month=/day=`).

### Folder Structure

```
s3://bucket/prefix/data/
├── pageview_events/
│   ├── year=2026/month=1/day=15/
│   │   └── data-000000000000.json.gz
│   └── year=2026/month=1/day=16/
├── videoview_events/
│   └── year=2026/month=1/day=15/
├── aliases/
│   └── year=2026/month=1/day=15/
├── domains/
│   └── data-000000000000.json.gz
└── segment_metadata/
    └── data-000000000000.json.gz
```
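
A pipeline that reads these folders usually needs to build or parse the partition paths. The sketch below shows one way to do both. It assumes that partition values are not zero-padded (as in the example tree above) and uses the placeholder root `prefix/data`. The function names are hypothetical.

```python
import re
from datetime import date


def partition_prefix(table: str, day: date, root: str = "prefix/data") -> str:
    """Build the Hive-style prefix for one table and one day."""
    # Assumption: month and day carry no leading zero, matching the tree above.
    return f"{root}/{table}/year={day.year}/month={day.month}/day={day.day}/"


_PARTITION_RE = re.compile(
    r"(?P<table>[^/]+)/year=(?P<y>\d+)/month=(?P<m>\d+)/day=(?P<d>\d+)/"
)


def parse_partition(key: str):
    """Return (table, date) for a partitioned object key, or None otherwise."""
    match = _PARTITION_RE.search(key)
    if match is None:
        return None  # e.g. snapshot tables, which have no date folders
    return match["table"], date(int(match["y"]), int(match["m"]), int(match["d"]))
```

For example, `partition_prefix("pageview_events", date(2026, 1, 15))` yields the prefix of the first folder in the tree, while `parse_partition` returns `None` for keys under `domains/` or `segment_metadata/`.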

### Data Types and Sync Modes

| Data Type                        | Description                      | Sync Mode               |
| -------------------------------- | -------------------------------- | ----------------------- |
| Events (e.g., `pageview_events`) | User behavioral events           | Incremental (append)    |
| `aliases`                        | Identity data and alias mappings | Incremental (append)    |
| `domains`                        | Domain-level metadata            | Snapshot (full replace) |
| `segment_metadata`               | Segment definitions and metadata | Snapshot (full replace) |

**Incremental tables** append new data each export cycle. **Snapshot tables** are fully replaced with each export, so they always reflect the latest reference data.

<Note>
  Snapshot tables (segment\_metadata, domains) are exported without date partitioning since they represent current state rather than time-series data.
</Note>
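
The two sync modes call for different load logic downstream. The following sketch illustrates the difference using a plain dictionary as a stand-in for warehouse tables. The names `SNAPSHOT_TABLES`, `sync_mode`, and `apply_export` are hypothetical, and a real loader would issue warehouse-specific statements instead.

```python
# Tables documented above as snapshot (full replace). Everything else is
# treated as incremental (append).
SNAPSHOT_TABLES = {"domains", "segment_metadata"}


def sync_mode(table: str) -> str:
    """Return "replace" for snapshot tables and "append" for incremental ones."""
    return "replace" if table in SNAPSHOT_TABLES else "append"


def apply_export(warehouse: dict, table: str, rows: list) -> None:
    """Apply one export cycle to an in-memory stand-in for a warehouse."""
    if sync_mode(table) == "replace":
        warehouse[table] = list(rows)  # discard the previous snapshot
    else:
        warehouse.setdefault(table, []).extend(rows)  # keep history, add new rows
```

Loading two cycles of `pageview_events` accumulates both sets of rows, whereas loading two cycles of `domains` leaves only the most recent set.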

See the [S3 integration documentation](/integrations/data-collaboration/data-warehouses/aws-s3#batch-schema) for detailed schema information.

## Export Schedule and Timing

**Batch Export Characteristics:**

* **Frequency:** 24-hour cycles
* **Scope:** Organization-level (includes all workspaces)
* **Timing:** Contact your Customer Success Manager for your specific schedule
* **Partitioning:** Daily partitions by event type
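
Because exports arrive as one partition per day, a consumer can check for gaps by comparing the days it expects against the days it has seen. This is a minimal sketch with hypothetical function names. How the list of seen days is obtained (for example, by listing bucket prefixes) is left out.

```python
from datetime import date, timedelta


def expected_days(start: date, end: date) -> list:
    """List every calendar day from start to end, inclusive."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


def missing_days(start: date, end: date, seen) -> list:
    """Return the days in [start, end] that have no partition in `seen`."""
    seen_set = set(seen)
    return [day for day in expected_days(start, end) if day not in seen_set]
```

A day reported as missing may simply not have been exported yet, so any alerting built on this should allow for the configured export time.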

## Common Considerations

<Note>
  **Export Timing:** Batch exports run on 24-hour cycles. The exact timing is configured during setup. Contact your Customer Success Manager for your specific schedule.
</Note>

<Tip>
  **Incremental vs Snapshot:** Understand the difference between incremental tables (events, aliases) that append data and snapshot tables (segment\_metadata, domains) that replace data. Design your data pipelines accordingly.
</Tip>

<Warning>
  **Organization-Level Scope:** S3 Batch routing operates at the organization level and exports data for all workspaces within your organization. Streaming routing, by contrast, is workspace-specific.
</Warning>

## What Happens After Setup

Once batch routing is active:

1. **Daily exports run automatically** on the configured 24-hour schedule
2. **Event data is partitioned** by event type and date
3. **Snapshot tables are replaced** each export cycle
4. **Files are written** in your chosen format (JSON or Parquet)
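
Once files land, a consumer of the JSON format might read them as sketched below. This assumes each `.json.gz` file holds newline-delimited JSON (one record per line), which this page does not confirm, so check the batch schema documentation before relying on it. The function name `iter_records` is hypothetical.

```python
import gzip
import io
import json
from typing import Iterator


def iter_records(raw: bytes) -> Iterator[dict]:
    """Yield one dict per non-empty line of a GZIP-compressed JSON payload."""
    # Assumption: the payload is newline-delimited JSON.
    with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                yield json.loads(line)
```

Here `raw` is the downloaded object body as bytes. Parquet exports need a columnar reader instead and are not covered by this sketch.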

## Next Steps

<CardGroup cols={2}>
  <Card title="S3 Integration" icon="aws" href="/integrations/data-collaboration/data-warehouses/aws-s3">
    View full integration documentation
  </Card>

  <Card title="Back to Routing" icon="arrow-left" href="/products/connectivity/routing">
    Return to Routing overview
  </Card>
</CardGroup>
