# Connecting to Amazon S3

> How to set up a connection to Amazon S3 to import data into Permutive

## Overview

This guide walks you through connecting your Amazon S3 bucket to Permutive so you can import data for audience building and activation. You'll learn how to structure your S3 bucket, configure the required permissions, and create a connection in the Permutive dashboard.

<Info>
  **Prerequisites:**

  * An AWS account with access to the S3 bucket you want to connect
  * Permission to modify the S3 bucket policy
  * Your data organized in the required directory structure (see below)
</Info>

## Key Concepts

Before setting up your connection, familiarize yourself with these terms:

| Term           | Description                                                                                         |
| :------------- | :-------------------------------------------------------------------------------------------------- |
| Bucket Root    | The bucket name on its own, without the `s3://` scheme, trailing slashes, or any path prefix        |
| S3 Prefix      | A path within your S3 bucket where tables are stored                                                |
| Schema         | A group of tables, represented by an S3 prefix location                                             |
| Table          | A single table within Permutive, represented by a prefix under the schema prefix                    |
| Data file      | The files containing your data (CSV or Parquet format)                                              |
| Hive Partition | An S3 prefix in Hive partition format (e.g., `date=2025-01-01` or `region=EU`)                      |

## Step 1: Set Up Your Bucket Structure

Permutive uses the concept of a **Schema** (containing multiple **Tables**) to organize your data. Since S3 doesn't have native schema or table concepts, you'll need to structure your bucket in a specific way.

### Schema Directory Structure

Organize your bucket so that each table is a directory under your schema prefix:

```
s3://<bucket_name>/<prefix>/<table_1>/
s3://<bucket_name>/<prefix>/<table_2>/
s3://<bucket_name>/<prefix>/<table_n>/
```

When you provide the prefix to Permutive, every directory under that prefix is treated as a table.

<Note>
  You can have multiple prefixes representing different schemas, each with multiple tables. Each schema prefix requires a separate Connection in Permutive.
</Note>
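
To make the "every directory under the prefix is a table" rule concrete, here is a minimal Python sketch that derives table names from a list of object keys. The function name `discover_tables` and the sample keys are illustrative, and the sketch assumes that a file sitting directly under the schema prefix is not treated as a table; it is not Permutive's actual discovery logic.

```python
def discover_tables(keys, schema_prefix):
    """Return the sorted table names implied by a list of S3 object keys."""
    # Normalize the prefix so "data/audiences" and "data/audiences/" behave the same.
    prefix = schema_prefix.strip("/")
    prefix = prefix + "/" if prefix else ""
    tables = set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # object lives outside this schema prefix
        remainder = key[len(prefix):]
        # A table is a directory, so a "/" must follow its name.
        if "/" in remainder:
            table = remainder.split("/", 1)[0]
            if table:
                tables.add(table)
    return sorted(tables)


keys = [
    "data/audiences/users/date=2025-01-01/part-0.csv",
    "data/audiences/users/date=2025-01-02/part-0.csv",
    "data/audiences/purchases/part-0.csv",
    "data/audiences/README.txt",     # file directly under the prefix (assumed: not a table)
    "data/other/events/part-0.csv",  # belongs to a different schema prefix
]
print(discover_tables(keys, "data/audiences/"))  # ['purchases', 'users']
```

The `data/other/` keys would need their own connection, since each schema prefix maps to a separate Connection.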

### Table Directory Structure

Within each table directory, you can organize your data files in one of two ways:

<Tabs>
  <Tab title="Partitioned (Recommended)">
    We recommend partitioning your data, especially for event or user activity tables. Partitioning reduces query costs because Permutive can filter on partition values up front and skip files that a query doesn't need.

    **Single partition:**

    ```
    s3://<bucket_name>/<prefix>/<table_n>/<partition_name>=<value>/<data_file>.csv
    ```

    **Multiple partitions:**

    ```
    s3://<bucket_name>/<prefix>/<table_n>/<partition_1>=<value>/<partition_2>=<value>/<data_file>.csv
    ```

    Each partition name becomes a column in your dataset, and every row from a file under that partition takes the partition's value in that column.
  </Tab>

  <Tab title="Non-Partitioned">
    For non-partitioned tables, place your data files directly under the table prefix:

    ```
    s3://<bucket_name>/<prefix>/<table_n>/<data_file>.csv
    ```

    Permutive scans for all files under the table prefix, regardless of subdirectory depth. For example, all of these files would be included:

    ```
    s3://<bucket_name>/<prefix>/<table_n>/<data_file_1>.csv
    s3://<bucket_name>/<prefix>/<table_n>/<inner_prefix>/<data_file_2>.csv
    s3://<bucket_name>/<prefix>/<table_n>/<inner_1>/<inner_2>/<data_file_3>.csv
    ```

    <Warning>
      In non-partitioned mode, Hive partition prefixes are treated as regular directories. Files under them are still read, but the partition information isn't extracted as columns.
    </Warning>
  </Tab>
</Tabs>
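
As an illustration of how Hive-style path segments map to columns, the following Python sketch extracts `name=value` pairs from a single object key. The function name `partition_columns` and the sample key are hypothetical. Values are returned as plain strings, and the sketch makes no claim about how Permutive types partition columns.

```python
def partition_columns(key, table_prefix):
    """Extract Hive-style partition columns from the path segments of one object key."""
    prefix = table_prefix.rstrip("/") + "/"
    if not key.startswith(prefix):
        raise ValueError(f"{key!r} is not under {prefix!r}")
    # Drop the final segment, which is the data file itself.
    directories = key[len(prefix):].split("/")[:-1]
    columns = {}
    for segment in directories:
        name, separator, value = segment.partition("=")
        if separator and name:  # only "name=value" segments count as partitions
            columns[name] = value
    return columns


key = "data/audiences/events/date=2025-01-01/region=EU/part-0.csv"
print(partition_columns(key, "data/audiences/events"))
# {'date': '2025-01-01', 'region': 'EU'}
```

A key with plain subdirectories, such as `data/audiences/events/inner/part-0.csv`, yields an empty dictionary, which mirrors the non-partitioned case.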

### Data Format Recommendations

<AccordionGroup>
  <Accordion title="Parquet Format (Recommended)">
    We highly recommend using Parquet due to its columnar storage benefits, which significantly improve query performance and reduce storage size.

    For Parquet files, we specifically recommend using the **ZSTD compression codec** to maximize storage efficiency and speed up data processing.
  </Accordion>

  <Accordion title="CSV Format">
    We support:

    * `.csv` (uncompressed CSV)
    * `.gz` (gzipped CSV)

    For CSV files, especially large datasets, **gzipping is highly recommended** to reduce storage costs and enhance processing speed.
  </Accordion>
</AccordionGroup>

<Note>
  All tables under a schema should use the same data format (either all CSV or all Parquet).
</Note>
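
If you produce CSV exports yourself, gzipping them before upload can be done with the Python standard library alone. The sketch below is one possible approach; the function names, the `users.csv.gz` file name, and the sample columns are illustrative assumptions.

```python
import csv
import gzip
import os
import tempfile


def write_gzipped_csv(path, header, rows):
    """Write rows to a gzip-compressed CSV file."""
    # "wt" opens the gzip stream in text mode; newline="" lets csv control line endings.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_gzipped_csv(path):
    """Read the file back as a list of rows, as a quick sanity check before upload."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "users.csv.gz")
    write_gzipped_csv(path, ["user_id", "segment"], [["u1", "sports"], ["u2", "news"]])
    print(read_gzipped_csv(path))
    # [['user_id', 'segment'], ['u1', 'sports'], ['u2', 'news']]
```

Writing Parquet with ZSTD compression requires a third-party library, so it is left out of this standard-library sketch.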

## Step 2: Configure Bucket Permissions

Permutive needs permission to read data from your S3 bucket. You'll add an S3 Bucket Policy that grants Permutive read-only access.

<Steps>
  <Step title="Start Creating the Connection">
    In the Permutive dashboard, go to **Connectivity > Catalog** and select **Amazon S3**. Begin entering your connection details (covered in Step 3). Once you enter your bucket name, Permutive will generate a bucket policy for you.
  </Step>

  <Step title="Copy the Generated Policy">
    Copy the S3 Bucket Policy displayed in the Permutive dashboard. It will look similar to this:

    ```json theme={"dark"}
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<PermutiveAWSAccountId>:root"
          },
          "Action": "s3:ListBucket",
          "Resource": "arn:aws:s3:::<YourBucketName>",
          "Condition": {
            "StringEquals": {
              "aws:PrincipalArn": "arn:aws:iam::<PermutiveAWSAccountId>:role/<PermutiveCustomerSpecificRole>"
            }
          }
        },
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<PermutiveAWSAccountId>:root"
          },
          "Action": "s3:GetObject",
          "Resource": [
            "arn:aws:s3:::<YourBucketName>",
            "arn:aws:s3:::<YourBucketName>/*"
          ],
          "Condition": {
            "StringEquals": {
              "aws:PrincipalArn": "arn:aws:iam::<PermutiveAWSAccountId>:role/<PermutiveCustomerSpecificRole>"
            }
          }
        }
      ]
    }
    ```

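    This policy grants Permutive the following read-only permissions:

    * `s3:ListBucket`: list the contents of your bucket
    * `s3:GetObject`: read objects from your bucket

    The `Condition` block in each statement limits access to the single Permutive role named in `aws:PrincipalArn`.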
  </Step>

  <Step title="Add the Policy to Your Bucket">
    1. Open the AWS Console and navigate to your S3 bucket
    2. Go to the **Permissions** tab
    3. Click **Edit** on the Bucket Policy section
    4. Paste the policy generated from the Permutive dashboard
    5. Save your changes
  </Step>
</Steps>

<Info>
  The policy applies to the whole bucket, so if you've already added it and want to connect a new location within the same bucket, you don't need to add it again.
</Info>
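
An S3 bucket holds a single policy document, so if your bucket already has a policy, pasting the generated one over it replaces the existing statements. One way to avoid that is to append the generated statements to the existing policy. The Python sketch below shows the idea, assuming both policies are available as JSON strings. The function name `merge_bucket_policies` and the sample `ExistingRule` statement are illustrative and not part of Permutive or AWS tooling.

```python
import json


def merge_bucket_policies(existing_policy_json, generated_policy_json):
    """Return one policy document containing the statements from both inputs."""
    generated = json.loads(generated_policy_json)
    if not existing_policy_json:
        return json.dumps(generated, indent=2)  # bucket has no policy yet

    existing = json.loads(existing_policy_json)

    def statements(policy):
        # "Statement" may be a single object or a list of objects.
        value = policy.get("Statement", [])
        return value if isinstance(value, list) else [value]

    merged = list(statements(existing))
    for statement in statements(generated):
        if statement not in merged:  # skip exact duplicates so re-running is harmless
            merged.append(statement)

    combined = dict(existing)
    combined["Statement"] = merged
    return json.dumps(combined, indent=2)


existing = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "ExistingRule",  # hypothetical statement already on the bucket
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": "arn:aws:s3:::<YourBucketName>/*",
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
})
generated = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::<PermutiveAWSAccountId>:root"},
        "Action": "s3:ListBucket",
        "Resource": "arn:aws:s3:::<YourBucketName>",
    }],
})
merged = json.loads(merge_bucket_policies(existing, generated))
print(len(merged["Statement"]))  # 2
```

Review the merged document before saving it, and always use the policy text copied from the Permutive dashboard rather than the placeholder values shown here.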

## Step 3: Create the Connection

<Steps>
  <Step title="Select Amazon S3 from the Catalog">
    In the Permutive dashboard, go to **Connectivity > Catalog** and select **Amazon S3**.
  </Step>

  <Step title="Enter Your Connection Details">
    Fill in the following fields:

    | Field                        | Description                                                                                        |
    | :--------------------------- | :------------------------------------------------------------------------------------------------- |
    | **Name**                     | A descriptive name for your connection in Permutive                                                |
    | **AWS Bucket Region**        | The region where your bucket is located (only supported regions are shown)                         |
    | **AWS Bucket Name**          | The bucket name without any prefixes or suffixes (e.g., for `s3://my-bucket/*`, enter `my-bucket`) |
    | **AWS Bucket Schema Prefix** | The prefix path to your schema location, without a leading slash (e.g., `data/audiences/`)         |
    | **Data Format**              | Choose **Parquet** (recommended) or **CSV**                                                        |
    | **Data Partitioning**        | Select whether all tables are partitioned or no tables are partitioned                             |

    <Warning>
      **Data Partitioning Behavior:**

      * If set to "All tables are partitioned", tables without partition directories are ignored
      * If set to "No tables are partitioned", partition prefixes are treated as regular directories and no partition columns are extracted
    </Warning>
  </Step>

  <Step title="Add the Bucket Policy">
    Before completing the connection, ensure you've added the generated bucket policy to your S3 bucket (see Step 2).
  </Step>

  <Step title="Create the Connection">
    Click **Save** to create the connection. It will appear on your **Connections** page with a "Processing" status while Permutive validates access. Once validated, the status changes to "Active".
  </Step>
</Steps>
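
The bucket name and schema prefix fields are easy to get wrong when copying a full S3 URI. As a rough illustration, this Python sketch splits a URI into the two values in the form described above. The function name `connection_fields` and the dictionary keys are hypothetical and do not correspond to a Permutive API.

```python
def connection_fields(s3_uri):
    """Split an S3 URI into the bucket name and schema prefix the form asks for."""
    path = s3_uri.strip()
    if path.startswith("s3://"):
        path = path[len("s3://"):]
    # Drop a trailing wildcard such as the "*" in "s3://my-bucket/*".
    if path.endswith("*"):
        path = path[:-1]
    bucket, _, prefix = path.partition("/")
    prefix = prefix.strip("/")
    return {
        "bucket_name": bucket,
        # No leading slash; trailing slash kept to match the `data/audiences/` example.
        "schema_prefix": prefix + "/" if prefix else "",
    }


print(connection_fields("s3://my-bucket/data/audiences/"))
# {'bucket_name': 'my-bucket', 'schema_prefix': 'data/audiences/'}
print(connection_fields("s3://my-bucket/*"))
# {'bucket_name': 'my-bucket', 'schema_prefix': ''}
```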

## Step 4: Create an Import

Once your connection is active, you can create imports to bring data into Permutive.

<Steps>
  <Step title="Navigate to Imports">
    Go to **Connectivity > Imports** and click **Create Import**.
  </Step>

  <Step title="Configure the Import">
    1. Select **Amazon S3** as the source type
    2. Select your S3 connection
    3. The schema prefix will be pre-selected (there's only one per connection)
    4. Choose from the list of discovered tables
    5. Continue with the standard import configuration
  </Step>
</Steps>

For more details on configuring imports, see [Imports](/products/connectivity/imports).

## Troubleshooting

<AccordionGroup>
  <Accordion title="Connection fails to validate">
    If your connection remains in "Processing" status or fails:

    * Verify the bucket policy has been correctly applied
    * Check that the bucket name and region are correct
    * Ensure the schema prefix exists and contains table directories

    **Solution:** Double-check your AWS bucket policy in the S3 console and verify the bucket name matches exactly what you entered in Permutive.
  </Accordion>

  <Accordion title="Tables not appearing">
    If you don't see expected tables after creating the connection:

    * Verify your directory structure matches the required format
    * Check that data files exist under each table directory
    * Ensure the data format setting matches your actual file format

    **Solution:** Review your S3 bucket structure and ensure each table is a direct subdirectory of the schema prefix. After making changes, run a schema resync in Permutive to refresh the available tables.
  </Accordion>

  <Accordion title="Partition data not being extracted">
    If partition columns aren't appearing in your data:

    * Verify "All tables are partitioned" is selected in Data Partitioning
    * Check that partition directories use the correct Hive format (`column=value`)

    **Solution:** Update your connection settings or restructure your partition directories.
  </Accordion>
</AccordionGroup>
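
Several of the checks above can be run locally against a listing of your object keys before you contact support. The following Python sketch flags keys that look out of place for a given schema prefix and partitioning setting. It is an approximation under stated assumptions: the function name `find_layout_problems` is hypothetical, the accepted extensions are passed in by the caller, and the rules are inferred from this guide rather than taken from Permutive's validation logic.

```python
def find_layout_problems(keys, schema_prefix, extensions, partitioned):
    """Return human-readable warnings about object keys that may not be picked up."""
    prefix = schema_prefix.strip("/")
    prefix = prefix + "/" if prefix else ""
    extensions = tuple(extensions)  # str.endswith accepts a tuple of suffixes
    problems = []
    for key in keys:
        if not key.startswith(prefix) or key.endswith("/"):
            continue  # outside the schema prefix, or a zero-byte "folder" marker
        parts = key[len(prefix):].split("/")
        if len(parts) < 2:
            problems.append(f"{key}: file is not inside a table directory")
            continue
        directories, filename = parts[1:-1], parts[-1]
        if not filename.endswith(extensions):
            problems.append(f"{key}: unexpected file extension")
        has_partition = any("=" in d and not d.startswith("=") for d in directories)
        if partitioned and not has_partition:
            problems.append(f"{key}: no column=value partition directory")
    return problems


keys = [
    "data/audiences/users/date=2025-01-01/part-0.csv",
    "data/audiences/purchases/part-0.csv",
    "data/audiences/notes.txt",
]
for problem in find_layout_problems(keys, "data/audiences/", (".csv", ".gz"), partitioned=True):
    print(problem)
```

With `partitioned=True`, the `purchases` file is flagged because it has no partition directory, and `notes.txt` is flagged because it sits directly under the schema prefix.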

## Next Steps

<CardGroup cols={2}>
  <Card title="Create an Import" icon="download" href="/products/connectivity/imports">
    Learn how to import data from your S3 connection
  </Card>

  <Card title="Back to Sources" icon="arrow-left" href="/products/connectivity/sources">
    Return to Sources overview
  </Card>
</CardGroup>
