Overview
This guide walks you through connecting your Amazon S3 bucket to Permutive so you can import data for audience building and activation. You’ll learn how to structure your S3 bucket, configure the required permissions, and create a connection in the Permutive dashboard.
Prerequisites:
- An AWS account with access to the S3 bucket you want to connect
- Permission to modify the S3 bucket policy
- Your data organized in the required directory structure (see below)
Key Concepts
Before setting up your connection, familiarize yourself with these terms:
| Term | Description |
|---|---|
| Bucket Root | The root name of the bucket without the s3:// prefix and without any trailing slashes or prefixes |
| S3 Prefix | A path within your S3 bucket where tables are stored |
| Schema | A group of tables, represented by an S3 prefix location |
| Table | A single table within Permutive, represented by a prefix under the schema prefix |
| Data file | The files containing your data (CSV or Parquet format) |
| Hive Partition | An S3 prefix in Hive partition format (e.g., date=2025-01-01 or region=EU) |
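For example, in the hypothetical path s3://my-bucket/data/audiences/events/date=2025-01-01/part-00000.parquet, the terms map as follows:
- Bucket Root: my-bucket
- S3 Prefix (schema location): data/audiences/
- Table: events
- Hive Partition: date=2025-01-01
- Data file: part-00000.parquet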
Step 1: Set Up Your Bucket Structure
Permutive uses the concept of a Schema (containing multiple Tables) to organize your data. Since S3 doesn’t have native schema or table concepts, you’ll need to structure your bucket in a specific way.
Schema Directory Structure
Organize your bucket so that each table is a directory under your schema prefix, as in the sketch below. You can have multiple prefixes representing different schemas, each with multiple tables. Each schema prefix requires a separate Connection in Permutive.
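A minimal sketch of this layout (bucket, schema, and table names are hypothetical):
```
s3://my-bucket/
└── data/audiences/          <- schema prefix (one Connection)
    ├── events/              <- table "events"
    │   └── ...data files...
    └── users/               <- table "users"
        └── ...data files...
```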
Table Directory Structure
Within each table directory, you can organize your data files in one of two ways:
- Partitioned (Recommended)
- Non-Partitioned
We recommend partitioning your data, especially for event or user activity tables. Partitioning reduces costs by allowing Permutive to filter data upfront when querying. A table can be partitioned by a single column or by multiple columns; both layouts are sketched below. Partition names become columns in your dataset, with the partition value populating the rows for all files under that partition.
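A minimal sketch of each layout, using hypothetical partition names in Hive format:
Single partition:
```
events/
├── date=2025-01-01/
│   └── part-00000.parquet
└── date=2025-01-02/
    └── part-00000.parquet
```
Multiple partitions:
```
events/
└── date=2025-01-01/
    ├── region=EU/
    │   └── part-00000.parquet
    └── region=US/
        └── part-00000.parquet
```
Here date (and region) become columns, and every row read from a file under date=2025-01-01/ carries that value in its date column.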
Data Format Recommendations
Parquet Format (Recommended)
We highly recommend using Parquet due to its columnar storage benefits, which significantly improve query performance and reduce storage size.
For Parquet files, we specifically recommend using the ZSTD compression codec to maximize storage efficiency and speed up data processing.
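If you generate your files in Python, a minimal sketch of writing a Parquet file with the ZSTD codec using pyarrow (the columns and file name are illustrative):
```python
import pyarrow as pa
import pyarrow.parquet as pq

# Illustrative data; replace with your own table.
table = pa.table({
    "user_id": ["u1", "u2"],
    "event": ["page_view", "click"],
})

# Write Parquet with the recommended ZSTD compression codec.
pq.write_table(table, "part-00000.parquet", compression="zstd")
```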
CSV Format
We support:
- .csv (uncompressed CSV)
- .gz (gzipped CSV)
All tables under a schema should use the same data format (either all CSV or all Parquet).
Step 2: Configure Bucket Permissions
Permutive needs permission to read data from your S3 bucket. You’ll add an S3 Bucket Policy that grants Permutive read-only access.
Start Creating the Connection
In the Permutive dashboard, go to Connectivity > Catalog and select Amazon S3. Begin entering your connection details (covered in Step 3). Once you enter your bucket name, Permutive will generate a bucket policy for you.
Copy the Generated Policy
Copy the S3 Bucket Policy displayed in the Permutive dashboard. It will look similar to this:
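The dashboard generates the exact policy, including Permutive’s principal; the sketch below only shows the general shape, with a placeholder principal and a hypothetical bucket name:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<PERMUTIVE_PRINCIPAL_FROM_DASHBOARD>" },
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-bucket"
    },
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<PERMUTIVE_PRINCIPAL_FROM_DASHBOARD>" },
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```
This policy grants Permutive the following permissions: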
- s3:ListBucket — List the contents of your bucket
- s3:GetObject — Read objects from your bucket
If you’ve already added the policy to your bucket and want to use a new location within the same bucket, you don’t need to re-add the policy.
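If you apply the policy programmatically rather than through the AWS console, a minimal boto3 sketch (the bucket name and policy file name are hypothetical; use the policy generated by the dashboard):
```python
import json
import boto3

s3 = boto3.client("s3")

# Load the policy you copied from the Permutive dashboard.
with open("permutive-bucket-policy.json") as f:
    policy = json.load(f)

# Attach the policy to the bucket. This overwrites any existing
# bucket policy, so merge statements first if you already have one.
s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))
```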
Step 3: Create the Connection
Select Amazon S3 from the Catalog
In the Permutive dashboard, go to Connectivity > Catalog and select Amazon S3.
Enter Your Connection Details
Fill in the following fields:
| Field | Description |
|---|---|
| Name | A descriptive name for your connection in Permutive |
| AWS Bucket Region | The region where your bucket is located (only supported regions are shown) |
| AWS Bucket Name | The bucket name without any prefixes or suffixes (e.g., for s3://my-bucket/*, enter my-bucket) |
| AWS Bucket Schema Prefix | The prefix path to your schema location, without a leading slash (e.g., data/audiences/) |
| Data Format | Choose Parquet (recommended) or CSV |
| Data Partitioning | Select whether all tables are partitioned or no tables are partitioned |
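For example, for the hypothetical bucket layout sketched in Step 1, you would enter:
- AWS Bucket Name: my-bucket
- AWS Bucket Schema Prefix: data/audiences/
- Data Format: Parquet
- Data Partitioning: All tables are partitioned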
Add the Bucket Policy
Before completing the connection, ensure you’ve added the generated bucket policy to your S3 bucket (see Step 2).
Step 4: Create an Import
Once your connection is active, you can create imports to bring data into Permutive.
For more details on configuring imports, see Imports.
Troubleshooting
Connection fails to validate
If your connection remains in “Processing” status or fails:
- Verify the bucket policy has been correctly applied
- Check that the bucket name and region are correct
- Ensure the schema prefix exists and contains table directories
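A minimal boto3 sketch for checking the three points above (bucket name and prefix are hypothetical):
```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"        # hypothetical bucket name
prefix = "data/audiences/"  # hypothetical schema prefix

# 1. Confirm the bucket policy is applied.
print(s3.get_bucket_policy(Bucket=bucket)["Policy"])

# 2. Confirm the bucket's region matches the connection settings.
# (LocationConstraint is None for us-east-1.)
print(s3.get_bucket_location(Bucket=bucket)["LocationConstraint"])

# 3. Confirm the schema prefix exists and contains table directories.
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
print([p["Prefix"] for p in resp.get("CommonPrefixes", [])])
```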
Tables not appearing
If you don’t see expected tables after creating the connection:
- Verify your directory structure matches the required format
- Check that data files exist under each table directory
- Ensure the data format setting matches your actual file format
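To confirm the directory structure and the data files under each table, a minimal boto3 sketch (names hypothetical):
```python
import boto3

s3 = boto3.client("s3")

# List every object under the schema prefix to confirm each table
# directory contains data files in the expected format.
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="data/audiences/")
for obj in resp.get("Contents", []):
    print(obj["Key"])  # e.g. data/audiences/events/date=2025-01-01/part-00000.parquet
```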
Partition data not being extracted
If partition columns aren’t appearing in your data:
- Verify “All tables are partitioned” is selected in Data Partitioning
- Check that partition directories use the correct Hive format (column=value)