Overview
This guide walks you through connecting your Amazon S3 bucket to Permutive so you can import data for audience building and activation. You’ll learn how to structure your S3 bucket, configure the required permissions, and create a connection in the Permutive dashboard.
Prerequisites:
- An AWS account with access to the S3 bucket you want to connect
- Permission to modify the S3 bucket policy
- Your data organized in the required directory structure (see below)
Key Concepts
Before setting up your connection, familiarize yourself with these terms:
| Term | Description |
|---|---|
| Bucket Root | The root name of the bucket without the s3:// prefix and without any trailing slashes or prefixes |
| S3 Prefix | A path within your S3 bucket where tables are stored |
| Schema | A group of tables, represented by an S3 prefix location |
| Table | A single table within Permutive, represented by a prefix under the schema prefix |
| Data file | The files containing your data (CSV or Parquet format) |
| Hive Partition | An S3 prefix in Hive partition format (e.g., date=2025-01-01 or region=EU) |
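For example, in the hypothetical path s3://my-bucket/data/audiences/events/date=2025-01-01/part-00000.parquet, the terms map as follows:
- Bucket Root: my-bucket
- S3 Prefix (schema location): data/audiences/
- Table: events
- Hive Partition: date=2025-01-01
- Data file: part-00000.parquet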
Step 1: Set Up Your Bucket Structure
Permutive uses the concept of a Schema (containing multiple Tables) to organize your data. Since S3 doesn’t have native schema or table concepts, you’ll need to structure your bucket in a specific way.
Schema Directory Structure
Organize your bucket so that each table is a directory under your schema prefix, as in the sketch below. You can have multiple prefixes representing different schemas, each with multiple tables. Each schema prefix requires a separate Connection in Permutive.
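A minimal sketch of this layout (bucket, schema, and table names are hypothetical):
```
s3://my-bucket/
└── data/audiences/          <- schema prefix (one Connection)
    ├── events/              <- table "events"
    │   └── ...data files...
    └── users/               <- table "users"
        └── ...data files...
```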
Table Directory Structure
Within each table directory, you can organize your data files in one of two ways:
- Partitioned (Recommended)
- Non-Partitioned
We recommend partitioning your data, especially for event or user activity tables. Partitioning reduces costs by allowing Permutive to filter data upfront when querying. A table can be partitioned by a single column or by multiple columns; both layouts are sketched below. Partition names become columns in your dataset, with the partition value populating the rows for all files under that partition.
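A minimal sketch of each layout, using hypothetical partition names in Hive format:
Single partition:
```
events/
├── date=2025-01-01/
│   └── part-00000.parquet
└── date=2025-01-02/
    └── part-00000.parquet
```
Multiple partitions:
```
events/
└── date=2025-01-01/
    ├── region=EU/
    │   └── part-00000.parquet
    └── region=US/
        └── part-00000.parquet
```
Here date (and region) become columns, and every row read from a file under date=2025-01-01/ carries that value in its date column.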
Data Format Recommendations
Parquet Format (Recommended)
We highly recommend using Parquet due to its columnar storage benefits, which significantly improve query performance and reduce storage size.
For Parquet files, we specifically recommend using the ZSTD compression codec to maximize storage efficiency and speed up data processing.
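If you generate your files in Python, a minimal sketch of writing a Parquet file with the ZSTD codec using pyarrow (the columns and file name are illustrative):
```python
import pyarrow as pa
import pyarrow.parquet as pq

# Illustrative data; replace with your own table.
table = pa.table({
    "user_id": ["u1", "u2"],
    "event": ["page_view", "click"],
})

# Write Parquet with the recommended ZSTD compression codec.
pq.write_table(table, "part-00000.parquet", compression="zstd")
```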
CSV Format
We support:
- .csv (uncompressed CSV)
- .gz (gzipped CSV)
All tables under a schema should use the same data format (either all CSV or all Parquet).
Step 2: Configure Bucket Permissions
Permutive needs permission to read data from your S3 bucket. You’ll add an S3 Bucket Policy that grants Permutive read-only access.
Start Creating the Connection
In the Permutive dashboard, go to Connectivity > Catalog and select Amazon S3. Begin entering your connection details (covered in Step 3). Once you enter your bucket name, Permutive will generate a bucket policy for you.
Copy the Generated Policy
Copy the S3 Bucket Policy displayed in the Permutive dashboard. It will look similar to this:
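The dashboard generates the exact policy, including Permutive’s principal; the sketch below only shows the general shape, with a placeholder principal and a hypothetical bucket name:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<PERMUTIVE_PRINCIPAL_FROM_DASHBOARD>" },
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-bucket"
    },
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<PERMUTIVE_PRINCIPAL_FROM_DASHBOARD>" },
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```
This policy grants Permutive the following permissions: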
- s3:ListBucket — List the contents of your bucket
- s3:GetObject — Read objects from your bucket
If you’ve already added the policy to your bucket and want to use a new location within the same bucket, you don’t need to re-add the policy.
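If you apply the policy programmatically rather than through the AWS console, a minimal boto3 sketch (the bucket name and policy file name are hypothetical; use the policy generated by the dashboard):
```python
import json
import boto3

s3 = boto3.client("s3")

# Load the policy you copied from the Permutive dashboard.
with open("permutive-bucket-policy.json") as f:
    policy = json.load(f)

# Attach the policy to the bucket. This overwrites any existing
# bucket policy, so merge statements first if you already have one.
s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))
```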
Step 3: Create the Connection
Select Amazon S3 from the Catalog
In the Permutive dashboard, go to Connectivity > Catalog and select Amazon S3.
Enter Your Connection Details
Fill in the following fields:
| Field | Description |
|---|---|
| Name | A descriptive name for your connection in Permutive |
| AWS Bucket Region | The region where your bucket is located (only supported regions are shown) |
| AWS Bucket Name | The bucket name without any prefixes or suffixes (e.g., for s3://my-bucket/*, enter my-bucket) |
| AWS Bucket Schema Prefix | The prefix path to your schema location, without a leading slash (e.g., data/audiences/) |
| Data Format | Choose Parquet (recommended) or CSV |
| Data Partitioning | Select whether all tables are partitioned or no tables are partitioned |
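For example, for the hypothetical bucket layout sketched in Step 1, you would enter:
- AWS Bucket Name: my-bucket
- AWS Bucket Schema Prefix: data/audiences/
- Data Format: Parquet
- Data Partitioning: All tables are partitioned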
Add the Bucket Policy
Before completing the connection, ensure you’ve added the generated bucket policy to your S3 bucket (see Step 2).
Step 4: Create an Import
Once your connection is active, you can create imports to bring data into Permutive.
For more details on configuring imports, see Imports.
Troubleshooting
Connection fails to validate
If your connection remains in “Processing” status or fails:
- Verify the bucket policy has been correctly applied
- Check that the bucket name and region are correct
- Ensure the schema prefix exists and contains table directories
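A minimal boto3 sketch for checking the three points above (bucket name and prefix are hypothetical):
```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"        # hypothetical bucket name
prefix = "data/audiences/"  # hypothetical schema prefix

# 1. Confirm the bucket policy is applied.
print(s3.get_bucket_policy(Bucket=bucket)["Policy"])

# 2. Confirm the bucket's region matches the connection settings.
# (LocationConstraint is None for us-east-1.)
print(s3.get_bucket_location(Bucket=bucket)["LocationConstraint"])

# 3. Confirm the schema prefix exists and contains table directories.
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
print([p["Prefix"] for p in resp.get("CommonPrefixes", [])])
```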
Tables not appearing
If you don’t see expected tables after creating the connection:
- Verify your directory structure matches the required format
- Check that data files exist under each table directory
- Ensure the data format setting matches your actual file format
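To confirm the directory structure and the data files under each table, a minimal boto3 sketch (names hypothetical):
```python
import boto3

s3 = boto3.client("s3")

# List every object under the schema prefix to confirm each table
# directory contains data files in the expected format.
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="data/audiences/")
for obj in resp.get("Contents", []):
    print(obj["Key"])  # e.g. data/audiences/events/date=2025-01-01/part-00000.parquet
```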
Partition data not being extracted
If partition columns aren’t appearing in your data:
- Verify “All tables are partitioned” is selected in Data Partitioning
- Check that partition directories use the correct Hive format (column=value)