# Connecting to Amazon S3

> How to set up a connection to Amazon S3 to import data into Permutive

## Overview

This guide walks you through connecting your Amazon S3 bucket to Permutive so you can import data for audience building and activation. You'll learn how to structure your S3 bucket, configure the required permissions, and create a connection in the Permutive dashboard.

**Prerequisites:**

* An AWS account with access to the S3 bucket you want to connect
* Permission to modify the S3 bucket policy
* Your data organized in the required directory structure (see below)

## Key Concepts

Before setting up your connection, familiarize yourself with these terms:

| Term           | Description                                                                                         |
| :------------- | :-------------------------------------------------------------------------------------------------- |
| Bucket Root    | The root name of the bucket without the `s3://` prefix and without any trailing slashes or prefixes |
| S3 Prefix      | A path within your S3 bucket where tables are stored                                                |
| Schema         | A group of tables, represented by an S3 prefix location                                             |
| Table          | A single table within Permutive, represented by a prefix under the schema prefix                    |
| Data file      | The files containing your data (CSV or Parquet format)                                              |
| Hive Partition | An S3 prefix in Hive partition format (e.g., `date=2025-01-01` or `region=EU`)                      |

## Step 1: Set Up Your Bucket Structure

Permutive uses the concept of a **Schema** (containing multiple **Tables**) to organize your data. Since S3 doesn't have native schema or table concepts, you'll need to structure your bucket in a specific way.

### Schema Directory Structure

Organize your bucket so that each table is a directory under your schema prefix:

```
s3://///
s3://///
s3://///
```

When you provide the prefix to Permutive, every directory under that prefix is treated as a table. You can have multiple prefixes representing different schemas, each with multiple tables. Each schema prefix requires a separate Connection in Permutive.

### Table Directory Structure

Within each table directory, you can organize your data files in one of two ways: partitioned or non-partitioned.

We recommend partitioning your data, especially for event or user activity tables. Partitioning reduces costs by allowing Permutive to filter data upfront when querying.

**Single partition:**

```
s3://///=/.csv
```

**Multiple partitions:**

```
s3://///=/=/.csv
```

Each partition name becomes a column in your dataset, and the partition value fills that column for every row in the files under that partition.

For non-partitioned tables, place your data files directly under the table prefix:

```
s3://///.csv
```

Permutive will scan for all files under the table prefix, regardless of subdirectory depth. For example, all these files would be included:

```
s3://///.csv
s3://////.csv
s3:///////.csv
```

In non-partitioned mode, Hive partition prefixes are ignored. The partition information won't be extracted as columns.

### Data Format Recommendations

We highly recommend using Parquet due to its columnar storage benefits, which significantly improve query performance and reduce storage size. For Parquet files, we specifically recommend using the **ZSTD compression codec** to maximize storage efficiency and speed up data processing.

For CSV, we support:

* `.csv` (uncompressed CSV)
* `.gz` (gzipped CSV)

For CSV files, especially large datasets, **gzipping is highly recommended** to reduce storage costs and enhance processing speed.

All tables under a schema should use the same data format (either all CSV or all Parquet).

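The directory conventions in Step 1 can be sketched in code. This is a minimal illustration and not Permutive's implementation: the schema prefix `data/audiences/`, the table name `page_views`, the file name, and both helper functions are hypothetical names chosen for this example. It builds an object key for a partitioned table and then recovers the partition columns from that key, mirroring how Hive-style `column=value` prefixes map to columns.

```python
from typing import Dict, Optional


def build_key(schema_prefix: str, table: str, filename: str,
              partitions: Optional[Dict[str, str]] = None) -> str:
    """Build an object key: <schema prefix>/<table>/[<column>=<value>/...]<file>."""
    parts = [schema_prefix.strip("/"), table]
    # Dicts preserve insertion order, so partitions nest in the order given.
    for column, value in (partitions or {}).items():
        parts.append(f"{column}={value}")
    parts.append(filename)
    return "/".join(p for p in parts if p)


def parse_partitions(key: str, schema_prefix: str) -> Dict[str, str]:
    """Extract Hive-style partition columns from a key under the schema prefix."""
    prefix = schema_prefix.strip("/")
    if not key.startswith(prefix):
        raise ValueError(f"{key!r} is not under prefix {prefix!r}")
    segments = key[len(prefix):].lstrip("/").split("/")
    # segments[0] is the table directory and segments[-1] is the data file;
    # anything in between that looks like column=value is a partition.
    columns: Dict[str, str] = {}
    for segment in segments[1:-1]:
        if "=" in segment:
            column, value = segment.split("=", 1)
            columns[column] = value
    return columns


# Hypothetical example values, not taken from a real bucket.
key = build_key("data/audiences/", "page_views", "part-0001.csv.gz",
                {"date": "2025-01-01", "region": "EU"})
print(key)
print(parse_partitions(key, "data/audiences/"))
```

Passing no `partitions` argument produces the non-partitioned layout, with the file placed directly under the table prefix.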
## Step 2: Configure Bucket Permissions

Permutive needs permission to read data from your S3 bucket. You'll add an S3 Bucket Policy that grants Permutive read-only access.

In the Permutive dashboard, go to **Connectivity > Catalog** and select **Amazon S3**. Begin entering your connection details (covered in Step 3). Once you enter your bucket name, Permutive will generate a bucket policy for you.

Copy the S3 Bucket Policy displayed in the Permutive dashboard. It will look similar to this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam:::root"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalArn": "arn:aws:iam:::role/"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam:::root"
      },
      "Action": "s3:GetObject",
      "Resource": [
        "arn:aws:s3:::",
        "arn:aws:s3:::/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:PrincipalArn": "arn:aws:iam:::role/"
        }
      }
    }
  ]
}
```

This policy grants Permutive the following permissions:

* `s3:ListBucket`: list the contents of your bucket
* `s3:GetObject`: read objects from your bucket

To apply the policy:

1. Open the AWS Console and navigate to your S3 bucket
2. Go to the **Permissions** tab
3. Click **Edit** on the Bucket Policy section
4. Paste the policy generated from the Permutive dashboard
5. Save your changes

If you've already added the policy to your bucket and want to use a new location within the same bucket, you don't need to re-add the policy.

## Step 3: Create the Connection

In the Permutive dashboard, go to **Connectivity > Catalog** and select **Amazon S3**.
Fill in the following fields:

| Field                        | Description                                                                                        |
| :--------------------------- | :------------------------------------------------------------------------------------------------- |
| **Name**                     | A descriptive name for your connection in Permutive                                                |
| **AWS Bucket Region**        | The region where your bucket is located (only supported regions are shown)                         |
| **AWS Bucket Name**          | The bucket name without any prefixes or suffixes (e.g., for `s3://my-bucket/*`, enter `my-bucket`) |
| **AWS Bucket Schema Prefix** | The prefix path to your schema location, without a leading slash (e.g., `data/audiences/`)         |
| **Data Format**              | Choose **Parquet** (recommended) or **CSV**                                                        |
| **Data Partitioning**        | Select whether all tables are partitioned or no tables are partitioned                             |

**Data Partitioning Behavior:**

* If set to "All tables are partitioned", non-partitioned tables will be ignored
* If set to "No tables are partitioned", partition prefixes will be ignored and treated as regular directories

Before completing the connection, ensure you've added the generated bucket policy to your S3 bucket (see Step 2).

Click **Save** to create the connection. It will appear on your **Connections** page with a "Processing" status while Permutive validates access. Once validated, the status changes to "Active".

## Step 4: Create an Import

Once your connection is active, you can create imports to bring data into Permutive.

Go to **Connectivity > Imports** and click **Create Import**.

1. Select **Amazon S3** as the source type
2. Select your S3 connection
3. The schema prefix will be pre-selected (there's only one per connection)
4. Choose from the list of discovered tables
5. Continue with the standard import configuration

For more details on configuring imports, see [Imports](/products/connectivity/imports).

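If you are starting from a full S3 URI, the Step 3 rules for **AWS Bucket Name** (no `s3://` scheme, no path or wildcard) and **AWS Bucket Schema Prefix** (no leading slash) can be applied mechanically. The sketch below is one way to do that; `split_s3_uri` is a hypothetical helper written for this example and is not part of any Permutive tooling, and the trailing slash on the prefix simply mirrors the `data/audiences/` style shown in the field table.

```python
from typing import Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an S3 URI into (bucket name, schema prefix) for the connection form."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, path = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError("missing bucket name")
    # Drop a trailing wildcard and surrounding slashes, then restore a single
    # trailing slash so the prefix has no leading slash.
    prefix = path.rstrip("*").strip("/")
    return bucket, f"{prefix}/" if prefix else ""


# Hypothetical example URI.
bucket_name, schema_prefix = split_s3_uri("s3://my-bucket/data/audiences")
print(bucket_name, schema_prefix)
```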
## Troubleshooting

If your connection remains in "Processing" status or fails:

* Verify the bucket policy has been correctly applied
* Check that the bucket name and region are correct
* Ensure the schema prefix exists and contains table directories

**Solution:** Double-check your AWS bucket policy in the S3 console and verify the bucket name matches exactly what you entered in Permutive.

If you don't see expected tables after creating the connection:

* Verify your directory structure matches the required format
* Check that data files exist under each table directory
* Ensure the data format setting matches your actual file format

**Solution:** Review your S3 bucket structure and ensure each table is a direct subdirectory of the schema prefix. After making changes, run a schema resync in Permutive to refresh the available tables.

If partition columns aren't appearing in your data:

* Verify "All tables are partitioned" is selected in Data Partitioning
* Check that partition directories use the correct Hive format (`column=value`)

**Solution:** Update your connection settings or restructure your partition directories.

## Next Steps

* Learn how to import data from your S3 connection
* Return to Sources overview