# Connecting to Google Cloud Storage

> How to set up a connection to Google Cloud Storage (GCS) to import data into Permutive

## Overview

This guide walks you through connecting your Google Cloud Storage (GCS) bucket to Permutive so you can import data for audience building and activation. Permutive offers two connection options: connecting to your own GCS bucket, or having Permutive provision a bucket for you.

<Info>
  **Prerequisites:**

  * A Google Cloud Platform (GCP) account
  * For customer-owned buckets: access to manage IAM permissions on your GCS bucket
  * Your data organized in the required directory structure (see below)
</Info>

## Key Concepts

Before setting up your connection, familiarize yourself with these terms:

| Term           | Permutive Context                 | GCS Context                                                                |
| :------------- | :-------------------------------- | :------------------------------------------------------------------------- |
| Schema         | A logical grouping of tables      | A static prefix (folder) that contains multiple table subdirectories       |
| Table          | A single dataset within Permutive | A sub-prefix (subdirectory) under the schema prefix                        |
| Data File      | Files that store the actual data  | Files within the table prefix, often organized under Hive-style partitions |
| Bucket Root    | N/A                               | The root name of your GCS bucket (e.g., `my-bucket-name`)                  |
| GCS Prefix     | N/A                               | A path within your bucket where tables are located (e.g., `data/events/`)  |
| Hive Partition | N/A                               | A directory structure used to segment data (e.g., `date=2025-01-01`)       |

## Step 1: Set Up Your Bucket Structure

Since GCS doesn't have built-in schema or table concepts, your bucket must follow a specific prefix (folder) structure so Permutive can correctly infer and manage your data imports.

<Note>
  The Permutive platform maps one Connection to one Schema Prefix. To manage multiple logical schemas, you must create distinct prefixes and therefore distinct connections.
</Note>

### Schema Directory Structure

Structure your bucket with a single GCS prefix under which all tables reside:

```
gs://<bucket_name>/<schema_prefix>/<table_1>/
gs://<bucket_name>/<schema_prefix>/<table_2>/
gs://<bucket_name>/<schema_prefix>/<table_n>/
```
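
To make the layout concrete, here is a minimal sketch of how object names map to tables under one schema prefix. It only illustrates the directory convention above and is not Permutive's actual discovery logic; the function name `group_by_table` and the sample object names are hypothetical.

```python
from collections import defaultdict


def group_by_table(object_names, schema_prefix):
    """Group object names by the first path segment under the schema prefix."""
    prefix = schema_prefix.strip("/")
    prefix = prefix + "/" if prefix else ""
    tables = defaultdict(list)
    for name in object_names:
        if not name.startswith(prefix):
            continue  # object lives outside the schema prefix
        table, sep, rest = name[len(prefix):].partition("/")
        if sep and rest:  # skip files placed directly under the schema prefix
            tables[table].append(rest)
    return dict(tables)


# Hypothetical object names, as they would appear inside a bucket.
objects = [
    "warehouse/users/part-0.csv",
    "warehouse/events/date=2025-01-01/part-0.csv",
    "warehouse/README.txt",
    "other/ignored.csv",
]
```

With the sample list, `sorted(group_by_table(objects, "warehouse"))` gives `['events', 'users']`: the file directly under the schema prefix and the object outside it do not belong to any table.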

### Table Directory Structure

The structure within a table prefix depends on whether you enable Data Partitioning during connection setup.

<Tabs>
  <Tab title="Partitioned (Recommended)">
    Supports Hive-style partitioning. Permutive reads files only from the deepest partition level.

    **Single partition:**

    ```
    gs://<bucket_name>/<schema_prefix>/<table_n>/date=2025-01-01/<data_file>.csv
    ```

    **Multiple partitions:**

    ```
    gs://<bucket_name>/<schema_prefix>/<table_n>/date=2025-01-01/region=EU/<data_file>.csv
    ```

    We recommend partitioning all data where possible, especially for event or user activity tables, as it improves query performance and cost-efficiency.
  </Tab>

  <Tab title="Non-Partitioned">
    Permutive scans all files under the table prefix, regardless of their subdirectory depth.

    ```
    gs://<bucket_name>/<schema_prefix>/<table_n>/<data_file>.csv
    gs://<bucket_name>/<schema_prefix>/<table_n>/inner1/inner2/<data_file>.csv
    ```

    <Warning>
      In non-partitioned mode, any Hive-style partition directories (e.g., `date=...`) are treated as ordinary parts of the file path and are not interpreted as partitions.
    </Warning>
  </Tab>
</Tabs>
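
The difference between the two modes can be sketched as a small path parser. This is an illustration of the Hive `column=value` convention only, not Permutive's implementation; the function name and the `partitioned` flag are hypothetical.

```python
def parse_hive_partitions(relative_path, partitioned=True):
    """Split a path relative to the table prefix into (partitions, file path).

    In partitioned mode every directory must look like column=value.
    In non-partitioned mode the whole path is kept as the file path.
    """
    if not partitioned:
        return {}, relative_path
    *directories, file_name = relative_path.split("/")
    partitions = {}
    for directory in directories:
        key, sep, value = directory.partition("=")
        if not (key and sep):
            raise ValueError(f"not a Hive-style partition directory: {directory!r}")
        partitions[key] = value
    return partitions, file_name
```

For example, `parse_hive_partitions("date=2025-01-01/region=EU/part-0.csv")` yields the partitions `date` and `region` plus the file name, while the same call with `partitioned=False` returns an empty partition mapping and the unchanged path.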

### Data Format Recommendations

<AccordionGroup>
  <Accordion title="Parquet Format (Recommended)">
    We highly recommend using Parquet due to its columnar storage benefits, which significantly improve query performance and reduce storage size.

    For Parquet files, we recommend using the **ZSTD compression codec** to maximize storage efficiency and speed up data processing.
  </Accordion>

  <Accordion title="CSV Format">
    We support:

    * `.csv` (uncompressed CSV)
    * `.gz` (gzipped CSV)

    For CSV files, especially large datasets, **gzipping is highly recommended** to reduce storage costs and enhance processing speed.
  </Accordion>
</AccordionGroup>
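
If you produce CSV files from Python, the standard library can write them gzipped directly. A minimal sketch, assuming a header row followed by data rows; the function name, file name, and column names are hypothetical.

```python
import csv
import gzip


def write_gzipped_csv(path, header, rows):
    """Write a header and data rows as a gzip-compressed CSV file."""
    # Text mode ("wt") with newline="" lets the csv module control line endings.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
```

A call such as `write_gzipped_csv("users.csv.gz", ["user_id", "segment"], [["u1", "sports"]])` produces a file ready to upload under a table prefix.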

<Note>
  All tables under a schema should use the same data format (either all CSV or all Parquet).
</Note>

## Step 2: Create the Connection

Permutive offers two connection options for GCS:

<Tabs>
  <Tab title="Customer Owned Bucket">
    Use this option to connect to an existing GCS bucket that you manage.

    ### Configure Bucket Permissions

    Before creating the connection, grant Permutive read access to your GCS bucket by assigning IAM roles to Permutive's service account: `connection@permutive.com`

    **Required roles:**

    * `roles/storage.objectViewer`
    * `roles/storage.bucketViewer`

    <Steps>
      <Step title="Open IAM Settings">
        In the Google Cloud Console, navigate to your GCS bucket and open the **Permissions** tab.
      </Step>

      <Step title="Add Permutive Service Account">
        Click **Grant Access** and add `connection@permutive.com` as a principal.
      </Step>

      <Step title="Assign Roles">
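        Assign the `Storage Object Viewer` (`roles/storage.objectViewer`) and `Storage Bucket Viewer` (`roles/storage.bucketViewer`) roles listed above, or equivalent roles such as `Storage Legacy Bucket Reader`.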
      </Step>

      <Step title="Save">
        Save the IAM policy changes.
      </Step>
    </Steps>

    <Info>
      If you use fine-grained permissions or folder-level access, apply these permissions to the specific prefixes you intend to connect. If access is already granted for a parent prefix or the entire bucket, no further IAM changes are needed.
    </Info>

    ### Create the Connection in Permutive

    <Steps>
      <Step title="Select GCS from the Catalog">
        In the Permutive dashboard, go to **Connectivity > Catalog** and select **Google Cloud Storage**. Choose **Customer Owned Bucket**.
      </Step>

      <Step title="Enter Connection Details">
        Fill in the following fields:

        | Field                 | Description                                                              |
        | :-------------------- | :----------------------------------------------------------------------- |
        | **Name**              | A descriptive name for your connection in Permutive                      |
        | **GCP Project ID**    | The GCP project that the bucket belongs to                               |
        | **GCS Bucket Region** | Select from the available regions for your workspace                     |
        | **GCS Bucket Name**   | The full bucket name (without `gs://` prefix)                            |
        | **Schema Prefix**     | The prefix within the bucket that acts as your schema (no leading slash) |
        | **Data Format**       | Choose **Parquet** (recommended) or **CSV**                              |
        | **Data Partitioning** | Select whether all tables are partitioned or no tables are partitioned   |
      </Step>

      <Step title="Save the Connection">
        Click **Save** to create the connection. It will appear on your **Connections** page with a "Processing" status while Permutive validates access.
      </Step>
    </Steps>
  </Tab>

  <Tab title="Permutive Provisioned Bucket">
    Use this option to have Permutive create and manage a GCS bucket for you.

    <Steps>
      <Step title="Select GCS from the Catalog">
        In the Permutive dashboard, go to **Connectivity > Catalog** and select **Google Cloud Storage**. Choose **Permutive Provisioned Bucket**.
      </Step>

      <Step title="Enter Connection Details">
        Fill in the following fields:

        | Field                      | Description                                                                                   |
        | :------------------------- | :-------------------------------------------------------------------------------------------- |
        | **Name**                   | A descriptive name for your connection in Permutive                                           |
        | **Upload-Access Accounts** | GCP principals (email addresses) granted permission to upload files. At least one is required |
        | **Read-Access Accounts**   | Optional GCP principals granted read-only permission to files                                 |
        | **Data Format**            | Choose **Parquet** (recommended) or **CSV**                                                   |
        | **Data Partitioning**      | Select whether all tables are partitioned or no tables are partitioned                        |

        The default Schema Prefix is `/`. The bucket region will automatically match the region where your data resides within BigQuery.
      </Step>

      <Step title="Save the Connection">
        Click **Save** to create the connection. Permutive will provision the bucket and display the bucket name on the Connection Details page once complete.
      </Step>
    </Steps>

    ### Access Accounts

    When specifying GCP principals, you can choose from three types:

    * **Group** (recommended)
    * **User**
    * **Service Account**

    <Note>
      **We strongly recommend using Google Workspace group email addresses** over individual user accounts. Using groups allows you to manage user access without contacting Permutive. When a user is removed from your organization, their access is automatically revoked if they were part of a group.
    </Note>

    **IAM roles applied:**

    * Upload-Access accounts receive: `roles/storage.objectCreator` and `roles/storage.objectViewer`
    * Read-Access accounts receive: `roles/storage.objectViewer`
  </Tab>
</Tabs>
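
For the Customer Owned Bucket option, the role grant can also be scripted instead of using the Console. The sketch below builds the two bindings from the required roles and, optionally, applies them with the `google-cloud-storage` client library. It assumes the principal is a service account (hence the `serviceAccount:` member prefix, which is an assumption based on the wording above) and that your own credentials are allowed to change the bucket's IAM policy; the function names are hypothetical.

```python
# Member prefix is an assumption: the guide describes the principal as a service account.
PERMUTIVE_MEMBER = "serviceAccount:connection@permutive.com"
REQUIRED_ROLES = ("roles/storage.objectViewer", "roles/storage.bucketViewer")


def build_bindings(member=PERMUTIVE_MEMBER, roles=REQUIRED_ROLES):
    """Return one IAM binding per required role for the given member."""
    return [{"role": role, "members": {member}} for role in roles]


def grant_read_access(bucket_name):
    """Append the bindings to the bucket's IAM policy.

    Requires the third-party google-cloud-storage package and credentials
    with permission to set the bucket's IAM policy.
    """
    from google.cloud import storage  # imported lazily; not needed to build bindings

    bucket = storage.Client().bucket(bucket_name)
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.extend(build_bindings())
    bucket.set_iam_policy(policy)
```

Calling `build_bindings()` on its own has no side effects, so you can review the bindings before running `grant_read_access("my-bucket-name")`.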

## Step 3: Create an Import

Once your connection is active, you can create imports to bring data into Permutive.

<Steps>
  <Step title="Navigate to Imports">
    Go to **Connectivity > Imports** and click **Create Import**.
  </Step>

  <Step title="Configure the Import">
    1. Select **Google Cloud Storage** as the source type
    2. Select your GCS connection
    3. Choose the discovered schema (matches the prefix defined in your connection)
    4. Choose from the list of detected tables
    5. Continue with the standard import configuration
  </Step>
</Steps>

For more details on configuring imports, see [Imports](/products/connectivity/imports).

## Limitations

<Warning>
  **Important limitations to be aware of:**

  * **Partitioning Standard**: Only Hive-style partitioning is supported
  * **Mixed Partitioning**: Not supported within a single connection. All tables under the schema prefix must be either partitioned or non-partitioned
  * **Schema Evolution**: Column changes (additions/removals) are not supported for GCS imports. If your column structure changes, you'll need to create a new connection
</Warning>
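
Because column changes are not supported, it can help to compare the columns of a new file with those of a file you have already imported before uploading it. A minimal sketch, assuming CSV files whose first row is a header; the function names and sample columns are hypothetical.

```python
import csv
import io


def csv_header(text):
    """Return the first row of CSV text as a list of column names."""
    return next(csv.reader(io.StringIO(text)))


def column_changes(old_header, new_header):
    """Return (added, removed) column names between two headers."""
    added = [column for column in new_header if column not in old_header]
    removed = [column for column in old_header if column not in new_header]
    return added, removed
```

If `column_changes` returns two empty lists, the set of columns is unchanged. The check compares names only and does not detect reordered columns or changed types.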

## Troubleshooting

<AccordionGroup>
  <Accordion title="Connection fails to validate">
    If your connection remains in "Processing" status or fails:

    * Verify the IAM permissions have been correctly applied to `connection@permutive.com`
    * Check that the bucket name and project ID are correct
    * Ensure the schema prefix exists and contains table directories

    **Solution:** Double-check your IAM settings in the Google Cloud Console and verify the bucket name matches exactly what you entered in Permutive.
  </Accordion>

  <Accordion title="Tables not appearing">
    If you don't see expected tables after creating the connection:

    * Verify your directory structure matches the required format
    * Check that data files exist under each table directory
    * Ensure the data format setting matches your actual file format

    **Solution:** Review your GCS bucket structure and ensure each table is a direct subdirectory of the schema prefix. After making changes, run a schema resync in Permutive to refresh the available tables.
  </Accordion>

  <Accordion title="Partition data not being extracted">
    If partition columns aren't appearing in your data:

    * Verify "All tables are partitioned" is selected in Data Partitioning
    * Check that partition directories use the correct Hive format (`column=value`)

    **Solution:** Update your connection settings or restructure your partition directories.
  </Accordion>

  <Accordion title="Where can I find the bucket name for a Permutive Provisioned Bucket?">
    The bucket name is generated upon connection creation. You can find the full GCS Bucket Name on the **Connection Details** page immediately after setup is complete.
  </Accordion>
</AccordionGroup>
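
Two of the field rules for a Customer Owned Bucket (bucket name without the `gs://` prefix, schema prefix without a leading slash) can be checked locally before you save the connection. A minimal sketch; the function name and messages are hypothetical, and passing this check does not guarantee that Permutive will validate the connection.

```python
def check_connection_fields(bucket_name, schema_prefix):
    """Return a list of problems found in the connection field values."""
    problems = []
    if bucket_name.startswith("gs://"):
        problems.append("bucket name should not include the gs:// prefix")
    if schema_prefix.startswith("/"):
        problems.append("schema prefix should not start with a slash")
    return problems
```

An empty list means neither rule is violated; for example, `check_connection_fields("gs://my-bucket-name", "/data/events")` reports both problems.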

## Next Steps

<CardGroup cols={2}>
  <Card title="Create an Import" icon="download" href="/products/connectivity/imports">
    Learn how to import data from your GCS connection
  </Card>

  <Card title="Back to Sources" icon="arrow-left" href="/products/connectivity/sources">
    Return to Sources overview
  </Card>
</CardGroup>
