Overview
This guide walks you through connecting your Google Cloud Storage (GCS) bucket to Permutive so you can import data for audience building and activation. Permutive offers two connection options: connecting to your own GCS bucket, or having Permutive provision a bucket for you.
Prerequisites:
- A Google Cloud Platform (GCP) account
- For customer-owned buckets: access to manage IAM permissions on your GCS bucket
- Your data organized in the required directory structure (see below)
Key Concepts
Before setting up your connection, familiarize yourself with these terms:
| Term | Permutive Context | GCS Context |
|---|---|---|
| Schema | A logical grouping of tables | A static prefix (folder) that contains multiple table subdirectories |
| Table | A single dataset within Permutive | A sub-prefix (subdirectory) under the schema prefix |
| Data File | Files that store the actual data | Files within the table prefix, often organized under Hive-style partitions |
| Bucket Root | N/A | The root name of your GCS bucket (e.g., my-bucket-name) |
| GCS Prefix | N/A | A path within your bucket where tables are located (e.g., data/events/) |
| Hive Partition | N/A | A directory structure used to segment data (e.g., date=2025-01-01) |
Step 1: Set Up Your Bucket Structure
Since GCS doesn’t have built-in schema or table concepts, your bucket must follow a specific prefix (folder) structure so Permutive can correctly infer and manage your data imports.
The Permutive platform maps one Connection to one Schema Prefix. To manage multiple logical schemas, you must create distinct prefixes and therefore distinct connections.
Schema Directory Structure
Structure your bucket with a single GCS prefix under which all tables reside, as illustrated in the example layout below.
Table Directory Structure
The structure within a table prefix depends on whether you enable Data Partitioning during connection setup:
- Partitioned (Recommended): Supports Hive-style partitioning, with single or multiple partition levels. Files are read only from the deepest partition level.
- Non-Partitioned: Data files sit directly under the table prefix.
We recommend partitioning all data where possible, especially for event or user activity tables, as it improves query performance and cost-efficiency. An illustrative layout follows.
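For illustration, a partitioned bucket prepared for Permutive might be laid out as follows (in a non-partitioned layout, the data files would sit directly under each table prefix instead). The bucket name, schema prefix, table names, partition values, and file names are all hypothetical:

```
gs://my-bucket-name/
  data/events/                      <- schema prefix (one Connection per schema prefix)
    pageviews/                      <- table with a single partition level
      date=2025-01-01/
        part-00000.parquet
    clicks/                         <- table with multiple partition levels
      date=2025-01-01/
        country=US/                 <- files are read only from the deepest partition level
          part-00000.parquet
```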
Data Format Recommendations
Parquet Format (Recommended)
We highly recommend using Parquet due to its columnar storage benefits, which significantly improve query performance and reduce storage size.
For Parquet files, we recommend using the ZSTD compression codec to maximize storage efficiency and speed up data processing.
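If you produce files with Python, the minimal sketch below, which assumes the pyarrow library and uses hypothetical column and file names, writes a Parquet data file with ZSTD compression:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical example data; the column names are illustrative only.
table = pa.table({
    "user_id": ["u1", "u2", "u3"],
    "page_url": ["/home", "/pricing", "/home"],
})

# Write a single Parquet data file using the recommended ZSTD codec.
pq.write_table(table, "part-00000.zstd.parquet", compression="zstd")
```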
CSV Format
We support:
- .csv (uncompressed CSV)
- .gz (gzipped CSV)
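If you export CSV instead, a minimal sketch like the following, using only the Python standard library and hypothetical column and file names, produces a gzipped .gz data file:

```python
import csv
import gzip

# Hypothetical example rows; the column names are illustrative only.
rows = [
    {"user_id": "u1", "page_url": "/home"},
    {"user_id": "u2", "page_url": "/pricing"},
]

# Write a gzipped CSV data file (.gz), one of the supported CSV variants.
with gzip.open("part-00000.csv.gz", "wt", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["user_id", "page_url"])
    writer.writeheader()
    writer.writerows(rows)
```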
All tables under a schema should use the same data format (either all CSV or all Parquet).
Step 2: Create the Connection
Permutive offers two connection options for GCS:
- Customer Owned Bucket
- Permutive Provisioned Bucket
Use the Customer Owned Bucket option to connect to an existing GCS bucket that you manage.
Configure Bucket Permissions
Before creating the connection, grant Permutive read access to your GCS bucket by assigning IAM roles to Permutive’s service account:
[email protected]
Required roles:
- roles/storage.objectViewer
- roles/storage.bucketViewer
Open IAM Settings
In the Google Cloud Console, navigate to your GCS bucket and open the Permissions tab.
Add Permutive Service Account
Click Grant Access and add
[email protected] as a principal.Assign Roles
Assign the
Storage Object Viewer and Storage Legacy Bucket Reader roles (or equivalent).
If you use fine-grained permissions or folder-level access, apply these permissions to the specific prefixes you intend to connect. If access is already granted for a parent prefix or the entire bucket, no further IAM changes are needed.
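If you prefer to script this step rather than use the console, the sketch below grants the two required roles with the google-cloud-storage Python client; the project and bucket names are hypothetical, and it assumes you have permission to modify the bucket's IAM policy:

```python
from google.cloud import storage

# Hypothetical values; replace with your own project and bucket.
PROJECT_ID = "my-gcp-project"
BUCKET_NAME = "my-bucket-name"
PERMUTIVE_SA = "serviceAccount:[email protected]"
# The roles listed under "Required roles" above.
ROLES = ["roles/storage.objectViewer", "roles/storage.bucketViewer"]

client = storage.Client(project=PROJECT_ID)
bucket = client.bucket(BUCKET_NAME)

# Fetch the bucket's current IAM policy (version 3).
policy = bucket.get_iam_policy(requested_policy_version=3)

# Add a binding for each required role and save the updated policy.
for role in ROLES:
    policy.bindings.append({"role": role, "members": {PERMUTIVE_SA}})
bucket.set_iam_policy(policy)
```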
Create the Connection in Permutive
Select GCS from the Catalog
In the Permutive dashboard, go to Connectivity > Catalog and select Google Cloud Storage. Choose Customer Owned Bucket.
Enter Connection Details
Fill in the following fields:
| Field | Description |
|---|---|
| Name | A descriptive name for your connection in Permutive |
| GCP Project ID | The GCP project that the bucket belongs to |
| GCS Bucket Region | Select from the available regions for your workspace |
| GCS Bucket Name | The full bucket name (without gs:// prefix) |
| Schema Prefix | The prefix within the bucket that acts as your schema (no leading slash) |
| Data Format | Choose Parquet (recommended) or CSV |
| Data Partitioning | Select whether all tables are partitioned or no tables are partitioned |
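For illustration, using the hypothetical bucket and prefix from the Key Concepts table, the GCS Bucket Name and Schema Prefix fields combine into the path Permutive reads from:

```
GCS Bucket Name:  my-bucket-name           (no gs:// prefix)
Schema Prefix:    data/events/             (no leading slash)
Path read:        gs://my-bucket-name/data/events/
```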
Step 3: Create an Import
Once your connection is active, you can create imports to bring data into Permutive.
For more details on configuring imports, see Imports.
Troubleshooting
Connection fails to validate
If your connection remains in “Processing” status or fails:
- Verify the IAM permissions have been correctly applied to [email protected]
- Check that the bucket name and project ID are correct
- Ensure the schema prefix exists and contains table directories
Tables not appearing
If you don’t see expected tables after creating the connection:
- Verify your directory structure matches the required format
- Check that data files exist under each table directory
- Ensure the data format setting matches your actual file format
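To check this from your side, the minimal sketch below, which assumes the google-cloud-storage Python client and hypothetical bucket and prefix values, lists the objects under your schema prefix so you can confirm the table directories and data files Permutive would see:

```python
from google.cloud import storage

# Hypothetical values; replace with your own bucket and schema prefix.
BUCKET_NAME = "my-bucket-name"
SCHEMA_PREFIX = "data/events/"

client = storage.Client()

# Print the first objects under the schema prefix so you can verify the
# expected table subdirectories and data files are present.
for blob in client.list_blobs(BUCKET_NAME, prefix=SCHEMA_PREFIX, max_results=100):
    print(blob.name)
```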
Partition data not being extracted
If partition columns aren’t appearing in your data:
- Verify “All tables are partitioned” is selected in Data Partitioning
- Check that partition directories use the correct Hive format (column=value)
Where can I find the bucket name for a Permutive Provisioned Bucket?
The bucket name is generated upon connection creation. You can find the full GCS Bucket Name on the Connection Details page immediately after setup is complete.