
Overview

This guide walks you through connecting your Google Cloud Storage (GCS) bucket to Permutive so you can import data for audience building and activation. Permutive offers two connection options: connecting to your own GCS bucket, or having Permutive provision a bucket for you.
Prerequisites:
  • A Google Cloud Platform (GCP) account
  • For customer-owned buckets: access to manage IAM permissions on your GCS bucket
  • Your data organized in the required directory structure (see below)

Key Concepts

Before setting up your connection, familiarize yourself with these terms:
  • Schema: in Permutive, a logical grouping of tables; in GCS, a static prefix (folder) that contains multiple table subdirectories.
  • Table: in Permutive, a single dataset; in GCS, a sub-prefix (subdirectory) under the schema prefix.
  • Data File: in Permutive, the files that store the actual data; in GCS, the files within the table prefix, often organized under Hive-style partitions.
  • Bucket Root (GCS only): the root name of your GCS bucket (e.g., my-bucket-name).
  • GCS Prefix (GCS only): a path within your bucket where tables are located (e.g., data/events/).
  • Hive Partition (GCS only): a directory structure used to segment data (e.g., date=2025-01-01).

Step 1: Set Up Your Bucket Structure

Since GCS doesn’t have built-in schema or table concepts, your bucket must follow a specific prefix (folder) structure so Permutive can correctly infer and manage your data imports.
The Permutive platform maps one Connection to one Schema Prefix. To manage multiple logical schemas, you must create distinct prefixes and therefore distinct connections.

Schema Directory Structure

Structure your bucket with a single GCS prefix under which all tables reside:
gs://<bucket_name>/<schema_prefix>/<table_1>/
gs://<bucket_name>/<schema_prefix>/<table_2>/
gs://<bucket_name>/<schema_prefix>/<table_n>/

Table Directory Structure

The structure within a table prefix depends on whether you enable Data Partitioning during connection setup.
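For illustration, here is a sketch of the two layouts, using a hypothetical table named page_views and a hypothetical date partition column:

Non-partitioned (data files sit directly under the table prefix):
gs://<bucket_name>/<schema_prefix>/page_views/part-0000.parquet
gs://<bucket_name>/<schema_prefix>/page_views/part-0001.parquet

Partitioned (data files sit under Hive-style column=value directories):
gs://<bucket_name>/<schema_prefix>/page_views/date=2025-01-01/part-0000.parquet
gs://<bucket_name>/<schema_prefix>/page_views/date=2025-01-02/part-0000.parquet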

Data Format Recommendations

We support:
  • Parquet
  • .csv (uncompressed CSV)
  • .gz (gzipped CSV)
For CSV files, especially with large datasets, gzipping is highly recommended to reduce storage costs and speed up processing.
All tables under a schema should use the same data format (either all Parquet or all CSV).
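As a quick example, assuming a local file named events.csv and the bucket and prefix placeholders used above, you could compress and upload it with the gcloud CLI:

gzip events.csv
gcloud storage cp events.csv.gz gs://<bucket_name>/<schema_prefix>/<table_1>/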

Step 2: Create the Connection

Permutive offers two connection options for GCS: a customer-owned bucket or a Permutive-provisioned bucket. The steps below cover the Customer Owned Bucket option, which connects to an existing GCS bucket that you manage.

Configure Bucket Permissions

Before creating the connection, grant Permutive read access to your GCS bucket by assigning IAM roles to Permutive’s service account: [email protected]
Required roles:
  • roles/storage.objectViewer
  • roles/storage.bucketViewer
  1. Open IAM Settings: In the Google Cloud Console, navigate to your GCS bucket and open the Permissions tab.
  2. Add the Permutive Service Account: Click Grant Access and add [email protected] as a principal.
  3. Assign Roles: Assign the Storage Object Viewer and Storage Legacy Bucket Reader roles (or equivalent).
  4. Save: Save the IAM policy changes.
If you use fine-grained permissions or folder-level access, apply these permissions to the specific prefixes you intend to connect. If access is already granted for a parent prefix or the entire bucket, no further IAM changes are needed.
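If you prefer the command line, here is a minimal sketch using the gcloud CLI. It assumes your bucket is named my-bucket-name; substitute the Permutive service account address listed above for PERMUTIVE_SERVICE_ACCOUNT.

gcloud storage buckets add-iam-policy-binding gs://my-bucket-name --member="serviceAccount:PERMUTIVE_SERVICE_ACCOUNT" --role="roles/storage.objectViewer"
gcloud storage buckets add-iam-policy-binding gs://my-bucket-name --member="serviceAccount:PERMUTIVE_SERVICE_ACCOUNT" --role="roles/storage.bucketViewer"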

Create the Connection in Permutive

  1. Select GCS from the Catalog: In the Permutive dashboard, go to Connectivity > Catalog and select Google Cloud Storage. Choose Customer Owned Bucket.
  2. Enter Connection Details: Fill in the following fields.
  • Name: a descriptive name for your connection in Permutive
  • GCP Project ID: the GCP project that the bucket belongs to
  • GCS Bucket Region: select from the available regions for your workspace
  • GCS Bucket Name: the full bucket name (without the gs:// prefix)
  • Schema Prefix: the prefix within the bucket that acts as your schema (no leading slash)
  • Data Format: choose Parquet (recommended) or CSV
  • Data Partitioning: select whether all tables are partitioned or no tables are partitioned
  3. Save the Connection: Click Save to create the connection. It will appear on your Connections page with a “Processing” status while Permutive validates access.

Step 3: Create an Import

Once your connection is active, you can create imports to bring data into Permutive.
  1. Navigate to Imports: Go to Connectivity > Imports and click Create Import.
  2. Configure the Import:
  1. Select Google Cloud Storage as the source type
  2. Select your GCS connection
  3. Choose the discovered schema (matches the prefix defined in your connection)
  4. Choose from the list of detected tables
  5. Continue with the standard import configuration
For more details on configuring imports, see Imports.

Limitations

Important limitations to be aware of:
  • Partitioning Standard: Only Hive-style partitioning is supported
  • Mixed Partitioning: Not supported within a single schema connection; all tables must be either partitioned or non-partitioned
  • Schema Evolution: Column changes (additions/removals) are not supported for GCS imports. If your column structure changes, you’ll need to create a new connection

Troubleshooting

If your connection remains in “Processing” status or fails:
  • Verify the IAM permissions have been correctly applied to [email protected]
  • Check that the bucket name and project ID are correct
  • Ensure the schema prefix exists and contains table directories
Solution: Double-check your IAM settings in the Google Cloud Console and verify the bucket name matches exactly what you entered in Permutive.
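One way to verify the IAM bindings from the command line, assuming the gcloud CLI and a bucket named my-bucket-name, is to print the bucket’s IAM policy and confirm the Permutive principal appears with the roles listed above:

gcloud storage buckets get-iam-policy gs://my-bucket-name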
If you don’t see expected tables after creating the connection:
  • Verify your directory structure matches the required format
  • Check that data files exist under each table directory
  • Ensure the data format setting matches your actual file format
Solution: Review your GCS bucket structure and ensure each table is a direct subdirectory of the schema prefix. After making changes, run a schema resync in Permutive to refresh the available tables.
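To spot-check the structure from the command line (assuming a bucket named my-bucket-name and a schema prefix of data/), list the schema prefix and one of its table prefixes:

gcloud storage ls gs://my-bucket-name/data/
gcloud storage ls gs://my-bucket-name/data/<table_name>/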
If partition columns aren’t appearing in your data:
  • Verify “All tables are partitioned” is selected in Data Partitioning
  • Check that partition directories use the correct Hive format (column=value)
Solution: Update your connection settings or restructure your partition directories.
For Permutive-provisioned buckets, the bucket name is generated when the connection is created; you can find the full GCS Bucket Name on the Connection Details page immediately after setup is complete.

Next Steps