Overview
This guide walks you through connecting your Google Cloud Storage (GCS) bucket to Permutive so you can import data for audience building and activation. Permutive offers two connection options: connecting to your own GCS bucket, or having Permutive provision a bucket for you.
Prerequisites:
- A Google Cloud Platform (GCP) account
- For customer-owned buckets: access to manage IAM permissions on your GCS bucket
- Your data organized in the required directory structure (see below)
Key Concepts
Before setting up your connection, familiarize yourself with these terms:
| Term | Permutive Context | GCS Context |
|---|---|---|
| Schema | A logical grouping of tables | A static prefix (folder) that contains multiple table subdirectories |
| Table | A single dataset within Permutive | A sub-prefix (subdirectory) under the schema prefix |
| Data File | Files that store the actual data | Files within the table prefix, often organized under Hive-style partitions |
| Bucket Root | N/A | The root name of your GCS bucket (e.g., my-bucket-name) |
| GCS Prefix | N/A | A path within your bucket where tables are located (e.g., data/events/) |
| Hive Partition | N/A | A directory structure used to segment data (e.g., date=2025-01-01) |
Step 1: Set Up Your Bucket Structure
Since GCS doesn’t have built-in schema or table concepts, your bucket must follow a specific prefix (folder) structure so Permutive can correctly infer and manage your data imports.
The Permutive platform maps one Connection to one Schema Prefix. To manage multiple logical schemas, you must create distinct prefixes and therefore distinct connections.
Schema Directory Structure
Structure your bucket with a single GCS prefix under which all tables reside, as illustrated in the example layout below.
Table Directory Structure
The structure within a table prefix depends on whether you enable Data Partitioning during connection setup:
- Partitioned (Recommended): Supports Hive-style partitioning, with single or multiple partition levels. Files are read only from the deepest partition level.
- Non-Partitioned: Data files sit directly under the table prefix.
We recommend partitioning all data where possible, especially for event or user activity tables, as it improves query performance and cost-efficiency. An illustrative layout follows.
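For illustration, a partitioned bucket prepared for Permutive might be laid out as follows (in a non-partitioned layout, the data files would sit directly under each table prefix instead). The bucket name, schema prefix, table names, partition values, and file names are all hypothetical:

```
gs://my-bucket-name/
  data/events/                      <- schema prefix (one Connection per schema prefix)
    pageviews/                      <- table with a single partition level
      date=2025-01-01/
        part-00000.parquet
    clicks/                         <- table with multiple partition levels
      date=2025-01-01/
        country=US/                 <- files are read only from the deepest partition level
          part-00000.parquet
```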
Data Format Recommendations
Parquet Format (Recommended)
We highly recommend using Parquet due to its columnar storage benefits, which significantly improve query performance and reduce storage size.
For Parquet files, we recommend using the ZSTD compression codec to maximize storage efficiency and speed up data processing.
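If you produce files with Python, the minimal sketch below, which assumes the pyarrow library and uses hypothetical column and file names, writes a Parquet data file with ZSTD compression:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical example data; the column names are illustrative only.
table = pa.table({
    "user_id": ["u1", "u2", "u3"],
    "page_url": ["/home", "/pricing", "/home"],
})

# Write a single Parquet data file using the recommended ZSTD codec.
pq.write_table(table, "part-00000.zstd.parquet", compression="zstd")
```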
CSV Format
We support:
- .csv (uncompressed CSV)
- .gz (gzipped CSV)
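If you export CSV instead, a minimal sketch like the following, using only the Python standard library and hypothetical column and file names, produces a gzipped .gz data file:

```python
import csv
import gzip

# Hypothetical example rows; the column names are illustrative only.
rows = [
    {"user_id": "u1", "page_url": "/home"},
    {"user_id": "u2", "page_url": "/pricing"},
]

# Write a gzipped CSV data file (.gz), one of the supported CSV variants.
with gzip.open("part-00000.csv.gz", "wt", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["user_id", "page_url"])
    writer.writeheader()
    writer.writerows(rows)
```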
All tables under a schema should use the same data format (either all CSV or all Parquet).
Step 2: Create the Connection
Permutive offers two connection options for GCS:
- Customer Owned Bucket
- Permutive Provisioned Bucket
Use the Customer Owned Bucket option to connect to an existing GCS bucket that you manage.
Configure Bucket Permissions
Before creating the connection, grant Permutive read access to your GCS bucket by assigning IAM roles to Permutive’s service account:
[email protected]
Required roles:
- roles/storage.objectViewer
- roles/storage.bucketViewer
Open IAM Settings
In the Google Cloud Console, navigate to your GCS bucket and open the Permissions tab.
Add Permutive Service Account
Click Grant Access and add
[email protected] as a principal.Assign Roles
Assign the
Storage Object Viewer and Storage Legacy Bucket Reader roles (or equivalent).
If you use fine-grained permissions or folder-level access, apply these permissions to the specific prefixes you intend to connect. If access is already granted for a parent prefix or the entire bucket, no further IAM changes are needed.
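If you prefer to script this step rather than use the console, the sketch below grants the two required roles with the google-cloud-storage Python client; the project and bucket names are hypothetical, and it assumes you have permission to modify the bucket's IAM policy:

```python
from google.cloud import storage

# Hypothetical values; replace with your own project and bucket.
PROJECT_ID = "my-gcp-project"
BUCKET_NAME = "my-bucket-name"
PERMUTIVE_SA = "serviceAccount:[email protected]"
# The roles listed under "Required roles" above.
ROLES = ["roles/storage.objectViewer", "roles/storage.bucketViewer"]

client = storage.Client(project=PROJECT_ID)
bucket = client.bucket(BUCKET_NAME)

# Fetch the bucket's current IAM policy (version 3).
policy = bucket.get_iam_policy(requested_policy_version=3)

# Add a binding for each required role and save the updated policy.
for role in ROLES:
    policy.bindings.append({"role": role, "members": {PERMUTIVE_SA}})
bucket.set_iam_policy(policy)
```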
Create the Connection in Permutive
Select GCS from the Catalog
In the Permutive dashboard, go to Connectivity > Catalog and select Google Cloud Storage. Choose Customer Owned Bucket.
Enter Connection Details
Fill in the following fields:
| Field | Description |
|---|---|
| Name | A descriptive name for your connection in Permutive |
| GCP Project ID | The GCP project that the bucket belongs to |
| GCS Bucket Region | Select from the available regions for your workspace |
| GCS Bucket Name | The full bucket name (without gs:// prefix) |
| Schema Prefix | The prefix within the bucket that acts as your schema (no leading slash) |
| Data Format | Choose Parquet (recommended) or CSV |
| Data Partitioning | Select whether all tables are partitioned or no tables are partitioned |
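For illustration, using the hypothetical bucket and prefix from the Key Concepts table, the GCS Bucket Name and Schema Prefix fields combine into the path Permutive reads from:

```
GCS Bucket Name:  my-bucket-name           (no gs:// prefix)
Schema Prefix:    data/events/             (no leading slash)
Path read:        gs://my-bucket-name/data/events/
```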
Step 3: Create an Import
Once your connection is active, you can create imports to bring data into Permutive.
For more details on configuring imports, see Imports.
Troubleshooting
Connection fails to validate
If your connection remains in “Processing” status or fails:
- Verify the IAM permissions have been correctly applied to [email protected]
- Check that the bucket name and project ID are correct
- Ensure the schema prefix exists and contains table directories
Tables not appearing
If you don’t see expected tables after creating the connection:
- Verify your directory structure matches the required format
- Check that data files exist under each table directory
- Ensure the data format setting matches your actual file format
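To check this from your side, the minimal sketch below, which assumes the google-cloud-storage Python client and hypothetical bucket and prefix values, lists the objects under your schema prefix so you can confirm the table directories and data files Permutive would see:

```python
from google.cloud import storage

# Hypothetical values; replace with your own bucket and schema prefix.
BUCKET_NAME = "my-bucket-name"
SCHEMA_PREFIX = "data/events/"

client = storage.Client()

# Print the first objects under the schema prefix so you can verify the
# expected table subdirectories and data files are present.
for blob in client.list_blobs(BUCKET_NAME, prefix=SCHEMA_PREFIX, max_results=100):
    print(blob.name)
```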
Partition data not being extracted
If partition columns aren’t appearing in your data:
- Verify “All tables are partitioned” is selected in Data Partitioning
- Check that partition directories use the correct Hive format (column=value)
Where can I find the bucket name for a Permutive Provisioned Bucket?
The bucket name is generated upon connection creation. You can find the full GCS Bucket Name on the Connection Details page immediately after setup is complete.