
Overview

Automate data uploads to Permutive using the GCS service account credentials provided when you created your import. This guide covers authenticating with those credentials and scripting uploads for automated data pipelines.
Prerequisites:
  • A GCS import already created in Permutive
  • Service account credentials (JSON key file) from your import settings
  • Your bucket path and folder structure
  • gsutil CLI or Google Cloud SDK installed

When to Use Programmatic Upload

Programmatic upload is ideal for:
  • Regular CRM syncs: Daily or weekly uploads of customer segments
  • Partner data feeds: Automated ingestion from data partners
  • Pipeline integration: Upload as part of ETL/ELT workflows
  • Scheduled jobs: Cron jobs, Airflow DAGs, or other schedulers
For one-time or testing uploads, consider manual upload instead.

Getting Your Credentials

Service account credentials are provided when you create a GCS import:
  1. Navigate to Connectivity > Imports in the Permutive Dashboard
  2. Select your import
  3. Download the service account JSON key file
  4. Note your bucket path
Keep credentials secure: The service account JSON file grants write access to your import bucket. Store it securely and never commit it to version control.
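If the key file sits inside a project repository, one simple precaution is to ignore it in git and reference it through an environment variable rather than hardcoding the path in scripts. A minimal sketch (the filename and location below are illustrative):
# Keep the key file out of version control (adjust the filename to match yours)
echo "service-account-key.json" >> .gitignore

# Restrict permissions and reference the key via an environment variable
chmod 600 /path/to/service-account-key.json
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"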

Setup Steps

Step 1: Install Google Cloud SDK

Install the Google Cloud SDK, which includes gsutil:
# macOS (using Homebrew)
brew install google-cloud-sdk

# Linux (Debian/Ubuntu; requires Google's Cloud SDK apt repository to be configured first)
sudo apt-get install google-cloud-sdk

# Or download from:
# https://cloud.google.com/sdk/docs/install
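To confirm the SDK installed correctly and both tools are on your PATH:
# Both commands should print version information
gcloud --version
gsutil version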

Step 2: Authenticate with Service Account

Activate the service account using your JSON key file:
gcloud auth activate-service-account --key-file=/path/to/service-account-key.json
Alternatively, for the Google Cloud client libraries (such as the Python example below), set the Application Default Credentials environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

Step 3: Verify Access

Test that you can list the bucket contents:
gsutil ls gs://your-permutive-bucket/your-import-path/
If this succeeds, you have proper access configured.
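In an automated pipeline you may want this check to fail loudly. A minimal sketch (bucket and path are placeholders; note that gsutil ls also returns an error if the path exists but contains no objects yet):
# Abort early if the import path cannot be listed
if ! gsutil ls gs://your-permutive-bucket/your-import-path/ > /dev/null 2>&1; then
    echo "ERROR: cannot access the Permutive import bucket" >&2
    exit 1
fi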

Step 4: Prepare Your Data File

Create your data file in the correct format and compress it:
# Create tab-separated file
cat > audience_data.tsv << 'EOF'
76E5F445-1993-4B13-A67A-76E5F4451993	0002,0007,0012
5E824DCF-2C6D-4A89-9B34-5E824DCF2C6D	0010,0011
A1B2C3D4-E5F6-7890-ABCD-EF1234567890	demo_25_34,intent_auto
EOF

# Compress with gzip
gzip audience_data.tsv
This creates audience_data.tsv.gz ready for upload.
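Before uploading, a quick sanity check that every row has exactly two tab-separated columns and that the archive is intact can catch malformed exports early. A minimal sketch using the filename from the example above:
# Count rows that do not have exactly two tab-separated fields
BAD_ROWS=$(gzip -dc audience_data.tsv.gz | awk -F '\t' 'NF != 2' | wc -l)
if [[ "$BAD_ROWS" -gt 0 ]]; then
    echo "ERROR: $BAD_ROWS malformed row(s) found" >&2
    exit 1
fi

# Verify the gzip archive is not corrupted
gzip -t audience_data.tsv.gz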

Step 5: Upload the File

Use gsutil cp to upload:
gsutil cp audience_data.tsv.gz gs://your-permutive-bucket/your-import-path/
To upload several files at once, add the -m flag for parallel transfers:
gsutil -m cp *.tsv.gz gs://your-permutive-bucket/your-import-path/
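To confirm an uploaded object landed as expected, gsutil stat prints its metadata and returns a non-zero exit status if the object is missing:
# Show size and metadata for the uploaded object
gsutil stat gs://your-permutive-bucket/your-import-path/audience_data.tsv.gz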

Code Examples

Bash Script

A reusable script for scheduled uploads:
#!/bin/bash
set -e

# Configuration
SERVICE_ACCOUNT_KEY="/path/to/service-account-key.json"
BUCKET_PATH="gs://your-permutive-bucket/your-import-path"
DATA_DIR="/path/to/data"

# Authenticate
gcloud auth activate-service-account --key-file="$SERVICE_ACCOUNT_KEY"

# Find today's data file
DATA_FILE="$DATA_DIR/audience_export_$(date +%Y%m%d).tsv"

# Compress if not already compressed
if [[ ! -f "${DATA_FILE}.gz" ]]; then
    gzip -k "$DATA_FILE"
fi

# Upload
gsutil cp "${DATA_FILE}.gz" "$BUCKET_PATH/"

echo "Upload complete: ${DATA_FILE}.gz"

Python Script

Using the Google Cloud Storage Python SDK:
from google.cloud import storage
import gzip
from datetime import datetime

def upload_audience_data(
    credentials_path: str,
    bucket_name: str,
    folder_path: str,
    data_file_path: str
):
    """Upload audience data to Permutive GCS bucket."""

    # Initialize client with service account
    client = storage.Client.from_service_account_json(credentials_path)
    bucket = client.bucket(bucket_name)

    # Generate destination filename with timestamp
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"audience_data_{timestamp}.tsv.gz"
    destination_path = f"{folder_path}/{filename}"

    # Compress and upload
    blob = bucket.blob(destination_path)

    with open(data_file_path, 'rb') as f_in:
        compressed_data = gzip.compress(f_in.read())

    blob.upload_from_string(
        compressed_data,
        content_type='application/gzip'
    )

    print(f"Uploaded to gs://{bucket_name}/{destination_path}")

# Example usage
if __name__ == "__main__":
    upload_audience_data(
        credentials_path="/path/to/service-account-key.json",
        bucket_name="your-permutive-bucket",
        folder_path="your-import-path",
        data_file_path="audience_export.tsv"
    )

Scheduling Uploads

Cron (Linux/macOS)

Add to your crontab for daily uploads at 2 AM:
0 2 * * * /path/to/upload_script.sh >> /var/log/permutive_upload.log 2>&1
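Cron runs jobs with a minimal environment, so it is often worth setting PATH and the credentials variable explicitly at the top of the crontab. A minimal sketch (paths are illustrative):
# Cron does not inherit your shell environment
PATH=/usr/local/bin:/usr/bin:/bin
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json

0 2 * * * /path/to/upload_script.sh >> /var/log/permutive_upload.log 2>&1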

Apache Airflow

Example DAG task:
from airflow.providers.google.cloud.transfers.local_to_gcs import LocalFilesystemToGCSOperator

upload_task = LocalFilesystemToGCSOperator(
    task_id='upload_audience_data',
    src='/path/to/audience_data.tsv.gz',
    dst='your-import-path/audience_data.tsv.gz',
    bucket='your-permutive-bucket',
    gcp_conn_id='permutive_gcs_connection',
)

Best Practices

  • Use descriptive filenames: Include dates or timestamps (e.g., audience_2024-01-15.tsv.gz)
  • Implement error handling: Check upload success and retry on transient failures (see the retry sketch after this list)
  • Log uploads: Maintain audit trails of what was uploaded and when
  • Validate before upload: Check file format and size before uploading
  • Monitor processing: Verify files are processed successfully in the Dashboard
  • Rotate credentials: Periodically rotate service account keys for security
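For the error-handling, logging, and retry points above, one minimal pattern is a bounded retry loop with backoff around the upload command (bucket, path, and retry counts are illustrative):
# Retry the upload up to 3 times with increasing backoff
FILE="audience_data.tsv.gz"
DEST="gs://your-permutive-bucket/your-import-path/"

for attempt in 1 2 3; do
    if gsutil cp "$FILE" "$DEST"; then
        echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) uploaded $FILE" >> permutive_upload_audit.log
        break
    fi
    if [[ "$attempt" -eq 3 ]]; then
        echo "Upload failed after $attempt attempts" >&2
        exit 1
    fi
    sleep $((attempt * 30))  # wait 30s, then 60s, before retrying
done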

Troubleshooting

  • Permission errors: The service account may lack write permissions to the bucket. Solution: Verify you're using the correct service account JSON file provided by Permutive. The credentials may have expired or been regenerated; check with your Permutive representative if needed.
  • gsutil not found: The Google Cloud SDK may not be installed or not in your PATH. Solution: Install the Google Cloud SDK and ensure gsutil is available in your PATH. Run gcloud components update to ensure you have the latest version.
  • Failed or interrupted uploads: Network timeouts or interruptions can cause large uploads to fail. Solution: Use gsutil -m when uploading multiple files in parallel. For very large files, consider splitting them into multiple smaller files (see the sketch below). You can force resumable uploads for all file sizes with gsutil -o GSUtil:resumable_threshold=0.
  • Files not processed: Files may not be in the correct format or may be missing taxonomy entries. Solution: Verify the file format (tab-separated, gzip compressed with a .gz extension). Check that all segment codes exist in your taxonomy. Monitor the import status in the Dashboard for processing errors.
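If very large files keep failing, one way to split them before upload (as suggested above) is with split, compressing each chunk separately. A minimal sketch; the chunk size and filenames are illustrative:
# Split into chunks of 1,000,000 rows, compress each, and upload in parallel
split -l 1000000 audience_data.tsv audience_data_part_
for part in audience_data_part_*; do
    mv "$part" "$part.tsv"
    gzip "$part.tsv"
done
gsutil -m cp audience_data_part_*.tsv.gz gs://your-permutive-bucket/your-import-path/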

Next Steps