
Overview

Classification Cohorts enable publishers to reach their non-endemic audience through data modeling. Publishers select a dataset containing categorized users (for example, by age range), and Permutive uses the publisher’s audience data to learn which behavioral patterns predict a user’s membership in a given category, such as the 18–35 age range. Classification Cohorts can be built using either partner data or first-party data, allowing publishers to scale valuable data from brands and data providers, or their own declared data, to make informed targeting decisions across their whole audience.

Why Use Classification Cohorts?

  • Reach your non-endemic audience: When publishers receive briefs from advertisers requesting non-endemic audiences, they need a way to prove they have these audiences and can reach them at sufficient scale. Building Classification Cohorts with partner data allows publishers to reach their non-endemic audience by identifying which of their users behave similarly to the users of their partners, such as brands or data providers. For example, an auto publisher may work with Rolex to identify which of its users are luxury watch intenders.
  • Scale your declared first-party data across your whole audience: With the availability of third-party data decreasing, more publishers are investing in capturing declared first-party data, such as user information collected during sign-up or survey responses. Building Classification Cohorts with first-party data allows publishers to scale their high-fidelity understanding of a subset of their audience to identify and reach users across their entire audience. For example, a food and drink publisher responding to a brief from a vegan food retailer may survey a small set of its users and use this dataset to classify users across its total audience by diet preference.

Concepts

Definitions

  • Seed Dataset: A set of users each assigned to one of a fixed set of categories, called labels. For example, diet information from a brand might consist of a set of hashed email addresses each assigned one label from vegan, vegetarian, carnivore, other.
  • Label: One of the categories that users in the seed dataset can belong to and that the model predicts. Each user is assigned exactly one label, since Classification Cohorts are designed for use cases that require mutual exclusivity.
  • Classification Model: A model trained on the seed dataset together with the publisher’s custom cohorts (excluding those using third-party data). Given the set of custom cohorts that a user belongs to, the model predicts the confidence with which the user belongs to each of the model’s categories (labels).
  • Confidence Threshold: When the model is evaluated for a user, it predicts whether the user should be assigned to label A, B, or C. Publishers can specify a confidence threshold so that users are only assigned to a label if that minimum confidence is met or exceeded. At higher thresholds, confidence in the accuracy of the classification increases but audience reach decreases. Conceptually, at 0% confidence, the model assigns a label to 100% of your users, but its predictions are tantamount to choosing a label at random. At 100% confidence, the model assigns labels only to users in the seed dataset and therefore provides a 0% increase in reach.
  • Classification Cohort: At each confidence threshold, publishers can create a Classification Cohort for each label in the model. A user falls into the Classification Cohort if the minimum threshold is met for that label.
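
The relationship between confidence threshold and reach can be illustrated with a minimal sketch. The confidence scores below are invented for illustration, and `reach_at_threshold` is a hypothetical helper rather than part of any Permutive API; it only counts the users whose confidence for one label meets or exceeds a threshold.

```python
from typing import Dict, List


def reach_at_threshold(scores: List[float], threshold: float) -> int:
    """Count users whose confidence for one label meets or exceeds the threshold."""
    return sum(1 for score in scores if score >= threshold)


# Hypothetical model confidences that six users belong to the label "vegan".
vegan_scores = [0.95, 0.82, 0.67, 0.55, 0.41, 0.12]

# Reach shrinks as the threshold rises: 5, 4, 2, 1 users.
reach_by_threshold: Dict[float, int] = {
    threshold: reach_at_threshold(vegan_scores, threshold)
    for threshold in (0.3, 0.5, 0.7, 0.9)
}

for threshold, reach in reach_by_threshold.items():
    print(f"threshold {threshold:.0%}: {reach} users")
```

Each threshold in this sketch would correspond to a separate Classification Cohort for the same label.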

Data Model

[Figure: Classification data model]
  • One Classification Model can be created for one partner Seed Dataset.
  • Many Classification Models can be created for one first-party Seed Dataset, parameterized by the choice of 2–5 custom cohorts for labels.
  • Many Classification Cohorts can be created for one Classification Model, parameterized by the confidence threshold and the model’s labels.
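
The one-to-many relationship between a model and its cohorts could be represented roughly as follows. This is an illustrative sketch only: the class names, fields, and the `create_cohort` method are assumptions made for the example and do not describe Permutive's internal schema or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClassificationCohort:
    label: str                   # one of the parent model's labels
    confidence_threshold: float  # expressed here as a fraction, 0.0 to 1.0


@dataclass
class ClassificationModel:
    seed_dataset: str            # hypothetical identifier for the seed dataset
    labels: List[str]
    cohorts: List[ClassificationCohort] = field(default_factory=list)

    def create_cohort(self, label: str, confidence_threshold: float) -> ClassificationCohort:
        """Add a cohort parameterized by one of the model's labels and a threshold."""
        if label not in self.labels:
            raise ValueError(f"unknown label: {label}")
        if not 0.0 <= confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be between 0 and 1")
        cohort = ClassificationCohort(label, confidence_threshold)
        self.cohorts.append(cohort)
        return cohort


# One model, many cohorts: same label at two thresholds, plus a second label.
model = ClassificationModel("diet-survey", ["vegan", "vegetarian", "carnivore", "other"])
model.create_cohort("vegan", 0.7)
model.create_cohort("vegan", 0.5)
model.create_cohort("vegetarian", 0.7)
```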

Workflows

Building a Classification Model

Publishers select a seed dataset, either partner data or first-party data, which is joined with the publisher’s custom cohort user data to form training data. During training, the Classification Model learns patterns in the cohorts that distinguish the different labels in the seed dataset.

[Figure: Building a classification model]
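
The join that forms the training data can be sketched as below. This assumes, purely for illustration, that both datasets are keyed by a shared user identifier; the function name, cohort names, and data shapes are hypothetical and do not reflect how Permutive stores or processes this data.

```python
from typing import Dict, List, Set, Tuple


def build_training_rows(
    seed_labels: Dict[str, str],
    cohort_membership: Dict[str, Set[str]],
) -> List[Tuple[Set[str], str]]:
    """Inner-join seed labels with custom cohort membership on user ID.

    Each row pairs the cohorts a user belongs to (the features)
    with the user's label from the seed dataset (the target).
    """
    rows = []
    for user_id, label in sorted(seed_labels.items()):
        cohorts = cohort_membership.get(user_id)
        if cohorts is not None:  # keep only users present in both datasets
            rows.append((cohorts, label))
    return rows


# Hypothetical seed dataset: user ID -> label.
seed_labels = {"u1": "vegan", "u2": "carnivore", "u3": "vegetarian"}

# Hypothetical custom cohort membership: user ID -> cohorts.
cohort_membership = {
    "u1": {"recipe_readers", "plant_based_section"},
    "u2": {"bbq_section"},
    "u4": {"sports_fans"},
}

# u3 has no cohort data and u4 has no label, so only u1 and u2 become rows.
training_rows = build_training_rows(seed_labels, cohort_membership)
```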

Deploying a Classification Cohort

A Classification Cohort is created from a Classification Model by first choosing a confidence threshold and then selecting the label for which the publisher would like to create a Classification Cohort.

[Figure: Deploying a classification model]

Activating a Classification Cohort

Once a Classification Cohort is deployed, the Permutive SDK manages the evaluation of users arriving on your sites and apps against the Classification Cohort to determine whether they meet the minimum threshold to fall into the label. To evaluate a user, the model uses the set of custom cohorts the user belongs to and predicts a level of confidence that the user falls into the label. If the confidence meets or exceeds the threshold for the Classification Cohort, then the user is deemed to fall into the cohort.

[Figure: Activating a classification cohort]
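
The evaluation step described above amounts to a comparison between a predicted confidence and the cohort's threshold. The sketch below is not the Permutive SDK: `falls_into_cohort` is a hypothetical function, and `toy_model` is a stand-in that returns made-up confidences instead of a trained classifier.

```python
from typing import Callable, Dict, Set

# Assumed interface: a model maps a user's custom cohorts to a confidence per label.
Model = Callable[[Set[str]], Dict[str, float]]


def falls_into_cohort(
    model: Model, user_cohorts: Set[str], label: str, threshold: float
) -> bool:
    """Return True if the predicted confidence for `label` meets or exceeds `threshold`."""
    confidence = model(user_cohorts).get(label, 0.0)
    return confidence >= threshold


def toy_model(user_cohorts: Set[str]) -> Dict[str, float]:
    """Stand-in model with fixed, invented scores (not a trained classifier)."""
    if "plant_based_section" in user_cohorts:
        return {"vegan": 0.8, "vegetarian": 0.15, "carnivore": 0.05}
    return {"vegan": 0.1, "vegetarian": 0.2, "carnivore": 0.7}


# The same user can qualify for a 70% "vegan" cohort but not a 90% one.
reader = {"recipe_readers", "plant_based_section"}
in_vegan_70 = falls_into_cohort(toy_model, reader, "vegan", 0.70)
in_vegan_90 = falls_into_cohort(toy_model, reader, "vegan", 0.90)
```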

Guides

Step-by-step instructions for working with Classification Cohorts.

Troubleshooting

If your Classification Model remains in “In Progress” status for an extended period (more than 12 hours), this is typically caused by one of the following issues:
  • Insufficient users in seed cohorts: The model encountered issues because the cohorts had no users when they were initially created. This commonly occurs when cohorts and models are created on the same day, resulting in insufficient training data.
  • Low unique users: The seed dataset does not have enough unique users to train a reliable model. Each Custom Cohort used as a label must have at least 1,000 users.
Solution:
  • Wait 24–48 hours after creating your seed cohorts to ensure they have accumulated sufficient users before creating a Classification Model
  • Verify that each cohort in your seed dataset has at least 1,000 users
  • If the issue persists after 12 hours, contact Support for assistance

If your deployed Classification Cohort shows lower than expected reach or accuracy, this may be due to the confidence threshold setting:
  • Confidence threshold too high: Setting a very high confidence threshold (e.g., 90%+) will increase accuracy but significantly reduce reach, as fewer users will meet the minimum confidence requirement.
  • Confidence threshold too low: Setting a very low confidence threshold (e.g., below 50%) will increase reach but may reduce accuracy, as the model will classify more users with lower certainty.
Solution:
  • Review the confidence threshold for your Classification Cohort
  • Adjust the threshold to balance reach and accuracy for your use case
  • Create multiple cohorts from the same model at different confidence thresholds to test performance
  • Remember: At 0% confidence, the model assigns labels to 100% of users but predictions are essentially random; at 100% confidence, only users in the seed dataset are classified

If your Classification Cohort is not appearing in your ad server or activation destination, verify the following:
  • Activation not enabled: The cohort may not be enabled for activation to your desired destination.
  • SDK rebuild required: After creating a Classification Cohort, the Permutive SDK may need to be rebuilt to include the classification models addon.
  • Unsupported platform: Classification Cohorts are only supported on Web, iOS, and Android platforms. They are not currently available for CTV or API Direct.
Solution:
  • Verify the cohort is enabled for activation in the dashboard
  • Check that the activation destination (Google Ad Manager or Xandr) is properly configured
  • Allow time for the SDK to rebuild and deploy (this can take several minutes)
  • Confirm the platform where you’re attempting activation is supported

Environment Compatibility

Core Product

Classification Cohorts functionality is supported on the following platforms:
| Functionality | Web | iOS | Android | CTV | API Direct |
|---|---|---|---|---|---|
| Activation | Yes | Yes | Yes | No | No |
| Live audience size | No | No | No | No | No |

Insights

| Functionality | Web | iOS | Android | CTV | API Direct |
|---|---|---|---|---|---|
| Audience Insights | Yes | Yes | Yes | No | No |
| Campaign Insights | Yes | Yes | Yes | No | No |

Activation

Classification Cohorts can be activated against the following destinations:
  • Google Ad Manager (GAM)
  • Xandr (AppNexus)

Dependencies

Classification Cohorts rely on the following products being configured for your organization and workspace.
| Dependency | Required | Description |
|---|---|---|
| Identity Graph | Conditional | When using partner data, there must be a common identity configured in Identity Graph between you and the partner. |
| Custom Cohorts | Yes | Custom Cohorts (excluding those using third-party data) are used for training and predicting Classification Cohorts. |

Limits

Classification Cohorts adhere to the following product SLAs.

Feature Limits

| Feature | Description | Limit |
|---|---|---|
| Number of labels for first-party seed data | The number of Custom Cohorts that can be used as labels for each model. | 2–5 |

Performance Limits

| Metric | Description | Limit |
|---|---|---|
| Time to train a Classification Model | Once created in the dashboard, Classification Models may take up to 12 hours to train before they are ready to be used. | Up to 12 hours |
| Minimum number of users per Custom Cohort | For first-party seed datasets, a Custom Cohort must have at least 1,000 users to be used as a model label. | 1,000 users |

Usage Limits

| SKU | Description | Limit |
|---|---|---|
| Classification Models | Maximum number of active models each workspace can run at any time. | 10 models per workspace |
| Classification Cohorts | Maximum number of active cohorts each workspace can run at any time. | 30 cohorts per workspace |
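
Before creating a first-party Classification Model, the limits above can be checked with a small pre-flight function. This is a sketch under stated assumptions: the limit values are taken from the tables in this section, while the function name, its inputs, and the cohort names are hypothetical and not part of any Permutive API.

```python
from typing import Dict, List

# Limits as documented in this section.
MIN_LABELS, MAX_LABELS = 2, 5
MIN_USERS_PER_COHORT = 1_000
MAX_ACTIVE_MODELS = 10


def check_first_party_model(
    label_cohort_sizes: Dict[str, int], active_models: int
) -> List[str]:
    """Return a list of limit violations; an empty list means the plan fits the limits."""
    problems = []
    label_count = len(label_cohort_sizes)
    if not MIN_LABELS <= label_count <= MAX_LABELS:
        problems.append(
            f"{label_count} labels selected; expected {MIN_LABELS} to {MAX_LABELS}"
        )
    for cohort_name, size in label_cohort_sizes.items():
        if size < MIN_USERS_PER_COHORT:
            problems.append(
                f"cohort '{cohort_name}' has {size} users; needs {MIN_USERS_PER_COHORT}"
            )
    if active_models >= MAX_ACTIVE_MODELS:
        problems.append(
            f"workspace already has {active_models} active models (limit {MAX_ACTIVE_MODELS})"
        )
    return problems


# Hypothetical plan: three label cohorts, one of them still too small.
planned_labels = {"vegan": 4_200, "vegetarian": 1_350, "carnivore": 640}
issues = check_first_party_model(planned_labels, active_models=3)
for issue in issues:
    print(issue)
```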

FAQ

Classification Models can take up to 12 hours to train after being created in the dashboard. The exact training time depends on the size of your seed dataset and the complexity of the model.
Once training is complete, the model will be automatically deployed and available for creating Classification Cohorts. You will be notified when training completes.

Partner data comes from external data providers (such as Fluent or OnAudience) and consists of a fixed, system-wide set of labels. Partner data allows you to reach your non-endemic audience by identifying which of your users have similar behavior to those in the partner dataset. For example, an auto publisher might use partner data from a luxury brand to identify users who are luxury watch intenders.
First-party data comes from your own sources, such as declared data collected during sign-up or survey responses. With first-party data, you select 2–5 of your own Custom Cohorts to use as labels for the model. This allows you to scale your high-fidelity understanding of a subset of your audience to identify and reach users across your entire audience.

Third-party data cannot be used to train Classification Models. Models are trained using your Custom Cohorts, excluding those using third-party data. This is intentional, as the goal of Classification Models is to help you scale your own first-party data or leverage partner datasets, not to amplify third-party data.

The confidence threshold determines the trade-off between accuracy and reach:
  • Higher thresholds (70–90%): More accurate predictions but lower reach. Use when precision is critical.
  • Medium thresholds (50–70%): Balanced accuracy and reach. Good starting point for most use cases.
  • Lower thresholds (30–50%): Greater reach but lower confidence in predictions. Use when maximizing audience size is important.
You can create multiple Classification Cohorts from the same model at different confidence thresholds to test which performs best for your specific use case. Monitor performance through Audience Insights and Campaign Insights to optimize your threshold selection.

Each workspace is limited to:
  • 10 active Classification Models at any time
  • 30 active Classification Cohorts at any time
These limits help ensure optimal performance across the platform. If you need to create new models or cohorts and have reached the limit, you’ll need to delete existing ones first.

For first-party seed datasets, each Custom Cohort used as a model label must have at least 1,000 users to ensure the model has sufficient training data. Models trained on cohorts with fewer users may fail to train or produce unreliable predictions.
If your cohorts don’t yet have 1,000 users, wait for your audience to grow before creating the Classification Model. You can check cohort sizes in the dashboard before starting model training.

Classification Cohorts can be activated to:
  • Google Ad Manager (GAM)
  • Xandr (AppNexus)
Classification Cohorts are supported on Web, iOS, and Android platforms. They are not currently available for CTV or API Direct environments.
For more information about activation destinations, see the Activation documentation.

Once a Classification Model has been created and trained, it cannot be edited. If you need to make changes to the seed dataset or labels, you must create a new Classification Model.
However, you can create multiple Classification Cohorts from the same trained model by adjusting the confidence threshold and selecting different labels, without needing to retrain the model.

Changelog

For detailed changelog information, visit our Changelog.