Behind-the-Scenes: Real-Time Segments with Blueshift


(Here is a behind-the-scenes look at the segmentation engine that powers Programmatic CRM.)

Real-time segmentation matters: customers expect messages based on their most recent activity. They do not want reminders for products they have already purchased, or messages based on transient past behavior that is no longer relevant.

However, real-time segmentation is hard: it means processing large volumes of behavioral data quickly, which calls for a technology stack that can:

  • Process event & user attributes immediately, as they occur on your website or mobile apps
  • Track 360-degree customer profiles and deal with data fragmentation challenges
  • Scale the underlying data stores to process billions of customer actions and support high write and read throughput
  • Avoid time-consuming data-modeling steps that require human curation and slow down onboarding

Marketers use Blueshift to reach each customer as a segment-of-one and to deliver highly personalized messages across every marketing channel using Blueshift’s Customer Data Activation Platform capabilities. Unlike legacy systems, segments in Blueshift are always fresh and updated in real time, enabling marketers to respond to the perpetually connected customer in a timely manner. Marketers use the intuitive, easy-to-use segmentation builder to define their own custom segments by mixing and matching filters across numerous dimensions, including behavioral event data, demographic attributes, predictive scores, lifetime aggregates, catalog interactions, CRM attributes and channel engagement metrics.

[Screenshot: Edit segment]

Behind the scenes, Blueshift builds a continually changing graph of users and catalog items. The edges in the graph come from users’ behavior (or implied behavior); we call this the “interaction graph”. Machine-learning models further enrich the interaction graph by adding predicted edges and scores (if you liked item X, you may also like item Y), and 3rd-party data sources expand user attributes (for example, given the first name “John”, we can infer with reasonable confidence that the user is male).
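To make the idea concrete, here is a minimal Python sketch of a user-to-item graph that keeps observed edges (from behavior) separate from predicted edges (from a model). The class and method names (`InteractionGraph`, `record_interaction`, `add_prediction`) are hypothetical, and the in-memory dictionaries stand in for whatever storage Blueshift actually uses.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    item: str
    kind: str      # "observed" (from user behavior) or "predicted" (from a model)
    weight: float  # interaction count for observed edges, model score for predicted ones


class InteractionGraph:
    """Hypothetical toy user-to-item graph; not Blueshift's storage format."""

    def __init__(self) -> None:
        # user -> {(item, kind): Edge}
        self._edges: dict[str, dict[tuple[str, str], Edge]] = defaultdict(dict)

    def record_interaction(self, user: str, item: str) -> None:
        # Each observed behavior (view, purchase, ...) strengthens the observed edge.
        key = (item, "observed")
        old = self._edges[user].get(key)
        weight = (old.weight if old else 0.0) + 1.0
        self._edges[user][key] = Edge(item, "observed", weight)

    def add_prediction(self, user: str, item: str, score: float) -> None:
        # A recommendation model adds (or overwrites) a predicted edge with its score.
        self._edges[user][(item, "predicted")] = Edge(item, "predicted", score)

    def neighbors(self, user: str, kind: str | None = None) -> list[Edge]:
        # All edges for a user, optionally restricted to one edge kind.
        edges = self._edges.get(user, {}).values()
        return [e for e in edges if kind is None or e.kind == kind]
```

Keeping the edge kind explicit lets a segment filter ask for "viewed X" (observed) separately from "likely to like X" (predicted).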

[Figure: Blueshift interaction graph]

The segment service can run complex queries against the interaction graph, such as “Female users who viewed ‘Handbags’ over $500 in the last 90 days, with lifetime purchases over $1,000, who have not used the mobile apps recently and have a high churn probability”, and return the matching users within a few seconds to a couple of minutes.
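The example query can be read as a conjunction of filters over user profiles. The sketch below expresses it as composable Python predicates and evaluates them with a brute-force scan. The profile fields, the 30-day reading of "recently" and the 0.7 churn threshold are assumptions for illustration; a production engine would use indexes rather than scanning every profile.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class View:
    category: str
    price: float
    at: datetime


@dataclass
class Profile:
    # Hypothetical profile fields needed by the example query.
    user_id: str
    gender: str | None
    lifetime_purchases: float
    last_app_session: datetime | None
    churn_probability: float
    views: list[View] = field(default_factory=list)


Predicate = Callable[[Profile], bool]


def all_of(*preds: Predicate) -> Predicate:
    """A segment matches only if every filter matches (logical AND)."""
    return lambda p: all(pred(p) for pred in preds)


def viewed(category: str, min_price: float, within: timedelta, now: datetime) -> Predicate:
    cutoff = now - within
    return lambda p: any(
        v.category == category and v.price > min_price and v.at >= cutoff
        for v in p.views
    )


def no_app_use_since(within: timedelta, now: datetime) -> Predicate:
    cutoff = now - within
    return lambda p: p.last_app_session is None or p.last_app_session < cutoff


def handbag_segment(now: datetime) -> Predicate:
    return all_of(
        lambda p: p.gender == "female",
        viewed("Handbags", 500, timedelta(days=90), now),
        lambda p: p.lifetime_purchases > 1000,
        no_app_use_since(timedelta(days=30), now),  # assumption: "recently" = 30 days
        lambda p: p.churn_probability >= 0.7,       # assumption: "high" = 0.7 or more
    )


def run_segment(profiles: list[Profile], segment: Predicate) -> list[str]:
    # Brute-force scan, for clarity only.
    return [p.user_id for p in profiles if segment(p)]
```

Because each filter is a plain function, the segmentation builder's "mix and match" maps naturally onto composing predicates with `all_of`.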

360-degree user profiles

For every user on your site or mobile app, Blueshift creates a user profile that tracks anonymous behavior and merges it with the user’s logged-in activity across devices. These rich profiles combine CRM data, aggregate lifetime statistics, catalog-related activity, predictive attributes, campaign & channel activity, and website & mobile app activity. The unified profiles form the basis for segmentation: a segment query matches these 360-degree profiles against the segment definition to identify the target set of users.
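Profile merging of this kind is often called identity stitching. A minimal sketch, assuming a device ID identifies anonymous activity and a customer ID appears on login (both hypothetical field names), could fold the anonymous history into the known profile the first time the two are seen together:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UserProfile:
    profile_id: str
    device_ids: set[str] = field(default_factory=set)
    events: list[dict] = field(default_factory=list)
    attributes: dict[str, object] = field(default_factory=dict)


class ProfileStore:
    """Hypothetical identity-stitching store keyed by device ID and customer ID."""

    def __init__(self) -> None:
        self._by_device: dict[str, UserProfile] = {}
        self._by_customer: dict[str, UserProfile] = {}

    def track(self, device_id: str, event: dict, customer_id: str | None = None) -> UserProfile:
        profile = self._resolve(device_id, customer_id)
        profile.events.append(event)
        return profile

    def _resolve(self, device_id: str, customer_id: str | None) -> UserProfile:
        current = self._by_device.get(device_id)
        if customer_id is None:
            # Anonymous activity: reuse whatever profile this device already maps to.
            if current is None:
                current = UserProfile(profile_id=f"anon:{device_id}", device_ids={device_id})
                self._by_device[device_id] = current
            return current

        known = self._by_customer.get(customer_id)
        if known is None:
            known = UserProfile(profile_id=f"customer:{customer_id}")
            self._by_customer[customer_id] = known

        # Merge only anonymous profiles, never another customer's profile.
        if current is not None and current is not known and current.profile_id.startswith("anon:"):
            # Fold the anonymous history into the logged-in profile
            # (re-sorting events by timestamp is omitted for brevity).
            known.events.extend(current.events)
            for key, value in current.attributes.items():
                known.attributes.setdefault(key, value)
            known.device_ids |= current.device_ids
            for d in current.device_ids:
                self._by_device[d] = known

        known.device_ids.add(device_id)
        self._by_device[device_id] = known
        return known
```

After the merge, later anonymous events from any of the user's devices resolve straight to the logged-in profile, so segment queries see one unified history.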

[Screenshot: Real-time segments]

Multiple data stores (no one store to rule them all)

The segmentation engine is powered by several different data stores. Every user action or attribute that hits the event API is replicated across them: time-series stores for events, a relational database for metadata, in-memory stores for aggregated data & counters, key-value stores for user lookups, and a reverse index for quickly searching any event or user attribute. The engine is tuned to resolve complex segment definitions quickly, whereas a general-purpose SQL-style database could take hours to return results for the same query because of joins across tables. The segmentation engine draws on all of these stores to pull the set of target users that match the segment definition.
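One way to picture the replication step is a fan-out writer that sends each event to several specialized stores, plus a reverse index whose posting sets can be intersected instead of joining tables. Everything below is a hypothetical in-memory stand-in (plain dicts and sets, with no relational metadata store), not Blueshift's storage layer:

```python
from collections import defaultdict


class FanOutWriter:
    """Hypothetical in-memory stand-ins for specialized stores."""

    def __init__(self) -> None:
        self.timeseries = defaultdict(list)    # user_id -> [(ts, event_name, attrs)]
        self.counters = defaultdict(int)       # (user_id, event_name) -> count
        self.kv = {}                           # user_id -> latest attribute values seen
        self.reverse_index = defaultdict(set)  # (attr, value) -> {user_id, ...}

    def write(self, user_id: str, ts: int, event_name: str, attrs: dict) -> None:
        # Replicate one incoming event to every store.
        self.timeseries[user_id].append((ts, event_name, attrs))
        self.counters[(user_id, event_name)] += 1
        self.kv.setdefault(user_id, {}).update(attrs)
        self.reverse_index[("event", event_name)].add(user_id)
        for key, value in attrs.items():
            self.reverse_index[(key, value)].add(user_id)

    def users_matching(self, **conditions) -> set:
        # Intersect posting sets instead of joining tables.
        postings = [self.reverse_index.get((k, v), set()) for k, v in conditions.items()]
        return set.intersection(*postings) if postings else set()
```

The intersection is coarse: `users_matching(event="purchase", category="Handbags")` finds users who purchased something and, separately, touched Handbags. Treating the result as a candidate set and confirming the exact condition against the time-series store is one way to keep lookups fast without losing precision.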

Real-time event processing

Websites & mobile apps send data to Blueshift’s event APIs via SDKs and tag managers. API endpoints receive the events and write them to in-memory queues. The queues are processed continuously and in order, and updates are made across the multiple data stores described above, so user profiles and event attributes stay current with the incoming event stream. Campaigns pull audience data just in time for messaging, which keeps segments continuously updated and always fresh. Marketers do not have to worry about out-of-date segments, and they avoid the “list pull hell” of data-warehouse-style segmentation.
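The pipeline described above can be sketched with Python's standard `queue` and `threading` modules: an ingest call that only enqueues, a single consumer that applies events in arrival order, and an audience lookup evaluated at send time. The class name, event shape (`user_id`, `name`, `attributes`) and profile fields are assumptions for illustration:

```python
import queue
import threading


class EventPipeline:
    """Hypothetical ingest -> in-order consumer -> just-in-time audience lookup."""

    def __init__(self) -> None:
        self._queue = queue.Queue()  # FIFO, so events keep their arrival order
        self._profiles = {}          # user_id -> profile dict
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._consume, daemon=True)
        self._worker.start()

    def ingest(self, event: dict) -> None:
        """API endpoint stand-in: enqueue and return immediately."""
        self._queue.put(event)

    def _consume(self) -> None:
        # A single consumer applies events one at a time, in arrival order.
        while True:
            event = self._queue.get()
            if event is None:  # sentinel posted by close()
                break
            with self._lock:
                profile = self._profiles.setdefault(event["user_id"], {"event_count": 0})
                profile.update(event.get("attributes", {}))
                profile["event_count"] += 1
                profile["last_event"] = event["name"]

    def close(self) -> None:
        """Process everything already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()

    def audience(self, segment) -> list:
        """Evaluate the segment against current profile state at send time."""
        with self._lock:
            return sorted(uid for uid, p in self._profiles.items() if segment(p))
```

A single consumer is the simplest way to guarantee ordering; scaling out would typically mean partitioning the queue by user so that each user's events are still applied in order.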

Dynamic attribute binding

The segmentation engine further simplifies onboarding of user and event attributes by removing the need to model (or declare) attribute types ahead of time. Instead, it infers the type of each new attribute in real time from sample values. For instance, an attribute called “loyalty_points” with a value of “450” would be interpreted as a number (and show numeric operators for segmentation); an attribute like “membership_level” with a value of “gold” would be interpreted as a string (and show string comparison operators); and an attribute like “redemption_at” with a value like “2016-09-23” would be interpreted as a timestamp (and show relative time operators).
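A minimal sketch of dynamic type inference, assuming values arrive as strings and the engine samples a handful of them per attribute. The number regex, the use of ISO-8601 dates via `datetime.fromisoformat`, the majority vote and the operator labels are all illustrative assumptions, not Blueshift's actual rules:

```python
import re
from collections import Counter
from datetime import datetime

_NUMBER = re.compile(r"^-?\d+(\.\d+)?$")

# Illustrative operator labels per inferred type (hypothetical UI text).
OPERATORS = {
    "number": ["=", "!=", ">", ">=", "<", "<="],
    "timestamp": ["within last N days", "more than N days ago", "between"],
    "string": ["equals", "not equals", "contains", "starts with"],
}


def infer_value_type(raw: str) -> str:
    """Classify one sample value as number, timestamp or string."""
    value = raw.strip()
    if _NUMBER.match(value):
        return "number"
    try:
        datetime.fromisoformat(value)  # accepts e.g. "2016-09-23"
        return "timestamp"
    except ValueError:
        return "string"


def infer_attribute_type(samples) -> str:
    """Majority vote over sampled values; ties and empty samples fall back to string."""
    votes = Counter(infer_value_type(str(s)) for s in samples)
    if not votes:
        return "string"
    (top, count), *rest = votes.most_common()
    if rest and rest[0][1] == count:
        return "string"
    return top
```

Falling back to string on ties is a conservative choice: string operators still work on any value, whereas a wrong numeric guess would hide the right comparisons.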

Several Blueshift customers have thousands of CRM & event attributes and use them without any data modeling or upfront declarations, saving the many days it would take to implement data schemas in a SQL-based system.

The combination of 360-degree user profiles, real-time event processing, multiple specialized data stores and dynamic attribute binding empowers marketers to create always-fresh, continuously updated segments.