Technical challenges behind Streamhub’s Audience Segment Builder

Reading Time: 10 minutes



During the summer of 2020, we decided to build Streamhub’s audience segment builder, which is a key component of our latest product module – Activate – currently available for beta users.

In the process of doing so, we faced some interesting challenges, which constitute the inspiration for this blog post.

But, wait … First, what is an audience segment builder?

Let’s assume a hypothetical customer that has programs and advertisement logs. This customer might also have CRM data, QoS or Panel data. To simplify, let’s say the CRM data includes fields like gender, age, email, and subscription model for each user in that customer’s user base.

Now, let’s say this customer wants to target users who are male, are on a premium or unlimited devices subscription, and have watched “Prison Break” for more than 20 hours in the last month.

With the audience segment builder, we can select dimensions from the datasets that the customer exposes and visually build a query made of groups and nested sub-groups. The query can include program dimension conditions (here, `have watched “Prison Break”`), user-based conditions (here, `male users on Premium or unlimited devices subscription`), metric conditions (here, `timeWatched > 20h`) and time-based conditions (here, `in the last month`).

From the system perspective, the output of the segment builder is a JSON payload that represents an Activate query. From the customer perspective, the output is a list of user identifiers in 3 possible categories: cookies, device identifiers (AAID, IDFA), and PPIDs (Publisher Provided Identifiers).

This is what the audience segment builder user interface looks like nowadays.


Figure 1: defining a segment in the Activate audience segment builder

The very first internal iteration of the audience segment builder UI looked like this:


Figure 2: Proof-of-Concept UI for the audience segment builder

Just by looking at the 2 screenshots, you can see that quite a few changes were made! But what changed, and why?

Simplified and intuitive UX is the key for an audience segment builder

Let’s get this out of the way: the first internal version of our audience segment builder was not very user-friendly. That much is obvious from the 2nd screenshot, just by looking at the colours or at the way the tool is meant to be used: the user first has to select one of the 3 operators “All”, “Any” and “None”, and only then configure the values for a particular condition.

Frontend developers among our readers might already have spotted that the Angular Tree component is in use here. The whole experience was built upon the capabilities of the Angular Tree component. Using a tree component was very helpful for developing the PoC quickly. However, it came at a price: a ridiculously poor user experience.

Indeed, consider a query like:

Select all the users who have watched “The Crown” and are either female or aged between 18 and 24.

To express it with the old version, the customer would need to:

Select the All operator on the root element
Add a condition node below it and configure it as the “have watched ‘The Crown’” condition
Create a sub-group with the Any operator
Add 2 conditions within the sub-group: a. “Gender is female” / b. “Age between 18 and 24”

While this way of thinking might be natural for a Lisp programmer, it’s not intuitive for a regular user.

So the first and main motivation for a new UI and UX was to enable a more natural, linear experience that matches how most users think: start with a condition (“users that have watched ‘The Crown’”) and add an “And” operator if, and only if, a second condition is required. Other factors that led us to rewrite this component included a complete redesign and re-theming of the UI, as well as the need to incorporate new features without being limited by the Angular Tree component or any other ready-made 3rd-party component.

With the problem and objective set, let’s crack on!

The design of an Audience Segment Builder

From a technical point of view, we knew that the new component would need to satisfy these requirements:

It needed to manipulate a hierarchical in-memory tree-like data structure. We will refer to it as the Query Tree for the rest of the post.
The query tree would be composed of groups, sub-groups, standalone conditions, and operators.
Within a group, the elements could be conditions, operators or sub-groups. Sub-groups could contain other sub-groups and therefore allow for any level of nesting.
Different types of conditions would require a different type of Angular component, meaning that the component would need to be dynamically instantiated depending on the context. To illustrate this, let’s consider:
- a condition like “Device type is mobile, desktop, tv” can be configured with a visual component that requires 2 lists:
  - The list on the left will contain all the possible device values while the list on the right will contain the user-selected values.
- On the other hand, a condition like “The average time watched is greater than 200 minutes” will require a different type of UI, with inputs that can accept numerical values.
- Another case could be a condition like “users having watched ‘Breaking Bad Season 1 Ep. 3‘” which would require an autocomplete search component.

The system would need to keep queries consistent and unambiguous. To illustrate:
- If a user linearly adds condition1 and condition2 or condition3 within a single group, we don’t know whether the brackets should be placed around (condition1 and condition2) or around (condition2 or condition3).
- In this case, we ask the user to disambiguate the situation by letting them choose whether all the operators should be ANDs or ORs.
- Alternatively, the user can create a nested sub-group within the current sub-group to hold either of the 2 bracket arrangements. Sub-groups are a metaphor for brackets.
The query tree would need to be marshalled to be persisted or processed by the backend stack. Upon reloading a segment, the query tree would need to be reconstructed (we’ll come back to this in detail).
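To make the consistency requirement concrete, here is a minimal sketch of the ambiguity check and of the two ways out described above. It assumes a simplified, hypothetical shape for a group's children (conditions and sub-groups interleaved with explicit operator elements); `isAmbiguous`, `unifyOperators` and `bracket` are illustrative names, not Streamhub's code.

```typescript
// Hypothetical, simplified shape of the elements found inside a group.
type GroupElement =
  | { kind: "condition"; label: string }
  | { kind: "subGroup"; children: GroupElement[] }
  | { kind: "operator"; op: "AND" | "OR" };

// A group is ambiguous when its direct operators are not all the same,
// e.g. `c1 AND c2 OR c3`. Operators inside nested sub-groups are already
// scoped by their brackets, so they are not inspected here.
function isAmbiguous(children: GroupElement[]): boolean {
  const ops = new Set<string>();
  for (const el of children) {
    if (el.kind === "operator") ops.add(el.op);
  }
  return ops.size > 1;
}

// Way out 1: turn every direct operator into the operator the user picked.
function unifyOperators(children: GroupElement[], op: "AND" | "OR"): GroupElement[] {
  return children.map((el): GroupElement =>
    el.kind === "operator" ? { kind: "operator", op } : el,
  );
}

// Way out 2: move children[start..end] (inclusive) into a nested sub-group,
// i.e. put brackets around them.
function bracket(children: GroupElement[], start: number, end: number): GroupElement[] {
  const subGroup: GroupElement = { kind: "subGroup", children: children.slice(start, end + 1) };
  return [...children.slice(0, start), subGroup, ...children.slice(end + 1)];
}
```

For example, with operators stored as elements, `bracket(children, 2, 4)` turns `c1 AND c2 OR c3` into `c1 AND (c2 OR c3)`, and the outer group is no longer ambiguous.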

Key Angular components

A nice realisation was that most of the above technical points could be implemented with simple Angular primitives:

The NgForOf directive would let us iterate through the elements of a group or subgroup and render them.
Angular @Component would let us define a few wrapper components for the elements manipulated by the segment builder (SB). We defined:
- GroupComponent, to represent a top-level group
- RuleSubGroupComponent, which is a metaphor for brackets. It can contain conditions and other RuleSubGroupComponents thus allowing for any level of nested subqueries within a query
- RuleOperatorComponent, which represents an operator between 2 top-level groups (possible values are and, or, except) or between 2 subgroups (possible values are and, or)
- ConditionPlaceholderComponent, a placeholder component to add a new condition to the query tree.
- SegmentBuilderWidgetComponent: used as an envelope to dynamically instantiate any other kind of component, like the TwoListsComponent mentioned in the requirements.
@ViewChildren, @Input() and @Output() decorators would let us define clear channels of communication within the component tree:
- Between child and parent (for example, condition → enclosing subgroup)
- Between parent and children (for example, subgroup → conditions, subgroup → operators)
- Between siblings (for example, between 2 conditions enclosed within 2 distinct subgroups)
Angular’s ComponentFactoryResolver would allow us to dynamically instantiate components depending on the context.
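The dynamic-instantiation requirement boils down to a lookup from the kind of condition to the component class that can edit it. The sketch below keeps that lookup framework-agnostic so it runs on its own: the condition kinds and the `NumericRangeComponent` and `AutocompleteSearchComponent` classes are hypothetical stand-ins (only `TwoListsComponent` is named in the post). In an Angular envelope such as SegmentBuilderWidgetComponent, the resolved class would then be passed to `ComponentFactoryResolver.resolveComponentFactory()` and the resulting factory to `ViewContainerRef.createComponent()` (since Angular 13, the class can also be passed to `ViewContainerRef.createComponent()` directly).

```typescript
// Hypothetical condition kinds, one per kind of UI listed in the requirements.
type ConditionKind = "twoLists" | "numeric" | "search";

// Stand-ins for Angular component classes. Only TwoListsComponent is named in
// the post; the other two names are illustrative.
class TwoListsComponent {}           // e.g. "Device type is mobile, desktop, tv"
class NumericRangeComponent {}       // e.g. "average time watched > 200 minutes"
class AutocompleteSearchComponent {} // e.g. "watched 'Breaking Bad Season 1 Ep. 3'"

type ComponentClass = new () => object;

const WIDGET_REGISTRY = new Map<ConditionKind, ComponentClass>([
  ["twoLists", TwoListsComponent],
  ["numeric", NumericRangeComponent],
  ["search", AutocompleteSearchComponent],
]);

// Returns the class the envelope component should instantiate for a condition.
function resolveWidget(kind: ConditionKind): ComponentClass {
  const widget = WIDGET_REGISTRY.get(kind);
  if (widget === undefined) {
    throw new Error(`No widget registered for condition kind "${kind}"`);
  }
  return widget;
}
```

Keeping the mapping in one place means that, under this design, supporting a new kind of condition is a matter of registering one more component rather than touching the envelope.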

Here is a simplified version of the HTML involved in rendering the Segment Builder.

In segment.builder.component.html the root *ngFor loop renders the first level of the Query Tree (the top-level groups):

```html
<!-- The segment builder rendering logic -->
<div *ngFor="let element of queryTree"> <!-- rendering each node of the query tree -->
  <group-operator #groupOperator [element]="element" ...></group-operator>
  <group #group (onRemoveGroup)="on...($event)" [element]="element" ...></group>
</div>
<!-- *ngFor -->
```

Now, let’s dive into the Group component’s HTML. Its inner *ngFor loop renders the elements within the group.

```html
<div *ngIf="isGroup(element)">
  <div ...>{{"GROUP" | translate}}</div>
  <div ...>
    <!-- control-flow components here. Skipping that part -->
    <!-- Group's inner *ngFor loop will render the elements within a group -->
    <div *ngFor="let child of element.value" class="...">
      <!-- the element can be a generic segment-builder-widget-component (an envelope to construct any other component at runtime) ... -->
      <segment-builder-widget-component #widgets
          [element]="child"
          [groupData]="element" ...>
      </segment-builder-widget-component>
      <!-- ... or a rule operator ... -->
      <rule-operator #ruleOperator
          [siblings]="element.value"
          [element]="child" ...></rule-operator>
      <!-- ... or a placeholder for a new condition ... -->
      <condition-placeholder
          [element]="child" ...></condition-placeholder>
      <!-- ... or a sub-group -->
      <rule-sub-group #ruleSubGroup
          [groupData]="element"
          [element]="child" ...>
      </rule-sub-group>
    </div>
    <!-- *ngFor -->
    <!-- a few more controls come here. Skipping that part -->
    <!-- Recency & Frequency component. Skipping that part -->
  </div>
</div>
```

Finally, the innermost *ngFor loop, in rule-sub-group.component.html, renders the leaves of the Audience Segment Builder. Note that the structure is recursive: the rule-sub-group component can itself be an element within the inner loop of rule-sub-group.component.html.

```html
<div ...>
  <div ...>{{"SUB_GROUP" | translate}}</div>
  <div ...>
    <!-- skipping controls here -->
    <div *ngFor="let child of element.value">
      <!-- nested sub-group -->
      <rule-sub-group #ruleSubGroup
          ...
          [element]="child"
          [groupData]="groupData" ...>
      </rule-sub-group>
      <!-- skipping other nested components ... -->
    </div>
  </div>
</div>
```

To summarise: given a hierarchical tree data structure (the Query Tree), a couple of user-defined components (group, subgroup, condition, operator, widget), the ngForOf directive, and a few Sass classes (mainly to materialise the levels of nesting), we can render any segment in the Segment Builder.
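The recursion in these templates can be mirrored in a few lines of TypeScript. This sketch walks a simplified Query Tree and prints it as text, wrapping every sub-group in brackets (sub-groups being the metaphor for brackets). The `CONDITION` and `RULE_OPERATOR` node types, and storing their labels in `value`, are assumptions for illustration; `render` is a hypothetical name.

```typescript
// Simplified node shape modelled on the query tree. The "CONDITION" and
// "RULE_OPERATOR" node types (with their label stored in `value`) are
// assumptions made for this sketch.
interface QueryNode {
  nodeType: "GROUP" | "RULE_SUB_GROUP" | "CONDITION" | "RULE_OPERATOR" | "CONDITION_PLACEHOLDER";
  value: QueryNode[] | string;
}

// Like the templates, a container renders its children in order; a
// RULE_SUB_GROUP child recurses into the same function and gets brackets.
function render(node: QueryNode): string {
  if (typeof node.value === "string") {
    return node.nodeType === "CONDITION_PLACEHOLDER" ? "<select a dimension>" : node.value;
  }
  const inner = node.value.map(render).join(" ");
  return node.nodeType === "RULE_SUB_GROUP" ? `(${inner})` : inner;
}
```

Applied to “The Crown” example, a group holding one sub-group of the form [watched The Crown, AND, sub-group(female, OR, aged 18-24)] renders as `(watched The Crown AND (female OR aged 18-24))`.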

Of course, there are other aspects we are not covering here, such as the flow of events within the SB (adding and removing conditions/operators/sub-groups) or the management of its internal states (locking/unlocking the SB while the user operates on the current node). But from the strict point of view of rendering the Query Tree, this is pretty much all that was required.

On top of what Angular already provided, we added 2 services:

A TreeAlgorithmService, which implemented generic versions of the depth-first search and breadth-first search algorithms as higher-order functions.
- Being higher-order functions, they accept another function as a parameter: a predicate that expresses the criteria for selecting a node. This makes it easy to select any kind of node in the query tree with simple predicates.
A MarshallingService, which implemented the logic to transform the query tree into a JSON equivalent payload that could be sent to the API.
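A TreeAlgorithmService along these lines could be sketched as below. The node shape mirrors the query tree (children live in an array-valued `value` property); the function names `dfs`, `bfs`, `byId` and `byType` are illustrative assumptions, and the Angular service wrapper is omitted. Both searches return the first node that satisfies the predicate passed in.

```typescript
// Minimal node shape: children live in an array-valued `value` property,
// as in the query tree; leaf nodes hold a string (or nothing).
interface TreeNode {
  id: string;
  nodeType: string;
  value?: TreeNode[] | string;
}

type Predicate = (node: TreeNode) => boolean;

const childrenOf = (node: TreeNode): TreeNode[] =>
  Array.isArray(node.value) ? node.value : [];

// Depth-first (pre-order) search: returns the first match in document order.
function dfs(nodes: TreeNode[], match: Predicate): TreeNode | undefined {
  for (const node of nodes) {
    if (match(node)) return node;
    const found = dfs(childrenOf(node), match);
    if (found !== undefined) return found;
  }
  return undefined;
}

// Breadth-first search: visits the tree level by level with a FIFO queue,
// so it returns the shallowest match.
function bfs(nodes: TreeNode[], match: Predicate): TreeNode | undefined {
  const queue: TreeNode[] = [...nodes];
  while (queue.length > 0) {
    const node = queue.shift()!;
    if (match(node)) return node;
    queue.push(...childrenOf(node));
  }
  return undefined;
}

// Predicate factories: the "matching criteria" passed to dfs/bfs.
const byId = (id: string): Predicate => (node) => node.id === id;
const byType = (nodeType: string): Predicate => (node) => node.nodeType === nodeType;
```

Because the predicate is just a function, callers can express any matching criterion, for example `dfs(queryTree, byType("CONDITION_PLACEHOLDER"))` to locate the placeholder of a new segment. When several nodes match, the two strategies can disagree: depth-first returns the first match in document order, breadth-first the shallowest one.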

Challenges of marshalling and unmarshalling the query tree

As we saw, the in-memory structure that the segment builder manipulates is called a Query Tree. It is a simple data structure made of JavaScript arrays and objects. The default query tree for a new, empty segment resembles this:

```javascript
[
  {
    id: uuid(),
    nodeType: "GROUP",
    root: true,
    name: "",
    datasetId: -1,
    recencyAndFrequency: {},
    value: [
      {
        id: uuid(),
        nodeType: "RULE_SUB_GROUP",
        root: true,
        name: "",
        value: [
          {
            id: uuid(),
            nodeType: "CONDITION_PLACEHOLDER",
            root: true,
            value: "SELECT_DIMENSION_PLACEHOLDER"
          }
        ]
      }
    ]
  }
]
```

Figure 3: the query tree manipulated by the audience segment builder in Activate

As we can see, any object in it can have a value property that itself contains an array of objects, and can therefore be iterated over.
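For illustration, the shape in Figure 3 can be captured with TypeScript types, together with a factory for the default tree. The types only list the fields visible in the figure; a counter-based `newId()` stands in for `uuid()` to keep the sketch dependency-free, and `hasChildren` is a hypothetical helper that encodes the "can be iterated" rule.

```typescript
type QueryNodeType = "GROUP" | "RULE_SUB_GROUP" | "CONDITION_PLACEHOLDER";

// Fields as they appear in Figure 3; optional ones only appear on some nodes.
interface QueryTreeNode {
  id: string;
  nodeType: QueryNodeType;
  root?: boolean;
  name?: string;
  datasetId?: number;
  recencyAndFrequency?: Record<string, unknown>;
  value: QueryTreeNode[] | string;
}

type QueryTree = QueryTreeNode[];

// Hypothetical stand-in for uuid(): a counter keeps the sketch dependency-free.
let idCounter = 0;
const newId = (): string => `node-${++idCounter}`;

// Builds the default tree of a new, empty segment (Figure 3).
function createDefaultQueryTree(): QueryTree {
  return [
    {
      id: newId(), nodeType: "GROUP", root: true, name: "", datasetId: -1, recencyAndFrequency: {},
      value: [
        {
          id: newId(), nodeType: "RULE_SUB_GROUP", root: true, name: "",
          value: [
            { id: newId(), nodeType: "CONDITION_PLACEHOLDER", root: true, value: "SELECT_DIMENSION_PLACEHOLDER" },
          ],
        },
      ],
    },
  ];
}

// Only nodes whose `value` is an array have children that can be iterated.
function hasChildren(node: QueryTreeNode): node is QueryTreeNode & { value: QueryTreeNode[] } {
  return Array.isArray(node.value);
}
```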

These arrays of objects are literally what the Angular ngForOf directives iterate over when rendering the DOM, so the query tree is very close to the HTML rendered in the UI. By contrast, the way a segment is persisted in the database or processed by the backend API is quite different from the query tree. To illustrate, this is what a small portion of a segment looks like once stored in MongoDB:

```json
{
  "groupOperator": "UNION",
  "groupChildren": [
    {
      "groupRule": {
        "name": "G0",
        "query": {
          "datasetId": {
            "$numberInt": "0"
          },
          "dimensions": [],
          "filters": {
            "children": [
              {
                "rule": {
                  "dimension": "programs:id",
                  "value": [
                    {
                      "$numberLong": "xxx"
                    }
                  ],
                  "operator": "Any"
                }
```

Figure 4: an Activate segment, once stored into MongoDB

Depending on the context, converting one representation into the other can get complicated, because some parts of a segment are treated differently by the UI and by the API.

For instance, if a date range is applied to a segment, the API expects the segment payload to contain an additional ‘date’ dimension for each sub-group defined by the segment, connected by an AND operator to the rest of the conditions in the sub-group. Let’s assume a user has created a segment with 2 sub-groups: (condition A or condition B) and (condition C or condition D).

Now, if the user wants to apply a date range restriction to the segment, the generated payload should be set as: daterange and ((condition A or condition B) and (condition C or condition D))

So there are transformations involved. They are relatively easy to implement in one direction (from the Query Tree to the API payload) but would be more difficult to implement in the other (reconstructing the original user-defined Query Tree from the API payload).
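The easy, forward direction can be sketched on a simplified, hypothetical payload shape (`ApiExpr` is not the actual Activate schema). Following the example above, `withDateRange` ANDs a ‘date’ rule with the user's query without modifying it; `rule`, `or` and `and` are small illustrative builders.

```typescript
// Simplified, hypothetical API-side expression (not the actual Activate schema).
type ApiExpr =
  | { kind: "rule"; dimension: string; value: string[] }
  | { kind: "and" | "or"; children: ApiExpr[] };

// Query Tree -> API direction: AND a 'date' rule with the user's query, as in
// `daterange and ((A or B) and (C or D))`. The input expression is not mutated.
function withDateRange(query: ApiExpr, from: string, to: string): ApiExpr {
  const dateRule: ApiExpr = { kind: "rule", dimension: "date", value: [from, to] };
  return { kind: "and", children: [dateRule, query] };
}

// Small helpers to build the example from the text.
const rule = (dimension: string): ApiExpr => ({ kind: "rule", dimension, value: [] });
const or = (...children: ApiExpr[]): ApiExpr => ({ kind: "or", children });
const and = (...children: ApiExpr[]): ApiExpr => ({ kind: "and", children });
```

Going the other way is where it gets harder: an unmarshaller would have to recognise the injected date rule, strip it, and restore the user's original grouping from a payload that no longer records it explicitly.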

Besides, we wanted to have full flexibility on the UI side of things and provide a flow that is as convenient as possible for the end-user.

To reduce code complexity and guarantee that a segment, once loaded back, is faithful to what the user created, we chose to convert between the UI representation and the API representation in one direction only: the marshalling direction.

In practice, this means that we store the Query Tree itself (or rather, an altered version of it) as a property of the Segment payload.

The steps involved are:

First, we transform the Query Tree into a binary-equivalent representation.
Then, we use a lossless compression algorithm to reduce the size of the binary-equivalent representation.
Finally, we take the base64 of the compressed binary representation and store it as a property of the Segment payload, namely the uiData property.
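A minimal sketch of these three steps, assuming JSON serialisation to UTF-8 bytes as the "binary-equivalent representation" and DEFLATE (via Node's built-in `zlib` module) as the lossless compression; the post names neither, and in the browser a library such as pako offers equivalent deflate/inflate functions. `encodeUiData` and `decodeUiData` are hypothetical names.

```typescript
import { deflateSync, inflateSync } from "zlib";

// Query Tree -> uiData. Assumption: the "binary-equivalent representation" is
// the UTF-8 encoding of the tree's JSON, and the compression is DEFLATE.
function encodeUiData(queryTree: unknown): string {
  const binary = Buffer.from(JSON.stringify(queryTree), "utf8"); // step 1
  const compressed = deflateSync(binary);                        // step 2 (lossless)
  return compressed.toString("base64");                          // step 3
}

// uiData -> Query Tree, used when a segment is loaded back into the builder:
// the same steps in reverse, with no need to interpret the rest of the payload.
function decodeUiData<T>(uiData: string): T {
  const compressed = Buffer.from(uiData, "base64");
  const json = inflateSync(compressed).toString("utf8");
  return JSON.parse(json) as T;
}
```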

By doing that, we avoid most of the unmarshalling complexity when reloading an existing segment into the Segment Builder: we only need to decode and decompress the query tree from the API response into memory, as opposed to reconstructing the Query Tree from the rest of the API response.

As a side benefit, we also avoid potentially hard-to-detect bugs. Given that segments can be created, then loaded back and updated multiple times, any glitch in the marshalling logic would only compound over time.

Another advantage of using the Query Tree as an intermediate UI representation of a segment is that it allows us to perform some optimizations while marshalling it.

For instance, it becomes possible to simplify the payload by getting rid of unnecessary levels of nesting, which, in turn, makes processing the payload faster, as it takes fewer operations on the backend to handle it.

Below are some of the optimizations we perform when converting the Query tree into the API payload equivalent:

If a sub-group X contains a nested sub-group Y which itself contains a single condition C, then the structure can be simplified so that C becomes a direct child node of X.
Any group that contains a single direct sub-group can also be flattened so that the sub-group’s children are exposed directly.
Empty sub-groups can be eliminated.
Any subgroup that connects its children with the same operator that connects the sub-group with a standalone condition can also be flattened. To illustrate, this is the case where the user has defined: (condition C and (condition D and condition E)) which can be simplified by getting rid of the subgroup surrounding (D and E) => (condition C and condition D and condition E)
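The four rules above can be sketched as a single recursive pass over a simplified expression tree. This assumes each group carries one operator for all its children (which the disambiguation step ensures within a group) rather than the explicit operator elements of the real Query Tree; `Expr` and `simplify` are hypothetical names. The function builds a new tree and never mutates its input, in line with leaving the user-defined query tree untouched.

```typescript
// Hypothetical simplified expression: every group applies one operator to all
// of its children (the real Query Tree stores explicit operator elements).
type Expr =
  | { kind: "condition"; label: string }
  | { kind: "group"; op: "AND" | "OR"; children: Expr[] };

// Returns a simplified copy of `expr`, or null if nothing is left.
// The input tree is never mutated.
function simplify(expr: Expr): Expr | null {
  if (expr.kind === "condition") return expr;

  const children: Expr[] = [];
  for (const child of expr.children) {
    const s = simplify(child);
    if (s === null) continue; // empty sub-groups are eliminated
    if (s.kind === "group" && s.op === expr.op) {
      children.push(...s.children); // same operator: drop the redundant brackets
    } else {
      children.push(s);
    }
  }

  if (children.length === 0) return null;
  // A group with a single child (a lone condition or a lone sub-group) is
  // replaced by that child, removing one level of nesting.
  if (children.length === 1) return children[0];
  return { kind: "group", op: expr.op, children };
}
```

Note that this simplification is not invertible: `C AND (D AND E)` and `(C AND D) AND E` both become `C AND D AND E`, so the brackets the user drew cannot be recovered from the payload alone. This is one more reason to persist the Query Tree itself in uiData.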

While we perform these optimizations during the marshalling phase, we do not modify the user-defined query tree. We only reduce the complexity and size of the generated API payload.

When loading back the same segment later in the SB, the segment is presented exactly as it was created originally by the user.

Voilà! I hope you have enjoyed reading this. Please let me know if you have any questions!

Tony Broyez

Senior Full Stack Data Integration Engineer at Streamhub

Tony is an experienced and versatile programmer with more than 15 years of experience, mainly in the media/tech industry. His skills extend to data collection and preparation, data computation and export, as well as data visualisation. When he’s not coding, he can be found having fun with his 2 young kids, Theo and Tom.

