Directory-Derived Data vs. Event-Derived Data

When Customer Insights debuted, it was a banner day for Identity Cloud subscribers: those subscribers now had quick and easy access to a wide array of identity-related data and events. Want to know how many user profiles you have? No problem. Want to see your registration trends for the past six months? No problem. Want to know how many users you have in that all-important 25-to-34 age group? You got it: no problem.

As incredibly cool as all of that was, however, there were one or two minor limitations to Customer Insights 2.0. For one thing, demographic data (like age and gender) did not come directly from the Identity Cloud directory; instead, this data was derived from the underlying event. For example, when a user signed on, demographic information would be copied from the sign-on event: the user profile store was never actually queried for this data. For the most part, that worked. However, due to technical issues involving the event stream, the data displayed in Customer Insights would occasionally differ from the actual data. The difference was slight: for example, Customer Insights might say you had 999,987 user profiles when you really had 1,000,000 user profiles. But, slight or not, there was a difference.

In addition to that, and because Customer Insights could only derive demographic information from the sign-on event, all organizations had the exact same set of attributes; that’s because events don’t forward the full user profile. The available attributes were definitely useful (things like age, gender, country of residence, and email domain) but they weren’t necessarily the only attributes of interest to you. For example, suppose you had a custom attribute named subscriptionLevel, an attribute that tracked the, well, subscription level for a user (e.g., Gold, Platinum, or Silver). That’s the sort of thing you’d love to see in your CI reports, but that was also the sort of thing that simply could not be made available in Customer Insights.

Or at least not in Customer Insights 2.0. With the release of Customer Insights 2.1, however, those aforementioned limitations are a thing of the past: now you can get truly accurate counts of your users, and now you can add any attribute you want (including custom attributes like subscriptionLevel) to Customer Insights.
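To make the attribute limitation a little more concrete, here’s a minimal Python sketch. It is not Customer Insights code: the profile records, the EVENT_ATTRIBUTES tuple, and the event_payload and count_by helpers are all hypothetical, and the sketch simply assumes that an event copies a fixed set of attributes from the profile while the directory holds the full profile.

```python
from collections import Counter

# Assumption: the fixed attribute set an event carries (names are illustrative).
EVENT_ATTRIBUTES = ("age", "gender", "country", "email_domain")

# Hypothetical directory profiles, including a custom subscriptionLevel attribute.
profiles = [
    {"age": 29, "gender": "female", "country": "US",
     "email_domain": "example.com", "subscriptionLevel": "Gold"},
    {"age": 41, "gender": "male", "country": "CA",
     "email_domain": "example.org", "subscriptionLevel": "Silver"},
    {"age": 33, "gender": "female", "country": "US",
     "email_domain": "example.com", "subscriptionLevel": "Gold"},
]


def event_payload(profile):
    """Copy only the fixed attributes onto an event; custom attributes are left behind."""
    return {name: profile[name] for name in EVENT_ATTRIBUTES}


def count_by(records, attribute):
    """Count records per value of an attribute, skipping records that lack it."""
    return Counter(r[attribute] for r in records if attribute in r)


events = [event_payload(p) for p in profiles]

from_events = count_by(events, "subscriptionLevel")       # nothing to count
from_directory = count_by(profiles, "subscriptionLevel")  # Gold: 2, Silver: 1

print(dict(from_events))
print(dict(from_directory))
```

In this sketch, a report built on the events has no subscriptionLevel values to work with, while a report built on the profiles themselves does.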

Good question: why is CI 2.1 more accurate than its predecessor? Well, as noted a moment ago, Customer Insights 2.0 derived data from each event (registrations, sign-ins, deactivations, etc.) it recorded. This resulted in data that was very accurate, but not perfectly accurate. For example, if you counted the number of user profiles in Customer Insights and then compared that value with the number of user profiles reported in, say, the Capture Dashboard, the two values were close, but often were not identical:

Needless to say, 1,001,007 is not the same as 1,001,877.
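How does a gap like that open up? The technical issues involving the event stream aren’t spelled out here, so the following Python sketch makes an assumption purely for illustration: that a registration event occasionally fails to arrive. The Event class, the two counting functions, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str      # hypothetical event name, e.g. "registration"
    user_id: str


def count_from_events(events):
    """Tally profiles by counting the registration events that were received."""
    return sum(1 for event in events if event.kind == "registration")


def count_from_directory(directory):
    """Count profiles by looking at the profile store itself."""
    return len(directory)


# Hypothetical directory holding five user profiles.
directory = {f"user-{n}": {"age": 20 + n} for n in range(5)}

# One registration event per profile, except that (by assumption)
# the event for user-3 never arrived.
delivered = [Event("registration", uid) for uid in directory if uid != "user-3"]

event_count = count_from_events(delivered)
directory_count = count_from_directory(directory)

print(event_count, directory_count)  # 4 5
```

The event-based tally is only as complete as the events that reached it; the directory-based count has no such dependency.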

So what’s changed? For one thing, Customer Insights 2.1 no longer derives demographic information by parsing event data. Instead, CI 2.1 can directly query your user profile store (or at least a mirrored replica of that store) to retrieve user information. As a result, the number of user profiles reported in Customer Insights now exactly matches the number of user profiles reported elsewhere (seeing as how the numbers are being pulled from the same place):

Like we said, the improved accuracy comes from the way Customer Insights 2.1 retrieves demographic data: data is now queried directly from the Identity Cloud directory and is not derived from event data. Or, to be a little more accurate ourselves, not all of the demographic information is derived from event data. And yes, maybe a little explanation is in order here.

To begin with, event-derived data is still available in Customer Insights, and is still updated in the same way. Furthermore, that data remains extremely accurate, albeit not perfectly accurate. But it’s still available for you to use.

And that’s a good question: if there’s a newer, better way to retrieve demographic data, then why keep the old method? As it turns out, there are several reasons for that. For one thing, there’s the all-important issue of backward compatibility: if the event-derived data were discontinued, your existing Looks and Dashboards would no longer work. To keep those reports functioning (and to keep them useful), that data needs to keep coming in.

There’s also a history issue. The new directory-derived data essentially has no history: after all, directory-derived data wasn’t introduced until early December 2018. By comparison, event-derived data dates back several years. Tossing out event-derived data would mean tossing out all your historical data, making it difficult to chart trends and make projections and do all those other things that rely on having a dataset that’s been around for a while.

And don’t forget that, while not perfect, event-derived data is very close to perfect. True, you can’t use the event-derived data to say something like this: “On March 3, 2018, we had exactly 987,353 registered users.” However, you can use the event-derived data to chart your growth rate in registrations over the past year, or to see whether sign-in spikes coincided with your marketing campaigns (for better or for worse). Event-derived data has its purposes, and will continue to have those purposes for quite some time to come. To coin a phrase, we don’t want to throw the event-derived baby out with the directory-derived bathwater.

Or something like that.

The move to a new method for querying data has also led to a couple of important infrastructure changes in CI 2.1. For one thing, the default Dashboards that ship with Customer Insights have been overhauled. In looking over the new Dashboards, you might notice that two of these Dashboards – Demographic Trends and Last Login, Creation, and Deactivation Trends – feature the word Trends in their names. There’s a reason for that: because these Dashboards use event-derived data, the data is not necessarily 100% accurate, which also means that this data – and these Dashboards – are best used for tracking trends. In fact, each of these Dashboards includes a disclaimer to that effect:

In addition to the new Dashboards, there’s another infrastructure change of note: Customer Insights no longer ships with a large collection of Looks. In fact, it doesn’t ship with any Looks whatsoever: the previous Looks have all been retired. That said, however:

  • The information collected by those Looks is still available on the various Dashboards. For example, the data retrieved by the Events by PC (Yes/No) Look can still be found on the various Event Details Dashboards.
  • Any custom Looks you created yourself will still be available. Only the Akamai-created Looks have been removed.

And, as always, you can create your own Looks, using either the “old-fashioned” event-derived Explores or the new directory-derived Explore. For more information on the latter, see Adding Attributes to Customer Insights.