Big Data: Going Beyond the Landing

Rami Mukhtar, Chief Executive Officer, Ambiata


With the buzz of big data flowing through the cubicle farms of enterprise for over five years, we now have a good spread of installation maturity to reflect on in order to understand the true value and challenges of the paradigm. If you study the spectrum of projects, what you will almost certainly find is that in the early stages the emphasis is on technology, in the middle stages it becomes all about the right people, and the most mature projects struggle with integration into the people and processes of the business.

Big data is still a relatively new paradigm. In many cases the focus has been on generating bespoke business insights that can then be put into action as strategic business decisions. Whilst this clearly has value, it is relatively straightforward for enterprise to implement, as it is not significantly different from how strategic data analytics has worked in the past, albeit using different tools and more data.

Enterprise has been far less successful in leveraging big data to generate granular insights on a per-consumer basis and then acting on them individually.

The reason is likely the significant departure this represents from how an enterprise undertakes and leverages analytics.

Analytics in enterprise is typically highly manual and bespoke. Select data sets are curated and prepared by analysts and then analyzed or modeled on an as-needed basis to produce insights or models. When models need to be put into ‘production’, the data preparation steps are carefully replicated by a database programmer. A lead time of months between model generation and production is typical in the industry.

In the world of Big Data, data preparation becomes an order of magnitude more complex. Whilst previously some SAS scripts or SQL queries would suffice, within the Big Data realm the Data Scientist has to deal with a combination of structured, semi-structured and unstructured sources in a variety of file formats persisted onto a data lake. Not only are the input sources more varied and complex, but the Data Scientist also needs to manufacture more attributes or features from them to feed increasingly data-hungry algorithms that attempt to draw on ever more subtle signals in the data.

To be successful in this paradigm, building pipelines that enable an enterprise to amortize the cost of data preparation across the entire Data Science team becomes critical. Such pipelines ensure that the time to create a model is reduced to a timeframe of relevance, and that it doesn’t take months to take a model from the lab to the factory.
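One way to picture this amortization is a shared feature registry: each attribute is defined once, computed in a single pass over the raw data, and then reused by every model, both in the lab and in production. The following is only a minimal sketch under assumed names; the event fields (`customer_id`, `amount`), the feature names and the helper functions are all hypothetical and are not taken from any particular platform.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical raw record, e.g. {"customer_id": "c1", "amount": 20.0}
Event = dict
FeatureFn = Callable[[List[Event]], float]

# Shared registry: every feature is defined once for the whole team.
FEATURES: Dict[str, FeatureFn] = {}


def feature(name: str):
    """Decorator that registers a feature function under a shared name."""
    def register(fn: FeatureFn) -> FeatureFn:
        FEATURES[name] = fn
        return fn
    return register


@feature("purchase_count")
def purchase_count(events: List[Event]) -> float:
    return float(len(events))


@feature("total_spend")
def total_spend(events: List[Event]) -> float:
    return sum(e["amount"] for e in events)


@feature("max_spend")
def max_spend(events: List[Event]) -> float:
    return max((e["amount"] for e in events), default=0.0)


def build_feature_table(events: List[Event]) -> Dict[str, Dict[str, float]]:
    """Group events by customer and compute every registered feature once."""
    by_customer = defaultdict(list)
    for e in events:
        by_customer[e["customer_id"]].append(e)
    return {
        cid: {name: fn(evts) for name, fn in FEATURES.items()}
        for cid, evts in by_customer.items()
    }


def select(table: Dict[str, Dict[str, float]], names: List[str]) -> Dict[str, List[float]]:
    """Pick the columns one model needs from the shared table."""
    return {cid: [row[n] for n in names] for cid, row in table.items()}


# Illustrative data only.
events = [
    {"customer_id": "c1", "amount": 20.0},
    {"customer_id": "c1", "amount": 5.0},
    {"customer_id": "c2", "amount": 12.5},
]

table = build_feature_table(events)                                # prepared once
churn_inputs = select(table, ["purchase_count", "total_spend"])   # model A
upsell_inputs = select(table, ["max_spend"])                       # model B
```

Because the same `build_feature_table` call would feed both experimentation and production scoring in this sketch, there is no separate step in which a programmer re-implements the preparation logic by hand.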

The next major challenge to applying Big Data to drive personalization is having the ability to deploy data-driven strategies to drive front-line decisions. Digital platforms are a relatively easy target. There the main challenge is identity. Most traditional enterprises have a wonderful understanding of the offline identity of their customers, but joining this to online identity is very challenging. There is no doubt that we will see this improve as Internet properties continue to get us to engage with their platforms in exchange for content. Today, however, this identity is usually available to enterprise in exchange for ad revenue. This leaves enterprise with a big challenge in activating individual customer insights on their own domains.
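The simplest case of joining offline and online identity can be sketched as a deterministic match on a shared key, for example an email address that a customer supplied in store and later used to log in on the website. This is only an illustrative sketch: the record fields (`customer_id`, `cookie_id`, `email`) and the function names are hypothetical, and real identity management typically has to cope with many sessions that never log in at all.

```python
import hashlib
from typing import Dict, List, Tuple


def email_key(email: str) -> str:
    """Normalize an email and hash it so raw addresses need not be shared."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def link_identities(
    crm_records: List[dict], web_logins: List[dict]
) -> Tuple[Dict[str, str], List[str]]:
    """Map online cookie IDs to offline customer IDs via hashed email.

    Returns (linked, unmatched): linked maps cookie_id -> customer_id,
    unmatched lists cookie IDs with no offline counterpart.
    """
    crm_by_key = {email_key(r["email"]): r["customer_id"] for r in crm_records}
    linked: Dict[str, str] = {}
    unmatched: List[str] = []
    for login in web_logins:
        customer_id = crm_by_key.get(email_key(login["email"]))
        if customer_id is None:
            unmatched.append(login["cookie_id"])
        else:
            linked[login["cookie_id"]] = customer_id
    return linked, unmatched


# Illustrative data only.
crm_records = [{"customer_id": "cust-001", "email": "Jane.Doe@example.com"}]
web_logins = [
    {"cookie_id": "ck-1", "email": " jane.doe@example.com "},
    {"cookie_id": "ck-2", "email": "someone.else@example.com"},
]

linked, unmatched = link_identities(crm_records, web_logins)
```

The unmatched list is the interesting part in practice: it represents the online visitors for whom the enterprise has no way, on its own domains, to activate what it knows about the customer offline.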

If Big Data is to continue to add value to enterprise, then the two key problems that need to be solved are customer identity management and attribute preparation.