Behind the Hypothesis: an approach to effective Big Data projects

After spending some time consulting with an organisation on Big Data, trying to ready an internal team to “sell” Big Data solutions, and working on a project that was “crying out” for Big Data but had no interest in going there, I am starting to develop an approach to these types of project that I feel merits a blog post.

What I hope this blog post will do is stimulate some discussion to inform me whether I am on the right track or not.

Step Back in Time – the Enterprise Data Warehouse

About 15 years ago I was very involved in a number of Enterprise Data Warehouse projects and I recall the struggles we had in those days to both “sell” the concept to relevant stakeholders and to build an effective EDW. At the time I was reading all the books and magazines on the topic and trying to learn from successful projects.

A couple of approaches struck me at the time.

One of the consultants I was working with had a brilliant meta-model for an EDW that we tried to implement. With my data modelling background, it appealed to me: a data entity called THING, related to itself by a CLASSIFICATION entity, implemented so that you could build a completely flexible structure that depended only on data to navigate and answer database queries. It was ideologically perfect … but practically impossible.
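To make the idea concrete, here is a minimal sketch of what a THING/CLASSIFICATION structure might look like. It is not the consultant's actual meta-model: the field names, the `kind` label, the `members` helper and the sample rows are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    thing_id: int
    name: str


@dataclass(frozen=True)
class Classification:
    parent_id: int  # the THING doing the classifying
    child_id: int   # the THING being classified
    kind: str       # hypothetical label for the relationship


def members(root_id, classifications):
    """Return ids of every THING reachable below root_id via CLASSIFICATION rows."""
    found = []
    seen = {root_id}
    stack = [root_id]
    while stack:
        current = stack.pop()
        for c in classifications:
            if c.parent_id == current and c.child_id not in seen:
                seen.add(c.child_id)
                found.append(c.child_id)
                stack.append(c.child_id)
    return found


# Invented sample data: the "schema" lives entirely in these rows.
things = {
    1: Thing(1, "Customer"),
    2: Thing(2, "Business customer"),
    3: Thing(3, "Consumer customer"),
    4: Thing(4, "Prepaid consumer"),
}
classifications = [
    Classification(1, 2, "subtype"),
    Classification(1, 3, "subtype"),
    Classification(3, 4, "subtype"),
]

names = sorted(things[i].name for i in members(1, classifications))
print(names)
```

In a structure like this, even a simple question ("which kinds of customer do we have?") becomes a traversal over relationship rows rather than a lookup against a known table, which may be one reason such models are attractive on paper and hard to live with in practice.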

The 2nd approach was far more practical. Our client was facing competition with a 2nd mobile operator entering the market and there was a nervousness in the market regarding what would happen. It was inevitable that the 2nd operator would take market share from the incumbent but the question was how much and in what segments.

Our hypothesis was that we could help the incumbent manage its market share loss and, at the same time, deliver better results to shareholders (improved EBIT) by ensuring a focus on more profitable customer segments and better targeted marketing. The implication of this hypothesis was a need to understand customer behaviour and develop a better segmentation model to allow targeted marketing.

You guessed it … the 2nd approach delivered the better result.

Fast Forward – the problem with Big Data

The experiences of the last year have made me wonder if I have learnt anything at all.

One client I was working with had a long list of Big Data use cases. In fact they confessed they were drowning in use cases. Interestingly, they chose to use them as input to a Big Data tool selection process and to create a reference architecture that would allow the use cases to be abstracted into a solution framework. It became just that … abstract!

Another client I am working with has a single big ticket item that is crying out for a sophisticated analytics approach but has no desire to go in that direction. In short, they are implementing a complex operational process involving 4 separate companies across an end-to-end (e2e) process with 100+ discrete steps and over 20 cross-organisation handoffs, and they lack reliable instrumentation for that process. The variety of data relevant to the process includes structured data from at least 6 major systems, real-time tracking of location data and a sophisticated overlay of radio network propagation data over photographic images taken in the field. In fact the volume of historical image data has the potential to swamp the e2e process.

My hypothesis that unites these 2 experiences is a left brain-right brain analogy.

If we classify the first experience as left-brain thinking and the second as right-brain thinking, the first challenge is how to unite the two. This assumes that the best approach to solving a complex problem is to use different thinking approaches in tandem rather than in conflict with each other.

The Importance of a Strong Hypothesis

I was talking to a colleague recently about a successful Big Data project they had undertaken and it started with a rather obvious but interesting observation.

In this case the client was a quad-play telco that had an IP TV business and the observation related to people watching more TV on weekends and when the weather was bad. Obvious perhaps but how is it relevant to Big Data?

Well the next step is to examine the process of predicting bad weather and creating a hypothesis that if the weather can be predicted then perhaps IP TV viewing habits and volumes can also be predicted.

However, is it the weather forecast or the actual weather that drives viewing habits?

Correlating two different sets of open data, the weather forecast at 1 day out and the actual weather on the day, with IP TV viewing habits creates the first insight, potentially giving us a prediction methodology for both programs viewed and volumes.
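As a rough sketch of that first correlation step, the snippet below compares how strongly the forecast and the actual weather each track viewing volume. It is not the method used on the project: the daily figures are invented for illustration, the column choices (forecast rain probability, actual rainfall in mm, viewing hours per household) are assumptions, and a plain Pearson correlation stands in for whatever analysis a real team would choose.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily records, invented for illustration only:
# (forecast rain probability 1 day out, actual rainfall in mm, viewing hours per household)
days = [
    (0.1, 0.0, 2.1),
    (0.8, 1.0, 3.4),
    (0.7, 12.0, 3.6),
    (0.2, 6.0, 2.5),
    (0.9, 15.0, 3.9),
    (0.3, 0.0, 2.4),
    (0.6, 3.0, 3.0),
]

forecast = [d[0] for d in days]
actual = [d[1] for d in days]
viewing = [d[2] for d in days]

r_forecast = pearson(forecast, viewing)
r_actual = pearson(actual, viewing)
print(f"forecast vs viewing: r = {r_forecast:.2f}")
print(f"actual vs viewing:   r = {r_actual:.2f}")
```

Comparing the two coefficients is one simple way to start answering the forecast-versus-actual question. A correlation on its own says nothing about cause, so it is a starting point for the deeper analysis rather than a conclusion.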

When the forecasted bad weather coincides with a weekend there is a predictable change in behaviour.

When it also coincides with major events – like a sporting world cup – it demonstrates a different change.

And when you combine this prediction capability with the non-linear nature of IP TV the programming possibilities open up!

Why is this process a Strong Hypothesis?

First off it starts with something obvious or self-evident. Proving a correlation in such a case is not hard. Note that the initial hypothesis is not something that immediately comes to mind as a Big Data problem.

Second it invites deeper analysis. And that deeper analysis leads to conclusions that are aligned to the problem being solved. The deeper analysis tends to leverage Big Data better (more variety, more volume, more velocity perhaps?).

Elements of a Successful Approach

I’d like to suggest that there are 4 elements in a successful approach to Big Data projects. Admittedly my sample size is not large so I am expecting there is room for improvement but this is what I have learnt so far:

  1. It’s about process not data – getting results (in a business situation) from Big Data is usually about being able to improve or disrupt a business process. This means the starting point needs to be the process so as to be able to understand what data is needed.
  2. Have a strong hypothesis – the topic of this blog post.
  3. Take an Agile approach – possibly the most over-used word in software today but I have deliberately capitalised the word to indicate this is NOT about agility but it is all about some of the principles of the Agile Manifesto that I think are very relevant.
  4. Be prepared to fail – I’m not yet sure how to phrase this one but my thinking here relates to the challenge with business cases for Big Data projects where the insights you gain are unlikely to be predictable before you start. Therefore you have to be prepared that failure is a possibility … which, in many companies, equates to not starting in the first place!
