
A 10-point plan for managing big data

Analytics can help you make better decisions – but only in the context of an effective big data strategy. 

How you manage and process your big data determines how useful it is. Writing for MIT Sloan, Sara Brown draws on a talk by database research pioneer Michael Stonebraker at last year’s MIT Citi Conference. In his talk, Stonebraker highlighted the mistakes businesses make when formulating big data strategy. Here, Brown presents a ten-point blueprint for a big data strategy that really works.


1. Embrace the cloud. Moving all your data to a public (or private) cloud, such as Amazon, gives you economical storage with specialised back-up, often with better infrastructure and security than you could hope to achieve in house. As Stonebraker says: “They’re [cloud providers] deploying servers by the millions; you’re deploying them by the tens of thousands.”

2. Prepare for AI and machine learning disruption. Accept that in many industries AI disruption is inevitable and it will mean some workers being replaced. You must act now to snap up the expert talent you’ll need for your organisation to make the new tech work for it.

3. Create a clear data cleaning strategy. Simply having data scientists on staff doesn’t mean you’ll be ahead of the game. Instead of keeping on top of the latest data science and developments in machine learning, these staff often spend 99% of their time on data discovery, integration and the vital tasks of cleaning data and fixing cleaning errors. Appointing a chief data officer can help to manage these necessary tasks more effectively, freeing staff to use their time to move the data strategy forward.

4. Look for innovative, agile tech solutions. Solving your data cleaning issue won’t come easily if you’re relying on time-consuming, capacity-limited traditional processes like ETL (extract, transform, load), which require lots of human input and a host of rule systems. When General Electric’s staff were struggling to classify 20 million spending transactions, Tamr – a company co-founded by Stonebraker – created a machine learning model to do the job quickly and efficiently.

5. Use your data warehouse sensibly. Data warehousing stores information from various sources across your organisation for purposes of reporting and analysis, but this is often an expensive option and is not a sensible long-term solution. As Stonebraker says, “Just remember, always, that your warehouse is going to move to the cloud.”

6. Acknowledge the limitations of open source. It’s fine to use Apache’s open-source Hadoop software collection or its Spark analytics engine, but these won’t answer all your needs, particularly in the important area of data integration. As Stonebraker says, especially in the case of high-level business functions, “you should be looking at best-of-breed technologies, not the lowest common denominator”.

7. Pay for a robust data curation system. If you want useful information from your data, you must make sure your central data repository, or lake, doesn’t become a swamp. The data gets murky when, for example, mismatching terminology is used – one input listing ‘wages’ and the other ‘salary’.

As Stonebraker says: “The net result is your analytics will be garbage, and your machine learning models will fail. Garbage in, garbage out.” Source a curation system that can fix these errors, potentially one from a forward-thinking startup.

8. Let your best talent step up to the mark. Rather than hiring big data analytics firms for new work, outsource the more tedious routine jobs, such as maintenance. Giving gifted employees a chance to show their strengths will help you retain your most creative team members. Too often the opposite happens, as Stonebraker notes: “Shiny new stuff gets outsourced, often because there is no appropriate talent internally, or because your best people are stuck keeping your accounts receivable system up.”

9. Let go of legacy systems. When the time is right, don’t hesitate – even if it seems risky. “You have to make all kinds of bets on the future, and a bunch of them are going to require you to give up at least some piece of your current business model and reinvent yourself. You simply have to be willing to do that in any high-tech field.”

10. Invest in some superstars. You need highly skilled team members who can lead the way in big data processing disruption. Be prepared to pay them what they are worth.
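
The mismatched terminology described in point 7, with one input listing ‘wages’ and the other ‘salary’, can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of any curation product: the field names, the `FIELD_SYNONYMS` table and the `harmonise_record` function are all hypothetical, and a real curation system would need to do far more than consult a hand-written lookup.

```python
# Hypothetical synonym table: maps each known field-name variant to one
# canonical name. None of these names come from the article.
FIELD_SYNONYMS = {
    "employee": "employee",
    "salary": "salary",
    "wages": "salary",
}


def harmonise_record(record, synonyms=FIELD_SYNONYMS):
    """Rename known field-name variants to their canonical form.

    Returns the cleaned record plus a list of field names that were not
    recognised, so a person can review them instead of the mismatch
    silently reaching the analytics layer. Assumes each record uses at
    most one variant of any given field.
    """
    cleaned = {}
    unrecognised = []
    for name, value in record.items():
        key = name.strip().lower()
        if key in synonyms:
            cleaned[synonyms[key]] = value
        else:
            unrecognised.append(name)
            cleaned[key] = value
    return cleaned, unrecognised


# Two sources describing the same person with mismatched terminology.
hr_feed = {"Employee": "A. Jones", "Wages": 52000}
payroll_feed = {"employee": "A. Jones", "salary": 52000, "Cost Centre": "R&D"}

hr_clean, hr_unknown = harmonise_record(hr_feed)
payroll_clean, payroll_unknown = harmonise_record(payroll_feed)

print(hr_clean)         # both feeds now use the key "salary"
print(payroll_unknown)  # field names left for manual review
```

Flagging unrecognised names rather than guessing at them is a design choice of this sketch: it keeps murky inputs visible instead of letting them sink into the lake unnoticed.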

If you want to extract really useful, transformational information from your big data, then it’s time to dodge – or repair – the common blunders underlined by Stonebraker and start creating an effective big data strategy.

Source Article: Ten Big Data Blunders Businesses Should Avoid
Author(s): Sara Brown
Publisher: MIT Sloan