Original Translation | Big Data Analytics: The Hardest Part Isn't the Analysis, It's the Big Data













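The forecasts for the 2016 U.S. presidential election are a good demonstration of why data quality matters. Most of the predictions at the time were based on state-level and national telephone polls. But phone surveys are prone to nonresponse, and the nonresponse rate differed widely from state to state, which strongly affects forecasts for the Electoral College. (The Electoral College is an electoral system particular to the United States: on Election Day, voters not only choose among the presidential candidates but also elect 538 electors representing the 50 states and Washington, D.C., who together form the Electoral College. The chosen electors must pledge to cast their Electoral College vote for the candidate who won their state. The U.S. president is elected by the Electoral College rather than directly by voters, and the candidate who receives more than half of the electoral votes becomes president.) The result: skewed data produced the wrong prediction.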



The Hardest Part of Analytics Isn't Analysis. It's Data

From advanced BI tooling to machine learning and artificial intelligence, modern businesses have more ways than ever to slice and dice their data. As data scientists and business leaders alike fixate on the great potential of these new technologies, we risk losing sight of what's most important: the data itself. After all, fancy visualizations and predictive analytics don't matter without the right data powering them.

Every single business needs to prioritize collecting and structuring their underlying data over the analysis they use to understand it. Here's why:

Data will be ingrained in every part of how we do business

Companies have just begun to grasp not only the complexity of data, but also the depth of its relationship with their own employees. All business roles and levels need to make good decisions, and the best decisions are made with user data. Thus, every department – not just the data science team – should have access to that information, from product to customer service to sales.

It's no longer enough to just review topline metrics at a monthly all-hands meeting. Organizations must infuse data-driven processes into their decision-making. Take a modern marketing team, for example. Marketers today have a multitude of rich data sources at their disposal, especially with the explosion of smartphones, tablets, social media platforms and digital touchpoints through which a brand can interact with its audience. If all of this data is collected into a central place, it opens up powerful new ways of understanding long-term customer behavior. Other departments like sales, product, and customer success similarly have access to an unprecedented amount of data.

Every bit of data contributes to the bigger picture

As data plays a bigger role across every department and level, businesses must consider all of their data as a growing collection of opportunities. Every dataset – CRM, CMS, ERP, marketing software – contains a multitude of possible insights. Findings that seem insignificant now might matter a great deal down the road. It's impossible to know up front what data matters, so businesses need to collect as much of it as they can. This lets companies retroactively unearth insights, even if their priorities or market conditions change.

Insights are only as good as the underlying data

Data quality is king. Bad data leads to bad results. If you base your decisions on incomplete data, it becomes harder to trust the results, and it ultimately erodes confidence in a data-driven culture. Clean, complete, and correct data is necessary for generating actionable insights.

We saw this with the 2016 presidential election. Most predictions were based on national and state-level polling results conducted over the phone. But phone surveys are especially susceptible to nonresponse bias, which itself varies wildly from state to state. This affects the forecast for the Electoral College more than the overall popular vote, yet the Electoral College is what wins elections. The result? Skewed data producing the wrong prediction.
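
To make the mechanism concrete, here is a minimal sketch of how uneven nonresponse can move a winner-take-all tally far more than the overall vote share. Everything in it is hypothetical: the four states, their electoral votes, voter counts, support levels, and response rates are invented for illustration and are not 2016 figures. It also assumes a simplified model in which a poll only hears from people who respond, and in which every state awards all of its electoral votes to the candidate above 50%.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class State:
    name: str             # hypothetical state label
    electoral_votes: int  # hypothetical electoral votes
    voters: int           # hypothetical number of voters
    true_share_a: float   # true share of voters backing candidate A
    response_a: float     # poll response rate among A supporters
    response_b: float     # poll response rate among B supporters


def true_share(s: State) -> float:
    """Share for candidate A among all voters."""
    return s.true_share_a


def polled_share(s: State) -> float:
    """Share for candidate A among poll respondents only."""
    a = s.true_share_a * s.response_a
    b = (1 - s.true_share_a) * s.response_b
    return a / (a + b)


def electoral_votes_for_a(states: List[State],
                          share: Callable[[State], float]) -> int:
    """Winner-take-all: A collects a state's votes only above 50%."""
    return sum(s.electoral_votes for s in states if share(s) > 0.5)


def popular_share_for_a(states: List[State],
                        share: Callable[[State], float]) -> float:
    """Voter-weighted share for A across all states."""
    total = sum(s.voters for s in states)
    return sum(s.voters * share(s) for s in states) / total


# Invented data: A supporters answer the phone less often in two close states.
STATES = [
    State("North", 20, 6_000_000, 0.51, 0.08, 0.10),
    State("South", 16, 5_000_000, 0.52, 0.09, 0.10),
    State("East", 29, 9_000_000, 0.45, 0.10, 0.10),
    State("West", 10, 3_000_000, 0.56, 0.10, 0.10),
]

if __name__ == "__main__":
    for label, fn in (("true", true_share), ("polled", polled_share)):
        ev = electoral_votes_for_a(STATES, fn)
        pop = popular_share_for_a(STATES, fn)
        print(f"{label:>6}: popular share {pop:.3f}, electoral votes {ev}")
```

In this toy setup the polled popular share lands about two points below the true one, while the electoral tally for candidate A drops from a majority of the 75 votes to only a handful, because small errors in close states flip those states entirely.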

Machine learning has received a great deal of hype, and for good reason. But it cannot live up to its bold potential unless it's informed by a strong foundation: clean, complete data produced by an organization that ingrains data into its culture. The term "data-driven" has been around for years, but in today's fast-paced and increasingly digital economy, it will need to become a cultural mandate for companies everywhere.

