Monday, March 2, 2015

Big Data needs better questions | Open Data

Elizabeth Sabet
The term "big data" is much in the news lately, alternately touted as the next silver bullet that may hold answers to myriad questions about natural and human dynamics and dismissed as hype. We are only beginning to discover what value exists in the vast quantities of information available today, and in our new capacity to generate, store, and analyze it. But how can we begin to extract that value? More importantly, how can we begin to apply it to improving the human condition by promoting development and reducing poverty?

That is precisely the question that motivated the World Bank Group and Second Muse to collaborate on the recently released report Big Data in Action for Development. Interviews with big data practitioners around the world and an extensive review of literature on the topic led us to some surprising answers.

Good questions help define scope of analysis, identify key behaviors
It is a common assumption that to engage effectively with big data, you have to start with the data itself and let the data "speak". As it turns out, most practitioners disagree. We heard time and again from experts in the field that any work with big data must begin with questions. Rather than being led by whatever dataset happens to be available, practitioners who start with questions can define the setting and scope of their analysis and identify the behaviors or conditions in the world that interest them. Questions help practitioners determine why they are seeking data and identify the media generating the data relevant to their purpose and scope.

In Big Data in Action for Development, we note that the purposes of most big data projects fall into three related categories: awareness, understanding, and forecasting. To share a few examples:
  • Real-time information on the extent of the damage from Typhoon Haiyan in the Philippines provided insight into where best to direct response efforts, while access to data raised awareness of the extent of mobile money transfers in Kenya and informed changes in banking policy in that country.
  • Mexico's pilot project tracking population movements in response to the spread of epidemic disease deepened understanding of those dynamics, pointing to the need for policy levers that could reduce infection rates.
  • An assessment of sentiments of "confusion" in online forum conversations about employment in Ireland forecast unemployment increases three months earlier than official statistics.

Awareness, understanding, and forecasting
These categories can give shape to the formulation of questions. If you're interested in the changing price of wheat in a given country, big data may be used to answer one of the following questions:
  • How much are farmers currently receiving for the wheat they are selling?  (Awareness)
  • What is driving changes in wheat purchase prices? (Understanding)
  • What will wheat purchase prices be next month? (Forecasting)
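
To make the three kinds of question concrete, here is a minimal Python sketch of how each one might map onto a different computation over the same data. Everything in it is an assumption made for illustration: the numbers are invented, the fuel price index is a hypothetical candidate driver, and the helper names `fit_line` and `pearson` are not drawn from the report.

```python
from math import sqrt
from statistics import mean

# Hypothetical monthly observations, invented purely for illustration:
# (month index, average price paid to farmers for wheat, fuel price index)
observations = [
    (1, 200.0, 100.0),
    (2, 204.0, 102.0),
    (3, 208.0, 104.0),
    (4, 212.0, 106.0),
]
months = [m for m, _, _ in observations]
prices = [p for _, p, _ in observations]
fuel = [f for _, _, f in observations]


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Awareness: how much are farmers currently receiving?
current_price = prices[-1]

# Understanding: does the price move together with a candidate driver?
fuel_correlation = pearson(fuel, prices)

# Forecasting: extend the fitted trend one month ahead.
slope, intercept = fit_line(months, prices)
next_month_price = slope * (months[-1] + 1) + intercept
```

A correlation only hints at a possible driver and does not establish one, and a straight-line trend is the crudest possible forecast. The sketch is meant to show that the question, not the dataset, determines which computation is worth running.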

Combining datasets can reveal insights
A well-articulated series of questions and a clear purpose help inform the selection of relevant data media. Media that provide effective sources of big data include satellites, mobile phones, social media, internet text, internet search queries, and financial transactions, among others.

As the examples below illustrate, cross-referencing the primary medium with the primary purpose shows that big data projects can take on a great variety of configurations depending on the context. Carefully combining datasets from various sources to create "mashups" can reveal further insights.
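
As a rough sketch of the "mashup" idea (not one of the report's own examples), the Python below joins two datasets from different media on a shared key. The region names, the figures, and the `mashup` helper are all hypothetical and invented for illustration.

```python
# Hypothetical records, invented for illustration only.
# Average mobile airtime top-up per region (from mobile phone data).
mobile_topups = {"north": 120, "south": 80, "east": 95}
# Average wheat price per region (from a separate market survey).
market_prices = {"north": 210.0, "south": 190.0, "west": 205.0}


def mashup(left, right):
    """Inner-join two region-keyed datasets on the regions they share."""
    shared = sorted(left.keys() & right.keys())
    return {
        region: {"topup": left[region], "price": right[region]}
        for region in shared
    }


combined = mashup(mobile_topups, market_prices)

# Regions present in only one source; these gaps are part of why
# datasets need to be combined carefully.
unmatched = sorted(mobile_topups.keys() ^ market_prices.keys())
```

Only the regions covered by both sources survive the join. Keeping the list of unmatched regions makes it visible where the combined view is silent.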
