UN tackles socio-economic crises with big data

Julia King | June 4, 2013
United Nations researchers had a sobering realization in 2010. For all of the official data and reports collected by the group's member nations and various UN programs and agencies, precious little of the data that supports the organization's operations was truly up to date.

To achieve this goal, Persistent Systems Ltd., a 2013 Computerworld Honors Laureate, set about monitoring and analyzing a massive amount of data it collected from social media channels immediately after each 90-minute episode of the program aired.

"The show is a cross between Oprah and 60 Minutes," explains Mukund Deshpande, head of BI/analytics at Persistent Systems. "The goal was to use social media to connect directly with people and close the loop as a way to have a conversation with viewers."

The show was carried on 13 TV channels in India, and each episode was posted to YouTube within 30 minutes of its airing. Each show immediately elicited millions of messages on Facebook, Twitter and other online discussion forums. The challenge, says Deshpande, was to make sense of long, complex messages that were very emotional and often contained stories of people's personal encounters with abuse.

This created a big-data problem in terms of both volume and network performance. The show drew a staggering 1.09 billion impressions across social channels. All structured and unstructured data was analyzed in real time to convey the show's impact on legislation, society and individuals, and the results were displayed on a so-called impact dashboard.

Persistent Systems designed and developed the custom end-to-end analytics process in three weeks. The project was implemented using the latest distributed computing technology and Hadoop.

Adding to the unstructured data challenge, social media responses were in "Hinglish" (Hindi words in Roman script embedded in English). This ruled out using existing tools to handle messages, which is why developers created a customized system to understand response sentiment.
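The article does not describe how Persistent Systems' customized sentiment system worked. As a rough illustration of why mixed-language text needs its own handling, the sketch below scores a message against a tiny lexicon that mixes English words with romanized Hindi words. The lexicon entries, the negator list and the `sentiment` function are all hypothetical and chosen only for this example; a production system would need far more than word lookup.

```python
import re

# Hypothetical mini-lexicon (an assumption, not from the article):
# English and romanized Hindi words mapped to a polarity of +1 or -1.
LEXICON = {
    "good": 1, "great": 1, "bad": -1, "sad": -1,
    "accha": 1, "achha": 1, "khush": 1, "bura": -1, "dukh": -1,
}

# Negators in both languages. English "not" usually precedes the word it
# negates, while Hindi "nahi" usually follows it, so both sides are checked.
NEGATORS = {"not", "nahi", "nahin"}


def sentiment(text):
    """Return a crude polarity score: positive, negative or zero."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        if polarity == 0:
            continue
        negated_before = i > 0 and tokens[i - 1] in NEGATORS
        negated_after = i + 1 < len(tokens) and tokens[i + 1] in NEGATORS
        if negated_before or negated_after:
            polarity = -polarity
        score += polarity
    return score


if __name__ == "__main__":
    for message in ("Show bahut accha tha", "yeh accha nahi hai", "not good"):
        print(message, "->", sentiment(message))
```

An English-only tool would treat "accha" and "nahi" as unknown tokens and score such messages as neutral, which is the kind of gap a customized system has to close.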

Deep analytics extracted valuable insights, Deshpande says. The new system aggregated all unstructured data, then automatically filtered it to weed out spam and unrelated messages. Valid messages were tagged and rated. Short messages praising the show were rated lower than longer messages and personal stories. Final selection was done manually, using triangulation to determine the top content.
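The filter-then-rate steps described above can be sketched roughly as follows. This is a minimal illustration, not the system Persistent Systems built: the spam markers, topic keywords, scoring weights and function names are all assumptions made for the example. It drops spam and off-topic messages, then rates what remains so that longer, first-person messages outrank short praise, leaving a shortlist for manual review.

```python
import re
from dataclasses import dataclass

# Hypothetical markers and keywords; the article does not list the real ones.
SPAM_MARKERS = ("buy now", "click here", "free recharge")
TOPIC_KEYWORDS = ("show", "episode", "abuse", "story")
FIRST_PERSON = {"i", "me", "my"}


@dataclass
class RatedMessage:
    text: str
    score: float


def is_valid(text):
    """Keep a message only if it is not spam and mentions the topic."""
    lowered = text.lower()
    if any(marker in lowered for marker in SPAM_MARKERS):
        return False
    return any(keyword in lowered for keyword in TOPIC_KEYWORDS)


def rate(text):
    """Score longer messages higher and add a bonus for personal stories."""
    words = re.findall(r"\w+", text.lower())
    length_score = min(len(words) / 50.0, 1.0)  # capped at 1.0
    first_person_count = sum(1 for word in words if word in FIRST_PERSON)
    personal_bonus = 0.5 if first_person_count >= 2 else 0.0
    return length_score + personal_bonus


def shortlist(messages, top_n=3):
    """Filter, rate and return the top candidates for manual selection."""
    rated = [RatedMessage(m, rate(m)) for m in messages if is_valid(m)]
    return sorted(rated, key=lambda r: r.score, reverse=True)[:top_n]


if __name__ == "__main__":
    sample = [
        "Great show!",
        "Click here for free recharge about the show",
        "I watched the episode with my mother and it reminded me of my own story growing up",
        "Nice weather today",
    ]
    for item in shortlist(sample):
        print(round(item.score, 2), item.text)
```

The shortlist only narrows the pool; as in the process the article describes, the final pick is left to people.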

Deshpande says that social scientists have expressed interest in using a similar process to conduct a new kind of social research. "Usually, they form a small group of people and study them intensely for three to six months," he notes. "But what we have here is exactly the opposite of that. We don't have rich data about a small number of individuals but data about millions of people, including their age and gender and how they feel about particular issues. It would be a new way to do social science research."

