November 2008 will long be remembered as the beginning of a historic change in U.S. and world politics. This was when a relatively unknown junior senator, an African American at that, was first elected President of the United States of America. “The arrival of Barack Hussein Obama signaled a change in not only American Politics but also in American voting patterns,” writes Robert McGuigan Burns in an essay titled ‘Barack Obama: How an ‘Unknown’ Senator Became President of the USA.’
In November 2012, he managed to do it again. “Obama’s campaign began the election year confident it knew the name of every one of the 69,456,897 Americans whose votes had put him in the White House,” writes Sasha Issenberg for MIT Technology Review in an article titled, A More Perfect Union.
How did he do it? “With the advantage of knowing the course of events since the 2008 presidential election, it is now evident that Barack Obama’s candidacy and campaign were fundamental in changing electoral politics in the United States,” asserts Rebecca Pineiro, in an article titled E-Electioneering: The Political and Cultural Influence of Social Media in the 2008 and 2012 Presidential Elections. “His campaign was the first in which the Internet was effectively harnessed as a tool to reach voters and collect information from the existing online databases. The digitalization of campaigns propelled voters to learn about the candidates and issues, and encouraged Americans to become politically socialized online due to the rapid dissemination and careful targeting of social media straight from the campaign itself,” she explains.
- The use of big data analytics and technology has helped political parties and individuals to attract and influence voters in different countries.
- Experts believe this is the right time for political parties in Pakistan to explore the possibilities of leveraging big data in the 2018 general elections.
- Election campaigns backed by big data and technology have proved incubators for new companies, especially in the U.S. This might happen in Pakistan too!
Jim Messina, Obama’s 2012 election campaign manager, hired “an analytics department five times as large as that of the 2008 campaign.” For its Chicago headquarters, he hired a chief scientist named Rayid Ghani, who in his previous life had “crunched huge data sets for supermarket sales promotions,” among other things, with a single brief from Messina: “measure every single thing in this campaign.”
“For all the praise Obama’s team won in 2008 for its high-tech wizardry, its success masked a huge weakness: too many databases,” underlines Michael Scherer in a Time article titled ‘Inside the Secret World of the Data Crunchers Who Helped Obama Win.’ “Back then, volunteers making phone calls through the Obama website were working off lists that differed from the lists used by callers in the campaign office. Get-out-the-vote lists were never reconciled with fundraising lists. It was like the FBI and the CIA before 9/11: the two camps never shared data,” he continues. “So over the first 18 months,” says Scherer, “the campaign started over, creating a single massive system that could merge the information collected from pollsters, fundraisers, field workers and consumer databases as well as social-media and mobile contacts with the main Democratic voter files in the swing states.”
Read more: Social Media and Big Data in Politics
“At Obama for America, Ghani built statistical models that assessed each voter along five axes,” reports Ted Greenwald for the MIT Technology Review in an article titled Data Won the Election. Can It Save the World? “Support for the president, susceptibility to being persuaded to support the president, and the likelihood of donating money, of volunteering, and of actually casting a vote. These models allowed the campaign to target door knocks, phone calls, TV spots, and online ads to where they were most likely to benefit Obama,” he explains.
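The five-axis scoring Greenwald describes can be sketched as a simple propensity model. The feature names, weights, and threshold below are invented for illustration; the campaign's actual models, features, and data are not public, so this is only a minimal sketch of how a single axis (turnout likelihood) might be scored and used to rank voters for contact.

```python
from math import exp

# Hypothetical coefficients for one axis (likelihood of casting a vote).
# A real campaign would fit these to voter-file and contact-history data.
WEIGHTS = {"voted_2008": 1.8, "is_donor": 0.9, "age_over_45": 0.6}
BIAS = -1.2

def turnout_score(voter: dict) -> float:
    """Return a 0-1 propensity score via a logistic function."""
    z = BIAS + sum(w * voter.get(feature, 0) for feature, w in WEIGHTS.items())
    return 1 / (1 + exp(-z))

def rank_for_door_knocks(voters: list, k: int) -> list:
    """Target the k voters with the highest scores. In practice each of
    the five axes (support, persuadability, donating, volunteering,
    turnout) would have its own model feeding a targeting decision."""
    return sorted(voters, key=turnout_score, reverse=True)[:k]
```

The point of such scores is exactly what the article describes: deciding where a door knock, phone call, or ad is most likely to pay off, rather than contacting everyone uniformly.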
Data about voters was key to this electioneering. “Computing hardware used to be a capital asset, while data wasn’t thought of as an asset in the same way,” says Erik Brynjolfsson, Director of the MIT Initiative on the Digital Economy. “Now hardware is becoming a service people buy in real time, and the lasting asset is the data,” he says. “For most companies, their data is their single biggest asset,” concurs Andrew W. Lo, Director of the MIT Laboratory for Financial Engineering. “Many CEOs in the Fortune 500 don’t fully appreciate this fact,” he comments. “More and more important assets in the economy are composed of bits instead of atoms,” notes Brynjolfsson. A 2011 study conducted by Brynjolfsson and colleagues at MIT and the University of Pennsylvania supports the concept of data as a capital asset. The researchers concluded, based on a sample of 180 large public companies, that businesses emphasizing “data-driven decision making” performed highest in terms of output and productivity, typically “5 to 6 percent higher than what would be expected, given their other investments and information technology usage,” said the report. (Source: The Rise of Data Capital)
Can the power of big data and data science technology do the same for Pakistan?
Are the political parties in Pakistan ready to gain an advantage in politics and elections by leveraging big data, and do we have a conducive environment? Political parties today do not rely solely on offline information such as voters’ lists. Big data in politics is more about following the digital footprints of individual voters, capturing their sentiments and emotions through direct interaction on a mass scale using technology, and developing campaigns to influence them accordingly.
The short answer? So far, there is no significant mention of the use of data analytics and technology in elections and politics in Pakistan. On the corporate front, however, many companies in Pakistan have now started looking towards data analytics for more informed business decisions and to acquire new customers.
Big data and politics
David W. Nickerson is Associate Professor of Political Science, University of Notre Dame, South Bend, Indiana. He served as the “Director of Experiments” in the Analytics Department in the 2012 reelection campaign of President Barack Obama.
Todd Rogers is Assistant Professor of Public Policy, Harvard Kennedy School of Government, Cambridge, Massachusetts. He co-founded the Analyst Institute, which uses field experiments and behavioral science insights to develop best practices in progressive political communications.
In their joint paper “Political Campaigns and Big Data,” published in the Journal of Economic Perspectives (Vol. 28, No. 2, pp. 51-73), the two scholars note that contemporary campaigns use data in a number of creative ways. However, according to the authors, the primary purpose of political data has been, and for the foreseeable future will be, ‘providing a list of citizens to contact.’
The abstract of the paper reads: “Modern campaigns develop databases of detailed information about citizens to inform electoral strategy and to guide tactical efforts. Despite sensational reports about the value of individual consumer data, the most valuable information campaigns acquire comes from the behaviors and direct responses provided by citizens themselves. Campaign data analysts develop models using this information to produce individual-level predictions about citizens’ likelihoods of performing certain political behaviors, of supporting candidates and issues, and of changing their support conditional on being targeted with specific campaign interventions. The use of these predictive scores has increased dramatically since 2004, and their use could yield sizable gains to campaigns that harness them.”
In a feature—Why “Big Data” Is a Big Deal—in Harvard Magazine, Gary King, the Albert J. Weatherhead III University Professor at Harvard University, is quoted as saying, “…it is not the quantity of data that is revolutionary. The big data revolution is that now we can do something with the data.”
Read more: Harnessing the Power of Data
The Indian experience
The victory of the Bharatiya Janata Party (BJP) in India’s 2014 general elections is widely attributed to the use of big data, which is believed to have helped the party better understand the electorate and develop an efficient campaign accordingly.
According to the website of the Election Commission of India, the 2014 Lok Sabha election involved 543 constituencies, over 8,000 candidates and over 800 million electors (voters). This information, together with the candidates’ details—including their asset declarations, other records and their social media followers—and the voters’ details, certainly qualifies as big data.
The BJP took advantage of this enormous information by establishing ‘a digital war room’ involving over a hundred techies and experts, and by hiring big data firms and consultants to analyze the data for campaign planning and voter engagement, stay digitally connected with voters, and react faster to controversies.
“Unlike earlier elections, BJP stood apart as they marshalled technocrats into their team to add more bite to their campaign and spot trends to react faster than its arch rival, the Indian National Congress,” reported The Economic Times in an article on the BJP’s successful election campaign.
All this did not end with the elections. The Modi-led government launched big data initiatives such as MyGov—with the slogan ‘Developing and Transforming India’—and platforms like Open Government Data (OGD) to engage the masses directly and digitally in governance.
Narendra Modi, clearly realizing the potential and power of technology, also launched “Startup India”—an initiative aimed at ‘fostering entrepreneurship and promoting innovation by creating an ecosystem that is conducive for growth of startups.’
The Indian elections also witnessed the rise of related startups, many of which were widely used by the political parties, particularly the BJP, for political mileage. Voxta—dubbed a ‘Political Siri’—launched an interactive phone line for Modi based on Hindi speech recognition technology. According to Voxta, “People were able to talk to Modiji about the issues that concerned them, their words were recognized, and the right clip of Modiji was played to them.”
The trickle-down effect
Rabindra Kumar Jena was elected to the 16th Lok Sabha (the lower house of India’s bicameral Parliament) from the Balasore constituency in Odisha, India, in 2014.
An increasing number of parliamentarians in India are now employing big data and analytics to improve and strengthen political decision-making and governance; RK Jena is one of them.
According to a report, volunteers from Bharat Gyan Vigyan Samiti—a nonprofit voluntary outfit—are collecting village-level data through a tablet app in RK Jena’s constituency. The data, on a wide range of subjects including schools, health facilities, roads, transport and agriculture, is being analyzed by a not-for-profit development organization, the Swaniti Initiative, with the ultimate goal of drafting a development plan for the lawmaker.
If the parliamentarian meets the expectations of his constituents by following and implementing a development plan that comes directly from the people—the real stakeholders—his informed decisions, based on meaningful insights derived from data, are certain to add to his political profile in the years to come.
What is big data?
The Big Data Initiative at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) defines Big Data as the “data that is too big, too fast, or too hard for existing tools to process.”
By “too big,” CSAIL means that organizations increasingly have to deal with petabyte-scale collections of data, which come from click streams, transaction records, sensors, and many other places. By “too fast,” it means that not only is data big, but it needs to be processed quickly. Finally, “too hard” is a catchall for data that doesn’t fit neatly into an existing processing tool, i.e., data that needs more complex analysis than existing tools can readily provide.
According to IBM, big data is arriving from multiple sources at an alarming velocity, volume and variety and in order to extract meaningful value from big data, we need optimal processing power, analytics capabilities and skills.
Primarily because of the Internet, it is generally argued that 90 percent of the data in the world today has been created in the last few years alone. That gives a sense of the enormous amount of data being generated daily. A simple online search reveals that more than 2.5 exabytes—equivalent to 2.5 billion gigabytes (GB)—of data is generated every single day!
Where does big data come from?
Mainly, this treasure trove of information comes from user-generated data on the Internet—in real time—though almost everything around us today generates data: log files, receipts, tax returns, and devices including sensors, mobiles, ATMs and even satellites.
MIT CSAIL: Big data comes from sensors, devices, video/audio, networks, log files, transactional applications, the web, and social media—much of it generated in real time and on a very large scale.
IBM: Big data is being generated by everything around us at all times. Every digital process and social media exchange produces it. Systems, sensors and mobile devices transmit it.
A good example of big data in the local context is the data about the thousands of vehicles plying the motorways and national highways of Pakistan every day. This data is not limited to the vehicles’ entry and exit details; if integrated with data on accidents and traffic violation tickets, it could uncover a number of significant insights.
Another example is the visits of Monitoring and Evaluation Assistants (MEAs) to public schools in the country’s Punjab province, who collect and upload real-time data on teachers’ presence, students’ attendance, and school facilities and cleanliness. Every month they visit over 52,000 public schools, where more than 10 million students are enrolled and around 400,000 teachers are employed, besides thousands of non-teaching staff. Once integrated with the students’ enrolment and teachers’ information management systems presently being developed by the Punjab Information Technology Board (PITB), these datasets indeed fall into the category of big data, and meaningful information could be retrieved for informed policy decisions.
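The integration step described above is, at its core, a join of visit records onto enrolment records by a common school identifier. The field names below (emis_code, teacher_attendance, and so on) are invented for illustration; PITB's actual schemas are not described in public sources, so this is only a sketch of the idea.

```python
# Hypothetical MEA visit records and an enrolment lookup, keyed on a
# school code. Real data would come from the monitoring app and the
# enrolment management system.
visits = [
    {"emis_code": 101, "teacher_attendance": 0.92, "clean": True},
    {"emis_code": 102, "teacher_attendance": 0.78, "clean": False},
]
enrolment = {101: {"students": 450}, 102: {"students": 910}}

def merge_school_data(visits, enrolment):
    """Join visit metrics onto enrolment records by school code."""
    merged = []
    for v in visits:
        record = dict(v)                       # copy the visit record
        record.update(enrolment.get(v["emis_code"], {}))
        merged.append(record)
    return merged

def flag_low_attendance(merged, threshold=0.8):
    """Return codes of schools whose teacher attendance falls below
    the threshold -- the kind of insight integration makes possible."""
    return [r["emis_code"] for r in merged
            if r["teacher_attendance"] < threshold]
```

Only once the datasets are linked like this can a policymaker ask questions that span them, such as how many enrolled students are affected by low teacher attendance.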
Today the Internet of Things (IoT) is triggering a massive influx of big data with unbelievable speed while the global view of data is also changing rapidly because of the social media such as Facebook, Twitter and YouTube, to name a few.
It is primarily this social data together with mass level surveys and other information available that is believed to have revolutionized how people participate in the political process these days.
Political parties tap this treasure trove of data through analytics to gain actionable insights and develop effective strategies accordingly. The main aim is to capture public sentiments and emotions and to engage voters on a mass level through technology, with the help of experts and active volunteers on the ground.
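Capturing sentiment from social posts can be sketched in miniature with a lexicon-based scorer. Real political sentiment analysis uses trained language models and far richer, language-specific lexicons; the word lists and function names here are purely illustrative.

```python
from collections import Counter

# Toy lexicon -- invented for illustration, not a real sentiment resource.
POSITIVE = {"great", "support", "hope", "win", "good"}
NEGATIVE = {"fail", "corrupt", "bad", "worst", "lie"}

def post_sentiment(text: str) -> int:
    """Score one post: +1 per positive word, -1 per negative word."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def aggregate_sentiment(posts) -> dict:
    """Bucket a stream of posts into positive/negative/neutral counts,
    the kind of aggregate a campaign war room would watch over time."""
    buckets = Counter()
    for p in posts:
        s = post_sentiment(p)
        buckets["positive" if s > 0 else "negative" if s < 0 else "neutral"] += 1
    return dict(buckets)
```

At campaign scale the same aggregation would be run continuously over millions of posts, sliced by region or demographic, to detect shifts in public mood.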
What is big data analytics and its market size?
In simple words, data analytics is the process of uncovering insights, hidden trends and patterns that help individuals or companies make informed decisions. Big data analytics, similarly, is the use of advanced analytic techniques to unlock enormous volumes of structured and unstructured data, gain actionable insights, and subsequently develop new strategies and make more informed decisions.
According to IBM, analyzing big data allows analysts, researchers, and business users to make better and faster decisions using data that was previously inaccessible or unusable. “Using advanced analytics techniques such as text analytics, machine learning, predictive analytics, data mining, statistics, and natural language processing, businesses can analyze previously untapped data sources independently or together with their existing enterprise data to gain new insights resulting in significantly better and faster decisions.”
An in-depth analysis of the last five or ten years of traffic-related data from the motorways and highways is a simple example of big data analytics in the local context. If integrated, this data could reveal, among other things, which sections of the motorways or highways witnessed the most accidents, which gender or age group of drivers was issued the most traffic violation tickets, and at what times these incidents clustered. Analytics could even show the impact of weather on accidents, the performance of the patrol police, and the pattern of highway robberies—when and where they happened most. Such insights feed directly into future planning and action.
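The kind of questions listed above are, in analytics terms, group-and-count queries. A minimal sketch, assuming a set of hypothetical accident records (the section names and weather values are invented; real data would come from the motorway authority's logs):

```python
from collections import Counter, defaultdict

# Invented accident records for illustration only.
accidents = [
    {"section": "M2-Kallar Kahar", "weather": "fog"},
    {"section": "M2-Kallar Kahar", "weather": "clear"},
    {"section": "N5-Sahiwal",      "weather": "rain"},
    {"section": "M2-Kallar Kahar", "weather": "fog"},
]

def accidents_by_section(records):
    """Rank road sections by accident count, most dangerous first."""
    return Counter(r["section"] for r in records).most_common()

def weather_breakdown(records):
    """Count accidents per weather condition within each section --
    the 'impact of weather' question from the text."""
    out = defaultdict(Counter)
    for r in records:
        out[r["section"]][r["weather"]] += 1
    return {section: dict(counts) for section, counts in out.items()}
```

At real scale the same aggregations would run on a distributed platform rather than in-memory Python, but the logic of the analysis is identical.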
As an enormous amount of data is generated every day and new technologies emerge around it, it is not hard to visualize the ever-growing market for big data. According to a report by Grand View Research, Inc.—a U.S.-based market research and consulting company—the global big data market was valued at over $27 billion in 2014 and is expected to reach $72.38 billion by 2022.
It was in 2012 that Harvard Business Review called data scientist ‘the sexiest job of the 21st century’ in the title of one of its articles, and almost everyone still cites that expression when it comes to the growing demand for data scientists in the Information Age.
While businesses in the developed countries have long leveraged statisticians and computer technology for commercial gain, the role of data scientists in developing campaign strategies to mobilize supporters, gauge public attitudes and influence voters is becoming ever more crucial. Demand for this scarce breed keeps growing, and more and more universities are now offering related academic programs to meet the shortage.
The U.S. experience
While Obama’s 2008 and 2012 campaigns are dubbed the most data-savvy in history, the latter is widely acknowledged as the more refined and effective. One reason for this, experts believe, was heavy investment in big data analytics and technology.
There are reports that as soon as Barack Obama took the oath of office at the beginning of 2009, his team started laying the groundwork for the 2012 reelection campaign. This is evident from the research the Obama team did to improve its campaign strategy and secure the presidency for a second term.
Experts and analysts see even more sophisticated data efforts in the campaigns of some candidates, particularly Hillary Clinton, in the current presidential election. An effective campaign powered by data crunchers is believed likely to prove a game changer this time too.
Today, various social networks, including social media giant Facebook, offer special tools for election campaigns—built, of course, on enormous amounts of user-generated data.
The 2008 and 2012 presidential elections also witnessed the rise of a phenomenon in election prediction—Nate Silver. Today his political calculus, FiveThirtyEight—which is a reference to the number of votes in the Electoral College—‘uses statistical analysis (hard numbers) to tell compelling stories about elections, politics, sports, science, economics and culture.’
With the aim of increasing public participation in the democratic dialogue, the Obama Administration launched the Data.gov initiative in 2009, where thousands of datasets—from agriculture and energy to education, science and research—are today available to the public. “Open government data is important because the more accessible, discoverable, and usable data is, the more impact it can have,” says the open government website. Today this open government data powers many software applications that help people make informed decisions.
In March 2012, the Obama Administration announced a “Big Data Research and Development Initiative”—with an aim ‘to make the most of the fast-growing volume of digital data’ and ‘to develop big data technologies, demonstrate applications of big data, and train the next generation of data scientists’.
In May 2016, the Administration issued a strategic plan for the ‘Big Data Research and Development (R&D) Initiative’—dubbed ‘an important milestone in the Administration’s data-science efforts’—and made open data ‘the new default for Federal agencies.’
Data-based campaigning prompted many of those involved to institutionalize the tools and technologies they had used while developing campaign strategies. Civis Analytics—‘born on the campaign trail’—a Chicago-based startup, is one such example. It was founded by Dan Wagner, who served as Chief Analytics Officer on the Obama for America 2012 campaign, and ‘more than a third of the original team works at Civis today.’
While polling on business and commercial campaigns is well entrenched in developing countries, polling on political campaigns is deep-rooted in the developed world, especially the U.S. It is, in fact, an industry: firms and individuals poll on almost everything to gain mass-level insights. Parties engage public-opinion experts and firms to poll on election campaigns, while many companies, including media houses, conduct such polls independently to learn more about voters’ attitudes and behaviors and to predict results. It is also generally believed that opinion polls can sometimes influence the behavior of voters.
Big data for development
In the words of the Economist’s technology editor, Ludwig Siegele, “The internet and the availability of huge piles of data on everyone and everything are transforming the democratic process, just as they are upending many industries. They are becoming a force in all kinds of things, from running election campaigns and organizing protest movements to improving public policy and the delivery of services.”
Experts believe that open data initiatives help governments keep direct and constant contact with the people, and can also help them influence users in future elections.
According to a World Bank report, “Big Data in Action for Development,” big data can be used to capture population sentiments. Making good use of big data, the report notes, requires collaboration among various actors, including ‘data scientists and practitioners, leveraging their strengths to understand the technical possibilities as well as the context within which insights can be practically implemented.’
According to the report, effective sources of big data include, among others, satellites, mobile phones, social media, internet text, internet search queries, and financial transactions. However, it also observes: “Added benefits accrue when data from various sources are carefully combined to create ‘mashups’ which may reveal new insights.”
Pakistan electoral system
When it comes to electioneering in Pakistan, most of us can name at most a dozen political parties. So it might come as a surprise to many that 270 political parties are enlisted with the Election Commission of Pakistan (ECP).
Pakistan’s Electoral College consists of the Senate, the National Assembly of Pakistan and the Provincial Assemblies. While the members of the National Assembly and the Provincial Assemblies are directly elected by the people in general elections, the members of the Senate are indirectly elected by the provincial assemblies.
The National Assembly members elect the Prime Minister—the head of government—while the President—the head of state—is elected through the Electoral College.
The National Assembly has a total of 342 members, including 60 seats reserved for women and 10 for non-Muslims. The seats in the National Assembly are allocated to each province, the Federally Administered Tribal Areas (FATA) and the Federal Capital on the basis of population. Of these, 272 members are directly elected by the people.
While none of the official websites of the provincial assemblies gives statistics about total seats, according to Wikipedia the Punjab Assembly has a total of 371 seats, including 66 reserved for women and 8 for non-Muslims.
The Khyber-Pakhtunkhwa Assembly has 124 seats in total, of which 22 are reserved for women and 3 for non-Muslims.
The Sindh Assembly has 168 seats in total, of which 29 are reserved for women and 9 for non-Muslims.
The Balochistan Assembly has 65 seats in total, of which 11 are reserved for women and 3 for non-Muslims.
According to the ECP, voter turnout in the 2013 general elections was 55.02 percent for the National Assembly and 55.26 percent for the provincial assemblies. There were a total of 86,189,802 registered voters in the 2013 general elections.
Lessons for Pakistan
Murtaza Haider, PhD, is an associate professor at the Ted Rogers School of Management, Ryerson University, in Toronto and author of the book “Getting Started with Data Science: Making Sense of Data with Analytics.”
According to Haider, good data is not publicly available in Pakistan, and what we lack even more are consumer and public research surveys. “The political parties in Pakistan rely on muscle power more than efforts to bring latent power out of data. The parties’ use of social media suggests that they are in the Stone Age and they intend to stay there,” says the academic, who suggests: “The parties (in Pakistan) have to start and document membership drives, hold internal elections, and then use their data to determine where they lag and where they lead.”
The fact that, unlike India, which held its last population census in 2011, Pakistan last conducted a census in 1998 irks Haider. “Our sampling frames are based on 18-year-old data. With high fertility rates, massive internal migration and displacement, Pakistan’s local demographics could change remarkably in 18 years. Hence, any attempt to draw a nationally representative sample will face huge, probably insurmountable, challenges,” he adds.
According to Haider, to get big data working for Pakistan we need to hold the census; in the meanwhile, NADRA data and the household and housing data collected for the Benazir Income Support Program must be released for public use. This would help with development planning, with the side benefit of the same data being used to devise electoral strategies. “However, this will not reach the same sophistication as is prevalent in the US where they used data science to determine which restaurant patrons are more likely to vote for President Obama,” he declares.
Khalid Khattak is a journalist working for The News International. Lately, Khalid has become involved in data journalism and founded Pakistan’s first independent data journalism website, http://www.datastories.pk/.