Riding the New Wave of Data Science

By Ahmed Raza

A brief review of the limitations of big data in Pakistan.

Big data has been a big deal in marketing circles for quite some time. Most marketers agree that big data helps. It is one of the most effective ways for decision makers and marketers to draw fresh insights on consumer behavior, and those insights add value to their marketing campaigns. Yet many companies still aren’t using big data as efficiently as they could, perhaps because they don’t fully understand what big data is or what its benefits are.

A 2013 Forbes Insights survey, The Big Potential of Big Data, conducted in association with the big data AI firm Rocket Fuel, found that “marketers (advertisers and corporate executives) who use big data generate more benefits from their initiatives than those who do not use it. They are more likely to generate helpful insights about consumers than organizations that are behind the data usage curve, and they are more likely to see gains in sales, the holy grail of marketing initiatives.”

Politicians the world over have borrowed this insight from the marketers to run their electioneering/media campaigns, creating their own big data about potential voters. It’s called “micro-targeting,” and it was a key element of the successful ‘Obama for America’ campaign and its unprecedented fundraising.

Governments are also benefitting from big data. For example, President Obama’s big data initiative, Data.gov, is in its seventh year. In health care, it’s harnessing genomic and personal health information to advance precision medicine, which the initiative describes as “an innovative approach to disease prevention and treatment that takes into account individual differences in people’s genes, environments, and lifestyles.”

In criminal and social justice, it’s working to improve policing by encouraging different states and precincts to share information in order to compare stop rates, search rates, and other metrics to “[give] data back to the officer to make better decisions.” Projects are also underway in higher education and housing development, according to a Time news article, ‘Obama’s Chief Data Scientist Reveals How the Government Uses Big Data,’ by Tessa Berenson.

In Brief

  • Pakistan is the sixth most populous country in the world with an estimated population of over 180 million people. The figure is only an estimate, because the last population census was held in 1998.
  • Researchers and scientists across the world are finding ways to use the vast pool of big data in the most effective manner. Developing countries like Pakistan, though also deluged with data, seem oblivious to its enormous potential.
  • The population census is mandated by the Constitution of Pakistan. However, it has been ignored over the years. Though the prevailing law and order situation is a challenge, those at the helm need to find ways to carry out this important exercise.

There are a number of online applications available at Data.gov. “Data USA is an online application developed by a team of data scientists at MIT Media Lab and Datawheel, backed by Deloitte. It is helping Americans visualize demographic and economic data using an open source platform. Where are the Jobs, another online application developed by SymSoft Solutions provides insights on employment trends and salary information by geographic data. The application won first place at the U.S. Department of Labor’s 2011 Occupational Employment Statistics Challenge. FarmPlenty, recipient of the USDA-Microsoft Innovation Challenge Grand Prize award, uses open government data to help farmers better analyze U.S. Department of Agriculture data on crops grown within a five mile radius of their farms.”

A deluge of data

The developed world is rapidly increasing its reliance on new streams of big data and is no longer limited to analogue surveys and census data.

According to the UN Global Pulse white paper “Big Data for Development: Challenges and Opportunities” (May 2012), the amount of available digital data at the global level is projected to increase by 40 percent annually in the next few years, which is about 40 times the much-debated growth rate of the world’s population. This rate of growth means that the stock of digital data is expected to increase 44 times between 2007 and 2020, doubling every 20 months.
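
The three figures quoted above (40 percent a year, 44 times between 2007 and 2020, doubling every 20 months) are different ways of describing compound growth. The short Python sketch below shows how such figures can be converted into one another. It assumes a constant annual growth rate, which is a simplification and not something the white paper states, and the function names are purely illustrative.

```python
import math


def growth_factor(annual_rate, years):
    """Total multiplication of a quantity growing at a constant annual rate."""
    return (1 + annual_rate) ** years


def implied_annual_rate(total_factor, years):
    """Constant annual rate that would produce total_factor over the given years."""
    return total_factor ** (1 / years) - 1


def doubling_time_months(annual_rate):
    """Months needed to double at a constant annual rate."""
    return 12 * math.log(2) / math.log(1 + annual_rate)


def annual_rate_from_doubling(months):
    """Constant annual rate implied by a given doubling time in months."""
    return 2 ** (12 / months) - 1


# Figures as quoted in the article; the constant-rate model is an assumption.
YEARS = 2020 - 2007

print(f"40% a year over {YEARS} years: x{growth_factor(0.40, YEARS):.1f}")
print(f"40% a year doubles every {doubling_time_months(0.40):.1f} months")
print(f"44x over {YEARS} years implies {implied_annual_rate(44, YEARS):.1%} a year")
print(f"Doubling every 20 months implies {annual_rate_from_doubling(20):.1%} a year")
```

Under this constant-rate assumption, 40 percent a year corresponds to roughly 79-fold growth over 13 years and a doubling time of about 25 months, while 44-fold growth corresponds to about 34 percent a year. The quoted figures are therefore best read as rounded indications of very fast growth, not as one exact rate.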

Today a massive amount of data is regularly being generated and flowing from various sources, through different channels. The world is indeed experiencing a data revolution, or ‘data deluge,’ and this is transforming everything.

Researchers and scientists across the world are finding ways to use this vast pool of data to track development progress, improve social protection and understand where existing policies and programs require adjustment and improvement. From mobile phone call logs to banking transactions, online user-generated content to online shopping, web search statistics to satellite images – all these sources offer real-time access to new datasets which were not available in the past and could not be analyzed in the absence of the related technology.

According to the white paper, the data revolution is extremely recent (less than one decade old), extremely rapid (the growth is exponential) and immensely consequential for society, perhaps especially for developing countries. It is not restricted to the industrialized world; it is also happening in developing countries – and increasingly so. The spread of mobile-phone technology into the hands of billions of individuals over the past decade might be the single most significant change that has affected developing countries since the decolonization movement and the Green Revolution, the white paper underlines.

Big data: Pakistani context

Though Pakistan has also started experimenting with these new data streams during the last couple of years, the country lacks interest in the collection and dissemination of data, primarily for political reasons. In contrast to the developed world, and like many other developing states, Pakistan does not have authentic data for most sectors, especially population, healthcare, education and the economy. This is despite the fact that it has a number of warehouses for the collection and storage of its citizens’ data, for example the Federal Board of Revenue (FBR), the Federal Bureau of Statistics, the National Accountability Bureau (NAB), the State Bank of Pakistan (SBP) and the National Database and Registration Authority (NADRA), to name a few.

Pakistan is the sixth most populous country in the world with an estimated population of over 180 million people. It is the 26th largest economy in the world in terms of purchasing power parity (PPP) and the 40th largest in terms of nominal gross domestic product (GDP). Yet most socio-economic planning, budget making and distribution of resources in Pakistan is still based on estimates or on census data that is nearly two decades old.

This does not mean that the country lacks the expertise or capacity to collect statistics or valuable data. The biometric re-verification of over 130 million mobile phone SIM cards in just a couple of months is proof of this potential. Now the government has directed NADRA to re-verify the Computerized National Identity Cards (CNICs) of over 180 million Pakistanis to restore the world’s confidence in the country’s national photo ID database.

There are several other success stories, but when it comes to the population census, successive governments in Pakistan seem to have put it on the back burner.

In several countries, including Pakistan, the census is a part of the constitution because a regular count of men, women and children helps governments in economic planning and development. Census data provides an estimate of population density. It gives a holistic picture of changing patterns of rural and urban movement and concentration, and tells governments about the geographical distribution of the population and its socio-economic characteristics. Policymakers then use this valuable data to align socio-economic and development policies with the local needs of people. The data further helps in giving communities representation in government and in distributing national, regional and local resources. According to Article 51 (5) of the Constitution of Pakistan, “The seats in the National Assembly shall be allocated to each Province, the Federally Administered Tribal Areas and the Federal Capital on the basis of population in accordance with the last preceding census officially published.” Representation in provincial and local governments follows the same principle.

Successive governments and political parties in the country understand the significance of the population census. But many researchers and social scientists believe that most political parties in Pakistan are afraid of census data because it can expose radical demographic changes taking place in the country. It can affect their decades-long hegemony over resources, including the allocation of national and provincial assembly seats, the distribution of funds between the federation and provinces made through the National Finance Commission (NFC), and the quota for recruitment to federal posts.

The Irtiqa Institute of Social Sciences (IISS), Karachi, recently organized a seminar to sensitize the masses about the significance of the population and household census. It emphasized that Sindh had witnessed exponential population growth in recent decades. According to a Dawn report, Malir University of Science and Technology (MUST) Vice Chancellor Dr. Mehtab S. Karim told the seminar that, because Karachi is a port city and the economic hub of Pakistan, nearly 40 percent of total migration in the country ended up in Karachi alone. Expressing doubts over the 1998 population count, he suggested that, after adjusting for the undercount, Sindh’s population was perhaps about 25 percent of the country’s total population instead of the 23 percent reported in the 1998 census.

Social scientists in Punjab also highlight similar problems. A recent study conducted by the Punjab Agriculture Department indicates that agricultural land in the suburbs of Lahore is decreasing by 10 percent annually, which suggests the population of the provincial capital is multiplying at a rapid pace.

But the political leadership in the country has its own agenda, and political governments have proved these fears true through their actions. The previous government promulgated the General Statistics (Reorganization) Act, 2011, which allowed the federal government to hold all kinds of census, including population, agriculture and others, at its convenience rather than at regular intervals or under a legal obligation. Article 31 of the General Statistics (Reorganization) Act, 2011 states: “The Federal Government may, from time to time, by notification in the official Gazette, declare that a census of population and housing conditions of Pakistan shall be taken by the Bureau during such period as may be specified therein.”

The first census in Pakistan was held in 1951, after independence in 1947. The second was conducted in 1961, while the third was held in 1972 instead of 1971 because of the political environment in the country and the war with India. The fourth census was conducted in March 1981, and the fifth, which was due in 1991, could only be held in March 1998 owing to internal political crisis. The sixth decennial census has been overdue since 2008. Earlier, the government could not provide funds to hold the census in March 2016, and now the military has said it is unable to deploy a large number of troops for security assistance to enumerators.

As a frontline state in the ‘war on terror’, Pakistan faces unique socio-economic and developmental challenges. Earlier, successive governments put the decennial census on hold, and now the country’s security situation does not allow the military to spare a large number of troops for the population census.

Resource limitations

A population census is a tedious and expensive exercise, which is why statistical organizations around the world are looking towards new data sources. In a recent session of the United Nations Statistical Commission, the cost and quality of population surveys were prime concerns of almost all the statistical organizations.

United States Census Bureau officials disclosed that a principal goal for the 2020 US Census was to conduct it at a lower cost than the 2010 census while maintaining high quality results. To achieve this target, the bureau is reengineering most of the census processes, including data collection techniques, methodologies and field structure. The bureau indicated that increased reliance on big data sources would help the country save approximately $5.1 billion compared with the 2010 census.

Specialists from across the United Nations and academic institutions have recently published a comprehensive report titled “Data for Development – A Needs Assessment for SDG Monitoring and Statistical Capacity Development” to estimate the resources required for statistical capacity development in support of the proposed Sustainable Development Goals (SDGs). The study estimates that a total of $1 billion per annum will be required for the period 2016 to 2030 to enable 77 of the world’s lower-income countries to catch up and put in place statistical systems capable of supporting and measuring the SDGs.

The global post-2015 development agenda and the SDGs have renewed interest in the quality and availability of statistics for management, program design and performance monitoring. Experts say that most of the necessary statistics are produced by national statistical systems in developing countries, and that this data is a crucial component of good governance. Without information on where people live, how much they earn and what services they can access, it is impossible to respond to the population’s needs. Improving statistics therefore requires investment in national statistical capacity.

The advent of the Millennium Development Goals (MDGs) in 2000 had drawn attention to the many gaps in the statistical record. In 2003, the Partnership in Statistics for Development in the 21st Century (PARIS21) formed a task team to examine ways to improve support to the statistical systems needed to monitor development goals. The team’s findings would apply to many developing countries ten years on: “The [National] statistical systems are characterized by under-funding, reliance on donor support, particularly for household surveys, and very weak administrative data systems. Overall, there is a shortfall in funding for the core statistical systems required to provide information both for economic management and for monitoring the MDGs.”

In the case of Pakistan, the situation is much worse. According to the International Development Association (IDA), the World Bank arm that aims to reduce poverty by providing loans and grants, Pakistan does not have a National Strategy for the Development of Statistics (NSDS) or any aspirational plan to boost the capacity of its statistical system. The security situation in the country has further raised the cost of data collection: government estimates suggest that around PKR 7.4 billion of the total PKR 14.5 billion required for the sixth population census will go to the security of enumerators.

Lack of political will and data regime

Whether political or military, governments in Pakistan have always given preference to short-term gains over national wellbeing. As a result, the country has the lowest tax-to-GDP ratio, stagnant at around 10-11 percent, compared with other Asian countries such as Sri Lanka (13 percent), India (16 percent), Indonesia (15 percent), Malaysia (14 percent), Thailand (17 percent), the Philippines (14 percent) and South Korea (16 percent).

The Federal Board of Revenue (FBR) has repeatedly announced that it has gathered information through multiple sources and identified hundreds of thousands of potential taxpayers, but it has never been able to tap this potential, despite having access to the world’s largest multi-biometric database, which covers over 91 million people or 96 percent of the adult population of Pakistan. It has unrestricted access to other data sources, such as financial transactions, travel records and utility bill statistics, but it has still failed to achieve the desired results. Even two tax amnesty schemes have failed to increase the tax base in Pakistan.

Tax experts believe most of these schemes are doomed to fail due to politics. Muhammad Awais, former President of the Lahore Tax Bar Association, underlines that political solutions to economic problems always produce similar results.

Punjab Finance Minister Dr. Ayesha Ghaus Pasha describes Pakistan’s low tax-to-GDP ratio as a story of some successes and some failures in tax reform. In her 2010 paper ‘Can Pakistan Get Out of the Low Tax-to-GDP Trap?’ she writes: “It appears that the country has been able to only partially compensate for the loss of revenue resulting from the trade liberalization process initiated almost two decades ago. In particular, the country lost an historic opportunity for achieving a significant jump in tax revenue in a period of fast economic growth during the last decade.”

The way forward she presents in the paper is the implementation of a resource mobilization strategy with three pillars: expansion of the current GST to cover services and exempt and zero-rated sectors; improvement of direct/income tax administration; and enhancement of the provincial tax-to-GDP ratio. These measures would constitute the first step toward bringing the country out of the low tax-to-GDP trap. However, success will crucially hinge on the political will to improve tax administration, adopt rational tax policies and promote higher tax compliance.

The situation is similar in the healthcare and education sectors. Since the Eighteenth Constitutional Amendment, it has been the responsibility of the provincial governments to ensure the provision of basic healthcare and education facilities to their citizens. But the provincial governments fall well short in both areas owing to the non-availability of reliable statistics. The Punjab government, however, seems to be moving in the right direction by embracing the data regime. Today its monitoring and evaluation assistants visit thousands of schools and healthcare facilities across the province, and collect and present real-time data on facilities and on the attendance of teachers, students and other staff.

Big data for development

In contrast to traditional survey techniques, the rapid growth of the Internet and the electronic environment has created new sources of data. The world is researching new ways to use these data streams, such as online data repositories, social media and transactional data sets, to increase efficiency, reduce cost and time, and gain significant insights.

In the developed world, governments have increased their reliance on these new data streams and on administrative records, as most Western economies are well documented. Political parties are using big data for electioneering, while the corporate and financial sectors are using it for marketing and for evaluating consumer behavior.

But in developing countries like Pakistan, the traditional statistical system produces most of the data required for socio-economic planning and development, while big data is still an evolving phenomenon. Generally, governments and the private sector, with the exception of telecom and financial services, rely on traditional data sources because most socio-economic fields are undocumented and lack reliable statistics.

Ahmed Raza has an abiding interest in technology. He writes under a different pen name for leading dailies.

  • Edward A. Meinert

    Which industries will benefit the most from this “new wave”?

    • Mostly the public sector will benefit – for example, they can make policy decisions on health, education and development based on actual data analysis.

  • Sarmad Haider

    Aren’t the development decisions based on maximum kickbacks for the politicians? Why would they use science or data analysis?
