
A Look at Data Collection and Tabulation Methodology in Census 2017

Photo Credit: AFP
By Aima Khosa

PBS is using Optical Character Recognition technology to convert manually collected data into a machine-readable format, making tabulation more efficient and cost-effective.

A population and household census is underway in the country after a delay of nine years. In the two decades since the last census in 1998, the ways in which census data is collected and analyzed have been transformed across the globe.

Historically, censuses have been conducted manually by teams of enumerators and statisticians who gather, compile, and analyze data using paper-based forms. However, this classical method is no longer the only way governments collect and analyze information about their populations. Census information is now increasingly being gathered through online questionnaires, toll-free telephone numbers, and pre-paid envelopes.

None of these methods is being used in the 2017 census because the Pakistan Bureau of Statistics (PBS), the federal authority responsible for the task, feels that there is no guarantee that these [questionnaires] will be filled out and returned. “Literacy matters,” says a PBS official.

Though the PBS is collecting census data manually, it is using Optical Character Recognition (OCR) technology to convert this data into a machine-readable format and transfer it onto computers. The OCR system provides full alphanumeric recognition of printed or handwritten characters at electronic speed. The version available to the Bureau has been updated with an Intelligent Character Recognition (ICR) feature that recognizes image data, in particular alphanumeric text, and turns images of handwritten or printed characters into ASCII data (a machine-readable format). The OCR technology being used by the Bureau has also been updated to accept data in the Urdu language.


The OCR technology is not just effective in converting handwritten or typed characters into a machine-readable format for tabulation or compilation; it also helps cut costs. The United Nations Statistics Division calculates that the use of OCR imaging saves up to two percent of the total cost of a census and requires fewer staff for data analysis. However, OCR is not as accurate as the Optical Mark Recognition (OMR) technology used for data collection in the 1998 census. Data-entry operators at the Bureau are, therefore, required to check all forms manually before converting them into a machine-readable format. The operators work in batches of 120. The OCR machines also have a built-in automatic error-detection system.

Unlike OCR, the OMR technology used in 1998 could not recognize hand-printed or machine-printed characters. It featured automated data input using customized paper-based forms. A common example of OMR usage is in multiple-choice examinations. Those taking the exam are required to mark their answers on specially printed sheets using either a pencil or a special marker. The data from the sheets is then read by an OMR scanner.
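The mark-reading idea behind OMR can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not a description of the scanners or software used in 1998: it assumes a scanner has already measured how dark each answer bubble is on a scale of 0.0 (blank) to 1.0 (fully filled), and the function names, the 0.5 threshold and the sample values are all hypothetical.

```python
from typing import List, Optional


def read_omr_row(darkness: List[float], threshold: float = 0.5) -> Optional[int]:
    """Return the index of the single marked bubble in one row.

    Returns None when the row is blank or has more than one mark,
    since either case cannot be read unambiguously.
    The threshold is an assumed value chosen for illustration.
    """
    marked = [i for i, level in enumerate(darkness) if level >= threshold]
    return marked[0] if len(marked) == 1 else None


def read_omr_sheet(rows: List[List[float]], choices: str = "ABCD",
                   threshold: float = 0.5) -> List[Optional[str]]:
    """Translate every row of bubble darkness values into a choice letter."""
    answers: List[Optional[str]] = []
    for row in rows:
        index = read_omr_row(row, threshold)
        answers.append(choices[index] if index is not None else None)
    return answers


# Hypothetical darkness readings for a four-question answer sheet.
sheet = [
    [0.05, 0.91, 0.08, 0.03],  # one clear mark on B
    [0.02, 0.04, 0.06, 0.88],  # one clear mark on D
    [0.10, 0.07, 0.05, 0.09],  # left blank
    [0.85, 0.02, 0.79, 0.04],  # two marks, so ambiguous
]

print(read_omr_sheet(sheet))  # ['B', 'D', None, None]
```

The sketch only ever decides whether a fixed position on the page is filled in, which is why OMR needs specially printed forms and cannot read handwritten names or numbers the way OCR attempts to.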

Another suggestion floated during the planning phase was to use a tablet-based application for data collection and tabulation, says an official privy to the planning process. The official says the proponents had argued that the tablet could not only easily count citizens bearing Computerised National Identity Cards (CNICs) but also collect data on those not yet registered by the National Database and Registration Authority (NADRA). “Enumerators could have been linked to the NADRA system. The Punjab Information Technology Board (PITB) was willing to provide the technological expertise in this regard,” he says.

However, the suggestion was dropped as no consensus could be reached on it. It was argued that the procurement of these tablets would be expensive and time-consuming. There were also concerns about transparency and credibility of the software used with tablets. “There was not enough time to procure these devices and programme them to suit the needs of the census,” says another official familiar with the matter.

PBS officials overseeing the census say that enumerators are collecting data on two forms. Form 1 is being used to count houses and Form 2 to count households. The Bureau expects to complete the count and release a provisional analysis of the data in two months. This information will provide a clear picture of the country’s demographics and end the reliance on projections and estimates alone for a range of activities, including the delimitation of constituencies and the distribution of parliamentary seats, development funds and tax revenues. It should also lead to more informed policies.

