How is data collected and tabulated for Pakistan's 2017 census?
A population and household census is underway in the country after a delay of nine years. In the two decades since the last census of 1998, there has been a great transformation in the ways in which census data is collected and analysed across the globe.
Historically, censuses have been conducted manually by teams of enumerators and statisticians who gather, compile, and analyse data using paper-based forms. However, this classical method is no longer the only way governments collect and analyse information about their populations.
Census information is now increasingly being gathered through online questionnaires, toll-free telephone numbers, and pre-paid envelopes.
None of these methods is being used in the 2017 census because the Pakistan Bureau of Statistics (PBS), the federal authority responsible for the task, feels there is no guarantee that such questionnaires would be filled out and returned. “Literacy matters,” says a PBS official.
Though the PBS is collecting census data manually, it is using Optical Character Recognition (OCR) technology to convert this data into machine-readable format and transfer it onto computers.
The OCR system provides full alphanumeric recognition of printed or handwritten characters at electronic speed.
The version available with the Bureau has been updated with an Intelligent Character Recognition (ICR) feature allowing recognition of image data, in particular alphanumeric text. It turns images of handwritten or printed characters into ASCII data (machine-readable format).
The OCR technology being used by the Bureau has also been updated to accept data in the Urdu language.
The OCR technology is not only effective in converting handwritten or typed characters into machine-readable format for tabulation or compilation purposes but also helps cut costs.
The United Nations Statistics Division calculates that the use of OCR imaging saves up to two percent of the total cost of the census and requires fewer staff for data analysis.
However, the OCR is not as accurate as the Optical Mark Recognition (OMR) technology used for data collection in the 1998 census. Data-entry operators at the Bureau are, thus, required to check all forms manually before converting them into machine-readable format. The operators work in batches of 120.
The OCR machines also have a built-in automatic error-detection system.
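To give a sense of what automatic error detection on recognised text can involve, here is a minimal Python sketch. It is an illustration only and does not describe the Bureau's actual system: the field names (`age`, `household_size`), the range limits and the `find_errors` function are all hypothetical. It flags values that cannot be valid, such as a letter "O" recognised where a zero was written, so that an operator can review them.

```python
from typing import Dict, List

# Hypothetical plausibility limits; a real census would define its own rules.
MAX_AGE = 120
MAX_HOUSEHOLD_SIZE = 50


def check_number(name: str, value: str, low: int, high: int) -> List[str]:
    """Return a list of problems found in one numeric field."""
    if not value.isdecimal():
        # Typical recognition slip: a letter such as "O" read in place of "0".
        return [f"{name}: not a number"]
    if not low <= int(value) <= high:
        return [f"{name}: out of range"]
    return []


def find_errors(record: Dict[str, str]) -> List[str]:
    """Collect problems across all checked fields of a recognised record."""
    errors = []
    errors += check_number("age", record.get("age", ""), 0, MAX_AGE)
    errors += check_number(
        "household_size", record.get("household_size", ""), 1, MAX_HOUSEHOLD_SIZE
    )
    return errors


clean = find_errors({"age": "34", "household_size": "6"})
misread = find_errors({"age": "4O", "household_size": "6"})  # letter O, not zero
```

In this sketch a record with an empty error list would pass straight through, while any flagged record would go back to a data-entry operator for a manual check.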
Unlike the OCR, the OMR technology used in 1998 could not recognise hand-printed or machine-printed characters. It featured automated data input using customised paper-based forms.
A common example of OMR usage is in examinations for answering questions with multiple answer choices. Those taking the exam are required to mark their answers on specially printed sheets using either a pencil or a special marker. The data from the sheets is read using the OMR scanner.
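The principle behind reading such a sheet can be sketched in a few lines of Python. This is a simplified illustration, not the software used in 1998: it assumes a scanner has already measured how dark each answer bubble is, on a scale from 0.0 (empty) to 1.0 (fully filled), and the names `read_marked_option` and `read_sheet` and the 0.5 threshold are hypothetical.

```python
from typing import List, Optional


def read_marked_option(fill_levels: List[float], threshold: float = 0.5) -> Optional[int]:
    """Return the index of the one bubble filled above the threshold.

    Returns None when the row is blank or has more than one mark,
    since neither case gives an unambiguous answer.
    """
    marked = [i for i, level in enumerate(fill_levels) if level >= threshold]
    return marked[0] if len(marked) == 1 else None


def read_sheet(rows: List[List[float]], threshold: float = 0.5) -> List[Optional[int]]:
    """Read every question row on a sheet."""
    return [read_marked_option(row, threshold) for row in rows]


sheet = [
    [0.05, 0.90, 0.10, 0.02],  # second option clearly marked
    [0.03, 0.04, 0.02, 0.06],  # left blank
    [0.80, 0.07, 0.75, 0.01],  # two options marked
]
answers = read_sheet(sheet)
```

Because the scanner only tests whether a fixed position on the page is dark, it has no way of interpreting written letters or digits, which is why OMR depends on customised forms.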
Another suggestion floated during the planning phase was to use a tablet-based application for data collection and tabulation, says an official privy to the planning process.
The official says the proponents had argued that the tablet could not only easily count citizens bearing Computerised National Identity Cards (CNICs) but also collect data of those not yet registered by the National Database Registration Authority (NADRA).
“Enumerators could have been linked to the NADRA system. The Punjab Information Technology Board (PITB) was willing to provide the technological expertise in this regard,” he says.
However, the suggestion was dropped as no consensus could be reached on it. It was argued that the procurement of these tablets would be expensive and time-consuming.
There were also concerns about transparency and credibility of the software used with tablets. “There was not enough time to procure these devices and programme them to suit the needs of the census,” says another official familiar with the matter.
PBS officials overseeing the census say that enumerators are collecting data on two forms.
Form 1 is being used to count houses and Form 2 to count households. The Bureau expects to complete the count and release a provisional analysis of the data in two months.
This information will provide a clear picture of the country’s demographics and end reliance on projections and estimates for a range of activities, including the delimitation of constituencies and the distribution of parliamentary seats, development funds and tax revenues. It should also lead to more informed policies.
This article originally appeared in MIT Tech Review Pakistan and has been reproduced with permission.