How Pakistan’s spelling mistakes could lead to fraud
A rose by any other name would smell as sweet. But would it smell as sweet with another spelling?
Spelling matters in writing, and even more so in record-keeping. But not in Pakistan, where misspelling proper nouns is a national sport.
Around the world, variant spellings of the same word are common: colour and color, for instance. Variant spellings of the same person’s or town’s name, however, are rare. In Pakistan, it’s a different “storey.”
I came face to face with our collective dyslexia when I tried to obtain my father’s death certificate. His first name was Ajaz. I spelled it out for the administrative assistants working at the hospitals and various municipal offices.
They took it merely as a suggestion and issued documents bearing their own preferred spellings of my late father’s name, among them Ijaz, Ejaz, and Aijaz.
It may sound unimportant, but spelling mistakes can impose significant economic costs in a world that increasingly relies on analytics.
Take retail-banking fraud as an example. An individual can have fraudulent documents issued under variant spellings of the same name, e.g., Umer and Umar. When a new record, such as a bank account, is created on the basis of existing documents in Urdu, applicants can use whatever English spelling they like.
This makes fraud detection much harder: the same person can walk around with multiple identification documents, made possible by nothing more than variant spellings.
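Record-linkage systems typically cope with this by comparing names on a normalized key rather than on the raw spelling. The Python sketch below is a deliberately crude illustration, not any bank’s actual method. It assumes, hypothetically, that each record carries a name and a date of birth, and that the choice of vowels is the main source of variation in romanized names. It drops the vowels, collapses adjacent repeated consonants, and flags records that share both the resulting key and the date of birth. The names `skeleton_key`, `candidate_duplicates`, and the sample accounts are all made up for illustration.

```python
import re
from collections import defaultdict

VOWELS = set("aeiou")


def skeleton_key(name: str) -> str:
    """Crude matching key (hypothetical): keep ASCII letters only, lowercase,
    drop vowels, and collapse adjacent repeats left after vowel removal,
    so 'Umer' and 'Umar' both become 'mr' and 'Muhammad' becomes 'mhmd'."""
    letters = re.sub(r"[^a-z]", "", name.lower())
    key = []
    for ch in letters:
        if ch in VOWELS:
            continue
        if not key or key[-1] != ch:
            key.append(ch)
    return "".join(key)


def candidate_duplicates(records):
    """Group (record_id, name, date_of_birth) tuples by (skeleton key, DOB)
    and return the groups that contain more than one record."""
    groups = defaultdict(list)
    for record_id, name, dob in records:
        groups[(skeleton_key(name), dob)].append(record_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical account records: the first two differ only by Umer/Umar.
accounts = [
    ("acct-1", "Umer", "1985-03-14"),
    ("acct-2", "Umar", "1985-03-14"),
    ("acct-3", "Umar", "1990-07-01"),
]
print(candidate_duplicates(accounts))  # [['acct-1', 'acct-2']]
```

A key this aggressive will also lump together genuinely different names that share consonants, so flagged groups are leads for human review, not proof of fraud. Production systems tend to use phonetic algorithms tuned to the language, ideally applied to the original Urdu spelling as well.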
I decided to test my hypothesis about the spelling challenges of Pakistanis. As it happened, I got my hands on a data set that was briefly available online. It listed the names and other details of 15,176 members of the armed forces who had died in the line of duty, along with each soldier’s city or district of origin.
A quick analysis of the data set revealed that spelling Campbellpur has been a real challenge for Pakistanis. No wonder the city named after Sir Colin Campbell was renamed Attock: the nation was stuck on getting its spelling right.
I used the OpenRefine software to deal with misspelled cities. The raw data set listed 450 distinct cities of origin. After I ran several clustering algorithms to identify and correct misspelled names, the count fell to 204. Put differently, there were 246 redundant spellings for 204 towns, or more than one misspelled variant per town on average.
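OpenRefine’s clustering offers key-collision methods (such as its “fingerprint” keying) and nearest-neighbour methods (such as Levenshtein edit distance). The post does not say which ones were used, so the following is only a minimal Python sketch of the general idea, not OpenRefine’s own code. It builds a fingerprint key roughly following OpenRefine’s documented steps, merges keys that lie within a hypothetical edit-distance `radius` of each other, and maps every spelling in a cluster to that cluster’s most frequent spelling. The function names and the sample spellings are invented for illustration; they are not drawn from the actual data set.

```python
import re
import unicodedata
from collections import Counter


def fingerprint(value: str) -> str:
    """Key-collision key in the spirit of OpenRefine's 'fingerprint' method:
    trim, lowercase, fold accents to ASCII, strip punctuation, then sort
    and de-duplicate whitespace-separated tokens."""
    value = unicodedata.normalize("NFKD", value.strip().lower())
    value = value.encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s]", "", value)
    return " ".join(sorted(set(value.split())))


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def cluster_cities(raw_names, radius=1):
    """Map every raw spelling to the most frequent spelling in its cluster."""
    # Step 1: key collision. Spellings sharing a fingerprint share a bucket.
    buckets = {}
    for name, n in Counter(raw_names).items():
        buckets.setdefault(fingerprint(name), Counter())[name] += n

    # Step 2: nearest neighbour. Union keys within `radius` edits (single link).
    keys = list(buckets)
    parent = {k: k for k in keys}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if levenshtein(a, b) <= radius:
                parent[find(b)] = find(a)

    # Step 3: pool counts per cluster and pick the commonest spelling.
    clusters = {}
    for k in keys:
        clusters.setdefault(find(k), Counter()).update(buckets[k])
    mapping = {}
    for spellings in clusters.values():
        canonical = spellings.most_common(1)[0][0]
        for s in spellings:
            mapping[s] = canonical
    return mapping


# Made-up sample, not the actual data set.
raw = ["Rawalpindi", "Rawalpindi", "rawalpindi ", "Rawlpindi",
       "Campbellpur", "Campbelpur", "Attock"]
mapping = cluster_cities(raw)
print(len(set(raw)), "->", len(set(mapping.values())))  # 6 -> 3
print(Counter(mapping[c] for c in raw).most_common(1))  # [('Rawalpindi', 4)]
```

String similarity alone cannot tell that Campbellpur and Attock are the same place; a rename like that needs a manual lookup table. Single-link merging can also chain unrelated names together, which is one reason OpenRefine presents clusters for a person to accept or reject rather than merging them automatically.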
It came as no surprise that Rawalpindi was the most frequently listed city of origin of the deceased soldiers, with 1,393 entries to be precise.
The garrison city sits at the centre of the arid districts that, for lack of agriculture, have historically been the military’s primary recruiting ground. Following Rawalpindi was Poonch (Punch), a small town in Azad Jammu and Kashmir. What is interesting about these data is the disproportionately large number of deceased soldiers, relative to population, who came from AJK (Azad Kashmir). That, however, is a subject for another blog post.