In the linguistic melting pot of today’s world, a Princeton graduate decided to bring his native language, Urdu, to life in the digital world. Zeerak Ahmed is working on a modified Urdu keyboard called Matnsaz. This keyboard simplifies typing in Urdu, especially on smartphones.
Describing his motivation to build an Urdu keyboard, Ahmed recounted that his grandmother lost her hearing a few years ago. “In any other country or if you’re an English speaker, you would be able to watch television with subtitles, but you can’t do that in Urdu,” he said while talking to MIT Technology Review Pakistan.
Ahmed believes that part of the reason is the lack of infrastructure to quickly annotate things or quickly produce text. “I am not saying that we would solve that problem directly, but if we incrementally make every piece that leads to that happening easier, then we are more likely going to be in a situation where people who are unable to speak English can still live the same quality of life as other people.”
Ahmed worked on the keyboard for his Master’s in Design Engineering degree at Harvard and continued after graduating in 2018. Another source of inspiration, he recalled, was that some of the colleagues who helped him develop the project proposal were native Spanish speakers.
“They asked why it was that I use my computer in English but they used their computer in Spanish. They found it more problematic than we tend to find it.” It made him consider doing something to address this issue.
While there has been some development of Urdu software and keyboards in the past, Ahmed thinks that the language still lags behind and that the availability and use of Urdu technology can be further improved. “A lot of what we see as the struggles of Urdu language can be linked back in some way to the struggles of Urdu technology,” he said. Ahmed has been working on this keyboard for more than three years as a way of accelerating Urdu technology and fixing problems found in existing keyboards.
Many of the tools that must be developed to build an Urdu keyboard can also be used to create other Urdu technologies. “When we build a text corpus for example, which we need in the keyboard to build an autocorrect system, that is useful for any kind of NLP (natural language processing) applications such as word processors, in some sense voice recognition, search and so forth,” Ahmed explained.
A text corpus is a library of written Urdu text made available for anyone who wants to perform natural language processing. Ahmed clarified that “a text corpus is required to be able to train a natural language model, by teaching the computer from examples of currently written language.”
He pointed out that a text corpus is more important in a smartphone keyboard than in any other kind of keyboard. “In general, the keys are too small for you to be able to hit reliably. The way that we get around it, is that the keyboard tries to guess what word you’re typing based on how often the word occurs, what characters repeat after other characters, what words repeat after other words, and to learn all of that, we give it a text corpus.”
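The kind of guessing Ahmed describes can be illustrated with a minimal Python sketch. It is not Matnsaz’s actual model: the four-sentence corpus, the function names `train` and `rank_candidates`, and the scoring rule (prefer the word that most often follows the previous word, then the more frequent word overall) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train(sentences):
    """Count how often each word occurs and which word follows which."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        unigrams.update(words)
        for prev, nxt in zip(words, words[1:]):
            bigrams[prev][nxt] += 1
    return unigrams, bigrams

def rank_candidates(candidates, prev_word, unigrams, bigrams):
    """Order candidate words: first by how often they follow prev_word,
    then by how often they occur in the corpus overall."""
    def score(word):
        return (bigrams[prev_word][word], unigrams[word])
    return sorted(candidates, key=score, reverse=True)

# A tiny made-up corpus; a real one would hold millions of words.
corpus = [
    "یہ کتاب اچھی ہے",
    "یہ کتاب نئی ہے",
    "وہ کتاب پڑھتا ہے",
    "یہ کام اچھا ہے",
]
unigrams, bigrams = train(corpus)

# Suppose imprecise taps could mean either of these two words.
candidates = ["کام", "کتاب"]
ranked = rank_candidates(candidates, corpus[0].split()[0], unigrams, bigrams)
print(ranked)
```

In this toy corpus the second candidate follows the previous word more often than the first, so it is ranked higher. A production keyboard would also weigh which keys were physically closest to each tap.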
Text corpora are available in many languages, including Urdu and other regional languages of Pakistan, but the Matnsaz team is building its own, making careful choices regarding the quality, editorial soundness and diversity of the text. “We are slowly growing that corpus using help from researchers at LUMS, UC Berkeley and some other institutions.”
Ahmed described the process of creating a text corpus for the keyboard. “In general, my limitations are smaller than the limitations of the people that we’ve been able to get in touch with who have been able to give us the files in the format that we need.”
Currently, Matnsaz has two large donations of text, from the Al-Mawrid Institute and the LUMS Gurmani Center, which are being converted into the required format. The team is also building the tools to convert the text.
How it’s different
While talking about how Matnsaz is different from pre-existing keyboards, Ahmed said, “What we are designing is not just one keyboard. We are designing a set of different keyboards with experimentation built-in. The idea is, that when we launch the app, you can try a number of different keyboards on the app.”
After the app launch, users will test different versions of the keyboard and data will be collected regarding the performance and experience of these keyboards. This will help the developers figure out what works best.
According to Ahmed, very little testing has been done on keyboards for the Arabic script, and most of it has relied on simulations rather than actual people using the keyboard. Ahmed’s team is experimenting with many things, including layout, letter formation and the number of keys.
Most common Urdu keyboards have a layout that maps phonetically onto the QWERTY layout of Latin script keyboards. This means that the Urdu letter ق (pronounced qaaf) takes the place of the letter Q and the letter پ (pronounced pay) takes the place of the letter P. Ahmed noted, “That is problematic because it implies that you must know Latin text for you to be able to input Urdu text. So for a lot of people, that don’t speak English or have not typed in Latin script, that’s a problematic assumption to have, both culturally and technically.”
He hopes that undoing this assumption and trying a different layout, such as an alphabetically ordered keyboard, might solve the problem and make the keyboard easier to use. His team plans to test a number of different layouts, which is why they have built the keyboard so that layouts can be changed quickly.
Another factor that can be tested is the formation of letters. “In the Arabic script, the letters change shape depending on where they are in a word. But the letters on the (Urdu) keyboard only display the isolated form of the character. If you type in Latin script, you’ll notice that the letters actually change to show you whether they are going to show up in an upper case or lower case,” Ahmed explained.
The Matnsaz keyboard will have a layout in which users can see the shape that a character will take in a word rather than its isolated form. Such a keyboard doesn’t exist yet but he hopes that it will help users to type Urdu more easily.
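The positional shapes Ahmed refers to can be inspected in Python, since Unicode encodes them as separate presentation forms. The sketch below uses the letter ب as an example; the `key_label` function and its rule (show the final form when the previous letter connects forward, otherwise the isolated form) are assumptions about how such a preview might work, not a description of Matnsaz’s design.

```python
import unicodedata

# Unicode presentation forms of the letter beh (U+0628).
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

def key_label(forms, previous_letter_connects):
    """Pick the shape to draw on the key for the next letter typed.

    Hypothetical rule: a letter typed after a connecting letter sits at the
    end of the word so far, so it appears in its final form.
    """
    return forms["final"] if previous_letter_connects else forms["isolated"]

for position, glyph in BEH_FORMS.items():
    # Every shape normalizes back to the same underlying letter.
    base = unicodedata.normalize("NFKC", glyph)
    print(position, glyph, unicodedata.name(glyph), base == "\u0628")

print(key_label(BEH_FORMS, previous_letter_connects=True))
```

Normal text stores only the base letter and leaves shaping to the font; the presentation forms are used here only to make each shape visible on its own.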
Ahmed’s team is also working on a way to reduce the number of keys on the keyboard. “There are some letters which have the same basic shape, like ب, پ and ت or ص and ض. So we’ll make it into one word. And the software will put the ijam (marks that appear as dots on Urdu letters) on them,” he said. “It reduces the number of keys on the keyboard and makes the remaining keys larger, which means that you’re more likely to hit them and the chance that you would type an incorrect word goes down.”
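One way software could restore the dots is to look up which dictionary words match the sequence of shared keys and pick the most frequent one. The Python sketch below assumes exactly that; the two letter groups come from Ahmed’s examples, while the word counts and the names `to_keys`, `build_index` and `resolve` are invented for illustration.

```python
from collections import defaultdict

# Letters that share a basic shape, per the examples in the article.
GROUPS = ["بپت", "صض"]
# Each letter maps to the first letter of its group, standing in for the shared key.
KEY_FOR = {letter: group[0] for group in GROUPS for letter in group}

def to_keys(word):
    """Reduce a word to the sequence of keys that would be pressed."""
    return "".join(KEY_FOR.get(ch, ch) for ch in word)

def build_index(word_counts):
    """Group dictionary words by key sequence, most frequent first."""
    index = defaultdict(list)
    for word, count in word_counts.items():
        index[to_keys(word)].append((count, word))
    for entries in index.values():
        entries.sort(reverse=True)
    return index

def resolve(key_sequence, index):
    """Return the words that match a key sequence, best guess first."""
    return [word for _, word in index.get(key_sequence, [])]

# Made-up frequencies; a real keyboard would take these from its corpus.
word_counts = {"بات": 120, "باب": 15, "تاب": 4}
index = build_index(word_counts)

suggestions = resolve(to_keys("تاب"), index)
print(suggestions)
```

All three words collapse to the same key sequence here, so the most frequent one is offered first and the others remain available as alternatives.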
Another benefit of this layout is that the same layout can include keyboards of other languages that use the Arabic script as well, without the need to go into special keys or add extra keys. “We can make this a true multilingual keyboard for the Arabic script.”
Ahmed indicated that the Urdu alphabet has 39 letters and the Sindhi alphabet has 52, depending on how you count them. Other languages written in the Arabic script have different numbers of letters as well. So right now, someone who wants to write both Sindhi and Urdu either uses the keyboard with the most keys or tries to write Sindhi with the Urdu keyboard.
“The idea is that once we make this keyboard for Urdu, then we can add to it and build in support for other languages without anyone having to learn a new keyboard.”
Arabic script vs Latin script
Highlighting the importance of having an Urdu keyboard, Ahmed said that most people type Urdu in Latin or Roman script because it’s easier. He thinks that this practice is not going to go away and that it has its uses, but that people also need to be able to type well in the Arabic script.
When asked whether people will switch to the Arabic script keyboard for Urdu typing, he estimated that some people might make the switch sometimes, depending on the kind and purpose of the text. “I don’t see Urdu newspapers using ‘Roman Urdu’ any time soon, and those publishers and writers use keyboards as well so I think there’s a healthy audience as well for the Arabic script,” he argued.
Testing and availability
Ahmed revealed that while it might be a few months until the Matnsaz Urdu keyboard is released on the app store, they will be doing a public beta test soon. A number of people who have already signed up for the beta release will test the keyboard’s early performance.
Ahmed hopes that a lot of this technology will lead to lots of other Urdu tools as well. “I am hoping people use it and build other things. There’s too big an audience for us to rely on everybody to learn to speak English to be able to express themselves in the digital world,” he said.