
A Solo Developer Shipped Pakistan's First Pashto LLM. The State's AI Regulator Still Doesn't Exist.

Qehwa and Qalb show Pakistan's low-resource-language AI is emerging from citizens, while the National AI Policy 2025's regulator stays on paper.

[Infographic: "Built by citizens, not the state" (People of Internet Research · Pakistan, peopleofinternet.com). 60M+ Pashto speakers targeted; 85.3% Qehwa benchmark accuracy; 1.97B Qalb Urdu training tokens; Jul 2025: National AI Policy approved.]


On April 4, 2026, a Pakistani developer named Junaid Ahmed released Qehwa, billed as the world's first large language model for Pashto, a language spoken by roughly 60 million people but almost invisible in mainstream AI systems. He built it alone, with no team, no funding, and no institutional backing, by fine-tuning the openly licensed Qwen2.5-7B model and publishing the weights on Hugging Face under an Apache 2.0 licence. Qehwa was, in turn, inspired by Qalb, an Urdu model trained on 1.97 billion tokens that three students led by Taimoor Hassan documented in an arXiv paper in January 2026.

Both projects are real, working artifacts. And both point to the same uncomfortable fact for Pakistani policymakers: the country's most consequential indigenous-language AI is arriving from citizens building in the open, while the state apparatus meant to nurture exactly this, promised in the National AI Policy 2025, remains largely on paper.

What the citizens actually built

Qehwa is not a demo. According to its model card, it underwent continued pre-training on 3.4 million Pashto documents — news, books, religious texts, Wikipedia, web crawl — followed by supervised fine-tuning on 126,519 Peshawari-dialect instruction pairs. On a custom 150-question benchmark across 15 categories it reports 85.3% overall accuracy, including 90% on English-to-Pashto translation. It accepts prompts in Pashto, English, and Urdu, and the weights are downloadable and modifiable by anyone.
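The headline figures can be sanity-checked with simple arithmetic. The Python sketch below finds how many correct answers out of 150 would round to the reported 85.3%. The helper name `consistent_counts` is hypothetical, and the even split of questions across the 15 categories is an assumption made here for illustration; the figures quoted above do not state per-category counts.

```python
def consistent_counts(reported_pct: float, total: int, decimals: int = 1) -> list[int]:
    """Return every number of correct answers that rounds to the reported percentage."""
    return [
        k for k in range(total + 1)
        if abs(round(100 * k / total, decimals) - reported_pct) < 1e-9
    ]


TOTAL_QUESTIONS = 150  # benchmark size quoted from the model card
CATEGORIES = 15        # number of categories quoted from the model card

# Which raw scores are consistent with "85.3% overall"?
overall = consistent_counts(85.3, TOTAL_QUESTIONS)
print(overall)  # [128]

# ASSUMPTION (not stated in the source): questions are split evenly by category.
per_category = TOTAL_QUESTIONS // CATEGORIES
translation = consistent_counts(90.0, per_category)
print(per_category, translation)  # 10 [9]
```

Under that assumption, 85.3% corresponds to 128 of 150 questions, and 90% on translation would mean 9 of 10. With categories this small, a single question moves a category score by ten percentage points, so the per-category numbers are best read as indicative rather than precise.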

Qalb, its predecessor, took the same route for Urdu's 230-million-plus speakers: a two-stage pipeline (continued pre-training, then fine-tuning) on 1.97 billion tokens, built by three students and released into the research commons. The pattern is the same in both cases: open base models, open methods, open weights, and an individual or small team closing a gap that the global frontier labs have no commercial reason to close.

What the state promised

To be fair to Islamabad, the strategy on paper is sound. Pakistan's Federal Cabinet approved the National AI Policy 2025 on July 30–31, 2025. It is structured around six pillars, backed by a National AI Fund channelled through Ignite, and — crucially — it explicitly commits to "support for a national LLM to ensure cultural and linguistic relevance." It also proposes an AI Regulatory Directorate tasked, in the words of one analysis by Pakistan's Digital Rights Monitor, with "ensuring ethical AI practices, data protection, and algorithmic transparency."

There is a genuine case for all of this. A national strategy can pool the compute, datasets, and evaluation infrastructure that no solo developer can afford. A regulator can set guardrails before harms — surveillance misuse, discriminatory automated decisions, untraceable deepfakes — become entrenched. Amnesty International has already warned that the policy's human-rights and oversight safeguards are too thin, and that critique deserves to be taken seriously rather than dismissed.

The gap between the plan and the shipping code

But nearly a year on, the AI Regulatory Directorate is not operational, the National AI Fund has yet to disburse at scale, and the "national LLM" the policy promised does not exist. The national-language models Pakistan actually has, Qehwa and Qalb, were delivered by a developer in Peshawar and a three-student team, on personal resolve rather than state budgets.

That is not an indictment of having a policy. It is evidence about what kind of policy works. The thing that made Qehwa and Qalb possible was not a directorate or a fund. It was the permissionless availability of strong open-weight base models like Qwen, an open licence regime that let one person legally build and redistribute a derivative, and a research culture where publishing a paper and uploading weights is the default. No licence application, no pre-clearance, no sandbox enrolment stood between the idea and 60 million potential users.

The risk a heavy regulator would pose

Here is where proportionality matters. If the still-unbuilt AI Regulatory Directorate eventually arrives as a gatekeeping body — requiring registration, pre-deployment approval, or compliance paperwork to release a model — it would fall hardest on precisely the solo and student builders who have delivered Pakistan's only wins so far. A frontier lab can absorb compliance overhead; an unfunded developer fine-tuning a model on a rented GPU cannot. Regulation calibrated to OpenAI-scale risk, applied to a one-person Pashto project, simply prices the project out of existence.

The legitimate concerns — and open ecosystems do carry real ones, as the recent wave of poisoned open-source packages and software-supply-chain attacks reminds us — are better met with targeted tools than blanket licensing: provenance and signing for model artifacts, security disclosure norms, and liability that tracks deployment context rather than the act of publishing weights. None of that requires a permission regime to release a language model.
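To make "provenance for model artifacts" concrete, here is a minimal Python sketch of its simplest form: checking a downloaded weights file against a SHA-256 digest the publisher has announced. It uses only the standard library. The function names and the file name are hypothetical, and this is not a description of how Qehwa, Qalb, or Hugging Face handle verification. Real signing would go a step further and attach a cryptographic signature to the digest.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte weights need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_hex: str) -> bool:
    """Return True only if the file's digest matches the published one."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


# Demonstration with a stand-in file; the name below is purely illustrative.
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "model.safetensors"
    weights.write_bytes(b"pretend these are model weights")
    published = sha256_of_file(weights)  # what a publisher would announce

    ok_before = verify_artifact(weights, published)
    weights.write_bytes(b"pretend these are tampered weights")
    ok_after = verify_artifact(weights, published)

print(ok_before, ok_after)  # True False
```

A check like this costs the publisher one extra line in a model card and costs the user a few seconds, which is the sense in which such tools are lighter than a licensing regime.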

What proportionate policy looks like here

Pakistan's comparative advantage is not regulatory architecture; it is a diaspora of capable engineers and a deep stock of low-resource languages the world's labs ignore. The policy's national-LLM ambition would be better served by funding shared compute, curating open Pashto, Urdu, Sindhi, and Balochi corpora, and adopting community models like Qehwa for public services, rather than by standing up a directorate that, if mis-designed, regulates a market the state has not yet helped create.

The lesson of April 4 is simple. When the barrier to entry is low, citizens close the gaps the market and the state both miss. The fastest way for Islamabad to support indigenous-language AI is to keep that barrier low — and to make sure the regulator it is still building does not become the barrier itself.

Sources & Citations

  1. Qehwa Pashto LLM model card (Hugging Face)
  2. Qalb: Urdu LLM for 230M speakers (arXiv)
  3. National Artificial Intelligence Policy (MoITT, Pakistan)
  4. Pakistan's National AI Policy 2025 summary (regulations.ai)
  5. Understanding Pakistan's AI Policy (Digital Rights Monitor)