About This Service
Prompt Engineering and LLM Optimization for UAE Products
If your product already has an AI feature that is inconsistent, expensive or slow, this is the fix. I audit your existing prompts, redesign system prompts with clear roles, constraints and few-shot examples, and enforce structured outputs through JSON mode and tool schemas so downstream code stops breaking on malformed responses.
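To illustrate what "stops breaking on malformed responses" can look like in practice, here is a minimal sketch of a guard that sits between the model and downstream code. The schema, the field names and the `parse_model_reply` helper are hypothetical and chosen for illustration. A real project would more likely rely on the provider's JSON mode or tool schemas plus a full schema validator.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
TICKET_SCHEMA = {"category": str, "priority": int, "summary": str}


def parse_model_reply(raw: str, schema: dict) -> dict:
    """Parse a model reply as JSON and check it against a flat schema.

    Raises ValueError with a specific message so the caller can retry
    or log the failure instead of crashing further downstream.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for field, expected in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        # Strict type comparison, so a JSON boolean is not accepted as an int.
        if type(data[field]) is not expected:
            raise ValueError(f"wrong type for field: {field}")
    return data
```

Failing loudly at this boundary means a malformed reply triggers a retry or a logged error rather than a silent bug several functions later.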
Changes are proven, not eyeballed. I build a lightweight eval harness from your real inputs and score every prompt revision against it, so you see before-and-after numbers for accuracy, format compliance and refusal rate. The same harness drives model selection (GPT versus Claude versus open-weight models) as well as cost and latency optimization, because the right prompt on a smaller model often beats a lazy prompt on the flagship at a fraction of the dirham cost.
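As a rough picture of how accuracy, format compliance and refusal rate can be scored together, the sketch below runs a set of cases through any text-generating function. The `Case` type, the `"answer"` field and the refusal markers are assumptions made for this example, not a fixed design. A real harness would be shaped by your own inputs and output format.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    prompt_input: str
    expected: str  # expected value of the hypothetical "answer" field


# Assumed refusal markers; these would be tuned per product.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable")


def score(cases: list[Case], generate: Callable[[str], str]) -> dict[str, float]:
    """Run every case through `generate` and return three rates in [0, 1]."""
    if not cases:
        raise ValueError("at least one case is required")
    correct = well_formed = refused = 0
    for case in cases:
        reply = generate(case.prompt_input)
        if any(marker in reply.lower() for marker in REFUSAL_MARKERS):
            refused += 1
            continue
        try:
            data = json.loads(reply)
        except json.JSONDecodeError:
            continue  # counts against format compliance and accuracy
        if isinstance(data, dict) and "answer" in data:
            well_formed += 1
            if data["answer"] == case.expected:
                correct += 1
    n = len(cases)
    return {
        "accuracy": correct / n,
        "format_compliance": well_formed / n,
        "refusal_rate": refused / n,
    }
```

Because `generate` is just a callable, the same cases can be scored against the old prompt, a revised prompt, or a different model, which is what makes the before-and-after numbers comparable.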
For Dubai, Sharjah and Abu Dhabi products serving bilingual users, I also handle Arabic prompt nuances: dialect versus Modern Standard Arabic, right-to-left formatting in outputs, and instructions that survive translation. A focused prompt audit with rewritten prompts and eval results is delivered in 3 working days, with pricing from AED 800.
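One small, concrete piece of the right-to-left problem is Arabic output that gets embedded in a left-to-right interface string. The sketch below, which uses only the Python standard library, detects mostly right-to-left text and wraps it in Unicode directional isolates. The helper names are hypothetical, and this covers display direction only, not dialect versus MSA handling.

```python
import unicodedata

RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def is_mostly_rtl(text: str) -> bool:
    """True when strong right-to-left characters outnumber left-to-right ones."""
    rtl = ltr = 0
    for ch in text:
        direction = unicodedata.bidirectional(ch)
        if direction in ("R", "AL"):  # Hebrew-style and Arabic-style letters
            rtl += 1
        elif direction == "L":
            ltr += 1
    return rtl > ltr


def isolate_rtl(text: str) -> str:
    """Wrap mostly right-to-left text in isolate marks; leave other text as is."""
    return f"{RLI}{text}{PDI}" if is_mostly_rtl(text) else text
```

A check like `is_mostly_rtl` can also serve as an eval metric, for example to confirm that an Arabic input actually received an Arabic reply.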
What's included
- Prompt audit report — Line-by-line review of your current system prompts with every weakness named and a fix proposed.
- Rewritten system prompts — Production-ready prompts with role, constraints, examples and edge-case handling, delivered as files for your repo.
- Eval harness — A repeatable test set built from your real inputs that scores any future prompt or model change in minutes.
- Model selection memo — Benchmarked recommendation across GPT, Claude and open-weight options with cost and latency per call.
- Structured output enforcement — JSON mode and tool-schema definitions so responses parse reliably every time.
- Arabic prompt review — Bilingual prompt variants checked for MSA versus dialect handling and right-to-left output formatting.
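For the cost-per-call figures in the model selection memo, the arithmetic is simple enough to sketch. The token counts and per-million-token prices below are placeholders invented for this example, not real vendor pricing. The conversion rate assumes the dirham's long-standing peg of about 3.6725 AED per US dollar.

```python
USD_TO_AED = 3.6725  # assumed constant, based on the dirham's peg to the US dollar


def cost_per_call_aed(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Cost of one call in AED, given token counts and per-million-token prices."""
    usd = (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    ) / 1_000_000
    return usd * USD_TO_AED


# Placeholder prices (USD per million tokens), NOT real vendor pricing.
HYPOTHETICAL_PRICES = {
    "flagship-model": (10.0, 30.0),
    "small-model": (0.5, 1.5),
}

if __name__ == "__main__":
    for name, (price_in, price_out) in HYPOTHETICAL_PRICES.items():
        cost = cost_per_call_aed(1200, 300, price_in, price_out)
        print(f"{name}: AED {cost:.4f} per call")
```

Multiplying the per-call figure by expected monthly volume turns the memo's recommendation into a budget line.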
How it works
1. Send your prompts and samples
   You share current prompts plus 30 to 50 real input examples, including the failures that annoy you most.
2. Audit and baseline
   I score the existing setup on an eval harness built from your samples, so improvements are measured against a real baseline.
3. Rewrite and re-score
   Prompts are redesigned and every revision is run through the harness until the metrics clearly beat the baseline.
4. Deliver and walk through
   You receive the new prompts, the harness, the model memo and a call explaining exactly what changed and why.
Why work with me
| | With me | Typical agency |
|---|---|---|
| Evidence of improvement | Before-and-after eval scores in the deliverable | Subjective "it looks better" |
| Model coverage | Model-agnostic: GPT, Claude, open-weight models | Single preferred vendor |
| Turnaround for a full audit | 3 working days | 2–3 weeks |
| Eval harness left with your team | Yes | |