Prompt Engineering & Optimization

About This Service

Prompt Engineering and LLM Optimization for UAE Products

If your product already has an AI feature that is inconsistent, expensive or slow, this is the fix. I audit your existing prompts, redesign system prompts with clear roles, constraints and few-shot examples, and enforce structured outputs through JSON mode and tool schemas so downstream code stops breaking on malformed responses.
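
To make "downstream code stops breaking" concrete, here is a minimal Python sketch of the kind of guard that sits behind a structured-output prompt. It uses only the standard library; the field names in EXPECTED_FIELDS and the function name parse_structured_reply are hypothetical, chosen for illustration, and a real project would more likely lean on the provider's own JSON mode or tool schema plus a schema validator.

```python
import json
from typing import Any

# Hypothetical schema for illustration: field name -> expected Python type.
EXPECTED_FIELDS: dict[str, type] = {
    "category": str,
    "priority": int,
    "needs_human": bool,
}


def parse_structured_reply(raw: str) -> dict[str, Any]:
    """Parse a model reply and check it against EXPECTED_FIELDS.

    Raises ValueError if the reply is not valid JSON, is not a JSON object,
    or has a missing or mistyped field.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for field, expected in EXPECTED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject True/False for int fields.
        if expected is int and isinstance(value, bool):
            raise ValueError(f"field {field} must be int, got bool")
        if not isinstance(value, expected):
            raise ValueError(f"field {field} must be {expected.__name__}")
    return data
```

Failing loudly at this boundary means a malformed reply can be retried or logged instead of surfacing as a crash several layers later.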

Changes are proven, not eyeballed. I build a lightweight eval harness from your real inputs and score every prompt revision against it, so you see before-and-after numbers for accuracy, format compliance and refusal rate. The same harness drives model selection — GPT versus Claude versus open-weight models — and cost and latency optimization, because the right prompt on a smaller model often beats a lazy prompt on the flagship at a fraction of the dirham cost.
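
As a rough picture of what such a harness does, the Python sketch below scores a set of cases on the three numbers mentioned above. Everything named here is an assumption for illustration: the Case structure, the "label" field in the reply, the naive REFUSAL_MARKERS list, and model_fn, which stands in for whatever function actually calls the model.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    prompt_input: str
    expected_label: str


# Naive refusal markers, assumed for illustration only.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable")


def score(cases: list[Case], model_fn: Callable[[str], str]) -> dict[str, float]:
    """Run every case through model_fn and return three rates in [0, 1]."""
    if not cases:
        raise ValueError("need at least one case")
    correct = well_formed = refused = 0
    for case in cases:
        reply = model_fn(case.prompt_input)
        if any(marker in reply.lower() for marker in REFUSAL_MARKERS):
            refused += 1
            continue
        try:
            label = json.loads(reply)["label"]
        except (json.JSONDecodeError, KeyError, TypeError):
            continue  # malformed reply: counts against format and accuracy
        well_formed += 1
        if label == case.expected_label:
            correct += 1
    n = len(cases)
    return {
        "accuracy": correct / n,
        "format_compliance": well_formed / n,
        "refusal_rate": refused / n,
    }
```

Because model_fn is just a callable, the same cases can be replayed against a different prompt or a different model by swapping one argument.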

For Dubai, Sharjah and Abu Dhabi products serving bilingual users, I also handle Arabic prompt nuances: dialect versus Modern Standard Arabic, right-to-left formatting in outputs, and instructions that survive translation. A focused prompt audit with rewritten prompts and eval results is delivered in 3 working days, priced from AED 800.
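
Two of the simpler Arabic checks can be automated, and the Python sketch below shows one possible shape for them. It is an illustration under assumptions, not a full solution: arabic_letter_ratio and isolate_rtl are hypothetical helper names, and telling a dialect apart from Modern Standard Arabic needs human review or a dedicated classifier, which this sketch does not attempt.

```python
import unicodedata

RLI = "\u2067"  # Unicode RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # Unicode POP DIRECTIONAL ISOLATE


def arabic_letter_ratio(text: str) -> float:
    """Share of alphabetic characters whose Unicode name starts with ARABIC."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith("ARABIC")
    )
    return arabic / len(letters)


def isolate_rtl(text: str) -> str:
    """Wrap text in RLI...PDI so it keeps right-to-left order inside LTR text."""
    return f"{RLI}{text}{PDI}"
```

A low ratio on a reply that was supposed to be in Arabic is a cheap signal that the model drifted back into English, and the isolate wrapper helps when an Arabic string is embedded in an otherwise left-to-right interface.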

What's included

  • Prompt audit report — Line-by-line review of your current system prompts with every weakness named and a fix proposed.
  • Rewritten system prompts — Production-ready prompts with role, constraints, examples and edge-case handling, delivered as files for your repo.
  • Eval harness — A repeatable test set built from your real inputs that scores any future prompt or model change in minutes.
  • Model selection memo — Benchmarked recommendation across GPT, Claude and open-weight options with cost and latency per call.
  • Structured output enforcement — JSON mode and tool-schema definitions so responses parse reliably every time.
  • Arabic prompt review — Bilingual prompt variants checked for MSA versus dialect handling and right-to-left output formatting.
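
The "cost per call" figure in the model selection memo comes down to simple token arithmetic. The Python sketch below shows the calculation; the Pricing structure is a hypothetical helper, and any prices fed into it are placeholders to be replaced with the vendor's current published rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Price per one million tokens, in whatever currency you report in.
    # Any figures passed in here are placeholders, not real vendor rates.
    input_per_million: float
    output_per_million: float


def cost_per_call(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: tokens times price per token, input plus output."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000
```

Running this for each candidate model with the average token counts observed in the eval set gives a like-for-like cost column to sit next to the accuracy and latency numbers.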

How it works

  1. Send your prompts and samples

    You share current prompts plus 30 to 50 real input examples, including the failures that annoy you most.

  2. Audit and baseline

    I score the existing setup on an eval harness built from your samples, so improvements are measured against a real baseline.

  3. Rewrite and re-score

    Prompts are redesigned and every revision is run through the harness until the metrics clearly beat the baseline.

  4. Deliver and walk through

    You receive the new prompts, the harness, the model memo and a call explaining exactly what changed and why.
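
One way to turn "clearly beat the baseline" from step 3 into a rule is sketched below in Python. The metric names, the HIGHER_IS_BETTER table and the 0.02 default margin are all assumptions made for illustration; the actual threshold is a judgment call per project.

```python
# Assumed metric names and directions; refusal rate is better when lower.
HIGHER_IS_BETTER = {
    "accuracy": True,
    "format_compliance": True,
    "refusal_rate": False,
}


def beats_baseline(
    baseline: dict[str, float],
    candidate: dict[str, float],
    min_gain: float = 0.02,
) -> bool:
    """True if no metric regresses and at least one improves by min_gain or more."""
    improved = False
    for name, higher in HIGHER_IS_BETTER.items():
        delta = candidate[name] - baseline[name]
        if not higher:
            delta = -delta  # flip sign so a positive delta always means "better"
        if delta < 0:
            return False
        if delta >= min_gain:
            improved = True
    return improved
```

A revision that lifts accuracy but breaks format compliance is rejected here, which reflects the idea that a prompt change should not trade one failure mode for another.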

Why work with me

  • With me: before-and-after eval scores in the deliverable. Typical agency: subjective "it looks better".
  • With me: model-agnostic across GPT, Claude and open-weights. Typical agency: single preferred vendor.
  • Turnaround for a full audit. With me: 3 working days. Typical agency: 2–3 weeks.
  • Eval harness left with your team.