Project Case Study

English to SQL Exploration

An exploration into turning natural-language questions into reliable SQL—focused on safety, explainability, and analytics outputs that non-technical users can trust.

Context

Business users often know what they want to measure, but not how to query the data. Analysts become a bottleneck for simple questions (“show me utilisation by client”, “compare hours this month vs last month”). The goal was to reduce friction without compromising data integrity or security.

Problem

  • High dependency on analysts: simple questions still required SQL skills and domain knowledge.
  • Ambiguity in language: terms like “utilisation”, “active matters”, and “this quarter” need consistent definitions.
  • Safety risk: free-form query generation can produce expensive, incorrect, or over-broad SQL.
  • Trust gap: users need to understand what was queried and why the results are correct.

Approach

  • Schema grounding: constrained the model to approved tables, columns, and business definitions.
  • Intent → query plan: generated a structured plan first (metrics, filters, time window, grouping) before SQL.
  • Guardrails: enforced read-only queries, row limits, safe joins, and blocked sensitive fields by policy.
  • Validation loop: SQL linting + explain step (“what this query does”) and retry on errors.
  • UX output: returned results as tables + simple charts, with plain-English explanation alongside.

Key Decisions

  • Explainability is mandatory: every SQL response includes a “what it did” summary.
  • Definitions matter: business terms are backed by a glossary and reused consistently.
  • Safety first: default to smaller scoped queries and require refinement for broader pulls.
  • Human override: analysts can promote approved queries into “trusted templates”.

Learnings

English-to-SQL works best when it behaves like a guided analyst—not a magic box. The more you formalise definitions, enforce guardrails, and show the logic, the more usable (and safer) the system becomes.

Next Enhancements

  • Clarifying questions when intent is ambiguous (“Which date range?”, “Which client group?”).
  • Cost-aware planning (estimate query weight and refuse expensive patterns by default).
  • Semantic layer for business metrics so “utilisation” is always computed the same way.

← Back to Projects