English-to-SQL: Turning Questions into Data

“Show me utilisation for Client X this quarter.” A human analyst knows what to do. An AI system must translate that into safe, accurate SQL. This is how you design it properly — without creating a security risk.

1. The Problem

LLMs can generate SQL easily. But raw SQL generation in production is dangerous:

Hallucinated table names
Missing joins
Incorrect aggregations
Accidental data modification
Security violations

So the question is not “Can the model write SQL?” The question is “Can we trust it?”

2. The Safe Architecture Pattern

The flow should look like this:

User asks a question in plain English.
Backend identifies allowed tables for that user.
Prompt includes only approved schema metadata.
LLM generates SQL.
Validation layer checks safety.
Query executes in read-only mode.
Results returned as structured output or chart.

Never give the model access to your full database schema blindly.

3. Schema Scoping (Critical)

Instead of exposing every table, pass only:

Relevant tables
Column names
Data types
Relationships

This reduces hallucination and enforces boundaries.

4. The Prompt Pattern

Role: SQL generator for reporting.
Context: You have these tables and columns: [list].
Task: Generate a query answering the user’s question.
Constraints: Only SELECT statements. No INSERT/UPDATE/DELETE. Limit to 200 rows.
Output: SQL query + explanation of joins + parameters.

5. Validation Layer (Non-Negotiable)

Before execution, validate:

Query begins with SELECT
No forbidden keywords
Tables are allowed
No dynamic SQL injection patterns

You can:

Parse the SQL
Whitelist allowed patterns
Enforce parameterisation

The LLM suggests. Your system decides.

6. Returning Results Properly

Don’t return raw SQL tables blindly. Convert results into:

Structured JSON
Summary paragraph
Bar chart / line chart
KPI summary

This improves usability and reduces misinterpretation.

7. Audit & Logging

Log:

User ID
Original English question
Generated SQL
Execution timestamp
Row count returned

This is essential for:

Compliance
Performance tuning
Debugging model errors

8. Common Mistakes

Letting the model “explore” the database
Skipping validation
Not limiting row counts
Ignoring cost implications of repeated queries

English-to-SQL works when you combine AI flexibility with engineering discipline.