Build With Me
English-to-SQL: Turning Questions into Data
“Show me utilisation for Client X this quarter.” A human analyst knows what to do. An AI system must translate that into safe, accurate SQL. This is how you design it properly — without creating a security risk.
English-to-SQL is not about generating queries. It’s about generating queries safely.
1. The Problem
LLMs can generate SQL easily. But raw SQL generation in production is dangerous:
- Hallucinated table names
- Missing joins
- Incorrect aggregations
- Accidental data modification
- Security violations
So the question is not “Can the model write SQL?” The question is “Can we trust it?”
2. The Safe Architecture Pattern
The flow should look like this:
- User asks a question in plain English.
- Backend identifies allowed tables for that user.
- Prompt includes only approved schema metadata.
- LLM generates SQL.
- Validation layer checks safety.
- Query executes in read-only mode.
- Results returned as structured output or chart.
3. Schema Scoping (Critical)
Instead of exposing every table, pass only:
- Relevant tables
- Column names
- Data types
- Relationships
This reduces hallucination and enforces boundaries.
4. The Prompt Pattern
Role: SQL generator for reporting.
Context: You have these tables and columns: [list].
Task: Generate a query answering the user’s question.
Constraints: Only SELECT statements. No INSERT/UPDATE/DELETE. Limit to 200 rows.
Output: SQL query + explanation of joins + parameters.
5. Validation Layer (Non-Negotiable)
Before execution, validate:
- Query begins with SELECT
- No forbidden keywords
- Tables are allowed
- No dynamic SQL injection patterns
You can:
- Parse the SQL
- Whitelist allowed patterns
- Enforce parameterisation
6. Returning Results Properly
Don’t return raw SQL tables blindly. Convert results into:
- Structured JSON
- Summary paragraph
- Bar chart / line chart
- KPI summary
This improves usability and reduces misinterpretation.
7. Audit & Logging
Log:
- User ID
- Original English question
- Generated SQL
- Execution timestamp
- Row count returned
This is essential for:
- Compliance
- Performance tuning
- Debugging model errors
8. Common Mistakes
- Letting the model “explore” the database
- Skipping validation
- Not limiting row counts
- Ignoring cost implications of repeated queries
Continue the Masterclass
Next: Automating AI Workflows with n8n.