Build With Me

English-to-SQL: Turning Questions into Data

“Show me utilisation for Client X this quarter.” A human analyst knows what to do. An AI system must translate that into safe, accurate SQL. This is how you design it properly — without creating a security risk.

English-to-SQL is not about generating queries. It’s about generating queries safely.

1. The Problem

LLMs can generate SQL easily. But raw SQL generation in production is dangerous:

  • Hallucinated table names
  • Missing joins
  • Incorrect aggregations
  • Accidental data modification
  • Security violations

So the question is not “Can the model write SQL?” The question is “Can we trust it?”

2. The Safe Architecture Pattern

The flow should look like this:

  1. User asks a question in plain English.
  2. Backend identifies allowed tables for that user.
  3. Prompt includes only approved schema metadata.
  4. LLM generates SQL.
  5. Validation layer checks safety.
  6. Query executes in read-only mode.
  7. Results returned as structured output or chart.
Never give the model access to your full database schema blindly.

3. Schema Scoping (Critical)

Instead of exposing every table, pass only:

  • Relevant tables
  • Column names
  • Data types
  • Relationships

This reduces hallucination and enforces boundaries.

4. The Prompt Pattern

Role: SQL generator for reporting.
Context: You have these tables and columns: [list].
Task: Generate a query answering the user’s question.
Constraints: Only SELECT statements. No INSERT/UPDATE/DELETE. Limit to 200 rows.
Output: SQL query + explanation of joins + parameters.

5. Validation Layer (Non-Negotiable)

Before execution, validate:

  • Query begins with SELECT
  • No forbidden keywords
  • Tables are allowed
  • No dynamic SQL injection patterns

You can:

  • Parse the SQL
  • Whitelist allowed patterns
  • Enforce parameterisation
The LLM suggests. Your system decides.

6. Returning Results Properly

Don’t return raw SQL tables blindly. Convert results into:

  • Structured JSON
  • Summary paragraph
  • Bar chart / line chart
  • KPI summary

This improves usability and reduces misinterpretation.

7. Audit & Logging

Log:

  • User ID
  • Original English question
  • Generated SQL
  • Execution timestamp
  • Row count returned

This is essential for:

  • Compliance
  • Performance tuning
  • Debugging model errors

8. Common Mistakes

  • Letting the model “explore” the database
  • Skipping validation
  • Not limiting row counts
  • Ignoring cost implications of repeated queries
English-to-SQL works when you combine AI flexibility with engineering discipline.

Continue the Masterclass

Next: Automating AI Workflows with n8n.

Next Article Back to Writing