Building a Production-Grade Text-to-SQL System
How to build a reliable natural language to SQL engine: schema awareness, query validation, error recovery, and safety guardrails for enterprise databases.
Introduction
Text-to-SQL is one of the most practically valuable LLM applications in enterprise settings. Finance analysts, operations teams, and business users could query databases directly without SQL expertise, if only it worked reliably.
The gap between "demo works" and "deployed to 5,000 employees" is enormous. This post covers the architecture for a system that actually earns trust.
Why Naive Text-to-SQL Fails
The common pattern:
# The demo approach: breaks in production
prompt = f"Convert to SQL: {user_query}"
sql = llm.generate(prompt)
results = db.execute(sql)  # 💥
Failure modes:
- Hallucinated table/column names: the model invents schema that doesn't exist
- Unsafe queries: DELETE, DROP, or UPDATE generated from a "SELECT" intent
- Dialect mismatch: PostgreSQL vs. Snowflake vs. BigQuery syntax differences
- Ambiguous queries: "show me recent sales" (recent means what, exactly?)
Architecture: The Reliable Text-to-SQL Pipeline
User Query
     │
     ▼
┌─────────────┐
│   Schema    │ ← Dynamic schema injection from metadata store
│  Retrieval  │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│     SQL     │ ← LLM generates SQL with chain-of-thought reasoning
│ Generation  │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Validation  │ ← Parse + validate before execution
└──────┬──────┘
       │
       ▼
┌─────────────┐
│    Safe     │ ← Allowlist only SELECT, enforce row limits
│  Execution  │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   Result    │ ← Summarize results in natural language
│ Explanation │
└─────────────┘
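Glued together, the five stages read as a short async pipeline. The sketch below is hypothetical: the stub bodies and the names `safe_execute` and `summarize` are assumptions standing in for real components, while `get_relevant_schema` and `generate_with_retry` mirror the functions shown in the steps that follow.

```python
import asyncio

# Stub stages so the sketch runs end to end. In a real system each stub is
# replaced by the corresponding pipeline component.
async def get_relevant_schema(query: str) -> str:
    return "Table: sales (id INT, amount DECIMAL, sold_at DATE)"

async def generate_with_retry(query: str, schema: str) -> str:
    return "SELECT sold_at, SUM(amount) AS total FROM sales GROUP BY sold_at LIMIT 100"

async def safe_execute(sql: str) -> list[tuple]:
    return [("2024-01-01", 1250.0)]

async def summarize(query: str, rows: list[tuple]) -> str:
    return f"Found {len(rows)} row(s) for: {query}"

async def answer_question(user_query: str) -> str:
    schema = await get_relevant_schema(user_query)       # 1. schema retrieval
    sql = await generate_with_retry(user_query, schema)  # 2+3. generation + validation
    rows = await safe_execute(sql)                       # 4. allowlisted, row-limited execution
    return await summarize(user_query, rows)             # 5. natural-language explanation

print(asyncio.run(answer_question("daily sales totals")))
# Found 1 row(s) for: daily sales totals
```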
Step 1: Dynamic Schema Injection
Don't dump the entire schema into the prompt: it's too large and distracts the model. Use semantic search to retrieve only the relevant tables:
async def get_relevant_schema(query: str, top_k: int = 5) -> str:
    """Retrieve only schema elements relevant to the query."""
    query_embedding = await embed(query)
    # Schema metadata pre-embedded at startup
    relevant_tables = await schema_vector_store.search(
        query_embedding, top_k=top_k
    )
    schema_prompt = ""
    for table in relevant_tables:
        schema_prompt += f"""
Table: {table.name}
Description: {table.description}
Columns:
{chr(10).join(f" - {col.name} ({col.type}): {col.description}" for col in table.columns)}
Sample values: {table.sample_values}
"""
    return schema_prompt

Step 2: SQL Generation with Validation
SQL_GENERATION_PROMPT = """
You are an expert SQL analyst. Generate a SQL query for the following request.
## Available Schema
{schema}
## User Request
{user_query}
## Rules
- Use ONLY tables and columns listed in the schema above
- Always include a LIMIT clause (max 1000 rows)
- Use only SELECT statements โ never INSERT, UPDATE, DELETE, DROP, ALTER
- Add meaningful column aliases for readability
- Use CTEs for complex queries
## Response Format
<reasoning>
[Explain your query logic]
</reasoning>
<sql>
[Your SQL query here]
</sql>
"""
import sqlglot

def validate_sql(sql: str) -> tuple[bool, str]:
    """Parse and validate SQL before execution."""
    try:
        parsed = sqlglot.parse_one(sql)
    except Exception as e:
        return False, f"SQL parse error: {e}"
    # Safety checks
    if parsed.find(sqlglot.exp.Insert, sqlglot.exp.Update, sqlglot.exp.Delete, sqlglot.exp.Drop):
        return False, "Only SELECT queries are allowed"
    if not parsed.find(sqlglot.exp.Limit):
        return False, "Query must include a LIMIT clause"
    return True, "Valid"

Step 3: Error Recovery
When validation fails, feed the error back to the LLM for self-correction:
async def generate_with_retry(query: str, schema: str, max_retries: int = 3) -> str:
    error_context = ""
    for attempt in range(max_retries):
        prompt = SQL_GENERATION_PROMPT.format(
            schema=schema,
            user_query=query,
        ) + error_context
        response = await llm.generate(prompt)
        sql = extract_sql(response)
        is_valid, error = validate_sql(sql)
        if is_valid:
            return sql
        # Feed error back for self-correction
        error_context = f"\n\nPrevious attempt failed: {error}\nPlease fix and try again."
    raise ValueError(f"Failed to generate valid SQL after {max_retries} attempts")

Key Takeaways
- Schema retrieval > schema dumping: retrieve relevant tables semantically
- Always validate before execution: sqlglot is excellent for this
- Restrict to SELECT only: enforce via the parse tree, not regex
- Self-correction loops dramatically improve reliability
Written by
Rohit Raj
Senior AI Engineer @ American Express