Can LLMs replace Text-to-SQL?

5 min readOct 1, 2024

Here’s a quick post of my findings when experimenting with Text-to-SQL using Large Language Models (LLMs) and comparing them to traditional Text-to-SQL systems. This practical exploration aimed to evaluate the strengths and weaknesses of LLM-based SQL generation in different scenarios.

Objective:

The experiment aimed to explore how well LLMs (like OpenAI’s GPT models) can generate SQL queries from natural language inputs and compare the results with traditional Text-to-SQL systems that explicitly translate natural language queries into SQL.

Experimental Setup:

Tools Used

OpenAI GPT models for LLM-based SQL generation.
Traditional Text-to-SQL tool (Microsoft SQL Server with Full-Text Search and a rule-based Text-to-SQL generator).
Text2SQL.ai for its simplicity.

Dataset- A simple relational database containing three tables

Users (UserID, UserName, Age, Country)
Orders (OrderID, UserID, ProductID, OrderDate)
Products (ProductID, ProductName, Price)

Queries Tested

Basic queries: “Get all users who are from India.”
Moderate queries: “List the total number of orders made in 2023.”
Complex queries: “Find users who ordered products costing more than $100 in the last three months.”

Findings

1. Basic Query Generation

LLM Approach
LLMs did an excellent job of generating basic SQL queries. When asked for simple queries like “Get all users who are from India,” the response was spot-on.

Generated SQL

SELECT * FROM Users WHERE Country = 'India';

The LLM accurately translated the intent into SQL, and this worked consistently across different queries like selecting users by country, age, or filtering products by price.

Traditional Text-to-SQL
The traditional system also performed well on these basic queries. Since these queries follow common patterns (e.g., SELECT ... FROM ... WHERE), the rule-based system handled them without issues.

Conclusion
For basic queries, LLMs and traditional Text-to-SQL systems both performed accurately. However, LLMs had the added advantage of handling more ambiguous natural language inputs.

2. Moderate Query Generation

LLM Approach
For slightly more complex queries, such as “List the total number of orders made in 2023,” the LLM generated:

Generated SQL

SELECT COUNT(*) FROM Orders WHERE YEAR(OrderDate) = 2023;

The LLM understood the concept of date filtering and aggregation (counting orders), which worked well for these moderately complex scenarios. However, there were cases where the model misinterpreted the exact date format required by SQL.

Traditional Text-to-SQL
The traditional system also performed well here but required explicit natural language patterns (e.g., having the word “count” present). If the query was phrased in a more conversational way, the traditional system struggled, needing more fine-tuning.

Conclusion
LLMs are more adaptable to different phrasings of the query and handle date manipulation intuitively. The traditional Text-to-SQL system worked well but had limitations with less structured inputs.

3. Complex Query Generation

LLM Approach
For more complex queries like “Find users who ordered products costing more than $100 in the last three months,” the LLM generated:

Generated SQL

SELECT Users.UserName  FROM Users  JOIN Orders ON Users.UserID = Orders.UserID  JOIN Products 
ON Orders.ProductID = Products.ProductID  
WHERE Products.Price > 100  
AND Orders.OrderDate >= NOW() - INTERVAL 3 MONTH;

This output was impressive since it handled joins between multiple tables, date manipulation, and filtering by price. However, it wasn’t always perfect. Sometimes, it generated SQL syntax that wasn’t optimized or correct for specific databases (e.g., INTERVAL works for MySQL but not all SQL databases).

Traditional Text-to-SQL
The traditional system struggled more with complex queries, especially those involving multiple joins or date filtering. In some cases, I had to tweak the natural language input to fit pre-existing templates for it to produce the correct SQL.

Conclusion
LLMs were far superior for complex queries due to their understanding of context, relationships, and more complex language structures. Traditional systems would require significant engineering effort or hand-holding to achieve similar results.

4. Handling Schema Ambiguity

LLM Approach
When given queries with vague references (e.g., “Show me orders from last year”), the LLM struggled if the database schema wasn’t explicitly provided. It occasionally produced SQL referencing columns or tables that didn’t exist in the actual schema, requiring manual corrections.
Traditional Text-to-SQL
Traditional systems had a built-in understanding of the schema, meaning they rarely made mistakes about non-existent columns or tables but needed more rigid input.

Conclusion
LLMs require detailed schema information to be accurate, which could be a drawback in environments with frequent schema changes. Traditional systems handle this aspect better since they are tightly integrated with the schema.

5. Error Handling and Optimization:

LLM Approach
LLMs didn’t always generate optimized queries. For example, the generated SQL might work but wasn’t always efficient for large datasets. Sometimes, they didn’t account for specific database features like indexes, leading to suboptimal performance.
Traditional Text-to-SQL
Traditional systems tend to produce more optimized queries because they are built to adhere strictly to the underlying database’s features. They may lack the adaptability of LLMs but are often more efficient in execution.
Conclusion
LLMs can generate SQL that works, but it might not be the most performant option for large-scale or complex databases. Traditional systems have an edge in terms of query optimization.

Summary

LLM Strengths

Excellent for generating SQL queries from flexible, conversational language.
Superior in handling more complex and ambiguous inputs.
Capable of adapting to a variety of phrasings and providing immediate SQL drafts.

LLM Limitations

Not always schema-aware, which can lead to incorrect outputs.
Lacks query optimization, which may result in less efficient SQL.
Struggles with exact database-specific features without explicit training or fine-tuning.

Traditional Text-to-SQL Strengths

Schema-awareness and more reliable query generation based on predefined patterns.
Produces more optimized and efficient queries for large-scale data.

Traditional Text-to-SQL Limitations

Less flexible with conversational or ambiguous natural language inputs.
Struggles with complex, multi-table, or dynamic queries.

Final Verdict

While LLMs provide a much more user-friendly and flexible interface for generating SQL from natural language, they are not yet ready to fully replace traditional Text-to-SQL systems, especially in environments where high accuracy, query optimization, and schema awareness are critical. However, in less formal contexts (e.g., internal tools, analytics reports, or small databases), LLMs offer a highly effective way to quickly generate SQL with minimal effort.

Can LLMs replace Text-to-SQL?

Objective:

Experimental Setup:

Findings

1. Basic Query Generation

2. Moderate Query Generation

3. Complex Query Generation

4. Handling Schema Ambiguity

5. Error Handling and Optimization:

Summary

Final Verdict

Written by Gunjan

No responses yet