Here’s a quick post of my findings when experimenting with Text-to-SQL using Large Language Models (LLMs) and comparing them to traditional Text-to-SQL systems. This practical exploration aimed to evaluate the strengths and weaknesses of LLM-based SQL generation in different scenarios.
Objective:
The experiment aimed to explore how well LLMs (like OpenAI’s GPT models) can generate SQL queries from natural language inputs and compare the results with traditional Text-to-SQL systems that explicitly translate natural language queries into SQL.
Experimental Setup:
Tools Used
- OpenAI GPT models for LLM-based SQL generation.
- Traditional Text-to-SQL tool (Microsoft SQL Server with Full-Text Search and a rule-based Text-to-SQL generator).
- Text2SQL.ai for its simplicity.
Dataset- A simple relational database containing three tables
- Users (UserID, UserName, Age, Country)
- Orders (OrderID, UserID, ProductID, OrderDate)
- Products (ProductID, ProductName, Price)
- Queries Tested
- Basic queries: “Get all users who are from India.”
- Moderate queries: “List the total number of orders made in 2023.”
- Complex queries: “Find users who ordered products costing more than $100 in the last three months.”
Findings
1. Basic Query Generation
LLM Approach
LLMs did an excellent job of generating basic SQL queries. When asked for simple queries like “Get all users who are from India,” the response was spot-on.
Generated SQL
SELECT * FROM Users WHERE Country = 'India';
The LLM accurately translated the intent into SQL, and this worked consistently across different queries like selecting users by country, age, or filtering products by price.
Traditional Text-to-SQL
The traditional system also performed well on these basic queries. Since these queries follow common patterns (e.g., SELECT ... FROM ... WHERE
), the rule-based system handled them without issues.
Conclusion
For basic queries, LLMs and traditional Text-to-SQL systems both performed accurately. However, LLMs had the added advantage of handling more ambiguous natural language inputs.
2. Moderate Query Generation
LLM Approach
For slightly more complex queries, such as “List the total number of orders made in 2023,” the LLM generated:
Generated SQL
SELECT COUNT(*) FROM Orders WHERE YEAR(OrderDate) = 2023;
The LLM understood the concept of date filtering and aggregation (counting orders), which worked well for these moderately complex scenarios. However, there were cases where the model misinterpreted the exact date format required by SQL.
Traditional Text-to-SQL
The traditional system also performed well here but required explicit natural language patterns (e.g., having the word “count” present). If the query was phrased in a more conversational way, the traditional system struggled, needing more fine-tuning.
Conclusion
LLMs are more adaptable to different phrasings of the query and handle date manipulation intuitively. The traditional Text-to-SQL system worked well but had limitations with less structured inputs.
3. Complex Query Generation
LLM Approach
For more complex queries like “Find users who ordered products costing more than $100 in the last three months,” the LLM generated:
Generated SQL
SELECT Users.UserName FROM Users JOIN Orders ON Users.UserID = Orders.UserID JOIN Products
ON Orders.ProductID = Products.ProductID
WHERE Products.Price > 100
AND Orders.OrderDate >= NOW() - INTERVAL 3 MONTH;
This output was impressive since it handled joins between multiple tables, date manipulation, and filtering by price. However, it wasn’t always perfect. Sometimes, it generated SQL syntax that wasn’t optimized or correct for specific databases (e.g., INTERVAL
works for MySQL but not all SQL databases).
Traditional Text-to-SQL
The traditional system struggled more with complex queries, especially those involving multiple joins or date filtering. In some cases, I had to tweak the natural language input to fit pre-existing templates for it to produce the correct SQL.
Conclusion
LLMs were far superior for complex queries due to their understanding of context, relationships, and more complex language structures. Traditional systems would require significant engineering effort or hand-holding to achieve similar results.
4. Handling Schema Ambiguity
- LLM Approach
When given queries with vague references (e.g., “Show me orders from last year”), the LLM struggled if the database schema wasn’t explicitly provided. It occasionally produced SQL referencing columns or tables that didn’t exist in the actual schema, requiring manual corrections. - Traditional Text-to-SQL
Traditional systems had a built-in understanding of the schema, meaning they rarely made mistakes about non-existent columns or tables but needed more rigid input.
Conclusion
LLMs require detailed schema information to be accurate, which could be a drawback in environments with frequent schema changes. Traditional systems handle this aspect better since they are tightly integrated with the schema.
5. Error Handling and Optimization:
- LLM Approach
LLMs didn’t always generate optimized queries. For example, the generated SQL might work but wasn’t always efficient for large datasets. Sometimes, they didn’t account for specific database features like indexes, leading to suboptimal performance. - Traditional Text-to-SQL
Traditional systems tend to produce more optimized queries because they are built to adhere strictly to the underlying database’s features. They may lack the adaptability of LLMs but are often more efficient in execution. - Conclusion
LLMs can generate SQL that works, but it might not be the most performant option for large-scale or complex databases. Traditional systems have an edge in terms of query optimization.
Summary
LLM Strengths
- Excellent for generating SQL queries from flexible, conversational language.
- Superior in handling more complex and ambiguous inputs.
- Capable of adapting to a variety of phrasings and providing immediate SQL drafts.
LLM Limitations
- Not always schema-aware, which can lead to incorrect outputs.
- Lacks query optimization, which may result in less efficient SQL.
- Struggles with exact database-specific features without explicit training or fine-tuning.
Traditional Text-to-SQL Strengths
- Schema-awareness and more reliable query generation based on predefined patterns.
- Produces more optimized and efficient queries for large-scale data.
Traditional Text-to-SQL Limitations
- Less flexible with conversational or ambiguous natural language inputs.
- Struggles with complex, multi-table, or dynamic queries.
Final Verdict
While LLMs provide a much more user-friendly and flexible interface for generating SQL from natural language, they are not yet ready to fully replace traditional Text-to-SQL systems, especially in environments where high accuracy, query optimization, and schema awareness are critical. However, in less formal contexts (e.g., internal tools, analytics reports, or small databases), LLMs offer a highly effective way to quickly generate SQL with minimal effort.