The Rise of SQL: the second programming language everyone needs to know (ieee.org)
100 points by b-man 7 hours ago | 86 comments




I've loved and used Django ORM and SQLAlchemy for many years. It got me a long way in my career. But at this point I've sworn off using query-builders and ORMs. I just write real, hand-crafted SQL now. These "any db" abstractions just make for the worst query patterns. They're easy and map nicely to your application language, but they're really terrible unless you want to put in the effort to meta-program SQL using whatever constructs the builder library offers you. CTEs? Windows? Correlated subqueries? It's a lot. And they're always lazy, so you never really know when the N+1s are going to happen.

Just write SQL. I figured this out when I realized that my application was written in Rust, but really it was a Postgres application. I use PG-specific features extensively. My data, and database, are the core of everything that my application does, or will ever do. Why am I caring about some convenient abstractions to make it easier to work with in Rust, or Python, or whatever?

Nah. Just write the good SQL for your database.
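For anyone who hasn't been bitten by it, here is a minimal sketch of the N+1 pattern next to the single hand-written query. It uses Python's built-in sqlite3 rather than Postgres, and the authors/posts schema and function names are made up for illustration.

```python
import sqlite3

# Hypothetical schema for illustration only: authors and their posts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL
    );
    INSERT INTO authors (id, name) VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO posts (author_id, title) VALUES
        (1, 'a1'), (1, 'a2'), (2, 'b1'), (3, 'c1');
""")


def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author (N+1)."""
    queries = 0
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries += 1
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn):
    """The same data in one JOIN (authors without posts would be omitted)."""
    rows = conn.execute("""
        SELECT a.name, p.title
        FROM authors AS a
        JOIN posts AS p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1
```

With three authors the lazy version issues four queries and the JOIN issues one; the gap grows linearly with the number of parent rows.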


Anytime this topic comes up, this opinion is invariably at the top of the comments. However I've never seen a non-trivial application made this way. Mind sharing one? More than the query generation, I think people reach for ORMs for static typing, mapping, migrations, transactions, etc.

I'm not doubting that it can be done, I'm just curious to see how it's done.


I formerly worked for a travel company. It was the best codebase I've ever inherited, but even so there were select N+1's everywhere and page loads of 2+ seconds were common. I gradually migrated most of the customer-facing pages to use hand-written SQL and Dapper, getting most page loads below 0.5 seconds.

The resulting codebase was about 50kloc of C# and 10kloc of SQL, plus some cshtml and javascript of course. Sounds small, but it did a lot -- it contained a small CMS, a small CRM, a booking management system that paid commissions to travel agents and payments to tour operators in their local currencies, plus all sorts of other business logic that accumulates in 15+ years of operation. But because it was a monolith, it was simple and a pleasure to maintain.

That said, SQL is an objectively terrible language. It just so happens that it's typically the least of all the available evils.


YouTube is one from my experience. The team there had a pretty strong anti-orm stance. DB performance was an existential necessity during the early scaling. The object fetching and writing tended to be focused through a small number of function calls with well scrutinized queries and write through memcaching.

The company I work for is one such example. We write inline SQL in a Python Flask+Celery app which processes >$3bn of salaries a month. The stated goal from the CTO, who was an early engineer, is simplicity.

In addition to the great replies folks are sharing, I've found LLMs are quite good at authoring non-trivial SQL. Have effectively been using these to implement + learn so much about Postgres.

Many great SQL examples have long existed on stackoverflow and similar sources, but until the recent past were buried by lower quality questions and answers or SEO spam.

You will find that if you check sources they are lifted almost verbatim. LLMs are a way to cut through the noise, but they are rarely "authoring" anything here.

It's wild how far a little marketing can go to sell the same or an arguably worse product that used to be free and less unethical.


I've worked on a few, nothing I can share. I don't mind using a data mapper like Dapper in C# that will give you concrete types to work against with queries. Easy enough with data types for parameterized inputs as well.

Every single time. Where are these developers? ORMs are a godsend 98% of the time. Sure, write some SQL from time to time, but the majority of the time just use the ORM.

We have a POS system where the entire business logic is Postgres functions.

There are many others as well. Sure Rails/Laravel/Django people use the ORM supplied by their framework, but many of us feel it's un-necessary and limiting.

Limiting because, for example, many of them don't support CTE queries (Rails only added it a couple of years ago). Plus it gets weird when sometimes you have to use sql.raw because your ORM can't express what you want.

Also transactions are way faster when done in a SQL function than in code. I have also seen people do silly things like call startTransaction in code and then do a network request, resulting in a table lock for the duration of that call.

Some people complain that writing postgres functions make testing harder, but with pglite it's a non issue.

As an aside I have seen people in finance/healthcare rely on authorization provided by their db, and just give access to only particular tables/functions to a sql role owned by a specific team.


> ORMs are a godsend 98% of the time.

People who write percentages make shit up 98% of the time.

Or in other words: Source?


> Sure, write some SQL from time to time, but the majority of the time just use the ORM

So add another layer that has to be maintained/debugged when you don't have to?


I worked at a company where we used Dapper with plain SQL. Like the sibling commenter said, simplicity. There were never [ORM] issues to debug and queries could easily be inspected.

I love SQL and use it all day long to answer various business questions, but I would never use raw SQL in my code unless there is a good reason for it (sometimes there is). ORMs are there for maintainability, composability, type safety, migrations, etc. Trying to do all that with raw SQL strings doesn't scale in a large code base. You need something that IDE tools can understand and allow things like 'find all references', 'rename instances', compile time type checks, etc. Raw SQL strings can't get you that. And managing thousands of raw SQL strings in a code base is not sustainable.

ORMs are one of those things that a lot of people think is a replacement for knowing SQL. Or that ORMs are used as a crutch. That has nothing to do with it. Very similar to how people here talked about TypeScript 10 years ago in a very dismissive way. Not really understanding its purpose. Most people haven't used something like Entity Framework either which is game changing level ORM. Massive productivity boost, and LINQ rivals SQL itself in that you can write very small yet powerful queries equivalent to much more complex and powerful SQL.


SQL is such a joy to work with compared to all the baggage ORMs bring. I’m not against ORMs but I like to keep them as thin as possible (mostly to map columns to data objects). I’ve been happily using JDBC and Spring Data JDBC (when I needed to use Repository pattern) for a long time in Java.

ORMs come with a lot of baggage that I prefer to avoid, but it probably depends on the domain. Take an e-commerce store with faceted search. You're pretty much going to write your own query builder if you don't use one off the shelf, seems like.
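To make the faceted-search point concrete, this is roughly the smallest query builder you end up writing by hand. It is a sketch only: the products table, the facet names and build_product_query are hypothetical, and it uses Python's sqlite3 for the sake of being runnable.

```python
import sqlite3

# Whitelist of filterable columns; column names never come from user input.
ALLOWED_FACETS = {"brand", "color", "size"}


def build_product_query(facets):
    """Return (sql, params) for selected facets, e.g. {"color": ["red", "blue"]}."""
    clauses, params = [], []
    for column in sorted(facets):
        if column not in ALLOWED_FACETS:
            raise ValueError(f"unknown facet: {column}")
        values = facets[column]
        if not values:
            continue  # an empty facet means "no filter on this column"
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)
    sql = "SELECT name FROM products"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY name", params


# Tiny made-up catalogue to run the generated query against.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (name TEXT, brand TEXT, color TEXT, size TEXT);
    INSERT INTO products VALUES
        ('cap', 'acme', 'red', 'm'),
        ('hat', 'acme', 'green', 'm'),
        ('tee', 'zeta', 'red', 'l');
""")

sql, params = build_product_query({"color": ["red", "blue"], "brand": ["acme"]})
matches = conn.execute(sql, params).fetchall()
```

Values always travel as bound parameters; only whitelisted column names are interpolated into the SQL text, which is the part a hand-rolled builder most often gets wrong.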

I once boasted about avoiding ORM until an experienced developer helped me to see that 100% hand-rolled SQL and custom query builders is just you writing your own ORM by hand.

Since then I've embraced ORMs for CRUD. I still double-check its output, and I'm not afraid to bypass it when needed.


Exactly, and any good ORM will let you drop down to pure SQL if you need to for the weird cases.

Every Oracle rep I've ever met said every app should be a SQL app.

I've been using django & duckdb together, which keeps me from using the ORM. Was this a happy accident for me? For background, I have a scientist background; I don't have as much experience w/ software and designing database apps.

Indeed, Dapper, myBatis, jOOQ,...

Dapper is fantastic, and I'm happy to see it getting some love. It does exactly what I want: provides strongly-typed mapping and protects against SQL injection. It makes it easy to create domain-specific repositories without leaking anything.
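For readers outside .NET, a rough Python analogue of that style (this is not Dapper's API): hand-written SQL, bound parameters, and rows mapped onto a small concrete type. The Customer class, the customers table and find_customers_by_name are hypothetical names for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    id: int
    name: str
    email: str


def find_customers_by_name(conn, name):
    """Hand-written SQL; the value is bound as a parameter, never concatenated."""
    rows = conn.execute(
        "SELECT id, name, email FROM customers WHERE name = ? ORDER BY id",
        (name,),
    ).fetchall()
    # Column order in the SELECT matches the field order of Customer.
    return [Customer(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO customers VALUES
        (1, 'ann', 'ann@example.com'),
        (2, 'bob', 'bob@example.com');
""")
```

Because the input is a bound parameter, a classic injection string such as `x' OR '1'='1` is treated as a literal name and simply matches nothing. Note that Python dataclasses do not enforce field types at runtime, so the typing here is weaker than what C# gives you.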

In contrast, every company I've joined that used Entity Framework had enterprise products that ended up being a tightly coupled mess from IQueryable<T> being passed around like the world's favourite shotgun.


Dapper is an unmitigated joy for me. I get to write the best SQL needed for the case and then let the micro-ORM handle the rest.

The cargo-cult shibboleth of "never put business logic in your database" certainly didn't help, since a lot of developers just turned that into "never use stored procedures or views, your database is a dumb store with indexes."

A lot of people probably think it's better to keep the database "easy to swap". Which is silly: it's MUCH easier to change your application layer than your database.

There's value in not having to hunt in several places for business logic, having it all in one language, etc. I was ambivalent on the topic until I encountered a 12-page query that contained a naive implementation of the knapsack problem. As with most things, dogma comes with a whole host of issues, but in this case I think it's largely benign and likely did more good than harm.

> hunt in several places for business logic

But that is the result of having multiple applications needing to enforce valid states in the database.

"Business logic" is a loose term. The database is the effective store for state so it must enforce states, eg by views, triggers, and procedures.

Other "business logic" can happen outside of the db in different languages. When individual apps need to enforce valid states, then complexity, code, etc grows exponentially.
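A minimal sketch of the database refusing invalid states on its own, so that every application writing to it gets the same rules. It uses SQLite CHECK constraints and a trigger as stand-ins for the views, triggers and procedures mentioned above; the bookings table and the accepted helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookings (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'cancelled')),
        amount_cents INTEGER NOT NULL CHECK (amount_cents >= 0)
    );

    -- A cancelled booking may never change status again.
    CREATE TRIGGER bookings_cancelled_is_final
    BEFORE UPDATE OF status ON bookings
    FOR EACH ROW
    WHEN OLD.status = 'cancelled' AND NEW.status <> 'cancelled'
    BEGIN
        SELECT RAISE(ABORT, 'cancelled bookings cannot change status');
    END;
""")


def accepted(conn, sql, params=()):
    """Run one statement; True if the database accepted it, False if it refused."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

Whichever client sends the INSERT or UPDATE, an unknown status, a negative amount, or reviving a cancelled booking is rejected in one place instead of being re-checked in each app.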


Did that 12 page query have any automated tests?

Genuinely curious: can you steelman stored procedures? Views make intuitive sense to me, but stored procedures, much like meta-programming, need to be used sparingly IMO.

At my new company, the unchecked use of stored procedures has really hurt part of the company's ability to build new features, so I'm surprised to see what seems like sound advice, "don't use stored procedures", called out as a cargo cult.


My hunch is that the problems with stored procedures actually come down to version control, change management and automated tests.

If you don't have a good way to keep stored procedures in version control, test them and have them applied consistently across different environments (dev, staging, production) you quickly find yourself in a situation where only the high priests of the database know how anything works, and making changes is painful.

Once you have that stuff in git, with the ability to run automated tests and robust scripting to apply changes to all of your environments (I still think Django's migration system is the gold standard for this, though I've not seen that specifically used with stored procedures myself) their drawbacks are a lot less notable.
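A bare-bones sketch of the "applied consistently across environments" part: numbered SQL changes recorded in a bookkeeping table so each one runs exactly once per database. This is not Django's migration system, just the underlying idea; SQLite has no stored procedures, so a view stands in for the database-side logic, and all table and function names are hypothetical.

```python
import sqlite3

# Hypothetical migrations; in a real project each entry would be a file in git.
MIGRATIONS = [
    (1, "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance_cents INTEGER NOT NULL)"),
    (2, "CREATE VIEW overdrawn_accounts AS SELECT id FROM accounts WHERE balance_cents < 0"),
]


def migrate(conn):
    """Apply migrations not yet recorded in schema_migrations; return versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    for version, sql in sorted(MIGRATIONS):
        if version in applied:
            continue
        with conn:  # record the version together with the change it describes
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
            )
        newly_applied.append(version)
    return newly_applied
```

Running migrate against dev, staging and production brings each to the same version, and an automated test can call it on an in-memory database and then assert on what the view returns.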


> My hunch is that the problems with stored procedures actually come down to version control

Git? (and migrations)

> change management

Again. Just like any other code.

> and automated tests.

Just write an automated test like you write any other kind of test?


That's exactly what I'm saying. If you do those things stored procedures stop sucking.

It's also about separately scaling your business logic from the data layer

You give no reasons why you think it's sound advice.

My experience is the following:

1) Transactions are faster when they are executed in a SQL function, since you cut down on network round trips between statements. It also prevents users from doing fancy shenanigans with the network after calling startTransaction.

2) It keeps your business logic separated from your other code that does caching/authorization/etc.

3) Some people say it's hard to test sql functions, but since pglite it's a non issue IMO.

4) Logging is a little worse, but `raise notice` is your friend.
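Point 1 is about Postgres functions, which can't be shown with an embedded database, but the application-side half of it can: do the slow network work first, then keep the transaction as short as possible. A sketch with Python's sqlite3; fetch_exchange_rate is a stand-in for a real network call and the accounts table is hypothetical.

```python
import sqlite3
import time


def fetch_exchange_rate():
    """Stand-in for a slow network call (hypothetical; returns a fixed rate)."""
    time.sleep(0.01)
    return 2


def transfer(conn, src, dst, amount):
    # Slow call happens BEFORE the transaction, so no locks are held while waiting.
    rate = fetch_exchange_rate()
    with conn:  # commits on success, rolls back if an exception escapes
        # Credit first so that a failing debit visibly rolls the credit back.
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount * rate, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, src),
        )


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

If the debit violates a constraint (for example a CHECK that balances stay non-negative), the whole block is rolled back and the earlier credit disappears with it.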

> At my new company, the use of stored procedures unchecked has really hurt part of the company's ability to build new features

Isn't it just because most engineers aren't as well versed in SQL as they are in other programming languages?


It’s about what you want to tie to which system. Let’s say you keep some data in memory in your backend, would you forbid engineers from putting code there too, and force it a layer out to the front end - or make up a new layer in between the front end and this backend just because some blogs tell you to?

If not, why would you then avoid putting code alongside your data at the database layer?

There are definitely valid reasons to not do it for some cases, but as a blanket statement it feels odd.

Stored procedures can do things like smooth over transitions by having a query not actually know or care about an underlying structure. They can cut down on duplication or round trips to the database. They can also be a nightmare like most cases where logic lives in the wrong place.
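A small sketch of the "query doesn't know or care about the underlying structure" idea. SQLite has no stored procedures, so a view plays that role here; the users/people tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The query the application keeps sending, before and after the change.
QUERY = "SELECT id, full_name FROM users ORDER BY id"

conn.executescript("""
    -- Old layout: a single table that application queries read directly.
    CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
    INSERT INTO users VALUES (1, 'Ada Lovelace');
""")
before = conn.execute(QUERY).fetchall()

conn.executescript("""
    -- New layout: names are split in two and stored in a different table.
    CREATE TABLE people (
        id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name TEXT NOT NULL
    );
    INSERT INTO people VALUES (1, 'Ada', 'Lovelace');
    DROP TABLE users;

    -- A view with the old name keeps the old shape alive for existing callers.
    CREATE VIEW users AS
        SELECT id, first_name || ' ' || last_name AS full_name FROM people;
""")
after = conn.execute(QUERY).fetchall()
```

The application query is byte-for-byte the same across the migration and returns the same rows, which is what buys you time to update callers gradually.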


One of the few things I have used in programming and technology consistently for over 25 years is SQL. Almost no time spent learning how to organize and query data has been a waste in my career.

Bingo.

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

Some quotes stick with you throughout your whole career.


Any recommended resources you wish you had encountered earlier?

Hmm, I sort of learned ad-hoc. Joe Celko's books were good back in the day. I never read anything later on that was an "aha" moment for me. I think I was a little resistant to "NoSQL" databases for a while but eventually they made sense to me. I can't think of a single resource or turning point. There are probably some very good books out there now. The key thing is not /everything/ has to be SQL. And SQL databases like Postgres and SQLite can be used for a lot more than SQL now. Also, don’t be afraid to just throw protos/JSON/whatever into a database with no or minimal schema to get going. But manage data design debt ruthlessly, it can haunt you.

My biggest learnings:

Don't prematurely normalize data, but if it is obvious it can always stay normalized, normalize it. Read the normal forms. Learn about indexing and how data is actually being stored on disk. Just knowing about indexes is a huge advantage even today. Understand and know when to use different styles of data storage: row oriented, column oriented, key value, bigtable style (2d key value), document (rare). Pick good systems. Spend more time than you think you should designing your data. The system is often easy if the data is right. Learn ACID and CAP theorem. Learn when and where you can trade on fundamental database principles in your data model for performance or ease of development. Honestly, a lot of this stuff senior engineers at big tech are just expected to know these days, but it still isn't really obvious and not everyone has big tech problems. Still if you know how to solve the problems at scale and you can get out of your own way it is much easier to write smaller systems (most problems people have).

So in terms of resources, go learn about each of those concepts. Read papers. Ask an LLM about them. Play with databases and storage systems. Maybe try to write your own simple database. Go read about how people design massively scaled distributed systems and what systems they use to manage data. Just like with programming languages, be flexible and open minded. Read about how distributed systems work (CAP theorem). Almost all data systems make tradeoffs in that realm to meet cost/performance/implementation goals.


Thanks for writing this out - I appreciate it

table of contents of the manual of the RDBMS your project is using is a good start and this is not a joke. most senior engineers (by job title, anyway) haven't gone that far.

Knowing SQL, and knowing how to optimize queries has pretty much paid my paycheck for 20 years.

> the second programming language everyone needs to know

Do they though? I've been writing SQL for over twenty years, and my experience is that LLMs have been better at writing it than I am for at least most of 2025, for most use cases. I have zero doubt that I will only be writing SQL when I want to for fun no later than sometime 2027.


People who don't already know it still need to learn it if they wanna manage LLMs writing it. Anything else is reckless. So the original point stands.

Agreed with that. As with writing SQL by hand, you have to be very specific when instructing an LLM. There are many ways to get to a solution in SQL, and they all present different tradeoffs and corner cases. I've found that people who don't understand SQL and the basics of a given schema produce garbage both by hand and with LLMs.

> I've been writing SQL for over twenty years, and my experience is that LLMs have been better at writing it than I am for at least most of 2025

Wow, bad career choices?


The mere existence of Pandas makes me extremely grateful for SQL, because my job would be absolute hell if I had to use pandas or a similar syntax. It’s hard to overemphasize just how perfect SQL is for the job that it does.

Agree that Pandas is horribly irregular - the only worse query language I’ve had to work with is Mongo’s. After about a decade of regular Pandas use, switching to Polars was such a relief. It’s not perfect since it’s slightly limited by being a Python library rather than an embedded query language but it’s so much better designed than Pandas - even ignoring the huge performance improvement. In my circle, Pandas is being abandoned en masse for Polars.

I don't think SQL is "perfect" and I'm not sure it's rational to even be saying that. For instance, why is it that the syntax for an SQL query is "select A from B" when many SQL-inspired syntaxes have switched to something like "from B select A" to make it more compositional?

The relational model is pretty simple though. Pandas is an awful mess.


SQL is great, but what is even better is a SOTA client (https://github.com/elixir-dbvisor/sql). The BEAM can give you superpowers that no other platform can: it handles massive concurrency with performance that rivals bare metal and C. https://erlangforums.com/t/elixir-dbvisor-sql-needs-a-sota-p...

A recent article in the space: What Goes Around Comes Around... And Around... | July 1, 2024 | 30 comments | https://news.ycombinator.com/item?id=40846883

The basic thesis is that the relational model and SQL has been the prevailing choice for database management systems for decades and that won't change soon.

Resubmitted because it's a good one: https://news.ycombinator.com/item?id=46359878


If you do backend web development in 99% of software companies then being very good at whatever your RDBMS is is a superpower.

It's definitely worth learning SQL very well, but you also need to learn the data structures your RDBMS uses, how queries translate into operations on that data, and what those operations look like in a query plan.
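As a small illustration of reading a plan, here is the same query before and after adding an index, using SQLite's EXPLAIN QUERY PLAN from Python. Other databases expose this differently (for example EXPLAIN in Postgres), the exact wording of the plan text varies between SQLite versions, and the orders table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        total_cents INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
    [(i % 100, i) for i in range(1000)],
)

QUERY = "SELECT id, total_cents FROM orders WHERE customer_id = ?"


def plan(conn, sql, params):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


before = plan(conn, QUERY, (42,))   # full table scan: every row is examined
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, QUERY, (42,))    # index search: only matching rows are visited
```

Printing `before` and `after` shows the plan switching from a SCAN of the table to a SEARCH that names idx_orders_customer, which is the kind of change you look for when tuning a slow query.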

You can go surprisingly far with just that knowledge.

A great resource is https://use-the-index-luke.com/


SQL has been the main skill I have relied upon my entire career. Yes, I have worked with Pandas and other data libraries; my takeaway from working with Pandas is that it is a pretty language but obfuscates the relational database with a non-relational language. Relational databases require a relational language, which is what SQL is.

I am quite fond of PL/SQL and stored procedures; no need to waste network bandwidth on what can be done in the database.

'The original microservices'

Microservices are older than those, hence why Sun had the motto "The network is the computer". :)

By the time you are referring to, we were already on the classical 3 tier architecture, the

There are indeed web frameworks for RDBMSs that allow you to expose the database as microservices, like Oracle's APEX, which grew out of Oracle's Visual Basic version, which used PL/SQL instead of BASIC.


The CMU DB Group YouTube channel is a good place to learn about the basics and advanced topics: https://www.youtube.com/@CMUDatabaseGroup

I find them great for database development, but haven't seen practical "how to use SQL" type advice from Andy.

I’ve always hated SQL, but fortunately LLMs write it so well that it’s effectively become a read-only language now. You just need to know enough to check the output.

I agree. Claude Code writes superb SQL queries for very complex data. I was dealing with PostgreSQL recently, and it improved the query from 30 seconds to 5 seconds. I couldn't figure it out myself.

How do you present the interrelations between the tables when you're dealing with complex table structures?

Prompting with documentation and examples works. In an agentic tool having an MCP server for the db helps assuming it is a straightforward schema with explicitly defined relationships. Also helps if the tables correspond to entities in a natural way.

Use a tool to do that. Try https://visualdb.com it can send the relationships and table definitions to AI.

We're pretty locked down around AI tools. Right now we can only really use GH Copilot. It's been OK so far, though it's funny to see it suggest edits and then, if you accept them, suggest the opposite the next time it reviews the PR.

sonnet 4.5 was really bad at anything more than simple queries. even GPT 5 was not great. gemini was consistently good even at 2.5; caught multiple bugs in outputs of either. I haven't tested Opus 4.5 properly at SQL yet, but I've got a feeling Anthropic doesn't prioritize it in training and google does.

Learning SQL basically launched my career as a professional SWENG. Once I knew SQL, I found ways to apply it in even non-technical jobs.

This is a nice coincidence.

I’ve been heads-down on publishing a JavaScript full-stack metaframework before the end of the year. However, in the past two weeks I’ve been goaded by Claude Code to extract and publish a separate database client because my vision includes Django-style admin/forms. The idea is to use Zod to define tables, and then use raw SQL fragments with JavaScript template tags. The library adds a tiny bit of magic for the annoying parts of SQL like normalizing join objects, taking care of SELECT clauses and validating writes.

I’m only using it internally right now, but I think this approach is promising. Zod is fantastic for this use-case, and I’m sad I’ve only just discovered it.

https://github.com/bikeshaving/zen


I notice that the top image is of Transact SQL, the Sybase/Microsoft dialect. This is not a formal standard, and I suggest against its use.

https://en.wikipedia.org/wiki/Transact-SQL

SQL/PSM is a general ISO standard that grew out of Oracle PL/SQL, is rooted in Ada, and is implemented by a large range of databases.

https://en.wikipedia.org/wiki/SQL/PSM

Standards are important.


Folks, the article is from 3 years ago, 2022.

Somewhat tangential to the article, but why is SQL considered a programming language?

I understand that's the convention according to the IEEE and Wikipedia [1], but the name itself - Structured Query Language - reveals that its purpose is limited by design. It's a computer language [2] for sure, but why programming?

[1] https://en.wikipedia.org/wiki/List_of_programming_languages

[2] https://en.wikipedia.org/wiki/Computer_language


With support for recursive Common Table Expressions (CTEs), SQL becomes a Turing complete language. To be honest, it makes me somewhat uncomfortable that a query sent to a DB server could be non-terminating and cause a server thread to enter an infinite loop. On the other hand, the practical difference between a query that contains an infinite loop and one that runs for days is probably negligible.

To be honest, I'd like to chip in that it is technically possible to implement brainf*ck in SQL. It's an esoteric programming language, but it's a programming language nonetheless.

https://www.reddit.com/r/SQL/comments/81barp/i_implemented_a...

Btw this runs in sqlite, you can try it yourself if you are interested.

Source: I was thinking of creating a programming language paradigm like sqlite/smalltalk once where resumed execution/criu like possibilities were built in. Let me know if someone knows something like this too. I kinda gave up on the project but I knew that there was this one language which supported this paradigm but it was very complicated to understand and had a lot of other first time paradigm like the code itself / the ast tree is sort of like a database itself but so the tangential goes.


What is your definition of 'programming language'?

It should have arrays, and loops and conditionals.

Slightly simplistic: table rows cover arrays, recursive CTEs cover loops, and JOIN/WHERE cover conditionals.
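A quick sketch of the "recursive CTEs cover loops" part: Fibonacci numbers computed entirely in SQL, run through Python's sqlite3. The fib CTE and the fibonacci wrapper are made up for illustration.

```python
import sqlite3

FIB_SQL = """
WITH RECURSIVE fib(n, a, b) AS (
    SELECT 0, 0, 1              -- loop initialisation
    UNION ALL
    SELECT n + 1, b, a + b      -- loop body: advance one step
    FROM fib
    WHERE n < :last             -- loop condition; without it the query never ends
)
SELECT a FROM fib ORDER BY n
"""


def fibonacci(last):
    """Return F(0)..F(last), computed by the recursive CTE above."""
    conn = sqlite3.connect(":memory:")
    try:
        return [a for (a,) in conn.execute(FIB_SQL, {"last": last})]
    finally:
        conn.close()
```

Calling fibonacci(10) yields eleven values, from 0 up to 55. Dropping the WHERE clause turns this into the non-terminating query worried about elsewhere in the thread, so a bound of some kind is always worth having.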

Because stored procedures do exist, and there isn't a single production quality RDBMS that doesn't go beyond DDL and DML, adding structured programming extensions.

Also, even within the standard itself, it allows for declarative programming.


Also SQL is not turing complete. I see it more as a descriptive language like e.g. html is a language but not a programming language.

This is completely wrong. The SQL spec isn't Turing complete but multiple DBs provide Turing complete extensions (like PL/pgSQL) that make it so. Also, even without the extensions, it is still very much a programming language by any definition of the term. Like most takes on SQL, it is more about your understanding (or lack thereof) of SQL.

It can do loops and recursion. It can use as much memory as it is allowed. It can do general programming via functions and stored procedures.

It can't do loops. Unless you're talking about extensions to SQL such as PL/SQL and T-SQL.

Because "programming language" is an adjective or a descriptive term. Whatever looks like a programming language, can be called a programming language.

With LLMs, you should be able to just query in English and have LLMs transpile from English to SQL.

SQL largely is plain English, which is one of its design choices.

I've always gravitated towards query languages and SQL is one of my favourites. I've never really understood the need for ORMs and other abstractions but then I'm not a software developer.

If I was going to choose a "third" language I'd say regex.


Data is the new oil or gold, SQL is the tool, the language to interact with it.

Put it together, it's pure gold!


I much prefer Kusto query language. SQL needs a few tweaks so that it's more type safe and supports auto completion. Some engines support From-first which is a good start.

without CTE, is SQL a programming language?

IEEE Spectrum is full of uninspiring blogspam, like this post.


