A Consideration of SQL ORMs
Conventional wisdom in modern software development generally endorses the use of an ORM.
“Any project that doesn’t use an ORM will end up with a poorly implemented partial ORM.”
Most developers are familiar with the dreaded SQL injection attack - an ORM provides solid protection, while exposing a comfortable OOP interface for application developers:
from blog.models import Blog, Entry entry = Entry.objects.get(pk=1) cheese_blog = Blog.objects.get(name="Cheddar Talk") entry.blog = cheese_blog entry.save()
Example from the Django framework
But ORMs have a dark side - they hide (but do not reduce) complexity, and the interface they expose doesn’t correspond particularly well to the actual work done by the database. SQL is declarative, already obfuscating execution - ORMs hide even more of the computational work behind class abstractions.
When you get into SQL, it is valuable to spend some time with the query planner. Run enough EXPLAIN statements and you start to see how your declarations are actually executed. Table scans, index-less conditions, expensive joins - SQL has a few typical sources of big costs, and you’ll start developing an instinct for them.
ORMs make it harder to develop that instinct. With an ORM you can retrieve a many-to-many relationship in a single line - the coder might not even know that’s what they’re doing, much less how it is executed in the database. Put some ORM interactions behind utility methods or decorators, have one route that hits a few of those, and you can end up with hundreds of lines of inefficient SQL executed on every page load.
For many applications, the database is the most critical and heavily-used piece of architecture. Overloading it with inefficient queries (just to let developers use classes and objects instead of SQL) can lead to a very bad place. I’ve seen several ORM-backed applications where typical user flow was significantly slower due to cumbersome data queries generated by the ORM.
There isn’t a quick fix for this either - removing an ORM is usually untenable, and optimizing without reducing functionality is a painstaking process. Adding a few indices can help - but if what you really need to do is change how your schema is arranged, you’re in for a lot of work.
Meditating on this, I’m leaning towards a somewhat-unconventional-wisdom perspective - prepared statements should be preferred to ORMs for all but the most straightforward CRUD applications.
import mysql.connector connection = mysql.connector.connect(database='blog') cursor = connection.cursor(prepared=True) get_blog = "SELECT id FROM blog WHERE name = %s" update_entry = "UPDATE entry SET blog = %s WHERE id = %s" cursor.execute(get_blog, ("Cheddar Talk",)) blog_id = cursor.fetchone()[0] cursor.execute(update_blog_entry, (blog_id, 1)) cursor.commit()
Based on the above Django example and the MySQL Python connector
The specific implementation can vary (as it can with ORMs), but the general principle - make your developers actually write SQL. Have strongly typed parameterization, so static checks still prevent SQL injection. But don’t hide the database more than SQL itself already does - encourage developers to interact with it, and to understand their queries. Admittedly this results in more code (at least for this example) - but also, more understanding.
The main positive impact is that developers will better understand the performance impact of the code they write. They’ll also be better at debugging and ad hoc fixes, as they’ll be able to work directly with the database. And if some terrible and inefficient query makes it into the codebase somehow, factoring it out will be much easier than fixing a whole ORM.
Prepared statements have drawbacks (besides writing SQL, which I’m positioning as a plus) - most notably, they don’t give you a data structure to put your results in, while ORMs involve declaring a class for every table. But there are other excellent ways to specify typed objects - and with prepared statements, you only have to do so for classes that are actually useful (and just let the less frequent or more-blobby things live in key-value structures).
Uncoupling “how it’s represented as a business-logic object” from “how it’s stored in the database” lets you write classes that are more efficient - taking only what they need, and not forcing tables or relationships in the database to conform to them (which instead can be designed to optimize for the storage and processing of the database).
Of course, you can do these things with ORMs too - but it involves adding more code to your system, rather than less. The happy path of an ORM is when business logic and data storage perfectly align - in reality, this does not occur.
ORMs do offer some perks - they may claim compatibility with different database backends, and they naturally encourage DRY code (which requires more discipline with prepared statements). But in my experience, supposedly-backend-agnostic tools only “just work” for trivial cases - and relying on any tool to ensure code quality is always inadequate. Good code is a process.
So - next time you’re building an application, ask yourself - is what I’m doing super-simple CRUD? If so, go with an ORM! If not, give prepared statements a try. It may take some more planning, but hopefully you’ll build a better system - and you’ll definitely learn something.
Comments