~gallant (Posts about engineering)https://gallant.dev/categories/engineering.atom2024-01-15T22:54:22ZAaron GallantNikolaA Consideration of SQL ORMshttps://gallant.dev/posts/a-consideration-of-sql-orms/2023-12-03T17:04:13-08:002023-12-03T17:04:13-08:00Aaron Gallant<figure><img src="https://gallant.dev/images/wizard_computer_engineer.png"></figure> <div><p>Conventional wisdom in modern software development generally endorses the use of an ORM.</p>
<blockquote>
<p>"Any project that doesn’t use an ORM will end up with a poorly implemented partial ORM."</p>
</blockquote>
<p>Most developers are familiar with the dreaded <a href="https://xkcd.com/327/">SQL injection attack</a> - an ORM provides solid protection, while exposing a comfortable OOP interface for application developers:</p>
<div class="code"><pre class="code literal-block"><span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Blog</span><span class="p">,</span> <span class="n">Entry</span>
<span class="n">entry</span> <span class="o">=</span> <span class="n">Entry</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">pk</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">cheese_blog</span> <span class="o">=</span> <span class="n">Blog</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">"Cheddar Talk"</span><span class="p">)</span>
<span class="n">entry</span><span class="o">.</span><span class="n">blog</span> <span class="o">=</span> <span class="n">cheese_blog</span>
<span class="n">entry</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
</pre></div>
<blockquote>
<p><em>Example from <a href="https://docs.djangoproject.com/en/4.2/topics/db/queries/">the Django framework</a></em></p>
</blockquote>
<p>But ORMs have a dark side - they hide (but do not reduce) <a href="https://grugbrain.dev">complexity</a>, and the interface they expose doesn’t correspond particularly well to the actual work done by the database. SQL is declarative, already obfuscating execution - ORMs hide even more of the computational work behind class abstractions.</p>
<p><a href="https://gallant.dev/posts/a-consideration-of-sql-orms/">Read more…</a> (4 min remaining to read)</p></div><p>Conventional wisdom in modern software development generally endorses the use of an ORM.</p>
<blockquote>
<p>"Any project that doesn’t use an ORM will end up with a poorly implemented partial ORM."</p>
</blockquote>
<p>Most developers are familiar with the dreaded <a href="https://xkcd.com/327/">SQL injection attack</a> - an ORM provides solid protection, while exposing a comfortable OOP interface for application developers:</p>
<div class="code"><pre class="code literal-block"><span class="kn">from</span> <span class="nn">blog.models</span> <span class="kn">import</span> <span class="n">Blog</span><span class="p">,</span> <span class="n">Entry</span>
<span class="n">entry</span> <span class="o">=</span> <span class="n">Entry</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">pk</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">cheese_blog</span> <span class="o">=</span> <span class="n">Blog</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">"Cheddar Talk"</span><span class="p">)</span>
<span class="n">entry</span><span class="o">.</span><span class="n">blog</span> <span class="o">=</span> <span class="n">cheese_blog</span>
<span class="n">entry</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
</pre></div>
<blockquote>
<p><em>Example from <a href="https://docs.djangoproject.com/en/4.2/topics/db/queries/">the Django framework</a></em></p>
</blockquote>
<p>But ORMs have a dark side - they hide (but do not reduce) <a href="https://grugbrain.dev">complexity</a>, and the interface they expose doesn’t correspond particularly well to the actual work done by the database. SQL is declarative, already obfuscating execution - ORMs hide even more of the computational work behind class abstractions.</p>
<!-- TEASER_END -->
<p><img alt='A wizard computer engineer using a terminal to interact with a massive database of global information, rendered via Stable Diffusion in "futuristic-biomechanical cyberpunk" style' src="https://gallant.dev/images/wizard_computer_engineer.png"></p>
<p>When you get into SQL, it is valuable to spend some time with the query planner. Run enough EXPLAIN statements and you start to see how your declarations are actually executed. Table scans, index-less conditions, expensive joins - SQL has a few typical sources of big costs, and you’ll start developing an instinct for them.</p>
<p>ORMs make it harder to develop that instinct. With an ORM you can retrieve a <a href="https://en.wikipedia.org/wiki/Many-to-many_(data_model)">many-to-many relationship</a> in a single line - the coder might not even know that's what they're doing, much less how it is executed in the database. Put some ORM interactions behind utility methods or decorators, have one route that hits a few of those, and you can end up with hundreds of lines of inefficient SQL executed on every page load.</p>
<p>For many applications, the database is the most critical and heavily-used piece of architecture. Overloading it with inefficient queries (just to let developers use classes and objects instead of SQL) can lead to a very bad place. I’ve seen several ORM-backed applications where typical user flow was significantly slower due to cumbersome data queries generated by the ORM.</p>
<p>There isn’t a quick fix for this either - removing an ORM is usually untenable, and optimizing without reducing functionality is a painstaking process. Adding a few indices can help - but if what you really need to do is change how your schema is arranged, you're in for a lot of work.</p>
<p>Meditating on this, I’m leaning towards a somewhat-unconventional-wisdom perspective - prepared statements should be preferred to ORMs for all but the most straightforward <a href="https://en.wikipedia.org/wiki/Create%2C_read%2C_update_and_delete">CRUD</a> applications.</p>
<div class="code"><pre class="code literal-block"><span class="kn">import</span> <span class="nn">mysql.connector</span>
<span class="n">connection</span> <span class="o">=</span> <span class="n">mysql</span><span class="o">.</span><span class="n">connector</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">database</span><span class="o">=</span><span class="s1">'blog'</span><span class="p">)</span>
<span class="n">cursor</span> <span class="o">=</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">(</span><span class="n">prepared</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">get_blog</span> <span class="o">=</span> <span class="s2">"SELECT id FROM blog WHERE name = </span><span class="si">%s</span><span class="s2">"</span>
<span class="n">update_entry</span> <span class="o">=</span> <span class="s2">"UPDATE entry SET blog = </span><span class="si">%s</span><span class="s2"> WHERE id = </span><span class="si">%s</span><span class="s2">"</span>
<span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="n">get_blog</span><span class="p">,</span> <span class="p">(</span><span class="s2">"Cheddar Talk"</span><span class="p">,))</span>
<span class="n">blog_id</span> <span class="o">=</span> <span class="n">cursor</span><span class="o">.</span><span class="n">fetchone</span><span class="p">()[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="n">update_blog_entry</span><span class="p">,</span> <span class="p">(</span><span class="n">blog_id</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">cursor</span><span class="o">.</span><span class="n">commit</span><span class="p">()</span>
</pre></div>
<blockquote>
<p><em>Based on the above Django example and <a href="https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursorprepared.html">the MySQL Python connector</a></em></p>
</blockquote>
<p>The specific implementation can vary (as it can with ORMs), but the general principle - make your developers actually write SQL. Have strongly typed parameterization, so static checks still prevent SQL injection. But don’t hide the database more than SQL itself already does - encourage developers to interact with it, and to understand their queries. Admittedly this results in more code (at least for this example) - but also, more understanding.</p>
<p>The main positive impact is that developers will better understand the performance impact of the code they write. They'll also be better at debugging and ad hoc fixes, as they'll be able to work directly with the database. And if some terrible and inefficient query makes it into the codebase somehow, factoring it out will be much easier than fixing a whole ORM.</p>
<p>Prepared statements have drawbacks (besides writing SQL, which I’m positioning as a plus) - most notably, they don’t give you a data structure to put your results in, while ORMs involve declaring a class for every table. But there are other <a href="https://docs.pydantic.dev/latest/">excellent ways to specify typed objects</a> - and with prepared statements, you only have to do so for classes that are actually useful (and just let the less frequent or more-blobby things live in key-value structures).</p>
<p>Uncoupling "how it's represented as a business-logic object" from "how it's stored in the database" lets you write classes that are more efficient - taking only what they need, and not forcing tables or relationships in the database to conform to them (which instead can be designed to optimize for the storage and processing of the database).</p>
<p>Of course, you can do these things with ORMs too - but it involves adding <em>more</em> code to your system, rather than less. The <a href="https://en.wikipedia.org/wiki/Happy_path">happy path</a> of an ORM is when business logic and data storage perfectly align - in reality, this does not occur.</p>
<p>ORMs do offer some perks - they may claim compatibility with different database backends, and they naturally encourage <a href="https://en.wikipedia.org/wiki/Don't_repeat_yourself">DRY code</a> (which requires more discipline with prepared statements). But in my experience, supposedly-backend-agnostic tools only "just work" for trivial cases - and relying on any tool to ensure code quality is always inadequate. Good code is a <a href="https://xkcd.com/844/">process</a>.</p>
<p>So - next time you're building an application, ask yourself - is what I'm doing super-simple CRUD? If so, go with an ORM! If not, give prepared statements a try. It may take some more planning, but hopefully you'll build a better system - and you'll definitely learn something.</p>Superstition - an engineering anti-patternhttps://gallant.dev/posts/superstition-an-engineering-anti-pattern/2022-09-11T15:46:17-07:002022-09-11T15:46:17-07:00Aaron Gallant<figure><img src="https://gallant.dev/images/superstition_mountain.jpg"></figure> <div><p>Design patterns are reusable approaches to common problems - anti-patterns are
the same, but ineffective (and even counterproductive).</p>
<p>An example of a design pattern is the
<a href="https://en.wikipedia.org/wiki/Singleton_pattern">singleton</a> - we often want to
allow access to a resource, without incurring the cost of building it over and
over. Computationally, this means restricting the instantiation of a class to a
singleton, and then coordinating access to it where needed.</p>
<p><a href="https://en.wikipedia.org/wiki/Anti-pattern">Anti-patterns</a> can include code
smells (like the infamous <a href="https://wiki.c2.com/?SpaghettiCode">"spaghetti"</a>),
but are often less technical. Any process that inhibits effective development is
cast as an anti-pattern. Long meetings, analysis paralysis, planning too much
and planning too little - all this and more is categorized and enumerated as
reasons for poor engineering outcomes.</p>
<p>Generalizing failures from producers to their ecosystem is a good thing. Yet
this approach has a blind spot - nontechnical failures can also occur in purely
technical contexts. And so I introduce a new engineering anti-pattern -
superstition.</p>
<p><a href="https://gallant.dev/posts/superstition-an-engineering-anti-pattern/">Read more…</a> (3 min remaining to read)</p></div><p>Design patterns are reusable approaches to common problems - anti-patterns are
the same, but ineffective (and even counterproductive).</p>
<p>An example of a design pattern is the
<a href="https://en.wikipedia.org/wiki/Singleton_pattern">singleton</a> - we often want to
allow access to a resource, without incurring the cost of building it over and
over. Computationally, this means restricting the instantiation of a class to a
singleton, and then coordinating access to it where needed.</p>
<p><a href="https://en.wikipedia.org/wiki/Anti-pattern">Anti-patterns</a> can include code
smells (like the infamous <a href="https://wiki.c2.com/?SpaghettiCode">"spaghetti"</a>),
but are often less technical. Any process that inhibits effective development is
cast as an anti-pattern. Long meetings, analysis paralysis, planning too much
and planning too little - all this and more is categorized and enumerated as
reasons for poor engineering outcomes.</p>
<p>Generalizing failures from producers to their ecosystem is a good thing. Yet
this approach has a blind spot - nontechnical failures can also occur in purely
technical contexts. And so I introduce a new engineering anti-pattern -
superstition.</p>
<!-- TEASER_END -->
<p><img alt="Superstition Mountain" src="https://gallant.dev/images/superstition_mountain.jpg"></p>
<p><em>"Don't touch that, it might break!"</em></p>
<p><em>"I don't know why that is there, but it doesn't hurt. Keep doing it."</em></p>
<p><em>"This syntax is weird, but it works. I guess I'll just reuse it."</em></p>
<p>Even in my initial draft of this post, I fell prey to superstition. I used
abstruse ``LaTeX quotes'' instead of straightforward "quotation marks."</p>
<p>Shown in code:</p>
<div class="code"><pre class="code literal-block">``LaTeX quotes''
"quotation marks"
</pre></div>
<p>Why did I fall back to old, cumbersome ways? Because with past systems, it was
necessary. My current setup doesn't require it - but it still works, even though
a newer approach is now possible.</p>
<p>This is sometimes called "legacy" or "backwards compatibility" - and
psychologically, it is an immensely good thing. It doesn't take that many
rotations around the sun to become "legacy", and any system that is human-facing
should be empathic and avoid unnecessary changes. Newer is not necessarily
better, and much iteration is simply churn.</p>
<p>But in the technical layer - the code that talks to code, and is written by
people who are well-paid to understand it - anachronisms are simply incorrect.
If a better approach is developed, it should be utilized. This does not mean
that all new things are improved - but it does mean that all improvements should
be employed.</p>
<p>When improvements are deferred, the cost compounds. Future developers will have
even more to untangle if they want to update the system, which further pushes
them to rationalize not doing so. After all, they have sprint goals to meet, and
are rewarded for shipping features, not engaging in code archaeology. They rely
on frameworks and builtins, but do not have the time to understand them.</p>
<p>This situation isn't simply technical debt - it's more of a technical <em>drift</em>.
The issues are accumulated, yes, but also varied. Some of the codebase may be
relatively healthier, leading to uncertainty on where or what to refactor.</p>
<p>The resulting behavior overall is, as described at the opening, superstition.
Even though engineers are tasked to build deterministic systems, when met with
these challenges they find themselves indulging in all manners of "magical"
thinking.</p>
<p>If you write code, strive to take the time to understand it too. If you don't,
appreciate that those who do aren't just
<a href="https://xkcd.com/722/">manipulating pixels</a> - they're building a system, and
that system requires care and maintenance.</p>
<p>Computational systems are <em>not</em> magic - but, if treated poorly, they (like a car
that is never maintained) <em>are</em> unreliable. If it is a prototype, for an
ephemeral purpose or few users, maybe that's okay - but if it is a critical
system or real product, both the engineers and the users deserve better.</p>