What is a Micro ORM?

In software development, we all know very well what an ORM is:

A tool, usually a program library, that implements object-relational mapping (ORM, O/RM, or O/R mapping).

This mapping is a two-way or bi-directional mapping:

  1. From the database to the database client (your application). DB -> APP

Or, to be more specific: from the query results - to your application memory structures, such as object instances and other structures.

  2. From the database client (your application) to the database. APP -> DB

Or, to be more specific: from the query and command programming constructs in your application language (such as C# LINQ queries and commands) - to a query language that your database understands and can run, typically SQL.

The best-known and most widely used ORM in the .NET ecosystem is, of course, Microsoft Entity Framework, a full-blown ORM that implements two-way (bi-directional) mapping.

On the other hand, a Micro ORM is just a lightweight mapper that maps one way and one way only: from query results to your application memory structures, such as object instances and other structures.

Micro ORM is a term coined and popularized by Dapper, the best-known Micro ORM, built by Stack Overflow. In practice, that means that when using a Micro ORM, all SQL has to be written as a raw string command (raw SQL). For example:

public class Film
{
    public int FilmId { get; set; }
    public string Title { get; set; }
    public int ReleaseYear { get; set; }
    public decimal RentalRate { get; set; }
}

var film = connection
    .Read<Film>(@"
        select 
            film_id, title, release_year, rental_rate 
        from
            film
        where
            film_id = @id", id)
    .Single();

In this example we can see that:

  • The SQL command is contained in a string, usually hand-written.
  • Query results are mapped to a database client (program) memory structure - a class named Film.

For most software developers, this is a counter-intuitive, unproductive and generally undesired approach. And yet, it obviously persists. Let's explore the reasons why that is so...

Reasons for Micro ORM

Those reasons, at least in my mind, fall into two major categories:

  • Language and Productivity Reasons
  • Performance Reasons

1) Language and Productivity

This may be a surprise to some, but SQL is actually a higher-level language than anything you might be using in your program:

  • In the example above, that is C#, and C# is listed as an example of a third-generation language (3GL). [source]
  • In contrast, SQL is listed as an example of a fourth-generation language (4GL). [source]

What does that mean?

Letting your 3GL language write your 4GL language at runtime is the same as writing assembly that generates C/C++ at runtime. It makes no sense.

SQL is a data language and data operations can be simple, complicated or, honestly, insane.

So, yes, the simplest so-called CRUD (create, read, update and delete) operations may very well make sense to do with the ORM. But, SQL in general is a very strange language. It takes little time to learn and almost a lifetime to master.

That translates into the following realities:

Database professionals properly trained in SQL are often many times more productive with SQL than with anything ORM libraries can come up with - even with simple CRUD commands like the example above (with the help of modern tools, of course).

The fact is that SQL is internationally standardized by the International Organization for Standardization (ISO) [source]. Sure, different providers have different interpretations of the same standard, which translates into different dialects, but that is nothing compared to the wildly different approaches of different ORM libraries, where nothing close to a standard exists.

Secondly, an even more important fact is that SQL is a declarative language. You are supposed to simply declare what it is that you want from your data, and the RDBMS engine (your database) should figure out the how part for you. There is a strong emphasis on what, rather than how. And the how part is what is really hard and unproductive.

In practice, that means that we, as SQL developers, are supposed to focus on what we need to get out of the data, while the engine abstracts away unimportant details from us, such as operating systems, devices and even algorithms.

You are supposed just to declare your intent with data and the engine will figure out the best and most optimized algorithm.

If you ever wondered what the so-called execution plan for your queries is - well, that is it: it's your engine finding the optimal algorithm for your intent. It doesn't mean it's perfect, but it's good enough. And, actually, that's an element of fifth-generation languages (5GL), not even fourth. Quote: "A fifth-generation programming language (5GL) is a high-level programming language based on problem-solving using constraints given to the program, rather than using an algorithm written by a programmer". [source].

On the other hand, programming languages that are used as database clients (your program in C#) are lower-level, imperative and functional languages, where every step required to solve a problem is explicitly coded (rather than declared). This translates directly into productivity.
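
To illustrate the difference, here is a minimal sketch of the same intent expressed both ways: one declarative SQL statement versus the imperative C# steps needed to compute it by hand (the in-memory films list is an assumption; the Film class is the one from the earlier example):

// Declarative: state what you want, the engine decides how
// select release_year, avg(rental_rate) from film group by release_year;

// Imperative: spell out every step yourself (films is an assumed List<Film>)
var totals = new Dictionary<int, (decimal Sum, int Count)>();
foreach (var film in films)
{
    if (!totals.TryGetValue(film.ReleaseYear, out var acc))
    {
        acc = (0m, 0);
    }
    totals[film.ReleaseYear] = (acc.Sum + film.RentalRate, acc.Count + 1);
}
var averageRateByYear = totals.ToDictionary(
    kv => kv.Key, 
    kv => kv.Value.Sum / kv.Value.Count);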

I highly recommend checking out this tech-talk presentation that clearly demonstrates a dramatic difference in productivity. [link]

SQL is like a Swiss Army knife for data and anything data-related. I don't think that today, after 50-ish years of development, there is anything SQL can't do with data.

On the other hand, ORM tools and libraries are quite young, at least relative to SQL. It is quite possible that one day these libraries will catch up with SQL features and allow us to express the same intent over data operations, but unfortunately, we're not there yet. Not even close.

So what happens when the library misses a feature from SQL? One of two things:

  1. Fall back to raw SQL and the Micro ORM approach. Microsoft Entity Framework has finally allowed one-way mapping to instances that are not an official part of the model, which enables this approach without the need for an extra Micro ORM library (see the sketch after this list).

  2. Another "manual" approach that involves:

    1. Downloading the desired portion of the data to the client.
    2. Writing an appropriate algorithm to process that downloaded portion of the data.
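
As an illustration of the first option, here is a minimal sketch using EF Core 8's Database.SqlQuery<T>, which maps raw SQL results to a type that is not part of the model (the FilmSummary class and the db context instance are assumptions made for this example):

public class FilmSummary
{
    public int FilmId { get; set; }
    public string Title { get; set; }
}

// one-way mapping from raw SQL to an unmapped type, no extra Micro ORM library needed
var summaries = db.Database
    .SqlQuery<FilmSummary>($"select film_id as FilmId, title as Title from film")
    .ToList();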

This second approach (manually downloading a portion of the data) is, obviously, very much suboptimal, in terms of productivity as well as performance. A trained database professional would probably write and test that query before the application developer even declares such a function. This approach is detrimental to performance, and in many cases it represents a serious application choke point.

Which brings me to the second point:

2) Performance

Modern ORM libraries that implement dynamic SQL query/command generation (as part of two-way mapping) can impact application performance in many different ways. Sometimes those impacts can be subtle and sometimes even dramatic, but that usually depends on the amount of data being processed.

They usually fall into two major categories:

1) Technical limitations of the library

There's already a performance penalty to start with: all commands and queries have to be generated by the library, and that generation is not free.

However, that is a minuscule impact, and the latest versions of Microsoft Entity Framework try to reduce this overhead with the concept of so-called "compiled queries", where the translation penalty is paid once up front instead of on every execution. [source].
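
For example, a query can be compiled once and reused, so the translation cost is not paid on every call. A minimal sketch, assuming a hypothetical FilmContext with a Films DbSet:

private static readonly Func<FilmContext, int, Film> GetFilmById =
    EF.CompileQuery((FilmContext db, int id) =>
        db.Films.Single(f => f.FilmId == id));

// later, per request: no query translation, just execution
var film = GetFilmById(dbContext, 999);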

I already described above how it is indeed possible to have a situation where a command or query cannot be translated to SQL at all, and therefore the only way to implement the desired feature is to download a portion of the data to the database client and perform the desired operation there (on the client).

A good example of how that reality has traditionally impacted performance is the implementation of the so-called active record pattern and the update or delete operations using that pattern.

In the Wikipedia article (source) this criticism isn't even mentioned, but here is what happens:

  • Download the desired record and map the result into the memory instance.
  • Execute an update or delete command for that record.

So, there are two commands in two steps - the first is a select, and the second is an update or delete - which will issue two database calls over the network (paying latency and bandwidth twice) and will also scan the same table twice.

As opposed to simple commands:

  • update my_table set value = @new where id = @id
  • delete from my_table where id = @id

One command, one database call and one table scan. No download of anything.

Again, the latest versions of Microsoft Entity Framework are trying to solve this overhead with the new APIs. (source).

Well, it took them only 15 years from the first version; better late than never, I suppose. And Microsoft Entity Framework is the only full-blown ORM I've heard of that has even addressed this issue, but someone may correct me on this.
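
As a rough sketch of the difference (assuming a hypothetical EF Core context with a Films DbSet), the classic active record style pays for two round trips, while the newer ExecuteUpdate/ExecuteDelete APIs issue a single statement:

// Active record style: SELECT first, then UPDATE - two database calls
var film = await db.Films.SingleAsync(f => f.FilmId == id);
film.RentalRate = newRate;
await db.SaveChangesAsync();

// EF Core 7+ style: one UPDATE statement, nothing downloaded
await db.Films
    .Where(f => f.FilmId == id)
    .ExecuteUpdateAsync(s => s.SetProperty(f => f.RentalRate, newRate));

// and the single-statement delete
await db.Films
    .Where(f => f.FilmId == id)
    .ExecuteDeleteAsync();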

Fine, Microsoft Entity Framework has got this active record performance thing covered. But, how about...

  • How about updating or deleting records in a table as a result of another query?
  • How about updating or deleting records in a table as a result of a hierarchical query on the same table to avoid reference errors in the self-referencing table?
  • How about different window analytical functions?
  • How about grouping sets, data rollups and cubes with totals and subtotals?
  • How about lateral joins where you can join on a subquery?
  • How about distinct on a specific set of expressions?
  • How about filtering on an aggregate function?

To be fair, many of these concepts are analytical functionalities, and that doesn't mean that new APIs won't be introduced in the future to address them. After all, it only took 15 years to properly address the active record performance issue.

In the meantime, in order to avoid raw SQL (which is what many developers tend to do), the only way to implement these features is to download a portion of the data (often a large portion) and perform the processing outside the database.
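
Yet the raw SQL being avoided is often short. For example, "the most expensive film per release year" is a few lines with a window function; a minimal sketch using a Micro ORM-style Read<Film> call like the one in the first example (the exact mapping API depends on the library):

var topPerYear = connection
    .Read<Film>(@"
        select film_id, title, release_year, rental_rate
        from (
            select
                film_id, title, release_year, rental_rate,
                row_number() over (partition by release_year order by rental_rate desc) as rn
            from film
        ) ranked
        where rn = 1")
    .ToList();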

But wait, that is not all.

Modern SQL systems are very much capable of executing analytical scripts and returning some analytical results.

Complex data processing often requires working in steps and saving results in intermediary stores, such as temporary tables and table variables, for future steps. A good example would be a hypothetical scenario like this:

  1. First step - fetch the data from tables and apply filtering.
  2. Save results into the intermediary store for later use.
  3. Use that intermediary store to calculate some analytics and statistics.
  4. Use that intermediary store to generate a data view, like a page for example.
  5. Return both datasets together.

All of this is possible with modern SQL, and yes, temporary tables are a) very fast and cheap and can be used like variables, and b) part of the international SQL ISO standard (source).

I cannot even begin to imagine how some ORM would even try to implement something like this. Theoretically, it could be done in 5 separate steps, with 5 subsequent database calls, while the script remains one single database call (consisting of multiple steps), and it will always be faster, no matter what.
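
As a rough sketch of such a script (T-SQL-style temporary table; the sale table and its columns are hypothetical), both result sets come back from a single database call by walking the same reader:

const string script = @"
    -- steps 1 and 2: filter and save into an intermediary store
    select sale_id, customer_id, amount, sold_at
    into #filtered
    from sale
    where sold_at >= @from;

    -- step 3: analytics over the intermediary store
    select count(*) as total, sum(amount) as revenue from #filtered;

    -- step 4: one page of data from the same store
    select sale_id, customer_id, amount
    from #filtered
    order by sold_at desc
    offset @skip rows fetch next @take rows only;";

using var cmd = connection.CreateCommand();
cmd.CommandText = script;
// ... add the @from, @skip and @take parameters here ...
using var reader = cmd.ExecuteReader();

reader.Read();
var (total, revenue) = (reader.GetInt32(0), reader.GetDecimal(1));

reader.NextResult();    // step 5: the second result set, still the same single database call
while (reader.Read())
{
    // map the page rows here
}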

2) Human factor

Besides all of these technical limitations mentioned above, there is always, always a human factor.

For instance, the N+1 antipattern is a well-documented and dreaded problem that can dramatically impact performance. Essentially, it is a situation where the ORM library (or the developer) executes a select query and then, for each record in the results, creates and executes another one. That makes N+1 queries in total (instead of just one).

If you look it up, there are countless blogs and articles on how to avoid N+1 pitfalls and errors with Microsoft Entity Framework, for example. So it's safe to say it's a common problem, sometimes unintentional and sometimes the result of a lack of experience.

It appears that ORMs make CRUD operations painfully simple and easy, but they also make the N+1 bottleneck simple and easy, which is even more painful. Literally.
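
A minimal sketch of the pattern, assuming a hypothetical EF Core context with Films and Languages sets and a Language navigation property:

// N+1: one query for the films, then one more query per film
foreach (var film in db.Films.ToList())                                       // 1 query
{
    var language = db.Languages.Single(l => l.LanguageId == film.LanguageId); // +N queries
    Console.WriteLine($"{film.Title}: {language.Name}");
}

// One round trip: let the database join, and project only what is needed
var rows = db.Films
    .Select(f => new { f.Title, Language = f.Language.Name })  // translated to a single SQL join
    .ToList();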

Another common issue is so-called over-selecting. Instead of returning exactly what your program needs, the tendency is to return the entire width of the records, which results in downloading many times more data from the database server to the database client. Yes, every time you do a select from your program and fetch some data, you are downloading from the database server (except when using SQLite, of course). So instead of downloading a couple of kilobytes, the program downloads a couple of megabytes; instead of a couple of megabytes, a couple of gigabytes... See the issue?
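
A sketch of the difference, again assuming a hypothetical EF Core Films set:

// Over-selecting: every column of every film is downloaded, then mostly thrown away
var titles = db.Films.ToList().Select(f => f.Title).ToList();

// Projection: only the title column travels over the wire
var titlesOnly = db.Films.Select(f => f.Title).ToList();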

In my experience, these mistakes happen quite often, even to developers with decades of experience, because they are easy to make. Many developers also tend to treat the database as just another in-memory construct, which leads to serious performance degradation.

But, is it all really necessary?

So, the Micro ORM approach is necessary after all, as we can see.

But is the library itself necessary?

Although I developed Norm MicroORM for my project needs, I must admit I rarely ever use it.

Why?

This may be a personal preference, but manual mapping isn't that hard at all, at least in my experience.

There are a couple of useful extensions I usually use, and it may look as simple as this:

using System;
using System.Collections.Generic;
using System.Data.Common;

public static class Extensions
{
    private static void AddParameters(DbCommand cmd, object parameters)
    {
        foreach (var prop in parameters.GetType().GetProperties())
        {
            var param = cmd.CreateParameter();
            param.ParameterName = prop.Name;
            param.Value = prop.GetValue(parameters) ?? DBNull.Value;
            cmd.Parameters.Add(param);
        }
    }
    
    // single values into 4-tuples
    public static IEnumerable<(T1, T2, T3, T4)> Read<T1, T2, T3, T4>(
        this DbConnection connection, 
        string command, 
        object parameters = null)
    {
        using var cmd = connection.CreateCommand();
        cmd.CommandText = command;
        if (parameters != null)
        {
            AddParameters(cmd, parameters);
        }
        using var reader = cmd.ExecuteReader();
        while (reader.Read())
        {
            yield return (
                reader.GetFieldValue<T1>(0),
                reader.GetFieldValue<T2>(1),
                reader.GetFieldValue<T3>(2),
                reader.GetFieldValue<T4>(3));
        }
    }

    // custom reader callback mapping into instances of T
    public static IEnumerable<T> Read<T>(
        this DbConnection connection, 
        string command, 
        Func<DbDataReader, T> callback, 
        object parameters = null)
    {
        using var cmd = connection.CreateCommand();
        cmd.CommandText = command;
        if (parameters != null)
        {
            AddParameters(cmd, parameters);
        }
        using var reader = cmd.ExecuteReader();
        while (reader.Read())
        {
            yield return callback(reader);
        }
    }

    // etc
}

This may vary depending on project needs, but typical usage may look something like this:

public class Film
{
    public int FilmId { get; set; }
    public string Title { get; set; }
    public int ReleaseYear { get; set; }
    public decimal RentalRate { get; set; }
}

var query = @"
    select 
            film_id, title, release_year, rental_rate 
        from
            film
        where
            film_id = @id";

var film1 = connection.Read<int, string, int, decimal>(query, new { id = 999 })
    .Select(t => new Film
    {
        FilmId = t.Item1,
        Title = t.Item2,
        ReleaseYear = t.Item3,
        RentalRate = t.Item4
    })
    .Single();

var film2 = connection.Read<Film>(query, r => new Film
        {
            FilmId = (int)r["film_id"],
            Title = (string)r["title"],
            ReleaseYear = (int)r["release_year"],
            RentalRate = (decimal)r["rental_rate"]
        },
        new { id = 999 })
    .Single();

To be clear, the code above is just an example. Real-world code could include more overloads, perhaps execute methods, some null checking, etc., but the principle is the same.
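
For instance, an execute method could be a sketch as simple as this, added to the same Extensions class shown above:

public static int Execute(
    this DbConnection connection, 
    string command, 
    object parameters = null)
{
    using var cmd = connection.CreateCommand();
    cmd.CommandText = command;
    if (parameters != null)
    {
        AddParameters(cmd, parameters);
    }
    return cmd.ExecuteNonQuery();
}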

And, yes, this is the manual mapping. And yes, this is my preference these days.

With modern tools such as Copilot, it is absurdly simple and easy to write and maintain.

If that is not your cup of tea, take a look at the performance benchmarks on this website comparing various Micro ORM libraries and different approaches.

The question is: are Micro ORMs even necessary? Perhaps not. All I can say is that I prefer the approach described above.

Conclusion

Full-blown, two-way mapping ORM libraries, like the Microsoft Entity Framework, indeed do have their distinct and important advantages:

Advantage 1: Strongly Typed Schema

Since data models are usually declared as classes in a full-blown ORM, and if that language is strongly typed (not all are), all queries and commands in that language will naturally also be strongly typed.

That means that on every schema change, the code also has to be adjusted in order to build, which means errors caused by schema changes are caught early and refactoring is simplified.

This can be pretty significant, but the small downside is that two schemas (one in your program and the other in the database) have to be maintained (usually through automatic migrations).

Advantage 2: Simplified Database Provider Change

Full-blown ORMs should abstract away the differences between the SQL dialects of different vendors, which should enable easy switching between providers.

In theory, of course. In practice, this is a very rare event, and it is never easy or simple.


So, in conclusion, SQL is a strange language: very easy to get started with, but it takes decades, even a lifetime, to master. I believe it is very important to master your tools, and then select the right tool for the job.

I hope I managed to explain why Micro ORMs exist, what they are used for, and what some of the alternative approaches are, along with their pros and cons.

Thank you for reading.

To receive notifications about new posts and updates, consider subscribing to my LinkedIn page:
vb-software linkedin

You will receive notifications about new posts on your LinkedIn feed.