How DDD is screwing up your SQL

This morning, I came across an interesting article about DDD and SQL on the internet named How SQL is screwing up your DDD. Normally I would post this article on my LinkedIn page with my own thoughts, but the subject matter is way too interesting, so, I'll just make my own blogpost.

The main premise of the article is an exercise, a code kata, in data modeling. Quote:

The goal is to model a collection of value object items, where the collection maintains a ā€œdefaultā€ item. It forces you to consider things like

How do you handle selecting which item is the default? How do you avoid the possibility of the collection ending up with no default item? How do you handle the case where the default item is removed from the collection? How do you avoid the possibility of the collection having no items at all?

A hypothetical example given is the Customer entity, which possesses a collection of Address.

It appears that the ideal domain model in a domain-driven world for this problem would be something like this:

public class Address
{
  public int Id { get; private set; }
  public string Street { get; private set; }
  public string City { get; private set; }
  
  public int CustomerId { get; private set; }
  public Customer Customer { get; private set; }

  // Other properties and methods omitted for brevity
}

public class Customer
{
  public int Id { get; private set; }

  private readonly List<Address> _addresses = new List<Address>();
  public IEnumerable<Address> Addresses => _addresses.AsEnumerable();

  public Address DefaultAddress { get; set; }

  // Other properties and methods omitted for brevity
}
public class Address
{
  public int Id { get; private set; }
  public string Street { get; private set; }
  public string City { get; private set; }
  
  public int CustomerId { get; private set; }
  public Customer Customer { get; private set; }

  // Other properties and methods omitted for brevity
}

public class Customer
{
  public int Id { get; private set; }

  private readonly List<Address> _addresses = new List<Address>();
  public IEnumerable<Address> Addresses => _addresses.AsEnumerable();

  public Address DefaultAddress { get; set; }

  // Other properties and methods omitted for brevity
}

However, the article then proceeds to discuss persistence (to the database) and, because of that fact this solution is somehow "problematic", and just because of the persistence and databases, a completely different model is then needed:

public class Address : ValueObject
{
    public string Street { get; private set; }
    public string City { get; private set; }

    // Other properties and methods omitted for brevity
}

public class CustomerAddress
{
  public int Id { get; private set; }
  public Address Details { get; private set; }

  public int CustomerId { get; private set; }
  public Customer Customer { get; private set; }

  // Other properties and methods omitted for brevity
}

public class Customer
{
  public int Id { get; private set; }

  private readonly List<CustomerAddress> _addresses = new List<CustomerAddress>();
  public IEnumerable<CustomerAddress> Addresses => _addresses.AsEnumerable();

  // This is owned type, not a navigation.
  public Address DefaultAddress { get; private set; } = Address.Empty;

  // Other properties and methods omitted for brevity
}
public class Address : ValueObject
{
    public string Street { get; private set; }
    public string City { get; private set; }

    // Other properties and methods omitted for brevity
}

public class CustomerAddress
{
  public int Id { get; private set; }
  public Address Details { get; private set; }

  public int CustomerId { get; private set; }
  public Customer Customer { get; private set; }

  // Other properties and methods omitted for brevity
}

public class Customer
{
  public int Id { get; private set; }

  private readonly List<CustomerAddress> _addresses = new List<CustomerAddress>();
  public IEnumerable<CustomerAddress> Addresses => _addresses.AsEnumerable();

  // This is owned type, not a navigation.
  public Address DefaultAddress { get; private set; } = Address.Empty;

  // Other properties and methods omitted for brevity
}

Quote:

The solution is to create a wrapping entity class around the value object item that would link the items of the collection, to the collection. Why do we need that extra layer? Iā€™ll tell you why: Relational databases.

That extra layer is, actually, a classic many-to-many relationship, which is a way of saying that any number of customers can have any number of addresses and vice-versa. As we can see Entity Framework ORM supports that type of modeling with that extra layer, but newer versions even support omitting that "extra layer" without navigation, by simply defining something like this:

public class Address
{
  // ...

    public List<Customer> Customers { get; } = [];
}

public class Customer
{
  // ...

  public List<Address> Addresses { get; } = [];
}

// ...

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Customer>()
        .HasMany(e => e.Addresses)
        .WithMany(e => e.Customers);
}
public class Address
{
  // ...

    public List<Customer> Customers { get; } = [];
}

public class Customer
{
  // ...

  public List<Address> Addresses { get; } = [];
}

// ...

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Customer>()
        .HasMany(e => e.Addresses)
        .WithMany(e => e.Customers);
}

That "extra layer" is sometimes referred to as the "junction table" or "navigation table", and is what many DDD proponents consider to be a part of the so-called "impedance mismatch", meaning, it is not part of the "domain model" in any shape or form, and it is part of the "persistence to database". The database thing, if you will, and as such shouldn't be present in a domain model and it's not part of domain modeling.

The article then continues to express disgust with that "extra layer":

But a part of me canā€™t help being a little ā€œgrossedā€ out by this.

Because "Within the DDD corpus, you are generally encouraged to think about the domain first..." and that "extra layer" thing is, obviously, a part of "persistence" and "the ideal DDD model, in code, is quite different than the ideal relational database model".

The article then finishes with the following questions:

Are we forever doomed to bastardize our nice, simple, DDD models, just so we can persist them in relational databases?

Are we?

Is that "extra layer" really necessary? Does it really, mess up the domain purity?

Let's discuss.

Domain Model

Let's assume this is the ideal domain model as the article suggests (no "extra layers"):

public class Address
{
  public int Id { get; private set; }
  public string Street { get; private set; }
  public string City { get; private set; }
  
  public int CustomerId { get; private set; }
  public Customer Customer { get; private set; }

  // Other properties and methods omitted for brevity
}

public class Customer
{
  public int Id { get; private set; }

  private readonly List<Address> _addresses = new List<Address>();
  public IEnumerable<Address> Addresses => _addresses.AsEnumerable();

  public Address DefaultAddress { get; set; }

  // Other properties and methods omitted for brevity
}
public class Address
{
  public int Id { get; private set; }
  public string Street { get; private set; }
  public string City { get; private set; }
  
  public int CustomerId { get; private set; }
  public Customer Customer { get; private set; }

  // Other properties and methods omitted for brevity
}

public class Customer
{
  public int Id { get; private set; }

  private readonly List<Address> _addresses = new List<Address>();
  public IEnumerable<Address> Addresses => _addresses.AsEnumerable();

  public Address DefaultAddress { get; set; }

  // Other properties and methods omitted for brevity
}

One of the requirements states:

How do you handle the case where the default item is removed from the collection?

The core assumption here is in this model, it should be impossible to remove the address from the customer address collection - if that address is referenced as the default address in that customer.

Why do we want to do that?

Well, obviously, that would consist of an illogical state of the model, and thus our model would be invalid (it supports illogical states). So, want to prevent that.

Fine, but the next question: what is really our collection of addresses here:

private readonly List<Address> _addresses = new List<Address>();
public IEnumerable<Address> Addresses => _addresses.AsEnumerable();
private readonly List<Address> _addresses = new List<Address>();
public IEnumerable<Address> Addresses => _addresses.AsEnumerable();

It's a generic list, obviously. What is the generic list? It's a basic memory structure.

This list, however, is exposed as enumerable which supports only iterations, but if we know its actual implementation is a list - we can still cast it as a list and remove any item we want.

Let's remove the default address:

var defaultAddr = customer.Addresses.Single(a => customer.DefaultAddress.Id == a.Id);
(customer.Addresses as List<Address>).Remove(defaultAddr);
var defaultAddr = customer.Addresses.Single(a => customer.DefaultAddress.Id == a.Id);
(customer.Addresses as List<Address>).Remove(defaultAddr);

In this case, nothing happens. We safely removed the address with the same ID as the default address, and the default address doesn't exist in the collection. But, the default address is still there because they are different memory object instances.

So, our domain model, which is basically just a memory data model - doesn't give us any guarantees against illogical states.

Let's take a closer look the the relational model now.

Relational Model

This is the relational model for the example above. Once the data model is understood it takes less than a few minutes to write it up.

begin;

create table addresses (
    address_id int generated always as identity primary key,
    street text not null,
    city text not null
);

create table customers (
    customer_id int generated always as identity primary key,
    name text not null,
    address_id int not null references addresses
);

create table customer_addresses (
    customer_id int not null references customers,
    address_id int not null references addresses,
    primary key (customer_id, address_id)
);

alter table customers add constraint fk_customer_addresses
foreign key (customer_id, address_id) 
references customer_addresses;

end;
begin;

create table addresses (
    address_id int generated always as identity primary key,
    street text not null,
    city text not null
);

create table customers (
    customer_id int generated always as identity primary key,
    name text not null,
    address_id int not null references addresses
);

create table customer_addresses (
    customer_id int not null references customers,
    address_id int not null references addresses,
    primary key (customer_id, address_id)
);

alter table customers add constraint fk_customer_addresses
foreign key (customer_id, address_id) 
references customer_addresses;

end;

What we see here is this:

  • Table with addresses.
  • Table with customers. References address as the default address.
  • Reference table between customers and addresses.
  • Many customers can have many addresses.
  • A customer must have unique addresses (primary key).
  • The default customer address must be one of the customer addresses (composite foreign key).

What we don't see here is this:

  • Files.
  • Directories.
  • Persistence.

The previous domain model was explicitly a memory model. It used explicitly memory device structures and concepts such as lists and objects . They don't have anything to do with the domain either.

And, finally, this relational model gives us guarantees that this data model will not have illogical states under any circumstances:

  • The default address must exist and must be one of the customer addresses.
  • Any attempt to remove an address from a customer which is the default address will be stopped in the tracks. It's impossible.
  • The illogical states are not permitted and data integrity is preserved. Forever. Deal with it.

So this relational model actually our real domain model?

By the definition of Wikipedia - in software engineering, a domain model is a conceptual model of the domain that incorporates both behavior and data.

This is no doubt a conceptual model, but it doesn't have any behavior. Not yet anyway, we will just have to add some.

Now first, the conceptual model, and this is undoubtedly a conceptual model - is by its nature a general-purpose model. It's just a concept of reality adopted for the computer representation. To add some behavior we will need some use cases - how we are going to use this model for example. This, of course, can be anything, from the reporting to displaying it on the user interface and filling up the forms for example.

But, for the purpose of this article, I will assume that the use case is to transform (project) into the ideal domain model described above and in the referenced article. And that's a big assumption, use cases aren't anything like that. In any case:

--
-- Returns a set of customers. Usage:
-- select get_customers(1) - returns customer id=1
-- select get_customers(null) - returns all customers
--
create function get_customers(_customer_id int)
returns setof record
language sql
as $$
select
    customer_id as id,
    name,
    address_id as default_address_id,
    street as default_street,
    city as default_city
from 
    customers
    join addresses using (address_id)
where 
    _customer_id is null or customer_id = _customer_id
$$;

--
-- Returns a set of adresses for the customer. Usage:
-- select get_customer_addresses(1) - returns adresses customer id=10
--
create function get_customer_addresses(_customer_id int)
returns setof record
language sql
as $$
select
    addresses.address_id,
    street,
    city,
    customers.customer_id is not null as is_default
from 
    addresses
    join customer_addresses using (address_id)
    left join customers using (address_id, customer_id)
where 
    customer_id = _customer_id
$$;

--
-- Returns json with all customers, all adresses and the customwer default address. Usage:
-- select get_all_customers_json()
--
create function get_all_customers_json()
returns json
language sql
as $$
select coalesce(json_agg(sub), '[]'::json)
from (
    select
        c.customer_id as id,
        c.name,
        json_build_object(
            'id', def.address_id,
            'street', def.street,
            'city', def.city
        ) as default_address,
        json_agg(addr.*) as addresses
    from 
        example.customers c
        join example.addresses def using (address_id)
        left join example.customer_addresses ca using (customer_id)
        left join example.addresses addr on ca.address_id = addr.address_id
    group by
        c.customer_id,
        c.name,
        def.address_id
) sub
$$;
--
-- Returns a set of customers. Usage:
-- select get_customers(1) - returns customer id=1
-- select get_customers(null) - returns all customers
--
create function get_customers(_customer_id int)
returns setof record
language sql
as $$
select
    customer_id as id,
    name,
    address_id as default_address_id,
    street as default_street,
    city as default_city
from 
    customers
    join addresses using (address_id)
where 
    _customer_id is null or customer_id = _customer_id
$$;

--
-- Returns a set of adresses for the customer. Usage:
-- select get_customer_addresses(1) - returns adresses customer id=10
--
create function get_customer_addresses(_customer_id int)
returns setof record
language sql
as $$
select
    addresses.address_id,
    street,
    city,
    customers.customer_id is not null as is_default
from 
    addresses
    join customer_addresses using (address_id)
    left join customers using (address_id, customer_id)
where 
    customer_id = _customer_id
$$;

--
-- Returns json with all customers, all adresses and the customwer default address. Usage:
-- select get_all_customers_json()
--
create function get_all_customers_json()
returns json
language sql
as $$
select coalesce(json_agg(sub), '[]'::json)
from (
    select
        c.customer_id as id,
        c.name,
        json_build_object(
            'id', def.address_id,
            'street', def.street,
            'city', def.city
        ) as default_address,
        json_agg(addr.*) as addresses
    from 
        example.customers c
        join example.addresses def using (address_id)
        left join example.customer_addresses ca using (customer_id)
        left join example.addresses addr on ca.address_id = addr.address_id
    group by
        c.customer_id,
        c.name,
        def.address_id
) sub
$$;

Alright, that's a lot of behavior. Note that the last function get_all_customers_json returns the JSON with exactly the same ideal domain model described above and in the referenced article.

So, now, if we have:

  • Enforced data model integrity, the meaning we can't have illogical states.
  • The desired behavior is implemented and tested.

I have some questions now:

  • Is our relational model, actually, our real domain model?
  • I know that is not Object-oriented. But, does it have to be?
  • Is the object orientation among our requirements? Has it ever been?
  • Do users care about the object orientation?

Object-oriented is a good technique, but is one that is limited to computer memory only. SQL isn't. In fact, SQL is a higher data language that abstracts devices such as persistence devices and even computer memory. No such thing as files, devices, directories, memory, heap or stack. None of that. All that is left is your model and your logic. And a few indexes too, but nothing is perfect.

My main question is, do we need a use case functionality that actually maps to another model in memory?

Conclusion

What we call the relational model, DDD people usually refer as to the database model. But in fact, it is a conceptual model of reality as the logical model - adapted for database representation.

What DDD people call the domain model, is, in my opinion, the same thing but with one important difference: a conceptual model of reality as the logical model - adapted for the memory representation.

Modern software design tries to model both - create both models and then map the first to the other in an effort to solve all problems with the object-oriented approach. That is the root of all problems. Either don't use RDBMS or use RDBMS and stop trying to solve problems with the Object-Oriented approach.

On the other hand, most of modern web applications are, what call - the data-intense applications.

They don't actually solve any complicated problems per se. Well I guess, some of them do, but most of them deal with a lot of data in a relatively simple manner: show data on the grid, page through the grid, paint the dashboard, show some charts, fill up some forms, show data in different ways, etc.

Inserting another model between the relational model (which is needed to enforce some rules and integrity, as we have seen in this example) makes software development much harder than it realistically should be.

It took just a few minutes to create that relational model described above, and then an additional 5 to 10 minutes to add some behavior in the form of projections from the relational model (SQL functions) to get the results I want, or rather, that application needs. And I am an average SQL developer (who likes to blog apparently).

Just learn modern SQL. That is all.

To receive notifications about new posts and updates, consider subscribing to my LinkdIn page:
vb-software linkedin

You will receive notifications about new posts on your LinkedIn feed.
Comments