On an Ecto Preload Dilemma

In this post I argue that you should avoid Ecto.Repo.preload/3 where you can, and I propose an alternative built on secondary contexts and Ecto.Query.preload/3.

Here's an example of the type of code snippet I will be arguing for.

defmodule Twitter.Social.Posts do
  import Ecto.Query

  # Assumes the application's repo module is Twitter.Repo.
  alias Twitter.Repo
  alias Twitter.Social.Post

  def get!(id, preloads \\ []) do
    Post |> preload(^preloads) |> Repo.get!(id)
  end
end

In what follows, I assume we are working with a SQL DB, like PostgreSQL.

Ecto.Query and Ecto.Repo

The purpose of Ecto.Query is to construct queries. You could write SQL:

SELECT * FROM users;

Or you could use the Ecto.Query DSL to write SQL-in-Elixir:

# from/2 is a macro, so Ecto.Query must be required (or imported) first.
require Ecto.Query
Ecto.Query.from(User)

The purpose of Ecto.Repo is to execute queries, most often supplied by Ecto.Query. This requires knowledge of the database (adapter) in order to succeed, which is dependent on your application's configuration. Thus, Ecto.Repo's functionality consists almost entirely of callbacks that are implemented on your behalf in the module that calls use Ecto.Repo, which by default is something like MyApp.Repo, where MyApp is your application's name. This is perhaps a bit confusing, so just remember that Ecto.Repo is a glorified database client executing glorified SQL queries from Ecto.Query.

query = Ecto.Query.from(User)
MyApp.Repo.all(query)

Given the above, it is interesting to see that the two modules overlap: both Ecto.Repo.preload and Ecto.Query.preload exist. But consider the nature of preloading. It is concerned with fetching associated data and attaching it to the schema struct. You can achieve this with a single, sufficiently complex query that uses joins, or with multiple calls to the DB.

Recommendation

I recommend using Ecto.Query.preload over Ecto.Repo.preload. In particular, you should aim to use Ecto.Query.preload along with joins, since that can result in a single query being run against the DB. Let's look at a simple example with artists and albums.

The Schemas

defmodule Music.Artist do
  use Ecto.Schema

  schema "artists" do
    field :name, :string
  end
end

defmodule Music.Album do
  use Ecto.Schema
  alias Music.Artist

  schema "albums" do
    field :name, :string
    belongs_to :artist, Artist
  end
end

The Queries

Below is using Ecto.Repo.preload.

q = from a in Album
Repo.all(q) |> Repo.preload(:artist)

... QUERY OK ...
SELECT a0."id", a0."name", a0."artist_id" FROM "albums" AS a0 []

... QUERY OK ...
SELECT a0."id", a0."name", a0."id" FROM "artists" AS a0 WHERE (a0."id" = ANY($1)) [[4, 3]]

Below is using Ecto.Query.preload without a join. It issues the same two queries as above.

q = from a in Album, preload: :artist
Repo.all(q)

... QUERY OK ...
SELECT a0."id", a0."name", a0."artist_id" FROM "albums" AS a0 []

... QUERY OK ...
SELECT a0."id", a0."name", a0."id" FROM "artists" AS a0 WHERE (a0."id" = ANY($1)) [[4, 3]]

Below is using Ecto.Query.preload with joins. Notice how it is only a single query!

q = from a in Album, join: r in assoc(a, :artist), preload: [artist: r]
Repo.all(q)

... QUERY OK ...
SELECT a0."id", a0."name", a0."artist_id", a1."id", a1."name" FROM "albums" AS a0 INNER JOIN "artists" AS a1 ON a1."id" = a0."artist_id" []

With the above query, you need to manually associate the join with the preload, i.e., preload: [artist: r]. Otherwise Ecto will not know that you intend to load the joined association, and will leave it unloaded.
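To make that difference concrete, here is a minimal sketch contrasting a join without a preload binding and a join with one. It assumes the Music.Album schema from earlier; the module name Music.AlbumQueries and the repo name Music.Repo are hypothetical names I made up for illustration. The left join variant is an extra I added: an inner join drops albums whose artist_id is NULL, and a left join is one way to keep them.

```elixir
defmodule Music.AlbumQueries do
  # Hypothetical helper module, for illustration only.
  import Ecto.Query

  alias Music.Album

  # Joins artists but does not preload them: album.artist stays
  # %Ecto.Association.NotLoaded{} on every returned struct.
  def joined_only do
    from a in Album, join: r in assoc(a, :artist)
  end

  # Joins and preloads from the join binding: one SQL query, artist loaded.
  def joined_and_preloaded do
    from a in Album, join: r in assoc(a, :artist), preload: [artist: r]
  end

  # Same idea with a LEFT JOIN, so albums without an artist are kept
  # (their artist field is loaded as nil).
  def left_joined_and_preloaded do
    from a in Album, left_join: r in assoc(a, :artist), preload: [artist: r]
  end
end

# Usage, assuming a configured repo named Music.Repo (hypothetical):
#
#   albums = Music.Repo.all(Music.AlbumQueries.joined_only())
#   Ecto.assoc_loaded?(hd(albums).artist)
#
#   albums = Music.Repo.all(Music.AlbumQueries.joined_and_preloaded())
#   Ecto.assoc_loaded?(hd(albums).artist)
```

Each function only builds a query; nothing touches the database until the query is handed to the repo.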

Now that we've seen the upsides of using Ecto.Query.preload, let's look at the "downsides" of using Ecto.Repo.preload.

Two Repo.preload Issues

The first issue with Repo.preload is that you are forced to double-query the database. It's hitting the database once to fetch the primary schema, and then a second time to fetch the association (see the queries above for more). Query.preload without joins results in the same queries, but, with joins, you can be more efficient.

The second issue with Repo.preload is that, if you are using it, it means you've already fetched the primary data. Why didn't you fetch all the data you needed in the first query? Why can't you edit that original query, instead of tacking on a Repo.preload after the fact? There are good replies to these questions. But I suspect that, more often than not, Repo.preload is a "code smell", and you should consider a refactor instead of tacking on more Repo.preloads.

After all, Repo.preload has the dangerously convenient feature of skipping the query if the association is already loaded, which can lead to repeated calls to Repo.preload "just to be sure". If you find yourself using Repo.preload in multiple places in your codepaths, it might indicate that you lack confidence in the shape of your data as it moves through your codebase. Speaking for myself, I found that when I was using Repo.preload, I was essentially addressing the "symptom" of not having a particular association, instead of addressing the "illness" in my codebase of not having a good, systemic way to fetch the data I needed.
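The "skip if already loaded" behavior can be sketched as follows. This assumes the Music schemas from earlier and a repo named Music.Repo; both the repo name and the Music.PreloadDemo module are hypothetical. Ecto.assoc_loaded?/1 and the :force option of Repo.preload are part of Ecto's public API.

```elixir
defmodule Music.PreloadDemo do
  # Hypothetical module, for illustration only.
  # Music.Repo is an assumed repo name.
  alias Music.{Album, Repo}

  def demo(album_id) do
    album = Repo.get!(Album, album_id)

    # Freshly fetched: the association has not been loaded yet.
    false = Ecto.assoc_loaded?(album.artist)

    # The first call queries the artists table.
    album = Repo.preload(album, :artist)
    true = Ecto.assoc_loaded?(album.artist)

    # The second call sees the association is already loaded and issues no query.
    album = Repo.preload(album, :artist)

    # Passing force: true re-runs the query even though it is loaded.
    Repo.preload(album, :artist, force: true)
  end
end
```

The silent no-op on the second call is exactly what makes sprinkling Repo.preload around feel safe, and why it can hide the fact that nobody is sure what has been loaded.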

My Current Approach to Preloads

The first part of my approach is Devon Estes's concept of a secondary context. Without going into too much detail (I recommend reading his blog post), secondary contexts help isolate where preload functionality lives. Now, even if you have multiple preload calls, they live only within the secondary context file for a given schema, which makes later refactoring easier.

The second part of my approach relies on the fact that Ecto.Query supports two different syntaxes: the more common "keyword syntax", and the less common macro-based syntax that is designed for use with the pipe operator. The last example in the Ecto.Query.preload/3 documentation shows how you can chain a Query.preload/3 onto an otherwise ordinary Query or Repo call. The power of this is that you can construct a query that knows which associations it needs before you even know which row in the database you want to request.
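As a rough illustration of the two syntaxes, here is the same query written both ways. It reuses the Music.Album schema from earlier; the module name Music.Albums and both function names are hypothetical.

```elixir
defmodule Music.Albums do
  # Hypothetical secondary-context-style module, for illustration only.
  import Ecto.Query

  alias Music.Album

  # Keyword syntax: the whole query in one from/2 call.
  def by_name_keyword(name) do
    from a in Album, where: a.name == ^name, preload: :artist
  end

  # Pipe-based syntax: each macro takes a query and returns a new one,
  # so the preload can be attached before any Repo call is made.
  def by_name_piped(name) do
    Album
    |> where([a], a.name == ^name)
    |> preload(:artist)
  end
end
```

Both functions return an Ecto.Query struct that can be passed to a repo function such as Repo.all/1. The piped form is the one that composes neatly with a preloads argument.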

The final piece of the puzzle is picking which function in your secondary context to extend with preloads. I use get! below, but obviously this will be at your discretion. Here is the final result:

defmodule Twitter.Social.Posts do
  import Ecto.Query

  # Assumes the application's repo module is Twitter.Repo.
  alias Twitter.Repo
  alias Twitter.Social.Post

  def get!(id, preloads \\ []) do
    Post |> preload(^preloads) |> Repo.get!(id)
  end
end

Note that this function supports complex association requests given that it's simply a passthrough to the underlying preload API. Now you can write things like:

post = Posts.get!(3, comments: :author)
# Kernel.then/2 requires Elixir 1.12 or later.
post.comments |> Enum.at(0) |> then(& &1.author.name)
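The same passthrough works for other functions in the secondary context. As a sketch, here is a hypothetical list/1 next to get!/2. The list/1 function is my own illustration, and Twitter.Repo is an assumed repo name.

```elixir
defmodule Twitter.Social.Posts do
  import Ecto.Query

  # Assumed repo module name.
  alias Twitter.Repo
  alias Twitter.Social.Post

  # Fetches one post by id, preloading whatever the caller asks for.
  def get!(id, preloads \\ []) do
    Post |> preload(^preloads) |> Repo.get!(id)
  end

  # Hypothetical addition: fetches all posts with the same preload passthrough.
  def list(preloads \\ []) do
    Post |> preload(^preloads) |> Repo.all()
  end
end

# Usage:
#
#   Twitter.Social.Posts.list(comments: :author)
```

Keeping the preloads argument in the same position on every function means callers only have to learn the convention once.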

Conclusion

I wanted to share this approach because I thought it was an interesting solution to the preload dilemma I was experiencing. I hope this can help others that might be in a similar situation. Thanks for reading!