Multi-Column Index

👨‍💼 As we've accumulated more and more users who have more and more notes, we've found that our query for the user list is getting slower and slower and using more and more memory and CPU. We need to optimize this query.
I need you to do some analysis on the query and determine what indexes we need to add to optimize it.
🐨 Before you go straight to making the one line code change in , please read through these instructions to do some analysis.
If you'd like to observe the performance issue, you can update the seed script to generate more users and more notes per user. This could make the seed script take a very very long time (several minutes). You should notice major slow downs by having 15000 users each with 200-300 notes. You can also comment out the images bit if you want it to run faster since images shouldn't affect this. If you do this, a query with no search term should take ~8 seconds.

Identify the problem query

If you run the query by opening , you should see the query that was executed in the terminal output. It should log something like this:
SELECT user.id, user.username, user.name, image.id AS imageId
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
WHERE user.username LIKE ?
OR user.name LIKE ?
ORDER BY (
        SELECT updatedAt
        FROM Note
        WHERE ownerId = user.id
        ORDER BY updatedAt DESC
        LIMIT 1
) DESC
LIMIT 50;
If you seeded your database with lots of data, you should also notice it took a long time for your data to come back. Yikes!

Analyze the problem query

The ? symbols in that query represent interpolated values. To use this query in the sqlite EXPLAIN QUERY PLAN command, we need to replace those with real values. So, let's run this query with some values:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name, image.id AS imageId
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
WHERE user.username LIKE "%kody%"
OR user.name LIKE "%kody%"
ORDER BY (
        SELECT updatedAt
        FROM Note
        WHERE ownerId = user.id
        ORDER BY updatedAt DESC
        LIMIT 1
) DESC
LIMIT 50;
That gives us this output:
QUERY PLAN
|--SCAN user
|--SEARCH image USING INDEX UserImage_userId_key (userId=?) LEFT-JOIN
|--CORRELATED SCALAR SUBQUERY 1
|  |--SEARCH Note USING INDEX Note_ownerId_idx (ownerId=?)
|  `--USE TEMP B-TREE FOR ORDER BY
`--USE TEMP B-TREE FOR ORDER BY
Huh... Yeah, there are a couple things going on in there. Let's build up the query a bit at a time so we can talk about each bit.
🐨 Let's start with this:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name
FROM User AS user
LIMIT 50;
That should give you this:
QUERY PLAN
`--SCAN user
Remember what a SCAN with no index means? It says "read every record." But it's only happening because there's no WHERE or ORDER BY clause.
🐨 Add an ORDER BY clause:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name
FROM User AS user
ORDER BY user.username
LIMIT 50;
QUERY PLAN
`--SCAN user USING INDEX User_username_key
Great, we're using an index, so that'll be much more efficient.
🐨 Let's try it out with the name column (which is not indexed):
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name
FROM User AS user
ORDER BY user.name
LIMIT 50;
This gives us:
QUERY PLAN
|--SCAN user
`--USE TEMP B-TREE FOR ORDER BY
So here's something new. Because the user.name column is not indexed, the database has to scan the user without an index, but in order to sort the data by user.name, it has to store all the user's names in a temporary "B-TREE" data structure so it can do the sorting for the order by. This is going to eat up some memory for the space for the data structure, and CPU for performing the comparison.
But we're getting side-tracked. We're not ordering by the user's name. Let's get back on track.
🐨 We'll add back the ORDER BY user.username for a second so we can see what the LEFT JOIN does to our query:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
ORDER BY user.username
LIMIT 50;
And that gives us:
QUERY PLAN
`--SCAN user USING INDEX User_username_key
At first this may surprise you, but the optimizer has determined that it can skip the LEFT JOIN entirely because we're not referencing it anywhere else in the query.
🐨 Add image.id in the select
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name, image.id as imageId
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
ORDER BY user.username
LIMIT 50;
That will give us:
QUERY PLAN
|--SCAN user USING INDEX User_username_key
`--SEARCH image USING INDEX UserImage_userId_key (userId=?) LEFT-JOIN
Super, with that we get a "SEARCH" using an index on the foreign key, so that should be quick enough. This is because the ON predicate is on the indexed column.
🐨 Alrighty, let's add the WHERE clause with the LIKE in:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name, image.id as imageId
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
WHERE user.username LIKE '%kody%'
ORDER BY user.username
LIMIT 50;
QUERY PLAN
|--SCAN user USING INDEX User_username_key
`--SEARCH image USING INDEX UserImage_userId_key (userId=?) LEFT-JOIN
Whoops! It is using an index for the ORDER BY, but it's not telling us that it's likely not using an index for the LIKE (according to the rules).
🐨 Let's add the OR for the user.name:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name, image.id as imageId
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
WHERE user.username LIKE '%kody%'
OR user.name LIKE '%kody%'
ORDER BY user.username
LIMIT 50;
QUERY PLAN
|--SCAN user USING INDEX User_username_key
`--SEARCH image USING INDEX UserImage_userId_key (userId=?) LEFT-JOIN
It's still leveraging the index for the ORDER BY, but it's not telling us that it's likely not using an index for the LIKE.
🐨 Before we put the subquery in the ORDER BY, let's look at the subquery on its own:
EXPLAIN QUERY PLAN
SELECT updatedAt
FROM Note
WHERE ownerId = "some_id"
ORDER BY updatedAt DESC
LIMIT 1;
QUERY PLAN
|--SEARCH Note USING INDEX Note_ownerId_idx (ownerId=?)
`--USE TEMP B-TREE FOR ORDER BY
Here we are again with the TEMP B-TREE for the ORDER BY. This is because the updatedAt column is not indexed. So for every note for the user, it has to store the updatedAt in a temporary data structure so it can sort it to find the one that was most recently updated. Definitely something fishy here...
🐨 Now, let's put that query as a subquery in the ORDER BY:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name, image.id as imageId
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
WHERE user.username LIKE '%kody%'
OR user.name LIKE '%kody%'
ORDER BY (
  SELECT updatedAt
  FROM Note
  WHERE ownerId = user.id
  ORDER BY updatedAt DESC
  LIMIT 1
) DESC
LIMIT 50;
QUERY PLAN
|--SCAN user
|--SEARCH image USING INDEX UserImage_userId_key (userId=?) LEFT-JOIN
|--CORRELATED SCALAR SUBQUERY 1
|  |--SEARCH Note USING INDEX Note_ownerId_idx (ownerId=?)
|  `--USE TEMP B-TREE FOR ORDER BY
`--USE TEMP B-TREE FOR ORDER BY
Now EXPLAIN is (finally) showing that we're scanning because there's no index in use on the user table (it was scanning the whole time, silly). But, it's now also showing a TEMP B-TREE for the ORDER BY, and because that's a subquery, that will happen for every user 😱. Very very bad.
We need to optimize the ORDER BY sub query, and then the whole query should run much faster. Because that query is against the Note table, the index we need will go on the Note model.

Identify the index

Remember the photo album metaphor? In that example, the first folder you want is the one where you're trying to filter things. That's often the column most frequently used in your queries. While individual indexes can certainly be beneficial for corresponding single-column queries, for multi-column queries, a well-designed composite index is often more efficient.
Generally, in a composite (multi-column) index, start with the column that is most frequently used in your WHERE clause and can be most utilized, then add frequently ORDER BY columns to the end of the index.
This often means you start with "bigger buckets" and get more specific as you go.
While these rules are not hard and fast, they are a good starting point. You should always test your queries to see if they're using the indexes you expect them to use and if they're performing well.
So in our case, we look at the WHERE and then the ORDER BY to determine our indexes. We're referencing the updatedAt in the ORDER BY and the ownerId in the WHERE. We need to combine these columns in a single index to optimize this query.
🐨 So, let's sort first by the ownerId and then by the updatedAt.
🐨 and add an index for the ownerId and updatedAt.
🐨 Now run:
npx prisma db push
Remember, this does not update your migration file. You'll want to do that when you're ready to commit to this. We're just testing things out for now.
If you'd like, you can execute this command in SQLite and it will show you all the indexes active in the database:
.indexes
That should give you something like this:
NoteImage_noteId_idx                   sqlite_autoindex_NoteImage_1
Note_ownerId_idx                       sqlite_autoindex_Note_1
Note_ownerId_updatedAt_idx             sqlite_autoindex_UserImage_1
UserImage_userId_key                   sqlite_autoindex_User_1
User_email_key                         sqlite_autoindex__prisma_migrations_1
User_username_key
Our new one is the one called Note_ownerId_updatedAt_idx!
🐨 Now, let's run the query plan again:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name, image.id as imageId
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
WHERE user.username LIKE '%kody%'
OR user.name LIKE '%kody%'
ORDER BY (
   SELECT updatedAt
   FROM Note
   WHERE ownerId = user.id
   ORDER BY updatedAt DESC
   LIMIT 1
) DESC
LIMIT 50;
QUERY PLAN
|--SCAN user
|--SEARCH image USING INDEX UserImage_userId_key (userId=?) LEFT-JOIN
|--CORRELATED SCALAR SUBQUERY 1
|  `--SEARCH Note USING COVERING INDEX Note_ownerId_updatedAt_idx (ownerId=?)
`--USE TEMP B-TREE FOR ORDER BY
Sweet! We've just eliminated the TEMP B-TREE on the Note table! And now our Note Search is using our new index (learn more about what "covering index means" below). This is a huge win because the query doesn't have to read every note for every user!
Note we do still have a non-indexed scan for the user, but based on the requirements of a wildcard prefix in the LIKE, an index won't help anyway.
Incidental Covering Index
If you're curious, we just happen to have what's known as a "covering index" here, which means SQLite doesn't even have to read any records from the Note table. It can just read the index and get the data it needs. This is because all the columns used by the query are covered by the index and is highly efficient. You can learn more about this from the SQLite Query Planner docs
🐨 Now go ahead and try searching users again:
Checking our logs, we're getting about 30ms per query when there's no search term, and just a couple milliseconds when there is a search term. Going from 8 seconds to 30ms is over 250x faster! Nice!
Login to get access to the exclusive discord channel.
  • 💾data
    How often do you need to run a migration?
    Scott 🌌:
    I just finished the Data Migrations workshop (I come from a No-SQL background). Let's say I add a n...
    • 1
    1 · 3 days ago
  • general
    npm install everytime I setup a new playground
    Duki 🌌:
    Is it normal that I have to run `npm install` in my playground directory, everytime I setup the play...
    • 1
    2 · 14 days ago
  • general
    Welcome to EpicWeb.dev! Say Hello 👋
    Kent C. Dodds ◆ 🚀🏆🌌:
    This is the first post of many hopefully!
    • 18
    80 · 6 days ago
  • 💾data
    Code snippets in workshop app
    remich 🌌:
    What is used to include the code snippets in the workshop app? I'd like to have syntax-highlighted P...
    • 1
    1 · a month ago
  • 💾data
    a very large insert
    osmancakir 🚀 🌌:
    hi all, I need to perform seeding to my deployment. I followed the steps in the database documentati...
    • 1
    2 · 6 months ago
  • 💾data
    📝forms
    🔭foundations
    Reviewing foundations, Mutations, Actions
    silvanet 🚀 🌌:
    Forgive me for this. I went over the file size limit. I don't want to sign up for being able to exce...
    • 1
    2 · 6 months ago
  • general
    Migration to Vite: Server-only module referenced by client
    Fabian 🌌:
    Hi, I'm working on migrating to Vite following the remix docs (https://remix.run/docs/en/main/guides...
    • 1
    1 · 3 months ago
  • 💾data
    Undesired behaviors while running the data-modeling workshop
    juliano.brasil 🌌:
    After pulling the latest updates I started experiencing 2 undesired behaviors while running the wor...
    • 1
    2 · 5 months ago
  • 💾data
    📝forms
    Getting a TS error that is not present in the course files
    OtterlyPunk:
    So I'm working in parallel and I'm feeling the problem is I'm using a new version of something in my...
    • 2
    12 · 8 months ago
  • general
    Remix Vite Plugin
    Binalfew 🚀 🌌:
    <@105755735731781632> Now that remix officially supports vite (though not stable) what does it mean...
    • 1
    3 · a year ago
  • 💾data
    Prisma Typed SQL
    Kent C. Dodds ◆ 🚀🏆🌌:
    Hey <@&1123674184959148043> friends, check out the new Typed SQL feature in Prisma! I've added an ex...
    • 2
    • 2
    0 · 3 months ago
  • general
    🔭foundations
    Solutions video on localhost:5639 ?
    quang 🚀 🌌:
    Hi, so I'm having a hard time navigating (hopefully will be better with time) The nav on epicweb.de...
    • 1
    9 · a year ago
  • 💾data
    Multi-Column Index - How does it eliminate the B-tree?
    Darren 🚀 🌌:
    So I just finished the Query Optimization - 02 Multi-Column Index lesson and was confused by the fol...
    • 1
    2 · 4 months ago
  • general
    Epicshop is now social and mobile friendly!
    Kent C. Dodds ◆ 🚀🏆🌌:
    I'm excited to announce that now the Epic Web workshops are mobile friendly! https://foundations.ep...
    • 🎉2
    0 · 5 months ago
  • 💾data
    📝forms
    🔭foundations
    How can I do this?
    silvanet 🚀 🌌:
    Viewing the Intro (from the Workshop) for Mutations, the course has an embedded video where Kent exp...
    • 1
    3 · 6 months ago
  • 💾data
    Query optimization for queries generated by prisma
    Allan 🌌:
    Hi everyone! I finished the data modeling workshop this week and was inspired to do some query optim...
    • 1
    2 · 6 months ago
  • 💾data
    general
    📝forms
    🔭foundations
    double underscore?
    trendaaang 🌌:
    What with the `__note-editor.tsx`? I don't see that in the Remix docs and I don't remember Kent talk...
    • 1
    2 · 6 months ago
  • 🔐auth
    💾data
    08. ROLE-BASED ACCESS / 02. ROLES SEED - migration to local vs production
    Fabian 🌌:
    So I'm in a bit over my head with this one, in particular with how migration works in a local env vs...
    • 1
    1 · 6 months ago
  • 💾data
    data modeling 06/02 is missing part of the prompt
    Allan 🌌:
    In Data Modeling 06/02 the files include `server/index.ts` but there are no guiding emojis, no promp...
    • 1
    2 · 6 months ago
  • 💾data
    Why does TS not like Buffer being used when the return type of the blob appears to be Buffer?
    OtterlyPunk:
    I'm looking at the return types from Prisma and it appears that the `petImage.blob` is a `Buffer` ty...
    • 2
    5 · 7 months ago
  • 💾data
    🔐auth
    Prisma batching in the Require Authorized (07/03/solution)
    ajara 🌌 🚀:
    When I did the `requireUser` function in `auth.server.ts` I thought about using `requireUserId` func...
    • 1
    4 · 7 months ago
  • 💾data
    🔭foundations
    Handle Missing Data - Error when i'm using invariantResponse
    Luan.ibarra 🌌:
    Hello, I'm trying to use the invariantResponse utility from // "#app/utils/misc.ts" to do this in a...
    • 2
    2 · 7 months ago
  • 💾data
    Data Modeling: a real world scenario
    trendaaang 🌌:
    Kent, In the data modeling workshop, you mentioned that polymorphism is generally a bad idea in dat...
    • 1
    3 · 7 months ago
  • 🔭foundations
    💾data
    general
    📝forms
    🔐auth
    Native Logging
    trendaaang 🌌:
    I was thinking that it could be useful to log every CRUD operation to help track down errors. Is tha...
    • 1
    6 · 7 months ago
  • 💾data
    📝forms
    Issues with child routes when loading a modal with actions (server/client)
    OtterlyPunk:
    I've set up a mini-project to go through the first set of videos and I'm having issues with the form...
    • 1
    7 · 8 months ago
  • general
    The video play is pretty laggy currently
    QzCurious 🌌:
    I thought I should tag you for this <@105755735731781632>. Please take a look if something wrong.
    • 2
    9 · 8 months ago
  • general
    New Workshop Scheduled
    Kent C. Dodds ◆ 🚀🏆🌌:
    Hey Epic Web devs! I wanted to let you know before everyone else on here: https://www.epicweb.dev/ev...
    • 2
    0 · 8 months ago