Multi-Column Index

πŸ‘¨β€πŸ’Ό As we've accumulated more and more users who have more and more notes, we've found that our query for the user list is getting slower and slower and using more and more memory and CPU. We need to optimize this query.
I need you to do some analysis on the query and determine what indexes we need to add to optimize it.
🐨 Before you go straight to making the one line code change in , please read through these instructions to do some analysis.
If you'd like to observe the performance issue, you can update the seed script to generate more users and more notes per user. This could make the seed script take a very very long time (several minutes). You should notice major slow downs by having 15000 users each with 200-300 notes. You can also comment out the images bit if you want it to run faster since images shouldn't affect this. If you do this, a query with no search term should take ~8 seconds.

Identify the problem query

If you run the query by opening , you should see the query that was executed in the terminal output. It should log something like this:
SELECT user.id, user.username, user.name, image.id AS imageId
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
WHERE user.username LIKE ?
OR user.name LIKE ?
ORDER BY (
        SELECT updatedAt
        FROM Note
        WHERE ownerId = user.id
        ORDER BY updatedAt DESC
        LIMIT 1
) DESC
LIMIT 50;
If you seeded your database with lots of data, you should also notice it took a long time for your data to come back. Yikes!

Analyze the problem query

The ? symbols in that query represent interpolated values. To use this query in the sqlite EXPLAIN QUERY PLAN command, we need to replace those with real values. So, let's run this query with some values:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name, image.id AS imageId
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
WHERE user.username LIKE "%kody%"
OR user.name LIKE "%kody%"
ORDER BY (
        SELECT updatedAt
        FROM Note
        WHERE ownerId = user.id
        ORDER BY updatedAt DESC
        LIMIT 1
) DESC
LIMIT 50;
That gives us this output:
QUERY PLAN
|--SCAN user
|--SEARCH image USING INDEX UserImage_userId_key (userId=?) LEFT-JOIN
|--CORRELATED SCALAR SUBQUERY 1
|  |--SEARCH Note USING INDEX Note_ownerId_idx (ownerId=?)
|  `--USE TEMP B-TREE FOR ORDER BY
`--USE TEMP B-TREE FOR ORDER BY
Huh... Yeah, there are a couple things going on in there. Let's build up the query a bit at a time so we can talk about each bit.
🐨 Let's start with this:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name
FROM User AS user
LIMIT 50;
That should give you this:
QUERY PLAN
`--SCAN user
Remember what a SCAN with no index means? It says "read every record." But it's only happening because there's no WHERE or ORDER BY clause.
🐨 Add an ORDER BY clause:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name
FROM User AS user
ORDER BY user.username
LIMIT 50;
QUERY PLAN
`--SCAN user USING INDEX User_username_key
Great, we're using an index, so that'll be much more efficient.
🐨 Let's try it out with the name column (which is not indexed):
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name
FROM User AS user
ORDER BY user.name
LIMIT 50;
This gives us:
QUERY PLAN
|--SCAN user
`--USE TEMP B-TREE FOR ORDER BY
So here's something new. Because the user.name column is not indexed, the database has to scan the user without an index, but in order to sort the data by user.name, it has to store all the user's names in a temporary "B-TREE" data structure so it can do the sorting for the order by. This is going to eat up some memory for the space for the data structure, and CPU for performing the comparison.
But we're getting side-tracked. We're not ordering by the user's name. Let's get back on track.
🐨 We'll add back the ORDER BY user.username for a second so we can see what the LEFT JOIN does to our query:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
ORDER BY user.username
LIMIT 50;
And that gives us:
QUERY PLAN
`--SCAN user USING INDEX User_username_key
At first this may surprise you, but the optimizer has determined that it can skip the LEFT JOIN entirely because we're not referencing it anywhere else in the query.
🐨 Add image.id in the select
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name, image.id as imageId
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
ORDER BY user.username
LIMIT 50;
That will give us:
QUERY PLAN
|--SCAN user USING INDEX User_username_key
`--SEARCH image USING INDEX UserImage_userId_key (userId=?) LEFT-JOIN
Super, with that we get a "SEARCH" using an index on the foreign key, so that should be quick enough. This is because the ON predicate is on the indexed column.
🐨 Alrighty, let's add the WHERE clause with the LIKE in:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name, image.id as imageId
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
WHERE user.username LIKE '%kody%'
ORDER BY user.username
LIMIT 50;
QUERY PLAN
|--SCAN user USING INDEX User_username_key
`--SEARCH image USING INDEX UserImage_userId_key (userId=?) LEFT-JOIN
Whoops! It is using an index for the ORDER BY, but it's not telling us that it's likely not using an index for the LIKE (according to the rules).
🐨 Let's add the OR for the user.name:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name, image.id as imageId
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
WHERE user.username LIKE '%kody%'
OR user.name LIKE '%kody%'
ORDER BY user.username
LIMIT 50;
QUERY PLAN
|--SCAN user USING INDEX User_username_key
`--SEARCH image USING INDEX UserImage_userId_key (userId=?) LEFT-JOIN
It's still leveraging the index for the ORDER BY, but it's not telling us that it's likely not using an index for the LIKE.
🐨 Before we put the subquery in the ORDER BY, let's look at the subquery on its own:
EXPLAIN QUERY PLAN
SELECT updatedAt
FROM Note
WHERE ownerId = "some_id"
ORDER BY updatedAt DESC
LIMIT 1;
QUERY PLAN
|--SEARCH Note USING INDEX Note_ownerId_idx (ownerId=?)
`--USE TEMP B-TREE FOR ORDER BY
Here we are again with the TEMP B-TREE for the ORDER BY. This is because the updatedAt column is not indexed. So for every note for the user, it has to store the updatedAt in a temporary data structure so it can sort it to find the one that was most recently updated. Definitely something fishy here...
🐨 Now, let's put that query as a subquery in the ORDER BY:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name, image.id as imageId
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
WHERE user.username LIKE '%kody%'
OR user.name LIKE '%kody%'
ORDER BY (
  SELECT updatedAt
  FROM Note
  WHERE ownerId = user.id
  ORDER BY updatedAt DESC
  LIMIT 1
) DESC
LIMIT 50;
QUERY PLAN
|--SCAN user
|--SEARCH image USING INDEX UserImage_userId_key (userId=?) LEFT-JOIN
|--CORRELATED SCALAR SUBQUERY 1
|  |--SEARCH Note USING INDEX Note_ownerId_idx (ownerId=?)
|  `--USE TEMP B-TREE FOR ORDER BY
`--USE TEMP B-TREE FOR ORDER BY
Now EXPLAIN is (finally) showing that we're scanning because there's no index in use on the user table (it was scanning the whole time, silly). But, it's now also showing a TEMP B-TREE for the ORDER BY, and because that's a subquery, that will happen for every user 😱. Very very bad.
We need to optimize the ORDER BY sub query, and then the whole query should run much faster. Because that query is against the Note table, the index we need will go on the Note model.

Identify the index

Remember the photo album metaphor? In that example, the first folder you want is the one where you're trying to filter things. That's often the column most frequently used in your queries. While individual indexes can certainly be beneficial for corresponding single-column queries, for multi-column queries, a well-designed composite index is often more efficient.
Generally, in a composite (multi-column) index, start with the column that is most frequently used in your WHERE clause and can be most utilized, then add frequently ORDER BY columns to the end of the index.
This often means you start with "bigger buckets" and get more specific as you go.
While these rules are not hard and fast, they are a good starting point. You should always test your queries to see if they're using the indexes you expect them to use and if they're performing well.
So in our case, we look at the WHERE and then the ORDER BY to determine our indexes. We're referencing the updatedAt in the ORDER BY and the ownerId in the WHERE. We need to combine these columns in a single index to optimize this query.
🐨 So, let's sort first by the ownerId and then by the updatedAt.
🐨 and add an index for the ownerId and updatedAt.
🐨 Now run:
npx prisma db push
Remember, this does not update your migration file. You'll want to do that when you're ready to commit to this. We're just testing things out for now.
If you'd like, you can execute this command in SQLite and it will show you all the indexes active in the database:
.indexes
That should give you something like this:
NoteImage_noteId_idx                   sqlite_autoindex_NoteImage_1
Note_ownerId_idx                       sqlite_autoindex_Note_1
Note_ownerId_updatedAt_idx             sqlite_autoindex_UserImage_1
UserImage_userId_key                   sqlite_autoindex_User_1
User_email_key                         sqlite_autoindex__prisma_migrations_1
User_username_key
Our new one is the one called Note_ownerId_updatedAt_idx!
🐨 Now, let's run the query plan again:
EXPLAIN QUERY PLAN
SELECT user.id, user.username, user.name, image.id as imageId
FROM User AS user
LEFT JOIN UserImage AS image ON user.id = image.userId
WHERE user.username LIKE '%kody%'
OR user.name LIKE '%kody%'
ORDER BY (
   SELECT updatedAt
   FROM Note
   WHERE ownerId = user.id
   ORDER BY updatedAt DESC
   LIMIT 1
) DESC
LIMIT 50;
QUERY PLAN
|--SCAN user
|--SEARCH image USING INDEX UserImage_userId_key (userId=?) LEFT-JOIN
|--CORRELATED SCALAR SUBQUERY 1
|  `--SEARCH Note USING COVERING INDEX Note_ownerId_updatedAt_idx (ownerId=?)
`--USE TEMP B-TREE FOR ORDER BY
Sweet! We've just eliminated the TEMP B-TREE on the Note table! And now our Note Search is using our new index (learn more about what "covering index means" below). This is a huge win because the query doesn't have to read every note for every user!
Note we do still have a non-indexed scan for the user, but based on the requirements of a wildcard prefix in the LIKE, an index won't help anyway.
Incidental Covering Index
If you're curious, we just happen to have what's known as a "covering index" here, which means SQLite doesn't even have to read any records from the Note table. It can just read the index and get the data it needs. This is because all the columns used by the query are covered by the index and is highly efficient. You can learn more about this from the SQLite Query Planner docs
🐨 Now go ahead and try searching users again:
Checking our logs, we're getting about 30ms per query when there's no search term, and just a couple milliseconds when there is a search term. Going from 8 seconds to 30ms is over 250x faster! Nice!

Please set the playground first

Loading "Multi-Column Index"
Loading "Multi-Column Index"
Login to get access to the exclusive discord channel.
  • general
    Welcome to EpicWeb.dev! Say Hello πŸ‘‹
    Kent C. Dodds β—† πŸš€πŸ†πŸŒŒβš‘:
    This is the first post of many hopefully!
    • 18
    86 Β· 2 years ago
  • general
    Modals / Dialogs
    Lucas Wargha πŸš€ 🌌:
    It seems like modals and dialogs are becoming a hot topic on my team lately. I haven’t found a solid...
    • βœ…1
    3 Β· 4 months ago
  • general
    epic stack website initial load at home page is unstyled (sometimes)
    osmancakir πŸš€ 🌌:
    Sometimes (especially when it is loaded first time on a new browser etc.) I see this unstyled versio...
    • βœ…1
    10 Β· 7 months ago
  • general
    Resource / Api endpoints on epic stack / RR7
    Lucas Wargha πŸš€ 🌌:
    Hi everyone! Quick question for those using the Epic Stack: How are you handling resource routes ...
    • βœ…1
    2 Β· 6 months ago
  • general
    Epic stack using tanstack form
    Lucas Wargha πŸš€ 🌌:
    https://github.com/epicweb-dev/epic-stack/compare/epicweb-dev:main...wargha:feature/tanstack-form-ex...
    • βœ…1
    3 Β· 6 months ago
  • general
    Init command outdated on the EpicWeb website
    Virgile πŸ† 🌌:
    Hi everyone. I've initialized a new epic-stack project yesterday. Following instructions from http...
    • βœ…1
    3 Β· 6 months ago
  • general
    Mark as complete, resets the first time you click it.
    Daniel V.C πŸš€ 🌌:
    Not sure if anyone else has had this issue, as i've not seen anyone else talk about it, but I find ...
    • βœ…1
    8 Β· 7 months ago
  • πŸ’Ύdata
    general
    πŸ“forms
    πŸ”­foundations
    double underscore?
    trendaaang 🌌:
    What with the `__note-editor.tsx`? I don't see that in the Remix docs and I don't remember Kent talk...
    • βœ…1
    2 Β· a year ago
  • general
    Keeping Epic Stack Projects Free on Fly – Any Tips?
    Lucas Wargha πŸš€ 🌌:
    I’ve been experimenting with the Epic Stack and deploying some dummy projects on Fly. I noticed that...
    • βœ…1
    0 Β· 7 months ago
  • πŸ’Ύdata
    general
    πŸ“forms
    πŸ”­foundations
    Creating Notes
    Scott 🌌 πŸ†:
    Does anybody know in what workshop we create notes? I would like to see the routing structure. So fa...
    • βœ…1
    2 Β· 9 months ago
  • πŸ’Ύdata
    Prisma extension not showing my db
    oaguinaga 🌌 πŸš€:
    I noticed that in the data modeling module, Kent can visualize Prisma tables directly inside the edi...
    • βœ…1
    2 Β· 9 months ago
  • πŸ”­foundations
    πŸ’Ύdata
    general
    πŸ“forms
    πŸ”auth
    Thank you for the inspiration
    Binalfew πŸš€ 🌌 ⚑:
    <@105755735731781632> I wanted to thank you for the incredible knowledge I gained from your Epic Web...
    • ❀️1
    1 Β· 9 months ago
  • πŸ’Ύdata
    How often do you need to run a migration?
    Scott 🌌 πŸ†:
    I just finished the Data Migrations workshop (I come from a No-SQL background). Let's say I add a n...
    • βœ…1
    1 Β· a year ago
  • general
    npm install everytime I setup a new playground
    Duki 🌌:
    Is it normal that I have to run `npm install` in my playground directory, everytime I setup the play...
    • βœ…1
    2 Β· a year ago
  • πŸ’Ύdata
    Code snippets in workshop app
    remich 🌌:
    What is used to include the code snippets in the workshop app? I'd like to have syntax-highlighted P...
    • βœ…1
    1 Β· a year ago
  • πŸ’Ύdata
    a very large insert
    osmancakir πŸš€ 🌌:
    hi all, I need to perform seeding to my deployment. I followed the steps in the database documentati...
    • βœ…1
    2 Β· a year ago
  • πŸ’Ύdata
    πŸ“forms
    πŸ”­foundations
    Reviewing foundations, Mutations, Actions
    silvanet πŸš€ 🌌 ⚑:
    Forgive me for this. I went over the file size limit. I don't want to sign up for being able to exce...
    • βœ…1
    2 Β· a year ago
  • general
    Migration to Vite: Server-only module referenced by client
    Fabian 🌌:
    Hi, I'm working on migrating to Vite following the remix docs (https://remix.run/docs/en/main/guides...
    • βœ…1
    1 Β· a year ago
  • πŸ’Ύdata
    Undesired behaviors while running the data-modeling workshop
    juliano.brasil 🌌:
    After pulling the latest updates I started experiencing 2 undesired behaviors while running the wor...
    • βœ…1
    2 Β· a year ago
  • πŸ’Ύdata
    πŸ“forms
    Getting a TS error that is not present in the course files
    OtterlyPunk:
    So I'm working in parallel and I'm feeling the problem is I'm using a new version of something in my...
    • βœ…2
    12 Β· 2 years ago
  • general
    Remix Vite Plugin
    Binalfew πŸš€ 🌌 ⚑:
    <@105755735731781632> Now that remix officially supports vite (though not stable) what does it mean...
    • βœ…1
    3 Β· 2 years ago
  • πŸ’Ύdata
    Prisma Typed SQL
    Kent C. Dodds β—† πŸš€πŸ†πŸŒŒβš‘:
    Hey <@&1123674184959148043> friends, check out the new Typed SQL feature in Prisma! I've added an ex...
    • 2
    • 2
    0 Β· a year ago
  • general
    πŸ”­foundations
    Solutions video on localhost:5639 ?
    quang πŸš€ 🌌:
    Hi, so I'm having a hard time navigating (hopefully will be better with time) The nav on epicweb.de...
    • βœ…1
    9 Β· 2 years ago
  • πŸ’Ύdata
    Multi-Column Index - How does it eliminate the B-tree?
    Darren πŸš€ 🌌:
    So I just finished the Query Optimization - 02 Multi-Column Index lesson and was confused by the fol...
    • βœ…1
    2 Β· a year ago
  • general
    Epicshop is now social and mobile friendly!
    Kent C. Dodds β—† πŸš€πŸ†πŸŒŒβš‘:
    I'm excited to announce that now the Epic Web workshops are mobile friendly! https://foundations.ep...
    • πŸŽ‰2
    0 Β· a year ago