Query Optimization
Loading "Intro to Query Optimization"
Run locally for transcripts
An enormous number of performance optimizations in web applications can be best
accomplished by optimizing database queries. There are many stories of
situations where companies will spend millions of dollars scaling hardware to
ridiculous specifications and numbers. Or adding multiple levels of caching. And
adding a bunch of UI logic to improve the perceived performance for users who
have to wait on the database queries (kinda like what Disneyland does by adding
entertainment to the ride lines). All in the effort to reducing the impact of
performance problems.
And in many of these stories, a highly paid consultant will come in and perform
some of the analysis you're about to learn, add a handful of database indexes
(sometimes even one is enough) and take the query from a minute or two to a
couple milliseconds. Then the consultant charges a ton of money and moves on to
the next one.
This is not hyperbole. This is real life. Of course you always want to make your
UI have a nice pending experience (no amount of database optimization will
control your user's network speed), but you can certainly make a big impact on
your database queries by adding thoughtful optimizations in the form of indexes
and optimized queries.
On the subject of optimized queries, it's outside the scope of this workshop to
go further into SQL than we already have. Most web developers won't need to know
more about SQL than what we've already covered. But you should know that you
definitely can go deeper if you need to.
We're going to focus on indexing.
Database indexes Metaphor
Here's a useful metaphor for database indexes I've got from
Tyler Benfield.
Imagine a "Photos" directory on your computer. At first, as you started taking
photos, you just put them all in there with no sorting on whatsoever. Filenames
are auto-generated (via
cuid()
or similar) so sorting by the filename doesn't
even help.At first, this is fine. When you want to find photos from your trip to Rome last
week, it takes a little bit of searching but it's not a big deal.
But then you start traveling more. You go to Salt Lake City, and then Madrid,
then Lagos, then Goa, and more and over the course of a year you've traveled
many places and added many photos.
Let's also add that your memory is very bad (or maybe just hyper-optimized?) so
you have no way to remember when you went to these different places or if you
even went at all. Every time you look at the photos directory you're starting
from scratch.
Now your lack of organization is very painful. Your friends lose interest before
you are able to find the photo of you standing on the Great Wall of China. So
you decide something must be done.
Luckily, you have meta data associated to the photos so you start to sort them
into categories. The most sensible category you can think of is to sort them by
the month they were taken. So you make folders for
2023-06
, 2023-07
, etc.In our example, instead of moving the files to those folders, you create
symlinks to the files. So the files don't move to the folders. They're just easy
to find from the folders.
So the next time someone asks you what you did last summer, you're able to find
those photos very quickly.
This is an example of an index on a column.
Then one day a friend asks you whether you've ever been to the Alps. You're
right back in the same place you were as if there were no folders (index) at
all. So you add another set of folders with file symlinks for the location. Now
you can much more easily find the photo of you at any location (like your trip
to the San Diego Zoo).
This is a second index on the same table. You can have more than one.
But you continue traveling and sometimes you go to the same place more than
once. One day you're looking for pictures of a trip to Amsterdam in 2022. Which
folder will you look in?
So you decide to create new folders that combine both the location and the date.
2022-04/Amsterdam
, 2023-07/Amsterdam
, etc. But after thinking about it a bit
you find yourself searching for photos by location first and then sorting by
date, so you change it to Amsterdam/2022-04
, Amsterdam/2023-07
, etc.This is called a composite (or multi-column) index
An index on a database column is extra data (managed by the database) that the
database can reference when performing queries rather than looking at the
individual records themselves. It has massive potential to speed up your queries
by a huge margin.
Things to consider
Just like in our photos example, when you create an index, there are two things
you should think about:
- It takes more space
- It takes more time to add new records
In a typical web application you almost always will want to optimize for
speed of reads over storage cost and speed of writes. This is because most web
applications are read-heavy. That is, there are many more reads than writes.
And storage costs are relatively minimal.
Regarding storage space, there are many factors that go into the storage space
taken up by an index, but most of the time it will be less than the size of the
column it's indexing. But that's a good metric to measure the size of your index
against. For multi-column indexes, you may think it's the size of the sum of the
two columns, but it's likely going to be quite a bit less (just like in our
example, the sub-folders didn't need to have the parent folder name in them).
Additionally, users are typically more forgiving of a "save" operation taking a
bit more time than a "search" operation.
Oh, and it turns out that adding an index can sometimes speed up writes as well.
Consider:
UPDATE user SET name = 'Alice' WHERE username = 'alicerocks';
If you didn't have an index on the
username
field, the database would have to
look at every single record in the users
table to find the one with the
username
of alicerocks
. But if you have an index on the username
field,
the database can look up the record very quickly and then update it.In web applications, you should default to adding indexes that speed up
queries and only if it becomes a performance problem should you consider
removing them.
Things to index by default
By default, Prisma indexes the unique fields of each table (like the primary key
field
id
). This is great because we very often look up a record by its ID. But
there are other things we should consider indexing by default as well.Foreign keys are a great example. If you have a
User
table and a Post
table
and each post has a userId
column, you should index the userId
column on the
Post
table. This will make it much faster to find all the posts for a given
user.By default, you should probably index foreign keys for fields which are not
unique.
Some databases (MySQL) will index foreign keys by default, but SQLite, Postgres,
and SQL Server do not. Prisma strives to be consistent with our database so
that's why it doesn't index foreign keys by default, but it's more often than
not a good choice.
Prisma automatically creates an index on fields that have
@unique
on them,
so if your foreign key has @unique
, then you don't need to add an index for
that one.Another thing to consider indexing, is anything that appears in your
WHERE
clause or ORDER BY
clause. If you're filtering or sorting by a column, you
can probably speed up the query quite a bit by adding an index on that column.
And in some cases you'll want to add a multi-column index (for the "find by X,
then sort those by Y scenario").Identifying index opportunities
Unfortunately this is more of an art than a science (otherwise databases and
ORMs would just do it for us). But there are some things you can do to identify
why a particular query is slow.
Watch out for memory or CPU spikes. When the database is doing a full table
scan, it's going to be using a lot of memory and CPU. If you see spikes in
either of those, it's a good sign that you need to add an index in one of your
queries.
One way to find and understand optimization opportunities is to use
EXPLAIN QUERY PLAN
to see what the database plans to do for the query. This
will show you what indexes it's using (if any at all). Things to watch out for
in the output are "SCAN" without an index. This means the database is going to
look at every single record in the table. For tables with very few records this
doesn't matter, but for tables with millions of records, this can be very slow.Here's a simple example:
EXPLAIN QUERY PLAN SELECT * FROM user WHERE name = 'Alice';
Here the
name
is not indexed so the database is going to do a full table scan:QUERY PLAN
`--SCAN user
EXPLAIN QUERY PLAN SELECT * FROM user WHERE username = 'alicerocks';
Here, the
username
is unique and therefore automatically indexed by Prisma's
migrate command, so the database is able to do a quick lookup using the index
Prisma made for us:QUERY PLAN
`--SEARCH user USING INDEX User_username_key (username=?)
We'll go deeper on this analysis in the exercise.
Indexing in Prisma
As mentioned, Prisma will automatically add an index to your migration for
unique fields. If you check the migrations sql file you'll find code for them at
the bottom:
-- CreateIndex
CREATE UNIQUE INDEX "User_email_key" ON "User"("email");
-- CreateIndex
CREATE UNIQUE INDEX "User_username_key" ON "User"("username");
-- CreateIndex
CREATE UNIQUE INDEX "UserImage_userId_key" ON "UserImage"("userId");
When you want to manually add an index, you can use the
@@index
attribute:model Starport {
id String @id @default(cuid())
name String
locationId String
location Location @relation(fields: [locationId], references: [id])
@@index([locationId])
}
This will add an index on the
locationId
foreign key.If you want to add a multi-column index, you can do that as well:
model Starport {
id String @id @default(cuid())
name String
locationId String
location Location @relation(fields: [locationId], references: [id])
@@index([locationId, name])
}
This will add an index on the
locationId
foreign key and the name
field.
This type of index would be useful if you wanted to look up star ports by their
locationId and then sort them by name.You also have options to name the index, and the sort order for the index as
well. You can read more about that in
the Prisma docs.
Running SQL commands
So far, we've only run SQL commands through Prisma. You can continue to do this
if you like. For example, you could create a new file:
import { PrismaClient } from '@prisma/client'
const prisma = new PrismaClient()
const result = await prisma.$queryRaw`
EXPLAIN QUERY PLAN
SELECT * from note where ownerId = 1
`
console.log(result)
This will work just fine.
However, if you'd like to run SQL commands directly against the database, you
will need to
download and install the
sqlite3
CLI.I personally use Homebrew for doing this on macOS. If
you're on Windows, please make a suggestion here for how to do this!
Installation on Ubuntu & Debian distributions:
sudo apt-get update
sudo apt-get install sqlite3
Check version to confirm installation:
sqlite3 --version
Once you have that installed, then you can run
sqlite3
from the command line:sqlite3 ./prisma/data.db
sqlite> EXPLAIN QUERY PLAN SELECT * from note where ownerId = 1;
Conclusion
There's much more to indexes and optimizations (like SQLite's
covering index feature) and
understanding SQLite's
query optimizer. But for most of you
building regular web apps, you don't need to go quite that deep unless you
really want to.