Query Optimization

An enormous number of performance optimizations in web applications can be best accomplished by optimizing database queries. There are many stories of situations where companies will spend millions of dollars scaling hardware to ridiculous specifications and numbers. Or adding multiple levels of caching. And adding a bunch of UI logic to improve the perceived performance for users who have to wait on the database queries (kinda like what Disneyland does by adding entertainment to the ride lines). All in the effort to reducing the impact of performance problems.

And in many of these stories, a highly paid consultant will come in and perform some of the analysis you're about to learn, add a handful of database indexes (sometimes even one is enough) and take the query from a minute or two to a couple milliseconds. Then the consultant charges a ton of money and moves on to the next one.

This is not hyperbole. This is real life. Of course you always want to make your UI have a nice pending experience (no amount of database optimization will control your user's network speed), but you can certainly make a big impact on your database queries by adding thoughtful optimizations in the form of indexes and optimized queries.

On the subject of optimized queries, it's outside the scope of this workshop to go further into SQL than we already have. Most web developers won't need to know more about SQL than what we've already covered. But you should know that you definitely can go deeper if you need to.

We're going to focus on indexing.

Database indexes Metaphor

Here's a useful metaphor for database indexes I've got from Tyler Benfield.

Imagine a "Photos" directory on your computer. At first, as you started taking photos, you just put them all in there with no sorting on whatsoever. Filenames are auto-generated (via cuid() or similar) so sorting by the filename doesn't even help.

At first, this is fine. When you want to find photos from your trip to Rome last week, it takes a little bit of searching but it's not a big deal.

But then you start traveling more. You go to Salt Lake City, and then Madrid, then Lagos, then Goa, and more and over the course of a year you've traveled many places and added many photos.

Let's also add that your memory is very bad (or maybe just hyper-optimized?) so you have no way to remember when you went to these different places or if you even went at all. Every time you look at the photos directory you're starting from scratch.

Now your lack of organization is very painful. Your friends lose interest before you are able to find the photo of you standing on the Great Wall of China. So you decide something must be done.

Luckily, you have meta data associated to the photos so you start to sort them into categories. The most sensible category you can think of is to sort them by the month they were taken. So you make folders for 2023-06, 2023-07, etc.

In our example, instead of moving the files to those folders, you create symlinks to the files. So the files don't move to the folders. They're just easy to find from the folders.

So the next time someone asks you what you did last summer, you're able to find those photos very quickly.

This is an example of an index on a column.

Then one day a friend asks you whether you've ever been to the Alps. You're right back in the same place you were as if there were no folders (index) at all. So you add another set of folders with file symlinks for the location. Now you can much more easily find the photo of you at any location (like your trip to the San Diego Zoo).

This is a second index on the same table. You can have more than one.

But you continue traveling and sometimes you go to the same place more than once. One day you're looking for pictures of a trip to Amsterdam in 2022. Which folder will you look in?

So you decide to create new folders that combine both the location and the date. 2022-04/Amsterdam, 2023-07/Amsterdam, etc. But after thinking about it a bit you find yourself searching for photos by location first and then sorting by date, so you change it to Amsterdam/2022-04, Amsterdam/2023-07, etc.

This is called a composite (or multi-column) index

An index on a database column is extra data (managed by the database) that the database can reference when performing queries rather than looking at the individual records themselves. It has massive potential to speed up your queries by a huge margin.

Things to consider

Just like in our photos example, when you create an index, there are two things you should think about:

It takes more space
It takes more time to add new records

In a typical web application you almost always will want to optimize for speed of reads over storage cost and speed of writes. This is because most web applications are read-heavy. That is, there are many more reads than writes. And storage costs are relatively minimal.

Regarding storage space, there are many factors that go into the storage space taken up by an index, but most of the time it will be less than the size of the column it's indexing. But that's a good metric to measure the size of your index against. For multi-column indexes, you may think it's the size of the sum of the two columns, but it's likely going to be quite a bit less (just like in our example, the sub-folders didn't need to have the parent folder name in them).

Additionally, users are typically more forgiving of a "save" operation taking a bit more time than a "search" operation.

Oh, and it turns out that adding an index can sometimes speed up writes as well. Consider:

UPDATE user SET name = 'Alice' WHERE username = 'alicerocks';

If you didn't have an index on the username field, the database would have to look at every single record in the users table to find the one with the username of alicerocks. But if you have an index on the username field, the database can look up the record very quickly and then update it.

In web applications, you should default to adding indexes that speed up queries and only if it becomes a performance problem should you consider removing them.

Things to index by default

By default, Prisma indexes the unique fields of each table (like the primary key field id). This is great because we very often look up a record by its ID. But there are other things we should consider indexing by default as well.

Foreign keys are a great example. If you have a User table and a Post table and each post has a userId column, you should index the userId column on the Post table. This will make it much faster to find all the posts for a given user.

By default, you should probably index foreign keys for fields which are not unique.

Some databases (MySQL) will index foreign keys by default, but SQLite, Postgres, and SQL Server do not. Prisma strives to be consistent with our database so that's why it doesn't index foreign keys by default, but it's more often than not a good choice.

Prisma automatically creates an index on fields that have @unique on them, so if your foreign key has @unique, then you don't need to add an index for that one.

Another thing to consider indexing, is anything that appears in your WHERE clause or ORDER BY clause. If you're filtering or sorting by a column, you can probably speed up the query quite a bit by adding an index on that column. And in some cases you'll want to add a multi-column index (for the "find by X, then sort those by Y scenario").

Identifying index opportunities

Unfortunately this is more of an art than a science (otherwise databases and ORMs would just do it for us). But there are some things you can do to identify why a particular query is slow.

Watch out for memory or CPU spikes. When the database is doing a full table scan, it's going to be using a lot of memory and CPU. If you see spikes in either of those, it's a good sign that you need to add an index in one of your queries.

One way to find and understand optimization opportunities is to use EXPLAIN QUERY PLAN to see what the database plans to do for the query. This will show you what indexes it's using (if any at all). Things to watch out for in the output are "SCAN" without an index. This means the database is going to look at every single record in the table. For tables with very few records this doesn't matter, but for tables with millions of records, this can be very slow.

Here's a simple example:

EXPLAIN QUERY PLAN SELECT * FROM user WHERE name = 'Alice';

Here the name is not indexed so the database is going to do a full table scan:

QUERY PLAN
`--SCAN user

EXPLAIN QUERY PLAN SELECT * FROM user WHERE username = 'alicerocks';

Here, the username is unique and therefore automatically indexed by Prisma's migrate command, so the database is able to do a quick lookup using the index Prisma made for us:

QUERY PLAN
`--SEARCH user USING INDEX User_username_key (username=?)

We'll go deeper on this analysis in the exercise.

Indexing in Prisma

As mentioned, Prisma will automatically add an index to your migration for unique fields. If you check the migrations sql file you'll find code for them at the bottom:

-- CreateIndex
CREATE UNIQUE INDEX "User_email_key" ON "User"("email");

-- CreateIndex
CREATE UNIQUE INDEX "User_username_key" ON "User"("username");

-- CreateIndex
CREATE UNIQUE INDEX "UserImage_userId_key" ON "UserImage"("userId");

When you want to manually add an index, you can use the @@index attribute:

model Starport {
  id         String      @id @default(cuid())
  name       String
  locationId String
  location   Location @relation(fields: [locationId], references: [id])

  @@index([locationId])
}

This will add an index on the locationId foreign key.

If you want to add a multi-column index, you can do that as well:

model Starport {
  id         String      @id @default(cuid())
  name       String
  locationId String
  location   Location @relation(fields: [locationId], references: [id])

  @@index([locationId, name])
}

This will add an index on the locationId foreign key and the name field. This type of index would be useful if you wanted to look up star ports by their locationId and then sort them by name.

You also have options to name the index, and the sort order for the index as well. You can read more about that in the Prisma docs.

Running SQL commands

So far, we've only run SQL commands through Prisma. You can continue to do this if you like. For example, you could create a new file:

import { PrismaClient } from '@prisma/client'

const prisma = new PrismaClient()

const result = await prisma.$queryRaw`
EXPLAIN QUERY PLAN
SELECT * from note where ownerId = 1
`
console.log(result)

This will work just fine.

However, if you'd like to run SQL commands directly against the database, you will need to download and install the sqlite3 CLI.

I personally use Homebrew for doing this on macOS. If you're on Windows, please make a suggestion here for how to do this!

Installation on Ubuntu & Debian distributions:

sudo apt-get update
sudo apt-get install sqlite3

Check version to confirm installation:

sqlite3 --version

Once you have that installed, then you can run sqlite3 from the command line:

sqlite3 ./prisma/data.db

sqlite> EXPLAIN QUERY PLAN SELECT * from note where ownerId = 1;

Conclusion

There's much more to indexes and optimizations (like SQLite's covering index feature) and understanding SQLite's query optimizer. But for most of you building regular web apps, you don't need to go quite that deep unless you really want to.