In these articles, you will see me reference execution plans a lot – talk about them, provide screenshots etc. I assume that the reader has some level of knowledge about what an execution plan is, how to view them and how to understand them. This post is aimed at those readers who do not have that knowledge and it is a primer on the topic.
There are deeper resources out there and I’ll link some of them below, but I wanted to explain these concepts in my own words as a foundation for everything else on this blog.
My knowledge of execution plans comes from working with Microsoft SQL Server, though all relational database systems use them (sometimes by other names) and there will be differences between database systems. This article focuses on how they appear and work in SQL Server, though the concepts may apply to other platforms, that is outside the scope of this article.
The SQL Language
To start to understand execution plans we need to start with what the SQL language is (and isn’t)
SQL is a declarative language – you declare an end state (your desired query results) but not the programmatic method to process the data to get to those results. In languages like C#, Python and C++, to name drop a few, you write code which a compiler or interpreter will execute in the way you tell it – you may choose to compare two arrays by sorting them both then scanning for differences, you may decide to build a hash table, you may use a foreach loop, a while loop or maybe some other approach. In the SQL language the “how” is decided by the database management system itself and specifically a component of the system related to query processing and optimization.
In a nutshell – you tell the database system the query results you wish to see via a SQL query and the database engine “does the magic” in terms of how to retrieve that data in an efficient way.
What Happens When we Press F5
So with the above said, when we write a query to specify a result set, what happens behind the scenes to perform the aforementioned “Magic”:
SQL Server goes through a few stages once we hit that F5 key
- Parses the query
- Checks the T-SQL is valid syntax
- Binding
- Checks the objects, columns etc we have specified exist in the database
- Simplification / Optimization
- This is the “magic” where the optimizer decides how to retrieve the query results from the storage layer where our data is stored and present those results to client that issued the query
- Execution
- Executes the query as per “how” it decided to get the results and displays the results to the user
The above is a deliberately brief overview, there are authors and articles who go into much greater detail than me on the detail but as is often the case, that topic alone is huge!
So….What is an Execution Plan
So where does the execution plan come into all this? The execution plan is the output of the Simplification / Optimization stage above and is what is created by SQL Server’s query optimizer component, this is known as Plan Compilation.
I like to think of the query optimizer as a bit like a Sat-Nav (GPS): we tell the Sat-Nav our desired destination and it goes off and analyses a number of different routes and comes back with one it thinks is the “best”
The SQL Server query optimizer is similar – we give it our destination (the results) and it analyzes a number of different “routes” to retrieve the data and presents those results to us in a way it thinks is the “best”
There are a few things to note:
- The Sat-Nav probably does not analyse every single possible route (I know very little of the internals of such systems, hence “probably”) I suspect it ignores little side roads that offer minimal benefit to the journey
- The query optimizer is similar – it does not analyze every single possible route to the data as doing so would take a very long (possibly infinite) amount of time. This is the result of a process called dynamic programming which is beyond the scope of this article
- SQL Server will not necessarily provide the “best” route to the data but one that it thinks is a suitably efficient route to the data in the time available
- Whereas a Sat-Nav may suggest changes to the route on the way if it thinks there is a better route based on live conditions, a database system sticks to the “route” it has chosen and will not make changes on the way (Though there are arguably cases in SQL Server 2019+ where this differs slightly due to adaptive joins, though again this is beyond the scope of this article)
What’s a Route?
The term “route” is my own terminology for the sake of the Sat-Nav analogy in this post, it is not a term I’ve ever heard used within query optimization. I’ve mentioned SQL Server analyses different “routes” to the data but what does a “route” look like?
Consider the following simple Query from the Stack Overflow Database, I am using the small 2010 version provided under cc-by-sa 4.0 license from Stack Exchange Data Dump and running in compatibility level 130 on SQL Server 2022 on a VM with 8 cores and 46GB RAM.
The query returns all the posts that have been made by users called John, along with comments on those posts.
SELECT p.Title,
u.DisplayName,
c.[Text]
FROM dbo.Posts p
JOIN dbo.Users u
ON p.OwnerUserId = u.Id
JOIN dbo.Comments c
ON c.PostId = p.Id
WHERE u.DisplayName = 'John';
If I were to ask you, a human how we would solve this, how would you do it?
Some options would be
Option 1
- Join the Posts table to the Users table
- Join that resulting set to the Comments table
- Filter the results down to Johns
Option 2
- Join the Comments table to the Users table
- Join that resulting set to the Posts table
- Filter the results down to Johns
Option 3
- Filter the Users table down to Johns
- Join the Users table to the Posts table
- Join the resulting set to the comments table
Option 4
- Filter the Users table down to Johns
- Join the Users table to the Comments table
- Join the resulting set to the Posts table
So we can see there are different ways we could satisfy this fairly trivial query and I haven’t even touched on indexes and other things that could play a part but hopefully this explains what I mean by “routes”
Now we know the “routes” how do we know which one is “best”?
As a human, you may have some knowledge about the data to assist your decision:
- How many users are there?
- How many of those are called John?
- How many users have made posts?
- How many posts does an average user make?
The answers to the above would influence the decision you make on which option is “best” you would probably also look at some of those rather unpalatable options and think “no way in the world am I doing it that way”
SQL Server has this information available to it via a mechanism called statistics which it uses to estimate which “route” is the most efficient. Statistics is a large topic on its own but for now be aware that this metadata plays a huge part in SQL Server’s decision making process.
How Would SQL Server do It?
Sometimes I hear people say that to process the query below
SELECT *
FROM dbo.Posts p
JOIN dbo.Users u
ON p.OwnerUserId = u.Id
WHERE u.DisplayName = 'John';
means SQL Server first reads the Posts table, then joins to the Users table and then filters the results those with a DisplayName of John. I’ve also heard people say that the following logically equivalent query would be more efficient because it means less rows are joined together and those users we are not interested in are filtered out at the join stage:
SELECT *
FROM dbo.Posts p
JOIN dbo.Users u
ON p.OwnerUserId = u.Id AND
u.DisplayName = 'John';
So how can we prove which of these queries is more efficient?
Enter the Execution Plan…
We can prove which of these queries is more efficient by analysing what SQL Server actually did to return the results – what “route” did it take for each and which was more efficient?
To find this out, we look at the execution plan. To get the execution plan, we need to enable it within the SQL Server Management Studio window, this can be done by opening the Query menu and toggling on Include Actual Execution Plan:

Now when we run a query in this window, we see a new tab upon completion. Lets run the first of our two queries:
SELECT *
FROM dbo.Posts p
JOIN dbo.Users u
ON p.OwnerUserId = u.Id
WHERE u.DisplayName = 'John';
and we see the new tab:

If we select that tab, we see our execution plan:

The execution plan here is quite simple, I have no additional non-clustered indexes on my tables so we are limiting the number of potential “routes” the optimizer can take (more about indexes in another post)
The execution plan shows us how SQL Server chose to execute the query. Execution plans are read to right to left, top to bottom, though some people read them the other way round too. The top right node represents the first thing that SQL Server did.
So stepping through the plan, here is what SQL Server did
- Read the entire users table (Clustered Index Scan) and returned the Johns (this information is hidden in a tooltip which I have added to the screenshot)
- Read the entire posts table (Clustered Index Scan)
- Joined the results together using a hash match algorithm
Simply, that’s it. There is a whole host of other information we can look at and dissect and there are some numbers in the screenshot I haven’t mentioned but that will be for another time as this post intends to be a brief introduction.
So which is the most efficient way of writing the query above?
If I run the full batch:
SELECT *
FROM dbo.Posts p
JOIN dbo.Users u
ON p.OwnerUserId = u.Id
WHERE u.DisplayName = 'John'
SELECT *
FROM dbo.Posts p
JOIN dbo.Users u
ON p.OwnerUserId = u.Id AND
u.DisplayName = 'John'
and then look at the execution plan, we now see two execution plans – one per query:

One other thing we can see is that they are the same shape. For both queries, SQL Server has executed the query in the exact same way and so both queries are equivalent in terms of performance. OK, so there is probably more to such a comparison than just the plan shape but what we have proved here is that it didn’t matter how we wrote the query, SQL Server chose what it thought was the best way to execute both queries and it executed them in the same way.
Estimated Execution Plans vs Actual Execution Plan
One thing I have deliberately neglected to mention until now is that there are two types of execution plan – estimated execution plan and actual execution plan.
An estimated execution plan is the plan that SQL Server compiles as a result of the optimization I have mentioned and as the name suggests, it is based on estimations (such as estimating how many users are called John to decide whether it is worth filtering before or after the join)
The estimated execution plan can be opened from the query menu:

This isn’t a toggle like the Actual Execution plan, clicking it will bring up the estimated execution plan straight away and getting an estimated execution plan will not execute the query.
In contrast, the actual execution plan is the same shape as the estimated execution plan but it includes actual information that is only known once the query has run (so an actual execution plan requires the query to be executed) There are also some nuances in versions of SQL Server 2017+ with the introduction of adaptive joins that perform one join type or another based on runtime row thresholds but I’ll leave that aside for now.
Plan Caching
As query optimization can be an expensive process, SQL Server looks to do it as little as possible, therefore when a plan is compiled, it will be stored in the Plan Cache. The Plan Cache is a reserved space in SQL Server’s memory where it keeps these execution plans so it can re-use them and doesn’t have to recompile them each time a query is run. The first time a query is run, the plan is compiled and goes into the plan cache, the next time the query runs, the plan is retrieved from the cache and is the plan SQL Server will use to execute the query and doesn’t need to recompile it and perform all the estimation again.
The plan cache will be pruned if the server comes under memory pressure and is cleared on service restarts. Plans will also get “aged out” as the cache becomes full, removing the old and lesser used plans to make way for new ones
Conclusion
In this article we have had a brief introduction to what an execution plan is, why SQL Server and other database systems use them, when it builds them and how we as users might use them. There will be a follow up article that looks at them in more detail, covering the detail we have deliberately skipped over in order to keep this article as brief as possible
References / Further Reading
David J. DeWitt – SQL Query Optimization: Why is it so Hard to Get Right?
floy – Introduction to Postgres Query Planning
Microsoft – Query processing architecture guide
