As a software developer, all of us come across SQL queries and databases at least a few times in the development process. All of us have written multiple queries. Starting with the most simple select queries to very complex nested query statements and procedures. Wouldn’t it be exciting to know about how the SQL server goes about executing these queries? Also you might have noticed that sometimes our SQL query takes longer that expected to return the result? Why query is slow? The reason for this delay might be found out in the SQL Query Execution plans.
The SQL server surely must use some strategy in order to execute the queries and fetch data efficiently from the database. This task of interpreting the query and analyzing it is done by a component called Query Optimizer. This query optimizer takes the query, database schema (tables) and indexes as inputs and generates the Execution plan for the query.
What are SQL Query Execution plans?
A simple definition of an execution plan can be given as: The query optimizer's way to determine, efficient execution of the SQL statement presented before it, is known as an SQL query execution plan.
Almost all the doubts regarding the execution of the query, optimal usage of indexes, number of tables scans during a query execution can be solved by taking a look at the execution plan for the query. This blog is an attempt to understand the execution plans and know how they can be useful while writing a good query.
Before we go into the details of an execution plan, let's have a quick peek at how the execution plan looks like. There can off course be slight differences in how different SQL clients present them. But this is just to have an idea of what we intend to study:
The SQL statement goes through the following stages during execution:
Let us take a look at these phases in short,
Once the query is submitted to the SQL Server for the first time, it is checked whether the query properly formed. The syntax of the query is correct. This is known as parsing of the query.
Now, what points are considered while checking the syntax of the query? Points such as, resolving names of different tables, columns, objects within the query string, identifying datatypes for the columns and location of different functions like aggregate functions in the query.
Important point to note here is that this parsing is done only for DML queries. DML queries are Data Manipulation Logic queries. These queries are which perform addition, deletion, updation and selection operations on the data in the database. Thus there can be multiple ways to perform these operations effectively. In contrast to this, the DDL queries. These are the Data Definition Logic Queries. These queries concern with the creation deletion and updation of database or table structures. There is only one way to correctly create/update/delete tables or database structures and hence parsing of these queries is not necessary.
This process is known as query parsing, and the output for this process is a parse tree. The parse tree contains the logical steps as to how the query needs to be executed. This parse tree is passed on to the query optimizer.
As we are speaking of execution plans, this section is quite important for us. The generation and evaluation of an execution plan is done by the optimizer. The optimizer decides how to execute the given SQL statement efficiently. Depending upon application of joins, usage of indexes in the query and the cost to perform these operations (in terms of CPU processing and I/O cycles)the optimizer selects an efficient execution plan. Hence, it is also known as a cost-based plan.
The optimizer generates many execution plans and evaluates them against each other to select the most efficient one. It decides the efficiency of the plan depending upon minimum resource utilization(CPU and I/O) and how quickly it returns the result for the query.
The optimizer uses a CPU intensive program to evaluate these execution plans. Therefore if the SQL statement presented to the optimizer is simple then the optimizer generates a single trivial-plan. This execution plan might not be very quick to return results, but as compared to the resource usage for the evaluation of different plans for the query, this plan is then beneficial.
How does it optimizes?
Now, one might think, optimizer is just a program, how is it able to decide efficiency of the execution plan? For this purpose, there is a lot of data about the different indexes, columns and their frequency of occurrence in a SQL statement present in the SQL server. These are known as statistics. Every time a query is executed the data about the Where clause, the Joins, the Indexes is added to these statistics.
Depending upon the query tree received from the parser and these statistics the optimizer determines the execution time required for every step of the query tree. It can then evaluate the different possibilities of execution (execution plans). These possibilities can be by rearranging joins, testing different varieties of joins and indexes etc. to find out the efficient execution plan.
Each logical operation in the query tree is assigned an estimated amount of time. The sum total of the execution time of all the operations in a query then becomes the estimated cost of execution for the query.
An important thing to note here is, all the execution time calculated here is an estimate of the time that might be needed to perform an operation. Given an infinite amount of time and complete, up-to-date statistics, the optimizer would find the perfect plan for executing the query.
These generated plans are then stored in the memory of the server in the plan-cache. Before the plan is actually stored it is checked whether the identical plan already exists in the cache[read more in Execution plan reuse]. If an identical plan is found in the cache, that plan is used for execution and the newly generated plan is not stored.
Once the execution plan is generated, the query is actually executed, according to the plan. We shall learn about the details of execution later as it doesn’t concern our execution plans, except to note that the carefully generated execution plan may change during the actual execution process.
Why would this happen?
- The statistics used to generate the plan were old and out of date or they have changed from the time the optimizer generated the execution plan. Or
- Execution engine comes across that the query crosses the threshold for parallel execution.
The results of the query are returned depending upon the successful execution of the actual execution plan.
So far in the discussion we came across two terms: Estimated execution plan and Actual execution plan. Let us dig a bit deeper into why these two plans are required. Firstly, these can be defined as below:
Estimated query execution plans: Estimated plans provide an estimation of the work that SQL server is expected to perform to get the result.
Actual query execution plans: Actual execution plans are generated after the SQL queries are executed. Thus, an actual execution plan contains run-time information, such as the actual resource usage and run-time warnings if any.
Estimated and Actual SQL query execution plans
In the general scenario, estimated and actual execution plans do not tend to differ. But sometimes it may happen that the estimated and actual execution plans are different. Let's see some factors that may play a role in the generation of these differences in the execution plans.
As discussed in the Optimizer section above, the SQL server maintains statistics(metadata) for the SQL queries being executed. Also, these statistics are subject to change on every query execution. For example, when new data is added, or existing data is deleted or updated the values that represent the indexes in the metadata might change. Over a longer period of time the statistics may no longer represent the actual data appropriately.
Furthermore, if the server picks up the existing execution plan to work with, then the changed metadata might affect the query execution. Hence to optimize the solution, the actual execution plan comes into play. This plan modifies the existing execution in order to fit to the changed statistics and structures.
Not having run time data:
The execution plans are made before the actual queries are executed. Thus, there are some points where we create temporary tables or views in our query. But these are only created during the execution of the query. Hence, the estimated plans do not take them into account. Consequently generating a faulty estimation plans.
In such cases the actual execution plans are more helpful instead.
Reuse of an execution plan
As said in the discussion above, the optimizer stores the plan into the plan-cache after generation. But if there is already an actual plan present in the cache identical to the current one, then the current plan is not stored. So said, each plan is stored once in the memory of the server.
If parallelism is supported for a query, then there is a possibility to have more than one execution plans for that query. But these plans contain different set of operations that can be executed parallelly.
Thus we can say the execution plans are available for reuse. But there is a specific period of time after which the execution plans are removed from the memory. Each time a plan is stored it is associated with a concept of "Age". This is calculated depending upon the execution cost of the plan and the number of times it is used.
Age = (estimated execution cost of plan) * (No. of times plan used)
The estimated plan is removed from the memory when:
- There is more memory requirement
- The Age factor has reached 0
The memory management of the server scans all the caches including the plan-cache at regular intervals to perform the cleanup.
Fast track reading:
- The query optimizer's way to determine, efficient execution of the SQL statement presented before it, is known as an execution plan
- The SQL statement goes through the following stages during execution: Parsing, Optimizing, Execution
- Checking whether the query properly formed is known as parsing of the query
- Generation and evaluation of an execution plan is done by the query optimizer
- Efficiency of the plan is decided depending upon minimum resource utilization(CPU and I/O) and how quickly it returns the result for the query
- These generated plans are then stored in the memory of the server in the plan-cache
- Estimated plans provide an estimation of the work that SQL server is expected to perform to get the result.
- Actual execution plans are generated after the SQL queries are executed. Thus, an actual execution plan contains run-time information, such as the actual resource usage and runtime warnings if any.