There are many factors to speeding up stored procedures. Knowing these in advance can save you and your company many hours of headaches. Using these practices when writing procedures, you can turn your SQL Server into a fine-tuned machine.
Know your tables
The biggest way you can help SQL Server speed up your queries, is by helping it determine how to limit the result set as soon as possible. One of the biggest factors in this is in knowing the tables you are joining together.
This translates to:
- How many records are in your tables?
- What are the best ways to join your tables?
- What are the existing indexes on the tables?
Knowing the number of records in a table is very important. When joining tables using inner joins, you always want to join the smallest table first. Why you ask? Say you do not use a WHERE clause and you simply join three tables together. If you start your FROM clause with the largest table, then the first table SQL sees is a very large table, so it sets up an execution plan for a very large result set. Then let’s say the last of the 3 tables only has 10 records, by the time SQL visits that table it’s too late, the execution plan has been set to output 100,000 records, whereas if you specified the first table with 10 records after the FROM clause, the execution plan may be entirely different. This first table is known as the driver table.
There is a disclaimer to this. The SQL optimizer maintains statistics for all the tables, and despite the order of your joins, SQL can and will reverse these to achieve the best plan. If this is the case you say, then why join smallest first? Because the optimizer can only be so smart. You need to take the guess work away from the optimizer. I’ve seen many instances where the query got so complex that the optimizer simply utilized the join order.
From now on, you should consider writing queries using a top-down approach. Eliminate the most records by joining the smallest tables first. Also do the same in the where clause. Use the most limiting where clause statements as soon as you can.
Utilize indexes
One of the simplest ways to become a hero in your organization is to add indexes. While this is very simple, it is often overlooked. This index tutorial will help.
Find common queries
If you have a large procedure, it may be likely you also have the same query being executed multiple times. If this is the case, then instead of performing the same query multiple times, dump the results into a temp table and reference the temp table in the rest of your queries. This is critical for speed. There have been many queries I have optimized in the past that run the same query a few times in the stored procedure.
Always go SET based
This means, never use while loops or cursors. This should actually be #1 on the list, however I’m hoping you already know this. There should be no reason to use loops or cursors. It has been proven time and time again that everything can be solved using SET based methods. SET based just means a simple select statement.
Use Exists instead of JOIN
If you are doing an Inner Join to a table just to eliminate a result set, and the table you are joining is not contributing any columns to the select list, then you are creating a Worktable in the background and using extra IO resources. Use exists instead.
Avoid Distinct
The DISTINCT clause is basically a big group by. Well, it is slightly more optimal, however you get the point. Do not use this unless you can’t avoid it. If you do find yourself using it, then it is very likely you have a bad table design somewhere.
Limit the Select list
Returning too many columns can have a drastic effect on your query. Not only will it increase you chances for bookmark lookups (or key lookups), but the network and disk latency add to the query. Not to mention you will be squeezing more data into your buffer cache.
Index temp tables
Temp tables are treated just like permanent tables according to SQL. They can have indexes & statistics. The only downfall is that they often cause recompiles for the statement when the result sets differ. To counter this use table variables if you have to.
Learn execution plans
Bad times in execution include:
- table & clustered index scans
- large number of output rows – dictated by thick arrows
- parallelism
- Hash-Joins
- Sorts
- Key Lookups (bookmark lookups)
- Table spool – eager spool
Avoid poor performing techniques
CTE’s – Common table expressions
Instead of using CTE’s use temp tables. (Yeah I said it). They perform badly. They do have a good use, and that is recursion.
Table Variables
Yeah I said this too. I know some developers will give me flack for this, however I have rarely seen a table variable perform well. Yes, if you have under 1000 rows, consider it. Otherwise, I suggest #temp tables.