Brian's Blog

items I see across my tribes

TSQL Formatter

July 22
by briancarter 22. July 2010 07:51

I’ve tried a few add-ins to format my SQL.  There are many online formatters (beautifiers) available – but I don’t use them because who knows what they are doing with your procs.

I’ve been using SQL PRetty and like the addon: http://www.dpriver.com/products/sqlpp/ssms_index.php

Categories: Development

SQL Server UDF for Random Number

July 22
by briancarter 22. July 2010 07:37

Can’t use RAND() in function.  Combine a view + UDF to make it work.  See attached SQL Script.

SQLRandomNumberUDF.sql (2.96 kb)

Categories: Development

How to Write Optimized Stored Procedures

July 22
by briancarter 22. July 2010 07:15

There are many factors to speeding up stored procedures. Knowing these in advance can save you and your company many hours of headaches. Using these practices when writing procedures, you can turn your SQL Server into a fine-tuned machine.

Know your tables

The biggest way you can help SQL Server speed up your queries, is by helping it determine how to limit the result set as soon as possible. One of the biggest factors in this is in knowing the tables you are joining together.

This translates to:

  1. How many records are in your tables?
  2. What are the best ways to join your tables?
  3. What are the existing indexes on the tables?

Knowing the number of records in a table is very important. When joining tables using inner joins, you always want to join the smallest table first. Why you ask? Say you do not use a WHERE clause and you simply join three tables together. If you start your FROM clause with the largest table, then the first table SQL sees is a very large table, so it sets up an execution plan for a very large result set. Then let’s say the last of the 3 tables only has 10 records, by the time SQL visits that table it’s too late, the execution plan has been set to output 100,000 records, whereas if you specified the first table with 10 records after the FROM clause, the execution plan may be entirely different. This first table is known as the driver table.

There is a disclaimer to this. The SQL optimizer maintains statistics for all the tables, and despite the order of your joins, SQL can and will reverse these to achieve the best plan. If this is the case you say, then why join smallest first? Because the optimizer can only be so smart. You need to take the guess work away from the optimizer. I’ve seen many instances where the query got so complex that the optimizer simply utilized the join order.

From now on, you should consider writing queries using a top-down approach. Eliminate the most records by joining the smallest tables first. Also do the same in the where clause. Use the most limiting where clause statements as soon as you can.

Utilize indexes

One of the simplest ways to become a hero in your organization is to add indexes. While this is very simple, it is often overlooked. This index tutorial will help.

Find common queries

If you have a large procedure, it may be likely you also have the same query being executed multiple times. If this is the case, then instead of performing the same query multiple times, dump the results into a temp table and reference the temp table in the rest of your queries. This is critical for speed. There have been many queries I have optimized in the past that run the same query a few times in the stored procedure.

Always go SET based

This means, never use while loops or cursors. This should actually be #1 on the list, however I’m hoping you already know this. There should be no reason to use loops or cursors. It has been proven time and time again that everything can be solved using SET based methods. SET based just means a simple select statement.

Use Exists instead of JOIN

If you are doing an Inner Join to a table just to eliminate a result set, and the table you are joining is not contributing any columns to the select list, then you are creating a Worktable in the background and using extra IO resources. Use exists instead.

Avoid Distinct

The DISTINCT clause is basically a big group by. Well, it is slightly more optimal, however you get the point. Do not use this unless you can’t avoid it. If you do find yourself using it, then it is very likely you have a bad table design somewhere.

Limit the Select list

Returning too many columns can have a drastic effect on your query. Not only will it increase you chances for bookmark lookups (or key lookups), but the network and disk latency add to the query. Not to mention you will be squeezing more data into your buffer cache.

Index temp tables

Temp tables are treated just like permanent tables according to SQL. They can have indexes & statistics. The only downfall is that they often cause recompiles for the statement when the result sets differ. To counter this use table variables if you have to.

Learn execution plans

Bad times in execution include:

  • table & clustered index scans
  • large number of output rows – dictated by thick arrows
  • parallelism
  • Hash-Joins
  • Sorts
  • Key Lookups (bookmark lookups)
  • Table spool – eager spool

Avoid poor performing techniques

CTE’s – Common table expressions
Instead of using CTE’s use temp tables. (Yeah I said it). They perform badly. They do have a good use, and that is recursion.

Table Variables
Yeah I said this too. I know some developers will give me flack for this, however I have rarely seen a table variable perform well. Yes, if you have under 1000 rows, consider it. Otherwise, I suggest #temp tables.

Categories: Development

SQL Server Cast

July 22
by briancarter 22. July 2010 07:09

The SQL Server cast function is the easiest data type conversion function to be used. It takes only one parameter followed by the AS clause to convert a specified value.

A quick example is the following:

SELECT
    UserID_String = CAST(UserID AS varchar(50))
FROM dbo.User

Categories: Development

Find Duplicates in a Table

July 22
by briancarter 22. July 2010 07:07

A common scenario when querying tables is the need to find duplicate fields within the same table. To do this is simple, it requires utilizing the GROUP BY clause and counting the number of recurrences. For example, lets take a customers table. Within the customers table, we want to find all the records where the customers FirstNames are the same. We also want to find which FirstNames are the same and count them.

First off, let’s get a count of how many customers share the same first name:

SELECT FirstName
    ,DuplicateCount = COUNT(1)
FROM Customer
GROUP BY FirstName
HAVING COUNT(1) > 1  -- more than one value
ORDER BY COUNT(1) DESC -- sort by most duplicates

So here we see the results of the query sorted by the names which are duplicated the most. Using the having clause restricts the result set to only those customers that have duplicates. The order by clause orders the results by those who are duplicated the most.

This method can also be expanded to include multiple columns, like FirstName and LastName. In order to expand the criteria, we simply add the columns to the select list and the group by clause.

SELECT FirstName
    ,LastName
    ,DuplicateCount = COUNT(1)
FROM Customer
GROUP BY
    FirstName
    ,LastName
HAVING COUNT(1) > 1  -- more than one value
ORDER BY COUNT(1) DESC -- sort by most duplicates

Ok, so now that we have found the duplicate items, how do we join that back on the main table so we can see the entire record? There are two methods that may perform differently depending on your result set so in this case I will include them both. Only the EXISTS method however can be used for multiple columns.

-- *********************************
-- * Find duplicates using IN
-- *********************************
SELECT *
FROM Customer
WHERE FirstName IN
(
    SELECT FirstName
    FROM Customer
    GROUP BY
        FirstName
    HAVING COUNT(1) > 1  -- more than one value
)
ORDER BY FirstName
 
-- *********************************
-- * Find duplicates using EXISTS
-- *********************************
SELECT *
FROM Customer c1
WHERE EXISTS
(
    SELECT 1
    FROM Customer
    WHERE FirstName = c1.FirstName
    GROUP BY
        FirstName
    HAVING COUNT(1) > 1  -- more than one value
)
ORDER BY FirstName

Categories: Development

SQL Server For Each Row Next

July 22
by briancarter 22. July 2010 07:00

It is difficult for me to write this particular article and I’ll tell you why. If you don’t care then just skip down to the example, but here goes anyway. It is very rare that you should have to perform looping in SQL. There are certain situations that do require it, and they are mostly back-end or maintenance related.

Ok, now that I got that off my chest, here you go. In order to perform an execution which iterates performing a “for each row”, this is the easiest way for me:

SELECT RowNum = Row_number() OVER(ORDER BY AttributeID), *
INTO   #Attributes
FROM   Attribute

DECLARE @MaxRownum INT
SET @MaxRownum = (SELECT MAX(RowNum)
                  FROM   #Attributes)

DECLARE @Iter INT
SET @Iter = (SELECT MIN(RowNum)
             FROM   #Attributes)

WHILE @Iter <= @MaxRownum
  BEGIN
      SELECT *
      FROM   #Attributes
      WHERE  RowNum = @Iter

      -- run your operation here
      SET @Iter = @Iter + 1
  END

DROP TABLE #Attributes 

Once again, take heed, even though this is how to loop, I encourage set-based operations whenever possible.  This does save you from deploying a console app when a simple iteration is needed.

Categories: Development

Rownum in SQL Server

July 22
by briancarter 22. July 2010 06:54

Prior to SQL Server 2005, there was no inherent function to generate a rownumber within a row. If you are on SQL 2005+, then you will utilize the following function:

-- SQL 2005+
SELECT
    RowNumber = ROW_NUMBER() OVER (ORDER BY a.AttributeID ASC)
    ,a.*
FROM Attribute a

This will return a column with 'RowNumber’ as the first column. 

Categories: Development

Using Coalesce

July 22
by briancarter 22. July 2010 06:50

The coalesce function is used to find the first non-null value. The function takes limitless number of parameters in order to evaluate the first non null. If all the parameters are null, then COALESCE will also return a NULL value.

-- hard coded example
SELECT MyValue = COALESCE(NULL, NULL, 'abc', 123)

The example above returns back ‘abc’ as it is the first non null value.

When would you use COALESCE you ask? Well, the most common scenario I use it in is when I am joining two or more tables together and the tables all contain an acceptable value for, say, firstname. However if firstname is null, in the first table, we will want to use it from the second table, and so forth.

SELECT FirstName = COALESCE (a.FirstName, b.First_Name, c.Fname)
FROM HRUsers a
LEFT JOIN MarketingUsers b
ON b.EmployeeID = a.EmployeeID
LEFT JOIN SalesUsers c
ON c.EmployeeID = a.EmployeeID

This example basically says, if FirstName is populated in the HRUsers table, we want to use that one first, otherwise, use the First_Name field in MarketingUsers, or if that is null, take the FName from SalesUsers.

It should be noted that if only comparing two values, the ISNULL function has been proven to be quicker.

Categories: Development

Insert From Select

July 22
by briancarter 22. July 2010 06:47

The method of inserting records from one table to another differs depending on whether or not the target table already exists. Neither method is difficult, however one method requires more coding.

Insert where target table does not exist

If it does not, then you are in luck, there is a method that could not be simpler using SELECT..INTO. Select into automatically creates a table based on the result of a select statement. This select statement can be as simple or complex as you like. The only think you need to do, is add the line: INTO TABLENAME right above the FROM statement. Here goes:

-- Copy all the records into a new table
SELECT *
INTO CustomerCopy
FROM Sales.Customer
 
-- View the table
SELECT *
FROM CustomerCopy

It should be noted that this method does not create any keys that may be existing from the old table to the new table.

Insert where target table does exist

It’s not that this method is more difficult, it just requires more code. To do this, you need to use the INSERT INTO statement. When using this method, you need to specify the column names of the target table you want to insert, and the column names of the source table you are inserting from. Here is an example:

INSERT INTO [AdventureWorksLT2008].[SalesLT].[Customer]
(
    [NameStyle]
    ,[Title]
    ,[FirstName]
    ,[MiddleName]
    ,[LastName]
    ,[Suffix]
    ,[CompanyName]
    ,[SalesPerson]
    ,[EmailAddress]
    ,[Phone]
    ,[PasswordHash]
    ,[PasswordSalt]
    ,[ModifiedDate]
)
 
SELECT
    [NameStyle]
    ,[Title]
    ,[FirstName]
    ,[MiddleName]
    ,[LastName]
    ,[Suffix]
    ,[CompanyName]
    ,[SalesPerson]
    ,[EmailAddress]
    ,[Phone]
    ,[PasswordHash]
    ,[PasswordSalt]
    ,[ModifiedDate]
  FROM [AdventureWorksLT2008].[SalesLT].[Customer2]

The brackets surrounding the column names above are optional. If you are using SQL Server Management Studio (SSMS), the easiest way to do this is to right click on the table name, choose ‘Script Table As’ -> ‘Create To’ -> ‘New Query Window’. This will automatically generate the insert code for all the column names to insert into. The only thing you have to do is fill in your select statement.

Categories: Development

List All Tables in a Database

July 22
by briancarter 22. July 2010 06:44

To list all databases in a SQL Server instance, run the following:

SELECT *
FROM sys.sysdatabases

There are a few methods for listing all the tables in a database. Some methods will be able to show more specifics than others due to the capabilities inherent in the tables being joined. I’ll show you the simplest way first which is probably the most supported way.

SELECT TABLE_SCHEMA + '.' + TABLE_NAME, *
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
ORDER BY TABLE_SCHEMA + '.' + TABLE_NAME

This method will it show you all the tables, and it will also display the rowcount and datasize in Megabytes.

SELECT *
FROM (
    SELECT
        TableName   = t.TABLE_SCHEMA + '.' + t.TABLE_NAME
        ,[RowCount] = SUM(sp.[rows])
        ,Megabytes  = (8 * SUM(CASE WHEN sau.type <> 1 THEN sau.used_pages 
WHEN sp.index_id < 2 THEN sau.data_pages ELSE 0 END)) / 1024 FROM INFORMATION_SCHEMA.TABLES t JOIN sys.partitions sp ON sp.object_id = OBJECT_ID(t.TABLE_SCHEMA + '.' + t.TABLE_NAME) JOIN sys.allocation_units sau ON sau.container_id = sp.partition_id WHERE TABLE_TYPE = 'BASE TABLE' GROUP BY t.TABLE_SCHEMA + '.' + t.TABLE_NAME ) A ORDER BY TableName

This method uses the INFORMATION_SCHEMA system view and joins against the sys.partitions view in order to get the counts. The megabytes calculation multiples the pages by the page size, then divides by 1024 in order to get the megabytes calculation.

Categories: Development


 Questions or Feedback, my contact information is located on my About page.


The opinions, thoughts, and comments made in these blog posts are solely my own (unless otherwise stated). They do not reflect the opinions, thoughts or practices of my employer, my universities, my family, or anyone else. Also, I retain the right to change my mind about anything I publish here without having to go back and edit posts that occurred in the past. 

These are my opinions, or just as likely, someone else's opinions that I leveraged for my own.