Brian's Blog

items I see across my tribes

Find Duplicates in a Table

July 22
by briancarter 22. July 2010 07:07

A common scenario when querying tables is the need to find duplicate fields within the same table. To do this is simple, it requires utilizing the GROUP BY clause and counting the number of recurrences. For example, lets take a customers table. Within the customers table, we want to find all the records where the customers FirstNames are the same. We also want to find which FirstNames are the same and count them.

First off, let’s get a count of how many customers share the same first name:

SELECT FirstName
    ,DuplicateCount = COUNT(1)
FROM Customer
GROUP BY FirstName
HAVING COUNT(1) > 1  -- more than one value
ORDER BY COUNT(1) DESC -- sort by most duplicates

So here we see the results of the query sorted by the names which are duplicated the most. Using the having clause restricts the result set to only those customers that have duplicates. The order by clause orders the results by those who are duplicated the most.

This method can also be expanded to include multiple columns, like FirstName and LastName. In order to expand the criteria, we simply add the columns to the select list and the group by clause.

SELECT FirstName
    ,LastName
    ,DuplicateCount = COUNT(1)
FROM Customer
GROUP BY
    FirstName
    ,LastName
HAVING COUNT(1) > 1  -- more than one value
ORDER BY COUNT(1) DESC -- sort by most duplicates

Ok, so now that we have found the duplicate items, how do we join that back on the main table so we can see the entire record? There are two methods that may perform differently depending on your result set so in this case I will include them both. Only the EXISTS method however can be used for multiple columns.

-- *********************************
-- * Find duplicates using IN
-- *********************************
SELECT *
FROM Customer
WHERE FirstName IN
(
    SELECT FirstName
    FROM Customer
    GROUP BY
        FirstName
    HAVING COUNT(1) > 1  -- more than one value
)
ORDER BY FirstName
 
-- *********************************
-- * Find duplicates using EXISTS
-- *********************************
SELECT *
FROM Customer c1
WHERE EXISTS
(
    SELECT 1
    FROM Customer
    WHERE FirstName = c1.FirstName
    GROUP BY
        FirstName
    HAVING COUNT(1) > 1  -- more than one value
)
ORDER BY FirstName

Categories: Development


 Questions or Feedback, my contact information is located on my About page.


The opinions, thoughts, and comments made in these blog posts are solely my own (unless otherwise stated). They do not reflect the opinions, thoughts or practices of my employer, my universities, my family, or anyone else. Also, I retain the right to change my mind about anything I publish here without having to go back and edit posts that occurred in the past. 

These are my opinions, or just as likely, someone else's opinions that I leveraged for my own.