A common scenario when querying tables is the need to find duplicate fields within the same table. To do this is simple, it requires utilizing the GROUP BY clause and counting the number of recurrences. For example, lets take a customers table. Within the customers table, we want to find all the records where the customers FirstNames are the same. We also want to find which FirstNames are the same and count them.
First off, let’s get a count of how many customers share the same first name:
SELECT FirstName
,DuplicateCount = COUNT(1)
FROM Customer
GROUP BY FirstName
HAVING COUNT(1) > 1 -- more than one value
ORDER BY COUNT(1) DESC -- sort by most duplicates
So here we see the results of the query sorted by the names which are duplicated the most. Using the having clause restricts the result set to only those customers that have duplicates. The order by clause orders the results by those who are duplicated the most.
This method can also be expanded to include multiple columns, like FirstName and LastName. In order to expand the criteria, we simply add the columns to the select list and the group by clause.
SELECT FirstName
,LastName
,DuplicateCount = COUNT(1)
FROM Customer
GROUP BY
FirstName
,LastName
HAVING COUNT(1) > 1 -- more than one value
ORDER BY COUNT(1) DESC -- sort by most duplicates
Ok, so now that we have found the duplicate items, how do we join that back on the main table so we can see the entire record? There are two methods that may perform differently depending on your result set so in this case I will include them both. Only the EXISTS method however can be used for multiple columns.
-- *********************************
-- * Find duplicates using IN
-- *********************************
SELECT *
FROM Customer
WHERE FirstName IN
(
SELECT FirstName
FROM Customer
GROUP BY
FirstName
HAVING COUNT(1) > 1 -- more than one value
)
ORDER BY FirstName
-- *********************************
-- * Find duplicates using EXISTS
-- *********************************
SELECT *
FROM Customer c1
WHERE EXISTS
(
SELECT 1
FROM Customer
WHERE FirstName = c1.FirstName
GROUP BY
FirstName
HAVING COUNT(1) > 1 -- more than one value
)
ORDER BY FirstName