The DISTINCT keyword is often a question of debate. Many ask “how to do a distinct only on the first few columns of a select statement ?”. There is a fundamental flaw in this question, since DISTINCT is always applied to all columns in the SELECT. Let’s look at an example:
create table #JobHistory (
Title nvarchar(50),
FirstName nvarchar(50),
LastName nvarchar(50),
StartDate datetime,
EndDate datetime
)
Insert into #JobHistory (Title, FirstName, LastName, StartDate, EndDate)
Values
('Developer', 'Brian', 'Carter', '2002-02-02', '2004-04-04'),
('DBA', 'Brian', 'Carter', '2004-04-04', '2008-08-08'),
('Developer', 'Brian', 'Carter', '2008-08-08', '2010-10-10')
SELECT Title, FirstName, LastName,
CONVERT(VARCHAR(10), StartDate, 111) AS StartDate,
CONVERT(VARCHAR(10), EndDate, 111) AS EndDate
FROM #JobHistory
The result, from SQL Server:
Title FirstName LastName StartDate EndDate
Developer Brian Carter 2002/02/02 2004/04/04
DBA Brian Carter 2004/04/04 2008/08/08
Developer Brian Carter 2008/08/08 2010/10/10
Often, what is needed is a list of distinct titles for a given person.
Let’s say we need a list:
Title FirstName LastName
Developer Brian Carter
DBA Brian Carter
The query would look like:
SELECT DISTINCT Title, FirstName, LastName
FROM #JobHistory
As you can see, the distinct was applied to all columns in the select.
The same result can be achieved by using GROUP BY as well:
SELECT Title, FirstName, LastName
FROM #JobHistory
GROUP BY Title, FirstName, LastName
The issue I have with Group By, is all columns in the Select must be listed (or use aggregate function like MAX, MIN, SUM, …). This subject is very interesting. I found another good write up on Jeffs blog.