Joshua Anderson: Distinct Values based on a single column

Distinct is a great tool! However, sometimes I find myself trying to retrieve records that are distinct in repsect to a single column. consider the folowing.

SELECT c1 , c2 
FROM (
SELECT 1 c1,'a' c2
UNION ALL
SELECT 1,'d'
UNION ALL
SELECT 2,'b'
UNION ALL
SELECT 3,'c'
UNION ALL
SELECT 4,'e'
UNION ALL
SELECT 4,'f'
) q1

This will return the rowset

c1 c2  
1  a
1  d
2  b
3  c
4  f
4  e

Well what if I only wanted to return distinct values based on c1 so that my result set looks like this:

c1 c2   
1  d
2  b
3  c
4  f

Distinct wont Quite get the job done in that case since even though c1 has repeated values the other rows in the data set make the row as a whole unique.

Review the following

SELECT c1 , c2 , ROW_NUMBER() over (PARTITION by c1 order by RAND() ) RowNum
FROM (
SELECT 1 c1,'a' c2
UNION ALL
SELECT 1,'d'
UNION ALL
SELECT 2,'b'
UNION ALL
SELECT 3,'c'
UNION ALL
SELECT 4,'e'
UNION ALL
SELECT 4,'f'
) q1

In this case we are setting up a rownumber for each row. The parition by statement makes the number specific to column specified. this will return the following result set

c1 c2 RowNum
1  d  1
1  a  2
2  b  1
3  c  1
4  f  1
4  e  2

so as you can see subsequent rows that have duplicate c1 values have a value for rownum greater than 1. If we wrap this in a outerquery such as

select * from (

SELECT c1 , c2 , ROW_NUMBER() over (PARTITION by c1 order by RAND() ) RowNum
FROM (
SELECT 1 c1,'a' c2
UNION ALL
SELECT 1,'d'
UNION ALL
SELECT 2,'b'
UNION ALL
SELECT 3,'c'
UNION ALL
SELECT 4,'e'
UNION ALL
SELECT 4,'f'
) q1
) q2 where RowNum <2

we get this :

c1 c2 RowNum
1  d  1
2  b  1
3  c  1
4  f  1

which is exactly what we were looking for!

Joshua Anderson

Thursday, May 24, 2012

Distinct Values based on a single column

No comments:

Post a Comment