Sunday, March 11, 2012

Getting Unique entire row records based on one unique column..

Hi,

I like to get all records(all columns) for any employee who has more than 2 unique SuggestedTraining..As simple as possible and execution (performance optimized) speed is as fast as possible.

Data:

Table structure below

Assessment Table

AssessmentID EmployeeID CourseID SuggestedTraining Date ....more columns
1 4 1 'Training 1' <date>
2 2 2 'Training 2' <date>
3 1 4 'Training 1' <date>
4 2 1 'Training 2' <date>
5 4 2 'Training 1' <date>
6 3 3 '' <date>
7 2 5 'Training 3' <date>
8 5 7 'Training 2' <date>
9 7 4 'Training 2' <date>
10 5 3 'Training 1' <date>
11 5 6 'Training 3' <date>

Desired Result:

AssessmentID EmployeeID CourseID SuggestedTraining Date
4 2 1 'Training 2' <date>
7 2 5 'Training 3' <date>
8 5 7 'Training 2' <date>
10 5 3 'Training 1' <date>
11 5 6 'Training 3' <date>

Try the example below.

Chris

Code Snippet

DECLARE @.AssessmentTable TABLE

(

AssessmentID INT,

EmployeeID INT,

CourseID INT,

SuggestedTraining VARCHAR(20)

)

INSERT INTO @.AssessmentTable(AssessmentID, EmployeeID, CourseID, SuggestedTraining)

SELECT 1, 4, 1, 'Training 1' UNION ALL

SELECT 2, 2, 2, 'Training 2' UNION ALL

SELECT 3, 1, 4, 'Training 1' UNION ALL

SELECT 4, 2, 1, 'Training 2' UNION ALL

SELECT 5, 4, 2, 'Training 1' UNION ALL

SELECT 6, 3, 3, '' UNION ALL

SELECT 7, 2, 5, 'Training 3' UNION ALL

SELECT 8, 5, 7, 'Training 2' UNION ALL

SELECT 9, 7, 4, 'Training 2' UNION ALL

SELECT 10, 5, 3, 'Training 1' UNION ALL

SELECT 11, 5, 6, 'Training 3'

SELECT at.AssessmentID,

at.EmployeeID,

at.CourseID,

at.SuggestedTraining

FROM @.AssessmentTable at

INNER JOIN (SELECT atg.EmployeeID

FROM @.AssessmentTable atg

GROUP BY atg.EmployeeID

HAVING COUNT(DISTINCT atg.SuggestedTraining) >= 2) t

ON t.EmployeeID = at.EmployeeID

|||

Hi Chris,

I cannot do the insertion because simply I do not know the data at runtime,

Code Snippet

INSERT INTO @.AssessmentTable(AssessmentID, EmployeeID, CourseID, SuggestedTraining)

SELECT 1, 4, 1, 'Training 1' UNION ALL

SELECT 2, 2, 2, 'Training 2' UNION ALL

SELECT 3, 1, 4, 'Training 1' UNION ALL

SELECT 4, 2, 1, 'Training 2' UNION ALL

SELECT 5, 4, 2, 'Training 1' UNION ALL

SELECT 6, 3, 3, '' UNION ALL

SELECT 7, 2, 5, 'Training 3' UNION ALL

SELECT 8, 5, 7, 'Training 2' UNION ALL

SELECT 9, 7, 4, 'Training 2' UNION ALL

SELECT 10, 5, 3, 'Training 1' UNION ALL

SELECT 11, 5, 6, 'Training 3'

The desired result is to get records (all columns) of table based on Employees having more than 2 Suggested Training, empty Suggested Training and Duplicated Training are excluded...

|||

It is just sample data Chris offered to demonstrate his suggested solution.

Run all of the code as Chris posted to verify if the solution works for you.

According to your specifications, the Date is immaterial for the solution. If that is NOT correct, then refine your desired outcome so we can refine the suggested solutions.

|||

Hi,

My question how do I replace this part of script to be adaptable to actual data.

Code Snippet

SELECT 1, 4, 1, 'Training 1' UNION ALL

SELECT 2, 2, 2, 'Training 2' UNION ALL

SELECT 3, 1, 4, 'Training 1' UNION ALL

SELECT 4, 2, 1, 'Training 2' UNION ALL

SELECT 5, 4, 2, 'Training 1' UNION ALL

SELECT 6, 3, 3, '' UNION ALL

SELECT 7, 2, 5, 'Training 3' UNION ALL

SELECT 8, 5, 7, 'Training 2' UNION ALL

SELECT 9, 7, 4, 'Training 2' UNION ALL

SELECT 10, 5, 3, 'Training 1' UNION ALL

SELECT 11, 5, 6, 'Training 3'

Try to replace entire code it with this,

Code Snippet

DECLARE @.AssessmentTable TABLE

(

AssessmentID INT,

EmployeeID INT,

CatalogueCourseID INT,

SuggestedTraining nVARCHAR(500),

CompetencyModelID int

)

Insert Into @.AssessmentTable(AssessmentID, EmployeeID, CatalogueCourseID, SuggestedTraining, CompetencyModelID)

Select AssessmentID, EmployeeID, CatalogueCourseID, SuggestedTraining, CompetencyModelID From Assessment

Where IsActive =1 AND SuggestedTraining <> ''

SELECT at.AssessmentID,at.EmployeeID,at.CatalogueCourseID,at.SuggestedTraining FROM @.AssessmentTable at

INNER JOIN (SELECT atg.EmployeeID FROM @.AssessmentTable atg GROUP BY atg.EmployeeID HAVING COUNT(DISTINCT atg.SuggestedTraining) >= 2) t

ON t.EmployeeID = at.EmployeeID

Actual Result:

AssessmentID EmployeeID CatalogueCourseID SuggestedTraining
9 9 8 Training Course 2
30 9 9 Training Course 3
42 9 9 Training Course 3 << Duplicate
43 9 7 Training Course 1
47 9 7 Training Course 1 << Duplicate
54 9 5 Training
48 12 7 Training Course 1
40 12 7 Training Course 1 << Duplicate
46 12 9 Training Course 3
..... More data below....

Still Desired Result is not achieved...

|||

As Arnie pointed out in a response to your previous post, you only need to use the following:

Code Snippet

SELECT at.AssessmentID,

at.EmployeeID,

at.CourseID,

at.SuggestedTraining

FROM @.AssessmentTable at

INNER JOIN (SELECT atg.EmployeeID

FROM @.AssessmentTable atg

GROUP BY atg.EmployeeID

HAVING COUNT(DISTINCT atg.SuggestedTraining) >= 2) t

ON t.EmployeeID = at.EmployeeID

The CREATE TABLE / INSERT statements were included in my original post solely to show that the concept works. All you need to do is take the above sample and replace '@.AssessmentTable' with your actual table name.

Chris

|||

Hi Chris,

I tried this code slightly modified it to exclude empty SuggestedTraining, and still there are duplicate Suggested Training in the results. How can I remove duplicated SuggestedTraining in the result set. Thanks.

Code Snippet

SELECT at.AssessmentID, at.EmployeeID, at.CatalogueCourseID, at.SuggestedTraining FROM Assessment at

INNER JOIN (SELECT atg.EmployeeID FROM Assessment atg Where SuggestedTraining <> ''

GROUP BY atg.EmployeeID HAVING COUNT(DISTINCT atg.SuggestedTraining) >= 2) t

ON t.EmployeeID = at.EmployeeID

|||

For the duplicates, which AssessmentID would you want to return - the maximum or the minimum? [therein lies a clue to the solution... Wink ]

Chris

|||

Well,

Maximum is the choice, right now date is not considered, how would the sql statement would be if both maximum and date (latest) is considered?

|||

Try the code below.

Chris

Code Snippet

SELECT MAX(at.AssessmentID),

at.EmployeeID,

at.CourseID,

at.SuggestedTraining

--, MAX(at.DateColumn)

FROM @.AssessmentTable at

INNER JOIN (

SELECT atg.EmployeeID

FROM @.AssessmentTable atg

GROUP BY atg.EmployeeID

HAVING COUNT(DISTINCT atg.SuggestedTraining) >= 2

) t ON t.EmployeeID = at.EmployeeID

GROUP BY at.EmployeeID,

at.CourseID,

at.SuggestedTraining

|||

I run the sql script you provided and here is the result:

AssessmentID EmployeeID CourseID SuggestedTraining

26 4 8 Training Course 1
8 4 8 Training Course 2
58 8 NULL <<-- Should Not be included
6 8 16 Training Course 1 <<-- Should Not be included
57 9 NULL <<-- Should Not be included
54 9 5 Training
47 9 7 Training Course 1
9 9 8 Training Course 2
42 9 9 Training Course 3
55 12 NULL <<-- Should Not be included
1 12 3 Training Course 2
48 12 7 Training Course 1
46 12 9 Training Course 3
22 13 NULL <<-- Should Not be included
13 13 NULL TEST TRAIN
41 13 7 Training Course 1
31 13 8 Training Course 2
45 13 9 Training Course 3
16 26 NULL <<-- Should Not be included
18 26 7 Training Course 1 <<-- Should Not be included
53 82 7 Training Course 1
38 82 16 Test Training Course
28 150 7 Training Course 1
27 150 9 Training Course 3
33 179 3 Training Course 2
37 179 16 Test Training Course
36 180 NULL <<-- Should Not be included
35 180 8 Training Course 2 <<-- Should Not be included

I filter the empty Suggested Training using this modified sql script:

Code Snippet

SELECT MAX(at.AssessmentID), at.EmployeeID, at.CatalogueCourseID, at.SuggestedTraining FROM Assessment at

INNER JOIN (SELECT atg.EmployeeID FROM Assessment atg GROUP BY atg.EmployeeID HAVING COUNT

(DISTINCT atg.SuggestedTraining) >= 2) t ON t.EmployeeID = at.EmployeeID Where at.SuggestedTraining <> ''

GROUP BY at.EmployeeID, at.CatalogueCourseID,at.SuggestedTraining

And still did not get desired result...

AssessmentID EmployeeID CourseID SuggestedTraining

26 4 8 Training Course 1
8 4 8 Training Course 2
6 8 16 Training Course 1 <<-- Should Not be included
54 9 5 Training
47 9 7 Training Course 1
9 9 8 Training Course 2
42 9 9 Training Course 3
1 12 3 Training Course 2
48 12 7 Training Course 1
46 12 9 Training Course 3
13 13 NULL TEST TRAIN
41 13 7 Training Course 1
31 13 8 Training Course 2
45 13 9 Training Course 3
18 26 7 Training Course 1 <<-- Should Not be included
53 82 7 Training Course 1
38 82 16 Test Training Course
28 150 7 Training Course 1
27 150 9 Training Course 3
33 179 3 Training Course 2
37 179 16 Test Training Course
35 180 8 Training Course 2 <<-- Should Not be included

Here is Whole Data for Assessment table:

AssessmentID EmployeeID CourseID SuggestedTraining

1 12 3 Training Course 2
3 59 7 Training Course 1
6 8 16 Training Course 1
7 10 3
8 4 8 Training Course 2
9 9 8 Training Course 2
10 132 7 Training Course 1
11 25 NULL Training Course 1
12 12 7 Training Course 1
13 13 NULL TEST TRAIN
14 54 7 Training Course 1
15 28 NULL
16 26 NULL
17 29 NULL
18 26 7 Training Course 1
19 6 NULL
20 84 7 Training Course 1
21 139 7 Training Course 1
22 13 NULL
23 82 7 Training Course 1
24 127 7 Training Course 1
25 64 7 Training Course 1
26 4 8 Training Course 1
27 150 9 Training Course 3
28 150 7 Training Course 1
29 12 7 Training Course 1
30 9 9 Training Course 3
31 13 8 Training Course 2
32 177 NULL
33 179 3 Training Course 2
34 180 8 Training Course 2
35 180 8 Training Course 2
36 180 NULL
37 179 16 Test Training Course
38 82 16 Test Training Course
39 9 7 Training Course 1
40 12 7 Training Course 1
41 13 7 Training Course 1
42 9 9 Training Course 3
43 9 7 Training Course 1
44 13 9 Training Course 3
45 13 9 Training Course 3
46 12 9 Training Course 3
47 9 7 Training Course 1
48 12 7 Training Course 1
49 87 7 Training Course 1
50 9 NULL
51 72 7 Training Course 1
52 83 9 Training Course 1
53 82 7 Training Course 1
54 9 5 Training
55 12 NULL
56 7 NULL
57 9 NULL
58 8 NULL

|||

I've re-arranged the filters in the query to filter out the rows that you don't need - see below. This now returns the results as specified in your previous post.

Chris

Code Snippet

DECLARE @.AssessmentTable TABLE

(

AssessmentID INT,

EmployeeID INT,

CourseID INT,

SuggestedTraining VARCHAR(20)

)

INSERT INTO @.AssessmentTable (AssessmentID, EmployeeID, CourseID, SuggestedTraining)

SELECT 1, 12, 3, 'Training Course 2' UNION ALL

SELECT 3, 59, 7, 'Training Course 1' UNION ALL

SELECT 6, 8, 16, 'Training Course 1' UNION ALL

SELECT 7, 10, 3, '' UNION ALL

SELECT 8, 4, 8, 'Training Course 2' UNION ALL

SELECT 9, 9, 8, 'Training Course 2' UNION ALL

SELECT 10, 132, 7, 'Training Course 1' UNION ALL

SELECT 11, 25, NULL, 'Training Course 1' UNION ALL

SELECT 12, 12, 7, 'Training Course 1' UNION ALL

SELECT 13, 13, NULL, 'TEST TRAIN' UNION ALL

SELECT 14, 54, 7, 'Training Course 1' UNION ALL

SELECT 15, 28, NULL, '' UNION ALL

SELECT 16, 26, NULL, '' UNION ALL

SELECT 17, 29, NULL, '' UNION ALL

SELECT 18, 26, 7, 'Training Course 1' UNION ALL

SELECT 19, 6, NULL, '' UNION ALL

SELECT 20, 84, 7, 'Training Course 1' UNION ALL

SELECT 21, 139, 7, 'Training Course 1' UNION ALL

SELECT 22, 13, NULL, '' UNION ALL

SELECT 23, 82, 7, 'Training Course 1' UNION ALL

SELECT 24, 127, 7, 'Training Course 1' UNION ALL

SELECT 25, 64, 7, 'Training Course 1' UNION ALL

SELECT 26, 4, 8, 'Training Course 1' UNION ALL

SELECT 27, 150, 9, 'Training Course 3' UNION ALL

SELECT 28, 150, 7, 'Training Course 1' UNION ALL

SELECT 29, 12, 7, 'Training Course 1' UNION ALL

SELECT 30, 9, 9, 'Training Course 3' UNION ALL

SELECT 31, 13, 8, 'Training Course 2' UNION ALL

SELECT 32, 177, NULL, '' UNION ALL

SELECT 33, 179, 3, 'Training Course 2' UNION ALL

SELECT 34, 180, 8, 'Training Course 2' UNION ALL

SELECT 35, 180, 8, 'Training Course 2' UNION ALL

SELECT 36, 180, NULL, '' UNION ALL

SELECT 37, 179, 16, 'Test Training Course' UNION ALL

SELECT 38, 82, 16, 'Test Training Course' UNION ALL

SELECT 39, 9, 7, 'Training Course 1' UNION ALL

SELECT 40, 12, 7, 'Training Course 1' UNION ALL

SELECT 41, 13, 7, 'Training Course 1' UNION ALL

SELECT 42, 9, 9, 'Training Course 3' UNION ALL

SELECT 43, 9, 7, 'Training Course 1' UNION ALL

SELECT 44, 13, 9, 'Training Course 3' UNION ALL

SELECT 45, 13, 9, 'Training Course 3' UNION ALL

SELECT 46, 12, 9, 'Training Course 3' UNION ALL

SELECT 47, 9, 7, 'Training Course 1' UNION ALL

SELECT 48, 12, 7, 'Training Course 1' UNION ALL

SELECT 49, 87, 7, 'Training Course 1' UNION ALL

SELECT 50, 9, NULL, '' UNION ALL

SELECT 51, 72, 7, 'Training Course 1' UNION ALL

SELECT 52, 83, 9, 'Training Course 1' UNION ALL

SELECT 53, 82, 7, 'Training Course 1' UNION ALL

SELECT 54, 9, 5, 'Training' UNION ALL

SELECT 55, 12, NULL, '' UNION ALL

SELECT 56, 7, NULL, '' UNION ALL

SELECT 57, 9, NULL, '' UNION ALL

SELECT 58, 8, NULL, ''

SELECT MAX(at.AssessmentID) AS AssessmentID,

at.EmployeeID,

at.CourseID,

at.SuggestedTraining

--, MAX(at.DateColumn)

FROM @.AssessmentTable at

INNER JOIN (

SELECT atg.EmployeeID

FROM @.AssessmentTable atg

WHERE atg.SuggestedTraining <> ''

GROUP BY atg.EmployeeID

HAVING COUNT(DISTINCT atg.SuggestedTraining) >= 2

) t ON t.EmployeeID = at.EmployeeID

WHERE at.SuggestedTraining <> ''

GROUP BY at.EmployeeID,

at.CourseID,

at.SuggestedTraining

ORDER BY 1

|||Thanks Chris, right now, based on data I have, this problem is resolved, when another problem arises...I post it here...I have marked your last reply as answer...

No comments:

Post a Comment