Showing posts with label columns. Show all posts
Showing posts with label columns. Show all posts

Tuesday, March 20, 2012

[SSIS] : STDEV in Derived Columns

Hello,

Does anyone have already tried to calculate a standard deviation (STDEV) in a derived column ?

Any help is welcome ;-)

Cheers,

Bertrand

Bertrandr wrote:

Hello,

Does anyone have already tried to calculate a standard deviation (STDEV) in a derived column ?

Any help is welcome ;-)

Cheers,

Bertrand

Have you looked into the documentation?

http://msdn2.microsoft.com/en-us/library/ms141671.aspx

it looks to me thta function is not available...you may want to do that in the DB or create a script component.

|||

Haven't calculated standard deviation in a derived column, but rather in an Async script component (i.e. Synchronous Input Id = 0 on the first output). The data being passed in to this transform is DT_CY data type (currency) called TotalDue. The data passed out is a DT_R8 on the output called StdDev.

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain

Inherits UserComponent

Private rowCount As Integer = 0

Private columnSum As Double = 0

Private columnSumOfSquares As Double = 0

Public Overrides Sub InputMain_ProcessInput(ByVal Buffer As InputMainBuffer)

While Buffer.NextRow()

InputMain_ProcessInputRow(Buffer)

End While

If Buffer.EndOfRowset Then

With OutputMainBuffer

.AddRow()

.StdDev = StdDev(rowCount, columnSum, columnSumOfSquares)

End With

OutputMainBuffer.SetEndOfRowset()

End If

End Sub

Public Overrides Sub InputMain_ProcessInputRow(ByVal Row As InputMainBuffer)

rowCount = rowCount + 1

columnSum = columnSum + Row.TotalDue

columnSumOfSquares = columnSumOfSquares + Math.Pow(Row.TotalDue, 2)

End Sub

Private Function StdDev(ByVal Rows As Integer, ByVal sum As Double, ByVal sumOfSquares As Double) As Double

Dim topSumVal As Double = (rowCount * columnSumOfSquares) - (Math.Pow(columnSum, 2))

Return Math.Sqrt(topSumVal / (rowCount * (rowCount - 1)))

End Function

End Class

|||Bertrand,
There has been a feature suggestion posted for SSIS with respect to statistics. Please visit the link and vote. Please add your comments as well.

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=253840

Thanks,
Phil|||

Phil Brammer wrote:

Bertrand,
There has been a feature suggestion posted for SSIS with respect to statistics. Please visit the link and vote. Please add your comments as well.

https://connect.microsoft.com/feedback/viewfeedback.aspx?FeedbackID=253840

Thanks,
Phil

Phil,

Are you sure that's the right link? it takes me to 'Page Not Found'.

-Jamie

|||

Jamie Thomson wrote:

Phil Brammer wrote:

Bertrand,
There has been a feature suggestion posted for SSIS with respect to statistics. Please visit the link and vote. Please add your comments as well.

https://connect.microsoft.com/feedback/viewfeedback.aspx?FeedbackID=253840

Thanks,
Phil

Phil,

Are you sure that's the right link? it takes me to 'Page Not Found'.

-Jamie

Good catch... When I visited the link, it redirected to a URL without "SQLServer" in the path... The link has been updated in my original post...

Sunday, March 11, 2012

Getting Unique entire row records based on one unique column..

Hi,

I like to get all records(all columns) for any employee who has more than 2 unique SuggestedTraining..As simple as possible and execution (performance optimized) speed is as fast as possible.

Data:

Table structure below

Assessment Table

AssessmentID EmployeeID CourseID SuggestedTraining Date ....more columns
1 4 1 'Training 1' <date>
2 2 2 'Training 2' <date>
3 1 4 'Training 1' <date>
4 2 1 'Training 2' <date>
5 4 2 'Training 1' <date>
6 3 3 '' <date>
7 2 5 'Training 3' <date>
8 5 7 'Training 2' <date>
9 7 4 'Training 2' <date>
10 5 3 'Training 1' <date>
11 5 6 'Training 3' <date>

Desired Result:

AssessmentID EmployeeID CourseID SuggestedTraining Date
4 2 1 'Training 2' <date>
7 2 5 'Training 3' <date>
8 5 7 'Training 2' <date>
10 5 3 'Training 1' <date>
11 5 6 'Training 3' <date>

Try the example below.

Chris

Code Snippet

DECLARE @.AssessmentTable TABLE

(

AssessmentID INT,

EmployeeID INT,

CourseID INT,

SuggestedTraining VARCHAR(20)

)

INSERT INTO @.AssessmentTable(AssessmentID, EmployeeID, CourseID, SuggestedTraining)

SELECT 1, 4, 1, 'Training 1' UNION ALL

SELECT 2, 2, 2, 'Training 2' UNION ALL

SELECT 3, 1, 4, 'Training 1' UNION ALL

SELECT 4, 2, 1, 'Training 2' UNION ALL

SELECT 5, 4, 2, 'Training 1' UNION ALL

SELECT 6, 3, 3, '' UNION ALL

SELECT 7, 2, 5, 'Training 3' UNION ALL

SELECT 8, 5, 7, 'Training 2' UNION ALL

SELECT 9, 7, 4, 'Training 2' UNION ALL

SELECT 10, 5, 3, 'Training 1' UNION ALL

SELECT 11, 5, 6, 'Training 3'

SELECT at.AssessmentID,

at.EmployeeID,

at.CourseID,

at.SuggestedTraining

FROM @.AssessmentTable at

INNER JOIN (SELECT atg.EmployeeID

FROM @.AssessmentTable atg

GROUP BY atg.EmployeeID

HAVING COUNT(DISTINCT atg.SuggestedTraining) >= 2) t

ON t.EmployeeID = at.EmployeeID

|||

Hi Chris,

I cannot do the insertion because simply I do not know the data at runtime,

Code Snippet

INSERT INTO @.AssessmentTable(AssessmentID, EmployeeID, CourseID, SuggestedTraining)

SELECT 1, 4, 1, 'Training 1' UNION ALL

SELECT 2, 2, 2, 'Training 2' UNION ALL

SELECT 3, 1, 4, 'Training 1' UNION ALL

SELECT 4, 2, 1, 'Training 2' UNION ALL

SELECT 5, 4, 2, 'Training 1' UNION ALL

SELECT 6, 3, 3, '' UNION ALL

SELECT 7, 2, 5, 'Training 3' UNION ALL

SELECT 8, 5, 7, 'Training 2' UNION ALL

SELECT 9, 7, 4, 'Training 2' UNION ALL

SELECT 10, 5, 3, 'Training 1' UNION ALL

SELECT 11, 5, 6, 'Training 3'

The desired result is to get records (all columns) of table based on Employees having more than 2 Suggested Training, empty Suggested Training and Duplicated Training are excluded...

|||

It is just sample data Chris offered to demonstrate his suggested solution.

Run all of the code as Chris posted to verify if the solution works for you.

According to your specifications, the Date is immaterial for the solution. If that is NOT correct, then refine your desired outcome so we can refine the suggested solutions.

|||

Hi,

My question how do I replace this part of script to be adaptable to actual data.

Code Snippet

SELECT 1, 4, 1, 'Training 1' UNION ALL

SELECT 2, 2, 2, 'Training 2' UNION ALL

SELECT 3, 1, 4, 'Training 1' UNION ALL

SELECT 4, 2, 1, 'Training 2' UNION ALL

SELECT 5, 4, 2, 'Training 1' UNION ALL

SELECT 6, 3, 3, '' UNION ALL

SELECT 7, 2, 5, 'Training 3' UNION ALL

SELECT 8, 5, 7, 'Training 2' UNION ALL

SELECT 9, 7, 4, 'Training 2' UNION ALL

SELECT 10, 5, 3, 'Training 1' UNION ALL

SELECT 11, 5, 6, 'Training 3'

Try to replace entire code it with this,

Code Snippet

DECLARE @.AssessmentTable TABLE

(

AssessmentID INT,

EmployeeID INT,

CatalogueCourseID INT,

SuggestedTraining nVARCHAR(500),

CompetencyModelID int

)

Insert Into @.AssessmentTable(AssessmentID, EmployeeID, CatalogueCourseID, SuggestedTraining, CompetencyModelID)

Select AssessmentID, EmployeeID, CatalogueCourseID, SuggestedTraining, CompetencyModelID From Assessment

Where IsActive =1 AND SuggestedTraining <> ''

SELECT at.AssessmentID,at.EmployeeID,at.CatalogueCourseID,at.SuggestedTraining FROM @.AssessmentTable at

INNER JOIN (SELECT atg.EmployeeID FROM @.AssessmentTable atg GROUP BY atg.EmployeeID HAVING COUNT(DISTINCT atg.SuggestedTraining) >= 2) t

ON t.EmployeeID = at.EmployeeID

Actual Result:

AssessmentID EmployeeID CatalogueCourseID SuggestedTraining
9 9 8 Training Course 2
30 9 9 Training Course 3
42 9 9 Training Course 3 << Duplicate
43 9 7 Training Course 1
47 9 7 Training Course 1 << Duplicate
54 9 5 Training
48 12 7 Training Course 1
40 12 7 Training Course 1 << Duplicate
46 12 9 Training Course 3
..... More data below....

Still Desired Result is not achieved...

|||

As Arnie pointed out in a response to your previous post, you only need to use the following:

Code Snippet

SELECT at.AssessmentID,

at.EmployeeID,

at.CourseID,

at.SuggestedTraining

FROM @.AssessmentTable at

INNER JOIN (SELECT atg.EmployeeID

FROM @.AssessmentTable atg

GROUP BY atg.EmployeeID

HAVING COUNT(DISTINCT atg.SuggestedTraining) >= 2) t

ON t.EmployeeID = at.EmployeeID

The CREATE TABLE / INSERT statements were included in my original post solely to show that the concept works. All you need to do is take the above sample and replace '@.AssessmentTable' with your actual table name.

Chris

|||

Hi Chris,

I tried this code slightly modified it to exclude empty SuggestedTraining, and still there are duplicate Suggested Training in the results. How can I remove duplicated SuggestedTraining in the result set. Thanks.

Code Snippet

SELECT at.AssessmentID, at.EmployeeID, at.CatalogueCourseID, at.SuggestedTraining FROM Assessment at

INNER JOIN (SELECT atg.EmployeeID FROM Assessment atg Where SuggestedTraining <> ''

GROUP BY atg.EmployeeID HAVING COUNT(DISTINCT atg.SuggestedTraining) >= 2) t

ON t.EmployeeID = at.EmployeeID

|||

For the duplicates, which AssessmentID would you want to return - the maximum or the minimum? [therein lies a clue to the solution... Wink ]

Chris

|||

Well,

Maximum is the choice, right now date is not considered, how would the sql statement would be if both maximum and date (latest) is considered?

|||

Try the code below.

Chris

Code Snippet

SELECT MAX(at.AssessmentID),

at.EmployeeID,

at.CourseID,

at.SuggestedTraining

--, MAX(at.DateColumn)

FROM @.AssessmentTable at

INNER JOIN (

SELECT atg.EmployeeID

FROM @.AssessmentTable atg

GROUP BY atg.EmployeeID

HAVING COUNT(DISTINCT atg.SuggestedTraining) >= 2

) t ON t.EmployeeID = at.EmployeeID

GROUP BY at.EmployeeID,

at.CourseID,

at.SuggestedTraining

|||

I run the sql script you provided and here is the result:

AssessmentID EmployeeID CourseID SuggestedTraining

26 4 8 Training Course 1
8 4 8 Training Course 2
58 8 NULL <<-- Should Not be included
6 8 16 Training Course 1 <<-- Should Not be included
57 9 NULL <<-- Should Not be included
54 9 5 Training
47 9 7 Training Course 1
9 9 8 Training Course 2
42 9 9 Training Course 3
55 12 NULL <<-- Should Not be included
1 12 3 Training Course 2
48 12 7 Training Course 1
46 12 9 Training Course 3
22 13 NULL <<-- Should Not be included
13 13 NULL TEST TRAIN
41 13 7 Training Course 1
31 13 8 Training Course 2
45 13 9 Training Course 3
16 26 NULL <<-- Should Not be included
18 26 7 Training Course 1 <<-- Should Not be included
53 82 7 Training Course 1
38 82 16 Test Training Course
28 150 7 Training Course 1
27 150 9 Training Course 3
33 179 3 Training Course 2
37 179 16 Test Training Course
36 180 NULL <<-- Should Not be included
35 180 8 Training Course 2 <<-- Should Not be included

I filter the empty Suggested Training using this modified sql script:

Code Snippet

SELECT MAX(at.AssessmentID), at.EmployeeID, at.CatalogueCourseID, at.SuggestedTraining FROM Assessment at

INNER JOIN (SELECT atg.EmployeeID FROM Assessment atg GROUP BY atg.EmployeeID HAVING COUNT

(DISTINCT atg.SuggestedTraining) >= 2) t ON t.EmployeeID = at.EmployeeID Where at.SuggestedTraining <> ''

GROUP BY at.EmployeeID, at.CatalogueCourseID,at.SuggestedTraining

And still did not get desired result...

AssessmentID EmployeeID CourseID SuggestedTraining

26 4 8 Training Course 1
8 4 8 Training Course 2
6 8 16 Training Course 1 <<-- Should Not be included
54 9 5 Training
47 9 7 Training Course 1
9 9 8 Training Course 2
42 9 9 Training Course 3
1 12 3 Training Course 2
48 12 7 Training Course 1
46 12 9 Training Course 3
13 13 NULL TEST TRAIN
41 13 7 Training Course 1
31 13 8 Training Course 2
45 13 9 Training Course 3
18 26 7 Training Course 1 <<-- Should Not be included
53 82 7 Training Course 1
38 82 16 Test Training Course
28 150 7 Training Course 1
27 150 9 Training Course 3
33 179 3 Training Course 2
37 179 16 Test Training Course
35 180 8 Training Course 2 <<-- Should Not be included

Here is Whole Data for Assessment table:

AssessmentID EmployeeID CourseID SuggestedTraining

1 12 3 Training Course 2
3 59 7 Training Course 1
6 8 16 Training Course 1
7 10 3
8 4 8 Training Course 2
9 9 8 Training Course 2
10 132 7 Training Course 1
11 25 NULL Training Course 1
12 12 7 Training Course 1
13 13 NULL TEST TRAIN
14 54 7 Training Course 1
15 28 NULL
16 26 NULL
17 29 NULL
18 26 7 Training Course 1
19 6 NULL
20 84 7 Training Course 1
21 139 7 Training Course 1
22 13 NULL
23 82 7 Training Course 1
24 127 7 Training Course 1
25 64 7 Training Course 1
26 4 8 Training Course 1
27 150 9 Training Course 3
28 150 7 Training Course 1
29 12 7 Training Course 1
30 9 9 Training Course 3
31 13 8 Training Course 2
32 177 NULL
33 179 3 Training Course 2
34 180 8 Training Course 2
35 180 8 Training Course 2
36 180 NULL
37 179 16 Test Training Course
38 82 16 Test Training Course
39 9 7 Training Course 1
40 12 7 Training Course 1
41 13 7 Training Course 1
42 9 9 Training Course 3
43 9 7 Training Course 1
44 13 9 Training Course 3
45 13 9 Training Course 3
46 12 9 Training Course 3
47 9 7 Training Course 1
48 12 7 Training Course 1
49 87 7 Training Course 1
50 9 NULL
51 72 7 Training Course 1
52 83 9 Training Course 1
53 82 7 Training Course 1
54 9 5 Training
55 12 NULL
56 7 NULL
57 9 NULL
58 8 NULL

|||

I've re-arranged the filters in the query to filter out the rows that you don't need - see below. This now returns the results as specified in your previous post.

Chris

Code Snippet

DECLARE @.AssessmentTable TABLE

(

AssessmentID INT,

EmployeeID INT,

CourseID INT,

SuggestedTraining VARCHAR(20)

)

INSERT INTO @.AssessmentTable (AssessmentID, EmployeeID, CourseID, SuggestedTraining)

SELECT 1, 12, 3, 'Training Course 2' UNION ALL

SELECT 3, 59, 7, 'Training Course 1' UNION ALL

SELECT 6, 8, 16, 'Training Course 1' UNION ALL

SELECT 7, 10, 3, '' UNION ALL

SELECT 8, 4, 8, 'Training Course 2' UNION ALL

SELECT 9, 9, 8, 'Training Course 2' UNION ALL

SELECT 10, 132, 7, 'Training Course 1' UNION ALL

SELECT 11, 25, NULL, 'Training Course 1' UNION ALL

SELECT 12, 12, 7, 'Training Course 1' UNION ALL

SELECT 13, 13, NULL, 'TEST TRAIN' UNION ALL

SELECT 14, 54, 7, 'Training Course 1' UNION ALL

SELECT 15, 28, NULL, '' UNION ALL

SELECT 16, 26, NULL, '' UNION ALL

SELECT 17, 29, NULL, '' UNION ALL

SELECT 18, 26, 7, 'Training Course 1' UNION ALL

SELECT 19, 6, NULL, '' UNION ALL

SELECT 20, 84, 7, 'Training Course 1' UNION ALL

SELECT 21, 139, 7, 'Training Course 1' UNION ALL

SELECT 22, 13, NULL, '' UNION ALL

SELECT 23, 82, 7, 'Training Course 1' UNION ALL

SELECT 24, 127, 7, 'Training Course 1' UNION ALL

SELECT 25, 64, 7, 'Training Course 1' UNION ALL

SELECT 26, 4, 8, 'Training Course 1' UNION ALL

SELECT 27, 150, 9, 'Training Course 3' UNION ALL

SELECT 28, 150, 7, 'Training Course 1' UNION ALL

SELECT 29, 12, 7, 'Training Course 1' UNION ALL

SELECT 30, 9, 9, 'Training Course 3' UNION ALL

SELECT 31, 13, 8, 'Training Course 2' UNION ALL

SELECT 32, 177, NULL, '' UNION ALL

SELECT 33, 179, 3, 'Training Course 2' UNION ALL

SELECT 34, 180, 8, 'Training Course 2' UNION ALL

SELECT 35, 180, 8, 'Training Course 2' UNION ALL

SELECT 36, 180, NULL, '' UNION ALL

SELECT 37, 179, 16, 'Test Training Course' UNION ALL

SELECT 38, 82, 16, 'Test Training Course' UNION ALL

SELECT 39, 9, 7, 'Training Course 1' UNION ALL

SELECT 40, 12, 7, 'Training Course 1' UNION ALL

SELECT 41, 13, 7, 'Training Course 1' UNION ALL

SELECT 42, 9, 9, 'Training Course 3' UNION ALL

SELECT 43, 9, 7, 'Training Course 1' UNION ALL

SELECT 44, 13, 9, 'Training Course 3' UNION ALL

SELECT 45, 13, 9, 'Training Course 3' UNION ALL

SELECT 46, 12, 9, 'Training Course 3' UNION ALL

SELECT 47, 9, 7, 'Training Course 1' UNION ALL

SELECT 48, 12, 7, 'Training Course 1' UNION ALL

SELECT 49, 87, 7, 'Training Course 1' UNION ALL

SELECT 50, 9, NULL, '' UNION ALL

SELECT 51, 72, 7, 'Training Course 1' UNION ALL

SELECT 52, 83, 9, 'Training Course 1' UNION ALL

SELECT 53, 82, 7, 'Training Course 1' UNION ALL

SELECT 54, 9, 5, 'Training' UNION ALL

SELECT 55, 12, NULL, '' UNION ALL

SELECT 56, 7, NULL, '' UNION ALL

SELECT 57, 9, NULL, '' UNION ALL

SELECT 58, 8, NULL, ''

SELECT MAX(at.AssessmentID) AS AssessmentID,

at.EmployeeID,

at.CourseID,

at.SuggestedTraining

--, MAX(at.DateColumn)

FROM @.AssessmentTable at

INNER JOIN (

SELECT atg.EmployeeID

FROM @.AssessmentTable atg

WHERE atg.SuggestedTraining <> ''

GROUP BY atg.EmployeeID

HAVING COUNT(DISTINCT atg.SuggestedTraining) >= 2

) t ON t.EmployeeID = at.EmployeeID

WHERE at.SuggestedTraining <> ''

GROUP BY at.EmployeeID,

at.CourseID,

at.SuggestedTraining

ORDER BY 1

|||Thanks Chris, right now, based on data I have, this problem is resolved, when another problem arises...I post it here...I have marked your last reply as answer...

Sunday, February 19, 2012

[DTS Transformation]

Hello I trying to do some transformation using javascript. I have a table with 40 columns that I need dynamicaly mapped to another table. For example I need to test each Column of tableA if Col1 of TableA looks like an email address map it to TableB email if it looks like a zip map it to Email...

I need to do this with Javascript

if someone could point me in the right direction that would be great

lets say I start small test each col in TableA for an @. sign if I find it map to Email if I can get the idea of how to do that I can do this

thanksok I pretty much got it to work by adding JScript to the ActiveX Transformation Properties. By using pattern matching I was able to get the the data to insert into the correct fields. However one of the problems I am having is when I come across a field with a NULL value. Is there a way to test for NULL. I tried if(v != NULL) but the JScript in the DTS does not seam to know what NULL means?

Thanks for the help

Monday, February 13, 2012

[*-)]how to delete all columns except one column for the corresponding column name......?

hi friends,

i've a table with (columns) username, content,data,......... i need to delete all column names(i.e.,content,data,......) except username for the specified username. eg: consider username=mahendran. i've to delete the values in the content,data,............for the username=mahendran. but username should exist.

how to do that?pls help me.....Confused

Use the following

DataTable.Columns.Remove takes the column name or column reference as argument.

To remove column using it's index, you can use DataTable.Columns.RemoveAt method.

|||

hi farooq,

i need the SQL query for delete command. i wrote

delete [contact],[title] from <table_name> where userid=<user_id>

but it is not working.what is the correct format?

|||

If you want to change your table structure, you useAlter Table along withDrop Column instead of Delete which is for your records. Here is a sample:

ALTERTABLE yourTable

DROPCOLUMN col2, col3

You can find more information about SQL DML(SQL Data Manipulation Language) and DDL(SQL Data Definition Language) if you are interested about the topic.