Selective change tracking in a column

  • Hi,

    I have to count how many times a record changed from group 2 to group 1 or 3 per ID ordered by RowID

    This is my table:

    RowID ID  GROUP
    1      1   1
    2      1   2
    3      1   1
    4      1   2
    5      1   3
    6      1   2
    7      1   4
    8      1   5
    9      1   2
    1      2   2
    2      2   1
    3      2   3

    The end table should look like this:

    RowID ID  GROUP
    3     1     1
    5     1     3
    2     2     1

    This is where I'm trying to explain this a bit more:
    So for all ID 1 records check how many times it was in group 2 then if it was in group 2 then check if the next record (based on the RowID) is in group 1 or 3 ignore rest, keep all the records where it changed from group 2 to 1 or 3
    and do the same for ID 2,3,4 and so on.....

    There must be a simple way to do this I just can't figure it out

    Please help

    Thank you

  • I figured it out 😀 I just leave the code here for future generations


    SELECT B1.*
    FROM
        (
            SELECT *,ROW_NUMBER() over(partition by B1.ID order by B1.RowID desc) as P_count
            FROM #tempbase B1
        ) B1
        LEFT JOIN
        (
            SELECT *,ROW_NUMBER() over(partition by B2.ID order by B2.RowID desc) as P_count2
            FROM #tempbase B2
        ) B2
        ON B1.ID = B2.ID
        AND B2.Group = 2
        AND B1.Group IN(1,3)
        AND B1.P_count = B2.P_count2-1
    WHERE B2.ID IS NOT NULL

  • The problem with this approach is that it requires two scans of the base table.  Here is an approach that only requires one scan of the table.

    ;
    WITH Chg AS
    (
        SELECT *, LAG(Grp) OVER(PARTITION BY ID ORDER BY RowID) AS PrevGroup
        FROM #tempbase
    )
    SELECT RowID, ID, Grp
    FROM Chg
    WHERE Grp IN (1,3)
        AND PrevGroup = 2
    ;

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Thursday, October 19, 2017 8:48 AM

    The problem with this approach is that it requires two scans of the base table.  Here is an approach that only requires one scan of the table.

    ;
    WITH Chg AS
    (
        SELECT *, LAG(Grp) OVER(PARTITION BY ID ORDER BY RowID) AS PrevGroup
        FROM #tempbase
    )
    SELECT RowID, ID, Grp
    FROM Chg
    WHERE Grp IN (1,3)
        AND PrevGroup = 2
    ;

    Drew

    I recommend always being fully expressive in code and not relying on defaults. So flesh out that LAG so everyone that reads/debugs/etc the code knows explicitly what it is doing.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru - Thursday, October 19, 2017 1:54 PM

    drew.allen - Thursday, October 19, 2017 8:48 AM

    The problem with this approach is that it requires two scans of the base table.  Here is an approach that only requires one scan of the table.

    ;
    WITH Chg AS
    (
        SELECT *, LAG(Grp) OVER(PARTITION BY ID ORDER BY RowID) AS PrevGroup
        FROM #tempbase
    )
    SELECT RowID, ID, Grp
    FROM Chg
    WHERE Grp IN (1,3)
        AND PrevGroup = 2
    ;

    Drew

    I recommend always being fully expressive in code and not relying on defaults. So flesh out that LAG so everyone that reads/debugs/etc the code knows explicitly what it is doing.

    I don't think there is a right or wrong here.  There are merits to both approaches.  I prefer leaving out the defaults if they match what I would use anyhow, because it's less typing.  The only time that I have seen an issue with using the defaults is in the frame for windowed functions, because many people are unaware of when a default frame is applied.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • I've got to go with Kevin on this one... My soul dies a little every time I review a script that has a CREATE TABLE script that looks like this...

    CREATE TABLE ImAnAss (
        ID INT IDENTITY PRIMARY KEY,
        SomeColumn VARCHAR,
        AnotherColumn DECIMAL DEFAULT (0),
        TheDate DATETIME DEFAULT (GETDATE())
        );

  • Jason A. Long - Thursday, October 19, 2017 3:25 PM

    I've got to go with Kevin on this one... My soul dies a little every time I review a script that has a CREATE TABLE script that looks like this...

    CREATE TABLE ImAnAss (
        ID INT IDENTITY PRIMARY KEY,
        SomeColumn VARCHAR,
        AnotherColumn DECIMAL DEFAULT (0),
        TheDate DATETIME DEFAULT (GETDATE())
        );

    I think that tables and variables are different, because they should be designed to match the expected data.  Expected data varies too widely to rely on any defaults.  I also like naming my constraints so that they're meaningful, which means that they would be different from the default names provided.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • The most important thing you said there Drew was "...should be...". But sadly they aren't. Probably 99% of all columns I have ever seen in SQL Server applications (25 years now) are NULLable (because that is the default), and the problems I have seen because of that are innumerable.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru - Friday, October 20, 2017 11:57 AM

    The most important thing you said there Drew was "...should be...". But sadly they aren't. Probably 99% of all columns I have ever seen in SQL Server applications (25 years now) are NULLable (because that is the default), and the problems I have seen because of that are innumerable.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply