Remove errant readings

  • I have an ETL that imports readings from a flat file. The ReadingDate on the data is always good, it's the ReadingHours that are suspect. How do I flag and remove errant readings?

    Here's the code to build a sample table.

    USE [tempdb];

    SET NOCOUNT ON;

    SET XACT_ABORT ON;

    GO

    BEGIN TRANSACTION;

    USE [tempdb]

    GO

    CREATE TABLE [dbo].[Readings](

    [mrid] [bigint] NULL,

    [AssetId] [int] NOT NULL,

    [ReadingHours] [decimal](18, 2) NULL,

    [ReadingDate] [date] NULL,

    [ReadingOrderByReadingDate] [bigint] NULL,

    [ReadingOrderByReadingHours] [bigint] NULL

    );

    INSERT INTO [dbo].[Readings]([mrid], [AssetId], [ReadingHours], [ReadingDate], [ReadingOrderByReadingDate], [ReadingOrderByReadingHours])

    SELECT 1, 84019, 1227.00, '20100518 00:00:00.000', 1, 1 UNION ALL

    SELECT 2, 84019, 1256.00, '20100519 00:00:00.000', 2, 2 UNION ALL

    SELECT 3, 84019, 1395.00, '20100915 00:00:00.000', 3, 4 UNION ALL

    SELECT 4, 84019, 1440.00, '20100930 00:00:00.000', 4, 5 UNION ALL

    SELECT 5, 84019, 1490.00, '20101119 00:00:00.000', 5, 6 UNION ALL

    SELECT 6, 84019, 1302.00, '20110228 00:00:00.000', 6, 3 UNION ALL

    SELECT 7, 84019, 1602.00, '20110309 00:00:00.000', 7, 7 UNION ALL

    SELECT 8, 84019, 1687.00, '20110630 00:00:00.000', 8, 8 UNION ALL

    SELECT 9, 84019, 8935.00, '20110914 00:00:00.000', 9, 27 UNION ALL

    SELECT 10, 84019, 1795.00, '20111014 00:00:00.000', 10, 9 UNION ALL

    SELECT 11, 84019, 1894.00, '20120321 00:00:00.000', 11, 10 UNION ALL

    SELECT 12, 84019, 1901.00, '20120330 00:00:00.000', 12, 11 UNION ALL

    SELECT 13, 84019, 1919.00, '20120425 00:00:00.000', 13, 12 UNION ALL

    SELECT 14, 84019, 1942.00, '20120606 00:00:00.000', 14, 13 UNION ALL

    SELECT 15, 84019, 2615.00, '20120921 00:00:00.000', 15, 26 UNION ALL

    SELECT 16, 84019, 2047.00, '20121023 00:00:00.000', 16, 14 UNION ALL

    SELECT 17, 84019, 2057.00, '20121031 00:00:00.000', 17, 16 UNION ALL

    SELECT 18, 84019, 2054.00, '20121108 00:00:00.000', 18, 15 UNION ALL

    SELECT 19, 84019, 2070.00, '20121126 00:00:00.000', 19, 17 UNION ALL

    SELECT 20, 84019, 2102.00, '20130211 00:00:00.000', 20, 18 UNION ALL

    SELECT 21, 84019, 2106.00, '20130214 00:00:00.000', 21, 19 UNION ALL

    SELECT 22, 84019, 2108.00, '20130219 00:00:00.000', 22, 20 UNION ALL

    SELECT 23, 84019, 2120.00, '20130315 00:00:00.000', 23, 21 UNION ALL

    SELECT 24, 84019, 2156.00, '20130520 00:00:00.000', 24, 22 UNION ALL

    SELECT 25, 84019, 2162.00, '20130530 00:00:00.000', 25, 23 UNION ALL

    SELECT 26, 84019, 2172.00, '20130610 00:00:00.000', 26, 24 UNION ALL

    SELECT 27, 84019, 2213.00, '20131121 00:00:00.000', 27, 25

    COMMIT;

    RAISERROR (N'[dbo].[Readings]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;

    GO

    In this instance, readings on lines: 6, 9, & 15 should be removed. Line six is bad, period. Line 9 is bad based on the fact that lines 10 - 14 are ever increasing and probably good and would represent better data. If I imported line 9, the rest of the data would be skipped. Line 15 would be considered bad because 16 - 27 are ever increasing and would be considered good readings (and the same problem with line 9, If i import line 15, lines 16 through 27 would be skipped) They are bad values, but the only way to know they are bad are to look at all of the values in the group.

    I've found a million different ways that don't work, but I'm to the point where I'm rehashing bad ideas.

    Can I get some fresh ones?

  • what happens for line 18?

    SELECT 17, 84019, 2057.00, '20121031 00:00:00.000', 17, 16 UNION ALL

    SELECT 18, 84019, 2054.00, '20121108 00:00:00.000', 18, 15 UNION ALL

    SELECT 19, 84019, 2070.00, '20121126 00:00:00.000', 19, 17 UNION ALL

    is this an errant row?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • This is an option using 2 deletes. I'm assuming that you don't have gaps on mrid, if you have gaps, you'll need to create an artificial key.

    DELETE r

    FROM Readings r

    JOIN Readings r1

    ON r1.mrid = r.mrid - 1

    AND r1.ReadingHours > r.ReadingHours

    JOIN Readings r2

    ON r2.mrid = r.mrid + 1

    AND r1.ReadingHours < r2.ReadingHours

    DELETE r

    FROM Readings r

    JOIN Readings r1

    ON r1.mrid = r.mrid + 1

    AND r1.ReadingHours < r.ReadingHours

    I'm not sure if you can avoid importing the rows instead of importing all rows and cleaning afterwards.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Yep, line 18 was supposed to be bad as well. I missed it.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply