Parsing millions of records without index

  • I have 4 columns CC, YY, MM, DD and they have records like

    '20', '10', '5', '23'

    To make this work as a time stamp, I have the following query

    ;WITH temp AS(

    SELECT ID,

    (isnull(CONVERT(char(2),CC), (CASE WHEN YY Between 80 and 99 THEN '19' ELSE '20' END)) +

    CASE WHEN YY < 10 THEN '0'+ CONVERT(char(1),YY) ELSE CONVERT(char(2),YY) END + '-'+

    CASE WHEN MM < 10 THEN '0'+ CONVERT(char(1),MM) ELSE CONVERT(char(2),MM) END + '-'+

    CASE WHEN DD < 10 THEN '0'+ CONVERT(char(1),DD) ELSE CONVERT(char(2),DD) END) AS DayNumber)

    SELECT COUNT(*) FROM temp

    LEFT OUTER JOIN tableA where CONVERT(DATETIME, temp.DayNumber,121) >= tableA.startdate and CONVERT(DATETIME, temp.DayNumber, 121) <= tableA.enddate

    GROUP BY temp.DayNumber

    (consider startdate and enddate values as 2010-05-23 00:00:00.000 and 2010-05-24 00:00:00.000 respectively which are stored in a physical table)

    Now this is the exact query that I am using to find the records. I just changed the table names, but that shouldnt matter.

  • ;WITH temp AS(

    SELECT ID,

    (isnull(CONVERT(char(2),CC), (CASE WHEN YY Between 80 and 99 THEN '19' ELSE '20' END)) +

    CASE WHEN YY < 10 THEN '0'+ CONVERT(char(1),YY) ELSE CONVERT(char(2),YY) END + '-'+

    CASE WHEN MM < 10 THEN '0'+ CONVERT(char(1),MM) ELSE CONVERT(char(2),MM) END + '-'+

    CASE WHEN DD < 10 THEN '0'+ CONVERT(char(1),DD) ELSE CONVERT(char(2),DD) END) AS DayNumber) FROM source.Table)

    SELECT COUNT(*) FROM temp

    LEFT OUTER JOIN tableA where CONVERT(DATETIME, temp.DayNumber,121) >= tableA.startdate and CONVERT(DATETIME, temp.DayNumber, 121) <= tableA.enddate

    GROUP BY temp.DayNumber

  • rajiv.varma (5/25/2010)


    @Jeff

    Its not the temp table that has 4 columns. The source (which is a view being populated in real time and having no constraints/joins whatsoever) has 4 columns which are CC, YY, MM, DD. I dont know why they chose to store it like this but this is what I have to work with 🙁

    For the sake of example:

    SELECT CC, YY, MM, DD FROM source_view

    will provide me values such as

    20, 10, 5, 24 (end result with my conversion: 2010-05-24)

    20, 10, 5, 23 (" " :2010-05-23)

    19, 98, 2, 6 (" ":1998-02-06)

    20, 2, 6, 9 (" ": 2002-06-09).

    The data type for these columns is numeric(2,0). I dont know if there are any joins or not, but the source (as I already mentioned) is a view on a replicated SQL Server. The legacy system is IBM iSeries (which I dont have access to).

    There are no indexes on the view. and thats seriously stupid considering the fact that there are 100 million records

    There are no triggers/joins/ any other objects associated with this view. Also, this view is NOT indexed at all. Not even a freakin Non clustered index. Plus, there are no partitions available.

    In simple terms, consider building a single view for your entire Adventureworks database without any index/constraint/joins/trigger/any DB object associated to it. No Stored procedures/UDF's. NOTHING.

    Ah... got it. Sorry for the confusion on my part.

    The likely reason why they separated dates into 4 columns is to make it easy to "pivot" on the various parts of the date in the DW. The reason why your query is taking so long is because it has to put those 4 parts together for the whole table (table scan) to accomodate your non-sargeble query.

    There are a couple of ways to get around this. One, of course, would be to add a computed column to the table that put date parts together in a "determinent" fashion so it could be indexed. Of course, I don't know what affect that will have on any of the queries or other things that look at the DW. The index, of course, would be relatively large, as well.

    Another way may be to split the CC/YY out for the StartDate and EndDate and "precapture" that information in a temp table where you could do a similar computed column with much less impact on the DW. If the StartDate and EndDate are in the same CC/YY/MM, you could split out the StartDate and EndDate into those three elements for the criteria as to what to "precapture" which would greatly reduce the size of the precapture".

    Of course, if the underlying tables of the view doesn't have any indexes on CC/YY/MM, then none of this makes a difference. Again, a "regular" view will use the underlying indexes of a table if everything else is in "good shape".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Thanks Jeff,

    I think I will try to create computed column in temp table and create an index on that, Hopefully that might help. But there's no way I am getting an index on the (source) view or the table underlying the view. Looks like the guys responsible for that have been working for quite some time to figure out a strategy to implement that. 😛 . Thanks once again.

Viewing 4 posts - 16 through 18 (of 18 total)

You must be logged in to reply to this topic. Login to reply