Eliminate Loop and take a set-based approach

  • An interesting requirement where an account holder will be awarded points based on the amount spent at the retail store. There are different earning slots at which an account holder can earn points:

    Slot 1 $0 - $4999 1 point for each dollar spent

    Slot 2 $5000 - $9999 3 points for each dollar spent

    Slot 3 $10000 - $49999 5 points for each dollar spent

    Slot 4 $50000 - $99999 10 points for each dollar spent

    So if I open up an account today and I buy products "A", "B" and "C" in one transaction (shopping cart) and pay $500, I will get 500 points.

    Next week when I buy a product "D" for $6000 my calculation will start from previous amount spend balance. For the first $4499 I will get 1 point for each dollar (already spent $500 last week, so $4999 (max amount in first slot) - $500 = $4499) and for remaining balance $6000 - $4999 = $1001, I will earn 3 points for each dollar. And this can happens for multiple products or a single product in a shopping cart, i.e. 1 product can switch slots based on the paid amount and spend balance as Product "D" did.

    To add to the complexity the points need to be given for each bought product separately (shown for each product on the invoice) and not on the total shopping cart or transaction bill. (So product "D" will show both slots and points earned in each slot on the invoice)

    The account remains active till 5 years and after that the spend restarts from $0. So account holders continue to earn points for 5 years.

    So each product will have to go in a while loop (trying to avoid cursors) with added last spend balance to determine which product switches from 1 slot to the other within each transaction or shopping cart.

    Can this be done without looping to enhance performance in SSIS?

    Thanks

    G

  • I'll start off by saying that I am almost certain that no WHILE loop will be required here.

    But in order to get to a solution, I'll need to ask a few more questions:

    1) Why SSIS? How is this routine going to be used? It sounds (to me) more like a T-SQL stored procedure than anything SSIS would do. If the points are to be awarded at POS, per transaction, SSIS is not the right technology to use.

    2) Can you provide some DDL and sample data? Not every column is needed, just the relevant ones.

    3) Probably answered by your question above, but how is product ordering denoted in a transaction? (This information is needed to determine which product gets which number of points when a slot boundary is encountered.)

    4) Following on from (1) above, please think about what you want the routine to do. One approach would be to call a stored proc with a transaction ID parameter and have the sp perform an update on the relevant order header/lines tables - if they exist at that point.

    Or am I completely on the wrong track and you are looking for some sort of regular batch update?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Thanks for the reply Phil.

    1. The points are calculated in a batch from a flat file generated by the POS on a regular basis. They are shown later to the account holder on the website (perhaps I shouldn't have said invoice earlier) at each line item level in the bill. SSIS is the tool used that extracts from these flat files and calculate points before they are loaded to the DB.

    2. Sorry, no DDL samples available at this time. But here is a rough sort of idea:

    -POS HEADER contains transaction level info including Transaction ID, Sales Date, Register #, gross sales amt, net sales amount, Tax amt, etc.

    -POS DETAIL contains line item level info including above + line sequence #, product id, retail amt, qty, etc.

    -POS PAYMENT contains payment info including Transaction ID, Sales Date, Register #, Paid Amount, Acct Holder Card #, etc.

    Transaction ID + Sales Date + Register # - Primary Key in Header and Foreign Key in others.

    -AmtSpendTillDate table contains spend balance for each account for its life-span: Holder Card #, last transaction date, amount spent till date.

    3. Product Ordering is the order of scanning. Each line item on the bill gets a line # starting from 1.

    4. Yes its a regular batch update scenario.

    Thanks

    G

  • Thanks for the extra detail - makes more sense now.

    If you want to avoid loops, you'll need to use the power of the database - and as the data is not yet in the database at the time the SSIS package runs, you can't do that directly.

    I'd be tempted to load the data into the db and then run a stored proc to allocate the points - is that an option?

    Or load the data into a staging area, run a stored proc to allocate points in the staging area, then load from staging into main database.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Trying to avoid ELTL implementation where we load to a table first and then transform. But if it is faster than looping in SSIS, might consider it.

  • 1) I am not sure SSIS is the way to go here. I am about 90% certain (given nebulous descriptions of problem and no sample data with expected outputs) that this can be done using set-based and efficient mechanisms.

    2) This is NOT easy processing logic and in my opinion goes WAY beyond what you should expect for free on a forum by people who donate their spare time to help others. Get a professional to help you with this. Since you mention POS system I am betting it will need to scale to possibly LOTS of feeds from MANY POS machines and if you try to loop this you will likely not be successful.

    3) WHILE loop is essentially the same ugliness as a cursor from a performance standpoint.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks Kevin.

    I was definitely not expecting the complete solution (with the info provided), just needed a direction if it was possible. Processing happens in a batch process from flat files (pls note the source is flat files not SQL Table) and given the complexity (maintenance effort later) and pipeline vs SQL Engine performance - I think SSIS is the way to go.

    I actually have found a solution that will eliminate looping. SSIS can generate running totals (key word here for googling) using Script Transformation Task and that can be used here without looping. After doing that you will have account spend updated (added to previous row grouped by the basket id) with each bought product row and then you can easily find the current earning slot from it and calculate points based on the slot in the same script or downstream derived column transform.

    Thanks to all for replies.

    Gagan

  • Script component is what you mean, I think.

    Well done on taking the time to find a solution to your problem.

    I still have concerns over the potential performance of your solution, for the following reasons:

    1) Your solution still requires row-by-row processing.

    2) For each transaction, I am presuming that you will need to do a look-up on the database (to get information about previous transactions for that account).

    A set-based solution, as suggested in my earlier messages, would avoid all that.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • So I've got three points per dollar for the last $500 of product "D." What happens if I now return "A" "B" and "C?" Does my score for product "D" change to one point per dollar?

  • Phil Parkin (11/15/2011)


    Script component is what you mean, I think.

    Well done on taking the time to find a solution to your problem.

    I still have concerns over the potential performance of your solution, for the following reasons:

    1) Your solution still requires row-by-row processing.

    2) For each transaction, I am presuming that you will need to do a look-up on the database (to get information about previous transactions for that account).

    A set-based solution, as suggested in my earlier messages, would avoid all that.

    I agree on all points. Row-at-a-time can crush SSIS performance just like anything else done one row at a time. Hopefully your performance is acceptable and you can move on.

    If you do need (or want) to pursue set-based solution, CASE will be your friend. It is my favorite 4-letter TSQL word!! 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (11/16/2011)


    Phil Parkin (11/15/2011)


    Script component is what you mean, I think.

    Well done on taking the time to find a solution to your problem.

    I still have concerns over the potential performance of your solution, for the following reasons:

    1) Your solution still requires row-by-row processing.

    2) For each transaction, I am presuming that you will need to do a look-up on the database (to get information about previous transactions for that account).

    A set-based solution, as suggested in my earlier messages, would avoid all that.

    I agree on all points. Row-at-a-time can crush SSIS performance just like anything else done one row at a time. Hopefully your performance is acceptable and you can move on.

    If you do need (or want) to pursue set-based solution, CASE will be your friend. It is my favorite 4-letter TSQL word!! 🙂

    What about FROM, JOIN & TRAN? 😀

  • I don't know about the OP, but I would like to see a set-based solution based on the criteria the OP gave.

  • What about FROM, JOIN & TRAN? [BigGrin]

    And, especially around the middle of the afternoon, or when you accidentally pressed F5 while connected to production, DUMP.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (11/16/2011)


    What about FROM, JOIN & TRAN? [BigGrin]

    And, especially around the middle of the afternoon, or when you accidentally pressed F5 while connected to production, DUMP.

    DUMP or DROP :-D.

  • Yeap, I agree with both of you.

    I did a lookup (one time) in the beginning and added 4 extra columns for max slot amount for each slot and then checked in which slot the amount was falling in and based point calculation on that. performance is fine. And I think every component in SSIS does a row-by-row processing e.g. you can do the above (after running total) in Derived column task and its still faster - that's the beauty of pipeline vs. procedural engine.

    Now this discussion was only for purchases, returns is a different animal all together, I am going to write a separate Data flow task for that. 🙂

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply