Long, complicated problem (SQL Svr 7)

  • This is going to be long... so please bear with me.

    Our sales team uses a CRM system that uses SQL Server 7 as backend. Customer purchase our "online products", and these products are created onto our website, which uses Oracle as backend.

    Our DQ department now has to look at every single sale, check the data in the CRM System and compare it to the product created on our website, make sure that all the attributes match.

    Now, I've been tasked to create a "process" to check everything automatically. Only those with problems (discrepancies between the 2 systems) should be returned for manual inspection.

    Our website's product data are heavily guarded, I can only get a snapshot of a part of it 4 times a day. The data I got are the attributes of the products, I store them into a table in SQL Server 7.

    Coming up - the script I've created (so far):

  •  
    

    /* This script search for all companies that have a new product pending Data Quality check/approval.
    ** Inside those companies, we want to compare each and every product between the CRM system (SQL Server 7),
    ** and the web data (loaded into a table in SQL Server 7 - *web products attributes table*).
    ** If discrepancies exist, insert that company's ID into a table called ErrorCompTable.
    */

    DECLARE @comp_id int
    DECLARE @web_row_id int
    --DECLARE all web_product_attributes variables
    DECLARE @temp varchar(100)


    DECLARE company_cursor CURSOR
    FOR (SELECT company_id
    FROM st_cases
    WHERE status = 'New'
    ) -- this returns all the company_ids that have new products.



    OPEN company_cursor
    FETCH NEXT FROM company_cursor INTO @comp_id

    WHILE @@FETCH_STATUS = 0
    BEGIN

    /******************************************************************************************************
    ** Inside the COMPANY_CURSOR, we declare and open another cursor called row_cursor,
    ** and loop through all the products inside company @comp_id.
    ******************************************************************************************************/

    DECLARE row_cursor CURSOR
    FOR (SELECT /*attributes*/
    FROM /*web products attributes table*/
    WHERE company_id = @comp_id
    )

    OPEN row_cursor
    FETCH NEXT FROM row_cursor
    INTO /*@web_row_id, @ web_products_attributes... etc*/

    WHILE @@FETCH_STATUS = 0
    BEGIN


    /******************************************************************************************************
    ** Inside the ROW_CURSOR, we compare one product's attributes between the web's (@web_products_attributes)
    ** and CRM one by one. (Linked by CRM_row_id - @web_row_id.)
    ** IF <> THEN we insert the company_id into table *ErrorCompTable* and BREAK from
    ** row_cursor & company_cursor (skip to next company).
    ******************************************************************************************************/

    SET @temp = (SELECT CRM_product_attributes FROM CRM_product_table
    WHERE crm_row_id = @web_row_id)
    IF @temp <> CRM_product_attributes
    BEGIN
    INSERT INTO ErrorCompTable @comp_id
    BREAK
    END

    /******************************************************************************************************
    ** REPEAT FOR PRODUCT'S ALL ATTRIBUTES ... ... ... etc etc...
    ******************************************************************************************************/

    /******************************************************************************************************
    ** DONE! After we've checked all the attributes for the previous row (@web_row_id),
    ** we FETCH NEXT web_row_id & attributes from the cursor into the variables, then go back
    ** in the WHILE loop.
    ******************************************************************************************************/
    FETCH NEXT FROM row_cursor INTO /*@web_row_id, @ web_products_attributes... etc*/
    END -- End row_cursor WHILE loop

    /******************************************************************************************************
    ** If we BREAK out, we will be here, which closes the row_cursor and fetch the next company
    ** and start again.
    ******************************************************************************************************/
    CLOSE row_cursor
    DEALLOCATE row_cursor


    FETCH NEXT FROM company_cursor INTO @comp_id

    END -- End company_cursor WHILE loop

    CLOSE company_cursor
    DEALLOCATE company_cursor



  • OK, the problem is performance... the script takes more than 7 minutes to run in my testing environment! I obviously can't have that run as a job on the production server!

    I think the biggest problem is that each them I loop through the cursor to check every single product's attributes, I have to find the attributes in multiple tables in the CRM system's SQL Server 7 DB.

    SELECT crm_attribute1 FROM this_table

    IF @web_attribute1 <> crm_attribute1

    ...

    SELECT crm_attribute2 FROM this_table JOIN that_table

    IF @web_attribute2 <> crm_attribute2

    ...

    Now, if we were on SQL Svr 2000, then I think I should have created an Indexed View for all the CRM system's product attributes, right? That way, inside the cursor, it doesn't have to look for attributes by joining tables & stuff. The Index should help a lot too!

    However, we're on SQL Server 7 and it doesn't support Indexed View. Is there ANY WAY I can make this process faster?

    (Also, as a side question, since I'm inserting the company_ids into a table, should I enclose this scrpit in a transaction? If so, how? )

    Thanks for any help!

  • I may have mis-read this but...

    Can you not join the CRM and your snapshot tables selecting only Comp_id and use the where clause to exclude matched records.

    Something like...

    SELECT Comp_ID

    FROM CRM c

    JOIN Snapshot s

    ON JOIN COLUMNS

    WHERE c.f1 <> s.f1

    Or c.f2 <> s.f2

    Etc...

    Obviously when you create your snapshot table you can apply indexes to it.

  • If I understand you correctly, in the *web products attribute table* you have one row per product containing all the attributes.

    In your CRM table, you have one row per product and per product attribute.

    And it is hell to start comparing these :).

    You could remove all of the cursors by comparing each attribute one at a time in a join.

    
    
    SELECT web.company_id
    FROM web
    LEFT OUTER JOIN
    (SELECT crm_row_id, attribute_value
    FROM crm
    WHERE attribute_id = <matches>) attr
    ON web.row_id = attr.crm_row_id
    WHERE web.attribute_value <> attr.attribute_value
    OR attr.attribute_value = NULL

    The attr subquery in the join, just selects the correct row from the crm table with the correct attribute to compare.

    Repeat this for each attribute.

  • NPeeters is mostly right. The web table has 1 row per product, the crm table also has 1 row per product HOWEVER... there are many different categories of products. So, the table is like this:

    CREATE TABLE crm_product_attributes

    (

    prod_category int NOT NULL,

    crm_row_id int NOT NULL,

    att_col_1 NULL,

    att_col_2 NULL,

    ...

    ...

    att_col_20 NULL

    )

    So, depending on the different categories, products can have different attributes. Also, customers can purchase "additional" attributes products to add to their original purchase! These "additional attributes" are add onto the "Related_crm_row_id" to combine into ONE row in the web table!!!!! (Ugly huh?!)

    Some attributes on the web table like "sales_person_id", "company_name" are stored in different tables in the CRM db.

    The only link between the CRM tables and the web table is the row_id (product level only!) I cannot link them with company_id.

    Indexes were built on the web table and the CRM tables.

    ------------------------------------------------------------------------------

    I think we haven't seen a working solution yet, right?

    Now, a co-worker suggested that I get rid of the cursors because SQL Server sucks at it. (I'm relatively new to the DB Dev scene and have been reading SQL Server Central for about 1 year. I tend to think that SQL is fast and powerful, it's just us mortals writing bad codes on it that makes it slow. 🙂 Anyways, he suggested that in SQL Server, "Less is More"... I should try separating the 2 cursors into 2 stored procedures and write codes to loop through them.

    I haven't started writing it that way yet, what do you all think?

    Again, thanks for any input on this!!!

  • Breaking this into two procs may definitely help, but you really need to avoid nesting cursors. That's your performance problem, I'm sure.

    Try to figure out how to do one of those cursors as a batch, then use a "While" loop for the other one. Or use "While" loops for both if you have to. Below's an example of a loop from an earlier discussion:

    If object_id('tempdb..#table1') is not null Drop Table #table1

    Create Table #table1 (Number int, Name_ varchar(50))

    Insert #table1 Values (1, 'ee,ed,ff,er,fsd')

    Insert #table1 Values (2, 'rr,kk')

    If object_id('tempdb..#table2') is not null Drop Table #table2

    Create Table #table2 (Number int, Name_ varchar(10))

    While 1 = 1 --Always - Loop is controlled by "Break"

    Begin

    Insert Into #table2

    Select t1.Number

    ,

    Case

    When Charindex(',', t1.Name_) > 0

    Then Replace(Left(t1.Name_, (Charindex(',', t1.Name_))), ',', '')

    Else t1.Name_

    End

    From #table1 t1

    Where Len(Name_) <> 0

    IF @@Rowcount = 0 Break --Once there are no more records to process break loop

    Update #table1

    Set Name_ =

    Case When Charindex(',', Name_) > 0

    Then Right(Name_, Len(Name_) - Charindex(',', Name_))

    Else ''

    End

    From #table1

    Where Len(Name_) <> 0

    End

    Select *

    From #table2

    Signature is NULL

  • First, I should say that I can't endorse the use of cursors, but that is really a 'do as I say and not as I do' sort of thing. Second, what is the total number of comparisons you have to make? Unless it is in the tens of thousands (at least), you shouldn't be seeing the performance problem.

    Some things to check:

    Mark your cursor as 'forward_only'. By default SQL 2000 will use a dynamic cursor, which is slower, unless you specify otherwise...Since you are using 7.0, I'm not sure what your cursor options are (I know 'Fast_Forward_Only' is not available) but you may want to look into that.

    Many problems that appear to require a cursor, in fact, do not. I know that the challenge I set for my team is 'no cursors' in production'...Your friend gave you a good suggestion when he recommended SP's (using output parameters, I presume). However, case statements, cross-joins, self-joins, and temp tables can get you a long way. I had a bit of trouble following the description, but if you had a subset of some dummy data, I'd give it a go as a challenge.

    Write a short script that creates the tabels and inserts some values to recreate the problem...I'm sure someone will pick it up and run with it.

    ciao ciao


    ciao ciao

  • Are you saying that you are supposed check the CRM system to find out if the attributes of the product you sold match the attributes of the product you put into production? Shouldn't they do that BEFORE they put it into production?

    Also, unless you are either comparing a huge number of variables or running against an awful big # of records, you shouldn't really have this problem. SQL 7 was not great with cursors, but it wasn't that bad either. I would check both the box this process is running on, the priority being given to SQL Server within the Windows OS, as well as how much memory is available to the process.

    ciao ciao


    ciao ciao

  • Thursday, Aug. 14th, I was in the middle of rewriting the query as Calvin suggested when suddenly, VOOooooommmmm... ... the power went out! Now I moved to NYC in beginning of 2002 from Seattle, where I had experience a wide power outage in the winter of 92. It was actually quite fun, I was in high school, my friends came over and we tell ghost stories... sigh, those were the days. This time, I started walking at ~ 4:30pm, from 31st St & 9th Ave (West side) all the way across the Queensboro bridge (59th St and 1st Ave)... I was lucky enough to caught a bus after that, got home at ~8:30pm... completely exhausted in the heat! Man! What an experience.

    So anyway, now I'm back, trying to figure this problem out.

    P1Daniels, it's funny you mentioned. The company I work at is one of the biggest internet company in located in NYC, however, the sales process is done completely manually! Sales people call clients and get sales, enter sales info into the CRM system, then "Order Fulfillment" people fulfill these sales onto our WEB system. So as you can probably guess, there are tons of user errors in between! The scipt I'm developing here is meant to check all the new sales data that have recently been fulfilled, and notify my group to correct them if errors exist.

    I'll look into the "forward_only" cursor option, however, I really want to try rewriting this whole thing without cursors, but I don't know how... can someone please give me some hints?

    The general process should go like this, right?...

    CREATE TABLE #Company_table (company_id INT)

    INSERT INTO #Company_table

    SELECT company_id

    FROM CRM_system WHERE "new sales cases exist"

    SELECT 1st company_id FROM #Company_table

    WHILE --(don't know how to condition this loop)

    BEGIN

    (Call a stored procedure with company_id as input, the sproc should check all products' attributes and return value 1 or 0 depending if errors exist)

    IF (error exist), INSERT company_id INTO error_table

    SELECT next company_id?

    END

    Hmmm... I guess my biggest question is "How to condition the loop to loop through the company_ids". Should I add an identity column to the #company_table?

    Thanks for letting me pick your brains 🙂 Much appreciated!! Thanks all!

  • A different approach maybe?

    Is it possible to build a table that links a 'web-product-attribute' to a 'crm-product-attribute' in some way? Is this some kind of general structure that is always applied, regardless of the product / customer who purchases the stuff?

    If that's the case, it should be fairly straightforward to build this as a set-based solution.

    Is it in some way possible to post some (simple) sample data? This makes the discussion a lot easier...

  • Man, that power outage was some crazy stuff. Only inevitable a complex self-referencing system, though (Chaos!). I heard people were being pretty cool about things, though; probably have a baby boom in another 9 months. 🙂

    Npeeters has the right idea about how this should be done...in a linking table and using sets instead of loops..much much better peformance. However, as long as company_id is unique:

    if object_ID('tempdb..#Company_table') is not null drop table #Company_table

    CREATE TABLE #Company_table (company_id INT, error varchar(25))

    if object_ID('tempdb..#Error_Table') is not null drop table #Error_Table

    create table #Error_Table (company_id INT, error varchar(25))

    INSERT INTO #Company_table Values (1, 'Error')

    INSERT INTO #Company_table Values (2, 'Good')

    INSERT INTO #Company_table Values (3, 'Error')

    INSERT INTO #Company_table Values (4, 'Good')

    INSERT INTO #Company_table Values (5, 'Good')

    INSERT INTO #Company_table Values (6, 'Error')

    declare @minID int,

    @maxID int

    select @minID = 0,

    @maxID = max(company_id)

    FROM #Company_table

    While @minID < @maxID

    begin

    Select top 1 @minID = company_id

    From #Company_table

    Where company_id > @minID

    --This is where you'd do your error checking

    Insert #Error_Table

    --sp_CheckforError @minID

    Select *

    From #Company_table

    Where company_id = @minID and error = 'error'

    end

    Select * from #Error_Table

    Signature is NULL

  • Yes! Yes! We're getting closer... Let me first try to make up some sample data:

    if object_ID('tempdb..#CRM_table') is not null drop table #CRM_table

    CREATE TABLE #CRM_table (company_id INT, row_id INT, prod_type varchar(50), number_of_ad INT, end_date datetime, related_row_id INT NULL)

    if object_ID('tempdb..#WEB_table') is not null drop table #WEB_table

    create table #WEB_table (company_id INT, row_id INT, prod_type varchar(50), number_of_ad INT, end_date datetime)

    INSERT INTO #CRM_table Values (001, 5001, 'Single Advertiser', 5, '2003-12-31', NULL)

    INSERT INTO #CRM_table Values (001, 5002, 'Additional Advertiser', 10, '2003-12-31', 5001)

    INSERT INTO #CRM_table Values (002, 5003, 'Single Advertiser', 10, '2004-03-31', NULL)

    INSERT INTO #CRM_table Values (003, 5004, 'Single Advertiser', 8, '2003-11-30', NULL)

    INSERT INTO #WEB_table Values (001, 5001, 'Member Advertiser', 15, '2003-12-31')

    INSERT INTO #WEB_table Values (002, 5003, 'Single Advertiser', 10, '2004-03-31')

    INSERT INTO #WEB_table Values (003, 5004, 'Single Advertiser', 8, '2003-11-30')

    /* There we go, this is a simplified version where CRM's product attributes are all in one table. I agree that changing to a set based solution will make this go much much faster! So even though I can't create an indexed view in SQL Svr 7, I think I'm still going to try to create a view in the CRM system to mirror what's in the WEB system, so that I can compare all the products at once in a set based solution.

    However, as you can see in company 001, row 5001 and 5002. CRM system allows for customer to buy "additional" attributes, like 5002, these attributes are add on to the original product - row 5001. Only the total combined product is shown on the WEB system. This is the challenge that I'm facing when trying to create the view today.

    I think once the view is created, I can do:

    */

    if object_ID('tempdb..#Error_table') is not null drop table #Error_table

    create table #Error_table (company_id INT)

    declare @minID int,

    @maxID int

    select @minID = 0,

    @maxID = max(company_id)

    FROM #CRM_table

    While @minID < @maxID

    begin

    Select top 1 @minID = company_id

    From #CRM_table

    Where company_id > @minID

    --This is where you'd do your error checking

    DECLARE @return_error INT

    EXEC @return_error = usp_Check_CRM_WEB_errors @minID

    IF @return_error > 0

    Insert #Error_Table VALUES @minID

    ELSE

    PRINT 'Company ' + @minID + ' has no error.'

    end

    Select * from #Error_Table

  • The SPROC usp_Check_CRM_WEB_errors will be a set based solution, like this?

    CREATE PROC usp_Check_CRM_WEB_errors

    @comp_id

    AS

    SELECT crm.row_id

    FROM #CRM_table crm

    LEFT OUTER JOIN #WEB_table web

    ON crm.row_id = web.row_id

    WHERE crm.company_id <> web.company_id

    OR crm.prod_type <> web.prod_typ

    OR crm.number_of_ad <> web.number_of_ad

    OR crm.end_date <> web.end_date

    OR web.row_id IS NULL

    IF @@ROWCOUNT = 0

    RETURN(0)

    ELSE

    RETURN(1)

    GO

  • Yeah, that looks great! Even with a un-indexed view it should run faster than row by row processing.

    Is there any reason you need to check each company_id separately? If you could check them all at once then you could remove your last loop.

    Signature is NULL

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply