Your Recommendations on Query / Cursor in this case?

  • I have been asked to do a comparison of various columns in 2 tables with the same table structure, for example

    Table Person1 with the following structure

    FirstName

    LastName

    PhoneNumber

    Address

    City

    PostalCode

    Birthdate

    PersonID

    and Table Person2 with the exact same structure

    In this case, would a cursor be the appropriate choice, or would writing queries comparing the different fields be better in terms of performance? The number of rows will be close to 900 thousand records per table.

  • Out of personal preference I try not to EVER write cursors unless they are absolutely necessary. I read somewhere that SQL is a set-based language and you need to think of everything in result sets, never a record at a time. Don't get me wrong, sometimes a cursor is the only way to accomplish what you want to do, but in your case I would compare the two tables column by column.

  • I agree with cheddar. Even with the great performance improvements made for cursors, this looks like a job that is much better handled as a set-based operation.

    The exception that comes to mind (and even then it might not be a good idea) is if you are writing a generic utility and don't know the table structure beforehand.
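
    For that generic case, one set-based option is to build the comparison from the catalog instead of cursoring over rows. A rough sketch only (the Person1/Person2 names and the PersonID key are taken from the post above, and everything is compared as VARCHAR for simplicity):

    DECLARE @sql VARCHAR(8000)
    SET @sql = ''

    -- Build one "<column>Different" flag per column, skipping the join key
    SELECT @sql = @sql
        + ', CASE WHEN ISNULL(CAST(t1.[' + COLUMN_NAME + '] AS VARCHAR(255)), '''') = '
        + 'ISNULL(CAST(t2.[' + COLUMN_NAME + '] AS VARCHAR(255)), '''') '
        + 'THEN 0 ELSE 1 END AS [' + COLUMN_NAME + 'Different]'
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'Person1'
      AND COLUMN_NAME <> 'PersonID'
    ORDER BY ORDINAL_POSITION

    SET @sql = 'SELECT t1.PersonID' + @sql
        + ' FROM Person1 t1 JOIN Person2 t2 ON t1.PersonID = t2.PersonID'

    EXEC (@sql)

    PRINT the @sql variable first if you want to sanity-check the generated statement before running it.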

    My 2 cents.

    Guarddata-

  • Since you have 900 thousand records, that would be a lot of memory thrashing to process with a cursor-based query. It should be faster if you can write a set-based query.

    Gregory Larsen, DBA

    If you're looking for SQL Server examples, check out my website at http://www.geocities.com/sqlserverexamples

    Gregory A. Larsen, MVP

  • I think you can do it without a cursor in one table scan, like this (just replace NULL values with, say, '' using the ISNULL function):

    SELECT t1.PersonID
    , CASE t1.FirstName WHEN t2.FirstName THEN 0 ELSE 1 END AS FirstNameDifferent
    , CASE t1.LastName WHEN t2.LastName THEN 0 ELSE 1 END AS LastNameDifferent
    .............
    FROM Person1 t1
    JOIN Person2 t2 ON t1.PersonID = t2.PersonID
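
    Spelled out for all the columns in the original post, and with the ISNULL handling, it might look something like this (a sketch only; the '' and the dummy date are just placeholder defaults, which does mean a NULL on one side and an empty string on the other will compare as equal):

    SELECT t1.PersonID
    , CASE ISNULL(t1.FirstName, '') WHEN ISNULL(t2.FirstName, '') THEN 0 ELSE 1 END AS FirstNameDifferent
    , CASE ISNULL(t1.LastName, '') WHEN ISNULL(t2.LastName, '') THEN 0 ELSE 1 END AS LastNameDifferent
    , CASE ISNULL(t1.PhoneNumber, '') WHEN ISNULL(t2.PhoneNumber, '') THEN 0 ELSE 1 END AS PhoneNumberDifferent
    , CASE ISNULL(t1.Address, '') WHEN ISNULL(t2.Address, '') THEN 0 ELSE 1 END AS AddressDifferent
    , CASE ISNULL(t1.City, '') WHEN ISNULL(t2.City, '') THEN 0 ELSE 1 END AS CityDifferent
    , CASE ISNULL(t1.PostalCode, '') WHEN ISNULL(t2.PostalCode, '') THEN 0 ELSE 1 END AS PostalCodeDifferent
    , CASE ISNULL(t1.Birthdate, '19000101') WHEN ISNULL(t2.Birthdate, '19000101') THEN 0 ELSE 1 END AS BirthdateDifferent
    FROM Person1 t1
    JOIN Person2 t2 ON t1.PersonID = t2.PersonID

    Note that this only covers PersonIDs that exist in both tables; rows present in only one of them would need a FULL OUTER JOIN (or a couple of extra EXISTS checks) to show up.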

  • You can normally do away with cursors by using temp tables. Populate a temp table (either a table variable or a normal one) and loop through it. I have used this many times with large record sets and not had any problems.

    Could you not solve the problem by doing a normal select with a join?

    Select * From Person1 T1 JOIN Person2 T2 ON T1.PersonID = T2.PersonID
    Where [All the fields do not match]

    That would be near-instant and would only return the mismatched data.
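
    The bracketed part would just be one NULL-safe inequality per column, something along these lines (a sketch; the '' and dummy-date defaults are only placeholders and should be values that cannot occur in the real data):

    Select t1.*, t2.*
    From Person1 t1
    Join Person2 t2 ON t1.PersonID = t2.PersonID
    Where ISNULL(t1.FirstName, '') <> ISNULL(t2.FirstName, '')
       OR ISNULL(t1.LastName, '') <> ISNULL(t2.LastName, '')
       OR ISNULL(t1.PhoneNumber, '') <> ISNULL(t2.PhoneNumber, '')
       OR ISNULL(t1.Address, '') <> ISNULL(t2.Address, '')
       OR ISNULL(t1.City, '') <> ISNULL(t2.City, '')
       OR ISNULL(t1.PostalCode, '') <> ISNULL(t2.PostalCode, '')
       OR ISNULL(t1.Birthdate, '19000101') <> ISNULL(t2.Birthdate, '19000101')

    With 900 thousand rows it will not literally be instant, but with PersonID indexed on both tables it should be one pass over each table rather than 900 thousand individual fetches.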

    Cheers,

    Crispin

  • Crispin,

    Forgive my ignorance, but how do you loop through a table if you don't use a cursor?

    Jeremy

  • You can do it one of two ways. Both require a temp table.

    I prefer using a table variable instead of a normal table in TempDB

    -----------------------------------------------

    SET NOCOUNT ON

    -- Declare your temp table (can use a normal temp table instead)
    Declare @vQuotes TABLE(
        Ident INT NOT NULL IDENTITY(1, 1),
        QuoteID INT NOT NULL,
        QuoteRef Varchar(20) NOT NULL
    )

    -- Declare local variables.
    -- @vCount is the number of rows in the temp table
    -- @vQuoteID is the column I want to work with
    -- @vQuoteRef is the column I want to work with
    Declare
        @vCount INT,
        @vQuoteID INT,
        @vQuoteRef VARCHAR(20)

    -- Insert into the temp table
    Insert Into @vQuotes (QuoteID, QuoteRef)
    Select QuoteID, QuoteRef From Quotes

    -- Set @vCount to the row count. Could also use: Select @vCount = MAX(Ident) From @vQuotes
    Select @vCount = @@Identity

    WHILE @vCount > 0
    Begin
        -- Loop through the table, pulling one row at a time into the variables
        Select
            @vQuoteID = QuoteID,
            @vQuoteRef = QuoteRef
        From @vQuotes Where Ident = @vCount

        -- Do what you want here. Can exec another proc, passing the individual variables.
        Print STR(@vQuoteID) + ' ' + @vQuoteRef

        -- Decrement your count
        Set @vCount = @vCount - 1
    End

    -- Go and have a smoke.

    ----------------------------------

    If you want to use a "normal" temp table for this, create it as usual and use #TableName instead of @vTableName, for example:
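
    A minimal sketch of that, using the same assumed Quotes columns as above:

    CREATE TABLE #Quotes (
        Ident INT NOT NULL IDENTITY(1, 1),
        QuoteID INT NOT NULL,
        QuoteRef VARCHAR(20) NOT NULL
    )

    INSERT INTO #Quotes (QuoteID, QuoteRef)
    SELECT QuoteID, QuoteRef FROM Quotes

    -- ...same WHILE loop as above, reading From #Quotes instead of From @vQuotes...

    DROP TABLE #Quotes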

    I have had no problems with this method and prefer it over cursors. I prefer table variables over normal temp tables because there is less disk IO and you don't have to clean up the table afterwards.

    People have warned about using them with large record sets due to high memory consumption, but I have yet to have a problem; I have put 10,000 rows in one without any issues.

    Cheers,

    Crispin

  • One more hint though. This can cause problems with variable tables.

    SQL does not keep statistics on them, and you cannot add an index to one.

    If you're going to have large tables where an index would be beneficial, rather use a normal temp table, for example:
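
    For example, with a normal temp table you can add an index after loading it (a sketch only; the table and index names here are made up):

    CREATE TABLE #Work (
        PersonID INT NOT NULL,
        PostalCode VARCHAR(10) NULL
    )

    INSERT INTO #Work (PersonID, PostalCode)
    SELECT PersonID, PostalCode FROM Person1

    -- CREATE INDEX works on a #temp table, and SQL keeps statistics on it;
    -- neither is available for a table variable
    CREATE CLUSTERED INDEX IX_Work_PersonID ON #Work (PersonID)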

    Cheers,

    Crispin

  • Thanks for the info.

    I have a csv feed from another system and I need to do inserts/updates in my system. Some of these feed files are large (can be several million rows) so what I do is this:

    - load into a table in tempdb through bulk insert

    - declare cursor

    - fetch row from cursor

    - determine if row exists in table

    - insert/update depending on whether exists already

    - fetch next

    - tidy up

    In ideal conditions (only one job running and nothing else running on the server) I can process up to 2 million rows per hour. I often run multiple updates concurrently (anything up to 10) and can get a total of 5 or 6 million rows processed per hour - I guess I start running into I/O constraints rather than anything else, which is why I don't get a straight multiple of the maximum throughput.

    I'll give it a go sometime and see if I get a performance boost. There will be the extra overhead of creating the index on the identity column, which I don't have at the moment, but I might save time by not having the overhead of a cursor.

    Thanks

    Jeremy

  • Maybe I misunderstand you but -

    Solution 1)

    Dump the contents of the csv file into a temp table.

    You're not going to get around that.

    Could you not do something along these lines?

    Insert Into MainTable Select * From TempTable Where Somedata Not In (Select Somedata From MainTable)

    This will do a "differential" insert.

    No row-by-row comparisons for SQL to do.

    I have a couple of imports that work like this. Much faster.
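
    And if you also need the update half of Jeremy's insert/update step, a set-based sketch might look like this (MainTable, TempTable, KeyCol and SomeCol are just placeholder names):

    -- Update the rows that already exist in the main table
    UPDATE m
    SET m.SomeCol = t.SomeCol
    FROM MainTable m
    JOIN TempTable t ON t.KeyCol = m.KeyCol

    -- Then insert only the rows that are not there yet (the "differential" insert)
    INSERT INTO MainTable (KeyCol, SomeCol)
    SELECT t.KeyCol, t.SomeCol
    FROM TempTable t
    WHERE NOT EXISTS (SELECT 1 FROM MainTable m WHERE m.KeyCol = t.KeyCol)

    NOT EXISTS also behaves better than NOT IN if the key column can contain NULLs.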

    Solution 2)

    Haven't actually tried it but should work.

    When you import data, you can specify either a straight column copy OR a query.

    Could you not select the columns from the CSV file where the data is not in the main table?

    This might be extremely slow, as SQL cannot do an efficient search on a text file (not 100% sure of this).

    This, assuming the select works, would be even faster as SQL will only be pulling the relevant data from the file.

    Let us know what happens. Would be interesting.

    Crispin
