Inconsistent Query Results

  • When I execute the following T-SQL query, I only get 4655 rows returned:

    SELECT c.* FROM Customers c INNER JOIN customers_from_dedupe d

    ON c.recordid = d.recordid

    WHERE promotion = 3465

    The 'recordid' column is an IDENTITY column in both tables.

    If I change the SELECT parameter from c.* to d.*, and execute:

    SELECT d.* FROM Customers c INNER JOIN customers_from_dedupe d

    ON c.recordid = d.recordid

    WHERE promotion = 3465

    I get the expected 771,458 rows returned.

    After further investigation, I found that if I remove the WHERE-clause from the first query and execute:

    SELECT c.* FROM Customers c INNER JOIN customers_from_dedupe d

    ON c.recordid = d.recordid

    I get the desired 771,458 rows.  The 'promotion' column is a clustered-index.

    Can anyone explain the reason for this inconsistency??  Shouldn't I get the same number of rows for all these queries???  Thanks!!

  • Is the promotion column in c or d? I think you should prefix promotion with the table in the WHERE clause (like WHERE c.promotion = ). I'm not sure if that is related to your problem, but it might help us all understand the query and the data better.

     

  • Sorry....the 'promotion' column is only in table 'c' (customers).

  • Please post the DDL of the tables. It may also help to post the results of using SET SHOWPLAN_TEXT ON, with the 2 alternative queries.

  • Table C:  recordid and promotion are INT

    Table D:  recordid is VARCHAR(50)

    Here's the results from SHOWPLAN_TEXT:

    StmtText

    SELECT c.* FROM Customers c INNER JOIN customers_from_dedupe d

    ON c.recordid = d.recordid

    WHERE promotion = 3465

    (1 row(s) affected)

    StmtText

      |--Hash Match(Inner Join, HASH[c].[RecordId])=([Expr1002]), RESIDUAL[c].[RecordId]=[Expr1002]))

           |--Clustered Index Seek(OBJECT[GroupAuto].[dbo].[Customers].[Promo] AS [c]), SEEK[c].[Promotion]=3465) ORDERED FORWARD)

           |--Compute Scalar(DEFINE[Expr1002]=Convert([d].[recordid])))

                |--Table Scan(OBJECT[GroupAuto].[dbo].[customers_from_dedupe] AS [d]))

    (4 row(s) affected)

     

  • Which 1 of these 2 statements from the initial post & subsequent replies above is actually correct ?

    >>The 'recordid' column is an IDENTITY column in both tables.

    >>Table D:  recordid is VARCHAR(50)

     

  • Perhaps you should convert d.recordid from VARCHAR(50) to INT before comparing it to c.recordid.

  • The 'recordid' column is only an IDENTITY column in Table C.  Casting the 'recordid' to INT int Table D has no effect:

    SELECT c.* FROM Customers c INNER JOIN customers_from_dedupe d

    ON c.recordid = CAST(d.recordid AS INT)

    WHERE c.promotion = 3465

    Table D originally contained all the records from Table C with promotion = 3465.  Then, records with the same address were stripped out of Table D.  Table D only contained a subset of the columns in Table C.  The 'promotion' column is not in Table D and 'recordid' is used as the link to Table C.

    Therefore, the where-clause is not really needed in the above query.  If I remove it, the query returns the correct results.  However, I would like to understand why the where-clause is having a negative effect on this query.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply