select a from b where Int_field in (1, 2, 3, 4, 5,.......1000)

  • We have SELECT queries that have to do something like:

    select somefield from table_a where Int_field in (1, 2, 3, 4, 5,.......1000) -- we are selecting 1,000+ IDs out of a table of about 1 million records (of course the numbers are not consecutive like I showed). We do have a clustered key built on that field, but it's still slow (> 2 seconds). Is there any way we can improve this?

    Thanks!

  • From your example above, are you building the SELECT statement as dynamic SQL? Is the SQL part of a stored procedure that is recompiling? On average, how many records are returned? How fragmented is the table, and are the table statistics up to date?

    These are some of the considerations you would need to look into. Other factors are more server-related: how much memory is allocated to SQL Server? How many CPUs, and what is the processor like?

    A possible alternative, to gauge any difference in performance, is to create a table variable and insert the values from the IN () clause into that table variable. Then either use an INNER JOIN or simply use a SELECT statement as part of the IN () clause, e.g.:

    declare @tbl table (int_field int)

    -- code to insert the int_field values into the table variable

    select a from b where int_field IN (select int_field from @tbl)

    OR

    select a from b inner join @tbl t on b.int_field = t.int_field
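
    To flesh out the "code to insert" placeholder above, here is one hedged sketch of splitting a comma-separated list into the table variable with a plain WHILE loop (the variable name @list is made up for illustration; on SQL Server 2016 and later, the built-in STRING_SPLIT function replaces this loop with a single statement):

    ```sql
    declare @list varchar(8000)
    set @list = '1,2,3,4,5'   -- stands in for the list coming from the front end

    declare @tbl table (int_field int)

    -- slice one value off the front of the string per iteration
    declare @pos int
    while len(@list) > 0
    begin
        set @pos = charindex(',', @list)
        if @pos = 0
        begin
            -- last (or only) value left in the string
            insert into @tbl (int_field) values (cast(@list as int))
            set @list = ''
        end
        else
        begin
            insert into @tbl (int_field) values (cast(left(@list, @pos - 1) as int))
            set @list = substring(@list, @pos + 1, len(@list))
        end
    end

    select int_field from @tbl
    ```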

  • If your numbers are in sequence...

    select *

    from table

    where fld between @start AND @end

  • Thanks for the input, guys. However, the IDs are passed as a list from the front end; they are not stored in a table, nor are they in sequence. Also, we would like to avoid temp tables if possible. Not sure if there is a better way to work around this problem.

  • If you have no control over the comma-separated list coming from the front end, then your best bet is to transform it into an XML document (easier than it sounds) and join to the XML as if it were a table.

    Below is a working example. Replace @TestList with whatever is coming from the front end, and @TestData with your 1-million-row table:

    Declare @xmldoc int
    Declare @TestList as varchar(1000)

    Select @TestList = '1,3,5,9'
    Select @TestList = '<ROOT><List ID="' + Replace(@TestList, ',', '"></List><List ID="') + '"></List></ROOT>'
    Select @TestList

    exec sp_xml_preparedocument @xmldoc OUTPUT, @TestList

    -- sample data standing in for the 1-million-row table
    Declare @TestData Table (IDColumn int)

    Insert Into @TestData
    Select 1 Union All
    Select 2 Union All
    Select 3 Union All
    Select 4 Union All
    Select 5 Union All
    Select 6 Union All
    Select 7 Union All
    Select 8 Union All
    Select 9 Union All
    Select 10

    Select *
    From @TestData as t
    Inner Join OPENXML(@xmldoc, '/ROOT/List', 1) WITH (ID int) As IDList
      On (t.IDColumn = IDList.ID)

    -- release the handle created by sp_xml_preparedocument
    exec sp_xml_removedocument @xmldoc

  • I would strongly recommend that you DON'T use a massive IN list like you are proposing. I have seen this break a server (stack overflow) - admittedly the person who did this generated >2,500 elements in the list, but I imagine that some of yours may come close.

    It's also horribly inefficient. What you're actually saying is:

    WHERE int_field = 1
       OR int_field = 2
       OR int_field = 3
       ...

    each of which requires a separate index lookup (or worse, a table scan). If you are table-scanning on this column, then you could find yourself in trouble (an index seek wouldn't be so bad, but it's still bad practice!).

    Why are you so against temp tables? By putting the data in a table, you can do a proper join against the other tables rather than sequential lookups, and that is much more efficient.
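
    For what it's worth, the temp-table approach this reply recommends might look like the sketch below. The table and column names are made up for illustration, and the PRIMARY KEY on the temp table gives the optimizer an index to seek or merge-join on:

    ```sql
    -- temp table holding the IDs from the front end
    create table #id_list (int_field int primary key)

    -- populate it however the IDs arrive (a split function, dynamic SQL, etc.);
    -- a few literal rows here just to make the sketch self-contained
    insert into #id_list (int_field) values (1)
    insert into #id_list (int_field) values (2)
    insert into #id_list (int_field) values (3)

    -- one proper join instead of 1,000+ OR comparisons
    select b.a
    from b
    inner join #id_list i on i.int_field = b.int_field

    drop table #id_list
    ```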
