Combining statements with different ranges

  • I'm trying to figure out a way to speed up a process that finds the oldest record for a part number at various workstations in our system. The relevant fields in our table look like this:

    Part_Number

    Workstation

    Date_Entered

    This process needs to look first to see if the part exists at a specific workstation or workstations greater than the specified one, in workstation sequence. If still not found, it looks at all workstations, again in workstation sequence.

    Say, for example, I was looking for part number ABC. I have workstation numbers 1-10 and the specific workstation I want to start with is workstation 5. Workstation 3 is the only one with the part number at it.

    I would do the following lookups:

    select workstation, date_entered from inventory where part_number='ABC' and workstation>=5 order by workstation, date_entered;

    select workstation, date_entered from inventory where part_number='ABC' order by workstation, date_entered;

    If I came across the record for the part in the first lookup, I don't do the second look up.

    Is there a way to combine these statements into one?

    I hope I've made sense.

    Thanks very much,

    Andy

  • You could do this:

    select workstation, date_entered
    from inventory
    where part_number = 'ABC'
      and workstation >= 5

    union all

    select workstation, date_entered
    from inventory
    where part_number = 'ABC'
      and not exists
          (select *
           from inventory
           where part_number = 'ABC'
             and workstation >= 5)

    order by workstation, date_entered;

    I used a UNION ALL to turn it into a single query, and added a NOT EXISTS to the second query so that it pulls no records when the first query has results.

    Does that do what you need?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
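The UNION ALL / NOT EXISTS pattern above can be tried out in miniature. Here is a sketch using Python's sqlite3 module with made-up data (part ABC exists only at workstation 3, and the lookup starts at workstation 5, matching the thread's example):

```python
import sqlite3

# Hypothetical in-memory copy of the inventory table from the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (part_number TEXT, workstation INTEGER, date_entered TEXT)"
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [("ABC", 3, "2010-01-05"), ("XYZ", 5, "2010-01-06")],
)

# First branch looks at workstation >= 5; the second branch only fires
# when NOT EXISTS shows the first branch found nothing.
rows = conn.execute("""
    SELECT workstation, date_entered FROM inventory
    WHERE part_number = 'ABC' AND workstation >= 5
    UNION ALL
    SELECT workstation, date_entered FROM inventory
    WHERE part_number = 'ABC'
      AND NOT EXISTS (SELECT * FROM inventory
                      WHERE part_number = 'ABC' AND workstation >= 5)
    ORDER BY workstation, date_entered
""").fetchall()
print(rows)  # -> [(3, '2010-01-05')]: falls through to workstation 3
```

With no ABC record at or above workstation 5, only the fallback branch returns rows, which is exactly the two-lookup behavior collapsed into one statement.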

  • You could union the selects together and then sort by choice and your existing criteria

    select 1 as choice, workstation, date_entered from inventory
    where part_number='ABC' and workstation>=5
    UNION
    select 2 as choice, workstation, date_entered from inventory
    where part_number='ABC' and workstation<5
    order by choice, workstation, date_entered;

  • the Troglodytic, procedural solution:

    IF EXISTS (select workstation, date_entered from inventory
               where part_number='ABC' and workstation >= 5)
        select workstation, date_entered from inventory
        where part_number='ABC'
          and workstation >= 5
        order by workstation, date_entered
    ELSE
        select workstation, date_entered from inventory
        where part_number='ABC'
          and workstation < 5
        order by workstation, date_entered;

    -MarkO

    "You do not really understand something until you can explain it to your grandmother" - Albert Einstein
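The same two-step fallback can also live in client code. A minimal sketch in Python with sqlite3 (the table and data are made up for illustration), running the second query only when the first returns nothing:

```python
import sqlite3

def oldest_lookup(conn, part, start_ws):
    """Try workstations >= start_ws first; fall back to the rest only
    if nothing was found -- the same role as IF EXISTS / @@rowcount."""
    query = ("SELECT workstation, date_entered FROM inventory "
             "WHERE part_number = ? AND workstation {} ? "
             "ORDER BY workstation, date_entered")
    rows = conn.execute(query.format(">="), (part, start_ws)).fetchall()
    if not rows:  # first result set empty: look below start_ws instead
        rows = conn.execute(query.format("<"), (part, start_ws)).fetchall()
    return rows

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (part_number TEXT, workstation INTEGER, date_entered TEXT)"
)
conn.execute("INSERT INTO inventory VALUES ('ABC', 3, '2010-01-05')")
print(oldest_lookup(conn, "ABC", 5))  # -> [(3, '2010-01-05')]
```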

  • Thanks for the responses, but in each of those, wouldn't it still be running two queries and just returning a single data set? I was hoping to figure out how to put it into a single select statement and tell SQL to return workstations 5-10 first, followed by 1-4 (in my example). Can something be done with the order by to do this instead of returning straight workstation sequence?

    Sorry if I wasn't clear enough before.

    Andy

  • ... I was hoping to figure out how to put it into a single select statement and tell SQL to return workstations 5-10 first followed by 1-4 ...

    Why?

    -MarkO

    "You do not really understand something until you can explain it to your grandmother" - Albert Einstein

  • molson-927659 (1/7/2010)


    Why?

    The goal was to try to speed up the process by eliminating an extra query. I'm no expert on SQL, but I would think that combining into a single select statement would be faster.

    I received a response with the answer on a different forum:

    select top 1 workstation, date_entered from inventory where part_number='ABC' order by case when workstation >= @yourworkstation then 1 else 0 end desc, workstation, date_entered

    This returns the data that I'm looking for and, from my meager understanding of the way this all works, I think it will be faster.

    Andy
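The TOP 1 / ORDER BY CASE trick can be sanity-checked outside SQL Server. Here is a sketch using Python's sqlite3 module (LIMIT 1 stands in for TOP 1, and the table contents are made up): with records at workstations 3, 7, and 9 and a starting workstation of 5, the CASE term pushes workstations 5-and-up ahead of the rest, so workstation 7 wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (part_number TEXT, workstation INTEGER, date_entered TEXT)"
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [("ABC", 3, "2010-01-05"), ("ABC", 7, "2010-01-02"), ("ABC", 9, "2010-01-01")],
)

start_ws = 5  # plays the part of @yourworkstation
row = conn.execute("""
    SELECT workstation, date_entered FROM inventory
    WHERE part_number = 'ABC'
    ORDER BY CASE WHEN workstation >= ? THEN 1 ELSE 0 END DESC,
             workstation, date_entered
    LIMIT 1
""", (start_ws,)).fetchone()
print(row)  # -> (7, '2010-01-02'): lowest workstation at or above 5
```

The single scan sorts the >= group first, then orders within each group by workstation and date, so one statement replaces the two-lookup logic.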

  • ajreynolds (1/7/2010)


    ...I received a response with the answer on a different forum ... I think it will be faster.

    Two things: test, test, and test again. Also, compare actual execution plans.

  • This is a simple one...

    select workstation, date_entered from inventory
    where part_number='ABC' and workstation>=5
    order by workstation, date_entered;

    if @@rowcount = 0
        -- make this call another sp to get the best execution plans
        -- I'm using smaller than 5 here since we already know that there's nothing above it
        select workstation, date_entered from inventory
        where part_number='ABC' and workstation<5
        order by workstation, date_entered;

    On the programmer's side: if the first recordset is empty, move to the next recordset and display the data.

    If the first query will mostly result in a full table scan (or clustered index scan), you can use the order by case method to do only 1 scan vs 2. But you need to test with your own data to see what is the best solution for you.

  • I must apologize, MarkO. As I said before, I'm no SQL expert. I had thought that the select statement in the "if" would cause your method to be slow. In fact, if I'm reading the execution plans correctly, yours is the fastest of all the options presented.

    I haven't looked at execution plans before, so I hope I'm reading them correctly. I ran two tests with each of the solutions presented above: first with a workstation number of 5, which would return a record at workstation 5 in my test data; second with the workstation set to 99. Since there is no data with a workstation that high, that forces the fallback to workstations below the specified one. I turned on "Include Actual Execution Plan" and compared the "Estimated Subtree Cost" of the top node of the execution plan for each of the solutions and found the following:

                        test 1      test 2
    My original:        .0421682    .0844343
    Gsquared:           .0794749    (did not return any data)
    jcdyntek:           .0805071    .0794198
    MarkO:              .0485817    .0486795
    My other solution:  .0667801    .0667801
    Ninjas_RGRus:       .0421682    .0662867

    Am I looking at the right thing here?

    Andy

  • Are those times in seconds or is it another figure?

    If it's seconds and assuming that query won't run 100 times a second, then there's no real need to tune it any further. Performance tuning is about finding the 10% of the queries taking 90% of the load and bring that number back down to 10% (metaphorically speaking)... just the 80/20 rule.

  • also one last thing to note... a covering index could make this query fly without hurting inserts too much:

    create index whatever on dbo.TblName (part_number, workstation) INCLUDE (date_entered);

    But again, not really 100% required if it's actually running in under 1/20th of a second and you don't plan to massively grow that table. As always, it depends on your environment and data movements.
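Whether an index actually covers a query can be checked from the plan. A small illustration using Python's sqlite3 module (SQLite has no INCLUDE clause, so date_entered is simply listed as a trailing key column; the index and table names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (part_number TEXT, workstation INTEGER, date_entered TEXT)"
)
# All three referenced columns live in the index, so the base table
# never needs to be touched for this query.
conn.execute(
    "CREATE INDEX ix_part_ws ON inventory (part_number, workstation, date_entered)"
)

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT workstation, date_entered FROM inventory
    WHERE part_number = 'ABC' AND workstation >= 5
    ORDER BY workstation, date_entered
""").fetchall()
print(plan)  # the detail column should mention a COVERING INDEX
```

SQL Server's INCLUDE keeps date_entered out of the key (cheaper to maintain); SQLite approximates the same effect by appending it to the key, and its plan output labels the result a covering index.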

    ...I must apologize, MarkO...

    No apologies necessary, Andy. This is the 1st time all week I did something right. :blink:

    ...Am I looking at the right thing here?...

    Looks good to me, but I leave that question to the experts - I'm just a hacker (in the historical sense)

    -MarkO

    "You do not really understand something until you can explain it to your grandmother" - Albert Einstein

  • Ninja's_RGR'us (1/7/2010)


    Are those times in seconds or is it another figure?

    If it's seconds and assuming that query won't run 100 times a second, then there's no real need to tune it any further. Performance tuning is about finding the 10% of the queries taking 90% of the load and bring that number back down to 10% (metaphorically speaking)... just the 80/20 rule.

    The help for SQL just says the Estimated Subtree Cost is

    The total cost to the query optimizer for executing this operation and all operations preceding it in the same subtree.

    A quick search on the net indicates that the costs don't have any real units, they seem to just be a way to compare different operations.

    The actual query I'm working on is much more complicated than the example I put up in my original question, but what I've learned here will help me optimize it. It's possible that it will run 1000 times or more in a row, and the process is currently fairly slow, so speeding it up as much as possible is pretty critical.

    Andy

  • 1 - What is slow? 2 secs per run times 1000 runs is obviously a long time to wait for something. However, what would be a fast run in your view... and in the end user's view? Another thing to consider is how often that report will run (assuming it's a report at this point). Reports can be run in off hours, exported to Excel or whatever format the user desires, and be ready whenever they want.

    2 - You are talking about 1000 queries being run. The obviously much faster way is doing the query only once, with the correct group bys to give you what you need, possibly taking that time down from ?40? minutes to maybe just 1 or 2, or even faster. This method, combined with a covering index, could make a single index scan across a big table, which is way faster than 1000 seeks + bookmark lookups.

    Then again I can't make this call for you without seeing the actual execution plan and the queries you are running.

    There are articles on this site on how to post that info. If you post what we need, then we can go further in finding the hotspots in your query for you.

    Just to give you an example, I had to calculate the inventory in an ERP. The ERP only saves the transactions in the system, but never the actual inventory per location. My best query to fetch the inventory for all skus after all transactions (700 000 after 2 years) would have taken about 2 hours to run if the server was idle. I changed the format to 2 full table scans using a quirky update, and that time shot down to 30 seconds, in essence processing 50 000 rows a second with rather complex groupings. It's still slow if you have to wait for it, but the user will only run the report once or twice a month, so he can live with that. Now keep in mind that in that 30 seconds I have to copy all 700 000 rows into a temp table, build the clustered index once, and do the first calculation. Then I drop the clustered index, rebuild it in a different sequence, do the second calculation, and then go ahead and display depending on the dates of other transactions in the system. I would guesstimate now that the actual calculation time takes about 10% of the whole process... so that's about 500 000 rows a second processed.

    See what I mean?? 😉

    HTH.
