Performance when using UNION in a view

  • Hi SQL Gurus,

    I have 2 databases which have identical schemas.  They contain client data that must remain segragated.

    We are running some analysis that goes across both these databases where I hit on a performance question.

    In Query Analyser if I run the query

    USE database3

    SELECT * FROM database1.MyTable

    UNION

    SELECT * FROM database2.MyTable

    the query takes around 300 seconds.

    I also have a view which is defined as

    USE database3

    CREATE VIEW MyView AS

    SELECT * FROM database1.MyTable

    UNION

    SELECT * FROM database2.MyTable

    If I run the query

    USE database3

    SELECT * FROM MyView

    the query takes 360 seconds, around a minute slower.  Why is this?  I would have expected the view to be in some way optimised. I certainly wouldn't have expected it to be slower.  I have run some timing tests around 20 times and get consistent results.

    Does anyone know why this is, or can point me to any relevant documentation?

    Each table has around 10 million rows. We are using SQL 2000 Enterprise running on Windows 2000

    Thanks in advance!

  • What's the difference in the execution plans?

  • Do each of the seperate queries that the union combine return values distinct from each other, if so you should use 'union all' this will just union the results, whereas 'union' alone will sort the result sets and eliminate duplicates. This sort takes a lot of processing.

  • UNION ALL fixes the problem!  I hadn't appreciated the difference between UNION and UNION ALL. I knew I must be doing something wrong!

    Thanks for very much for your help.

     

  • So what are the final run times?

  • Final times were basically all the same at around the 300 seconds mark.  It would be nice if it were faster, but as this is for our own internal purposes, it doesn't really matter.

     

    Joe,

    We have separate databases because each database represents the data from a different client. We process data for a large number of external clients.  If any client saw another clients data, we as a company would be royally screwed!  Without going into any detail about how our app works, we regularly need to import client data, process and back up, but often we will need to restore from a previous back up a day or so after processing; if all data was in the same database, any restore would overwrite any updates from other clients.  Secondly, we have something in the region of a terabyte of client databases, they are much easier to manage at the file level as individual databases.  We have more than one SQL Server, so small databases allows us to do load balancing when required; we can physically move a database from one machine to another if we want to.  Thirdly, we have also found that as the databases grow larger, their performance degrades, so separating them out makes sense from that angle also.

    We only go across the databases for our own internal auditing purposes, which is not part of the process for our clients.  No data appears in 2 places (I'm not sure why you think it is)  this is the whole point of using a view!

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply