TOP 1000.....

  • Hello everybody

    I'm trying to retrieve 200 million records from a table. Since it's taking forever to run the query, I decided to break it up into smaller pieces.

    I did a TOP 1000000 to retrieve the first million. My question is: is there code to retrieve the next 1,000,000 records from that table, and so on until all 200 million records are retrieved? Please and thank you in advance...

    Anchelin

  • Only if there is a unique key which you can use as part of the SELECT to get values greater than the value in the last row retrieved. I don't think that splitting the retrieval into 1,000,000 rows at a time will help much, as SQL will need to bypass the 1st million to get the 2nd, 2 million to get the 3rd, etc., and for that volume SQL will most likely use a table scan (although I may be wrong on that; you will need to check the execution plan).

    Far away is close at hand in the images of elsewhere.
    Anon.
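
    The "greater than the value in the last row retrieved" approach described above is keyset pagination. A minimal runnable sketch of the idea, using Python's sqlite3 as a stand-in for SQL Server (the table name, column names, and batch size here are invented for the example):

```python
import sqlite3

# In-memory stand-in for the real table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Samples (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO Samples VALUES (?, ?)",
                 [(i, i * 0.5) for i in range(1, 26)])

BATCH = 10      # in the real scenario this would be 1,000,000
last_id = 0     # keyset: remember the last unique key we saw
total = 0
while True:
    # Each batch resumes strictly after the last key retrieved,
    # so no rows are skipped or read twice.
    rows = conn.execute(
        "SELECT id, value FROM Samples "
        "WHERE id > ? ORDER BY id LIMIT ?", (last_id, BATCH)).fetchall()
    if not rows:
        break
    total += len(rows)
    last_id = rows[-1][0]

print(total)  # prints 25: all rows retrieved across three batches
```

    With an index on the key, each batch is a cheap index seek rather than a scan-and-skip, which is why this beats repeating TOP with ever larger offsets.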

  • I don't have a unique key, because all the rows I retrieve are something like 'ComputerName', 'InstanceName', etc... I can add another column to get the results, like 'CounterDetail_FK'.

    But what will my code look like then?

  • Without a unique key you will not be able to select the data in chunks, because there is no guarantee of the order of the data without an 'ORDER BY' clause, and unless the data is static you also cannot guarantee the data will be retrieved correctly.

    What would be in the 'CounterDetail_FK' column (does FK equate to foreign key)? If it is not unique, then the same problem exists.

    However, if 'CounterDetail_FK' is a way of grouping the data, then you could select each 'group' at a time; but again, if the data is not static, the problem above is still there.

    Can you post DDL and some sample data, and explain how the output is being used? I cannot understand why you want to select 1,000,000 rows at a time; what are you going to do with the output?

    Far away is close at hand in the images of elsewhere.
    Anon.

  • OK...

    I have to retrieve this data from one server into a table on another server, and then I have to summarise it; it's for performance purposes...

    'SMC_Instance_ID' is the unique key (primary key). The code is as follows:

    SELECT TOP 1000000 Name, ComputerName, InstanceName, CounterName, SampledValue, DateTimeSampled, SMC_Instance_ID FROM TableName

    Then I have to summarise the SampledValue column by MIN, MAX, AVG, etc...

    Hope this helps...

  • So what is 'taking forever to query'?

    The transfer of data from ServerA to ServerB?

    Or the summary?

    Why the transfer, is it possible to summarise the data on the server that holds the data?

    Based on your last post, if you do want to select bit by bit then

    DECLARE @SMC_Instance_ID bigint

    SET @SMC_Instance_ID = 1  -- assuming that ID starts at 1

    SELECT Name, ComputerName, InstanceName, CounterName, SampledValue, DateTimeSampled, SMC_Instance_ID
    FROM TableName
    WHERE SMC_Instance_ID >= @SMC_Instance_ID
    AND SMC_Instance_ID < (@SMC_Instance_ID + 1000000)

    SET @SMC_Instance_ID = @SMC_Instance_ID + 1000000

    Repeat the SELECT and the following SET to continue to retrieve data. You will get 1,000,000 rows at a time providing SMC_Instance_ID is consecutive with no gaps; otherwise you will get fewer rows per batch.

    Far away is close at hand in the images of elsewhere.
    Anon.
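
    The range-based batching above can be demonstrated with a small runnable sketch, again using Python's sqlite3 as a stand-in for SQL Server. The IDs here deliberately have gaps to show the point made at the end of the post: each range covers the same width, but a batch returns fewer rows where IDs are missing (names and sizes are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName "
             "(SMC_Instance_ID INTEGER PRIMARY KEY, SampledValue REAL)")
# IDs with gaps: only even numbers from 2 to 40.
conn.executemany("INSERT INTO TableName VALUES (?, ?)",
                 [(i, float(i)) for i in range(2, 41, 2)])

STEP = 10       # the post uses 1,000,000; 10 keeps the demo small
current = 1     # start at the lowest ID, as the post advises
batch_sizes = []
while current <= 40:
    # Half-open range [current, current + STEP), exactly as in the T-SQL.
    rows = conn.execute(
        "SELECT SMC_Instance_ID FROM TableName "
        "WHERE SMC_Instance_ID >= ? AND SMC_Instance_ID < ?",
        (current, current + STEP)).fetchall()
    batch_sizes.append(len(rows))
    current += STEP

print(batch_sizes)  # [5, 5, 5, 5]: each 10-wide range holds only 5 rows
```

    Unlike keyset pagination, this variant still works correctly when IDs have gaps; the only cost is uneven batch sizes.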

  • I can't do the summary on Server A; that's our MOM Reporting Server...

    And the 'summary' is taking forever to query...

    The SMC_Instance_ID starts at 10088422... all rows have unique IDs like 2084920 or 0498092994 or 8498344899.

    So will it be OK if I set it to '10088422' then?


  • Quote: "And the 'summary' is taking forever to query..."

    If you are summarising 200 million rows then it will take as long as it takes, and probably a long time. I'm not sure that it would be faster doing the work outside of SQL either. What does the execution plan for the summary look like? That might give you a few hints for improving the query.

    Quote: "So will it be OK if I can set it to '10088422'"

    If 10088422 is the lowest ID then yes, start at 10088422; otherwise the first 10 iterations will select nothing.

    Far away is close at hand in the images of elsewhere.
    Anon.
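
    On the summary itself: MIN, MAX, and AVG over SampledValue are usually cheapest as a single aggregate query that the database engine computes in one pass, rather than something built up from batches pulled out of the table. A hedged sketch of that shape, once more with Python's sqlite3 and invented table, column, and counter names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Perf (CounterName TEXT, SampledValue REAL)")
conn.executemany("INSERT INTO Perf VALUES (?, ?)",
                 [("cpu", 10.0), ("cpu", 30.0),
                  ("mem", 50.0), ("mem", 70.0)])

# One pass over the table; the engine does all the aggregation.
summary = conn.execute(
    "SELECT CounterName, MIN(SampledValue), MAX(SampledValue), "
    "AVG(SampledValue) "
    "FROM Perf GROUP BY CounterName ORDER BY CounterName").fetchall()

print(summary)  # [('cpu', 10.0, 30.0, 20.0), ('mem', 50.0, 70.0, 60.0)]
```

    The same GROUP BY shape works in T-SQL, which is why summarising on the server that holds the data, as suggested earlier in the thread, tends to beat transferring 200 million rows first.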

  • What's the point of collecting a huge amount of data on a server where you cannot make any use of it?

    If you mean to work on the data in another place, you need to pass the data to that other place right when it's generated.

    It's like storing accounting documents in the IT department. Imagine every week you must copy the whole pile of documents and drag it upstairs.

    I guess a manager who organized work in an organization this way would be considered an idiot and fired within hours.

    So why is idiocy so acceptable in IT development?

    _____________
    Code for TallyGenerator
