April 20, 2007 at 2:52 am
Hello everybody
I'm trying to retrieve 200 million records from a table. Since the query is taking forever, I decided to break the process up into smaller pieces.
I did a TOP 1000000 to retrieve the first million. My question is: is there code to retrieve the next 1000000 records from that table, until all 200 million records are retrieved? Please and thank you in advance...
Anchelin
April 20, 2007 at 3:00 am
Only if there is a unique key which you can use as part of the SELECT, to get values greater than the value in the last row retrieved. I don't think that splitting the retrieval into 1,000,000 rows at a time will help much, as SQL will need to bypass the 1st million to get the 2nd, 2 million to get the 3rd, etc., and for that volume SQL will most likely use a table scan (although I may be wrong on that; you will need to check the execution plan).
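As a sketch of the "greater than the last value" approach described above, assuming a unique, indexed key column (here called Id) and a hypothetical table name dbo.BigTable — adjust both to your schema:

```sql
-- Keyset paging: fetch the next chunk after the last key seen.
-- Id and dbo.BigTable are placeholder names, not from the original post.
DECLARE @LastId bigint
SET @LastId = 0  -- highest Id retrieved so far; 0 before the first batch

SELECT TOP 1000000 Id, ComputerName, InstanceName
FROM dbo.BigTable
WHERE Id > @LastId
ORDER BY Id

-- After each batch, set @LastId to the largest Id returned
-- and run the SELECT again, until no rows come back.
```

Because the WHERE clause seeks directly to the last key, each batch avoids re-scanning the rows already retrieved, provided Id is indexed.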
Far away is close at hand in the images of elsewhere.
Anon.
April 20, 2007 at 3:17 am
I don't have a unique key, because all the rows I retrieve are something like 'ComputerName', 'InstanceName', etc... I can add another column to get the results, like 'CounterDetail_FK'.
But what would my code look like then?
April 20, 2007 at 3:31 am
Without a unique key you will not be able to select the data in chunks, because there is no guarantee of the order of the data without an ORDER BY clause, and unless the data is static you also cannot guarantee the data will be retrieved correctly.
What would be in the 'CounterDetail_FK' column (does FK equate to Foreign Key?)? If it is not unique then the same problem exists.
However, if 'CounterDetail_FK' is a way of grouping the data then you could select one 'group' at a time, but again, if the data is not static the problem above is still there.
Can you post the DDL and some sample data, and explain how the output is being used? I cannot understand why you want to select 1,000,000 rows; what are you going to do with the output?
Far away is close at hand in the images of elsewhere.
Anon.
April 20, 2007 at 3:54 am
OK...
I have to retrieve this data from one server into a table on another server, then I have to summarise it; it's for performance purposes...
'SMC_Instance_ID' is the unique key (primary key). The code is as follows:
SELECT TOP 1000000 Name, ComputerName, InstanceName, CounterName, SampledValue, DateTimeSampled, SMC_Instance_ID FROM TableName
Then I have to summarise the SampledValue column by min, max, avg, etc. ....
Hope this helps...
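For the summary step, a minimal sketch using the column names from the SELECT above; the choice of grouping columns is an assumption, so group by whatever identifies one counter series in your data:

```sql
-- Summarise SampledValue per computer/instance/counter.
-- Column and table names are taken from the post above;
-- the GROUP BY columns are an assumption.
SELECT ComputerName, InstanceName, CounterName,
       MIN(SampledValue) AS MinValue,
       MAX(SampledValue) AS MaxValue,
       AVG(SampledValue) AS AvgValue,
       COUNT(*)          AS SampleCount
FROM TableName
GROUP BY ComputerName, InstanceName, CounterName
```

Running an aggregate like this directly on the source server returns only the summary rows, which is far less data to move between servers than the raw samples.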
April 20, 2007 at 4:10 am
So what is 'taking forever to query'?
The transfer of data from ServerA to ServerB?
Or the summary?
Why the transfer? Is it possible to summarise the data on the server that holds the data?
Based on your last post, if you do want to select bit by bit then
DECLARE @SMC_Instance_ID bigint
SET @SMC_Instance_ID = 1 -- assuming that ID starts at 1
SELECT Name, ComputerName, InstanceName, CounterName, SampledValue, DateTimeSampled, SMC_Instance_ID
FROM TableName
WHERE SMC_Instance_ID >= @SMC_Instance_ID
AND SMC_Instance_ID < (@SMC_Instance_ID + 1000000)
SET @SMC_Instance_ID = @SMC_Instance_ID + 1000000
Repeat the SELECT and the following SET to continue retrieving data. You will get 1,000,000 rows at a time provided SMC_Instance_ID is consecutive with no gaps; otherwise you will get fewer rows per batch.
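The "repeat the SELECT and SET" steps above can be wrapped in a WHILE loop so the whole transfer runs in one script. A sketch, assuming each chunk is inserted into a staging table (StagingTable is a hypothetical name, not from the original post) and the lowest and highest IDs are looked up first:

```sql
-- Chunked copy driven by the ID range; StagingTable is an assumed name.
DECLARE @SMC_Instance_ID bigint, @MaxID bigint
SELECT @SMC_Instance_ID = MIN(SMC_Instance_ID),
       @MaxID           = MAX(SMC_Instance_ID)
FROM TableName

WHILE @SMC_Instance_ID <= @MaxID
BEGIN
    INSERT INTO StagingTable (Name, ComputerName, InstanceName,
        CounterName, SampledValue, DateTimeSampled, SMC_Instance_ID)
    SELECT Name, ComputerName, InstanceName, CounterName,
           SampledValue, DateTimeSampled, SMC_Instance_ID
    FROM TableName
    WHERE SMC_Instance_ID >= @SMC_Instance_ID
      AND SMC_Instance_ID < (@SMC_Instance_ID + 1000000)

    SET @SMC_Instance_ID = @SMC_Instance_ID + 1000000
END
```

Starting from MIN(SMC_Instance_ID) also sidesteps the "ID starts at 1" assumption, since the loop begins wherever the data actually begins.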
Far away is close at hand in the images of elsewhere.
Anon.
April 20, 2007 at 4:23 am
I can't do the summary on Server A; that's our MOM Reporting Server...
And the 'summary' is taking forever to query...
The SMC_Instance_ID starts at 10088422.... all have unique IDs like 2084920 or 0498092994 or 8498344899
So will it be OK if I set it to '10088422' then?
April 20, 2007 at 5:21 am
And the 'summary' is taking forever to query...
If you are summarising 200 mil rows then it will take as long as it takes and probably a long time. Not sure that it would be faster doing the work outside of sql either. What does the execution plan for the summary look like, that might give you a few hints for improving the query.
So will it be OK if I can set it to '10088422'
If 10088422 is the lowest ID then yes, start at 10088422; otherwise the first 10 iterations will select nothing.
Far away is close at hand in the images of elsewhere.
Anon.
April 20, 2007 at 5:36 am
What's the point of collecting a huge amount of data on a server where you cannot get any use out of it?
If you mean to process the data somewhere else, you need to pass the data to that place right when it's generated.
It's like storing accounting documents in the IT department. Imagine every week you must copy the whole pile of documents and drag it upstairs.
I guess a manager who organised work this way would be considered an idiot and fired within hours.
So why is such idiocy so acceptable in IT development?
_____________
Code for TallyGenerator