One Big Table To Two Smaller Ones

  • I need Help!

    I have one very large table (12 million rows) with the following columns:

    (DID, datetime, origination, servicetype, serviceloc, servicecon, servicename, host, value)

    I would like to separate the table into two tables with the following columns:

    Table1

    (DID,origination, servicetype, serviceloc, servicecon, servicename, host)

    Table2

    (DID,datetime, value1)

    Currently the data is in one table and even with indexes the performance is very poor. I believe that by splitting it this way I can drastically improve the performance of the reports hitting the tables, since the number of rows in Table1 would be less than 1,000 and the number of rows in Table2 would be around 12 million.
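    Roughly, this is what I have in mind for the two tables (the column types below are just placeholders, not my real types):

        CREATE TABLE Table1 (
            DID INT NOT NULL PRIMARY KEY,   -- one row per monitored system
            origination VARCHAR(50),        -- placeholder types; substitute the real ones
            servicetype VARCHAR(50),
            serviceloc VARCHAR(50),
            servicecon VARCHAR(50),
            servicename VARCHAR(50),
            host VARCHAR(50)
        )

        CREATE TABLE Table2 (
            DID INT NOT NULL,               -- points back to Table1.DID
            [datetime] DATETIME,            -- bracketed because it matches the data type name
            value1 DECIMAL(18,2)            -- placeholder type
        )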

    I appreciate any help I can get with this.

    Thanks,

    Jamey

    - JH

  • You are almost there already. Now that you have created the tables, you can

    INSERT INTO Table1

    SELECT DISTINCT DID,origination, servicetype, serviceloc, servicecon, servicename, host

    FROM <OriginalTable> (NOLOCK)

    INSERT INTO Table2

    SELECT DID, datetime, value1

    FROM <OriginalTable> (NOLOCK)

    I would recommend leaving the indexes off Table1 and Table2 until after they have data...it will speed up the inserts.
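    For example, once the loads finish, the indexes could be added along these lines (the index names and key columns here are only a guess; pick them to match your queries):

        ALTER TABLE Table1 ADD CONSTRAINT PK_Table1 PRIMARY KEY CLUSTERED (DID)

        -- Table2 is the big one, so cluster it on the columns your reports join and filter on
        CREATE CLUSTERED INDEX IX_Table2_DID_datetime ON Table2 (DID, [datetime])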

    Good luck

    Guarddata-

  • According to your analysis, the fields DID, origination, servicetype, serviceloc, servicecon, servicename, and host are currently duplicated, and you want to normalize the table. DID will be the primary key for the new Table1 (referenced in Table2).

    Create a new table called Table1 with the fields DID, origination, servicetype, serviceloc, servicecon, servicename, host and then run the query:

    Insert into Table1 Select distinct DID,origination, servicetype, serviceloc, servicecon, servicename, host from <CurrentTable>

    Now rename <CurrentTable> to Table2 and remove the unnecessary fields.

    Create the relationship properly.
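    For example (a sketch only; the constraint name is up to you, and it assumes Table1 has its primary key on DID):

        ALTER TABLE Table2 ADD CONSTRAINT FK_Table2_Table1 FOREIGN KEY (DID) REFERENCES Table1 (DID)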

    I suggest you create a view called <CurrentName> which selects from both tables.

    You can use INSTEAD OF INSERT, INSTEAD OF UPDATE, and INSTEAD OF DELETE triggers on the view to handle any code that still refers to the old table.
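    A rough sketch of the view and an INSTEAD OF INSERT trigger (vwAll stands in for your old table name, and the column lists assume the schema above):

        CREATE VIEW vwAll AS
        SELECT t1.DID, t2.[datetime], t1.origination, t1.servicetype, t1.serviceloc,
               t1.servicecon, t1.servicename, t1.host, t2.value1
        FROM Table1 t1
        INNER JOIN Table2 t2 ON t2.DID = t1.DID
        GO

        CREATE TRIGGER trg_vwAll_Insert ON vwAll
        INSTEAD OF INSERT
        AS
        BEGIN
            -- add any description rows that Table1 does not have yet
            INSERT INTO Table1 (DID, origination, servicetype, serviceloc, servicecon, servicename, host)
            SELECT DISTINCT i.DID, i.origination, i.servicetype, i.serviceloc, i.servicecon, i.servicename, i.host
            FROM inserted i
            WHERE NOT EXISTS (SELECT 1 FROM Table1 t WHERE t.DID = i.DID)

            -- then the measurement rows
            INSERT INTO Table2 (DID, [datetime], value1)
            SELECT i.DID, i.[datetime], i.value1
            FROM inserted i
        END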

    This is only a suggestion.

    G.R. Preethiviraj Kulasingham

    Chief Technology Officer.

    Softlogic Information Systems Limited,

    14 De Fonseka Place,

    Colombo 05.

    Sri Lanka.

    Cheers,
    Prithiviraj Kulasingham

    http://preethiviraj.blogspot.com/

  • Thanks for the responses!

    I forgot to mention that the DID column is an identity field. Therefore, if I do a SELECT DISTINCT on it, it will return all 12 million rows.

    I built the following code and it seems to work; however, when running it against the 12 million rows it takes roughly 24 hours (which is a problem).

    Here's what I have so far:

    --Updates the description table with any new data

    INSERT INTO tblDescription (origination, servicetype, serviceloc, host, servicecon)

    SELECT DISTINCT 'MailStats', service, param, server, 'Residential'

    FROM tblAll a

    WHERE NOT EXISTS (SELECT a.service, a.param, a.server FROM tblAll a, tblDescription d WHERE a.service = d.servicetype and a.param = d.serviceloc and a.server = d.host)

    --Scans through the description table, inserts values into the values table, and links the description table to the values table by the DID column.

    INSERT INTO tblResults (DID,[datetime], datamonth, value1)

    SELECT d.DID, a.date, DATEPART(Month,a.date), a.value

    FROM tblAll a, tblDescription d

    WHERE EXISTS (SELECT a.service, a.param, a.server FROM tblAll a, tbldescription d WHERE a.service = d.servicetype and a.param = d.serviceloc and a.server = d.host)

    Sorry for the elementary coding here. I am VERY new to this!

    - JH

  • 12 million rows is a lot of data to cut your teeth on.

    Remove the WHERE NOT EXISTS clause from your statements. Since nothing is in the destination table to begin with, this condition should always be true.

    HOWEVER, the real problem is the duplication of the key fields. If all 6 fields define the key as you seem to indicate, try this

    CREATE TABLE tblDescription (
        DescID INT IDENTITY(1,1),
        origination VARCHAR(50),     -- add the proper data types; VARCHAR(50) here is only a placeholder
        servicetype VARCHAR(50),
        serviceloc VARCHAR(50),
        servicecon VARCHAR(50),
        servicename VARCHAR(50),
        host VARCHAR(50)
    )

    CREATE TABLE tblResults (
        DescID INT,
        ResultTime DATETIME,         -- not a good idea to use a reserved word like "datetime"...try something like ResultTime
        Value DECIMAL(18,2)          -- again, use the proper data type
    )

    INSERT INTO tblDescription (origination, servicetype, serviceloc, servicecon, servicename, host)

    SELECT DISTINCT origination, servicetype, serviceloc, servicecon, servicename, host

    from tblAll

    INSERT INTO tblResults ( DescID, ResultTime, Value )

    SELECT D.DescID, A.[datetime], A.value   -- substitute whatever the date and value columns are actually called in your old table

    FROM tblAll A

    INNER JOIN tblDescription D ON

    D.origination = A.origination

    AND D.servicetype = A.servicetype

    AND D.serviceloc = A.serviceloc

    AND D.servicecon = A.servicecon

    AND D.servicename = A.servicename

    AND D.host = A.host

    Good luck. I would recommend picking up some good books or tutorials dealing with database structure and design.

    Guarddata-

  • 24 hours to run against a 12-million-row table sounds a bit excessive if you ask me. Have you defined any indexes on the tables referenced here? Look at your JOIN columns and WHERE criteria columns and make some intelligent decisions about whether to build indexes on them. And if the answer to the above is yes, have you checked that they are not fragmented?
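    For instance, on the existing table something like this would show whether the indexes are fragmented (tblAll is the name Jamey used above):

        -- check fragmentation (look at scan density and logical fragmentation)
        DBCC SHOWCONTIG ('tblAll')

        -- if it is badly fragmented, rebuild the indexes
        DBCC DBREINDEX ('tblAll')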

    Nikki Pratt

    Development DBA

  • Jamey,

    Guarddata has given the right answer.

    I would extend his version of the Table2 insert with the following:

    INSERT INTO Table2

    SELECT DID, datetime, value1

    FROM <OriginalTable> (NOLOCK)

    WHERE datetime is not null

    As you have written, you would have around 1000 records in Table2.

    As I don't know your data, I assumed that the datetime column is the selective one (either it has a value or it does not). Of course, you should modify this query to fit your environment (check the value1 column for NOT NULL, or whatever else fits your requirements).

    By creating the indexes after the data load, the whole insert process should not take too long for 12M records.

    Bye

    Gabor


  • There is another possibility, which is what I do if I have a large table to remodel (over 100M rows):

    - I do a bcp out into a file from a view corresponding to the Table1 definition, followed by a second bcp out corresponding to the Table2 definition.

    - Create Table1 and Table2 without indexes

    - bcp in into Table1 and Table2

    - create the required indexes

    Assuming you have set the recovery model to BULK_LOGGED, there will be a minimum of logging, so this is the fastest solution overall.
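    A rough sketch of the commands (the database, server, and view names here are made up; adjust them to your environment):

        -- keep logging minimal for the bulk loads
        ALTER DATABASE MyDB SET RECOVERY BULK_LOGGED

    Then from the command line:

        bcp MyDB.dbo.vwTable1 out table1.dat -n -T -S MyServer
        bcp MyDB.dbo.vwTable2 out table2.dat -n -T -S MyServer
        bcp MyDB.dbo.Table1 in table1.dat -n -T -S MyServer
        bcp MyDB.dbo.Table2 in table2.dat -n -T -S MyServer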

    Bye

    Gabor




  • Thanks everyone for all of your help!

    Just to give some more detail on my setup: Table1 should only have 1,000 or so rows. However, during each insert I must check that the data from the large table is already in Table1; if it is, I just grab the DID and insert it along with the datetime and value into Table2. If it is not in Table1, I insert it into Table1 first, then grab the new DID and insert it along with the datetime and value into Table2.
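    Roughly, each 30-minute load would need to do something like this (using the tblDescription/tblResults layout suggested above; tblStaging is just a made-up name for wherever the new polling data lands):

        -- add any new systems that are not described yet
        INSERT INTO tblDescription (origination, servicetype, serviceloc, servicecon, servicename, host)
        SELECT DISTINCT s.origination, s.servicetype, s.serviceloc, s.servicecon, s.servicename, s.host
        FROM tblStaging s
        WHERE NOT EXISTS (SELECT 1 FROM tblDescription d
                          WHERE d.servicetype = s.servicetype
                            AND d.serviceloc  = s.serviceloc
                            AND d.host        = s.host)

        -- then store the polled values, linked by the surrogate key
        INSERT INTO tblResults (DescID, ResultTime, Value)
        SELECT d.DescID, s.[datetime], s.value1
        FROM tblStaging s
        INNER JOIN tblDescription d
            ON d.servicetype = s.servicetype
           AND d.serviceloc  = s.serviceloc
           AND d.host        = s.host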

    The data is generated like so: we have approx. 1,000 systems being monitored (the data in Table1), and each system is polled approx. 500 times a day (the data in Table2). Currently there is only one table and all of the data is dumped into it every 30 minutes. This is causing very slow performance on the system, since I don't have time to reindex the table between updates. I have already ordered another server to set up a data warehouse on for reporting, which will give some relief to the primary server.

    Hopefully this will give you all an idea as to what I need and what the tables are used for. Again, I want to thank everyone for their help with this matter.

    Thanks,

    Jamey

    - JH
