Best way to handle massive inserts?

  • I've just been handed a project that has only one table and one proc.

    This table gets 300 inserts per second and grows to about 140 GB before they truncate it, which they do once a week.

    They are talking about using ETL, and/or breaking it out into two tables: one for the inserts, one for the selects. Is this best done by rewriting the insert query to write to two identical tables, or by writing an SSIS package to do it?

    I don't know much about this yet (and I don't know ETL), but I gather the problem is slow searches against this huge table while it's constantly doing inserts.

    Would it be best to simply write all the inserts out to a text file? If so, how would I search it?

    What are my options?

    ~Thanks in advance

  • I guess it depends on what the relationship is between all the inserts and the selects.

    What kind of data is it and how is it used?

    Are people querying it (reports or whatever) in such a way that data that's less than 1 second old matters?

    What is the source of all the inserts? Is it lots and lots of users adding one row at a time each, or is it some ETL process that reads from a file and inserts one row at a time, or something else?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I haven't seen it yet, but I'm told it's diagnostic data: users, transactions, searches, that sort of thing. The table only has about 10 columns, for what it's worth.

    Yes, they are querying it in such a way that the data needs to be available almost immediately. Not within 1 second, but within 30 seconds.

    Yes, it's lots of different users from all over the place adding one row at a time.

    Thanks for answering, GSquared.

  • In that case, I'm not clear on what a second table would solve.

    I guess you could have a second table that gets loaded every 30 seconds from the primary table, and you could query that (a rough sketch is below). You'd possibly have less resource contention most of the time that way. But it wouldn't really do what you want, which is to speed up selects from the table.
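
    For illustration, a minimal T-SQL sketch of that every-30-seconds copy, the sort of thing you'd run from a SQL Agent job. Every object and column name here (DiagLog, DiagLogReport, CopyProgress, LogId, and so on) is invented, since the real schema hasn't been posted:

        -- Hypothetical names throughout; adjust to the real schema.
        DECLARE @LastCopiedId BIGINT;
        DECLARE @NewHighWater BIGINT;

        -- High-water mark left by the previous run (kept in a one-row table).
        SELECT @LastCopiedId = LastCopiedId FROM dbo.CopyProgress;

        -- Snapshot the current maximum first, so rows that arrive mid-copy
        -- aren't skipped by the next run.
        SELECT @NewHighWater = MAX(LogId) FROM dbo.DiagLog;

        -- Copy only what arrived since the last run into the reporting table.
        INSERT INTO dbo.DiagLogReport (LogId, LogTime, UserName, Detail)
        SELECT LogId, LogTime, UserName, Detail
        FROM dbo.DiagLog
        WHERE LogId > @LastCopiedId
          AND LogId <= @NewHighWater;

        -- Advance the high-water mark for the next run.
        UPDATE dbo.CopyProgress SET LastCopiedId = @NewHighWater;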

    What you're talking about, a few hundred inserts per second and a bunch of selects, is pretty normal. It shouldn't behave too badly in a case like that. What's the specific problem that you're running into?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Yeah, I know what you mean about adding a second table.

    I guess I was thinking that the (#1) insert table would have no indexes or primary key, and the (#2) searchable table would be indexed for the searches. But I don't know if I would really gain anything, because I would still need to do inserts into that 'search' table.

    I think the problem they are having is because of the constant inserts: when they go to run queries against this table, it takes forever. I don't think they get errors, but I'm not sure of that yet.

    How would I handle this? If I remove the indexes and primary key, the inserts will be fast, but the search queries will be slow.

    I'm guessing they don't have it indexed properly on the retrieval side. Do you think that if I index it correctly for fast retrievals/searches, the impact on these 300 inserts per second would still be okay?

    Or would adding many indexes to the table slow the inserts down too much?

  • I agree with GSquared; that sort of load should be pretty easily handled by SQL Server. What are you clustering on? For tables where write speed is important, I usually use an identity column for the clustered index, as it's a guaranteed sequential write. I'd also make sure to toss a covering index on the table for your selects, but see if you can get away with only one or maybe two indexes on the table; the fewer indexes, the faster the writes. If you can post additional details (table structures, selects, etc.), we can help further.
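
    To make that concrete, here's a rough sketch of that layout: an identity clustered key for sequential writes, plus a single covering index. The table name, the columns, and the search pattern being covered are all assumptions, since the real structure hasn't been posted yet:

        -- Hypothetical structure; the real table hasn't been posted.
        CREATE TABLE dbo.DiagLog (
            LogId     BIGINT IDENTITY(1,1) NOT NULL,
            LogTime   DATETIME     NOT NULL DEFAULT GETDATE(),
            UserName  VARCHAR(50)  NOT NULL,
            EventType VARCHAR(20)  NOT NULL,
            Detail    VARCHAR(500) NULL,
            -- Clustered on the identity: every insert is a cheap
            -- sequential write at the end of the table.
            CONSTRAINT PK_DiagLog PRIMARY KEY CLUSTERED (LogId)
        );

        -- One covering index aimed at the (assumed) hot search pattern;
        -- INCLUDE keeps the selects from touching the base table.
        -- Keep the index count low so the 300 inserts/sec stay cheap.
        CREATE NONCLUSTERED INDEX IX_DiagLog_UserName_LogTime
            ON dbo.DiagLog (UserName, LogTime)
            INCLUDE (EventType, Detail);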

    Regards,

    Michelle Ufford
    SQLFool.com - Adventures in SQL Tuning

  • Thanks Michelle, will do.

  • Just so I'm clear on this: you all think that SQL Server should be able to handle 300 unique inserts per second and still handle queries run against it, without a problem?

    If so, I would think I should work on the indexes and query optimization for this table, correct?

  • Without knowing more about your environment, that's my general suggestion, yes. 🙂

    To give you an idea, I have some tables that do pretty much the same tasks as your table, and they're able to reach 3-4k inserts per second, on a 24-column table. Now, hardware also plays a part, and there could of course be issues with IO bottlenecks, etc. However, if you're really just having this problem on one or two tables and not on the whole system, then there's probably some investigation and tuning to be done on those tables.

    So, general suggestions: run a trace and monitor what's happening on your table; make sure your selects are doing seeks rather than bookmark lookups; and try using NoLock hints on your selects, assuming dirty reads are acceptable (example below).
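
    As an example of the NoLock suggestion (table and column names are invented, and note the caution about dirty reads a couple of posts down):

        -- Hypothetical query; all names are made up.
        DECLARE @UserName VARCHAR(50);
        SET @UserName = 'someuser';

        -- WITH (NOLOCK) reads without taking shared locks, so the select
        -- won't block behind the constant inserts. The trade-off: it can
        -- return dirty (uncommitted) data, so only use it where that's
        -- acceptable.
        SELECT LogId, LogTime, UserName, EventType
        FROM dbo.DiagLog WITH (NOLOCK)
        WHERE UserName = @UserName
          AND LogTime >= DATEADD(HOUR, -1, GETDATE());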

    Regards,

    Michelle Ufford
    SQLFool.com - Adventures in SQL Tuning

  • Great advice

    Thanks Michelle

  • Do NOT use NoLock hints unless it's okay for the people viewing the data to get incorrect information.

    I'd start with checking execution plans for the slow queries. If you're getting table scans, then there's your problem.
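
    One quick way to run that check (the query here is a made-up stand-in for one of the real slow queries):

        -- Report logical reads per table for the statement below. In SSMS,
        -- also turn on "Include Actual Execution Plan" (Ctrl+M) first.
        SET STATISTICS IO ON;

        -- Stand-in for one of the real slow queries. A Table Scan or a
        -- whole-table Clustered Index Scan in the plan means no useful
        -- index was available (or the optimizer didn't use one).
        SELECT COUNT(*)
        FROM dbo.DiagLog
        WHERE UserName = 'someuser';

        SET STATISTICS IO OFF;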

    I'd also check the table schema, as has been mentioned, and make sure there's a clustered index that can handle lots of inserts. If, for example, someone has the clustered index on a UniqueIdentifier column and is defaulting it to NewID, then that's at least part of your problem right there. If, on the other hand, it's a clustered index on either an identity column or a time stamp (datetime) column, then that's okay. If there's no clustered index, and no nonclustered index that satisfies the main queries, then there's your problem right there.
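
    To illustrate that contrast with hypothetical tables (neither is the actual schema from this thread):

        -- The anti-pattern: NEWID() generates random GUIDs, so each insert
        -- lands at a random spot in the clustered index, splitting and
        -- fragmenting pages under a heavy insert load.
        CREATE TABLE dbo.BadLog (
            LogId   UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
            LogTime DATETIME         NOT NULL,
            CONSTRAINT PK_BadLog PRIMARY KEY CLUSTERED (LogId)
        );

        -- The insert-friendly version: an ever-increasing identity, so
        -- each new row is appended at the end of the clustered index.
        CREATE TABLE dbo.GoodLog (
            LogId   BIGINT   IDENTITY(1,1) NOT NULL,
            LogTime DATETIME NOT NULL DEFAULT GETDATE(),
            CONSTRAINT PK_GoodLog PRIMARY KEY CLUSTERED (LogId)
        );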

    If you have trouble finding the issue from the execution plans, then save them as sqlplan files, zip them up, and upload them here, and we can take a look and help out.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks SSCarpal Tunnel, I will try your suggestions!

  • "SSC Carpal Tunnel" is just a title for writing far too many posts on these boards. The actual screen name is in bold right above that, "GSquared". You've got the same on your posts.

    Not that it matters much, but thought I'd mention it.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Sorry GSquared.
