Inserting huge amounts of XML data into relational tables

  • Hi all, my problem is this.

    I have a huge XML feed that gets updated every minute. Since it is fairly complex, I chose to insert it into SQL Server 2005 relational tables before displaying any of the data on my website.

    The data in the XML are stored as different nodes representing different objects, and they need processing before insert, so I cannot use bulk insert to do the job.

    I also found out that using stored procedures is slower than gathering 20-40 INSERT statements in a string and issuing them all at once.
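
    Roughly, the batching looks like this (a simplified sketch; the table, column, and helper names are just placeholders):

        using System.Collections.Generic;
        using System.Data.SqlClient;
        using System.Text;

        // Build one batch of INSERT statements and send it in a single round trip.
        // Sketch only: the "Items" table and its columns are placeholder names.
        static void InsertBatch(string connectionString, IEnumerable<KeyValuePair<int, string>> rows)
        {
            StringBuilder sql = new StringBuilder();
            foreach (KeyValuePair<int, string> row in rows)
            {
                // Very basic escaping of embedded quotes; real data needs more care.
                sql.AppendFormat("INSERT INTO Items (Id, Name) VALUES ({0}, '{1}');",
                                 row.Key, row.Value.Replace("'", "''"));
            }

            using (SqlConnection conn = new SqlConnection(connectionString))
            using (SqlCommand cmd = new SqlCommand(sql.ToString(), conn))
            {
                conn.Open();
                cmd.ExecuteNonQuery();   // all 20-40 INSERTs execute as one batch
            }
        }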

    Do you think I'm going about this the right way? Is there anything else I can use to make things quicker?

  • I actually built a solution similar to what you're describing using Bulk Load from SQLXML.  I'm processing large FIXML (Financial Information eXchange) files in the 50+ MB range.  Here are the steps to follow:

    1. Write an XSLT to transform the attributes from each node into nodes.
    2. Get a tool like NXSLT2 or Saxon to generate an XML file using the stylesheet from step 1.
    3. Write a schema (XSD) that will be used by the SQLXML Bulk Load to import your data into the database.
    4. Once you have the two files ready, write a script to bulk load the data from your new XML file into the database using your schema (XSD).

    I have oversimplified this process.  It is kind of complex, but very, very fast.  You can find the NXSLT2 utility and SQLXML by googling those terms.  Once you install SQLXML, you will find plenty of samples in the documentation.  If you know Spanish, I wrote an article about this process, which you can find at http://syoungdesigns.blogdns.com/
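
    For what it's worth, a rough C# sketch of the step-4 script could look like this (it assumes SQLXML 4.0 is installed and uses C# 4's dynamic for the late-bound COM call; the connection string, database name, and file paths are placeholders, and the VBScript samples in the documentation follow the same pattern):

        using System;

        // Sketch of SQLXML Bulk Load via late-bound COM (assumes SQLXML 4.0 is installed).
        class BulkLoadXml
        {
            static void Main()
            {
                Type comType = Type.GetTypeFromProgID("SQLXMLBulkLoad.SQLXMLBulkLoad.4.0");
                dynamic loader = Activator.CreateInstance(comType);

                // Placeholder connection string and paths -- adjust for your environment.
                loader.ConnectionString =
                    "Provider=SQLOLEDB;Data Source=(local);Initial Catalog=Feed;Integrated Security=SSPI;";
                loader.ErrorLogFile = @"C:\feed\bulkload.err";

                // Mapping schema from step 3, transformed XML from step 2.
                loader.Execute(@"C:\feed\mapping.xsd", @"C:\feed\feed-transformed.xml");
            }
        }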

    [font="Verdana"]Sal Young[/font]
    [font="Verdana"]MCITP Database Administrator[/font]

  • Unfortunately I don't know Spanish, but thanks for your reply.

    I'll give it a try.

    Thanks again

  • I have found that a much faster way of parsing large amounts of XML for insertion into a table is to use XmlReader. XmlReader gives you very fast, forward-only read access to an XML blob. As you read through the XML, create a CSV file that you can then bulk load into SQL Server with bcp.exe (or BULK INSERT/DTS/SSIS, etc.).

    Using this technique I parsed 1 TB of XML into relational tables in SQL Server in 15 hours (about 1 GB/min).

    If you use the XSLT approach, you suffer from having to bring the entire XML file into memory in order to transform it. It's much more efficient to use XmlReader if the transform is relatively simple (and XML to CSV is very simple).
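
    A minimal sketch of the XmlReader-to-CSV pass (the element and attribute names are made up, and a real feed would need proper CSV quoting):

        using System.IO;
        using System.Xml;

        // Stream the XML forward-only and write one CSV line per element of interest.
        // Sketch only: the "item" element and its attributes are placeholder names.
        static void XmlToCsv(string xmlPath, string csvPath)
        {
            using (XmlReader reader = XmlReader.Create(xmlPath))
            using (StreamWriter csv = new StreamWriter(csvPath))
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                    {
                        string id    = reader.GetAttribute("id");
                        string name  = reader.GetAttribute("name");
                        string price = reader.GetAttribute("price");

                        // Naive CSV line; real data may need quoting of embedded delimiters.
                        csv.WriteLine(id + "," + name + "," + price);
                    }
                }
            }
        }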

    ---------------------------------------
    elsasoft.org

  • jezemine, I think I'll try your solution too.

    Actually, what I do right now is read the XML with XmlReader and create batches of SQL INSERT statements. This used to work pretty well for my needs (a 100 MB+ XML file with 200,000 inserts) in a shared hosting environment, but it seems the server is getting crowded, because I'm noticing worse and worse performance. This is why I'm looking for another solution.

    I'm just wondering, though, what sort of hardware you use to achieve 1 GB/min. It just seems too good to be true.

    Anyway, your solution is closer to what I have already done, so it is worth a try...

    Thank you

  • To get the 1 GB/min, I ran the parser app on one server (dual-processor, hyper-threaded, 8 GB RAM) and SQL Server on another (same specs). Each server has 1 Gbit NICs, which helped a lot with sending that much data.

    The reason your method is slow is that you are generating INSERT statements instead of bulk load files. I think you'll find that the inserts are your bottleneck, not the XmlReader. If you switch to using bulk insert, it will be much faster.

    To make bulk inserts fast, drop all indexes on the table you are inserting into and use the TABLOCK hint. Also, use the -b flag on bcp.exe so that you don't insert the whole file as one big transaction (which could blow up your transaction log if there are millions of rows in the file). Generally I use -b100000.
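
    For example, kicking bcp off from code could look something like this (server, database, and table names are placeholders; -T assumes a trusted connection):

        using System.Diagnostics;

        // Load one CSV with bcp.exe: -c character mode, -t, comma delimiter,
        // -b100000 commits in batches of 100000 rows, -h "TABLOCK" takes a table lock,
        // -T uses a trusted connection. Names below are placeholders.
        static void BcpLoad(string csvPath)
        {
            string args = "MyDb.dbo.Items in \"" + csvPath + "\" " +
                          "-S MYSERVER -T -c -t, -b100000 -h \"TABLOCK\"";

            using (Process bcp = Process.Start("bcp.exe", args))
            {
                bcp.WaitForExit();   // check bcp.ExitCode and its output in real code
            }
        }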

    More tips here: http://www.databasejournal.com/features/mssql/article.php/3095511

    Finally, in my case I had one process parsing the XML and creating the CSVs, and another process bcp'ing the CSVs into SQL Server.

    ---------------------------------------
    elsasoft.org
