Unable to link data from data pulled from XML

  • drew.allen (8/17/2015)


    Eirikur Eiriksson (8/14/2015)


    drew.allen (8/14/2015)


    Eirikur Eiriksson (8/14/2015)


    Further, I was only stating that with a properly written XQuery, using the text() function is much quicker, I was not commenting or comparing to yours or anyone else's solution.

    And all I'm stating is that I have two pieces of code that are identical except for the use of text(), and the code without text() runs faster, so your assertion is flawed. The role of text() in xquery performance is much more complicated than implied by a blanket statement that the text() function is much quicker.

    Drew

    Might I ask you to elaborate further on the complexity and the role of the text() function within XQuery that you mentioned? This sounds really interesting and I love any opportunity to learn something new!

    😎

    Unfortunately, I don't know the details of that complexity. The reason I posted my results in the first place is that this explanation didn't account for my data, and I was hoping that someone would be able to explain why.

    Drew

    My original response was to point out that including the text() node is important for improving performance. On my system (SQL 2014 Enterprise 8 cores/32gb) the optimizer showed that including the text() node made the query faster. That has always been my experience. If you are saying that this is not true based on what the optimizer is estimating then I would warn you that the oprimizer's estimations are not always correct. In my follow-up test the optimizer said that the query() method and value() method were equal in performance but ACTUAL testing proved that the query() method was slower. The optimizer's estimations were wrong.

    Performance aside, including the text() node is just good programming. Without it you are relying on the parser's default behavior to get you your result. Without it, what you are really saying is, "give me all the child objects here" and the parser is returning text because that's the default behavior. Including the text() node tells the XML parser that you want the text. It's a more scientific approach to programming to say exactly what you want instead of depending on the applicaton's default behavior to provide you with the correct answer.

    As a fellow BI Developer who probably deals with lots of XML I thought this would be worth pointing out.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • In the above query which kind of secoundary xml index will be good for extracting the data from xml if the xmls are sored in a table.

  • Alan.B (8/17/2015)


    Performance aside, including the text() node is just good programming. Without it you are relying on the parser's default behavior to get you your result. Without it, what you are really saying is, "give me all the child objects here" and the parser is returning text because that's the default behavior. Including the text() node tells the XML parser that you want the text. It's a more scientific approach to programming to say exactly what you want instead of depending on the applicaton's default behavior to provide you with the correct answer.

    I think I have mentioned this before on these forums but it is worth mentioning again.

    Specifying the text() node is semantically different from not specifying the text() node. That difference is what makes SQL Server build a faster execution plan when you specify text().

    The difference will only show in the result if you have mixed content XML and that is not very common so in most cases you can use text() without having to worry about getting the wrong data.

    Here is a sample that shows the difference for mixed content XML.

    declare @XML xml = '<root>There can <bold>be</bold> many <italic>text</italic> nodes in XML</root>';

    select @XML.value('root[1]', 'varchar(100)');

    select @XML.value('(root/text())[1]', 'varchar(100)');

    Result:

    [font="Courier New"]----------------------------------------------------------------------------------------------------

    There can be many text nodes in XML

    ----------------------------------------------------------------------------------------------------

    There can

    [/font]

    The execution plan for the version not using text() uses a TVF to get the specified node and then joins to another call to a TVF that gets the text() nodes of all child nodes. The text() nodes found are then aggregated to one value in the UDX operator.

    When specifying the text() you only get one call to a TVF that gets the text() node value.

  • Mikael Eriksson SE (8/18/2015)


    I think I have mentioned this before on these forums but it is worth mentioning again.

    Specifying the text() node is semantically different from not specifying the text() node. That difference is what makes SQL Server build a faster execution plan when you specify text().

    The difference will only show in the result if you have mixed content XML and that is not very common so in most cases you can use text() without having to worry about getting the wrong data.

    And more, about query() vs value() :

    declare @XML xml = '<root>There can <bold>be</bold> many <italic>text</italic> nodes in XML</root>

    <root>And <bold>more</bold> MS SQL <italic>XML</italic> allows for multiple roots</root>';

    select @XML.value('root[1]', 'varchar(500)');

    select CAST(@XML.query('data(root)') AS varchar(500));

Viewing 4 posts - 16 through 18 (of 18 total)

You must be logged in to reply to this topic. Login to reply