Unable to link data from data pulled from XML

Question

Post reply

Unable to link data from data pulled from XML

samalex

SSCertifiable

Points: 5428
More actions
August 13, 2015 at 11:12 am

#398896

Hi all,

I'm pulling data from XML into tables, but I'm unsure how to link the data after it's imported. This example has names and tasks, and I can pull the data into two tables, but I can't find any way to link the task to the appropriate person. My person and task tables populate without issue, but there's nothing I can find to link the rows together. So in this example Test 1 would go to the first two Tasks and Test 2 would go to the second two work items.

Thanks for any suggestions.

DECLARE @XML TABLE (XMLData XML); DECLARE @Person Table (Name NVARCHAR(50), Addresss NVARCHAR(50)); DECLARE @Task Table (Name NVARCHAR(50), Details NVARCHAR(50)); INSERT INTO @XML SELECT ' <process> <header> <Person><Name>Test1</Name><Address>123 main street</Address></Person> <Tasks> <task><Name>Do some work</Name><details>details</details></task> <task><Name>Do some more work</Name><details>details more</details></task> </Tasks> </header> <header> <Person><Name>Test2</Name><Address>234 main street</Address></Person> <Tasks> <task><Name>Do some work 2</Name><details>details 2 </details></task> <task><Name>Do some more work3</Name><details>details more 3</details></task> </Tasks> </header> </process>' SELECT CONVERT(NVARCHAR(255),c.query('data(Name)')) AS name, CONVERT(NVARCHAR(255),c.query('data(Address)')) AS address FROM @XML r CROSS APPLY XMLData.nodes('process/header/Person') x(c) SELECT CONVERT(NVARCHAR(255),c.query('data(Name)')) AS name, CONVERT(NVARCHAR(255),c.query('data(details)')) AS address FROM @XML r CROSS APPLY XMLData.nodes('process/header/Tasks/task') x(c)

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

drew.allen SSC Guru Points: 76776 More actions · Answer 1

Instead of trying to shred them separately and then put them together, shred them together and separate them.



DECLARE @XML TABLE (XMLData XML);

DECLARE @Person_Tasks TABLE( person_name NVARCHAR(50), person_address NVARCHAR(50), task_name NVARCHAR(50), task_details NVARCHAR(50))

DECLARE @Person Table (Name NVARCHAR(50), Addresss NVARCHAR(50));

DECLARE @Task Table (Name NVARCHAR(50), Details NVARCHAR(50));

INSERT INTO @XML SELECT '

<process>

	<header>

		<Person><Name>Test1</Name><Address>123 main street</Address></Person>

		<Tasks>

			<task><Name>Do some work</Name><details>details</details></task>

			<task><Name>Do some more work</Name><details>details more</details></task>

		</Tasks>

	</header>

	<header>

		<Person><Name>Test2</Name><Address>234 main street</Address></Person>

		<Tasks>

			<task><Name>Do some work 2</Name><details>details 2 </details></task>

			<task><Name>Do some more work3</Name><details>details more 3</details></task>

		</Tasks>

	</header>

</process>'

INSERT @Person_Tasks(person_name, person_address, task_name, task_details)

SELECT p.c.value('Name[1]', 'NVARCHAR(50)') AS person_name

,	p.c.value('Address[1]', 'NVARCHAR(50)') AS person_address

,	t.c.value('Name[1]', 'NVARCHAR(50)') AS task_name

,	t.c.value('details[1]', 'NVARCHAR(50)') AS task_details

FROM @XML

CROSS APPLY XMLData.nodes('process/header/Person') p(c)

CROSS APPLY c.nodes('../Tasks/task') t(c)

INSERT @Person( Name, Addresss )

SELECT DISTINCT person_name, person_address

FROM @Person_Tasks

INSERT @Task( Name, Details )

SELECT DISTINCT task_name, task_details

FROM @Person_Tasks

SELECT *

FROM @Person_Tasks

SELECT *

FROM @Person

SELECT *

FROM @Task

It is possible to put them together once they've been shredded separately, but it's much more complicated. Also, you might try using the value function instead of the query function if you want to pull values out of xml data.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Alan Burstein SSC Guru Points: 61136 More actions · Answer 2

Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.

DECLARE @XML TABLE (XMLData XML);



DECLARE @Person_Tasks TABLE( person_name NVARCHAR(50), person_address NVARCHAR(50), task_name NVARCHAR(50), task_details NVARCHAR(50))

DECLARE @Person Table (Name NVARCHAR(50), Addresss NVARCHAR(50));

DECLARE @Task Table (Name NVARCHAR(50), Details NVARCHAR(50));

INSERT INTO @XML SELECT '

<process>

	<header>

		<Person><Name>Test1</Name><Address>123 main street</Address></Person>

		<Tasks>

			<task><Name>Do some work</Name><details>details</details></task>

			<task><Name>Do some more work</Name><details>details more</details></task>

		</Tasks>

	</header>

	<header>

		<Person><Name>Test2</Name><Address>234 main street</Address></Person>

		<Tasks>

			<task><Name>Do some work 2</Name><details>details 2 </details></task>

			<task><Name>Do some more work3</Name><details>details more 3</details></task>

		</Tasks>

	</header>

</process>'

-- Not specifying the text() node

SELECT p.c.value('Name[1]', 'NVARCHAR(50)') AS person_name

,	p.c.value('Address[1]', 'NVARCHAR(50)') AS person_address

,	t.c.value('Name[1]', 'NVARCHAR(50)') AS task_name

,	t.c.value('details[1]', 'NVARCHAR(50)') AS task_details

FROM @XML

CROSS APPLY XMLData.nodes('process/header/Person') p(c)

CROSS APPLY c.nodes('../Tasks/task') t(c)

-- specifying the text() node

SELECT p.c.value('(Name/text())[1]', 'NVARCHAR(50)') AS person_name

,	p.c.value('(Address/text())[1]', 'NVARCHAR(50)') AS person_address

,	t.c.value('(Name/text())[1]', 'NVARCHAR(50)') AS task_name

,	t.c.value('(details/text())[1]', 'NVARCHAR(50)') AS task_details

FROM @XML

CROSS APPLY XMLData.nodes('process/header/Person') p(c)

CROSS APPLY c.nodes('../Tasks/task') t(c)

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

drew.allen SSC Guru Points: 76776 More actions · Answer 3

Alan.B (8/13/2015)

Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.

When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Alan Burstein SSC Guru Points: 61136 More actions · Answer 4

drew.allen (8/13/2015)

Alan.B (8/13/2015)

Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.

When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.

Drew

On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.

In other news if you compare the query() method to the value() method the optimizer tells you that you get the same performance for each :-P. I know from experience that, for something like this, the value() method is much faster (see attached query plan).

So I put together this little performance test:



-- let's get sample data

SET NOCOUNT ON;

IF OBJECT_ID('tempdb..#XML') IS NOT NULL DROP TABLE #XML;

CREATE TABLE #XML (id int primary key, XMLData XML);

INSERT INTO #XML

SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

'<process>

	<header>

		<Person><Name>Test1</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>

		<Tasks>

			<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details</details></task>

			<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more</details></task>

		</Tasks>

	</header>

	<header>

		<Person><Name>Test2</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>

		<Tasks>

			<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details 2 </details></task>

			<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more 3</details></task>

		</Tasks>

	</header>

</process>'

FROM sys.all_columns a, sys.all_columns b;

GO

-- clear the cache and start fresh...

DBCC FREEPROCCACHE with no_infomsgs

DBCC DROPCLEANBUFFERS with no_infomsgs

GO

PRINT '=== Not specifying the text() node ==='

DECLARE

  @starttime datetime = getdate(),

  @x1 nvarchar(50),

  @x2 nvarchar(50),

  @x3 nvarchar(50),

  @x4 nvarchar(50);

SELECT

@x1 = p.c.value('Name[1]', 'NVARCHAR(50)'),

@x2 = p.c.value('Address[1]', 'NVARCHAR(50)'),

@x3 = t.c.value('Name[1]', 'NVARCHAR(50)'),

@x4 = t.c.value('details[1]', 'NVARCHAR(50)')

FROM #XML

CROSS APPLY XMLData.nodes('process/header/Person') p(c)

CROSS APPLY c.nodes('../Tasks/task') t(c);

PRINT DATEDIFF(MS,@starttime, getdate());

GO 3

PRINT '=== specifying the text() node ==='

DECLARE

  @starttime datetime = getdate(),

  @x1 nvarchar(50),

  @x2 nvarchar(50),

  @x3 nvarchar(50),

  @x4 nvarchar(50);

SELECT

@x1 = p.c.value('(Name/text())[1]', 'NVARCHAR(50)'),

@x2 = p.c.value('(Address/text())[1]', 'NVARCHAR(50)'),

@x3 = t.c.value('(Name/text())[1]', 'NVARCHAR(50)'),

@x4 = t.c.value('(details/text())[1]', 'NVARCHAR(50)')

FROM #XML

CROSS APPLY XMLData.nodes('process/header/Person') p(c)

CROSS APPLY c.nodes('../Tasks/task') t(c);

PRINT DATEDIFF(MS,@starttime, getdate());

GO 3

PRINT '=== Using the query() method ==='

DECLARE

  @starttime datetime = getdate(),

  @x1 nvarchar(50),

  @x2 nvarchar(50),

  @x3 nvarchar(50),

  @x4 nvarchar(50);

SELECT

@x1 = CAST(p.c.query('data(Name)') AS nvarchar(50)),

@x2 = CAST(p.c.query('data(Address)')AS nvarchar(50)),

@x3 = CAST(t.c.query('data(Name)')AS nvarchar(50)),

@x4 = CAST(t.c.query('data(details)')AS nvarchar(50))

FROM #XML

CROSS APPLY XMLData.nodes('process/header/Person') p(c)

CROSS APPLY c.nodes('../Tasks/task') t(c);

PRINT DATEDIFF(MS,@starttime, getdate());

GO 3

Though the optimizer said that the value() method with text() would be fastest and the value() and query() method would be tied for second the results of the test told a different story...



Beginning execution loop

=== Not specifying the text() node ===

810

=== Not specifying the text() node ===

663

=== Not specifying the text() node ===

646

Batch execution completed 3 times.

Beginning execution loop

=== specifying the text() node ===

1603

=== specifying the text() node ===

1616

=== specifying the text() node ===

1590

Batch execution completed 3 times.

Beginning execution loop

=== Using the query() method ===

3166

=== Using the query() method ===

3263

=== Using the query() method ===

3330

Batch execution completed 3 times.

Interestingly, the text() node specification slows the query down (this is not always the case but was the case in this test). As expected, the query method() was way slower. Unfortunately for the query method it was not getting a parallel query plan so, to level the playing field, I added Adam Machanic's make_parallel() function to the mix and re-ran that query() method with a parallel plan.

Results:

Beginning execution loop



=== Using the query() method with make_parallel() ===

1150

=== Using the query() method with make_parallel() ===

1110

=== Using the query() method with make_parallel() ===

1110

Batch execution completed 3 times.

Better but still slower than value() method without specifying the text() node. Moral of the story: don't always trust the query optimizer estimations.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Eirikur Eiriksson SSC Guru Points: 182631 More actions · Answer 5

Alan.B (8/13/2015)

drew.allen (8/13/2015)

Alan.B (8/13/2015)

Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.

When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.

Drew

On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.

In other news if you compare the query() method to the value() method the optimizer tells you that you get the same performance for each :-P. I know from experience that, for something like this, the value() method is much faster (see attached query plan).

So I put together this little performance test:

Both solutions are sub-optimal as they imply work (cost) which more than offsets any gains from the text() function. The culprit is the traversing nodes for the Tasks/task, it's simply a killer!

😎

Using Alan's fine test harness to proof this, observe that without the traversing (../) the query is almost 3 times faster than any of the others.

-- let's get sample data



SET NOCOUNT ON;

IF OBJECT_ID('tempdb..#XML') IS NOT NULL DROP TABLE #XML;

CREATE TABLE #XML (id int primary key, XMLData XML);

INSERT INTO #XML

SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

'<process>

	<header>

		<Person><Name>Test1</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>

		<Tasks>

			<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details</details></task>

			<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more</details></task>

		</Tasks>

	</header>

	<header>

		<Person><Name>Test2</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>

		<Tasks>

			<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details 2 </details></task>

			<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more 3</details></task>

		</Tasks>

	</header>

</process>'

FROM sys.all_columns a, sys.all_columns b;

GO

-- clear the cache and start fresh...

DBCC FREEPROCCACHE with no_infomsgs

DBCC DROPCLEANBUFFERS with no_infomsgs

GO

PRINT '=== (EE) No traversing and text() ==='

DECLARE

  @starttime datetime = getdate(),

  @x1 nvarchar(50),

  @x2 nvarchar(50),

  @x3 nvarchar(50),

  @x4 nvarchar(50);

SELECT

     @x1 = HEADER.DATA.value('(Person/Name/text())[1]'     ,'NVARCHAR(50)')    --AS Person_Name

    ,@x2 = HEADER.DATA.value('(Person/Address/text())[1]'  ,'NVARCHAR(50)')    --AS Person_Address

    ,@x3 = TASK.DATA.value('(Name/text())[1]'              ,'NVARCHAR(50)')    --AS Task_Name

    ,@x4 = TASK.DATA.value('(details/text())[1]'           ,'NVARCHAR(50)')    --AS Task_details

FROM #XML   XD

CROSS APPLY XD.XMLData.nodes('process/header')  AS HEADER(DATA)

CROSS APPLY HEADER.DATA.nodes('Tasks/task')     AS TASK(DATA);

PRINT DATEDIFF(MS,@starttime, getdate());

GO 3

PRINT '=== Not specifying the text() node ==='

DECLARE

  @starttime datetime = getdate(),

  @x1 nvarchar(50),

  @x2 nvarchar(50),

  @x3 nvarchar(50),

  @x4 nvarchar(50);

SELECT

@x1 = p.c.value('Name[1]', 'NVARCHAR(50)'),

@x2 = p.c.value('Address[1]', 'NVARCHAR(50)'),

@x3 = t.c.value('Name[1]', 'NVARCHAR(50)'),

@x4 = t.c.value('details[1]', 'NVARCHAR(50)')

FROM #XML

CROSS APPLY XMLData.nodes('process/header/Person') p(c)

CROSS APPLY c.nodes('../Tasks/task') t(c);

PRINT DATEDIFF(MS,@starttime, getdate());

GO 3

PRINT '=== specifying the text() node ==='

DECLARE

  @starttime datetime = getdate(),

  @x1 nvarchar(50),

  @x2 nvarchar(50),

  @x3 nvarchar(50),

  @x4 nvarchar(50);

SELECT

@x1 = p.c.value('(Name/text())[1]', 'NVARCHAR(50)'),

@x2 = p.c.value('(Address/text())[1]', 'NVARCHAR(50)'),

@x3 = t.c.value('(Name/text())[1]', 'NVARCHAR(50)'),

@x4 = t.c.value('(details/text())[1]', 'NVARCHAR(50)')

FROM #XML

CROSS APPLY XMLData.nodes('process/header/Person') p(c)

CROSS APPLY c.nodes('../Tasks/task') t(c);

PRINT DATEDIFF(MS,@starttime, getdate());

GO 3

PRINT '=== Using the query() method ==='

DECLARE

  @starttime datetime = getdate(),

  @x1 nvarchar(50),

  @x2 nvarchar(50),

  @x3 nvarchar(50),

  @x4 nvarchar(50);

SELECT

@x1 = CAST(p.c.query('data(Name)') AS nvarchar(50)),

@x2 = CAST(p.c.query('data(Address)')AS nvarchar(50)),

@x3 = CAST(t.c.query('data(Name)')AS nvarchar(50)),

@x4 = CAST(t.c.query('data(details)')AS nvarchar(50))

FROM #XML

CROSS APPLY XMLData.nodes('process/header/Person') p(c)

CROSS APPLY c.nodes('../Tasks/task') t(c);

PRINT DATEDIFF(MS,@starttime, getdate());

GO 3

PRINT '=== No traversing and text() ==='

DECLARE

  @starttime datetime = getdate(),

  @x1 nvarchar(50),

  @x2 nvarchar(50),

  @x3 nvarchar(50),

  @x4 nvarchar(50);

SELECT

     @x1 = HEADER.DATA.value('(Person/Name/text())[1]'     ,'NVARCHAR(50)')    --AS Person_Name

    ,@x2 = HEADER.DATA.value('(Person/Address/text())[1]'  ,'NVARCHAR(50)')    --AS Person_Address

    ,@x3 = TASK.DATA.value('(Name/text())[1]'              ,'NVARCHAR(50)')    --AS Task_Name

    ,@x4 = TASK.DATA.value('(details/text())[1]'           ,'NVARCHAR(50)')    --AS Task_details

FROM #XML   XD

CROSS APPLY XD.XMLData.nodes('process/header')  AS HEADER(DATA)

CROSS APPLY HEADER.DATA.nodes('Tasks/task')     AS TASK(DATA);

PRINT DATEDIFF(MS,@starttime, getdate());

GO 3

Results

Beginning execution loop



=== (EE) No traversing and text() ===

396

=== (EE) No traversing and text() ===

313

=== (EE) No traversing and text() ===

310

Batch execution completed 3 times.

Beginning execution loop

=== Not specifying the text() node ===

796

=== Not specifying the text() node ===

830

=== Not specifying the text() node ===

856

Batch execution completed 3 times.

Beginning execution loop

=== specifying the text() node ===

1993

=== specifying the text() node ===

1970

=== specifying the text() node ===

1926

Batch execution completed 3 times.

Beginning execution loop

=== Using the query() method ===

4170

=== Using the query() method ===

4183

=== Using the query() method ===

4153

Batch execution completed 3 times.

Beginning execution loop

=== No traversing and text() ===

316

=== No traversing and text() ===

313

=== No traversing and text() ===

313

Batch execution completed 3 times.

drew.allen SSC Guru Points: 76776 More actions · Answer 6

Alan.B (8/13/2015)

drew.allen (8/13/2015)

Alan.B (8/13/2015)

Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.

When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.

Drew

On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.

You misunderstood. The original query ran FASTER (48/52) not slower (52/48) than the query with text() on my system.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Eirikur Eiriksson SSC Guru Points: 182631 More actions · Answer 7

drew.allen (8/14/2015)

Alan.B (8/13/2015)

drew.allen (8/13/2015)

Alan.B (8/13/2015)

Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.

When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.

Drew

On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.

You misunderstood. The original query ran FASTER (48/52) not slower (52/48) than the query with text() on my system.

Drew

The text() is faster than without it, here are results from two versions of non-traversing XQueries from my previous post, one with and on without the text().

😎

=== (EE) No traversing and text() === 423



=== (EE) No traversing and text() ===   310

=== (EE) No traversing and text() ===   313

--------------------------------------------

=== (EE) No traversing ===              570

=== (EE) No traversing ===              570

=== (EE) No traversing === 613

drew.allen SSC Guru Points: 76776 More actions · Answer 8

Eirikur Eiriksson (8/14/2015)

The text() is faster than without it, here are results from two versions of non-traversing XQueries from my previous post, one with and on without the text().

First, I was comparing the two traversing methods, so posting results from the non-traversing method isn't really relevant.

Second, I'm saying that I'm getting different results with my setup. I don't care how many times you post results from YOUR setup, it isn't going to change what I am getting from MY setup.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Eirikur Eiriksson SSC Guru Points: 182631 More actions · Answer 9

drew.allen (8/14/2015)

Eirikur Eiriksson (8/14/2015)

The text() is faster than without it, here are results from two versions of non-traversing XQueries from my previous post, one with and on without the text().

First, I was comparing the two traversing methods, so posting results from the non-traversing method isn't really relevant.

Second, I'm saying that I'm getting different results with my setup. I don't care how many times you post results from YOUR setup, it isn't going to change what I am getting from MY setup.

Drew

I find these comments slightly amusing, are you having a bad day?

😎

Further, I was only stating that with a properly written XQuery, using the text() function is much quicker, I was not commenting or comparing to yours or anyone else's solution.

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 10

Easy now, folks. These types of things are how we all learn. 😉 Well, except for me. I reject all XML whenever I can. 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

drew.allen SSC Guru Points: 76776 More actions · Answer 11

Eirikur Eiriksson (8/14/2015)

Further, I was only stating that with a properly written XQuery, using the text() function is much quicker, I was not commenting or comparing to yours or anyone else's solution.

And all I'm stating is that I have two pieces of code that are identical except for the use of text(), and the code without text() runs faster, so your assertion is flawed. The role of text() in xquery performance is much more complicated than implied by a blanket statement that the text() function is much quicker.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Eirikur Eiriksson SSC Guru Points: 182631 More actions · Answer 12

drew.allen (8/14/2015)

Eirikur Eiriksson (8/14/2015)

Further, I was only stating that with a properly written XQuery, using the text() function is much quicker, I was not commenting or comparing to yours or anyone else's solution.

And all I'm stating is that I have two pieces of code that are identical except for the use of text(), and the code without text() runs faster, so your assertion is flawed. The role of text() in xquery performance is much more complicated than implied by a blanket statement that the text() function is much quicker.

Drew

Might I ask you to elaborate further on the complexity and the role of the text() function within XQuery that you mentioned? This sounds really interesting and I love any opportunity to learn something new!

😎

drew.allen SSC Guru Points: 76776 More actions · Answer 13

Eirikur Eiriksson (8/14/2015)

drew.allen (8/14/2015)

Eirikur Eiriksson (8/14/2015)

Further, I was only stating that with a properly written XQuery, using the text() function is much quicker, I was not commenting or comparing to yours or anyone else's solution.

And all I'm stating is that I have two pieces of code that are identical except for the use of text(), and the code without text() runs faster, so your assertion is flawed. The role of text() in xquery performance is much more complicated than implied by a blanket statement that the text() function is much quicker.

Drew

Might I ask you to elaborate further on the complexity and the role of the text() function within XQuery that you mentioned? This sounds really interesting and I love any opportunity to learn something new!

😎

Unfortunately, I don't know the details of that complexity. The reason I posted my results in the first place is that this explanation didn't account for my data, and I was hoping that someone would be able to explain why.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Alan Burstein SSC Guru Points: 61136 More actions · Answer 14

Eirikur Eiriksson (8/14/2015)

Alan.B (8/13/2015)

drew.allen (8/13/2015)

Alan.B (8/13/2015)

Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.

When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.

Drew

On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.

In other news if you compare the query() method to the value() method the optimizer tells you that you get the same performance for each :-P. I know from experience that, for something like this, the value() method is much faster (see attached query plan).

So I put together this little performance test:

Both solutions are sub-optimal as they imply work (cost) which more than offsets any gains from the text() function. The culprit is the traversing nodes for the Tasks/task, it's simply a killer!

😎

Using Alan's fine test harness to proof this, observe that without the traversing (../) the query is almost 3 times faster than any of the others.

-- let's get sample data
SET NOCOUNT ON; IF OBJECT_ID('tempdb..#XML') IS NOT NULL DROP TABLE #XML; CREATE TABLE #XML (id int primary key, XMLData XML); INSERT INTO #XML SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), '<process> <header> <Person><Name>Test1</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person> <Tasks> <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details</details></task> <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more</details></task> </Tasks> </header> <header> <Person><Name>Test2</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person> <Tasks> <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details 2 </details></task> <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more 3</details></task> </Tasks> </header> </process>' FROM sys.all_columns a, sys.all_columns b; GO -- clear the cache and start fresh... DBCC FREEPROCCACHE with no_infomsgs DBCC DROPCLEANBUFFERS with no_infomsgs GO PRINT '=== (EE) No traversing and text() ===' DECLARE @starttime datetime = getdate(), @x1 nvarchar(50), @x2 nvarchar(50), @x3 nvarchar(50), @x4 nvarchar(50); SELECT @x1 = HEADER.DATA.value('(Person/Name/text())[1]' ,'NVARCHAR(50)') --AS Person_Name ,@x2 = HEADER.DATA.value('(Person/Address/text())[1]' ,'NVARCHAR(50)') --AS Person_Address ,@x3 = TASK.DATA.value('(Name/text())[1]' ,'NVARCHAR(50)') --AS Task_Name ,@x4 = TASK.DATA.value('(details/text())[1]' ,'NVARCHAR(50)') --AS Task_details FROM #XML XD CROSS APPLY XD.XMLData.nodes('process/header') AS HEADER(DATA) CROSS APPLY HEADER.DATA.nodes('Tasks/task') AS TASK(DATA); PRINT DATEDIFF(MS,@starttime, getdate()); GO 3 PRINT '=== Not specifying the text() node ===' DECLARE @starttime datetime = getdate(), @x1 nvarchar(50), @x2 nvarchar(50), @x3 nvarchar(50), @x4 nvarchar(50); SELECT @x1 = p.c.value('Name[1]', 'NVARCHAR(50)'), @x2 = p.c.value('Address[1]', 'NVARCHAR(50)'), @x3 = t.c.value('Name[1]', 'NVARCHAR(50)'), @x4 = t.c.value('details[1]', 'NVARCHAR(50)') FROM #XML CROSS APPLY XMLData.nodes('process/header/Person') p(c) CROSS APPLY c.nodes('../Tasks/task') t(c); PRINT DATEDIFF(MS,@starttime, getdate()); GO 3 PRINT '=== specifying the text() node ===' DECLARE @starttime datetime = getdate(), @x1 nvarchar(50), @x2 nvarchar(50), @x3 nvarchar(50), @x4 nvarchar(50); SELECT @x1 = p.c.value('(Name/text())[1]', 'NVARCHAR(50)'), @x2 = p.c.value('(Address/text())[1]', 'NVARCHAR(50)'), @x3 = t.c.value('(Name/text())[1]', 'NVARCHAR(50)'), @x4 = t.c.value('(details/text())[1]', 'NVARCHAR(50)') FROM #XML CROSS APPLY XMLData.nodes('process/header/Person') p(c) CROSS APPLY c.nodes('../Tasks/task') t(c); PRINT DATEDIFF(MS,@starttime, getdate()); GO 3 PRINT '=== Using the query() method ===' DECLARE @starttime datetime = getdate(), @x1 nvarchar(50), @x2 nvarchar(50), @x3 nvarchar(50), @x4 nvarchar(50); SELECT @x1 = CAST(p.c.query('data(Name)') AS nvarchar(50)), @x2 = CAST(p.c.query('data(Address)')AS nvarchar(50)), @x3 = CAST(t.c.query('data(Name)')AS nvarchar(50)), @x4 = CAST(t.c.query('data(details)')AS nvarchar(50)) FROM #XML CROSS APPLY XMLData.nodes('process/header/Person') p(c) CROSS APPLY c.nodes('../Tasks/task') t(c); PRINT DATEDIFF(MS,@starttime, getdate()); GO 3 PRINT '=== No traversing and text() ===' DECLARE @starttime datetime = getdate(), @x1 nvarchar(50), @x2 nvarchar(50), @x3 nvarchar(50), @x4 nvarchar(50); SELECT @x1 = HEADER.DATA.value('(Person/Name/text())[1]' ,'NVARCHAR(50)') --AS Person_Name ,@x2 = HEADER.DATA.value('(Person/Address/text())[1]' ,'NVARCHAR(50)') --AS Person_Address ,@x3 = TASK.DATA.value('(Name/text())[1]' ,'NVARCHAR(50)') --AS Task_Name ,@x4 = TASK.DATA.value('(details/text())[1]' ,'NVARCHAR(50)') --AS Task_details FROM #XML XD CROSS APPLY XD.XMLData.nodes('process/header') AS HEADER(DATA) CROSS APPLY HEADER.DATA.nodes('Tasks/task') AS TASK(DATA); PRINT DATEDIFF(MS,@starttime, getdate());
GO 3

Results

Beginning execution loop
=== (EE) No traversing and text() === 396 === (EE) No traversing and text() === 313 === (EE) No traversing and text() === 310 Batch execution completed 3 times. Beginning execution loop === Not specifying the text() node === 796 === Not specifying the text() node === 830 === Not specifying the text() node === 856 Batch execution completed 3 times. Beginning execution loop === specifying the text() node === 1993 === specifying the text() node === 1970 === specifying the text() node === 1926 Batch execution completed 3 times. Beginning execution loop === Using the query() method === 4170 === Using the query() method === 4183 === Using the query() method === 4153 Batch execution completed 3 times. Beginning execution loop === No traversing and text() === 316 === No traversing and text() === 313 === No traversing and text() === 313
Batch execution completed 3 times.

Well done sir. I knew there was a better way to parse those nodes but got lazy. Something was not right. I learned a few things in this thread. 😀

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001