Query on a large database

Question

Post reply

Query on a large database

Viewing 15 posts - 61 through 75 (of 104 total)

You must be logged in to reply to this topic. Login to reply

Jayanth_Kurup SSC-Insane Points: 22967 More actions · Answer 1

Jayanth_Kurup

SSC-Insane

Points: 22967

October 26, 2015 at 2:03 am

#1835455

EDIT :- BUMP UP three days on still no data from any other tests.

Jayanth Kurup[/url]

Jayanth_Kurup SSC-Insane Points: 22967 More actions · Answer 2

Jayanth_Kurup

SSC-Insane

Points: 22967

October 26, 2015 at 8:43 pm

#1835634

BUMPity BUMP BUMP , still no data from ?

For what its worth there has been a lot of positive feedback for the scripts and video posted on linkedin.

Jayanth Kurup[/url]

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 3

Jayanth_Kurup (10/23/2015)

Just wondering if anybody was able to finish testing the scripts and if there are any new results ?

EDIT : BUMP, hoping for some independently verified results , also created a complete video with more scenarios in case any body is looking for a deep dive.

To be honest, the scripts are terrible. Everything is either a hard-code path name or geared specifically to SQL Server 2014, which I won't even install on my laptop until the next SP is successful and has been out for at least 2 months. And I'm not one to store data with programs. You weren't even thoughtful enough to write a bloody Bulk Insert for the test data. Not exactly "readily consumable" code. It's a whole lot deeper dive than it could/should have been. You didn't even change the default growth settings. And now you've taken to bumping your own posts. Give it a break while we fix your "right click" code. :Whistling:

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jayanth_Kurup SSC-Insane Points: 22967 More actions · Answer 4

Jayanth_Kurup

SSC-Insane

Points: 22967

October 26, 2015 at 9:39 pm

#1835648

Thanks for taking the time to do this Jeff , I'll try an have a more easily usable script ready in a few minutes and send them to you.

Jayanth Kurup[/url]

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 5

Jayanth_Kurup (10/26/2015)

Thanks for taking the time to do this Jeff , I'll try an have a more easily usable script ready in a few minutes and send them to you.

Looking at your current script, there are no indexes on the tables, either. If that's what you really setup for your tests, it's no wonder why partitioning seemed faster to you. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jayanth_Kurup SSC-Insane Points: 22967 More actions · Answer 6

that was on purpose so that the tester can compare

heap no index - (script provided) - for baseline

vs

clustered index table ( they create the clustered index )

vs

heap partitioned table ( script provider)

the other script is ready just testing it , will upload in a few minutes

One of the original arguments was that partitioning doesnt provide any benefit , so the comparison between partitioned and partitioned heap should clear that up.

Jayanth Kurup[/url]

Jayanth_Kurup SSC-Insane Points: 22967 More actions · Answer 7

Jayanth_Kurup

SSC-Insane

Points: 22967

October 27, 2015 at 12:28 am

#1835659

Updated , cleaned up scripts attached.

I have written it so that you can simply copy paste and execute ( after reviewing the code ie).

Jayanth Kurup[/url]

Jayanth_Kurup SSC-Insane Points: 22967 More actions · Answer 8

Jayanth_Kurup

SSC-Insane

Points: 22967

October 29, 2015 at 10:40 am

#1836455

The final video

https://www.youtube.com/watch?v=Eyjokb1o7Go

Jayanth Kurup[/url]

Sergiy SSC Guru Points: 110192 More actions · Answer 9

OK, populated 2 tables from the scripts.

Here is the outcome:



set statistics time on

Select REF_AGE , COUNT(*)

from Partitiontest123.dbo.partitionedtbl

where ref_age between '27' and '30'

group by REF_AGE

set statistics time off

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 303 ms.



set statistics time on

Select REF_AGE , COUNT(*)

from Partitiontest122.dbo.unpartitionedtbl

where ref_age between '27' and '30'

group by REF_AGE

set statistics time off

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 115 ms.

Clear win for clustered index.

Almost 3 times faster.

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 10

Jayanth_Kurup (10/27/2015)

Updated , cleaned up scripts attached.

I have written it so that you can simply copy paste and execute ( after reviewing the code ie).

I'm guessing that you didn't actually check anything after you ran that. ALL of the data went into the PRIMARY partition because each field in the CSV is encapsulated in quotes. There's no such thing in the Ref_Age column as 24 because it's actually "24" including the quotes.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 11

I've got the data loading now.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 12

Alright then... after getting around the double-quote problem with loading your data by changing the delimiter on the BULK INSERT from ',' to '","', I ran some performance tests using YOUR test code and 1 variation of your test code.

The first code I ran on both databases looked like this. It's a variation of your exact code in that I changed the last GO to GO 5 so I wouldn't have to babysit each run.

Here's the first test I did, which drops the buffers on every iteration.



PRINT '========== Partitioned table, drop buffers every iteration. ===================='

GO

dbcc dropcleanbuffers

set statistics time on

Select REF_AGE , COUNT(*)

from Partitiontest123.dbo.partitionedtbl

where ref_age between '27' and '30'

group by REF_AGE

set statistics time off

GO 5

PRINT '========== NON Partitioned table, drop buffers every iteration. ===================='

GO

dbcc dropcleanbuffers

set statistics time on

Select REF_AGE , COUNT(*)

from Partitiontest122.dbo.unpartitionedtbl

where ref_age between '27' and '30'

group by REF_AGE

set statistics time off

GO 5

And, after deleting the rows that didn't mean anything to make the times more obvious, here's what I got for run results...



========== Partitioned table, drop buffers every iteration. ====================

Beginning execution loop

   CPU time = 1623 ms,  elapsed time = 15008 ms.

   CPU time = 1669 ms,  elapsed time = 15228 ms.

   CPU time = 1232 ms,  elapsed time = 15044 ms.

   CPU time = 1716 ms,  elapsed time = 15400 ms.

   CPU time = 1764 ms,  elapsed time = 15593 ms.

Batch execution completed 5 times.

========== NON Partitioned table, drop buffers every iteration. ====================

Beginning execution loop

   CPU time = 873 ms,  elapsed time = 23586 ms.

   CPU time = 1077 ms,  elapsed time = 24508 ms.

   CPU time = 843 ms,  elapsed time = 24171 ms.

   CPU time = 982 ms,  elapsed time = 24743 ms.

   CPU time = 920 ms,  elapsed time = 25257 ms.

Batch execution completed 5 times.

By golly!!! It sure does look like the partitioned table beats the tar out of the monolithic table with an index just like you said, doesn't it? Well my young friend, don't bet your life on it. 😉

While the partitioned table certainly beat the unpartitioned indexed table FOR THIS TEST, it's the wrong bloody test. Who in their right mind would ever essentially clear cache on a production machine between runs? Hopefully, no one or only the severely insane. :hehe:

Let's see what happens if we only clear cache for the first iteration as if we just got done building each table. We'll use the "variation" that I was talking about where we put a GO in after the DBCC command so cache is cleared only before the first iteration.

Here's the code for that...



PRINT '========== Partitioned table, drop buffers only on first iteration. ===================='

GO

dbcc dropcleanbuffers

GO --Added this

set statistics time on

Select REF_AGE , COUNT(*)

from Partitiontest123.dbo.partitionedtbl

where ref_age between '27' and '30'

group by REF_AGE

set statistics time off

GO 5

PRINT '========== NON Partitioned table, drop buffers only on first iteration. ===================='

GO

dbcc dropcleanbuffers

GO --Added this

set statistics time on

Select REF_AGE , COUNT(*)

from Partitiontest122.dbo.unpartitionedtbl

where ref_age between '27' and '30'

group by REF_AGE

set statistics time off

GO 5

And here are the cleaned up results from that run...



========== Partitioned table, drop buffers only on first iteration. ====================

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Beginning execution loop

   CPU time = 1747 ms,  elapsed time = 15148 ms.

   CPU time = 1982 ms,  elapsed time = 726 ms.

   CPU time = 1966 ms,  elapsed time = 708 ms.

   CPU time = 1934 ms,  elapsed time = 725 ms.

   CPU time = 1934 ms,  elapsed time = 717 ms.

Batch execution completed 5 times.

========== NON Partitioned table, drop buffers only on first iteration. ====================

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Beginning execution loop

   CPU time = 1030 ms,  elapsed time = 22598 ms.

   CPU time = 1138 ms,  elapsed time = 317 ms.

   CPU time = 1248 ms,  elapsed time = 313 ms.

   CPU time = 999 ms,  elapsed time = 321 ms.

   CPU time = 1248 ms,  elapsed time = 311 ms.

Batch execution completed 5 times.

Look at that. If you actually test the code as it would actually be used, the monolithic non-partitioned index table is roughly twice as fast for both CPU and Duration... just like some of us have been saying all along.

So, please stop saying that partitioned tables provides better performance because, using YOUR test data and YOUR testing method, it clearly doesn't. If you say it does in the movie, you need to change that so that you don't mislead people into thinking it does. AND, you should go to that other LinkedIn thread that you started and save some face by telling folks that you made a mistake during testing and partitioned tables are NOT a panacea of performance.

If you don't, I will and that's a promise. 😉

Now... enquiring minds want to know... do you like pork chops?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 13

Sergiy (10/31/2015)

OK, populated 2 tables from the scripts.

Here is the outcome:

set statistics time on Select REF_AGE , COUNT(*) from Partitiontest123.dbo.partitionedtbl where ref_age between '27' and '30' group by REF_AGE
set statistics time off

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 303 ms.

set statistics time on Select REF_AGE , COUNT(*) from Partitiontest122.dbo.unpartitionedtbl where ref_age between '27' and '30' group by REF_AGE
set statistics time off

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 115 ms.

Clear win for clustered index.

Almost 3 times faster.

Not to mention, that's a really nice machine you've got there, Sergiy. Didn't notice your post before but we proved the same thing. Partitioning is not a panacea of performance and, as we just proved, is actually a bit (a lot, actually) of a performance problem. At least for how this table was partitioned.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 14

Well, colour me (not) surprised Jeff. 😉

Looks like I might need to write a part 2 of the 'query performance tuning methodology' article....

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 15

GilaMonster (10/31/2015)

Well, colour me (not) surprised Jeff. 😉

Looks like I might need to write a part 2 of the 'query performance tuning methodology' article....

Gosh, I'd love to be in on that one, Gail. I've got the "chops" for it. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions