Are hash tables better than temporary user tables?

  • We are planning to rewrite an SP in which many processes and mathematical calculations are done.  These processes are broken up into smaller SPs, which require the uncommitted data to be available for further processing.

    This SP will require data from 4 to 6 tables containing 20,000+ records in order to retrieve only 100+ records at each execution.  This SP will be executed at least 200+ times (depending on the user).

    Does the use of hash tables hamper or improve the performance of the queries?  Or is it better to use a pre-defined set of temporary user tables?

     

    --Ramesh


  • #tables will tend to force procedure recompiles, whilst table variables don't tend to optimise well for anything other than small(ish) data sets - the exact size at which this matters depends. A largish table variable may well write to disk anyway, depending on size and memory available - likewise a small #temp table may well be created in memory.

    In my testing #tables optimise much better in joins and were much faster - but before I provoke a wave of posts about table variables, it does depend on how many rows; I'm talking 10k+ rows with the table variable(s) in 4 to 8 table joins.

    To be honest I think you'll have to check each usage; my guess is that 100 rows from 20k may well not work too well. Note that table variables only get one type of join by default, and you can use join hints to improve performance (see the sketch below). You only get one index per table variable.
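
    A minimal sketch of the kind of join hint mentioned above - every table and column name here is invented for illustration. Because the optimiser assumes a table variable holds very few rows, it tends to pick nested loops; a hint can override that when the variable actually holds thousands of rows.

        -- Illustration only: hypothetical table variable joined to a hypothetical master table.
        DECLARE @rows TABLE (EmpID int PRIMARY KEY, Amount money);   -- the PK is the only index you get

        INSERT INTO @rows (EmpID, Amount)
        SELECT EmpID, Amount
        FROM dbo.SomeSourceTable;                                    -- invented source table

        SELECT e.EmpID, e.EmpName, r.Amount
        FROM dbo.Employees AS e                                      -- invented master table
        INNER HASH JOIN @rows AS r                                   -- join hint overrides the default nested-loops choice
            ON r.EmpID = e.EmpID;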

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Thanks Colin...

     

    Since table variables are only available within the scope in which they are declared, that rules out their use in this SP (the scoping difference is sketched below).
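
    For illustration, a minimal sketch of that scoping difference, with invented procedure names: a #temp table created in an outer proc is visible to the procs it calls, while a table variable is not.

        -- Hypothetical procedures, purely to show scope.
        CREATE PROCEDURE dbo.InnerCalc
        AS
            SELECT COUNT(*) AS RowsSeen FROM #work;      -- works: #work was created by the caller
        GO

        CREATE PROCEDURE dbo.OuterProcess
        AS
            CREATE TABLE #work (EmpID int PRIMARY KEY, Amount money);
            INSERT INTO #work (EmpID, Amount) VALUES (1, 100.00);

            DECLARE @work TABLE (EmpID int PRIMARY KEY, Amount money);
            -- @work cannot be referenced inside dbo.InnerCalc at all

            EXEC dbo.InnerCalc;                          -- sees #work, but never @work
        GO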

    ----------------------------------------

    Joins on these tables will not be an issue, as these tables will never be joined to other master tables (the data in the tables is enough for processing).

    The existing SP uses pre-defined tables and costs 10 secs per execution, which yields around 6 to 8 hours for 20,000 executions.

    Our primary focus is to optimize this SP down to 90% less than the current estimates.

     ----------------------------------------

    Is there any place where I can find information about the performance of hash (#) tables versus permanent user tables, or about how to analyze and compare the performance of these objects?

    --Ramesh


  • In general terms a #table will outperform a permanent table defined for the SP. This is generally down to how SQL Server optimises #tables (stats and such like). If you create permanent tables from within a proc then you will also have to consider updating stats and such on them (a sketch follows).
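
    An illustrative sketch of that point, with invented object names: a permanent scratchpad table that is emptied and refilled each run may keep stale statistics unless you refresh them, whereas a #temp table gets its own statistics (and typically a recompile) automatically.

        -- Hypothetical permanent scratchpad table, repopulated on every run.
        TRUNCATE TABLE dbo.PayrollScratch;

        INSERT INTO dbo.PayrollScratch (EmpID, Amount)
        SELECT EmpID, Amount
        FROM dbo.SourceData;                   -- invented source table

        UPDATE STATISTICS dbo.PayrollScratch;  -- otherwise the optimiser may reuse a plan
                                               -- built for very different row counts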

    I tested the same queries ( when I was application tuning ) using table variables, #tables and various variations of permanent tables. The #tables were fastest and produced the best plans from the optimiser. As I said this was 4 - 8 table joins using some reasonable data sets. I can't publish this as it's client data and much too difficult to recreate using test data.

    If you're using repetitive calls, does this indicate cursors? (See other threads and posts if so.) Other than that it's a case of examining the query plans and tuning to suit. Storing intermediate result sets is always likely to prove problematic for performance - make sure you're using seeks and not scans, and try to avoid bookmark lookups (see the example below).
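
    A quick, made-up example of the "seeks not scans" advice: if the intermediate result set is indexed on the column you filter or join on, the plan can use an index seek with no bookmark lookup.

        -- Invented intermediate result set held in a #temp table.
        CREATE TABLE #interim (EmpID int NOT NULL, PayPeriod datetime NOT NULL, Amount money NOT NULL);

        CREATE CLUSTERED INDEX IX_interim ON #interim (EmpID, PayPeriod);  -- supports seeks on EmpID

        SELECT EmpID, PayPeriod, Amount
        FROM #interim
        WHERE EmpID = 42;        -- index seek, and no bookmark lookup because the clustered index covers everything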

    It could be you need to consider a different approach; it sounds a little as if you're batch processing rather than using a set-based approach.

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Thanks a lot darth for your responses on hash/temp tables...

    I too feel that #tables will perform better than permanent tables...

    One more thing darth..

    Are there any precautions, considerations or setup needed when using #tables? And what else needs to be done to improve the performance of #tables?

    --Ramesh


  • You still need to consider stats, indexes and plans, especially if the data is changing in the tables. Having tempdb on a fast drive array separate from the data array can help, and if you're running multiple concurrent jobs against tempdb then creating tempdb with multiple files, where the number of files equals the number of processors or cores (don't count HT), may help; if the job you're running is single threaded then don't bother with the extra files.

    As always: test, monitor (Profiler, Perfmon), tune and repeat as required. Last point - make sure that mdf and ldf growths don't occur as part of your process; they really can slow things down, so pre-grow any files that grow during your process (a sketch follows).
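
    A rough sketch of both suggestions - the logical name tempdev is the default for tempdb's first data file, but the second file name, the path and the sizes here are invented and would need adjusting to the actual server.

        -- Pre-size the existing tempdb data file so it doesn't grow mid-process.
        ALTER DATABASE tempdb
            MODIFY FILE (NAME = tempdev, SIZE = 2048MB, FILEGROWTH = 256MB);

        -- Add a second data file (one per core, as suggested above).
        ALTER DATABASE tempdb
            ADD FILE (NAME = tempdev2,
                      FILENAME = 'T:\tempdb2.ndf',   -- hypothetical path on the fast array
                      SIZE = 2048MB, FILEGROWTH = 256MB);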

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Disclaimer - it's Friday mania and my mind always homes in on irrelevant details..

    Why is Colin being called Darth ?!?!?!

    **ASCII stupid question, get a stupid ANSI !!!**

  • Got you darth...................................

    --Ramesh


  • What does an SP do that needs to execute "at least 200+" times per user?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • The title of this thread completely confused me at first.  You probably shouldn't call temp tables "hash tables" in the future, lest you confuse others.

    In programming, a hash table is a particular type of data structure - basically a dictionary of key-value pairs.  Real hash tables as such are not available for you to use in T-SQL.

    Anyway, my experience is that temp tables often perform better than table variables because you can create arbitrary indexes on them.  The only index you can put on a table variable is the PK.  As you know, indexes have a big effect on how efficient a query is when the tables involved are large, so for large temporary sets, temp tables are preferred (see the illustration below).
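
    A short illustration of that indexing difference, using invented table and column names:

        -- A #temp table can carry whatever indexes the queries need.
        CREATE TABLE #emp (EmpID int NOT NULL, DeptID int NOT NULL, Amount money NOT NULL);

        CREATE CLUSTERED INDEX IX_emp_id   ON #emp (EmpID);
        CREATE INDEX           IX_emp_dept ON #emp (DeptID, Amount);   -- add as many as make sense

        -- A table variable only gets the index that comes with its PK (or UNIQUE) constraint.
        DECLARE @emp TABLE
        (
            EmpID  int NOT NULL PRIMARY KEY,
            DeptID int NOT NULL,
            Amount money NOT NULL
        );
        -- CREATE INDEX ... ON @emp is not allowed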

    ---------------------------------------
    elsasoft.org

  • Jeff,

    Nothing much... it just sums up the incomes for which we are all working... a payroll process.

    When I started this post, I meant #tables versus the permanent user tables, which confused most of the regular members...

     

    --Ramesh


  • Ramesh,

    Thanks for the feedback...

    Unless your payroll process has hierarchical payments such as those found in an MLM, I'm pretty sure that we can help you figure out a way to do the payroll processing without having to call a stored proc for each individual "user".  We would, however, need a bit of detail as to what your tables look like, a bit of sample data, and a description as to what you are actually trying to do.

    On the temp table thing... based on your current description of needing uncommitted data to be available to multiple processes, and considering the relatively small number of rows (20,000), I'd probably use a properly indexed temp table for the "batch process" you describe.  Even if you build a more permanent "scratchpad" table, I certainly wouldn't commit anything to the source tables until the payroll process completed and the code had checked the new data for validity, etc.

    I think the payroll process batch would likely not be executed more than once at the same time... there may be some utility to creating a "scratchpad" table instead of using temp tables to keep from having to worry about "scope" in the transition from proc to proc.  That will, however, cause more logging than using a temp table (TempDB recovery is set to SIMPLE), but that shouldn't be a big problem on a properly maintained DB.

    The only reason why temp tables may prove a bit faster than a more permanent scratchpad table is the recovery setting of TempDB.  Your first creation of either will cause the data to cache in memory if it fits and, again, with only 20,000 rows it will likely fit, making either very fast, especially in the presence of proper indexing.

    I wouldn't worry as much about the Temp / Scratchpad comparison.  As I implied at the beginning of this message, I'd be more concerned about HOW the process is written.  If you are processing a single "user" at a time, you are likely making a huge performance mistake.  We can help you make it more set based, if you want, but we need more info (a rough sketch of the idea follows).
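
    Purely as a sketch of the set-based idea - every table and column name below is invented, since the real schema hasn't been posted - the point is that one statement computes the whole selected batch instead of calling a proc once per employee.

        -- Instead of EXEC dbo.CalcSalary @EmpID once per employee...
        CREATE TABLE #salary (EmpID int NOT NULL PRIMARY KEY, GrossPay money NOT NULL);

        INSERT INTO #salary (EmpID, GrossPay)
        SELECT e.EmpID,
               SUM(p.Hours * p.Rate)               -- stand-in for whatever the real calculation is
        FROM dbo.SelectedEmployees AS e            -- the 'n' employees the user picked
        JOIN dbo.PayDetails        AS p
            ON p.EmpID = e.EmpID
        GROUP BY e.EmpID;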

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff,

    Thanks for your response...

    Honestly speaking, I don't have much idea about MLM systems...

    The solution is a front end/back end application having several separate modules, one of which is a salary process module where the user has the option to select 'n' number of employees for salary generation.  I know I could call this SP only once for all 'n' employees using a set-based approach, but what about the status of the execution?  From where can I get the status of its execution?  How can I know for which employees the salary has been processed and for whom it has failed?  Since the execution can take hours to complete for n > 20000 (considering a second per execution for a single employee), the end user must have some idea of the status of the process, how much more time it is going to take, etc.; he cannot wait till the process ends.  This is the reason why I am sticking with the per-employee execution and concentrating on its performance (which is not a big problem for me in this case) - see the sketch at the end of this post.

    Also, the SP (which is the heart of the system) will be using most of the tables (90% of the tables) during its execution (this also includes the smaller SPs which are stacked into it).  DDL changes are not possible for so many tables...
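
    As an illustrative aside (every object name here is invented), one way to keep that per-employee status visibility while still processing in sets is to work in chunks against a scratchpad table that carries a status flag the front end can poll:

        -- Process one chunk of pending employees per pass; repeat until no rows are updated.
        UPDATE TOP (500) s
        SET    s.Status   = 'Processed',
               s.GrossPay = calc.GrossPay
        FROM   dbo.SalaryBatch AS s                           -- hypothetical scratchpad table
        JOIN  (SELECT EmpID, SUM(Hours * Rate) AS GrossPay    -- stand-in for the real calculation
               FROM dbo.PayDetails
               GROUP BY EmpID) AS calc
               ON calc.EmpID = s.EmpID
        WHERE  s.Status = 'Pending';

        -- The front end can poll progress with:
        -- SELECT Status, COUNT(*) FROM dbo.SalaryBatch GROUP BY Status;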

    --Ramesh

