What's more efficent for search criteria?

Question

Post reply

What's more efficent for search criteria?

thisted

SSCrazy

Points: 2131
More actions
October 5, 2012 at 2:24 pm

#268184

I'm in the process of designing some new tables. The decision has been made to always create a new row when the data for that table is different than any other row that already exists. Currently the table is about 8 columns wide and contains personal identification information.

I'm concerned about building the indexes for this type of table. I'll have a search query with 8 different where clauses.

For example

Where firstname = @firstName, middlename = @middlename, lastname=@lastname, phonenumber=@phonenumber, zipcode = @zipcode...ect

First I'd like your thoughts on how to get the best performance in this type of situation. My first thought is to generate a md5 type of hash based on all of the columns, and then search on that.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

dan-572483 SSCoach Points: 15870 More actions · Answer 1

dan-572483

SSCoach

Points: 15870

October 5, 2012 at 2:48 pm

#1546322

I'm curious why a search query would have so many criteria. Usually a search query will be by a primary key or other unique identifier, or by 1-3 columns - preferabley idexed (firstname, lastname, city), or with a join to another table.

thisted SSCrazy Points: 2131 More actions · Answer 2

Customer requirement is to maintain each and every combination of data exactly as entered. I'm sure there are different approaches to handle this type of requirement, However we've been directed to maintain each unique combination in the manner described above. As a result we need an efficient method of determining whether the newly supplied data matches an existing row, or needs to be created.

andersg98 SSCertifiable Points: 5168 More actions · Answer 3

The table you describe violates normalization rules and will be inefficient just by the nature of the design. It is my opinion that business/user requirments can state WHAT needs to be stored but should never dictate HOW something is stored in the database. That falls on the professional data modeler to correctly implement in a way that provides the business/user with the data they require.

Lowell SSC Guru Points: 323482 More actions · Answer 4

the key to indexing and performance is going to be how they are searched in your WHERE clause.

if all 8 values are absolutely required for every search, an index with all eight columns would be very selective.

if you know that most of the time they search on just lastname or lastname + firstname, then two different indexes, to cover both of those queries would be good.

you'll probably want the GUI to build your query with WHERE statements only for the non-empty values;

you want to avoid trying to create a catch all query;

check out the pitfalls in this article:

http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/

more details and we can offer more than general advice.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 5

You could create a unique index or constraint that has all eight columns to guarantee that you cannot have duplicates.

There can be a lot of overhead to such a wide index, so that could be a disadvantage.

In general, code like this should work well with smaller indexes that only have the more selective non nullable columns. If the columns are nullable, you will have to modify the query to allow for that.



insert into MyTable

	(

	Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8

select

	@Col1,@Col2,@Col3,@Col4,@Col5,@Col6,@Col7,@Col8

where

	not exists (

	select *

	from

		MyTable

	where

		Col1 = @Col1	and

		Col2 = @Col2	and

		Col3 = @Col3	and

		Col4 = @Col4	and

		Col5 = @Col5	and

		Col6 = @Col6	and

		Col7 = @Col7	and

		Col8 = @Col8 )

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 6

thisted (10/5/2012)

I'm in the process of designing some new tables. The decision has been made to always create a new row when the data for that table is different than any other row that already exists. Currently the table is about 8 columns wide and contains personal identification information.

I'm concerned about building the indexes for this type of table. I'll have a search query with 8 different where clauses.

For example

Where firstname = @firstName, middlename = @middlename, lastname=@lastname, phonenumber=@phonenumber, zipcode = @zipcode...ect

First I'd like your thoughts on how to get the best performance in this type of situation. My first thought is to generate a md5 type of hash based on all of the columns, and then search on that.

This type of table is affectionately known as a "Slowly Changing Dimension" or "SCD" for short. It's a time honored method for auditing all actions taken against the table in the same table instead of having a separate audit or history table (which has it's own set of problems, as well).

One of the most effective "SCD" types of tables is the "Type 2 Slowly Changing Dimension". Please see the following URL for the different types of "SCDs" available.

http://en.wikipedia.org/wiki/Slowly_changing_dimension

One of the purposes of Type 2 SCDs is so that you can actually determine what the status of all of the rows in a table where/are for any given point in time and it's very simple and effective in doing so. Don't let talk of the table being "denormalized" or "ineffecient" deter you because the denormalization is necessary and it's far from being ineffecient if done properly.

To make a longer story much shorter, Type 2 SCDs are generally maintained by a well written, high performance trigger that end-dates the old row and creates a new row with a new start date and an "open" end-date of either a NULL or a very large date (I use 9999-12-30 which leaves one day of headroom for some critical criteria calculations). On my systems, I usually also add a quick check in the trigger to prevent any direct manipulation of the start and end date columns to make it more tamper proof and to ensure that the dates never overlap from row to row for any given unique data.

One of the keys to success and performance is how you index for the PK. Whatever you currently use for the PK needs to be modified so that the start-date column is the first column of the PK (and clustered index, BTW) and that what you would normally have for PK column(s) comes after that.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions