Flat files - any advantages?

  • We have a number of tables that are not normalized.  The person in charge of the database believes that having flat files will increase the efficiency of queries, since no joins need to be performed. 

     

    I am writing the code (in C#) to generate the queries, so having flat files somewhat simplifies my job.  However, to me, it seems that in the long run it is better to have things fully normalized.  It also seems to me that SQL Server, being a relational database engine, is optimized to do basic things like joins. 

     

    So…

    How much of a time hit are joins? 

    What are the advantages of flat files vs. the advantages of having a relational database?

     

    Thanks.

     

    David

  • Depends on how much data you need to keep in the db.  If all you need to save is the address of the company and its phone numbers, etc., I see no point in creating and deploying a full db.  However, if your needs go beyond that, I'd suggest you use a normalized system right away.  It'll save you a lot of headaches in the long run.

  • For reporting or tracking/trending, denormalized is supposed to be the best bet for performance.

    For an application like an Order Fulfillment system, it is a poor fit, and each insert may cause a lot of index movement.

    In the latter scenario, I would suggest he either doesn't understand what he's doing or is being lazy. (That's just my opinion; please don't use that statement as-is, as it's very non-PC.)

    Flattening the data pushes a lot more into the application design requirements and raises the cost of long-term data storage.

    For example, how would you handle an Orders system on a flat table?

    1) You will have multiple rows and many columns with the same data.

     But how will you ensure the order is properly identified as a unit?

     What steps will be needed to alter the Shipping Address so no errors occur?

     How will you manage an item being added to the order and ensure it remains properly related to the existing items?

     Are you really going to store the entire item description in each row of the order?

     How will your application on the fulfillment side determine which address to use if there is a difference when there shouldn't be (and what if there should be)?

    These questions can be addressed more fluidly in a normalized system (especially the last, if multiple shipping addresses are allowed) and with a lot less work than it would take in the denormalized version. Just consider how you would address them in a normalized state; a sketch of what that might look like follows below.
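
    A minimal sketch of the normalized alternative (table and column names here are hypothetical, not from the original post): the order header is stored once, each line item references it by key, and item descriptions live in a separate Items table instead of being repeated on every row.

        -- Order header: shipping address lives in exactly one row per order
        CREATE TABLE dbo.Orders (
            OrderID         INT IDENTITY PRIMARY KEY,
            CustomerID      INT          NOT NULL,
            ShippingAddress VARCHAR(200) NOT NULL,
            OrderDate       DATETIME     NOT NULL DEFAULT GETDATE()
        );

        -- Item catalog: the description is stored once, not on every order row
        CREATE TABLE dbo.Items (
            ItemID          INT IDENTITY PRIMARY KEY,
            Description     VARCHAR(500) NOT NULL
        );

        -- Line items: the foreign keys tie each item to its order
        CREATE TABLE dbo.OrderItems (
            OrderID         INT   NOT NULL REFERENCES dbo.Orders(OrderID),
            ItemID          INT   NOT NULL REFERENCES dbo.Items(ItemID),
            Quantity        INT   NOT NULL,
            UnitPrice       MONEY NOT NULL,
            PRIMARY KEY (OrderID, ItemID)
        );

    Adding an item to an existing order is then a single insert into OrderItems, and the foreign keys guarantee it stays related to the right order; in the flat version you would have to copy the header columns correctly into every new row yourself.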

  • IMHO... if he wants flat files, why bother with the expense of an RDBMS???  Store it in a FILE and maintain it with a good word processor.  And access it with binary FSOs.  Or, a simple spreadsheet may be all that's necessary.

    If you want to maintain orderly data, then normalize it and put it in a database.  In a properly formed database, joins are not all that big a performance hit and the structure of the database makes the data a heck of a lot easier to maintain... just not in flat files.

    --Jeff Moden



  • I recently read somewhere that if your DB is used primarily for analysis and reporting and you frequently have to join more than 4 tables, then you should consider denormalization to improve performance.  However, the "penalty" for going down this path is additional work to maintain data consistency.  If Cust_Name appears in 4 tables, you have to have some way of keeping it the same in all tables if the data is updateable.  If it's just a read-only snapshot, then it's probably not such a big deal.
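
    A quick illustration of that maintenance penalty (all table names here are hypothetical): in a normalized design the name lives in one place, while in the denormalized copies every table that repeats it has to be touched in the same transaction or the data drifts apart.

        -- Normalized: one statement, one row, consistency guaranteed by design
        UPDATE dbo.Customers
        SET    Cust_Name = 'New Name'
        WHERE  CustomerID = 42;

        -- Denormalized: every copy must be updated together
        BEGIN TRANSACTION;
            UPDATE dbo.Orders       SET Cust_Name = 'New Name' WHERE CustomerID = 42;
            UPDATE dbo.Invoices     SET Cust_Name = 'New Name' WHERE CustomerID = 42;
            UPDATE dbo.Shipments    SET Cust_Name = 'New Name' WHERE CustomerID = 42;
            UPDATE dbo.SupportCalls SET Cust_Name = 'New Name' WHERE CustomerID = 42;
        COMMIT TRANSACTION;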

    If your DB is primarily used to record transactions, i.e., many inserts/updates, then fully normalized is the way to go.

    Then again... size matters.  If it's not a very large DB to begin with, then normalize the tables.  The performance impact of multiple joins would be negligible, and you avoid the pitfalls of data inconsistency.  It's why we have relational DBs.

  • At a very high level, OLTP applications benefit from normalized data, while OLAP applications benefit from denormalized data.

    I am fortunate to have had the experience of moving from the flat-file COBOL mainframe era to the DBMS/RDBMS/ODBMS era.

    Flat files are definitely faster, not just because they are not normalized, but because they do not have the overhead of an RDBMS engine.

    Flat file does not mean a Notepad, Word, or Excel file. Flat files also need indexing, and in the system I worked on in the 80s, the mainframe provided that support.

    Though every one of us knew that an RDBMS is slower than flat files, we adopted Codd's relational theory because it eliminated one of the major, intolerable business problems: the curse of data inconsistency. The next thing the RDBMS tried to eliminate, and was reasonably successful at, was data dependency. Remember how data structures were embedded into COBOL programs; the same data could be represented as different data types in different programs.

    I am not sure if the world is moving towards XML databases, but at least for data transfer we are already there. And what is XML? In my opinion, it is nothing but a glorified hierarchical flat file.

    I have cut a long story short. Just to make it very clear: data consistency and integrity are more important than speed, and that is the reason why we should be, and still are, using normalized databases. There is no viable alternative to that technology yet.

    But if speed is the concern, as in today's high-volume Data Warehouse applications, a diluted version of the normalized database is used. It is still not completely denormalized.

    Hope this helps.

    jambu

  • Our database is used primarily for querying.  There is no updating within the application that accesses the database.  We get the data from varied sources.  Unfortunately, there is no control over the format in which we receive it.  Much of it is in dbf format, and that file structure can change on the whim of the provider.

    Queries can take hours to run, and performance is the major issue.  To be honest, there is not a lot of relational data.  The joins would not be very deep (not 4 tables), but they probably would impact performance.  And there is the maintenance issue - we would have to convert the varied non-relational data to relational form each time we receive it.

    I am not a DBA, but my first thought was that normalized is always better.  We do have normalized data within the database, and I thought that we should normalize more.  Now, I am not so sure.  Data consistency is not too much of a problem for us.  Values that are out of range are much more of a problem, and checking for them would not be helped by normalization.

    Thanks for all your comments.

    David

  • "Values that are out of range are much more of a problem and checking would not be helped by normalization."

    It would help, rosedd.  Values that should be in a specific range should be stored in a normalized table and looked up.  Values not being within a range are a form of data inconsistency, if I am not wrong.

     

  • Yes, it is a form of data inconsistency.  But how would normalization help check that a float field contains values between 10.5 and 20.5?

  • Got you.  If that is the case, normalization is not the answer.  A control file (table) is the answer.

  • Jambu,

    What exactly is a control file?  Is it just a file with the value ranges that can be used in a query to pull out rows with values that are not within range?

    David

  • Yes, of course.

    I agree this could be handled easily in your programming logic as well.
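
    A minimal sketch of that idea (all table and column names are hypothetical): a small control table holds the allowed range per field, and a query joins against it to pull the rows that fall outside the range.

        CREATE TABLE dbo.RangeControl (
            FieldName  VARCHAR(100) NOT NULL PRIMARY KEY,
            MinValue   FLOAT        NOT NULL,
            MaxValue   FLOAT        NOT NULL
        );

        INSERT INTO dbo.RangeControl (FieldName, MinValue, MaxValue)
        VALUES ('Measurement', 10.5, 20.5);

        -- Pull the rows whose value falls outside the allowed range
        SELECT d.*
        FROM   dbo.ImportedData d
        JOIN   dbo.RangeControl rc ON rc.FieldName = 'Measurement'
        WHERE  d.Measurement < rc.MinValue
           OR  d.Measurement > rc.MaxValue;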

  • If you use BCP and a format file to import to a table, you don't need the "control file"... just add the necessary constraints to the table and set up BCP to write exceptions to an errata file.

    You are all correct... if the data being imported is just going to be used for reporting, then denormalization will certainly work fine... so would a good spreadsheet.
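
    A rough sketch of what that might look like (the table, file names, and range values are hypothetical, borrowed from the 10.5-20.5 example above): the range rule lives on the table as a constraint, and the bcp switches name a format file and an error file.

        -- The range check lives on the table itself instead of in a separate control table
        ALTER TABLE dbo.ImportedData
            ADD CONSTRAINT CK_ImportedData_Measurement
            CHECK (Measurement BETWEEN 10.5 AND 20.5);

        /* From the command line (these are standard bcp switches; exact error
           handling depends on your settings):
           bcp MyDb.dbo.ImportedData in data.txt -f import.fmt -e errata.txt -m 100 -h "CHECK_CONSTRAINTS" -S myserver -T
        */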

    --Jeff Moden



  • If join performance is a concern for a normalized design, there are a lot of things to check before going to a denormalized structure.

    Do the indexes allow efficient joins?  Does the execution plan show merge joins or hash joins?

    Does the server have multiple data drives?  Can you use filegroups to place tables, indexes, and tempdb on separate drives for max I/O performance?  Does the server have enough RAM?  Try moving to a 64-bit system with total RAM greater than the combined size of the tables and see what happens.

    If the flat file represents some kind of master/detail relationship, the master fields make up a large portion of the total record, and there are a lot of detail records for each unique master record, then the normalized structure will be a lot smaller than the flat structure.  Fewer data pages usually translate into faster queries.

    You could say that a denormalized structure is sometimes faster.  You could say that denormalized is often faster for reporting applications.  But someone who says denormalized is always faster is not analyzing the problem at hand, they're just mindlessly reciting a rule of thumb they heard somewhere.

    If you think a denormalized structure will be faster for your reporting queries but you don't want to give up the data integrity of a normalized design, you might also consider using indexed views.  Note that this will require extra disk space and will slow down updates and inserts.
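
    A minimal sketch of that approach (the Orders/OrderItems names are hypothetical): the view pre-joins and pre-aggregates the normalized tables, and the unique clustered index materializes the result so reporting queries can read it like a flat table.

        CREATE VIEW dbo.vOrderTotals
        WITH SCHEMABINDING            -- required for an indexed view
        AS
        SELECT  o.OrderID,
                o.CustomerID,
                SUM(oi.Quantity * oi.UnitPrice) AS OrderTotal,
                COUNT_BIG(*)                    AS DetailRows   -- required when GROUP BY is used
        FROM    dbo.Orders     AS o
        JOIN    dbo.OrderItems AS oi ON oi.OrderID = o.OrderID
        GROUP BY o.OrderID, o.CustomerID;
        GO

        -- The unique clustered index is what actually persists (and maintains) the view's rows
        CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals ON dbo.vOrderTotals (OrderID);

    The base tables stay normalized, so data integrity is unchanged; the extra disk space and the insert/update overhead mentioned above come from SQL Server keeping the index on the view in sync.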

  • Ok, I'm a bit confused... Rose posted "Queries can take hours to run, and performance is the major issue.  To be honest, there is not a lot of relational data."

    I'm thinking that normalization/denormalization is NOT the real issue here... I'm thinking that there's code with a lot of RBAR in it (pronounced "ree-bar", a "Modenism" for "Row By Agonizing Row").

    Rose, it might help if you posted one of your more troublesome queries... post the CREATE TABLE code for the tables that are used, as well.  Indexes will be of prime concern as well... you may want to post that code, too.

    --Jeff Moden


