A faster way to prepare dimensional databases

  • Hi Magarity,

    Absolutely the way to go once the process is stable and populated on an incremental basis.

    Tell me - I gather from the discussions on the subject that there is only a 1 in 2^40 chance of a duplicate MD5 checksum (as opposed to the SQL Server CHECKSUM function). Have you ever experienced problems with checksums?

    Thanks,

    Adam

  • For Dbowlin:

    There is a good MD5 checksum approach here:

    http://www.tek-tips.com/viewthread.cfm?qid=1268144&page=1
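
    For anyone who doesn't want to follow the link, a minimal sketch of that style of MD5 change detection in T-SQL, using HASHBYTES (the dbo.Customer_Staging table and its columns are purely illustrative):

    -- A minimal sketch, not the linked article's exact code: compute an MD5 hash per row
    -- so changed rows can be detected on the next load. ISNULL and the '|' delimiter
    -- guard against NULLs and against distinct columns running together.
    SELECT
        CustomerID,
        HASHBYTES('MD5',
            ISNULL(CustomerName, '') + '|' +
            ISNULL(City, '') + '|' +
            ISNULL(CONVERT(varchar(20), CreditLimit), '')
        ) AS RowHash
    FROM dbo.Customer_Staging;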

  • Hi,

    Thanks for the article and example.

    What I don't understand - and this may be because you don't have complete control of the ETL, or because you're restricted by the design - is why you truncate the dimensions. Could you not just relate them to a business key and do the appropriate SCD 1 / SCD 2 transformations?
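
    As a rough illustration of the SCD 2 pattern being suggested, a minimal sketch against hypothetical DimCustomer and Customer_Staging tables (the names and the tracked column are illustrative only):

    -- Expire the current dimension row when a tracked attribute has changed...
    UPDATE d
    SET    d.IsCurrent = 0,
           d.ValidTo   = GETDATE()
    FROM   dbo.DimCustomer AS d
    JOIN   dbo.Customer_Staging AS s
             ON s.CustomerBK = d.CustomerBK
    WHERE  d.IsCurrent = 1
      AND  ISNULL(d.City, '') <> ISNULL(s.City, '');

    -- ...then insert a new current row for changed and brand-new business keys.
    INSERT INTO dbo.DimCustomer (CustomerBK, CustomerName, City, ValidFrom, ValidTo, IsCurrent)
    SELECT s.CustomerBK, s.CustomerName, s.City, GETDATE(), NULL, 1
    FROM   dbo.Customer_Staging AS s
    LEFT JOIN dbo.DimCustomer AS d
             ON d.CustomerBK = s.CustomerBK
            AND d.IsCurrent = 1
    WHERE  d.CustomerBK IS NULL;

    An SCD 1 change would simply be an UPDATE in place rather than the expire-and-insert pair.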

    Even with the Fact table, you could just insert for new records and update for existing records.

    Regards,

    Amrit

  • Hi Amrit,

    This is explained in my reply to Magnus, above - the design is in a considerable state of flux, and we wanted a complete teardown on every process run, to guarantee coherence and to avoid having "old" data cluttering up the DW.

    Sorry that this wasn't clear in the article.

    Otherwise yes - standard techniques could (and probably will) be used.

    Regards,

    Adam

  • Ok, great!

    Sorry if I misunderstood.

  • No, I've never had a problem with using md5. I'm inspired to write my own article on how to use md5 in an update/insert strategy for loading data warehouses if I can get Steve the admin to accept it.

  • magarity kerns (2/25/2010)


    No, I've never had a problem with using md5. I'm inspired to write my own article on how to use md5 in an update/insert strategy for loading data warehouses if I can get Steve the admin to accept it.

    By all means, please write away. We currently truncate and replace, but would be extremely interested in a more incremental strategy. For example, my understanding is that the BIDS Slowly Changing Dimension transformation is a performance hound for large tables.
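
    In the meantime, a minimal sketch of the kind of hash-compare insert/update load being discussed, assuming the dimension and staging tables both carry a HASHBYTES('MD5', ...) value per row (the DimProduct / Product_Staging names and columns are illustrative):

    MERGE dbo.DimProduct AS tgt
    USING dbo.Product_Staging AS src
          ON tgt.ProductBK = src.ProductBK            -- business key
    WHEN MATCHED AND tgt.RowHash <> src.RowHash       -- only rows whose MD5 has changed
        THEN UPDATE SET tgt.ProductName = src.ProductName,
                        tgt.Category    = src.Category,
                        tgt.RowHash     = src.RowHash
    WHEN NOT MATCHED BY TARGET                        -- brand-new business keys
        THEN INSERT (ProductBK, ProductName, Category, RowHash)
             VALUES (src.ProductBK, src.ProductName, src.Category, src.RowHash);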

  • We don't need no stinkin' changing dimension widgets... Took me a while with the article-submitting editor, but I got my treatise on using MD5 submitted. We'll see how fast it gets rejected.

  • Fabulous! I look forward to reading it!

    Adam

  • Thanks for the article.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • It's a good article.

    However, did you experience any problems with datatypes being incorrectly assigned when using the SELECT ... INTO method? I notice you mention that nulls are handled in the consolidation layer, so that may mitigate the issue.
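
    On the datatype point, a minimal sketch of one way to stop SELECT ... INTO from guessing column types, by casting each expression explicitly (table and column names are illustrative):

    -- SELECT ... INTO derives each new column's type from its expression,
    -- so explicit CASTs pin down the types of the table being created.
    SELECT
        CAST(c.CustomerID   AS int)            AS CustomerID,
        CAST(c.CustomerName AS nvarchar(100))  AS CustomerName,
        CAST(c.CreditLimit  AS decimal(18, 2)) AS CreditLimit
    INTO dbo.DimCustomer_Load
    FROM dbo.Customer_Source AS c;

    Wrapping an expression in ISNULL(...) also makes the created column NOT NULL, which ties in with nulls being handled in the consolidation layer.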

    I'm also not sure how this can be adapted to an incremental DW unless you are not concerned about maintaining history: by dropping the fact table you lose the ability to maintain the historic fact record, and so you defeat one of the objectives of an incremental load, which is to allow point-in-time analysis - something that can be quite critical for DWs in the finance industry.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • INSERT INTO <table> WITH (TABLOCK)

    will be minimally logged in the same way that SELECT ... INTO is by default.
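
    A minimal sketch of the two forms side by side (table names are illustrative; both forms are only minimally logged when the database is in the SIMPLE or BULK_LOGGED recovery model):

    -- SELECT ... INTO creates the table and is minimally logged by default.
    SELECT *
    INTO   dbo.FactSales_New
    FROM   dbo.FactSales_Staging;

    -- INSERT ... WITH (TABLOCK) loads an existing table and can be minimally
    -- logged in the same way, because the table lock allows the bulk-load path.
    INSERT INTO dbo.FactSales WITH (TABLOCK)
    SELECT *
    FROM   dbo.FactSales_Staging;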

  • For example, my understanding is that the BIDS Slowly Changing Dimension transformation is a performance hound for large tables.

    FAO tskelly:

    If you're finding the standard SCD component in SSIS slow, have a look at Todd McDermid's Kimball SCD component.

    I've switched to using his Kimball SCD for a number of reasons. First of all, it's faster. As mentioned in the details, it doesn't destroy the dataflow when you need to make changes. Also, it provides a wealth of auditing information. Well worth a look!

    You can find it here: http://kimballscd.codeplex.com/

    _____________________________________________________________________________
    MCITP: Business Intelligence Developer (2005)

  • Thank you Adam.

    At first it is almost necessary to have control of the whole "ETL" process. Missing data or bad quality is a major problem in many cases. You can seldom avoid outer joins. I use ISNULL(expression, 'MISSING') and have a default dimension member MISSING in most of the dimensions. Your approach is also useful in "non-OLAP" situations, such as with the new PowerPivot feature.

    We do have dimension tables and one or more fact tables to import; PowerPivot then creates the "CUBE" by itself.
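
    A minimal sketch of that MISSING-member pattern, assuming the dimension holds a predefined row for unknown values (the -1 surrogate key and the table names are just an illustrative convention):

    -- Outer-join the staged fact rows to the dimension; anything that fails the
    -- lookup is pointed at the default MISSING member instead of being dropped.
    SELECT
        ISNULL(d.CustomerKey, -1) AS CustomerKey,    -- -1 = surrogate key of the MISSING member
        s.SalesAmount,
        s.SalesDate
    FROM dbo.Sales_Staging AS s
    LEFT JOIN dbo.DimCustomer AS d
           ON d.CustomerBK = ISNULL(s.CustomerBK, 'MISSING');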

    Gosta M

  • dave-dj (12/17/2010)


    For example, my understanding is that the BIDS Slowly Changing Dimension transformation is a performance hound for large tables.

    FAO tskelly:

    If you're finding the standard SCD component in SSIS slow, have a look at Todd McDermid's Kimball SCD component.

    I've switched to using his Kimball SCD for a number of reasons. First of all, it's faster. As mentioned in the details, it doesn't destroy the dataflow when you need to make changes. Also, it provides a wealth of auditing information. Well worth a look!

    You can find it here: http://kimballscd.codeplex.com/

    We use Todd's component as well; it has a few performance issues with large datasets. We did some alpha testing of v1.6 of the component, and that seems to have fixed some of the performance issues - unfortunately we're still waiting for the final release.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices
