Other types of database

Question

Post reply

Other types of database

Keith Henry

Hall of Fame

Points: 3759
More actions
November 25, 2003 at 11:04 am

#107108

My boss has got quite excitied recently about a new type of data warehousing he as come across. It claims to be faster than OLAP while keeping the qualities of structured relational data that OLAP looses.

An example of this is Sand (http://www.sand.com), and their site goes on at length about bitmap indexes etc... They describe themselves as "column based relational analysis servers".

Info on this kind of tech is thin on the ground, anyone here got any info on it? I've seen demos that appear very fast but I'd need to see what it's like with millions of records on massive columns.

Keith Henry

DBA/Developer/BI Manager

Keith Henry

According to everyone I know I "do something with computers?" for a living, so there you go.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

stevefromOZ SSC-Forever Points: 43646 More actions · Answer 1

Keith,

I can never keep up with where the non-dsiclosure is with Yukon, but if you take a look around the MSwebsite at the AS side of the Yukon release (or even join the beta?), you'll see some definite similarities between the stories re: merging of OLAP and relational.

There is public info here http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/sql/next/DWSQLSY.asp , if you select the Analysis Services link (from TOC), then look for the Proactive cache and also UDM (Universal Data Model). (Looks like most of the NDA has been lifted [:-)]

There's a lot of tweaking options meant to keep the response times extremely quick with the proactive cache.

Hope this info helps (you stay with MSSQL [:-)]).

Cheers,

Steve.

Frank Kalis SSC Guru Points: 111183 More actions · Answer 2

quote:

My boss has got quite excitied recently about a new type of data warehousing he as come across. It claims to be faster than OLAP while keeping the qualities of structured relational data that OLAP looses.

An example of this is Sand (http://www.sand.com), and their site goes on at length about bitmap indexes etc... They describe themselves as "column based relational analysis servers".

Info on this kind of tech is thin on the ground, anyone here got any info on it? I've seen demos that appear very fast but I'd need to see what it's like with millions of records on massive columns.

if you are seriously interested, ask the company for an inhouse presentation and for contacts to other clients.

In most cases thing are getting relative this way

Frank

http://www.insidesql.de

http://www.familienzirkus.de

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Keith Henry Hall of Fame Points: 3759 More actions · Answer 3

Thanks for the response...

quote:

Hope this info helps (you stay with MSSQL [:-)]).

Not much chance of me leaving it to be honest, but I like to know the competition pretty well so that when my boss starts talking about some new tech he has a flyer for, I have an answer as to why our current strategy is still better. It's kept me out of Oracle and Sybase so far

quote:

if you are seriously interested, ask the company for an inhouse presentation and for contacts to other clients.

I'd like to find out a bit more about this technology before I talk to any of them if possible. We have already had a demo from Alterian and they only talked a great deal about end user functionality, when what I need to know is architecture. Also only Croesus could afford it.

What are "bitmap indexes" and "column based databases"? How are they different from the B-Tree indexes I'm familiar with?

We have a specific problem, we have a few VLDBs which are also very complex (so many columns that its split across 8 tables, before we even look at the normalised stuff).

The problem is that a lot of the figure used to query this system as statistical probabilties and so overlap. They also need to prioritise a lot of client requirements to what is a complex relational data set. This rules out MS OLAP and takes a few thousand lines of stored procs code.

So I wrote a system that builds a result set that meets their criteria and then builds a cube on the fly to analyse.

The problem with this is that it is too slow (5-45 mins, which is still 3 days quicker than the entire department who used to do it). My boss thinks Alterian or Sand could do it quicker (ie in seconds), and I think they're just a different way of looking at metadata and so would have the same problem as OLAP.

I guess the root of it is that I just don't like not knowing how something works :-S

Keith Henry

DBA/Developer/BI Manager

Keith Henry

According to everyone I know I "do something with computers?" for a living, so there you go.

Antares686 SSC Guru Points: 125444 More actions · Answer 4

First off there are many types of database infrastructures and of course all claim to be the best. Check with some of the major benchmarkers to see if they have anything on this company with a direct comparison. Also, keep in mind as if you don't have the proper knowledge you don't get the full performance if designed improperly. I hate dealing with companies that they have to send someone (at my cost) to setup and tweak the database. Get more info on the syntax, especially specialty syntax and if they provide proper teacher in the package cost.

Keith Henry Hall of Fame Points: 3759 More actions · Answer 5

Turns out that bitmap indexes are quite a standard form of index.

They are used for highly non-specfic but non-unique data in large datasets (eg Actual Company Turnover which is a money field or our models, which are floats)

You can declare them explicitly in Oracle.

SQL uses them behind the scenes in Hash & Merge joins and Group By queries:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sqlbitmaps.asp

Normal (B-Tree) index:

Index Index Index Data

A - m -> a - e -> a - c

N - z f - m d - e -> David

Duncan

Eddie

etc

Bitmap Index:

RecID Name Bitmap

1 David 0010000000

2 Keith 0100000000

3 Ian 0000010000

4 Ian 0000010000

5 Eddie 0000000100

6 David 0010000000

This is sometimes called a column index because of the ways the index works. There are as many columns as there are different values, although as each column is one bit the index is still quite small, so to find "David" in the records above we just look at that column

RecID Name Bitmap

David __1_______

__0_______

David __1_______

Bitmap indexes carry a big maintenance overhead, and really should only be applied to read-only data, they function in a lot of ways very like aggregations.

In fact Microsoft OLAP uses bitmap indexes in MOLAP cubes, and can be overridden to use B-tree in certain circumstances:

http://msdn.microsoft.com/library/en-us/dnsql2k/html/sql2k_anservregsettings.asp#sql2k_anservregsettings_topic11

There are better algorithms yet, but they are all the preserve of academic research and specialised systems (the system used on the billons of records in the known star database makes interesting reading)

Keith Henry

DBA/Developer/BI Manager

Keith Henry

According to everyone I know I "do something with computers?" for a living, so there you go.

r_achar SSC-Addicted Points: 432 More actions · Answer 6

Keith,

Just adding my two cents to the information you have provided:

It is recommended that bitmaps are best suited when the cardinality of the column is low.

For example : Bitmapped Index on gender (Male/Female) of customers.

I am not sure if creating a bit-mapped index on names is recommended. Perhaps you mentioned it only as an example and do not intend to use it that way in practice. (Or you might have very few distinct names in a relatively huge table).

Thanks

Ram

Keith Henry Hall of Fame Points: 3759 More actions · Answer 7

Thanks for the response

quote:

It is recommended that bitmaps are best suited when the cardinality of the column is low.

For example: Bitmapped Index on gender (Male/Female) of customers.

You see this is where I encounter some confusion. Low cardinality data is ideally suited to OLAP. My problem is that the cardianality of my data is very high (lots of floating point values), and this was where bitmap indexes were suggested to me.

Here is an articel about using bitmap indexes with scientific star chart data: http://sdm.lbl.gov/sdmcenter/pub/SD.john.0209-SDM-Progress.ppt

What it's saying is that a basic bitmap index is for low cardinality data, but by using their own compression routines they were able to use it for high cardinality fields.

Apparently Oracle uses BBC compression to allow bitmap indexes on high cardinality stuff.

So, still confused.

quote:

I am not sure if creating a bit-mapped index on names is recommended. Perhaps you mentioned it only as an example and do not intend to use it that way in practice. (Or you might have very few distinct names in a relatively huge table).

No, I'm not either. It was just an example of how they work.

Keith Henry

DBA/Developer/BI Manager

Keith Henry

According to everyone I know I "do something with computers?" for a living, so there you go.