Other types of database

  • My boss has got quite excitied recently about a new type of data warehousing he as come across. It claims to be faster than OLAP while keeping the qualities of structured relational data that OLAP looses.

    An example of this is Sand (http://www.sand.com), and their site goes on at length about bitmap indexes etc... They describe themselves as "column based relational analysis servers".

    Info on this kind of tech is thin on the ground, anyone here got any info on it? I've seen demos that appear very fast but I'd need to see what it's like with millions of records on massive columns.

    Keith Henry

    DBA/Developer/BI Manager




    Keith Henry



    According to everyone I know I "do something with computers?" for a living, so there you go.

  • Keith,

    I can never keep up with where the non-dsiclosure is with Yukon, but if you take a look around the MSwebsite at the AS side of the Yukon release (or even join the beta?), you'll see some definite similarities between the stories re: merging of OLAP and relational.

    There is public info here http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/sql/next/DWSQLSY.asp , if you select the Analysis Services link (from TOC), then look for the Proactive cache and also UDM (Universal Data Model). (Looks like most of the NDA has been lifted [:-)]

    There's a lot of tweaking options meant to keep the response times extremely quick with the proactive cache.

    Hope this info helps (you stay with MSSQL [:-)]).

    Cheers,

    Steve.

    Steve.

  • quote:


    My boss has got quite excitied recently about a new type of data warehousing he as come across. It claims to be faster than OLAP while keeping the qualities of structured relational data that OLAP looses.

    An example of this is Sand (http://www.sand.com), and their site goes on at length about bitmap indexes etc... They describe themselves as "column based relational analysis servers".

    Info on this kind of tech is thin on the ground, anyone here got any info on it? I've seen demos that appear very fast but I'd need to see what it's like with millions of records on massive columns.


    if you are seriously interested, ask the company for an inhouse presentation and for contacts to other clients.

    In most cases thing are getting relative this way

    Frank

    http://www.insidesql.de

    http://www.familienzirkus.de

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Thanks for the response...

    quote:


    Hope this info helps (you stay with MSSQL [:-)]).


    Not much chance of me leaving it to be honest, but I like to know the competition pretty well so that when my boss starts talking about some new tech he has a flyer for, I have an answer as to why our current strategy is still better. It's kept me out of Oracle and Sybase so far

    quote:


    if you are seriously interested, ask the company for an inhouse presentation and for contacts to other clients.


    I'd like to find out a bit more about this technology before I talk to any of them if possible. We have already had a demo from Alterian and they only talked a great deal about end user functionality, when what I need to know is architecture. Also only Croesus could afford it.

    What are "bitmap indexes" and "column based databases"? How are they different from the B-Tree indexes I'm familiar with?

    We have a specific problem, we have a few VLDBs which are also very complex (so many columns that its split across 8 tables, before we even look at the normalised stuff).

    The problem is that a lot of the figure used to query this system as statistical probabilties and so overlap. They also need to prioritise a lot of client requirements to what is a complex relational data set. This rules out MS OLAP and takes a few thousand lines of stored procs code.

    So I wrote a system that builds a result set that meets their criteria and then builds a cube on the fly to analyse.

    The problem with this is that it is too slow (5-45 mins, which is still 3 days quicker than the entire department who used to do it). My boss thinks Alterian or Sand could do it quicker (ie in seconds), and I think they're just a different way of looking at metadata and so would have the same problem as OLAP.

    I guess the root of it is that I just don't like not knowing how something works :-S

    Keith Henry

    DBA/Developer/BI Manager




    Keith Henry



    According to everyone I know I "do something with computers?" for a living, so there you go.

  • First off there are many types of database infrastructures and of course all claim to be the best. Check with some of the major benchmarkers to see if they have anything on this company with a direct comparison. Also, keep in mind as if you don't have the proper knowledge you don't get the full performance if designed improperly. I hate dealing with companies that they have to send someone (at my cost) to setup and tweak the database. Get more info on the syntax, especially specialty syntax and if they provide proper teacher in the package cost.

  • Turns out that bitmap indexes are quite a standard form of index.

    They are used for highly non-specfic but non-unique data in large datasets (eg Actual Company Turnover which is a money field or our models, which are floats)

    You can declare them explicitly in Oracle.

    SQL uses them behind the scenes in Hash & Merge joins and Group By queries:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sqlbitmaps.asp

    Normal (B-Tree) index:

       Index    Index    Index    Data

       A - m -> a - e -> a - c

       N - z    f - m    d - e -> David

                                  Duncan

                                  Eddie

                                  etc

    Bitmap Index:

       RecID    Name     Bitmap

       1        David    0010000000

       2        Keith    0100000000

       3        Ian      0000010000

       4        Ian      0000010000

       5        Eddie    0000000100

       6        David    0010000000

    This is sometimes called a column index because of the ways the index works. There are as many columns as there are different values, although as each column is one bit the index is still quite small, so to find "David" in the records above we just look at that column

    RecID Name Bitmap

       RecID    Name     Bitmap

                David    __1_______

                         __0_______

                         __0_______

                         __0_______

                         __0_______

                David    __1_______

    Bitmap indexes carry a big maintenance overhead, and really should only be applied to read-only data, they function in a lot of ways very like aggregations.

    In fact Microsoft OLAP uses bitmap indexes in MOLAP cubes, and can be overridden to use B-tree in certain circumstances:

    http://msdn.microsoft.com/library/en-us/dnsql2k/html/sql2k_anservregsettings.asp#sql2k_anservregsettings_topic11

    There are better algorithms yet, but they are all the preserve of academic research and specialised systems (the system used on the billons of records in the known star database makes interesting reading)

    Keith Henry

    DBA/Developer/BI Manager




    Keith Henry



    According to everyone I know I "do something with computers?" for a living, so there you go.

  • Keith,

    Just adding my two cents to the information you have provided:

    It is recommended that bitmaps are best suited when the cardinality of the column is low.

    For example : Bitmapped Index on gender (Male/Female) of customers.

    I am not sure if creating a bit-mapped index on names is recommended. Perhaps you mentioned it only as an example and do not intend to use it that way in practice. (Or you might have very few distinct names in a relatively huge table).

    Thanks

    Ram

  • Thanks for the response

    quote:


    It is recommended that bitmaps are best suited when the cardinality of the column is low.

    For example: Bitmapped Index on gender (Male/Female) of customers.


    You see this is where I encounter some confusion. Low cardinality data is ideally suited to OLAP. My problem is that the cardianality of my data is very high (lots of floating point values), and this was where bitmap indexes were suggested to me.

    Here is an articel about using bitmap indexes with scientific star chart data: http://sdm.lbl.gov/sdmcenter/pub/SD.john.0209-SDM-Progress.ppt

    What it's saying is that a basic bitmap index is for low cardinality data, but by using their own compression routines they were able to use it for high cardinality fields.

    Apparently Oracle uses BBC compression to allow bitmap indexes on high cardinality stuff.

    So, still confused.

    quote:


    I am not sure if creating a bit-mapped index on names is recommended. Perhaps you mentioned it only as an example and do not intend to use it that way in practice. (Or you might have very few distinct names in a relatively huge table).


    No, I'm not either. It was just an example of how they work.

    Keith Henry

    DBA/Developer/BI Manager




    Keith Henry



    According to everyone I know I "do something with computers?" for a living, so there you go.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply