Help with a first attempt at Analysis Services

  • Let me start this post with a quick apology if this seems too basic. I have been tasked with creating a cube for some data and having worked with OLTP databases for quite a while am having some difficulty with grasping OLAP design. I need to be able to create some fairly complex reports from what amounts to a single flat file from a non-relational database. Currently we suck the data into a table within SQL server 2000. The application will give summary data at multiple levels for students at our school district. This data is for one test.

    Here are the fields:

    1. StudentID - This relates to a student table

    2. Building - Building the child was tested at

    3. MathProf - a number 1,2,3,4 meaning not proficient, below prof,prof and advanced

    4. ReadProf - same as Math Prof for scale

    5. WritProf - same as above

    6. Grade - the grade the student is in

    7,EthnicCode -

    8.Gifted

    There is a table of teachers with what studIDs are in their classes.

    I need to sumarize data across buildings, for a whole building, for an individual teacher, by reading,writing or math. I also need to sumarize across all of the same levels by just 6th graders or by ethnicCode or by gifted. etc.

    Does the above make since? Any ideas on how to massage the data to fit this reporting and summary type?

    Thanks in advance,

    Mike

  • Hi Mike,

    I'm new to OLAP myself, but hopefully I can push you in the right direction.

    First, create an OLAP database in Analysis Manager, then click on it to expand it. Right click on "Data Sources" and setup a data source pointing to your SQL2k db.

    Each cube in an OLAP database is made up of one fact table, and many dimensions/measures etc...

    The fact table is the key part. Facts refer to "numerical facts", i.e something you can analyse. They contains numeric data only (not foreign keys etc..).

    When you create your cube, you're first asked to provide a fact table from your DBMS (SQL2K in your case). Select only the fields in this table that you want to perform numerical analysis on..

    e.g - you may want to display average MathProf, ReadProf and WritProf results - maybe even average grade for groups of students falling into the criterion you are about to define.

    Ignore the rest of the fields - they can't be analysed numerically.

    The next thing you do is create "dimensions"... these are ways you want to group/categorise the data... so, in your case, you'd want a Building dimension, Ethnic dimension, and possibly Gifted Dimension.

    When you create a dimension using the dimension wizard, it asks you the type of dimension you wish to create. A "Snowflake-Schema" is beyond what you need here. It covers joins between multiple tables in an RDBMS. Select "Star", and continue to the next screens. Again, you need to choose a table for your dimension, which will be the same single table in your sql2k db.

    Say you're creating your Building dimension... choose the building field as your dimension data and browse the data to make sure you get what you expect. I forget the next steps off the top of my head, but you want to make sure NOT to click "create hierarchy". Also, at this point it probably won't matter whether you choose to share the dimension or not. There are differences between shared dimensions and private ones, but nothing to concern yourself with yet.

    Anyway.. you'll get to the end of this wizard and have a dimension by which you can categorise/split your data. You'll want to go through and do this again for all of your required categorisations (dimensions).

    Once you've added all your dimensions, right clik on the cube in analysis manager, and select "process cube". Hopefully this will complete successfully, after which you can sright click the cube again and select "browse data". If all has gone well, you'll have sums of all the data in the fields.

    In your case you'll have to change a lot, if not all of the calculations to averages..

    One other very common dimension (and it even has it's own type) is a "Time Dimension". You'll probably want to be able to tell how many proficient, non proficient etc.. students there were in each grade in each year... That wasn't mentioned in your question, but I thought it worth pointing out. All you need is a datetime field in your sql server database... analysis services will even prompt you based on only the relevant fields for which one you want to use. In your case, you might even get away with just a smallint field, for graduation year, or something.. but that's beyond what I've done.

    Sorry if that seems a little brief, but there are books on this stuff, and I still need to read all of them 😉

    I'm sure people will post any corrections/amendments to what I've said, but hopefully it'll get you started.

    All the best

    Mark

  • Mike,

    You'll definitely want to add the Time dimension that Mark suggested, without this, any students that repeat a grade(incl exams) will skew results when looked at from a summary level. It will also allow for good time series analysis of whether teachers (or I guess students) are improving year on year (ie better average grades), or not, as the case may be.

    You'll want to get teacher in there also, I personally would have a Student dimension and maybe a Teacher dimension with a Teacher-Class-Student type hierarchy. Just be careful when using the same members (ie "things") within separate dimensions. Inexperienced users can end up with strange results e.g. I filter to SteveA in the student dimension, and all of my Teacher values look too low; or vice-versa I filter to a particular class and then SteveA's values are zero'd. In the first case filtering to a single student will mean that all teacher/class combinations where that student *doesn't* appear will go to zeros. In the second, filtering to a particular class will result in the values for all students *not* in that class to be zero's.

    You'll also probably want to create a Distinct Count for the students. Depending on your data set size, you may want to do this (read as "should do this" in most cases) in a separate cube (that has the same dimensions). There are performance issues (mainly related to aggregations) that are encountered when using Distinct Counts, so the prefered method is do them (singular) in a separate cube and then use Virtual cube(s) to view them with the other "standard" measures.

    Having said all of that, they key place to start is in your design of your DB, and ensuring you have a feed mechanism to populate this minimart that works in a reliable and consistent manner. Having a solid base definitely helps in designing any OLAP models off the top of the system.

    Once you have this, a great way to start building cubes is to use the wizard(s) that Mark mentioned, and then build on your knowledge from there.

    Hope it goes well. Make a start and if you run into issue/questions, shoot them thru to here, there's quite a few people viewing these threads that should be able to help you out.

    Steve.

    Steve.

  • Thanks a bunch, I am well on my way. I have a cube with everything I need but averages. I have sums. You mention switching the aggragate functions to averages, but that is not a choice. Do you have an idea how to do this? I see the calculated member options but have not found how to use them to just get a basic average or if I have to use them at all. Again, thanks for the time in helping me!

    Mike

  • Ok, I got the averages by creating both a SUM measure and a count measure for each score (reading, math,writing) I then created a calculated measure for each score in the format :

    [Measures].[Math_SUM]/[Measures].[Math_count]

    Does this make sence to do it this way or is there a better way?

    Thanks again,

    Mike

  • Mike -

    You don't need a count measure to generate an average - though the count measure will allow you to analyze how many tests were given.

    To generate an average measure within the cube use the Avg() MDX function. This Function will return the average value of a set so, Avg(Student.Members, Math_Sum) will give you an average math score for all students. If you slice on grade, you will have the average math score for students in the selected grade.

    Scot J Reagin

    sreagin@aspirity.com

    Scot J Reagin
    sreagin@hitachiconsulting.com

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply