Performance Tuning - Analysis Server Options

  • I have a question about the performance impact of two of the Analysis Services server options: 'Read Ahead Buffer' and 'Process Buffer Size'.

    Problem:

    I have a data mart that we rebuild every night; the rebuild takes about 3 hours to complete, and it then takes us over 6 hours to reprocess the cubes. We use Informatica as our ETL tool. At this point in time, we are unable to perform an incremental update on our data mart schema. We hope to switch to incremental loading by Q2 of 2003, depending on schema changes to the source system by the vendor. Until then, I need to improve processing time as best I can.

    I have read articles that mention a performance gain can be had by increasing the Read Ahead Buffer size (default 4 MB) and the Process Buffer Size (default 32 MB). Our Analysis Server is a Pentium 4 Xeon 2.4 GHz with 2 CPUs, 4 GB of RAM, and 250+ GB of available drive space.

    Question:

    I would like to know if anyone has experience with changing either of these values and what the performance impact was: good, bad, or none.

    I would also like to know if anyone knows of any good performance tuning books / resources for Analysis Services.

    Thanks

  • We have basically the same process. We fully process our cubes every night after doing a daily ETL process. We have held off on incremental updates for various reasons. Partitioning the cubes is not possible since we are not on the Enterprise edition.

    I have played with the AS settings (and SQL settings) extensively in order to tweak performance. I've found that the two settings, 'read ahead buffer' and 'process buffer size', don't make any (or much) difference in processing times. Actually, the read ahead buffer is used to tune users' queries against the cubes, so it's not even a factor here. I think the bulk of the optimizing work needs to be spent on the SQL side (the queries that return your fact and dimension data). Add indexes and clustered indexes where appropriate to optimize those queries. Also, ensure that the database is defragmented and that statistics are kept up to date.
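
    For example, something along these lines is the usual starting point on the SQL side (the table and column names here are made up for illustration; adjust them to your own fact and dimension schema):

        -- Hypothetical fact table and column names, for illustration only.
        -- Cluster the fact table on the column the processing queries filter or
        -- order on most, and index the dimension-key columns used in the joins.
        CREATE CLUSTERED INDEX IX_FactSales_DateKey
            ON dbo.FactSales (DateKey)

        CREATE NONCLUSTERED INDEX IX_FactSales_CustomerKey
            ON dbo.FactSales (CustomerKey)

        -- After the nightly load, rebuild fragmented indexes and refresh statistics.
        DBCC DBREINDEX ('dbo.FactSales')
        UPDATE STATISTICS dbo.FactSales WITH FULLSCAN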

    As for the server memory settings, I would suggest allocating a fixed memory size for SQL Server rather than relying on dynamic memory allocation. Since you have 4 GB, I'd suggest (if you are not already) changing SQL Server to a fixed 2 GB of memory (or more if you have Enterprise Edition). In AS, set the minimum allocated memory to 1 GB or so (or more if you have nothing else running), and set the read ahead buffer and process buffer size to something small and reasonable, say 16-64 MB each.
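
    If you'd rather script the SQL Server side than use the Memory tab in Enterprise Manager, pinning min and max server memory to the same value gives you the fixed allocation. A minimal sketch, assuming the 2 GB figure above:

        -- Values are in MB; 'show advanced options' must be enabled first.
        EXEC sp_configure 'show advanced options', 1
        RECONFIGURE
        EXEC sp_configure 'min server memory (MB)', 2048
        EXEC sp_configure 'max server memory (MB)', 2048
        RECONFIGURE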

    FYI, our server is a 4-CPU P3-550 Xeon with 2 GB of RAM. The memory settings are:

    SQL Server -

    Memory tab:

    Fixed memory size: 1280 MB

    Reserve physical memory for SQL Server: YES

    Processor tab:

    Boost SQL Server priority on Windows: NO

    Use Windows NT fibers: NO

    Analysis Services -

    Environment tab:

    Minimum allocated memory: 384 MB

    Memory conservation threshold: 640 MB

    Processing tab:

    Read ahead buffer size: 4 MB

    Process buffer size: 64 MB

    I've played with these settings up and down, left and right, and the values above are as good as anything I have tried.

  • Oops, meant to also mention this... I have not found any good books on AS and performance tuning. I have the Microsoft OLAP Solutions book (Wiley), which has a whopping two-thirds of a page dedicated to these server settings. I've found the most useful information on this website:

    http://www.sql-server-performance.com/olap_performance.asp

    Hope that helps...

  • Thanks for the input!

    I tried changing the Process Buffer Size setting and it had no impact on processing time. I will see what tuning we can do on the database side to help improve performance.

  • I've had basically the same experience as jayharper, and agree with him. In my case, though, the processes were I/O bound, and working with the filegroups on my server produced dramatic results. Just a suggestion of something else to look into.
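
    For what it's worth, the kind of thing I mean looks roughly like this (the database name, file path, and index names are hypothetical; the point is to spread the fact table and its indexes across files on separate physical drives):

        -- Add a filegroup backed by a file on a separate spindle.
        ALTER DATABASE DataMart ADD FILEGROUP FactData
        ALTER DATABASE DataMart
            ADD FILE (NAME = FactData1, FILENAME = 'E:\MSSQL\Data\FactData1.ndf', SIZE = 2GB)
            TO FILEGROUP FactData

        -- Place a heavily used index on the new filegroup.
        CREATE NONCLUSTERED INDEX IX_FactSales_StoreKey
            ON dbo.FactSales (StoreKey) ON FactData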

  • Have you looked at the 'Optimize Schema' function (Tools menu, Cube Editor)?

    Though this function assumes some design considerations in the supporting schema, it can have a huge impact on processing performance.

    Essentially, this design presumes that each fact row contains the actual member key for every dimension in the cube, i.e., the customer_ID column in the fact table is the same as the leaf-level member key in the Customer dimension.

    If this is true for all dimensions then all joins to dimension tables can be eliminated during cube processing.

    In a large dataset with several dimensions in play, this can be a significant savings on processing time.
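
    To make the effect concrete, here is roughly the shape of the partition-processing SQL before and after (hypothetical fact and dimension table names; the actual statement AS generates will differ):

        -- Before Optimize Schema: the dimension table is joined just to pick up
        -- the leaf member key.
        SELECT d.customer_ID, f.sales_amount
        FROM fact_sales f, dim_customer d
        WHERE f.customer_ID = d.customer_ID

        -- After Optimize Schema: the leaf member key is read straight from the
        -- fact table, so the dimension join disappears.
        SELECT f.customer_ID, f.sales_amount
        FROM fact_sales f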

    Finally, it's often the case that most processing time is spent in aggregations, not loading fact data. Make sure your aggregation level is reasonable (something like 25% is a good start, though very large cubes with many dimensions may choke on a value this high).

    Hope this helps.

    Scot J Reagin

    sreagin@aspirity.com

  • Once again, I thank everyone for their suggestions.

    Scorpion's post made me think of something obvious: CPU-bound vs. I/O-bound.

    I decided to connect at night and check the server, and noticed that Analysis Server is only using about 50% of the CPU. It looks like there may be a problem on the database server side. I don't know if I mentioned it, but I am executing the reprocess via a DTS / SQL Agent job, so I need to verify that the SQL Server Agent properties are set for multi-tasking properly. I am not explicitly processing each cube in a separate reprocess task; I have selected the Cubes level and am reprocessing all of them that way.

    I am testing a data refresh on our development server. For this test, I have two separate packages: one that refreshes all cubes in a single task, and another that refreshes each cube in its own task, with no precedence constraints between them.

    My hope is that the data refresh with multiple tasks will ultimately provide a performance increase due to multiple cubes being processed simultaneously.

    Has anyone else tried something like this or is this another one of those "sounded great on paper but ..." ideas?

    Samiu2 - I would recommend creating a DTS package that processes all your cubes sequentially, one after the other. We had a similar situation when I started - our cubes were all being processed simultaneously. The first thing I did was redo the package to process them sequentially, and we got about a 40% boost in throughput. It probably depends a lot on the size of your cubes and the processing power (CPU, memory) of your server. We have a fairly low-end server and are processing rather large cubes.

    shotofirish - I have a question for you about Optimize Schema. When I run it, it comes back with:

    "The following dimensions cannot be optimized: (list of dimensions).

    Private and virtual dimensions cannot be optimized."

    However, the dimensions in the list are SHARED dimensions. Was this just an oversight in the error message? Also, once you have done this, what next? Is the cube now optimized?

    Thanks!

  • Hello jayharper -

    It's true that Private and Virtual dimensions cannot be optimized. Note also that Virtual dimensions are always Shared.

    However, if your dimensions are shared and not virtual then the message is misleading for your situation (it's generic text).

    Typically, a shared dimension cannot be optimized if any of the following is true:

    1. Member keys at the leaf level are not unique (check the level properties)

    2. Member keys are derived using SQL manipulation in the Member Key Column property of the leaf level

    3. Member keys are not the values used in the PK/FK relationship to the fact table

    If none of these apply, optimizing the schema changes the leaf-level reference in the dimension so that it points at the fact table (rather than the dimension table). Note that this change is made in the cube's reference (in the Cube Editor) only - not in the shared dimension itself.
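
    For condition 1, a quick way to verify leaf-level key uniqueness on the SQL side (hypothetical dimension table and column names):

        -- Any rows returned mean the leaf-level member keys are not unique.
        SELECT customer_ID, COUNT(*) AS dupes
        FROM dim_customer
        GROUP BY customer_ID
        HAVING COUNT(*) > 1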

    To see the effect, optimize the schema for the Sales cube in the Foodmart database. Look at the Member Key Column property of the leaf level (Store Name) of the Store Dimension. It now references a column in the fact table.

    Process this cube and view the SQL statement used to process the Sales partition. Note that the join to the Store dimension table is no longer in the WHERE clause; the join is no longer necessary.

    A pretty long-winded explanation. Let me know if this doesn't make sense.

    Scot J Reagin

    sreagin@aspirity.com
