Fiscal Year Partition with Fact tables

Question

Post reply

Fiscal Year Partition with Fact tables

BigB

SSC Eights!

Points: 828
More actions
September 3, 2018 at 5:17 am

#362288

Hi,

I want to implement fact table to hold current and last 2 fiscal year data and thought to implement partition function and scheme for the same. But I am not understanding what will happen with partitions and fact data when fiscal year changes?
Also I want to pull current and last 2 fiscal year data in first load but in subsequent executions I just want to pull current fiscal year data as previous fiscal year data is not going to change.

Please let me know the overall approach or any helpful link to implement this.

Regards,
B

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Sergiy SSC Guru Points: 110192 More actions · Answer 1

BigB - Monday, September 3, 2018 5:17 AM

Hi,

I want to implement fact table to hold current and last 2 fiscal year data and thought to implement partition function and scheme for the same. But I am not understanding what will happen with partitions and fact data when fiscal year changes?
Also I want to pull current and last 2 fiscal year data in first load but in subsequent executions I just want to pull current fiscal year data as previous fiscal year data is not going to change.

Please let me know the overall approach or any helpful link to implement this.

Regards,
B

Create a clustered index having TransactionDate as the 1st column and forget about partitioning.

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 2

It sounds to me that you don't even know if partitioning is the right thing to do never mind how to do it.

BigB - Monday, September 3, 2018 5:17 AM

Hi,

I want to implement fact table to hold current and last 2 fiscal year data and thought to implement partition function and scheme for the same. But I am not understanding what will happen with partitions and fact data when fiscal year changes?
Also I want to pull current and last 2 fiscal year data in first load but in subsequent executions I just want to pull current fiscal year data as previous fiscal year data is not going to change.

Please let me know the overall approach or any helpful link to implement this.

Regards,
B

I agree with Sergiy. Unless you have a compelling reason to partition, don't. Partitioning is usually not a performance tool and frequently causes slower performance.

So, to ask the question directly, why do you think you want to partition your table? What are the compelling reasons? I ask because THAT will be the necessary information as to what the overall approach should be, which may also be to not partition.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

BigB SSC Eights! Points: 828 More actions · Answer 3

Sergiy - Monday, September 3, 2018 6:23 AM

BigB - Monday, September 3, 2018 5:17 AM

Hi,

I want to implement fact table to hold current and last 2 fiscal year data and thought to implement partition function and scheme for the same. But I am not understanding what will happen with partitions and fact data when fiscal year changes?
Also I want to pull current and last 2 fiscal year data in first load but in subsequent executions I just want to pull current fiscal year data as previous fiscal year data is not going to change.

Please let me know the overall approach or any helpful link to implement this.

Regards,
B

Create a clustered index having TransactionDate as the 1st column and forget about partitioning.

Hi Sergiy,

Thanks for your reply. Let me tell you I am expecting close to 15 millions records for each fiscal year. If partitions are not in place then updates on the entire table will take time while for inserts I need to drop and create indexes every time. Having partitions, I can only perform my inserts/updates on current fiscal year fact partition. Please help me in implementing this.

BigB SSC Eights! Points: 828 More actions · Answer 4

Jeff Moden - Monday, September 3, 2018 8:23 PM

It sounds to me that you don't even know if partitioning is the right thing to do never mind how to do it.

BigB - Monday, September 3, 2018 5:17 AM

Hi,

I want to implement fact table to hold current and last 2 fiscal year data and thought to implement partition function and scheme for the same. But I am not understanding what will happen with partitions and fact data when fiscal year changes?
Also I want to pull current and last 2 fiscal year data in first load but in subsequent executions I just want to pull current fiscal year data as previous fiscal year data is not going to change.

Please let me know the overall approach or any helpful link to implement this.

Regards,
B

I agree with Sergiy. Unless you have a compelling reason to partition, don't. Partitioning is usually not a performance tool and frequently causes slower performance.

So, to ask the question directly, why do you think you want to partition your table? What are the compelling reasons? I ask because THAT will be the necessary information as to what the overall approach should be, which may also be to not partition.

Hi Jeff,

Thanks for your reply. I would like you to read my reply to Sergiy's post.

pietlinden SSC Guru Points: 63198 More actions · Answer 5

BigB - Monday, September 3, 2018 5:17 AM

Hi,

I want to implement fact table to hold current and last 2 fiscal year data and thought to implement partition function and scheme for the same. But I am not understanding what will happen with partitions and fact data when fiscal year changes?
Also I want to pull current and last 2 fiscal year data in first load but in subsequent executions I just want to pull current fiscal year data as previous fiscal year data is not going to change.

Please let me know the overall approach or any helpful link to implement this.

Regards,
B

Fact tables? Are you "mixing metaphors" or is this a data warehouse/star schema?
Before you spend all the time partitioning your tables, you should read Gail Shaw's article on it: Don't do something that won't help solve your problem.
https://www.red-gate.com/simple-talk/sql/database-administration/gail-shaws-sql-server-howlers/

Sergiy SSC Guru Points: 110192 More actions · Answer 6

BigB - Monday, September 3, 2018 10:48 PM

Sergiy - Monday, September 3, 2018 6:23 AM

BigB - Monday, September 3, 2018 5:17 AM

Hi,

I want to implement fact table to hold current and last 2 fiscal year data and thought to implement partition function and scheme for the same. But I am not understanding what will happen with partitions and fact data when fiscal year changes?
Also I want to pull current and last 2 fiscal year data in first load but in subsequent executions I just want to pull current fiscal year data as previous fiscal year data is not going to change.

Please let me know the overall approach or any helpful link to implement this.

Regards,
B

Create a clustered index having TransactionDate as the 1st column and forget about partitioning.

Hi Sergiy,

Thanks for your reply. Let me tell you I am expecting close to 15 millions records for each fiscal year. If partitions are not in place then updates on the entire table will take time while for inserts I need to drop and create indexes every time. Having partitions, I can only perform my inserts/updates on current fiscal year fact partition. Please help me in implementing this.

It's only true if you have a clustered index on an IDENTITY PK.
I had a table which received 3 millions of new records per day.
Guys like you created partitions for every week.
Because they could not think of any other way.
The only reason I appeared there was - try to guess - performance issues. Well, not issues, problems.
4 pretty powerful servers could not cope with data processing together with numerous replications anymore.

Over a week I created a new table with an appropriate clustered index and copied all 440 million records there.

Suddenly - you won't believe it - all the problems were gone.
Reports started taking 15 minutes instead of 2 hours, new data uploads did not interrupt other processes anymore, not to mention - the work of BA became much simpler, as she could write queries in minutes, instead of juggling those partitioned views trying to figure out which ones to use for any particular report.

You should try that magic pill yourself one day. 🙂

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 7

BigB - Monday, September 3, 2018 10:48 PM

Sergiy - Monday, September 3, 2018 6:23 AM

BigB - Monday, September 3, 2018 5:17 AM

Hi,

I want to implement fact table to hold current and last 2 fiscal year data and thought to implement partition function and scheme for the same. But I am not understanding what will happen with partitions and fact data when fiscal year changes?
Also I want to pull current and last 2 fiscal year data in first load but in subsequent executions I just want to pull current fiscal year data as previous fiscal year data is not going to change.

Please let me know the overall approach or any helpful link to implement this.

Regards,
B

Create a clustered index having TransactionDate as the 1st column and forget about partitioning.

Hi Sergiy,

Thanks for your reply. Let me tell you I am expecting close to 15 millions records for each fiscal year. If partitions are not in place then updates on the entire table will take time while for inserts I need to drop and create indexes every time. Having partitions, I can only perform my inserts/updates on current fiscal year fact partition. Please help me in implementing this.

That helps a whole lot. Any chance of you posting the CREATE TABLE statement for the table to be partitioned along with any indexes, triggers, and constraints that may be on the table so that we can provide some suggestions and also identify any caveats that may cause some serious problems down the road?

Also, is the older data ever updated or does it become totally static? If the latter, then partitioning can really help keep backups and maintenance time down to a minimum as it did with some of my larger tables.

And, yes, I still agree with what Sergiy has posted about performance issues but doing things in a similar fashion to partitioned tables or views will lessen some of the very problems he's talking about. I say lessen because queries played against partitioned tables are almost always slower.

They can be incredible for maintenance, as I've previously stated, which seems to be a part of your goal.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

ScottPletcher SSC Guru Points: 100262 More actions · Answer 8

What often happens with partitioning is that, as part of the partitioning process, the table clustering is changed to be on date rather than some other column, such as identity. The performance gets much better, and people assume it is the partitioning that did it, when it reality it was the re-clustering.

SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 9

ScottPletcher - Tuesday, September 4, 2018 1:28 PM

What often happens with partitioning is that, as part of the partitioning process, the table clustering is changed to be on date rather than some other column, such as identity. The performance gets much better, and people assume it is the partitioning that did it, when it reality it was the re-clustering.

The only time I've ever seen performance get better for such a thing is because they had truly crap code and partition elimination simply made their scans be smaller. Fixing the code to run on the original monolithic table blew the supposed performance gains on the partitioned table away. The monolithic table also cleanly beat the portioned table with the properly written code and the help of a non-clustered index. Dropping that non-clustered index and substituting the clustered index as you say slowed things down because the query had to deal with wider pages meaning more pages because there were fewer rows per page.

I DO agree that if you need to return most of the columns of a wide leaf level most of the time then, yes, clustering as you say is the way to go. If you're not going to return most of the column most of the time, it's not.

As with all else in SQL Server, "It Depends".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 10

Didn't get home very long ago. Really full days lately but figured I tell you what I've done for my largest portioned table, which consists of a little bit of "normal" data and a LOB column, which contains a compressed type of WAV file for telephone call recordings.

The first realization that I had is that this is a WORM table (Write Once, Read Many). Once a row appears, it never changes.

The second realization I had is a simple "laundry fact". Ask yourself... when you do the laundry, which clothes do you wash. The clean clothes, the dirty clothes, or all the clothes. Hopefully, it's just the clean clothes. 😛 The same thing holds true for data... what really needs to be backed up more than once? Only the "dirty" data which, in this case, is only new data.

The third realization I had was that I was inserting multiple rows at once but only reading one row at a time. That meant I didn't have to concern myself to much with reading batches of data. I just need to ensure that the ID of the call was unique and that was easy because the Calls table (which was the header table for the recordings table, like the proverbial Invoice/Invoice detail setup) had a Clustered PK on the CallID. That CallID was duplicated in each row of the Recordings table. The trouble is, an IDENTITY column doesn't provide a temporal boundary so I had to modify the Recordings table and the code that populated it to contain the exact same date/time as what was in the header. The bummer that is that it made an already large table a little bit larger but it was totally necessary. Keep in mind that the clustering column is added to every unique index, which means it's no longer usable as a single column FK unless you don't mind having non-aligned indexes, which is pretty stupid because they will prevent the use of Switch, which is key to being able to pack all the free space out of the partition unless you don't mind dropping the non-aligned index and maintaining the damned thing because it's not going to be Read_Only.

My plan was to partition by month. In order to ensure that I could remove any free space from the current month when the clock said it's now the previous month, I created 1 file group for each month and created one file for each file group. I also create one extra filegroup/file for "next month" so the transition from the current month to the next month would be done auto-magically.

When the current month becomes the previous month according to the clock, I wait about 4 hours to ensure that the call transfer system (the thing that extracts calls and call recordings from the phone system to the database) isn't still processing latent or lengthy calls. Then, I run the system through the following steps.

1. I remove any free space from the now previous month partition. That's not as easy as it sounds because rebuilding any index over 128 extents (which is only 8 MB and I have call recordings longer than that) will preserve the original index until the new index is built and committed and then the old index is erased leaving a shedload of free space in that partition. So you have to do a song and dance to switch out the partition, rebuild the index to a new partition with the proper growth pattern, and switch it back in. You can't do that in the in the same partition for obvious reasons.
2. Then, I set the now previous and "well packed" partition to Read_Only. That actually requires an outage for the table (about 8 seconds on the current fairly old box) because it's a 2005 box (still) and no one can be in the table to set even 1 partition to Read_Only.
3. Once I've packed and made the partition Read_Only, I back it up one final time.
4. All this time, the new current month partition (which used to be the empty next month partition) is taking on new rows every minute. In order for that to happen as cleanly next month, I build the new next month partition and I'm done for the month.

If you have something like an Invoice/Invoice Detail system where the detail rows are updated once each line item is shipped, you may want to put of the immediate closure of the new previous month for a month or two to be sure but the same principle applies each month. The oldest open partition should be squeezed and then made Read_Only and a new next month partition should be made.

Of course, if you expect all of the partitions to always be active, that's a whole 'nuther story. But, for my monster "WORM" table, it made it so I don't have any regular index maintenance (I only rebuild indexes on a partition when it becomes the previous month) and instead of backups taking more than half a day (I have 9 years of recordings), I only have to backup the current month, which starts out empty. Near the end of the month, the backup still only takes about 6 or 7 minutes.

It's all been working auto-magically for more than 3 years now and you'd think I'd be elated. I'm not. I'm pissed of at myself for drinking the purple Koolaid that Microsoft served up with all it's stupid recommendations about how and why it recommends Partitioned Tables over Partitioned Views.

Partitioned Tables are the berries if you want to maintain an IDENTITY column. I don't need to do that and I find most people don't need to on WORM and near-Worm tables (like and Invoice Detail table, for example). One of the drawbacks about Paritioned Views is that it's somewhat difficult and a bit tricky to include an IDENTITY column. But that's that's really the only logical difference. The big physical difference is that you CANNOT restore just some of the partitions of a Partitioned Table to a lower environment. It won't let your RECOVER the table unless all partitions are restored and it won't let you change the Parition Function or Scheme to drop any missing partitions because they're missing.

With a partitioned view (say, 1 month per database), you can restore any bloody month you want and then just change the view (REALLY REALLY important if you want to use just a couple of months in your Dev or Test environments) It simulates SWITCH just by changing the view without all the mumbo-jumbo you have to deal with in Partition Functions and Partition Schemes. And, yes... you can make Partitioned Views updateable without too much effort. You can only have IIRC 253 partitions in a Partitioned View (a limit by SELECT UNION ALL rather than anything else) but consider that 253 months is more than 21 years. If you're still around then, you can combine a couple of the years worth of monthly databases into yearly databases, rebuilt the view (takes less than 63 ms in most cases) and Bob's your uncle! 😀

And in the process, all code that reads a row from the Recording table slowed down 100%. While it's all measured in milli-seconds, just remember that if you're going to be reading more than 1 row at a time. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions