Questions regarding database schema design that may get really funky

  • Hey Guys --

    Hopefully this doesn't fall on deaf ears (or blind eyes rather) 🙂 We've just started hashing out some requirements for a new accounting application that will be based on either Investor or Pool (Pool being a collection of Investors) and I'd like to bounce a suggestion to the group to see if you guys have a better solution or possibly ways to improve my angle.

    This is still concept so I have no code per say, but the basic layout of tables/schemas will be as such:

    Investor Table - containing all Investor information

    Pool Table - containing all Pool Information

    XRefInvestorPool - Contains which Investors link to which Pools

    Cash Flows - All financials coming in and out

    Calculations - NPV, IRR, etc

    Comments - Comments entered by users

    Contact Information - Contact info for whatever

    etc...

    The catcher is the system can be rolled out as either everything summing up to the Investor level or to the Pool level, so in one instance of the application everything may be keyed on Investor so all Cash Flows, financial calculations, etc are summed to the Investor, but in anther instance of the application it could all roll-up to the Pool instead.

    The idea I'm running with now is to use GUID's as keys for each table, and for example if it's Investor Based then the GUID in the CashFlow, Calculations, etc tables would be from the Pool table, and likewise for Investor-based systems.

    Also comments could be entered based on any of these (Investor, Pool, cash flow, etc) so having a LinkGUID field in the Comment table could then tie to any table that's keyed with a GUID.

    I know the pro's and con's of using a GUID since it's 16 bytes per record and doesn't index well, but it seems to be the best logical fit so the same database can be used in either an Investor or Pool based system.

    The biggest hang-up I'm running into is doing this won't allow me to really have any primary/foreign key constraints. I like doing database diagramming so the developers have a graphical representation of the database, but without the key constraints each time the database changes I'll have to manually update any graphical schema I generate.

    So firstly how sound is the GUID idea and not using key constraints? I'll code the procedures in such away that there shouldn't be any problems, but I still like that comfort of knowing the key constraints are keeping everything in line.

    And if this isn't a 'best practice' way of doing it, what suggestions can you guys offer to make this work?

    Thanks for any suggestions or pointers... Take care,

    Sam

  • Why not just make it so that all cash flows are at the pool level, and create pools with a single investor where the cash flow has to be at the investor level.

  • I don't completely understand what's going on, so please excuse my ignorance.

    First, can you explain in further detail why you can't have constraints?

    I'm not a huge fan of GUIDs, but they're not the end of the world. The question I have is, do you have a natural key in addition to the GUID? If not, on top of not having referential constraints, you're looking at the possibility of a lot of data duplication.

    In general, again, if my understanding of what you've outlined is correct, having a common table between lots of disparate tables that are otherwise unrelated frequently leads to this type of compromise on design. Because of it, I'd put in more than one comments table and then plan on grouping that data together through other means, as necessary. That way you don't have to shoe-horn the data in.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • I vote for Michael's solution.

    No reason not to handle it at "pool" level - I would probably rename it as "investor entity" so to differentiate from "investor individual".

    Most probably I'll design in a way that allows for a single "investor individual" to be part of 1-n "investor entity"

    This way you can easily get granular-details/positions/aggregations/reports at either "investor entity" and investor individual" levels

    Hope this helps.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Hey Guys --

    Thanks for the great advice, but the problem we're running into is we don't know if a client will want cash flows and other tables that roll up to be at the Pool Level or Investor Level. If I roll everything up to the Pool by default (which a Pool is just a collection of Investors) and a client asks to use the system based on Investors instead that data won't be easily available since I've rolled everything up to the Pool.

    My first thought was to roll everything to the Investor then roll these up to the Pool, but that creates an extra layer that could lead to latency since we could have literally thousands of Pools/Investors and millions of cash flows.

    Also aside from this we have tables like Comments, Address, etc that could be used by any other entity within the database (Pool, Investor, Guarantor, Borrower, Loan, etc) so what's the best scenario for creating key constraints in this situation? In OOP terms it's like the Comments table, Address table, etc are objects and I need to instantiate a row within each for any other given table. My thought of using GUID's was these global tables like Comments and Address would have a LinkGUID that could contain the GUID of any other row in any other table, but again doing this won't allow key constraints.

    I'll create some code showing exactly what I'm trying to do and post it in a bit, but it's like I have a theory on how this will work (and even have a small concept DB written), yet it won't let me easily do key constraints since most tables can link to most any other table through the GUID.

    Take care --

    Sam

  • samalex (10/6/2010)


    Hey Guys --

    Thanks for the great advice, but the problem we're running into is we don't know if a client will want cash flows and other tables that roll up to be at the Pool Level or Investor Level. If I roll everything up to the Pool by default (which a Pool is just a collection of Investors) and a client asks to use the system based on Investors instead that data won't be easily available since I've rolled everything up to the Pool.

    My first thought was to roll everything to the Investor then roll these up to the Pool, but that creates an extra layer that could lead to latency since we could have literally thousands of Pools/Investors and millions of cash flows.

    Also aside from this we have tables like Comments, Address, etc that could be used by any other entity within the database (Pool, Investor, Guarantor, Borrower, Loan, etc) so what's the best scenario for creating key constraints in this situation? In OOP terms it's like the Comments table, Address table, etc are objects and I need to instantiate a row within each for any other given table. My thought of using GUID's was these global tables like Comments and Address would have a LinkGUID that could contain the GUID of any other row in any other table, but again doing this won't allow key constraints.

    I'll create some code showing exactly what I'm trying to do and post it in a bit, but it's like I have a theory on how this will work (and even have a small concept DB written), yet it won't let me easily do key constraints since most tables can link to most any other table through the GUID.

    Take care --

    Sam

    The way to go with something like Address is to create the one address table and then multiple mapping tables. That way, if you have two different objects, they can share an address, without having to sweat what kind of object is using the address. But, you'll really want to have referential integrity in place, otherwise, someone can delete the address in use by multiple other entities and then you'll be in a world of pain.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • samalex (10/6/2010)


    Hey Guys --

    Thanks for the great advice, but the problem we're running into is we don't know if a client will want cash flows and other tables that roll up to be at the Pool Level or Investor Level. If I roll everything up to the Pool by default (which a Pool is just a collection of Investors) and a client asks to use the system based on Investors instead that data won't be easily available since I've rolled everything up to the Pool.

    My first thought was to roll everything to the Investor then roll these up to the Pool, but that creates an extra layer that could lead to latency since we could have literally thousands of Pools/Investors and millions of cash flows.

    I dunno... I think PaulB is more on the right track. If you are by default always "rolling up" to the pool level, then you will use a consistent "join path" to get the investor/cashflow relationship regardless of whether its at the pool level or investor level, because with both methods, an investor will have an associated pool even if he is the only guy in the pool. This seems to make the most logical sense.

  • Grant Fritchey (10/6/2010)


    The way to go with something like Address is to create the one address table and then multiple mapping tables. That way, if you have two different objects, they can share an address, without having to sweat what kind of object is using the address. But, you'll really want to have referential integrity in place, otherwise, someone can delete the address in use by multiple other entities and then you'll be in a world of pain.

    Is something like this what you're talking about doing where [Address] contains all the addresses then [AddressXRef] contains all links between the Address table and any other tables via LinkGUID?

    CREATE TABLE [Address] (

    [AddressID] Numeric(18,0) IDENTITY PRIMARY Key,

    [City] NVARCHAR(100) NOT NULL,

    [State] NVARCHAR(100) NOT NULL,

    [PostalCode] NVARCHAR(100) NOT NULL)

    CREATE TABLE [AddressXRef] (

    [LinkGUID] UNIQUEIDENTIFIER NOT NULL,

    [AddressID] Numeric(18,0) NOT NULL)

    ALTER TABLE [AddressXRef] ADD CONSTRAINT FK_AddressXRef

    FOREIGN KEY ([AddressID]) REFERENCES [Address] ([AddressID])

    If so what's different between this and having the LinkGUID in [Address]? Either way it doesn't seem possible to create a key constraint between LinkGUID and the various other tables it could pull from. I could write a custom constraint to check the GUID against other valid tables, but as the database evolves we'll have to update the constraint as new tables are created which could be a management nightmare.

    As for the scenario of having Investor and Pool tables where one is priority below is a quick sample of how I'm envisioning setting it up, though it's pretty basic:

    CREATE TABLE [Pool] (

    PoolGUID UniqueIdentifier DEFAULT NEWSEQUENTIALID() PRIMARY Key,

    [Name] NVARCHAR(50) NOT NULL DEFAULT(''))

    CREATE TABLE [Investor] (

    InvestorGUID UniqueIdentifier DEFAULT NEWSEQUENTIALID() PRIMARY Key,

    [Name] NVARCHAR(50) NOT NULL DEFAULT(''),

    PoolGUID UniqueIdentifier NOT NULL)

    ALTER TABLE [Investor] ADD CONSTRAINT FK_Investor

    FOREIGN KEY (PoolGUID) REFERENCES [Pool] (PoolGUID)

    CREATE TABLE [CashFlow] (

    -- LinkGUID could link to either [Investor].InvestorGUID or [POOL].PoolGUID

    -- depending on system configuration

    LinkGUID UniqueIdentifier NOT NULL,

    -- Cash Flows are by month so ActiveDate is first day of the month where these amounts apply

    ActiveDate DATETIME NOT NULL,

    Income NUMERIC(18,2) NOT NULL,

    Expense NUMERIC(18,2) NOT NULL)

    ALTER TABLE [CashFlow] ADD CONSTRAINT [PK_CashFlow]

    PRIMARY KEY CLUSTERED (LinkGUID, ActiveDate ASC)

    CREATE TABLE [Comments] (

    CommentID Numeric(18,0) IDENTITY PRIMARY Key,

    -- LinkGUID could link to any other table with a valid GUID

    LinkGUID UniqueIdentifier NOT NULL,

    CommentText NVARCHAR(MAX) NOT NULL,

    CommentDate DATETIME NOT NULL DEFAULT(GETDATE()),

    EnteredBy NVARCHAR(50) NOT NULL)

    The Comments table is similar to the Address scenario we talked about up stream in this post, but I have the LinkGUID within the table as opposed to having a separate cross reference table.

    Thanks for any advise on a better way to do this --

    Sam

  • Yes, the interim table approach will allow for data integrity. But I wouldn't try to do the universal interim table. Despite the fact that it adds to the number of tables in the system, I'd go with one per entity. Yeah, that means as you add stuff to the system you'll have to build out the constraints, etc., but the question is, do you want data integrity or not? If you do, you use the mechanisms provided.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • Grant Fritchey (10/6/2010)


    Yes, the interim table approach will allow for data integrity. But I wouldn't try to do the universal interim table. Despite the fact that it adds to the number of tables in the system, I'd go with one per entity. Yeah, that means as you add stuff to the system you'll have to build out the constraints, etc., but the question is, do you want data integrity or not? If you do, you use the mechanisms provided.

    Fair enough... I've started building triggers to check for inserted records and verify the GUID's match the appropriate tables. If not it rolls back the insert and logs it to an Error table plus presents the user with a custom error stating what the problem is. Seems to work rather well in testing so this should do the trick.

    Thanks again --

    Sam

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply