Entity-Attribute-Value is evil ? how to avoid...

  • Jason A. Long - Friday, November 30, 2018 12:32 AM

    ...

    Is there a moral to the story?  Yes, I believe that I may have found, not just a legitimate use case for EAVs, but a case where they actually are the best possible solution. 
    Has this changed my opinion of EAVs? Maybe a little but not really... My problem with EAVs is based on the fact that the majority of them get used in places and in ways that simply aren't appropriate and are created by developers that think they've discovered a clever way to avoid the hassles of adding or modifying database objects...

    Edit: I think it should be noted that whatever method Amazon is actually using, it doesn't involve SQL Server or any other ACID compliant RDBMS. https://www.quora.com/What-database-does-Amazon-use-to-store-product-information#bwTeT

    Again, it is a tool.  Used appropriately it is good.  Used inappropriately it is evil.  Just like any tool.

  • carlo.pagliei 63874 - Friday, November 30, 2018 12:57 AM

    Steve Jones - SSC Editor - Thursday, November 29, 2018 9:04 AM

    @Carlo P - I think my handling tends to be more consistent where if you have translations, you're a) putting one language ahead of others and b) means I need to handle translations differently than one language. I think both of those incur some technical debt, but at the expense of some (perhaps) more initial development.

    Mileage may vary. I think I'd be interested in what you choose, and really be interested in an article on how/why you did this if you're up for it.

    Steve
    Consider the following scenario: you operate in a video production house that sells series to different countries...a european customer buys one of your productions for broadcasting in 4 countries and requires you to package the files for portugal, italy, spain and france with proper subtitles, paybill and abstract, all localized in their respective languages. When configuring the package builder you say "use the X metadata in the Y language and put it in a file called abstract.txt". If I use your way I have to store language and "metadata name" and use them both in a where filter, in the other way I store two IDs and need a join with the lang_id... at first look seems to me more "normalized" and structured to bind the configuration to a unique ID because the ID is a precise reference to a metadata with a defined meaning... "metadata name" is something weaker, left to the user and may change over time as I need also to translate both the name and the value of the metadata (the name for the user ui, the value for both user ui and system operations).
    What do you think ?

    I think you always send in (and store) lang_id. You can have a default here, but you always do that, so that there isn't an exception if you have more localization occurring.

  • I once had a private hobby project where I wanted to create a database to automate creation of D&D scenarios.

    The requirements (that apply to this discussion) were:

    1. The user would be able to create "templates" for topics (such as a geographical area or political entity) that had a list of attributes they supplied.
    2. The value going into these templates would always be one or more paragraphs of text.
    3. Using these templates the user would enter an arbitrary number of the desired topic.
    4. The user would have the ability to link template topics into a hierarchy for A) trading to other users and B) extracting all details lower in the hierarchy for printing out information.

    Given those criteria, what other structures other than an EAV would solve the problem? There were a FEW additional issues that could be expressed via relational terms (such as aliases for the topic title) but for the most part the topics and their templates would be largely unknowable in advance, so normal relational design simply wasn't possible.

    After all, we're talking about a semi-structured database. It would certainly be possible (and more efficient) to create a relational database if the topics were known in advance but since they weren't (and couldn't possibly be!) that approach simply doesn't work.

    Another use of EAV is SQL Server itself. The tables and columns are an EAV! Certainly limited in scope and optimized to death, but an EAV design none the less.

    So EAV isn't evil so much as having very limited applicability. Like any tool (NVARCHAR(MAX), for example) you can't use it indiscriminately, but when you need it, nothing else will work.

  • Depending on the data access pattern, other database platforms like Azure Document DB, Azure Table storage, or Azure Key Vault are better optimized for entity-attribute-value or JSON/XML document implementations. When it comes to polygot database solutions, the focus should be on integration between multiple best choice databases and file systems, not stretching the standard SQL Server row storage engine beyond what it was originally intended for.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Yesterday I had an interesting talk with a guy that adopt the following solution: they post their db schema into a xml structure and have a parser that uses the xml and updates the database schema. When they need to add an attribute to a table, depending on the attribute typethey just add one or two columns to the xml-schema and then apply changes to the db . For querying they use the same xml-schema, once you need to query the entity they just parses the xml-schema and build query on the fly. They also said that building the parser and updater was a fairly simple task, compared to other problems they would have to face if they stayed to an EAV pattern.
    Hope that someone may find this an interesting approach... the thing that really impress is the fact that with this solution you address both the EAV pattern and the need to apply incremental changes to the db as the project proceeds.

  • carlo.pagliei 63874 - Saturday, December 8, 2018 2:47 AM

    Yesterday I had an interesting talk with a guy that adopt the following solution: they post their db schema into a xml structure and have a parser that uses the xml and updates the database schema. When they need to add an attribute to a table, depending on the attribute typethey just add one or two columns to the xml-schema and then apply changes to the db . For querying they use the same xml-schema, once you need to query the entity they just parses the xml-schema and build query on the fly. They also said that building the parser and updater was a fairly simple task, compared to other problems they would have to face if they stayed to an EAV pattern.
    Hope that someone may find this an interesting approach... the thing that really impress is the fact that with this solution you address both the EAV pattern and the need to apply incremental changes to the db as the project proceeds.

    https://azure.microsoft.com/en-us/services/storage/tables/

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • carlo.pagliei 63874 - Saturday, December 8, 2018 2:47 AM

    Yesterday I had an interesting talk with a guy that adopt the following solution: they post their db schema into a xml structure and have a parser that uses the xml and updates the database schema. When they need to add an attribute to a table, depending on the attribute typethey just add one or two columns to the xml-schema and then apply changes to the db . For querying they use the same xml-schema, once you need to query the entity they just parses the xml-schema and build query on the fly. They also said that building the parser and updater was a fairly simple task, compared to other problems they would have to face if they stayed to an EAV pattern.
    Hope that someone may find this an interesting approach... the thing that really impress is the fact that with this solution you address both the EAV pattern and the need to apply incremental changes to the db as the project proceeds.

    My concern there would be that they've made it so easy that someone may adopt the use of EAVs as a panacea (and it is patently NOT a panacea) for everything.  They also need to remember that DRI will continue to be a problem for EAVs and their near cousins, NVPs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jason A. Long - Thursday, November 29, 2018 2:11 PM

    1) Why is "WITHOUT adding a new column?" a reasonable restriction?
    If a particular attribute is indeed an attribute under the rules of normalization, why would you not add that column to the table?

    2) Maybe relation databases aren't your thing... Have you considered NoSQL  the databse where everything's made up and the points don't matter.

    Just had to let you know just much of a good laugh I got out of this...   Fabulously hilarious!   Still giggling somewhat....  and here it is an hour later.... (ate lunch in between reading the post and replying.)

  • sgmunson - Monday, December 17, 2018 10:30 AM

    Just had to let you know just much of a good laugh I got out of this...   Fabulously hilarious!   Still giggling somewhat....  and here it is an hour later.... (ate lunch in between reading the post and replying.)

    I still think that should be NoSQL's tag line...

  • EAVs can be a good way to scaffold an application until the users understand what they want as it gives ultimate flexibility on the data management side with 'relatively' little development cost, but as is already stated it comes with a performance and query complexity overhead. 

    Sequel Server is a RELATIONAL database; EAV attributes are a perfect candidate for a NoSQL DOCUMENT database where the data is stored as JSON documents and is basically unstructured (at least as enforced by the database). This means that the data validation has to be handled by the business application layer and parsing the data has an overhead, but it would in SQL as well. If you need to store loosely structured data, I would probably start with EAV but with the clear message to management that it is a stop-gap to create a minimum viable product and look to leverage a NoSQL solution or if you really need to keep it in SQL then as an XML or JSON data type along with the parent record; I would strongly advocate the use of XML as it is more easily read and can be run through a schema or DTD parser for validation, especailly as you have loosely related attributes and RW permissions.

  • aaron.reese - Tuesday, December 18, 2018 6:48 AM

    EAVs can be a good way to scaffold an application until the users understand what they want as it gives ultimate flexibility on the data management side with 'relatively' little development cost, but as is already stated it comes with a performance and query complexity overhead. 

    Sequel Server is a RELATIONAL database; EAV attributes are a perfect candidate for a NoSQL DOCUMENT database where the data is stored as JSON documents and is basically unstructured (at least as enforced by the database). This means that the data validation has to be handled by the business application layer and parsing the data has an overhead, but it would in SQL as well. If you need to store loosely structured data, I would probably start with EAV but with the clear message to management that it is a stop-gap to create a minimum viable product and look to leverage a NoSQL solution or if you really need to keep it in SQL then as an XML or JSON data type along with the parent record; I would strongly advocate the use of XML as it is more easily read and can be run through a schema or DTD parser for validation, especailly as you have loosely related attributes and RW permissions.

    Heh... I hope you don't mind a contrary opinion to that.  EAVs are bad enough as they are when they're a bad thing.  XML in SQL Server is terrible especially when it comes to performance and space requirements.  JSON is better at performance and only has half the space problem but it's still a problem and, like an EAV, neither have an ice cube's chance in hell of using DRI.  Someone that's good at either may disagree but I wouldn't compound one problem with another and it complicates use when the EAV is actually a good thing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Entity-Attribute-Value tables are good for modeling operational configuration data; things like configuration strings, thresholds, and client implementation settings. The benefit is that it allows the application developers to easily create and deploy new data driven settings that work in tandem with the programming logic they are coding. Azure has a database and API called Table Storage that designed for EAV patterns, and then there is Azure Key Vault for securables like database connection strings, keys, and passwords, which is better than web.config.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell - Tuesday, December 18, 2018 7:45 AM

    Entity-Attribute-Value tables are good for modeling operational configuration data; things like configuration strings, thresholds, and client implementation settings. The benefit is that it allows the application developers to easily create and deploy new data driven settings that work in tandem with the programming logic they are coding. Azure has a database and API called Table Storage that designed for EAV patterns, and then there is Azure Key Vault for securables like database connection strings, keys, and passwords, which is better than web.config.

    Totally agreed on that use.  And it's pretty cool that the folks driving Azure realized that.  Do you know if, behind the scenes for that API, that they actually do use an EAV table or is it a clever disguise for XML or JSON behind the scenes?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden - Tuesday, December 18, 2018 8:00 AM

    Eric M Russell - Tuesday, December 18, 2018 7:45 AM

    Entity-Attribute-Value tables are good for modeling operational configuration data; things like configuration strings, thresholds, and client implementation settings. The benefit is that it allows the application developers to easily create and deploy new data driven settings that work in tandem with the programming logic they are coding. Azure has a database and API called Table Storage that designed for EAV patterns, and then there is Azure Key Vault for securables like database connection strings, keys, and passwords, which is better than web.config.

    Totally agreed on that use.  And it's pretty cool that the folks driving Azure realized that.  Do you know if, behind the scenes for that API, that they actually do use an EAV table or is it a clever disguise for XML or JSON behind the scenes?

    That's a good question, Jeff. Azure gets interesting when you peel back the layers of the onion. Table Storage is part of the Azure CosmosDB family, and my understanding is that there is this low level general purpose data format called atom-record-sequence that sits on top of Azure Storage. My take on it is that each of the Cosmos database types (ie: DocumentDB, Graph, Table Storage, ColumnFamily, etc.) stack their own meta-data layer on top of ARS to project a logical access view of the data and optimize their data access patterns. For example, what makes DocumentDB a "document database" and Azure Graph a "graph database" is largely a function of meta-data, specialized indexing, and the API. Microsoft documentation speaks little about ARS storage, but comments made by Azure gurus and mystics with inside knowledge about such things seem to suggest that ARS is sort of like JSON.

    .. Azure Cosmos DB as a multi-model, multi-API globally distributed database platform. Azure Cosmos DB natively supports multiple data models. The core type system of Azure Cosmos DB’s database engine is atom-record-sequence (ARS) based. Atoms consist of a small set of primitive types e.g. string, bool, number etc., records are structs and sequences are arrays consisting of atoms, records or sequences. The database engine of Azure Cosmos DB is capable of efficiently translating and projecting the data models onto the ARS based data model...

    https://azure.microsoft.com/en-us/blog/a-technical-overview-of-azure-cosmos-db/

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I don't think we should shape and force a RDMS like SQL Server to act something like a EAV or document database. We've got more data storage options inside SQL Server today than we did 10 years ago. Having both RowStore and ColumnStore living side by side in SQL Server can make a lot of sense. But really, if you're wanting to store data as documents, blobs, key values, or graph relationships, there are other database platforms better suited for the purpose.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 15 posts - 31 through 45 (of 47 total)

You must be logged in to reply to this topic. Login to reply