How to design a "Tagging" system

  • You've seen those websites where the users add tags to topics, then the tags get displayed alongside the topic with the font-size proportional to the number of tags. Its used in many places these days instead of assigning a fixed category to items. e.g. http://www.last.fm/charts/toptags

    I want implement a simple variety of this. For simplicity I'm starting with a User table, Author and Book tables and an AuthorBookLink table. Instead of giving each book a single category, a tagging system is to be applied on top of this existing schema. Multiple users can apply any number of tags to any book and any number of tags to any author. I'm only looking for a database schema that will easily support the requirement.

    I could probably come up with a working database design fairly easily, but a littke investigation reveals much complexity, and I'm bound to miss something so before doing so I've googled for ideas. I haven't found anything useful - try putting 'tag database sql schema' in any permutation in to a search engine - the results are not so useful. So If anyone knows of a link to examples of designing such a tagging system then please post them here. Book references would help too.

  • Well, this is how I would start; The straight-forward relational way:

    /*

    User table, Author and Book tables and an AuthorBookLink table.

    Instead of giving each book a single category, a tagging system

    is to be applied on top of this existing schema. Multiple users

    can apply any number of tags to any book and any number of tags

    to any author. I'm only looking for a database schema that will

    easily support the requirement.

    */

    --======= Tables for the Principal Objects:

    Create Table Users(

    UserId int Identity(1,1)

    , FullName varchar(32)

    , CONSTRAINT Users_PK_1 Primary Key (UserID)

    )

    GO

    Create Table Authors(

    AuthorID int Identity(1,1)

    , FullName varchar(32)

    , CONSTRAINT Authors_PK_1 Primary Key (AuthorID)

    )

    GO

    Create Table Books(

    BookID int Identity(1,1)

    , BookName varchar(64)

    , CONSTRAINT Books_PK_1 Primary Key (BookID)

    )

    GO

    Create Table BookAuthors(

    BookID int

    , AuthorID int

    , CONSTRAINT BookAuthors_PK_1 Primary Key (BookID, AuthorID)

    --NOTE: Needs another index with AuthorID first.

    )

    GO

    --======Tables to support Tags:

    Create Table Tags(

    TagID int Identity(1,1)

    , Label varchar(24)

    , CONSTRAINT Tags_PK_1 Primary Key (TagID)

    )

    GO

    Create Table BookTagUsers(

    BookID int

    , TagID int

    , UserID int

    , CONSTRAINT Tags_PK_1 Primary Key (BookID, TagID, UserID)

    --NOTE: Needs another index with (TagID, BookID, UserID) and (UserID, BookID)

    )

    GO

    Create Table AuthorTagUsers(

    AuthorID int

    , TagID int

    , UserID int

    , CONSTRAINT Tags_PK_1 Primary Key (AuthorID, TagID, UserID)

    --NOTE: Needs another index with (TagID, AuthorID, UserID) and (UserID, AuthorID)

    )

    GO

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • The obvious supporting views:

    --====== add some views

    Create View vwBookTags as

    Select BookID

    , TagID

    , Min(Tags.Label)

    , Count(UserID) as TagCount

    From BookTagUsers

    Join Tags ON BookTagUsers.TagID = Tags.TagID

    Group By BookId, TagID

    GO

    Create View vwAuthorTags as

    Select AuthorID

    , TagID

    , Min(Tags.Label)

    , Count(UserID) as TagCount

    From AuthorTagUsers

    Join Tags ON AuthorTagUsers.TagID = Tags.TagID

    Group By AuthorId, TagID

    GO

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thanks Barry

    I had thought about separate Tags tables originally: BookTags and AuthorTags - but a single Tag table does simplify things. I'd wondered on how to prevent the same user applying the same tag twice - your constraints does the trick well.

    I'd also thought about extensibility - e.g. what if the site wanted to do CDs or dentists as well (to pick examples enterely at random). Then I'd need a CDTagUsers table or a DentistTagUsers table, so can you see any merit in having a generic UserTags Table

    CREATE TABLE UserTags (

    TagID INT,

    UserID INT,

    TagTableID INT,

    CONSTRAINT UserTags_PK PRIMARY KEY(TagID, TagTableId,UserID)

    )

    But of course, then you need a lookup to TagTable.

    CREATE Table TagTableLookup (

    TagTableID INT IDENTITY(1,1) PRIMARY KEY,

    TableName sysname)

    Can you think of any potential pitfalls?

  • Tom Brown (8/14/2008)


    Then I'd need a CDTagUsers table or a DentistTagUsers table, so can you see any merit in having a generic UserTags Table

    Functionally, its only benefit is if you want to return everything associated with a tag all moshed together is some kind of common order.

    Operationally, it does cut down on your design and maintenance overhead (less tables), but you tend to lose that in the code complexity when you have to go from the Tag to the Items, especially if there are multiple types being returned together. And it is generally faster to keep them apart.

    I wouldn't consider putting them all together until I had more than 6 different types, and was pretty sure that I was going to reach 10 or more someday soon. And then I would actually want to look at a GenericItems table with the more common properties and then hang type-specific properties in seperate tables from that (BookProperties, AuthorProperties, CDProperties, etc.)

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply