Redundant Columns

  • I have an album table. Each album MUST have a userid.

    I have an image table. Each image must have a albumid.

    I have an comment table. Each comment object MUST have an imageid.

    Should I add albumid to image table and imageid, albumid to comment table? I'll be doing all sorts of queries...for an example, i would want to get the count of comments for an album or an image etc.

    Which one would perform faster?

    SELECT * FROM image WHERE EXISTS (SELECT albumid FROM album WHERE userid = 1)

    VS

    SELECT * FROM image WHERE userid = 1;

    Table design:

    [album]

    id

    name

    [album_image]

    id

    albumid

    userid

    name

    [album_image_comment]

    id

    imageid

    albumid

    userid

    commentorid

    Also, is my naming convention good?

  • [Should I add albumid to image table and imageid, albumid to comment table]

    This depends on the normalisation of your database. What is the association between the different tables i.e. 1 album to many images; 1 image to many comments etc ?

    for your second question, there is no need to have a subquery since you have a UserID column in your image table. just use that one

    your naming convention seems ok, except probably the table names e.g. [album_image_comment] , could have been comments,but then again there are no hard and fast rules for this. 


    Everything you can imagine is real.

  • Albums can have many images, an image can have many comments.

    In the front-end, theres going to be displays for all comments for albums, image count for user, image count for album, comment count for album, comment count for image.

    So do I need to place userid,albumid,imageid in comments table etc? Because for the front-end, I'll be querying by those...I could use subqueries.

  • You can store all those columns, but the question is, why would you need to? For example, I've done exactly what you're laying out because we found that all our access was going to be done hiearchically (sp?) and the joins between all the tables benefited from having compound primary keys composed of the key of the parent tables, clustered. Huge performance win for a bit of denormalization. But, that was a very targeted approach. Let's say that the access methods are different depending on how things go such that you'll never need the AlbumId for a healthy percentage of your queries. Then, storing that everywhere is just a waste of time with no real benefit. Even worse if you followed my example and clustered based on the hierarchy of data such that the leading edge of the key was always the AlbumId.

    So, the short answer, it depends.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • So you have a one-to-many relation between users and albums, a one-to-many relation between albums and images, and a one-to-many relation between images and comments.

    A pure normalized database design is:

    - store userid in the albums table

    - store albumid in the images table

    - store imageid in the comments table

    The majority of queries (especially the interactive, "browse"-like ones where responsetime really matters) will be hierarchical: you will know the parent id.

    If you need some extra punch for reporting and bulk queries, you can create a view on a table that includes a redundant column and add an index to that.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply