finding sub-sets of super-sets

  • Guys,

    One of our systems has Role Based Access Control that is extremely granular - there are about 400 permissions, each of which can be revoked (NULL), read-only, amend or Full Access
    These are grouped into Roles but there is significant overlap as the roles setting process has been abused over the years
    Each user is then assigned one or more roles and may get duplicated access the same or higher privileges based on these multiple roles.

    the data is thankfully in 3NF with a permissions table (PermissionID, Description), a Roles table (RoleID, Description), RolePermissions (RPID, RoleID, PermissionID, Permission level), Users (UserID,Name) and UserRoles (URID,UserID,RoleID)

    I have been tasked with sorting this out and the first step is to identify duplicated roles - This should be fairly easy, I will convert to a single string using the STUFF..FOR XML technique and compare strings 🙂
    Next is to find roles that are sub-sets of higher roles - I don't think I can use the same technique becasuse I may be looking at different parts of the string for the super-set
    Thirdly will be to find roles with significant overlap or very similar permissions (e.g. same granular entries, but slightly different elevation levels) and I would also need to introduce some confidence scoring in this process.

    I am using SQL2008R2 but have posted here as this is a more active board.  I have access to 2012 if necessary but would have to pull the data into a separate analysis database to do this if the solution is 2012 only.

    I am not necessarily looking for code, more of a strategy to approach the problem - There are some simple things I can do like count the number of permission records in a role and by definition if there are more permissions in this role than that role, then this role CANNOT be a sub-set but doing some sort of cursor based bubble compare feels like a bad idea even though I am only going to need to run the code once in a blue moon mygut reaction is that is will run like a dog.

  • There are a lot of variables for determining how to handle something like this.  It sounds like at least one of your tables has a recursive self-join involved.  I generally find duplicates with an aggregation statement something like this:

    SELECT col1, col2, col3 FROM table GROUP BY col1,col2,col3 HAVING COUNT(1) > 1;

    If you have recursive joins, you'll have to use CTE's to flatten out the table and then do the above type logic to find duplicates that exist in the sense that a child permission is superseded by a parent permission.

  • One way to compare a given role with another one is to use EXCEPT.   The trouble is that you have to do it both ways, meaning SELECT * FROM ROLE1_DATA EXCEPT SELECT * FROM ROLE_2_DATA as well as SELECT * FROM ROLE2_DATA EXCEPT SELECT * FROM ROLE1_DATA.   In combination with your counts of rows, this might be able to help somewhat.

    EDIT: also be sure to look at using INTERSECT, as that will show you what the two sets have in common.  If that count is the same as the count for both roles, you have a total match.   Same is true if both EXCEPT queries return 0 rows.

  • Thanks guys,

    @bartedgerton-2, no recursive self joins - it might have been easier if there were as then the nesting would  be enforced by the application.

    @sgmuson - good idea - I'll look and see whether that approach is feasible and post back for reference.  It should help find exacts and full-subsets and I might also be able to 'double count' for entries and permission levels and calculate a percentage confidence limit for large overlaps

    Next challenge will be how to present the data, I might have 5 roles that are exactly the same with another 3 roles that are sub-sets of those 5 roles and 20 more that match 80% or more on the 3 roles and %70 on the upper level role

    What to propose then about breaking down the roles based permissions.... decisions, decisions...

  • aaron.reese - Tuesday, October 3, 2017 1:18 PM

    Thanks guys,

    @bartedgerton-2, no recursive self joins - it might have been easier if there were as then the nesting would  be enforced by the application.

    @sgmuson - good idea - I'll look and see whether that approach is feasible and post back for reference.  It should help find exacts and full-subsets and I might also be able to 'double count' for entries and permission levels and calculate a percentage confidence limit for large overlaps

    Next challenge will be how to present the data, I might have 5 roles that are exactly the same with another 3 roles that are sub-sets of those 5 roles and 20 more that match 80% or more on the 3 roles and %70 on the upper level role

    What to propose then about breaking down the roles based permissions.... decisions, decisions...

    I'm guessing that any kind of categorization is likely to end up "data dependent" instead of something you can do before any preliminary analysis.   My gut says abuse tends to put things all over the map, so to speak.  However, I'd be likely to want to orient my analysis towards the amount of duplication, but how to categorize it would be rather data dependent.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply