Concatenation

  • Hi

    I have a table of staff with uniqueid, name, job.  Each member of staff with a second or third job exists as a separate row.

    I need to create a view with each distinct staff member with a concatenated field showing all jobs done by this person.

    So:

    12345, Jane Smith, Accounts Clerk - Store Assistant - Input Clerk

    Instead of showing three rows, the jobs are shown in one field.

    How can I achieve this?

    Paul

     

  • It's quite messy to do in Transact SQL - however possible, but ugly as ****

    Though, I'm not too sure that it would play well with a view...

    The preferred place to do this kind of juggling is at the client.. Any hopes of doing that instead?

    /Kenneth

  • Thanks, Kenneth

    I need to give a colleague access to this, so would prefer either a view, or a table which I could regularly update from a DTS package.

    Paul

  • Ok, so it's more like a relatively static report-thingy then?

    Here's one way to prep a flattened table, though you're aware that your current table seem to be in pretty 'bad' shape normalization-wise..?

    (if I understood the original table DDL)

    create table #x

    ( id int not null, name varchar(20), job varchar(20) not null )

    insert #x select 12345, 'Jane Smith', 'Accounts Clerk'

    insert #x select 12345, 'Jane Smith', 'Store Assistant'

    insert #x select 12345, 'Jane Smith', 'Input Clerk'

    declare @job varchar(255)

    set @job = ''

    -- get job string

    update  #x

    set  @job = @job + job + ' - '

    where  id = 12345

    select  distinct

     id, name, left(@job, len(@job) -2)

    from  #x

    where id = 12345

    go

    id          name                                                               

    ----------- -------------------- -----------------------------------------------

    12345       Jane Smith           Accounts Clerk - Store Assistant - Input Clerk

    (1 row(s) affected)

    This method involves some method of looping or iterating over each unique empId. That's one of the reasons this kind of pivoting can be bad for the server if traffic is high... (and you can't use it in a view either)

    Though... soon SQL Server 2005 will come with a PIVOT operator..

    /Kenneth

  • Hi Kenneth

    Yes it would seem a pretty awful table.  It is really a reporting table from the main Oracle database - brought into SQL Server.

    Thanks for your solution.  I'm amazed that this works.  Well I mean the following bit:

    set @Job = ''

    -- get job string

    update  #x

    set  @Job = @Job + job + ' - '

    where  id = 12345

    I can't quite work out what is happening here.  I understand that you are creating a variable @Job - but can't understand why it captures all the jobs - however many I enter.  The temp table #x isn't actually being updated (a select query shows this) - so what exactly is the logic here?

    Thanks again

    Paul

  • You set the variable @Job to an empty string, then you concatenate all the values that the select will return into that variable, then you can print/return/select the variable.

  • This is an exemple of how this trick can be converted to a function and then used in a select statement. But I agree with Kenneth that said that this work should be done on the client whenever possible.

    if object_id('fnConcatenate') > 0

    drop function fnConcatenate

    GO

    CREATE FUNCTION [dbo].[fnConcatenate] (@id as int)

    RETURNS varchar(7000) AS

    BEGIN

    DECLARE @Items AS VARCHAR(7000)

    SET @Items = ''

    SELECT @Items = @Items + Name + ' - ' FROM dbo.SysColumns WHERE id = @id

    SET @Items = LEFT(@Items, ABS(DATALENGTH(@Items) - 3))

    RETURN @Items

    END

    GO

    Select Name from dbo.SysColumns where id = Object_id('SysObjects')

    GO

    Select dbo.fnConcatenate (Object_id('SysObjects')) as Columns

    Select name, dbo.fnConcatenate (id) as Columns from dbo.SysObjects where XType = 'U' order by name

    GO

    DROP FUNCTION fnConcatenate

  • Yes, the update syntax may be a bit deluding, but it's the only (afaik) syntax that is supported for this 'trick'. Though, I haven't actually confirmed how supported it is, so I'll have to say 'use it at your own risk'.

    Anyways, the basic structure is that you assign

    @variable = @variable + column

    from <constant>

    The init of @variable to an empty string is just to not be bothered with if a null is encountered, the entire string may be set to null.

    /Kenneth

  • Another variation on the same algorithm uses "coalesce" as follows:

    create table #x

    ( id int not null, MyName varchar(20), job varchar(20) not null )

    insert #x select 12345, 'Jane Smith', 'Accounts Clerk'

    insert #x select 12345, 'Jane Smith', 'Store Assistant'

    insert #x select 12345, 'Jane Smith', 'Input Clerk'

    declare @job varchar(255)

    select @job = COALESCE (@job + ' - ', '') + job

      from #x

    where id = 12345

    select distinct Myname, @job as MyJobs

    from #x

    where id = 12345

    drop table #x

     

    G'day all

    Wayne

  • Absolutely terrific.  I now have something which works brilliatly.

    Many thanks to you all

    Paul

  • I’m curious as to how those who have posted comments stating that the current implementation of the Employee-Job relationship is a bad would change the database or do differently so as to store this data in a better way then the author of the post is using?  I personally do not believe there is a better way to set up this employee-role relationship.  Of the methods I have seen used to deal with the Employee-Role relationship none are capable of listing all roles of an employee without having to flatten out the data using a similar technique as listed here by others or having some other negative tradeoff in order to avoid flattening the data.  The following is a list of the 3 techniques I have seen implemented to deal with storing Employee-Role data in a relational database.  Each has it’s pro and cons and I do not see any in which one can avoid having to flatten out the data or having to deal with other issues such as those associated with non-normalized data.

     

    Technique #1 – Employee-Role tables.  In this technique there is an EMPLOYEE table that stores data about an employee and a table called ROLE that stores 1 record for each role or task that the employee iin the EMPLOYEE table is assigned.  This method is normalized however if one wants to see a single record for each employee that contains a list of the roles that the employee is assigned then the data has to be flattened using a method similar to the one listed in an early post.  This is where you concatenate values from multiple records like this:

     

    SET @sVariable = @sVariable + ‘delilmeter’ + T.Column

    FROM TABLE T

     

    Technique #2 – Employee table w/Multiple Role columns.  In this setup there is a single table called EMPLOYEE.  This table has multiple columns, each storing a single role that the employee is linked to.  In this example you no longer need to flatten the data to get a list of the employees roles in a single record because you can now simply concatenate the values from each column. The problem with this solution is that while it is easy to add all the roles/tasks together in a single field you have some seriously non-normalized data.  Each employee is not allotted a fixed number of roles.  If an employee is assigned more roles then available columns then the table must be altered to add more columns or some other method must be employed if the employee is to be assigned more roles.   Even if the number of roles per employee is not an issue the non-normalized data in this case makes other types of queries very ugly and highly inefficient.  If you need a query that lists each employee who is assigned a specific role your query must check the value in each column that contains a role. 

     

    Technique #3 – Employee table with single column that contains multiple roles.  In this setup the EMPLOYEE table has but a single column for storing role information. This column contains a list of roles that the employee is assigned.  Just as with technique #2 you now have a setup in which obtaining a list of roles per employee is very easy because the data is already flattened.  However you are also looking at seriously non-normalized data that will have other problems to deal with.   If you need to issue a query that lists all employees of a specific role you must now use the Like operator against the column that contains the role data.  The Like operator in and of itself is not so much the problem as is the fact that you must use it with the % search character at the beginning like this Like’%roleName%’.  When the Like operator is used with the % search character at the beginning the query processor is unable to use any index.  It must perform a table scan regardless of what indexes exist and the size of the data.  This results in a highly inefficient search.

     

    I would really like to know of an alternative method then those listed above to handle this type of relationship and so if anyone knows of another method for setting up the Employee-Role relationship that is different from these 3 listed above and also different from the method that the original poster is using then please post that in this thread.  I am always looking to expand my understanding of SQL and if someone knows of a way to setup the Employee-Role relationship that allows for not having to flatten out the data to get a single row of the roles the employee belongs to and yet not suffer the same setbacks and problems that techniques #2 & 3 above suffer form then I would be very grateful if you would share that information in this thread.

     

    Thanks

    Ed

     

    Kindest Regards,

    Just say No to Facebook!
  • Normalization is done to ensure consistency and prevent your data from behaving unexpectedly, and possibly even show you things that doesn't even exist. (yes, it's possible if you're creative enough)

    Normalization does not pay any attention to display issues - on the other hand, such issues may very well be reasons to de-normalize in the real world. There are however tradeoffs, and that's where the balance comes in. You (the designer) have to make the calls about what's most important - 'proper' normalization, or less complex code.

    In the example where you have data like:

    12345, 'Jane Smith', 'Accounts Clerk'

    12345, 'Jane Smith', 'Store Assistant'

    12345, 'Jane Smith', 'Input Clerk'

    ...the problem here is how do you guarantee that each empId of 12345 has the exact same spelling of 'Jane Smith'? And how do you make sure that 'Jane Smith' doesn't have more than one of the same job? It can be done, but requires much work and adds a great deal of overhead. And there are other issues as well.

    If you decide to change a job description, it must be changed in mutliple places. If an emp changes name, it must be changed in multiple places. If you want to retrieve other data (than the job) that relates to empId from other tables, you're going to have duplicate rows returned, unless taken care of. (ie distinct or group by can be probably be used, though it adds to overhead and hurts performance and scalability)

    If... and it will some alternate spelling slips through, or job-dupes are entered, you instantly have inconsistent data. Normalization is a way to prevent this.

    Of your examples, #1 is the way to do it, #2 and #3 are really bad methods. When used, they cause more grief than anyone would like to have in a lifetime

    For this example, you have two entities, employees a jobs, where an employee can have zero or many jobs, and a job can be held by zero or many employees - a classic many-to-many relationship.

    The least complicated way to model this is with a employee table, a job table and a employee_job relations table in between, which only holds the empId + jobId combo as a PK (and each as FK to it's respective parent).

    You will have considerably less convoluted code with such a model and a much easier time to keep data accurate at all times.

    in the end, 'it depends' though - some rules are broken, just be sure that you break them on purpose and also know the cost for breaking them

    /Kenneth

     

  • Kenneth, #1 is the technique that I was also recomending.  I was just also pointing out the crwabacks to it; flattening the data.  In the long ruin this is just something for which there is no answer that wil not have some downside.  This is because the data is stored in a relational manner yet the reporting need is non-relational.  I was hoping that some of the posters that stated that the method being used currently is not the best method would provide input as to what method would be best to handle this type of setup.

    Kindest Regards,

    Just say No to Facebook!
  • I believe that the best solution to any given problem is to use the correct tool for said problem. There are (as of yet) no single product/system that's suitable for solving every possible need imagined.

    As for the needs of reporting... RDBMS systems are not designed for reporting needs, their purpose is to maintain your data, nothing else. If the 'problem' is reporting, then you should use a reporting tool for that, not fall in the trap of trying to make the RDBMS act like something it's not ment to be.

    /Kenneth

  • Hi

    As the original poster I have been fascinated by the debate which has ensued.  I never expected this outcome.

    I would like to point out again that the tables in the main Oracle database are fully normalized.  I draw down tables for read-only reporting purposes onto a SQL Server - and this is why the table contains all the names and all of the jobs.

    Normalization is imperative for maintaining consistent data.  This consistent data is then brought into SQL Server in an un-normalized manner.  It all works brilliantly for reporting - as long as somebody can show you a clever way to concatenate.

    Many thanks

    Paul

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply