Concatenation

Question

Post reply

Concatenation

Paul Sp

SSCrazy

Points: 2314
More actions
April 13, 2005 at 4:19 am

#74906

Hi

I have a table of staff with uniqueid, name, job. Each member of staff with a second or third job exists as a separate row.

I need to create a view with each distinct staff member with a concatenated field showing all jobs done by this person.

So:

12345, Jane Smith, Accounts Clerk - Store Assistant - Input Clerk

Instead of showing three rows, the jobs are shown in one field.

How can I achieve this?

Paul

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 1

Kenneth Wilhelmsson

SSC-Dedicated

Points: 30043

April 13, 2005 at 5:10 am

#552358

It's quite messy to do in Transact SQL - however possible, but ugly as ****

Though, I'm not too sure that it would play well with a view...

The preferred place to do this kind of juggling is at the client.. Any hopes of doing that instead?

/Kenneth

Paul Sp SSCrazy Points: 2314 More actions · Answer 2

Paul Sp

SSCrazy

Points: 2314

April 13, 2005 at 5:21 am

#552362

Thanks, Kenneth

I need to give a colleague access to this, so would prefer either a view, or a table which I could regularly update from a DTS package.

Paul

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 3

Ok, so it's more like a relatively static report-thingy then?

Here's one way to prep a flattened table, though you're aware that your current table seem to be in pretty 'bad' shape normalization-wise..?

(if I understood the original table DDL)

create table #x

( id int not null, name varchar(20), job varchar(20) not null )

insert #x select 12345, 'Jane Smith', 'Accounts Clerk'

insert #x select 12345, 'Jane Smith', 'Store Assistant'

insert #x select 12345, 'Jane Smith', 'Input Clerk'

declare @job varchar(255)

set @job = ''

-- get job string

update #x

set @job = @job + job + ' - '

where id = 12345

select distinct

id, name, left(@job, len(@job) -2)

from #x

where id = 12345

go

id name

----------- -------------------- -----------------------------------------------

12345 Jane Smith Accounts Clerk - Store Assistant - Input Clerk

(1 row(s) affected)

This method involves some method of looping or iterating over each unique empId. That's one of the reasons this kind of pivoting can be bad for the server if traffic is high... (and you can't use it in a view either)

Though... soon SQL Server 2005 will come with a PIVOT operator..

/Kenneth

Paul Sp SSCrazy Points: 2314 More actions · Answer 4

Hi Kenneth

Yes it would seem a pretty awful table. It is really a reporting table from the main Oracle database - brought into SQL Server.

Thanks for your solution. I'm amazed that this works. Well I mean the following bit:

set @Job = ''

-- get job string

update #x

set @Job = @Job + job + ' - '

where id = 12345

I can't quite work out what is happening here. I understand that you are creating a variable @Job - but can't understand why it captures all the jobs - however many I enter. The temp table #x isn't actually being updated (a select query shows this) - so what exactly is the logic here?

Thanks again

Paul

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 5

Ninja's_RGR'us

SSC Guru

Points: 294069

April 13, 2005 at 6:22 am

#552381

You set the variable @Job to an empty string, then you concatenate all the values that the select will return into that variable, then you can print/return/select the variable.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 6

This is an exemple of how this trick can be converted to a function and then used in a select statement. But I agree with Kenneth that said that this work should be done on the client whenever possible.

if object_id('fnConcatenate') > 0

drop function fnConcatenate

GO

CREATE FUNCTION [dbo].[fnConcatenate] (@id as int)

RETURNS varchar(7000) AS

BEGIN

DECLARE @Items AS VARCHAR(7000)

SET @Items = ''

SELECT @Items = @Items + Name + ' - ' FROM dbo.SysColumns WHERE id = @id

SET @Items = LEFT(@Items, ABS(DATALENGTH(@Items) - 3))

RETURN @Items

END

GO

Select Name from dbo.SysColumns where id = Object_id('SysObjects')

GO

Select dbo.fnConcatenate (Object_id('SysObjects')) as Columns

Select name, dbo.fnConcatenate (id) as Columns from dbo.SysObjects where XType = 'U' order by name

GO

DROP FUNCTION fnConcatenate

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 7

Yes, the update syntax may be a bit deluding, but it's the only (afaik) syntax that is supported for this 'trick'. Though, I haven't actually confirmed how supported it is, so I'll have to say 'use it at your own risk'.

Anyways, the basic structure is that you assign

@variable = @variable + column

from <constant>

The init of @variable to an empty string is just to not be bothered with if a null is encountered, the entire string may be set to null.

/Kenneth

Wayne Lawton Hall of Fame Points: 3532 More actions · Answer 8

Another variation on the same algorithm uses "coalesce" as follows:

create table #x

( id int not null, MyName varchar(20), job varchar(20) not null )

insert #x select 12345, 'Jane Smith', 'Accounts Clerk'

insert #x select 12345, 'Jane Smith', 'Store Assistant'

insert #x select 12345, 'Jane Smith', 'Input Clerk'

declare @job varchar(255)

select @job = COALESCE (@job + ' - ', '') + job

from #x

where id = 12345

select distinct Myname, @job as MyJobs

from #x

where id = 12345

drop table #x

G'day all

Wayne

Paul Sp SSCrazy Points: 2314 More actions · Answer 9

Paul Sp

SSCrazy

Points: 2314

April 13, 2005 at 8:23 am

#552426

Absolutely terrific. I now have something which works brilliatly.

Many thanks to you all

Paul

YSLGuru SSC-Insane Points: 21569 More actions · Answer 10

I’m curious as to how those who have posted comments stating that the current implementation of the Employee-Job relationship is a bad would change the database or do differently so as to store this data in a better way then the author of the post is using? I personally do not believe there is a better way to set up this employee-role relationship. Of the methods I have seen used to deal with the Employee-Role relationship none are capable of listing all roles of an employee without having to flatten out the data using a similar technique as listed here by others or having some other negative tradeoff in order to avoid flattening the data. The following is a list of the 3 techniques I have seen implemented to deal with storing Employee-Role data in a relational database. Each has it’s pro and cons and I do not see any in which one can avoid having to flatten out the data or having to deal with other issues such as those associated with non-normalized data.

Technique #1 – Employee-Role tables. In this technique there is an EMPLOYEE table that stores data about an employee and a table called ROLE that stores 1 record for each role or task that the employee iin the EMPLOYEE table is assigned. This method is normalized however if one wants to see a single record for each employee that contains a list of the roles that the employee is assigned then the data has to be flattened using a method similar to the one listed in an early post. This is where you concatenate values from multiple records like this:

SET @sVariable = @sVariable + ‘delilmeter’ + T.Column

FROM TABLE T

Technique #2 – Employee table w/Multiple Role columns. In this setup there is a single table called EMPLOYEE. This table has multiple columns, each storing a single role that the employee is linked to. In this example you no longer need to flatten the data to get a list of the employees roles in a single record because you can now simply concatenate the values from each column. The problem with this solution is that while it is easy to add all the roles/tasks together in a single field you have some seriously non-normalized data. Each employee is not allotted a fixed number of roles. If an employee is assigned more roles then available columns then the table must be altered to add more columns or some other method must be employed if the employee is to be assigned more roles. Even if the number of roles per employee is not an issue the non-normalized data in this case makes other types of queries very ugly and highly inefficient. If you need a query that lists each employee who is assigned a specific role your query must check the value in each column that contains a role.

Technique #3 – Employee table with single column that contains multiple roles. In this setup the EMPLOYEE table has but a single column for storing role information. This column contains a list of roles that the employee is assigned. Just as with technique #2 you now have a setup in which obtaining a list of roles per employee is very easy because the data is already flattened. However you are also looking at seriously non-normalized data that will have other problems to deal with. If you need to issue a query that lists all employees of a specific role you must now use the Like operator against the column that contains the role data. The Like operator in and of itself is not so much the problem as is the fact that you must use it with the % search character at the beginning like this Like’%roleName%’. When the Like operator is used with the % search character at the beginning the query processor is unable to use any index. It must perform a table scan regardless of what indexes exist and the size of the data. This results in a highly inefficient search.

I would really like to know of an alternative method then those listed above to handle this type of relationship and so if anyone knows of another method for setting up the Employee-Role relationship that is different from these 3 listed above and also different from the method that the original poster is using then please post that in this thread. I am always looking to expand my understanding of SQL and if someone knows of a way to setup the Employee-Role relationship that allows for not having to flatten out the data to get a single row of the roles the employee belongs to and yet not suffer the same setbacks and problems that techniques #2 & 3 above suffer form then I would be very grateful if you would share that information in this thread.

Thanks

Ed

Kindest Regards,

Just say No to Facebook!

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 11

Normalization is done to ensure consistency and prevent your data from behaving unexpectedly, and possibly even show you things that doesn't even exist. (yes, it's possible if you're creative enough)

Normalization does not pay any attention to display issues - on the other hand, such issues may very well be reasons to de-normalize in the real world. There are however tradeoffs, and that's where the balance comes in. You (the designer) have to make the calls about what's most important - 'proper' normalization, or less complex code.

In the example where you have data like:

12345, 'Jane Smith', 'Accounts Clerk'

12345, 'Jane Smith', 'Store Assistant'

12345, 'Jane Smith', 'Input Clerk'

...the problem here is how do you guarantee that each empId of 12345 has the exact same spelling of 'Jane Smith'? And how do you make sure that 'Jane Smith' doesn't have more than one of the same job? It can be done, but requires much work and adds a great deal of overhead. And there are other issues as well.

If you decide to change a job description, it must be changed in mutliple places. If an emp changes name, it must be changed in multiple places. If you want to retrieve other data (than the job) that relates to empId from other tables, you're going to have duplicate rows returned, unless taken care of. (ie distinct or group by can be probably be used, though it adds to overhead and hurts performance and scalability)

If... and it will some alternate spelling slips through, or job-dupes are entered, you instantly have inconsistent data. Normalization is a way to prevent this.

Of your examples, #1 is the way to do it, #2 and #3 are really bad methods. When used, they cause more grief than anyone would like to have in a lifetime

For this example, you have two entities, employees a jobs, where an employee can have zero or many jobs, and a job can be held by zero or many employees - a classic many-to-many relationship.

The least complicated way to model this is with a employee table, a job table and a employee_job relations table in between, which only holds the empId + jobId combo as a PK (and each as FK to it's respective parent).

You will have considerably less convoluted code with such a model and a much easier time to keep data accurate at all times.

in the end, 'it depends' though - some rules are broken, just be sure that you break them on purpose and also know the cost for breaking them

/Kenneth

YSLGuru SSC-Insane Points: 21569 More actions · Answer 12

Kenneth, #1 is the technique that I was also recomending. I was just also pointing out the crwabacks to it; flattening the data. In the long ruin this is just something for which there is no answer that wil not have some downside. This is because the data is stored in a relational manner yet the reporting need is non-relational. I was hoping that some of the posters that stated that the method being used currently is not the best method would provide input as to what method would be best to handle this type of setup.

Kindest Regards,

Just say No to Facebook!

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 13

I believe that the best solution to any given problem is to use the correct tool for said problem. There are (as of yet) no single product/system that's suitable for solving every possible need imagined.

As for the needs of reporting... RDBMS systems are not designed for reporting needs, their purpose is to maintain your data, nothing else. If the 'problem' is reporting, then you should use a reporting tool for that, not fall in the trap of trying to make the RDBMS act like something it's not ment to be.

/Kenneth

Paul Sp SSCrazy Points: 2314 More actions · Answer 14

Hi

As the original poster I have been fascinated by the debate which has ensued. I never expected this outcome.

I would like to point out again that the tables in the main Oracle database are fully normalized. I draw down tables for read-only reporting purposes onto a SQL Server - and this is why the table contains all the names and all of the jobs.

Normalization is imperative for maintaining consistent data. This consistent data is then brought into SQL Server in an un-normalized manner. It all works brilliantly for reporting - as long as somebody can show you a clever way to concatenate.

Many thanks

Paul