Split CSV file - this is different

  • I have a CSV string that is already in SQL server as a single field.  I did other manipulation to the string, and now need to split it into fields.

     

    Example #1

    The data has the same number of commas.  Lets say there are 12 fields, some are populated, others are not.  There are a lot of blank fields.

    I need to do this with a stored procedure.

    DATA:

    Ed,Tom,1/1/2000,,,,,,,,,,

    Tom,Jones,,dog,,,,,,,,,

    Paul,,,,,,,,,,,red,

    Steve,,,,,,,,,,,,

     

    Example #2

    Same as above, but now there are two record types, type 1 and 2.  The 1 and 2 are in the record.  Type 1 has 12 commas, and type 2 has 5 commas.

    DATA

    1,Ed,Tom,1/1/2000,,,,,,,,,,

    1,Tom,Jones,,dog,,,,,,,,,

    1,Paul,,,,,,,,,,,red,

    1,Steve,,,,,,,,,,,,

    2,big,A,,,

    2,small,A,,,

    2,,,,,

     

  • charindex or patindex can help you find the commands and then substring can help you split them out.

    I wrote a series on working with strings that might help you

    http://qa.sqlservercentral.com/columnists/sjones/tamestrings2.asp

    http://qa.sqlservercentral.com/columnists/sjones/tamestrings1.asp

  • Hi,

    You could bcp out the data, then bcp/BULK INSERT it back in again.


    You must unlearn what You have learnt

  • I use the following stored procedure for cases similar to yours. It returns the "next" (leftmost) token, then strips that token from the source string (@s). In the loop example, I loop until the source string is empty.

    The example uses a loop, but you can also just repeatedly call the SP with different variables for the token, as in the second example.

    (I've also written a UDF version of this that preserves the source string)

     

    CREATE PROCEDURE uspGetToken

    (

      @parm varchar(1000) OUTPUT,

      @delim varchar(100),

      @token varchar(1000) OUTPUT

    )

    AS

    SET NOCOUNT ON

    DECLARE @spos int

    SET @spos = CHARINDEX( @delim , @parm, 1 )

    IF @spos = 0

      BEGIN

        SET @token = @parm

        SET @parm = ''   

      END

    ELSE

      BEGIN

        SET @token = SubString( @parm, 1, @spos - 1)

        SET @parm = Substring( @parm, @spos + Len(@delim), Len(@parm) - @spos )

      END

    GO

    --------------------------------------------------------------------------------

    -- SAMPLE USAGE 1

    --------------------------------------------------------------------------------

    DECLARE @S varchar(1000), @d varchar(10)

    DECLARE @token varchar(1000)

    SET @S = 'Ed,Tom,1/1/2000,,,,,,,,,,'

    SET @d = ','

    PRINT @S

    PRINT '---------------------------------------------------------'

    SET NOCOUNT ON

    WHILE @S <> ''

    BEGIN

      EXEC uspGetToken @S OUTPUT, @d, @token OUTPUT

      PRINT '>' + @token

    END

    GO

    --------------------------------------------------------------------------------

    -- SAMPLE USAGE 2

    --------------------------------------------------------------------------------

    DECLARE @S varchar(100), @d varchar(10)

    --DECLARE @v1 varchar(1000), @v2 varchar(1000), @v3 varchar(1000), @v4 varchar(1000), @v5 varchar(1000)

    DECLARE @Fname varchar(20), @Lname varchar(20), @DOB varchar(8)

    SET @S = 'Doe;Jane;12231968'

    SET @d = ';'

    PRINT @S

    SET NOCOUNT ON

    EXEC uspGetToken @S OUTPUT, @d, @Fname OUTPUT

    EXEC uspGetToken @S OUTPUT, @d, @Lname OUTPUT

    EXEC uspGetToken @S OUTPUT, @d, @DOB  OUTPUT

    PRINT @Fname

    PRINT @Lname

    PRINT @DOB

    GO

  • mkeast, interesting solution, the type of thing I was hoping for

     

    excuse my stupidity, but when I try to replace the @S with the field that contains this string I'm trying to parse, all it seems to grab is one record

  • rockmouse, I know how to bulk insert in, how do I bulk insert out?

     

    If I have a table with a single field, say TestData..Field01, how do I create a file in c:\testdata\fileout.txt

  • By using bcp. ( see BOL for this )

    There was a discusiion here:

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=44176


    You must unlearn what You have learnt

  • >excuse my stupidity, but when I try to replace the @S with the field that

    >contains this string I'm trying to parse, all it seems to grab is one record

    Make sure that @d is set to the appropriate delimiter ( comma, semicolon, wahtever) that appears in @S.

    EXEC uspGetToken @S OUTPUT, @d, @token OUTPUT

    @S is the source string, which contains strings delimited by the value in @d

    @d is the delimiter (can be up to 100 chars)

    @token is the leftmost substring taken from @S and returned by the SPROC.

    After the first call, the substring (@token) and the first delimiter (@d) is deleted from @S:

    SET @S = 'ABC,DEF,GHI'

    EXEC uspGetToken @S OUTPUT, ',' , @token OUTPUT

    At this point, @S = DEF,GHI' and @token = 'ABC'

    Maybe the UDF version would work better for you:

    CREATE FUNCTION dbo.fGetToken

    (

      @parm varchar(8000),  -- source string containing delimiters

      @delim varchar(100),   -- delimiter string

      @whichOccur smallint  -- number of the delimited string you want, from the left

    )

    RETURNS varchar(8000)

    AS

    BEGIN

    DECLARE @occur int, @spos int, @startPos int, @maxlen int

    DECLARE @token varchar(8000)

    DECLARE @delimLen int

    -- Len() function ignores trailing blanks, so if our delimiter is a space,

    -- or multiple spaces, Len() will returns zero.

    IF Len(@delim) = 0 AND DataLength(@delim) > 0

      SET @delimLen = DataLength(@delim)  --  DataLength(@delim) * 2 for nvarchar

    ELSE

      SET @delimLen = Len(@delim)

    SET @occur = 0

    SET @startPos = 1

    SET @maxLen = Len(@parm)

    WHILE @occur < @whichOccur AND @parm <> '' AND @parm IS NOT NULL AND @startPos <= @maxLen

    BEGIN

      SET @spos = CHARINDEX( @delim , @parm, @startPos )

      IF @spos = 0

        BEGIN

          SET @token = Substring(@parm, @startPos, 8000)

          SET @occur = @occur + 1

          BREAK

        END

      SET @token = SubString( @parm, @startPos, @spos - @startPos)

      SET @startPos = @spos + @delimLen

      SET @occur = @occur + 1

    END

    IF @occur <> @whichOccur

      SET @token = '' -- or NULL

    RETURN @token

    END

    GO

    --------------------------------------------------------------------------------

    -- SAMPLE USAGE

    --------------------------------------------------------------------------------

    DECLARE @S varchar(1000)

    SET @S = 'Ed,Tom,1/1/2000,,,,,,,,,Dog,'

    SELECT dbo.fGetToken(@s, ',' , 1) AS [1]

    SELECT dbo.fGetToken(@s, ',' , 2) as [2]

    SELECT dbo.fGetToken(@s, ',' , 3) as [3]

    SELECT dbo.fGetToken(@s, ',' , 4) as [4]

    SELECT dbo.fGetToken(@s, ',' , 12) as [12]

    SELECT dbo.fGetToken(@s, ',' , 13) as [13]

    SELECT dbo.fGetToken(@s, ',' , 14) as [14]

  • mkeast, thanks for your patience and help

    I'm still having a problem.  Lets assume I have a table called SlowThinker (that me, LOL) and it contains one field, ExData.  There are numerous records in the table, but for this example lets assume there are just three records:

    DATA:

     1,aa,bb,cc,dd

     2,ee,ff,gg,hh

     3,ii,jj,kk,ll

    DECLARE @S varchar(1000)

    SELECT @S = ExData FROM SlowThinker

    SELECT

    dbo.fGetToken(@s, ',' , 1) AS [1],

    dbo.fGetToken(@s, ',' , 2) as [2],

    dbo.fGetToken(@s, ',' , 3) as [3],

    dbo.fGetToken(@s, ',' , 4) as [4],

    dbo.fGetToken(@s, ',' , 5) as [5],

     

    When I execute the above in Query Analyzer, I get the last record only:

    1      2   3   4    5

    3      ii   jj   kk    ll

    How can I get all the records to appear?  Thanks again.  This is a very creative method, I'm just not grasping how to apply it.

  • You can't select multiple rows into a local variable like that.

      DECLARE @S varchar(1000)

      SELECT @S = ExData FROM SlowThinker

    I ran this:

    DROP TABLE SlowThinker

    GO

    CREATE TABLE SlowThinker

    (

      id int PRIMARY KEY,

      ExData varchar(100)

    )

    SET NOCOUNT ON

    INSERT SlowThinker (id, ExData) VALUES (1, 'aa,bb,cc,dd')

    INSERT SlowThinker (id, ExData) VALUES (2, 'ee,ff,gg,hh')

    INSERT SlowThinker (id, ExData) VALUES (3, 'ii,jj,kk,ll')

    -- DECLARE @S varchar(1000)

    -- SELECT @S = ExData FROM SlowThinker

    SELECT * FROM SlowThinker

    SELECT dbo.fGetToken(ExData, ',' , 1) AS [Value1],

           dbo.fGetToken(ExData, ',' , 2) AS [Value2],

           dbo.fGetToken(ExData, ',' , 3) AS [Value3],

           dbo.fGetToken(ExData, ',' , 4) AS [Value4],

           dbo.fGetToken(ExData, ',' , 5) AS [Value5]

      FROM SlowThinker

    ================================

    RESULTS:

    ================================

    id          ExData       

    ----------- -------------

    1           aa,bb,cc,dd

    2           ee,ff,gg,hh

    3           ii,jj,kk,ll

    Value1           Value2           Value3           Value4           Value5

    ---------------- ---------------- ---------------- ---------------- ----------------

    aa               bb               cc               dd

    ee               ff               gg               hh

    ii               jj               kk               ll

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        ll                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              

  • Here's an expanded example showing how you could insert rows into a new table while parsing, or update existing rows with the parsed data.

    DROP TABLE SlowThinker

    GO

    DROP TABLE TheData

    GO

    CREATE TABLE SlowThinker

    (

      id int PRIMARY KEY,

      ExData varchar(100)

    )

    CREATE TABLE TheData

    (

      id int PRIMARY KEY,

      Col1 varchar(100),

      Col2 varchar(100),

      Col3 varchar(100),

      Col4 varchar(100),

      Col5 varchar(100)

    )

    SET NOCOUNT ON

    INSERT SlowThinker (id, ExData) VALUES (1, 'aa,bb,cc,dd')

    INSERT SlowThinker (id, ExData) VALUES (2, 'ee,ff,gg,hh')

    INSERT SlowThinker (id, ExData) VALUES (3, 'ii,jj,kk,ll')

    -- DECLARE @S varchar(1000)

    -- SELECT @S = ExData FROM SlowThinker

    SELECT * FROM SlowThinker

    SELECT dbo.fGetToken(ExData, ',' , 1) AS [Value1],

           dbo.fGetToken(ExData, ',' , 2) AS [Value2],

           dbo.fGetToken(ExData, ',' , 3) AS [Value3],

           dbo.fGetToken(ExData, ',' , 4) AS [Value4],

           dbo.fGetToken(ExData, ',' , 5) AS [Value5]

      FROM SlowThinker

    INSERT TheData (id, Col1, Col2, Col3, Col4, Col5)

        SELECT id,

               dbo.fGetToken(ExData, ',' , 1) AS [Value1],

               dbo.fGetToken(ExData, ',' , 2) AS [Value2],

               dbo.fGetToken(ExData, ',' , 3) AS [Value3],

               dbo.fGetToken(ExData, ',' , 4) AS [Value4],

               dbo.fGetToken(ExData, ',' , 5) AS [Value5]

          FROM SlowThinker

    PRINT ''

    PRINT 'INSERT data'

    PRINT ''

    SELECT * FROM TheData

    TRUNCATE TABLE TheData

    INSERT TheData (id) SELECT id FROM SlowThinker

    UPDATE TheData

       SET Col1 = dbo.fGetToken(s.ExData, ',' , 1),

           Col2 = dbo.fGetToken(s.ExData, ',' , 2),

           Col3 = dbo.fGetToken(s.ExData, ',' , 3), 

           Col4 = dbo.fGetToken(s.ExData, ',' , 4),

           Col5 = dbo.fGetToken(s.ExData, ',' , 5)

      FROM TheData d JOIN SlowThinker s ON d.id = s.id

    PRINT ''

    PRINT 'UPDATE data'

    PRINT ''

    SELECT * FROM TheData

  • mkeast  ...  WOW  .... this is really cool, thanks again

  • rockmouse thanks for the link for bcp out  ...  I need that for another task, this mkeast solution for this parsing issue is slick

  • Here is an expanded example showing how to use the UDF to INSERT or UPDATE rows in a table.

    DROP TABLE SlowThinker

    GO

    DROP TABLE TheData

    GO

    CREATE TABLE SlowThinker

    (

      id int PRIMARY KEY,

      ExData varchar(100)

    )

    CREATE TABLE TheData

    (

      id int PRIMARY KEY,

      Col1 varchar(100),

      Col2 varchar(100),

      Col3 varchar(100),

      Col4 varchar(100),

      Col5 varchar(100)

    )

    SET NOCOUNT ON

    INSERT SlowThinker (id, ExData) VALUES (1, 'aa,bb,cc,dd')

    INSERT SlowThinker (id, ExData) VALUES (2, 'ee,ff,gg,hh')

    INSERT SlowThinker (id, ExData) VALUES (3, 'ii,jj,kk,ll')

    -- DECLARE @S varchar(1000)

    -- SELECT @S = ExData FROM SlowThinker

    SELECT * FROM SlowThinker

    SELECT dbo.fGetToken(ExData, ',' , 1) AS [Value1],

           dbo.fGetToken(ExData, ',' , 2) AS [Value2],

           dbo.fGetToken(ExData, ',' , 3) AS [Value3],

           dbo.fGetToken(ExData, ',' , 4) AS [Value4],

           dbo.fGetToken(ExData, ',' , 5) AS [Value5]

      FROM SlowThinker

    INSERT TheData (id, Col1, Col2, Col3, Col4, Col5)

        SELECT id,

               dbo.fGetToken(ExData, ',' , 1) AS [Value1],

               dbo.fGetToken(ExData, ',' , 2) AS [Value2],

               dbo.fGetToken(ExData, ',' , 3) AS [Value3],

               dbo.fGetToken(ExData, ',' , 4) AS [Value4],

               dbo.fGetToken(ExData, ',' , 5) AS [Value5]

          FROM SlowThinker

    PRINT ''

    PRINT 'INSERT data'

    PRINT ''

    SELECT * FROM TheData

    TRUNCATE TABLE TheData

    INSERT TheData (id) SELECT id FROM SlowThinker

    UPDATE TheData

       SET Col1 = dbo.fGetToken(s.ExData, ',' , 1),

           Col2 = dbo.fGetToken(s.ExData, ',' , 2),

           Col3 = dbo.fGetToken(s.ExData, ',' , 3), 

           Col4 = dbo.fGetToken(s.ExData, ',' , 4),

           Col5 = dbo.fGetToken(s.ExData, ',' , 5)

      FROM TheData d JOIN SlowThinker s ON d.id = s.id

    PRINT ''

    PRINT 'UPDATE data'

    PRINT ''

    SELECT * FROM TheData

  • Hi,

    I tried to do this in 1 select, and this is what I came up with,

    there is probably room for improvement but seems to work

    Edit: Ooops there was a second page in this thread...

    Anyway posting it

    CREATE TABLE #SlowThinker

    (

      id int PRIMARY KEY,

      ExData varchar(100)

    )

    SET NOCOUNT ON

    INSERT #SlowThinker (id, ExData) VALUES (1,'Ed,Tom,1/1/2000,,,,,,,,,,')

    INSERT #SlowThinker (id, ExData) VALUES (2,'Tom,Jones,,dog,,,,,,,,,')

    INSERT #SlowThinker (id, ExData) VALUES (3,'Paul,,,,,,,,,,,red,')

    INSERT #SlowThinker (id, ExData) VALUES (4,'Steve,,,,,,,,,,,,')

    INSERT #SlowThinker (id, ExData) VALUES (5,'1,Ed,Tom,1/1/2000,,,,,,,,,,')

    INSERT #SlowThinker (id, ExData) VALUES (6,'1,Tom,Jones,,dog,,,,,,,,,')

    INSERT #SlowThinker (id, ExData) VALUES (7,'1,Paul,,,,,,,,,,,red,')

    INSERT #SlowThinker (id, ExData) VALUES (8,'1,Steve,,,,,,,,,,,,')

    INSERT #SlowThinker (id, ExData) VALUES (9,'2,big,A,,,')

    INSERT #SlowThinker (id, ExData) VALUES (10,'2,small,A,,,')

    INSERT #SlowThinker (id, ExData) VALUES (11,'2,,,,,')

    select

     n.id,

     max(case when nbr = 1 then substring(n.ExData,idx,nxt_idx-idx) else null end) as n1,

     max(case when nbr = 2 then substring(n.ExData,idx,nxt_idx-idx) else null end) as n2,

     max(case when nbr = 3 then substring(n.ExData,idx,nxt_idx-idx) else null end) as n3,

     max(case when nbr = 4 then substring(n.ExData,idx,nxt_idx-idx) else null end) as n4,

     max(case when nbr = 5 then substring(n.ExData,idx,nxt_idx-idx) else null end) as n5,

     max(case when nbr = 6 then substring(n.ExData,idx,nxt_idx-idx) else null end) as n6,

     max(case when nbr = 7 then substring(n.ExData,idx,nxt_idx-idx) else null end) as n7,

     max(case when nbr = 8 then substring(n.ExData,idx,nxt_idx-idx) else null end) as n8,

     max(case when nbr = 9 then substring(n.ExData,idx,nxt_idx-idx) else null end) as n9,

     max(case when nbr = 10 then substring(n.ExData,idx,nxt_idx-idx) else null end) as n10,

     max(case when nbr = 11 then substring(n.ExData,idx,nxt_idx-idx) else null end) as n11,

     max(case when nbr = 12 then substring(n.ExData,idx,nxt_idx-idx) else null end) as n12,

     max(case when nbr = 13 then substring(n.ExData,idx,nxt_idx-idx) else null end) as n13,

     max(case when nbr = 14 then substring(n.ExData,idx,nxt_idx-idx) else null end) as n14

    from

     ( select

      1 + len(substring('*'+ExData,1,number))-len(replace(substring('*'+ExData,1,number),',','')) as nbr,

      number as idx,

      case when substring(ExData,number,1) = ',' then number else

       case charindex(',',','+ExData,number+1) when 0 then len(ExData)+1 else charindex(',',ExData,number+1) end

      end as nxt_idx,

      id,

      ExData

     from

      #SlowThinker,

      master.dbo.spt_values tallytable

     where

      tallytable.type = 'P'

      and charindex(',',','+ExData,number) = number

      and number > -1 ) as n

     join #SlowThinker on #SlowThinker.id = n.id

    group by

     n.id

    DROP TABLE #SlowThinker


    You must unlearn what You have learnt

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply