BULK INSERT where source has variable number of fields

  • I have a source text file that for one type of record (inactive) has 25 fields, but for another type of record (active) has up to 36 fields.

    I cannot seem to find a way eiether using bcp, BULK INSERT with or without a FILEFORMAT xml file to actually import this.

    Any ideas on how I may be able to do this? The only way I can think of doing this is to load the whole record into a single field and parse it once it is in the database....

    Thanks in advance.

    Craig

  • Look up on Books Online (free 'Help' that comes with the SQL Server) for OPENROWSET.. that will help you resolve this issue..

  • Thanks for your response. I have been hunting high and low, and through what I am assuming is the page you have referenced and there does not seem to be any mention of a situation where you have a source file that looks like the following:

    ID|Name|Active|Price

    00001|Test Product #1|Y|1.50

    00002|Test Product #2|N

    00003|Test Product #3|Y|23.45

    00004|Some other name|N

    with data.xml as follows:

    <?xml version="1.0"?>

    <BCPFORMAT

    xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"

    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <RECORD>

    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="4"/>

    <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="50"/>

    <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="1"/>

    <FIELD ID="4" xsi:type="CharTerm" TERMINATOR="" MAX_LENGTH="10"/>

    </RECORD>

    <ROW>

    <COLUMN SOURCE="1" NAME="ID" xsi:type="SQLNVARCHAR"/>

    <COLUMN SOURCE="2" NAME="Name" xsi:type="SQLNVARCHAR"/>

    <COLUMN SOURCE="3" NAME="Active" xsi:type="SQLNVARCHAR"/>

    <COLUMN SOURCE="4" NAME="Price" xsi:type="SQLMONEY"/>

    </ROW>

    </BCPFORMAT>

    the following command

    select * from openrowset(bulk 'c:\users\craig\desktop\test.dat', FIRSTROW=2, FORMATFILE='c:\users\craig\desktop\test.xml') as t

    produces an error

    Msg 4864, Level 16, State 1, Line 1

    Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 3, column 3 (Active).

  • And to qualify - the following data file

    ID|Name|Active|Price

    00001|Test Product #1|Y|1.50

    00002|Test Product #2|N|1

    00003|Test Product #3|Y|23.45

    00004|Some other name|N|1

    works as expected!

    Thanks....

  • craig 33233 (9/27/2010)


    I have a source text file that for one type of record (inactive) has 25 fields, but for another type of record (active) has up to 36 fields

    Ah, this seems tricky... if the number of fields per row is consistent in each files, with each file having varied number of fields, then OPENROWSET will work.. BUT , when there is inconsistent number of fields in each row for each files, then i cant think of a way to do it 🙁 ..

    Lets for the bigwigs of SSC to wield their shields and come up with an approach..

    A BULK INSERT of the all the rows as a single blob and then using any delimited-splitter will give you what you want, but that will take some coding...

  • ColdCoffee (9/28/2010)


    craig 33233 (9/27/2010)


    I have a source text file that for one type of record (inactive) has 25 fields, but for another type of record (active) has up to 36 fields

    Ah, this seems tricky... if the number of fields per row is consistent in each files, with each file having varied number of fields, then OPENROWSET will work.. BUT , when there is inconsistent number of fields in each row for each files, then i cant think of a way to do it 🙁 ..

    Lets for the bigwigs of SSC to wield their shields and come up with an approach..

    A BULK INSERT of the all the rows as a single blob and then using any delimited-splitter will give you what you want, but that will take some coding...

    Using a blob field would be quite inefficient here and would carry the extra overhead of having to split on the row terminators which is unnecessary.

    Rather you could create a simple format file called BulkFile.fmt which has a single SQLCHAR 8000 column called BulkColumn. Alternatively if you don't want to use a format file, you could BULK INSERT into a single column staging table and modify the code accordingly.

    ;WITH cteTally (N) AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0))

    FROM master..syscolumns

    )

    ,

    cte AS

    (

    SELECT BulkColumn FROM OPENROWSET (BULK 'C:\users\craig\desktop\YourFile.txt', FORMATFILE = 'C:\users\craig\desktop\BulkFile.fmt', FIRSTROW = 2) AS Z

    )

    SELECT [1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31],[32],[33],[34],[35],[36]

    FROM cte

    CROSS APPLY

    (

    SELECT [1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31],[32],[33],[34],[35],[36]

    FROM

    (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)),

    SUBSTRING(BulkColumn + '|', N, CHARINDEX('|', BulkColumn + '|', N) - N)

    FROM cteTally

    WHERE N < LEN(BulkColumn) + 2 AND SUBSTRING('|' + BulkColumn + '|', N, 1) = '|'

    ) AS Z (ROW, VALUE)

    PIVOT(MAX(Value) FOR ROW IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31],[32],[33],[34],[35],[36])) AS pvt

    ) AS Y

  • Thanks for the advice, but knowing that Microsoft thought it *inconceivable* (flashbacks to the Princess Bride!) that this could ever happen and that I'm not missing something I've decided to go with a different solution which may hopefully be easier to maintain.

    I've explicitly defined the fields that I know will always be there in the xml, and then clumped the (optional) remainder into a single field (up to row terminator) and parse it out manually in the upsert (merge) if there is something there...

    Cheers.

    Craig

  • OK then, another idea perhaps. How about this monster CROSS APPLY version based on some cool CROSS APPLY tricks 🙂 from Brad Schulz

    http://bradsruminations.blogspot.com/search/label/CROSS%20APPLY?updated-max=2009-07-11T12%3A01%3A00-07%3A00&max-results=20

    IF NOT OBJECT_ID('tempdb.dbo.#stage', 'U') IS NULL DROP TABLE #stage

    CREATE TABLE #stage (BulkColumn varchar(8000))

    BULK INSERT #stage FROM 'C:\users\craig\desktop\YourFile.txt'

    WITH

    (

    FIRSTROW = 2, TABLOCK

    )

    ;WITH cte (BulkColumn) AS

    (

    SELECT BulkColumn + REPLICATE('|', 36) FROM #stage

    )

    SELECT

    col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11, col12, col13, col14, col15, col16, col17, col18, col19, col20, col21, col22, col23, col24, col25, col26, col27, col28, col29, col30, col31, col32, col33, col34, col35, col36

    FROM cte

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn)) AS Z1 (p1)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z1.p1 + 1)) AS Z2 (p2)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z2.p2 + 1)) AS Z3 (p3)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z3.p3 + 1)) AS Z4 (p4)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z4.p4 + 1)) AS Z5 (p5)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z5.p5 + 1)) AS Z6 (p6)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z6.p6 + 1)) AS Z7 (p7)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z7.p7 + 1)) AS Z8 (p8)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z8.p8 + 1)) AS Z9 (p9)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z9.p9 + 1)) AS Z10 (p10)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z10.p10 + 1)) AS Z11 (p11)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z11.p11 + 1)) AS Z12 (p12)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z12.p12 + 1)) AS Z13 (p13)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z13.p13 + 1)) AS Z14 (p14)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z14.p14 + 1)) AS Z15 (p15)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z15.p15 + 1)) AS Z16 (p16)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z16.p16 + 1)) AS Z17 (p17)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z17.p17 + 1)) AS Z18 (p18)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z18.p18 + 1)) AS Z19 (p19)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z19.p19 + 1)) AS Z20 (p20)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z20.p20 + 1)) AS Z21 (p21)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z21.p21 + 1)) AS Z22 (p22)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z22.p22 + 1)) AS Z23 (p23)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z23.p23 + 1)) AS Z24 (p24)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z24.p24 + 1)) AS Z25 (p25)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z25.p25 + 1)) AS Z26 (p26)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z26.p26 + 1)) AS Z27 (p27)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z27.p27 + 1)) AS Z28 (p28)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z28.p28 + 1)) AS Z29 (p29)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z29.p29 + 1)) AS Z30 (p30)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z30.p30 + 1)) AS Z31 (p31)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z31.p31 + 1)) AS Z32 (p32)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z32.p32 + 1)) AS Z33 (p33)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z33.p33 + 1)) AS Z34 (p34)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z34.p34 + 1)) AS Z35 (p35)

    CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z35.p35 + 1)) AS Z36 (p36)

    CROSS APPLY (SELECT SUBSTRING(cte.BulkColumn, 1, Z1.p1 - 1)

    ,SUBSTRING(cte.BulkColumn, Z1.p1 + 1, Z2.p2 - Z1.p1 - 1)

    ,SUBSTRING(cte.BulkColumn, Z2.p2 + 1, Z3.p3 - Z2.p2 - 1)

    ,SUBSTRING(cte.BulkColumn, Z3.p3 + 1, Z4.p4 - Z3.p3 - 1)

    ,SUBSTRING(cte.BulkColumn, Z4.p4 + 1, Z5.p5 - Z4.p4 - 1)

    ,SUBSTRING(cte.BulkColumn, Z5.p5 + 1, Z6.p6 - Z5.p5 - 1)

    ,SUBSTRING(cte.BulkColumn, Z6.p6 + 1, Z7.p7 - Z6.p6 - 1)

    ,SUBSTRING(cte.BulkColumn, Z7.p7 + 1, Z8.p8 - Z7.p7 - 1)

    ,SUBSTRING(cte.BulkColumn, Z8.p8 + 1, Z9.p9 - Z8.p8 - 1)

    ,SUBSTRING(cte.BulkColumn, Z9.p9 + 1, Z10.p10 - Z9.p9 - 1)

    ,SUBSTRING(cte.BulkColumn, Z10.p10 + 1, Z11.p11 - Z10.p10 - 1)

    ,SUBSTRING(cte.BulkColumn, Z11.p11 + 1, Z12.p12 - Z11.p11 - 1)

    ,SUBSTRING(cte.BulkColumn, Z12.p12 + 1, Z13.p13 - Z12.p12 - 1)

    ,SUBSTRING(cte.BulkColumn, Z13.p13 + 1, Z14.p14 - Z13.p13 - 1)

    ,SUBSTRING(cte.BulkColumn, Z14.p14 + 1, Z15.p15 - Z14.p14 - 1)

    ,SUBSTRING(cte.BulkColumn, Z15.p15 + 1, Z16.p16 - Z15.p15 - 1)

    ,SUBSTRING(cte.BulkColumn, Z16.p16 + 1, Z17.p17 - Z16.p16 - 1)

    ,SUBSTRING(cte.BulkColumn, Z17.p17 + 1, Z18.p18 - Z17.p17 - 1)

    ,SUBSTRING(cte.BulkColumn, Z18.p18 + 1, Z19.p19 - Z18.p18 - 1)

    ,SUBSTRING(cte.BulkColumn, Z19.p19 + 1, Z20.p20 - Z19.p19 - 1)

    ,SUBSTRING(cte.BulkColumn, Z20.p20 + 1, Z21.p21 - Z20.p20 - 1)

    ,SUBSTRING(cte.BulkColumn, Z21.p21 + 1, Z22.p22 - Z21.p21 - 1)

    ,SUBSTRING(cte.BulkColumn, Z22.p22 + 1, Z23.p23 - Z22.p22 - 1)

    ,SUBSTRING(cte.BulkColumn, Z23.p23 + 1, Z24.p24 - Z23.p23 - 1)

    ,SUBSTRING(cte.BulkColumn, Z24.p24 + 1, Z25.p25 - Z24.p24 - 1)

    ,SUBSTRING(cte.BulkColumn, Z25.p25 + 1, Z26.p26 - Z25.p25 - 1)

    ,SUBSTRING(cte.BulkColumn, Z26.p26 + 1, Z27.p27 - Z26.p26 - 1)

    ,SUBSTRING(cte.BulkColumn, Z27.p27 + 1, Z28.p28 - Z27.p27 - 1)

    ,SUBSTRING(cte.BulkColumn, Z28.p28 + 1, Z29.p29 - Z28.p28 - 1)

    ,SUBSTRING(cte.BulkColumn, Z29.p29 + 1, Z30.p30 - Z29.p29 - 1)

    ,SUBSTRING(cte.BulkColumn, Z30.p30 + 1, Z31.p31 - Z30.p30 - 1)

    ,SUBSTRING(cte.BulkColumn, Z31.p31 + 1, Z32.p32 - Z31.p31 - 1)

    ,SUBSTRING(cte.BulkColumn, Z32.p32 + 1, Z33.p33 - Z32.p32 - 1)

    ,SUBSTRING(cte.BulkColumn, Z33.p33 + 1, Z34.p34 - Z33.p33 - 1)

    ,SUBSTRING(cte.BulkColumn, Z34.p34 + 1, Z35.p35 - Z34.p34 - 1)

    ,SUBSTRING(cte.BulkColumn, Z35.p35 + 1, Z36.p36 - Z35.p35 - 1)

    ) AS ZSplit (col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11, col12, col13, col14, col15, col16, col17, col18, col19, col20, col21, col22, col23, col24, col25, col26, col27, col28, col29, col30, col31, col32, col33, col34, col35, col36)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply