BULK INSERT where source has variable number of fields

Question

Post reply

BULK INSERT where source has variable number of fields

craig 33233

Old Hand

Points: 394
More actions
September 27, 2010 at 7:40 pm

#140830

I have a source text file that for one type of record (inactive) has 25 fields, but for another type of record (active) has up to 36 fields.

I cannot seem to find a way eiether using bcp, BULK INSERT with or without a FILEFORMAT xml file to actually import this.

Any ideas on how I may be able to do this? The only way I can think of doing this is to load the whole record into a single field and parse it once it is in the database....

Thanks in advance.

Craig

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

ColdCoffee SSC-Dedicated Points: 39971 More actions · Answer 1

ColdCoffee

SSC-Dedicated

Points: 39971

September 27, 2010 at 8:20 pm

#1228121

Look up on Books Online (free 'Help' that comes with the SQL Server) for OPENROWSET.. that will help you resolve this issue..

craig 33233 Old Hand Points: 394 More actions · Answer 2

Thanks for your response. I have been hunting high and low, and through what I am assuming is the page you have referenced and there does not seem to be any mention of a situation where you have a source file that looks like the following:

ID|Name|Active|Price

00001|Test Product #1|Y|1.50

00002|Test Product #2|N

00003|Test Product #3|Y|23.45

00004|Some other name|N

with data.xml as follows:

<?xml version="1.0"?>

<BCPFORMAT

xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

</RECORD>

<ROW>

</ROW>

</BCPFORMAT>

the following command

select * from openrowset(bulk 'c:\users\craig\desktop\test.dat', FIRSTROW=2, FORMATFILE='c:\users\craig\desktop\test.xml') as t

produces an error

Msg 4864, Level 16, State 1, Line 1

Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 3, column 3 (Active).

craig 33233 Old Hand Points: 394 More actions · Answer 3

craig 33233

Old Hand

Points: 394

September 28, 2010 at 12:28 am

#1228159

And to qualify - the following data file

ID|Name|Active|Price

00001|Test Product #1|Y|1.50

00002|Test Product #2|N|1

00003|Test Product #3|Y|23.45

00004|Some other name|N|1

works as expected!

Thanks....

ColdCoffee SSC-Dedicated Points: 39971 More actions · Answer 4

craig 33233 (9/27/2010)

I have a source text file that for one type of record (inactive) has 25 fields, but for another type of record (active) has up to 36 fields

Ah, this seems tricky... if the number of fields per row is consistent in each files, with each file having varied number of fields, then OPENROWSET will work.. BUT , when there is inconsistent number of fields in each row for each files, then i cant think of a way to do it 🙁 ..

Lets for the bigwigs of SSC to wield their shields and come up with an approach..

A BULK INSERT of the all the rows as a single blob and then using any delimited-splitter will give you what you want, but that will take some coding...

steve-893342 SSCarpal Tunnel Points: 4439 More actions · Answer 5

ColdCoffee (9/28/2010)

craig 33233 (9/27/2010)

I have a source text file that for one type of record (inactive) has 25 fields, but for another type of record (active) has up to 36 fields

Ah, this seems tricky... if the number of fields per row is consistent in each files, with each file having varied number of fields, then OPENROWSET will work.. BUT , when there is inconsistent number of fields in each row for each files, then i cant think of a way to do it 🙁 ..

Lets for the bigwigs of SSC to wield their shields and come up with an approach..

A BULK INSERT of the all the rows as a single blob and then using any delimited-splitter will give you what you want, but that will take some coding...

Using a blob field would be quite inefficient here and would carry the extra overhead of having to split on the row terminators which is unnecessary.

Rather you could create a simple format file called BulkFile.fmt which has a single SQLCHAR 8000 column called BulkColumn. Alternatively if you don't want to use a format file, you could BULK INSERT into a single column staging table and modify the code accordingly.

;WITH cteTally (N) AS



  (

   SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0))

   FROM master..syscolumns

  )

,

cte AS

  (

   SELECT BulkColumn FROM OPENROWSET (BULK 'C:\users\craig\desktop\YourFile.txt', FORMATFILE = 'C:\users\craig\desktop\BulkFile.fmt', FIRSTROW = 2) AS Z

  )

   SELECT [1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31],[32],[33],[34],[35],[36]

   FROM cte

   CROSS APPLY

    (

     SELECT [1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31],[32],[33],[34],[35],[36]

     FROM

     (

      SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)),

      SUBSTRING(BulkColumn + '|', N, CHARINDEX('|', BulkColumn + '|', N) - N)

      FROM cteTally

      WHERE N < LEN(BulkColumn) + 2 AND SUBSTRING('|' + BulkColumn + '|', N, 1) = '|'

     ) AS Z (ROW, VALUE)

         PIVOT(MAX(Value) FOR ROW IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31],[32],[33],[34],[35],[36])) AS pvt

    ) AS Y

craig 33233 Old Hand Points: 394 More actions · Answer 6

Thanks for the advice, but knowing that Microsoft thought it *inconceivable* (flashbacks to the Princess Bride!) that this could ever happen and that I'm not missing something I've decided to go with a different solution which may hopefully be easier to maintain.

I've explicitly defined the fields that I know will always be there in the xml, and then clumped the (optional) remainder into a single field (up to row terminator) and parse it out manually in the upsert (merge) if there is something there...

Cheers.

Craig

steve-893342 SSCarpal Tunnel Points: 4439 More actions · Answer 7

OK then, another idea perhaps. How about this monster CROSS APPLY version based on some cool CROSS APPLY tricks 🙂 from Brad Schulz

http://bradsruminations.blogspot.com/search/label/CROSS%20APPLY?updated-max=2009-07-11T12%3A01%3A00-07%3A00&max-results=20

IF NOT OBJECT_ID('tempdb.dbo.#stage', 'U') IS NULL DROP TABLE #stage



CREATE TABLE #stage (BulkColumn varchar(8000))

BULK INSERT #stage FROM 'C:\users\craig\desktop\YourFile.txt'

WITH

(

FIRSTROW = 2, TABLOCK

)

;WITH cte (BulkColumn) AS

(

SELECT BulkColumn + REPLICATE('|', 36) FROM #stage

)

SELECT

col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11, col12, col13, col14, col15, col16, col17, col18, col19, col20, col21, col22, col23, col24, col25, col26, col27, col28, col29, col30, col31, col32, col33, col34, col35, col36

FROM cte

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn)) AS Z1 (p1)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z1.p1 + 1)) AS Z2 (p2)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z2.p2 + 1)) AS Z3 (p3)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z3.p3 + 1)) AS Z4 (p4)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z4.p4 + 1)) AS Z5 (p5)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z5.p5 + 1)) AS Z6 (p6)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z6.p6 + 1)) AS Z7 (p7)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z7.p7 + 1)) AS Z8 (p8)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z8.p8 + 1)) AS Z9 (p9)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z9.p9 + 1)) AS Z10 (p10)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z10.p10 + 1)) AS Z11 (p11)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z11.p11 + 1)) AS Z12 (p12)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z12.p12 + 1)) AS Z13 (p13)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z13.p13 + 1)) AS Z14 (p14)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z14.p14 + 1)) AS Z15 (p15)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z15.p15 + 1)) AS Z16 (p16)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z16.p16 + 1)) AS Z17 (p17)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z17.p17 + 1)) AS Z18 (p18)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z18.p18 + 1)) AS Z19 (p19)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z19.p19 + 1)) AS Z20 (p20)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z20.p20 + 1)) AS Z21 (p21)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z21.p21 + 1)) AS Z22 (p22)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z22.p22 + 1)) AS Z23 (p23)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z23.p23 + 1)) AS Z24 (p24)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z24.p24 + 1)) AS Z25 (p25)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z25.p25 + 1)) AS Z26 (p26)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z26.p26 + 1)) AS Z27 (p27)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z27.p27 + 1)) AS Z28 (p28)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z28.p28 + 1)) AS Z29 (p29)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z29.p29 + 1)) AS Z30 (p30)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z30.p30 + 1)) AS Z31 (p31)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z31.p31 + 1)) AS Z32 (p32)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z32.p32 + 1)) AS Z33 (p33)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z33.p33 + 1)) AS Z34 (p34)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z34.p34 + 1)) AS Z35 (p35)

CROSS APPLY (SELECT CHARINDEX('|', cte.BulkColumn, Z35.p35 + 1)) AS Z36 (p36)

CROSS APPLY (SELECT SUBSTRING(cte.BulkColumn, 1, Z1.p1 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z1.p1 + 1, Z2.p2 - Z1.p1 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z2.p2 + 1, Z3.p3 - Z2.p2 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z3.p3 + 1, Z4.p4 - Z3.p3 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z4.p4 + 1, Z5.p5 - Z4.p4 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z5.p5 + 1, Z6.p6 - Z5.p5 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z6.p6 + 1, Z7.p7 - Z6.p6 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z7.p7 + 1, Z8.p8 - Z7.p7 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z8.p8 + 1, Z9.p9 - Z8.p8 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z9.p9 + 1, Z10.p10 - Z9.p9 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z10.p10 + 1, Z11.p11 - Z10.p10 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z11.p11 + 1, Z12.p12 - Z11.p11 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z12.p12 + 1, Z13.p13 - Z12.p12 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z13.p13 + 1, Z14.p14 - Z13.p13 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z14.p14 + 1, Z15.p15 - Z14.p14 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z15.p15 + 1, Z16.p16 - Z15.p15 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z16.p16 + 1, Z17.p17 - Z16.p16 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z17.p17 + 1, Z18.p18 - Z17.p17 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z18.p18 + 1, Z19.p19 - Z18.p18 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z19.p19 + 1, Z20.p20 - Z19.p19 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z20.p20 + 1, Z21.p21 - Z20.p20 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z21.p21 + 1, Z22.p22 - Z21.p21 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z22.p22 + 1, Z23.p23 - Z22.p22 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z23.p23 + 1, Z24.p24 - Z23.p23 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z24.p24 + 1, Z25.p25 - Z24.p24 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z25.p25 + 1, Z26.p26 - Z25.p25 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z26.p26 + 1, Z27.p27 - Z26.p26 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z27.p27 + 1, Z28.p28 - Z27.p27 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z28.p28 + 1, Z29.p29 - Z28.p28 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z29.p29 + 1, Z30.p30 - Z29.p29 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z30.p30 + 1, Z31.p31 - Z30.p30 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z31.p31 + 1, Z32.p32 - Z31.p31 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z32.p32 + 1, Z33.p33 - Z32.p32 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z33.p33 + 1, Z34.p34 - Z33.p33 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z34.p34 + 1, Z35.p35 - Z34.p34 - 1)

		   ,SUBSTRING(cte.BulkColumn, Z35.p35 + 1, Z36.p36 - Z35.p35 - 1)

	    ) AS ZSplit (col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11, col12, col13, col14, col15, col16, col17, col18, col19, col20, col21, col22, col23, col24, col25, col26, col27, col28, col29, col30, col31, col32, col33, col34, col35, col36)