December 16, 2009 at 5:37 am
Hi to all,
I recently joined a firm as a Data Analyst, and part of my daily job requires me to go through a lot of data. I usually carry out most of my work in Excel, as it's enough for most tasks, and occasionally use SQL if the data is really huge. Until now I have imported files into SQL (I'm using SQL Server 2008 Management Studio) from Excel sheets with no problem, but now I have to import a text file of about 800 MB in size, and when I run the Import/Export Wizard I'm greeted with an error that says 'Data Conversion failed'. I need help with this. Before I go into the problem itself, please let me tell you: I am a newbie to SQL.
Now, please allow me to describe my predicament in further detail.
I am trying to import a text file of about 800 mb in size that has data in the following format:
(A sample of data from the txt file)
Then I choose the data source as Flat File from the drop-down, and everything seems to be fine until I try to run the package, when the following error hits me in the face:
I've tried changing the data types for the source and destination, but to no avail. To be honest, I don't even know which data types to assign to the columns. The values in column 1 of the text file can go up to 1,000,000 or more.
Please try and help me; without resolving this issue I'm stuck and everything has come to a standstill here. There aren't many people to approach for help here either :-(
Thanks in advance to anyone willing to take a look.
December 16, 2009 at 5:48 am
Can you please provide some data from the text file? And let me know what code page you are using.
December 16, 2009 at 6:17 am
The data in the text file is something like this:
COLUMN0 COLUMN1
yahoo 510297
kijiji 376608
facebook login 350253
gmail 313800
windows live 292609
img wlxrs 286278
hotmail.com 272570
you tube 263347
http://www.facebook.com 252276
google.ca 237275
yahoo mail 214191
microsoft messenger 190257
nbcsports msnbc 186310
msn 184693
image xboxlive 181117
ebay 175051
yahoo.ca 146628
plenty of fish 134887
google maps 134701
canada 411 134323
craigslist 129121
yahoo.com 128963
google.com 128295
face book 126520
youtube.com 117249
mapquest 116769
plentyoffish 111863
img2 wlxrs 96811
porn 93996
westjet 91128
air canada 88518
mls.ca 88008
As you can see, the data in 'COLUMN1' is strictly numbers (positive integers), while the data in 'COLUMN0' can be anything (text, symbols, numbers, etc.)
..........and the data is tab-delimited.
December 16, 2009 at 8:09 am
Would it be possible to attach an extract from the actual file you're having difficulties with? A hundred or so records, in the actual format the import is attempting to handle, would be useful.
Typing the data yourself no doubt removes whatever problem data the import is encountering.
BrainDonor
December 16, 2009 at 8:12 am
I had a similar problem. In regard to the truncation errors, what I initially did was create a staging table and set all my character fields to TEXT. That way, I was intentionally making them larger than my data so there'd be no truncation errors. As I gained more experience, I eventually learned how to use BCP to elegantly import text using proper column types and without using a staging table.
As for your issue, I have two thoughts:
Just my $0.02 . . .
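As a rough sketch of the staging-table idea above (using VARCHAR(MAX) rather than TEXT, and with made-up table, column, and file names — adjust to your own environment):

```sql
-- Staging table: oversized character columns so nothing gets truncated on import
CREATE TABLE dbo.SearchTermsStaging (
    SearchTerm VARCHAR(MAX),
    HitCount   VARCHAR(MAX)
);

-- Load it with bcp from the command line, e.g. (tab-delimited, character mode):
--   bcp YourDb.dbo.SearchTermsStaging in "C:\TEMP\data.txt" -S YourServer -T -c -t "\t"

-- Then move cleansed rows into a properly typed table
CREATE TABLE dbo.SearchTerms (
    SearchTerm VARCHAR(500),
    HitCount   INT
);

INSERT INTO dbo.SearchTerms (SearchTerm, HitCount)
SELECT SearchTerm, CAST(HitCount AS INT)
FROM dbo.SearchTermsStaging
WHERE ISNUMERIC(HitCount) = 1;   -- skip any rows where the count isn't numeric
```

The point of the two-step approach is that the import itself can never fail on truncation or conversion; any bad rows surface later, in the INSERT...SELECT, where you can inspect and fix them.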
+--------------------------------------------------------------------------------------+
Check out my blog at https://pianorayk.wordpress.com/
December 16, 2009 at 8:36 am
OK, I will provide a sample. Actually, I cannot open the original file because it's about 830 MB in size and Notepad can barely handle it. I have tried opening it, but it takes an eon for Notepad to respond and it still doesn't open the file. So I imported the data into Excel, and though I couldn't import all of it (as there's a row limit in Excel), a large amount of it got imported. I'm now pasting over 100 rows from the Excel sheet:
chfi 5985
alc 5971
garth brooks announces he's coming out of retirement thursday 5968
gamespot 5950
osap 5944
canada 5938
go transit 5937
saskatoon star phoenix 5934
192.168.1.1 5920
e-bay 5915
cibc personal banking 5913
starfall 5913
charity village 5909
windows update 5886
google. 5881
kelli carpenter 5875
http://www.googleca 5846
gogole 5839
649 5831
future shop.ca 5828
680 news 5823
superstore 5811
"berlin heart" + charles tillman + washington 5797
disney 5795
funny games 5790
toyota 5774
rogers yahoo mail 5773
canada.com 5771
job bank canada 5761
750-pound shark 5747
flight status 5742
flyerland.ca 5740
great west life 5732
literotica.com 5729
toronto weather 5723
gthl 5708
yahoo email 5696
avon.ca 5694
cat food diet 5691
yahoo answers 5682
torontostar 5681
allegiant air 5676
no frills 5676
http://www.f 5675
ford canada 5674
faceboo 5672
hoymail 5669
file extension pps 5668
big boobs 5658
scotia online 5650
globeandmail 5640
future shop canada 5638
http://www.hotmail 5638
waiter jon-barrett ingels 5630
cbs 5628
rosie o'donnell kelli carpenter 5628
rogers video 5627
ikea canada 5620
cam4.com 5616
download.com 5602
the province 5593
canadapost 5584
lotteries 5584
kate gosselin 5568
nasza klasa 5555
southwest airlines 5552
days of our lives 5549
mass murderer dies 5548
kate beckinsale 5547
blackberry 5510
facebook applications 5500
halloween costume ideas 5496
perezhilton 5496
plenty of fish in the sea 5495
surf the channel 5487
google news 5486
justintv 5480
rogers.yahoo.com 5427
aol mail 5426
yahoo fantasy hockey 5412
cbc sports 5410
img microsoft 5408
maria shriver apologizes 5408
tim hortons 5402
farmville 5395
party packagers 5395
rogers.ca 5385
toyota canada 5384
royalbank.com 5380
yotube 5376
caa 5362
adam lambert 5357
pbskids 5342
the hun 5338
sex videos 5332
rona.ca 5331
meteomedia 5324
q107 5322
webmail 5314
silverdaddies 5312
home depot.ca 5302
soupy sales 5300
vancity 5300
tv shack 5299
coronation street 5298
dr oz 5297
localtvmatters.ca 5291
ups tracking 5291
servicecanada 5289
i3 microsoft 5283
sarnia observer 5276
shaw cable 5274
slutload 5267
new york times 5254
virgin mobile 5242
om 5241
tdcanadatrust.ca 5241
'jon & kate' 5237
avg free download 5236
hotwire 5214
mail 5212
December 16, 2009 at 9:44 am
Sheez . . . that's quite the variety of data.
Here's how I'd tackle it . . .
One thing to bear in mind: the single quotes (') in the first column will likely cause problems, because T-SQL makes extensive use of them.
+--------------------------------------------------------------------------------------+
Check out my blog at https://pianorayk.wordpress.com/
December 16, 2009 at 9:49 am
One other thought (speaking of stray characters): data cleansing is always a good thing. For example, since you're importing from Excel, it's probably using the TAB character as a delimiter. If any TAB characters exist in your first column, it's going to halt your import.
+--------------------------------------------------------------------------------------+
Check out my blog at https://pianorayk.wordpress.com/
December 16, 2009 at 9:54 am
@Ray K
You mean change the column data types for the data source? Because I can't; the defaults are as follows:
Column 0 - string [DT_STR]
Column 1 - single-byte signed integer [DT_I1]
...and they remain the same after pressing 'Suggest Types'. There are no varchar or nvarchar choices available in the drop-down. I am using the Import/Export Wizard, BTW.
December 16, 2009 at 11:51 am
No. What I'm saying is change the destination to VARCHAR(MAX). That would make the column large enough to handle variable-length character data up to the type's upper limit (2^31-1 bytes, about 2 GB). That should be enough to accommodate that column and take care of any truncation issues.
Also (and if I'm mistaken with this, someone please correct me), I believe VARCHAR(MAX) is flexible enough to be able to handle different types of characters.
At the same time, you also need to make sure your data actually can be imported. I had a situation where I was importing log files, and the format was changed in the middle of a month (specifically, an extra column was added). I had to make an adjustment in order to accommodate the extra column. But because these were log files, I wasn't allowed to make any edits to the data source files themselves.
Your situation, however, looks like a different case. Your column contains inconsistent data, and I'm going to assume that you can safely edit the data source without compromising anything. Strange data characters (like the TAB character I mentioned before, for example) can stop an import in its tracks. That's what data cleansing is: making sure your data is in a consistent format so that it's usable by the database.
I also mentioned the single quote; that shouldn't be too much of a problem with imports (unless you're using it as a text qualifier), but it could be an issue with INSERTs, UPDATEs, and SELECTs. (For example: "select * from TABLE where COLUMN = 'O''Brien'" -- note that you need two single quotes, because one single quote would result in an error.)
Hope this helps. Good luck.
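To make the quote-doubling concrete, here's a minimal sketch (the table and column names are made up for illustration; the search term comes from the sample data above):

```sql
-- Inside a string literal, write a literal single quote as two single quotes.
-- A lone quote would end the string early and cause a syntax error.
SELECT *
FROM dbo.SearchTerms              -- hypothetical table holding the imported rows
WHERE SearchTerm = 'rosie o''donnell kelli carpenter';
```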
+--------------------------------------------------------------------------------------+
Check out my blog at https://pianorayk.wordpress.com/
December 16, 2009 at 11:54 am
I just remembered something . . .
VARCHAR(MAX) is roughly equivalent to the TEXT type in SQL Server. You could change the column type to TEXT instead, but Microsoft is deprecating that data type and suggests you use VARCHAR(MAX) going forward.
+--------------------------------------------------------------------------------------+
Check out my blog at https://pianorayk.wordpress.com/
December 18, 2009 at 3:54 am
I've taken the data you posted and turned it into something that may actually resemble your data. Because you just copied it into your message it was one column, so I have attached a spreadsheet here that has two columns.
In future try to include the precise data in the correct format so we're all working from the same starting point. It isn't always easy, but I also find that going through that process sometimes helps me find the answer before I complete the post.
There's an article about this at: http://qa.sqlservercentral.com/articles/Best+Practices/61537/.
The code I have pasted below is a very basic import from an Excel spreadsheet into a temporary table. When you run this, I would like to know if it gives you any errors. We should then be able to work from this starting point.
I understand that you have very little SQL knowledge, so we'll keep it as simple as possible.
The core of it creates a temporary table with two varchar columns of the maximum size that SS 2008 can create.
Then it uses the OPENROWSET command to import from the spreadsheet (you can change the filename and location to whatever you require). This should import the data from the first sheet of that spreadsheet into the temporary table.
The SELECT statement that follows is just to show whatever is in that temp table.
With the number of records in your spreadsheet it may take a while. I've never tried this method for any reason other than curiosity, so I couldn't begin to guess how long.
When you run this code you may get a message along the lines of:
"SQL Server blocked access to STATEMENT 'OpenRowset/OpenDatasource' of component 'Ad Hoc Distributed Queries' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'Ad Hoc Distributed Queries' by using sp_configure. For more information about enabling 'Ad Hoc Distributed Queries', see "Surface Area Configuration" in SQL Server Books Online. "
If you do see this, then you need to also run the code that is commented out. Remove the '--' from the start of each line. This changes the SQL Server configuration to allow the OPENROWSET command to run, and then removes the permission again at the end.
The temporary table is just while we're messing about. You can create a permanent table instead of course.
--DROP TABLE #ExcelImport
--EXEC sp_configure 'show advanced options', 1
--RECONFIGURE
--GO
--EXEC sp_configure 'Ad Hoc Distributed Queries', 1
--RECONFIGURE
--GO
CREATE TABLE #ExcelImport(
ImportText VARCHAR(MAX),
ImportNumber VARCHAR(MAX))

INSERT INTO #ExcelImport(ImportText, ImportNumber)
SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\TEMP\Imports\TestImport.xls;HDR=YES',
'SELECT * FROM [Sheet1$]')

SELECT * FROM #ExcelImport

--EXEC sp_configure 'Ad Hoc Distributed Queries', 0
--RECONFIGURE
--GO
--EXEC sp_configure 'show advanced options', 0
--RECONFIGURE
December 21, 2009 at 2:35 am
>RayK
Hi,
Thank you very much for your suggestions :)
>Braindonor
Hi,
Firstly, thank you very much for your interest and patience. One couldn't have approached my problem in a better way, I suppose, and I appreciate it.
Secondly, I've read the forum posting etiquette and I believe I now have a clue as to how to ask for help on a professional forum. Thanks, I'll make sure I follow it in the future :)
Now -- I've tried the code you provided and was greeted by the error you predicted. So I removed the comments from the code, tried to execute it, and encountered the following error:
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near 'sp_configure'.
Msg 102, Level 15, State 1, Line 19
Incorrect syntax near 'show advanced options'.
Msg 15123, Level 16, State 1, Procedure sp_configure, Line 51
The configuration option 'Ad Hoc Distributed Queries' does not exist,
or it may be an advanced option.
So I also searched for info about Surface Area Configuration and found that I can tweak the settings here: Server --> Facets --> Surface Area Configuration. While I couldn't find the 'Ad Hoc Distributed Queries' option, there was one that read 'AdHocRemoteQueriesEnabled', which was 'False'. I turned it to 'True', tried to execute the program again, and it popped up the following error message:
OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "(null)"
returned message "Unspecified error".
Msg 7303, Level 16, State 1, Line 14
Cannot initialize the data source object of OLE DB provider
"Microsoft.Jet.OLEDB.4.0" for linked server "(null)".
I guess those two options are not interchangeable after all....
So after that I searched for how to enable 'Ad Hoc Distributed Queries' and found some info here:
http://msdn.microsoft.com/en-us/library/ms187569.aspx
However, it was the same as what you had written in the code you provided. So I don't understand why:
1. it throws a syntax error when I try to execute your code;
2. I can't see the 'Ad Hoc Distributed Queries' option itself in the Surface Area Configuration window.
December 21, 2009 at 3:39 am
You were right - AdHocRemoteQueriesEnabled is the option you wanted. To see what the settings are for the various options you can run
SELECT * FROM sys.configurations
ORDER BY name
and you should now see 'Ad Hoc Distributed Queries' with a value of 1.
The only reference I've found for the Jet error message is at http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/9605444f-a6e1-40ab-be9f-f40362672039, which then points to http://support.microsoft.com/default.aspx?scid=kb;en-us;239114 for the latest version of the Jet database engine.
I would suggest taking the steps listed in it to check you have the latest files installed. Be careful though - there are different downloads depending upon which version of Windows you are using. If in doubt and you have PC support staff where you are, ask them.
BrainDonor
December 22, 2009 at 11:06 pm
Hi folks,
>Braindonor: Thank you, Braindonor :-)
I have finally gotten around the problem of importing the data into the SQL database from the text file. Here's what I did: I split the 800 MB file into 30 smaller text files and tried to upload each one. When they failed to import again, I did a little fiddling around and found that the whole problem was with the text file ENCODING. All of them (including the actual 800 MB original) were in ANSI format, and my SQL Import/Export Wizard accepted only Unicode. So I was able to import some files after changing their encoding from ANSI to Unicode.
Now I've got two simple questions:
1.) I know there are a few editors out there that can open very large text files (PFE, Gvim, etc.), but is there one that would let me open my 800 MB file (ANSI) and save it as a Unicode text file? I haven't seen that option in any of the text editors I've used so far :-(
or
2.) I have 30 text files (of about 25 MB each), and each time I import one of them a NEW table is created in the SQL database. So my question is: is there any way to append/combine the tables after/before importing, so I would have one big table instead of 30 smaller ones?
Can someone help me? Any help appreciated.
Thanks in advance.
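One possible route for both questions -- a sketch only, untested against your file, with the file path, table, and column names all assumed: BULK INSERT can read an ANSI file directly if you tell it the code page, which would sidestep the Unicode conversion entirely, and repeatedly inserting into (or selecting into) the same table gives you one big table instead of 30.

```sql
-- Hypothetical target table (names are assumptions)
CREATE TABLE dbo.SearchTerms (
    SearchTerm VARCHAR(500),
    HitCount   INT
);

-- CODEPAGE = 'ACP' tells SQL Server to interpret the file using the
-- Windows ANSI code page, so no conversion to Unicode is needed.
BULK INSERT dbo.SearchTerms
FROM 'C:\TEMP\Imports\BigFile.txt'
WITH (
    CODEPAGE        = 'ACP',
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2        -- skip the COLUMN0/COLUMN1 header row
);

-- If you stick with the 30 separate tables, you can still combine them
-- afterwards into one (assuming they all have the same two columns):
INSERT INTO dbo.SearchTerms (SearchTerm, HitCount)
SELECT COLUMN0, COLUMN1 FROM dbo.Import01
UNION ALL
SELECT COLUMN0, COLUMN1 FROM dbo.Import02;
-- ...and so on for the remaining tables.
```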