Text File Import Problem

  • Hi to all,

    I have recently joined a firm to work as a Data Analyst, and part of my daily job requires me to go through a lot of data. I usually carry out most of my work in Excel, as it's enough for most of the work, and occasionally use SQL if the data is really huge. The issue I face is this: until now I have imported files into SQL (I'm using SQL Server 2008 Management Studio) from Excel sheets and had no problem, but now I have to import a text file of about 800 MB in size, and when I run the Import/Export Wizard I'm greeted with an error that says 'Data Conversion failed'. I need help regarding this. Before I go into the problem itself, please let me tell you: I am a newbie to SQL.

    Now, please allow me to describe my predicament in further detail.

    I am trying to import a text file of about 800 MB in size that has data in the following format:

    (A sample of data from the txt file)

    Then I choose the data source as Flat File from the drop-down, and everything seems to be fine until I try to run the package, when the following error hits me in the face:

    I've tried changing the data types for the source and destination, but to no avail. To be honest, I don't even know which data types to assign to the columns. The values of column 1 in the text file can go up to 1,000,000 or more.

    Please try and help me. Without resolving this issue I'm stuck, and everything has come to a standstill here... and there aren't many people to approach for help here either. :-(

    Thanks in advance to anyone willing to take a look.

  • Can you please provide some sample data from the text file? And let me know what code page you are using?

  • The data in the text file is something like this:

    COLUMN0 COLUMN1

    yahoo 510297

    kijiji 376608

    facebook login 350253

    gmail 313800

    windows live 292609

    img wlxrs 286278

    hotmail.com 272570

    you tube 263347

    http://www.facebook.com 252276

    google.ca 237275

    yahoo mail 214191

    microsoft messenger 190257

    nbcsports msnbc 186310

    msn 184693

    image xboxlive 181117

    ebay 175051

    yahoo.ca 146628

    plenty of fish 134887

    google maps 134701

    canada 411 134323

    craigslist 129121

    yahoo.com 128963

    google.com 128295

    face book 126520

    youtube.com 117249

    mapquest 116769

    plentyoffish 111863

    img2 wlxrs 96811

    porn 93996

    westjet 91128

    air canada 88518

    mls.ca 88008

    As you can see, the data in 'COLUMN1' is strictly numbers (positive integers), while the data in 'COLUMN0' can be anything (text, symbols, numbers, etc.)

    ...and the data is tab-delimited.

  • Would it be possible to attach an extract from the actual file you're having difficulties with? A hundred or so records, in the actual format the import is attempting to handle, would be useful.

    Typing the data yourself no doubt removes any of the problem data the import is encountering.

    BrainDonor

    Steve Hall
    Linkedin
    Blog Site

  • I had a similar problem. In regard to the truncation errors, what I initially did was create a staging table and set all my character fields to TEXT. That way, I was intentionally making them larger than my data so there'd be no truncation errors. As I gained more experience, I eventually learned how to use BCP to elegantly import text using proper column types and without using a staging table.
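
    Something along these lines is roughly what I did (just a sketch -- the file path, table, and column names here are placeholders, I'm using VARCHAR(MAX) in place of TEXT, and I'm showing BULK INSERT as the T-SQL counterpart of the bcp utility):

    -- Staging table with deliberately oversized character columns
    CREATE TABLE dbo.SearchTermStaging (
        SearchTerm VARCHAR(MAX) NULL,
        HitCount   VARCHAR(50) NULL  -- load as text first; convert to INT once the data is known to be clean
    );

    -- Load the tab-delimited file, skipping the header row
    BULK INSERT dbo.SearchTermStaging
    FROM 'C:\TEMP\Imports\SearchTerms.txt'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', FIRSTROW = 2);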

    As for your issue, I have two thoughts:

    • If your first column can contain "anything," you might want to check your collation to make sure it can handle it.
    • The numbers in your col2 look large, and it's possible (speculation on my part) that the int type might not be able to handle it. (I don't remember the int limit off the top of my head -- can anyone else expand on this?)

    Just my $0.02 . . .

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • OK, I will provide a sample. Actually, I cannot open the original file because it's about 830 MB in size and Notepad can barely handle it. I have tried opening it, but it takes an eon for Notepad to respond and it still doesn't open the file. So I imported the data into Excel, and though I couldn't import all of the data (there's a row limit in Excel), a large amount of it got imported. I'm now pasting over 100 rows from the Excel sheet:

    chfi 5985

    alc 5971

    garth brooks announces he's coming out of retirement thursday 5968

    gamespot 5950

    osap 5944

    canada 5938

    go transit 5937

    saskatoon star phoenix 5934

    192.168.1.1 5920

    e-bay 5915

    cibc personal banking 5913

    starfall 5913

    charity village 5909

    windows update 5886

    http://www.kijiji.com 5886

    google. 5881

    kelli carpenter 5875

    http://www.googleca 5846

    gogole 5839

    649 5831

    future shop.ca 5828

    680 news 5823

    superstore 5811

    "berlin heart" + charles tillman + washington 5797

    disney 5795

    funny games 5790

    toyota 5774

    rogers yahoo mail 5773

    canada.com 5771

    job bank canada 5761

    750-pound shark 5747

    flight status 5742

    flyerland.ca 5740

    great west life 5732

    literotica.com 5729

    toronto weather 5723

    http://www.westjet.com 5713

    gthl 5708

    yahoo email 5696

    avon.ca 5694

    cat food diet 5691

    yahoo answers 5682

    torontostar 5681

    allegiant air 5676

    no frills 5676

    http://www.f 5675

    ford canada 5674

    faceboo 5672

    hoymail 5669

    file extension pps 5668

    big boobs 5658

    scotia online 5650

    globeandmail 5640

    future shop canada 5638

    http://www.hotmail 5638

    waiter jon-barrett ingels 5630

    cbs 5628

    rosie o'donnell kelli carpenter 5628

    rogers video 5627

    ikea canada 5620

    cam4.com 5616

    download.com 5602

    the province 5593

    canadapost 5584

    lotteries 5584

    kate gosselin 5568

    nasza klasa 5555

    southwest airlines 5552

    days of our lives 5549

    mass murderer dies 5548

    kate beckinsale 5547

    blackberry 5510

    facebook applications 5500

    halloween costume ideas 5496

    perezhilton 5496

    plenty of fish in the sea 5495

    surf the channel 5487

    google news 5486

    justintv 5480

    http://www.sympatico.ca 5438

    rogers.yahoo.com 5427

    aol mail 5426

    yahoo fantasy hockey 5412

    cbc sports 5410

    img microsoft 5408

    maria shriver apologizes 5408

    tim hortons 5402

    farmville 5395

    party packagers 5395

    rogers.ca 5385

    toyota canada 5384

    royalbank.com 5380

    yotube 5376

    caa 5362

    adam lambert 5357

    pbskids 5342

    the hun 5338

    sex videos 5332

    rona.ca 5331

    meteomedia 5324

    q107 5322

    webmail 5314

    silverdaddies 5312

    home depot.ca 5302

    soupy sales 5300

    vancity 5300

    tv shack 5299

    coronation street 5298

    dr oz 5297

    localtvmatters.ca 5291

    ups tracking 5291

    servicecanada 5289

    i3 microsoft 5283

    sarnia observer 5276

    shaw cable 5274

    slutload 5267

    new york times 5254

    virgin mobile 5242

    om 5241

    tdcanadatrust.ca 5241

    'jon & kate' 5237

    avg free download 5236

    hotwire 5214

    mail 5212

  • Sheez . . . that's quite the variety of data.

    Here's how I'd tackle it . . .

    • For the first column, make it a VARCHAR(MAX) column (equivalent to TEXT, which Microsoft is going to deprecate)
    • Make the second column INT

    One thing to bear in mind: the single quotes (') in the first column will likely cause problems, because T-SQL makes extensive use of them.
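
    For example, the destination table could be defined something like this (the table and column names here are only illustrative):

    CREATE TABLE dbo.SearchTerms (
        SearchTerm VARCHAR(MAX) NULL,  -- free-form text: phrases, URLs, numbers, symbols
        HitCount   INT NULL            -- the sample values fit comfortably in INT
    );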

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • One other thought (speaking of stray characters): data cleansing is always a good thing. For example, since you're importing from Excel, it's probably using the TAB character as a delimiter. If any TAB characters exist in your first column, it's going to halt your import.

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • @ray K

    You mean change the column data types for the 'Data Source'? Because I can't... the defaults are as follows:

    Column 0 - string [DT_STR]

    Column 1 - single-byte signed integer [DT_I1]

    ...and they remain the same after pressing 'Suggest Types'...

    ...and there are no varchar or nvarchar choices available in the drop-down. I am using the Import/Export Wizard, BTW.

  • No. What I'm saying is change the destination column to VARCHAR(MAX). That would make the column large enough to handle variable-length character data up to whatever the upper limit is (I don't remember the limit off the top of my head). That should be enough to accommodate that column and take care of any truncation issues.

    Also (and if I'm mistaken with this, someone please correct me), I believe VARCHAR(MAX) is flexible enough to be able to handle different types of characters.

    At the same time, you also need to make sure your data actually can be imported. I had a situation where I was importing log files, and the format was changed in the middle of a month (specifically, an extra column was added). I had to make an adjustment in order to accommodate the extra column. But because these were log files, I wasn't allowed to make any edits to the data source files themselves.

    Your situation, however, looks like a different case. Your column contains inconsistent data, and I'm going to assume that you can safely edit the data source without compromising anything. Strange data characters (like the TAB character I mentioned before, for example) can stop an import in its tracks. That's what data cleansing is: making sure your data is in a consistent format so that it's usable by the database.

    I also mentioned the single-quote; that shouldn't be too much of a problem with imports (unless you're using it as a text qualifier), but it could be an issue with INSERTS, UPDATES, and SELECTS. (For example: "select * from TABLE where COLUMN = 'O''Brien'" -- note that you need two single quotes, because one single quote would result in an error.)

    Hope this helps. Good luck.

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • I just remembered something . . .

    VARCHAR(MAX) is equivalent to the TEXT type in SQL Server. I mentioned TEXT as an option earlier, but Microsoft is deprecating that data type and suggests you use VARCHAR(MAX) instead.

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • I've taken the data you posted and turned it into something that may actually resemble your data. Because you just copied it into your message, it came through as one column, so I have attached a spreadsheet here that has two columns.

    In future try to include the precise data in the correct format so we're all working from the same starting point. It isn't always easy, but I also find that going through that process sometimes helps me find the answer before I complete the post.

    There's an article about this at: http://qa.sqlservercentral.com/articles/Best+Practices/61537/.

    The code I have pasted below is a very basic import from an Excel spreadsheet into a temporary table. When you run this, I would like to know if it gives you any errors. We should then be able to work from this starting point.

    I understand that you have very little SQL knowledge, so we'll keep it as simple as possible.

    The core of it creates a temporary table with two varchar columns of the maximum size that SS 2008 can create.

    Then it uses the OPENROWSET command to import from the spreadsheet (you can change the filename and location to whatever you require). This should import the data from the first sheet of that spreadsheet into the temporary table.

    The SELECT statement that follows is just to show whatever is in that temp table.

    With the number of records in your spreadsheet it may take a while. I've never tried this method for anything other than curiosity, so I couldn't begin to guess how long.

    When you run this code you may get a message along the lines of:

    "SQL Server blocked access to STATEMENT 'OpenRowset/OpenDatasource' of component 'Ad Hoc Distributed Queries' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'Ad Hoc Distributed Queries' by using sp_configure. For more information about enabling 'Ad Hoc Distributed Queries', see "Surface Area Configuration" in SQL Server Books Online. "

    If you do see this, then you will also need to run the code that is commented out. Remove the '--' from the start of each line. This changes the SQL Server configuration to allow the OPENROWSET command to run, and then removes the permission again at the end.

    The temporary table is just while we're messing about. You can create a permanent table instead of course.

    --drop table #ExcelImport

    --sp_configure 'show advanced options', 1
    --reconfigure
    --go
    --sp_configure 'Ad Hoc Distributed Queries', 1
    --reconfigure

    CREATE TABLE #ExcelImport(
        ImportText VarChar(MAX),
        ImportNumber VarChar(MAX))

    INSERT INTO #ExcelImport(ImportText,ImportNumber)
    SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
        'Excel 8.0;Database=C:\TEMP\Imports\TestImport.xls;HDR=YES',
        'SELECT * FROM [Sheet1$]')

    SELECT * FROM #ExcelImport

    --sp_configure 'show advanced options', 1
    --reconfigure
    --go
    --sp_configure 'Ad Hoc Distributed Queries', 0
    --reconfigure

    Steve Hall
    Linkedin
    Blog Site

  • >RayK

    Hi,

    Thank you very much for your suggestions. :)

    >Braindonor

    Hi,

    Firstly, thank you very much for your interest and patience. One couldn't have approached my problem in a better way, I suppose, and I appreciate it.

    Secondly, I've read the Forum Posting Etiquette and I believe I now have a clue as to how to ask for help on a professional forum. Thanks, I'll make sure I follow it in the future. :)

    Now, I've tried the code you provided and was greeted by the error you predicted. So I removed the comments from the code, tried to execute it, and encountered the following errors:

    Msg 102, Level 15, State 1, Line 2
    Incorrect syntax near 'sp_configure'.
    Msg 102, Level 15, State 1, Line 19
    Incorrect syntax near 'show advanced options'.
    Msg 15123, Level 16, State 1, Procedure sp_configure, Line 51
    The configuration option 'Ad Hoc Distributed Queries' does not exist, or it may be an advanced option.

    So I also searched for info about Surface Area Configuration and found that I can tweak the settings here: Server --> Facets --> Surface Area Configuration. While I couldn't find the 'Ad Hoc Distributed Queries' option, there was one that read 'AdHocRemoteQueriesEnabled' which was 'False'. I turned it to 'True', tried to execute the code again, and it popped up the following error message:

    OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "(null)" returned message "Unspecified error".
    Msg 7303, Level 16, State 1, Line 14
    Cannot initialize the data source object of OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "(null)".

    I guess both those options are not interchangeable after all...

    So, after that I searched for how to enable 'Ad Hoc Distributed Queries' and found some info here:

    http://msdn.microsoft.com/en-us/library/ms187569.aspx

    However, it was the same as what you have written in the code you provided. So, I don't understand:

    1. Why it throws up a syntax error when I try to execute your code.

    2. Why I can't see the 'Ad Hoc Distributed Queries' option itself in the Surface Area Configuration window.

  • You were right - AdHocRemoteQueriesEnabled is the option you wanted. To see what the settings are for the various options you can run

    SELECT * FROM sys.configurations

    ORDER BY name

    and you should now see 'Ad Hoc Distributed Queries' with a value of 1.
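
    As for the syntax errors you hit: that is probably just because sp_configure was no longer the first statement in its batch once the comments were removed. If you need to run it again, the usual pattern (just a sketch) looks like this:

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    GO
    EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
    RECONFIGURE;
    GO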

    The only reference I've found for the Jet error message is at http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/9605444f-a6e1-40ab-be9f-f40362672039, which then points to http://support.microsoft.com/default.aspx?scid=kb;en-us;239114 for the latest version of the Jet database engine.

    I would suggest taking the steps listed in it to check you have the latest files installed. Be careful though - there are different downloads depending upon which version of Windows you are using. If in doubt and you have PC support staff where you are, ask them.

    BrainDonor

    Steve Hall
    Linkedin
    Blog Site

  • Hi folks,

    >BrainDonor: Thank you, BrainDonor. :-)

    I have finally got around the problem of importing the data into the SQL database from the text file. Here's what I did: I split the 800 MB file into 30 smaller text files and tried to import each one. When they failed to import again, I did a little fiddling around and found that the whole problem was with the text file ENCODING. All of them (including the actual 800 MB original) were in ANSI format, and my SQL Import/Export Wizard accepted only UNICODE. So I was able to import some files after changing their encoding from ANSI to UNICODE.

    Now I've got two simple questions:

    1.) I know there are a few editors out there that can open very large text files (PFE, Gvim, etc.), but is there one that would let me open my 800 MB file (ANSI) and save it as a UNICODE text file? I haven't seen that option in any of the text editors I've used so far. :-(

    or

    2.) I have 30 text files (of about 25 MB each), and each time I import one of them a NEW table is created in the SQL database. So my question is: is there any way to append/combine the tables after/before importing so I would have one big table instead of 30 smaller ones?
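
    (I'm guessing it might be something along these lines, but I'm not sure -- the table and column names here are just made up:)

    -- Append the rows from one of the smaller imported tables into a single combined table
    INSERT INTO dbo.AllSearchTerms (SearchTerm, HitCount)
    SELECT SearchTerm, HitCount FROM dbo.SearchTerms_Part01;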

    Can someone help me? Any help appreciated.

    Thanks in Advance.
