How can you tell if a file has finished downloading?

  • Currently we have a process where 4 files are ftp'd down to our server. We have DTS package which tests for the existence of these files and once these exist, the DTS package loads the data from these files into a SQL database. Sometimes, however, though the files exist they haven't finished being downloaded so that when the DTS package tries to process them the package is saying the files are empty (though they're not).

    Thanks for any help.

  • Not sure exactly how you would do it in DTS but I'm thinking that if you built some error logic around the fact that the files are empty and that the file modified date is up to snuf, that would be your ticket...

    When I do this type of thing in T-SQL, I will sometimes make a directory dip using xp_CmdShell (no security lectures for the SA, please) and that will tell me the length of the files (0 bytes means empty), if the files exists, and what the date of the file is...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Here's the example xp_CmdShell code I will sometimes use... note that I always include the hardcoded DOS command DIR and I always include the path in double quotes to help prevent an injection attack even though this proc can only be executed by SA privs.  I paranoid but I'm not sure I'm paranoid enough...

    --=======================================================================================

    --===== Get the directory information and the LastModifiedDate for lines with files only.

    --=======================================================================================

    --===== Setup to do a "DIR" with the following switches

         -- /TW  = Date/Time file was last written to (LastModifiedDate)

         -- /-C  = List number of bytes without commas

         -- Enclose the @pPath variable in quotes to prevent SQL Injection attacks

        SET @Command = 'DIR "' + @pPath + '" /TW /-C'

    --===== Execute the "DIR" command and save the output in #DosOutput

         -- (order preserved by the Primary Key)

     INSERT INTO #DosOutput (Data)

       EXEC Master.dbo.xp_CmdShell @Command

    --===== Parse the Dos output into the file info table.

         -- The criteria in the WHERE clause ensures only file info is returned

     SELECT

            IDENTITY(INT,1,1) AS RowNum,

            SUBSTRING(Data,40,256) AS [FileName],

            CAST(NULL AS DATETIME) AS CreateDate, --Populated on next step

            CONVERT(DATETIME,SUBSTRING(Data,1,23)) AS LastModifiedDate,

            CAST(SUBSTRING(Data,22,17) AS BIGINT) AS Bytes

       INTO #FileInfo

       FROM #DosOutput

      WHERE SUBSTRING(Data,15,1) = ':' --Row has a date/time on it

        AND Data NOT LIKE '%<DIR>%'    --Row is not a directory listing

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Can you change the process? How about when the FTP if finished in the script to rename the files, then have the DTS project look for the new name?


    Shalom!,

    Michael Lee

  • It sounds as if the FTP process is separate from the DTS package. Why not add FTP tasks to the beginning of the DTS package? That way, it will be a synchronous process, and you won't have to setup a wait loop to make sure sure the files are completely downloaded.  

    Sometimes I use a semaphore file method in cases similar to yours. One way to do that is to write a 1 byte text file with a fixed filename to the same folder used to store the downloaded files after the ftp process completes.  Your DTS package would have to wait until that file exists before proceeding. When your DTS package completes, it would delete the semaphore file. Of course, you would want to add error checking, and a timeout feature so your DTS package doesn't wait indefinately.

     

     

  • Thanks for all your replies. Very helpful.

    The FTP process is separate from the DTS package. We're actually going with Michael's Lee suggestion that the FTP process renames the file after it's finished downloading, and amend the DTS package to look for the new file name (or for the FTP process to rename it back to the original name (ie 2 renames)  so that the DTS package doesn't have to be changed).

  • You might consider having the ftp process send a fifth file, use this fifth file as the indicator that your other four files have completed when the DTS runs.  this new file doesn't have to have much of anything in it, its existance tells the DTS package that the other four files are ready to be uploaded into the database.

  • Hi Bill

    Your idea has also been suggested as a solution, in addition to the renaimg of files, to the powers that be at my workplace. They'll decide what solution they'll do with.

    Thanks for yoru suggestion.

  • There are 2 other alternatives.  The first is to write a Service program that monitors the ftp log file (a text file) looking for a successful ftp of each of the files, and when all the files have been successfully received, the service could start a job in SQL Server that runs the DTS package to import the files.  The second alternative, would be similiar but would have the ftp service log the transfers to a SQL database table, and a trigger on the table would start the SQL job after all the files have been successfully transfered.

    At my last job, we used the second method.  We had two servers, A and B.  Server A had the production databases, and Server B was an Web/FTP server that also ran SQL Server.  We had the FTP server logging all transfers to a table in an FTP database and established triggers for all of the files that were transferred to Server B (we used virtual directories so the files actually resided on Server A), and when the files were successfully transferred, the triggers on the table ensured that the proper job on Server A was started to import the files.  As new files were added for import into the databases, we just added new triggers to the FTP table.  This worked great, except when the sys admin turned off SQL Logging for the FTP server, but once restarted, we never had any problems with our nightly or near real-time data transfers.

    Lynn

    Lynn

  • This is the method I would suggest.

  • Thanks for your suggestions Lynn and Antares686.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply