Using the WMI Event Watcher Task in SSIS to Process Data Files

  • One thing to watch out for here with "processing a file as soon as it arrives" is that you don't know if that file is done copying or not. Let's say this was an FTP drop directory, or the user was simply copying a large file. Well, the FileSystemWatcher (and probably the WMI events) will trigger as soon as this file is created in the directory, even though it hasn't finished copying. So "processing as soon as it arrives" is not the right solution. We have done this by creating a Windows Service which uses the .NET TPL (Task Parallel Library) to queue files into a "monitoring" method which checks a file to see if it hasn't been updated in over 30 secs (which is configurable). Once the file hasn't been written to in that interval amount, we know it's "somewhat" safe to move it to a "processing" folder and begin working with it. There's a lot more to consider when doing projects like this.

  • Upsert (7/2/2012)


    kpatrick (7/2/2012)


    Some questions to consider:

    1. What kind of Windows Server security permissions does the identity running the SSIS package need in order to execute the WMI query? In a real world environment, I'd assume the file would be dropped to a separate server (maybe a network filer) which is where the WMI connection would need to be pointed, and my SSIS identity is going to be running with the absolute least privileges needed.

    2. What is the impact to your SQL server if you have a long-running SSIS package (inside a SQL agent job, presumably) like in your example and you need to either restart the machine or the SQL Agent service? .NET Windows services have events you can handle in your code that allow you to gracefully stop your file watcher. Will the restarts hang when the SSIS package is sitting there? Do you have to manually abort the SQL Agent Job?

    3. If a file doesn't get dropped for a couple days, what is the impact to the server to having a SQL agent job running for several days?

    1. My guess is that it would be running under NT\Autority. If set correctly on the sql server upon installation, then it can read files anywhere on the server easily.

    Can you please clarify? What are you saying would be running under "NT\Authority", the SSIS package or part of the WMI process, or both?

    3. I would instead just schedule the job agent to scan every 30 minutes if your timing is days...that will minimize server stress. You don't want to ever have a package open for days looking for a file.

    I tried to answer to some of kpatrick's concerns surrounding question 3 earlier in this thread. I tried to paint a proper picture of the tradeoff between memory and CPU in the article. It is my contention that using a technique that watches for files and reacts immediately actually reduces server stress when compared with the cumulative impact of the traditional start-check-and-exit technique. The technique where we watch for files may be a departure from the norm for many people, but I have not run into any technical reason why a package cannot be open for days, or even weeks, watching for a file to arrive via a SQL Agent job. Can you please elaborate on why you would avoid having a job running for days or weeks?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • SpongeBob (7/3/2012)


    One might event setup a SQL Agent "event" to watch for the file and then kick off the processing job as the response.

    Just a thought.

    That's a neat idea. I never thought of using the WMI functionality built into SQL Agent for this. We would still be on the hook to wait until we could gain exclusive access to the file in the SSIS package, but all of our "watchers" could be managed inside SQL Agent. Thanks for posting.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • tymberwyld (7/5/2012)


    One thing to watch out for here with "processing a file as soon as it arrives" is that you don't know if that file is done copying or not. Let's say this was an FTP drop directory, or the user was simply copying a large file. Well, the FileSystemWatcher (and probably the WMI events) will trigger as soon as this file is created in the directory, even though it hasn't finished copying. So "processing as soon as it arrives" is not the right solution.

    I addressed this in great detail in the article. Please see the Design section and Steps 11 through 16 of the demo section.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I'm a bit confused about a couple of things in this article.

    Firstly, you've defined a FileName variable of BusinessData.YYYYMMDD.xlsx which is implying that a file can arrive with a datestamped format and indeed your WQL query will pick up the file given you are using a prefix/extension with a like operator. However, the next step you have done is moving the file to a processing directory in which you use a file system task that references your User::DropFile variable which contains the YYYYMMDD. Am I to assume that the file task will automatically recognise the file name that WMI has found and rename appropriately?

    Secondly, the C# script you have provided here doesn't work in my environment - the Dts namespace is not recognised. I am using VS2008 and SQL Server 2008 so perhaps there is a different approach I would need to use?

    In any event, here is the problem I am attempting to address:

    We receive multiple times a day from our customer services team master files for various manufacturers. The excel file is in the same format in all cases, and will be named <Franchise> Master.xls. At this point in time we receive via email, save the file to a directory and initiate a process I have developed that imports, processes and distributes the file to our production servers. I want to automate the process so my database admin team doesn't have to get involved in most cases as it does take up a good portion of our time.

    I've got the bit about WMI firing off when it finds a file (save for the script to check when it's available) but I'm curious as to how to move the file to the processing directory given the search is a wildcard search and doesn't return the actual name of the file it finds. I.e how would I determine that the file that was dropped is named AAA Master.xls or BBB Master.xls?

    Any insight you can provide would be much appreciated.

  • You can completely ignore my post.

    I just noticed that the script was setting the output variable FileName containing the file it found, and since my script wasn't working that wasn't being set correctly.

    Furthermore, 2 seconds after posting my reply I got the script to work so I'm all sorted now.

    Always the way.

    Great article though, thanks.

  • I'm working with this.... I have a question

    So I'm trying to watch for new files... when I see one I kick into a For Each Loop Container that processes as many files as are there when the for each loop container runs.

    So to test I copy 30 files into the directory.

    As it is running, I copy an additional 20 files into the directory.

    When the For each finishes with the first 30 files it then returns to a waiting state. Unless I copy even more files into the directory after it finishes, the other 20 files don't get processed.

    Is there a way around this?



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • mtassin (8/17/2012)


    I'm working with this.... I have a question

    So I'm trying to watch for new files... when I see one I kick into a For Each Loop Container that processes as many files as are there when the for each loop container runs.

    So to test I copy 30 files into the directory.

    As it is running, I copy an additional 20 files into the directory.

    When the For each finishes with the first 30 files it then returns to a waiting state. Unless I copy even more files into the directory after it finishes, the other 20 files don't get processed.

    Is there a way around this?

    The WMI __InstanceCreationEvent only fires when new items are created and only after the event is initially registered, so no, not with the WQL in the article. You could implement a Script Task ahead of the WMI Event Watcher Task to recognize existing files, pickup the initial file name and immediately pass control to the Move Data File To Processing Directory Task. Another option is to switch from the WMI Event Watcher Task to the Konesans File Watcher Task which natively handles the case you're describing by setting the FindExistingFiles setting. My follow-up article:

    Using the Konesans File Watcher Task in SSIS to Process Data Files[/url]

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks,

    The Konesans' task is just what I needed.

    Normally I'm pretty nervous about 3rd party plugins for SSIS, but this is the right tool for the right task in this case 🙂



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • mtassin (8/21/2012)


    Thanks,

    The Konesans' task is just what I needed.

    Normally I'm pretty nervous about 3rd party plugins for SSIS, but this is the right tool for the right task in this case 🙂

    I look to use built-in tasks for most everything but in many cases the Konesans File Watcher Task (FWT) can be the right tool for the job. My only comment taking away from the FWT is that it is closed source so it is a tough sell in most corporate environments. In that sense the fact that it is free is actually a detriment. Corporate managers seem to take comfort in paying for closed source products because it implies the seller will support them adequately. As usual it depends on the product and the timing. I have found Darren Green to be very responsive to general questions, bug requests and releasing new versions. I would say far and away better than almost all for-pay software vendors from which I have purchased products.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I've just run into one thing with it... I figure I'll ask you before I go to Darren.

    I left a process with the filewatcher watching a remote share

    \\server\share

    We copied 2000+ files into the directory.

    The SSIS package was setup as

    FOR LOOP on true == true

    FWT

    FOR EACH LOOP on directory

    Scrip Task (used as a placeholder for conditional execution)

    Two different Sequence tasks (one for each condition with different flat file layouts) Each looks roughly like this

    Data Transform Process

    File Task (Delete)

    Over night it cleared the directory out. when I came in the directory was empty. But the process kept looping through running the Data Tranform Process, then the File system task to delete a file that wasn't there. It would then go back to the filewatcher (which would go from yellow to green again) and repeat the process.

    When I start up the task, it does exactly what it should do.. it sits there and waits for a file... after that... it seems to go into loopy-land. Any ideas?



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • mtassin (8/22/2012)


    I've just run into one thing with it... I figure I'll ask you before I go to Darren.

    I left a process with the filewatcher watching a remote share

    \\server\share

    We copied 2000+ files into the directory.

    The SSIS package was setup as

    FOR LOOP on true == true

    FWT

    FOR EACH LOOP on directory

    Scrip Task (used as a placeholder for conditional execution)

    Two different Sequence tasks (one for each condition with different flat file layouts) Each looks roughly like this

    Data Transform Process

    File Task (Delete)

    Over night it cleared the directory out. when I came in the directory was empty. But the process kept looping through running the Data Tranform Process, then the File system task to delete a file that wasn't there. It would then go back to the filewatcher (which would go from yellow to green again) and repeat the process.

    When I start up the task, it does exactly what it should do.. it sits there and waits for a file... after that... it seems to go into loopy-land. Any ideas?

    From what you have posted it certainly sounds like a bug in the FWT that only happens after an initial execution. Is FindExistingFiles True on the FWT? Were there 0 files in the dir that satisfied the mask or 0 files at all?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • There were 0 files in the directory. Existing files were set to true.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • I have been trying to get this to work on SQL 2008 R2 on Windows 7 (servers are Windows 2008 Server, so should be the same), but I can't seem to get past an error and can't figure out what is going on.

    I have copied your article exactly, and the first issue is that I get an exception when I try to set the Expressions on the Excel part of the Data Flow Task, it will not allow me to get out of design time even, but forget this for now as I wanted to see if I could get the first couple of steps running regardless, so I dropped this part and just ran the first two steps to see if I could get the WMI Event Task working. I immediately run into the error:

    [WMI Event Watcher Task] Error: An error occurred with the following error message: "Invalid parameter ".

    I have checked all the Variables against your article ( I know it was SQL 2012, but can't see how this would casuee the issue I am seeing ), I also ensured that I am running Visual Studio with elevated permissions, but I just can not seem to get this working or get past that error.

    Any help on what to look at next would be most welcome as I am stuck ( the error message is less than helpful as per usual ).

    Thanks

  • I know you checked, but all I can offer for the error is that the times I or myself have experienced the 'Invalid Parameter' issue it has been related to a problem with the expression that builds the WQL.

    I'll try to carve out some time to try the demo using SSIS 2008 on Windows 7 and report back.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 15 posts - 16 through 30 (of 50 total)

You must be logged in to reply to this topic. Login to reply