ETL

  • ETL

    I've followed the development, release, and subsequent news on SQL Server 2005 for over two years now. It's been an interesting journey and I've had the chance to see some evolutions in the product over time as well as observe the reaction to different subsystems.

    At the Colorado Code Camp, where I got roped into speaking, I was asked what I thought was the part of SQL Server 2005 that had the most impact. I'm not a heavy user of the product, but since I gather, edit, and follow the news I answered that Integration Services was one of the most written about, blogged about, newsworthy systems. It seemed more people were really excited about SSIS than anything else, despite the fact that I think the Service Broker and other technologies are "cooler" to me.

    So when I saw some people knocking SSIS, as well as this blog post by Jamie Thomson, I decided this would make a good poll.

    Is SSIS a professional data integration tool?

    I'm not a BI guy, and not deep into the ETL world like many of you out there. However it does seem to me that while it may not offer all the features of tools costing thousands of dollars, it does a great deal of the things I need in a professional ETL tool.

    Most of us don't deal with terabyte databases. We don't roll up data from 30 separate applications into one humongous data warehouse and then spin off a dozen cubes for different departments.

    I've done some of that on a smaller scale and I thought DTS was up to the task most of the time. Sure the error handling wasn't great and there were times we had to restart things and do some manual cleanup or script editing, but it worked very well for most of my needs.

    And from what I've seen of SSIS, it's more robust and does an even better job of being a professional tool I can use.

    As to whether it's professional enough or enterprise ready, I'd like to know what you think.

    Steve Jones

  • Oh,  Jamie Thomson... (he's never walking around in MSDN newsgroups, never in SSIS forums...)

    In my humble opinion SSIS is better than the old ETL model and give you a lot of features, everything in a nutshell framework. Although when you have SSIS packages with thirty or forty tasks management is low and spends much memory (I suppose Microsoft's guys keep in mind that)

     

    To tell the truth is hard at the outset when you comes from DTS universe and its liberties when you want some anything due to SSIS is a strict one tool regarding datatypes and more stuff..

     

    But we've built a Windows Service using Framework 2.0 which throw threads on-demand (loading SSIS packages according file system criteria) and the best part has been... SSIS. Easier and very manageable than any area of our application.

     

     

  • Well, in answer to the question, I'm going to fall back on the DBA standard answer, it depends. What's your definition of a enterprise class tool? Last year when SQL 2005 was being released our data warehouse team was trying to pick an ETL tool. They brought all the major vendors in house, including MS. We ran through the basics & then drilled down on the operations. With the exceptions listed in Jaime Thomson's post, impact analysis, data lineage and metadata exchange, SSIS held it's own with most of the major vendors and surpassed a few in terms of performance and ease of use. The only people that really liked some of the big-time third party ETL tools were the logical modeling team because it gave them impact analysis.

    The real problem with most of the big time ETL tools is that they require you to surrender control of your structures to them. Instead of running SQL code to update data structures we would have been required to go through the management tools of the ETL products in order for them to maintain all the fancy data lineage reports. Several of them just generated crappy code and we could be facing data loss during production rollouts, or we'd have to have an extended multi-phase rollout process that entailed building manual scripts to protect data, run the tool & let it break stuff, then run more manually built scripts to clean things up. It just wasn't worth having a report that showed a particular field was used in one transactional system, the warehouse, and one datamart, multiplied by thousands of fields. The widely used data is pretty well known.

    We're a mixed shop with Peoplesoft/Oracle apps, an Oracle warehouse (which several of us keep threatening to turn off one weekend & put SQL Server 2005 up in it's place... makes the Oracle people nuts... good times) and then a whole slew of SQL Server systems in production and in development with more being planned. It just makes sense for us to go with MS.

    Oh, the question... Yes, per our definition, it's an enterprise data integration platform.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • The ultimate ETL tool is (drum roll, please) the

    INSERT INTO xxx

    SELECT yyy

    FROM zzz

    I'm an Oracle guy mostly, but this is fairly constant. I find the graphic ETL tools extremely costly means of drawing hard-to-read pictures.

  • Stephen, I definitely agree with you.  Most of the ETL that I do is simply writing SQL.  But sometimes there's no linked server available and then I have to use DTS (we're on SQLServer2K).  My ETL is really just EL, nothing that complicated.

    From what I've seen, SSIS looks pretty cool though.  One of my biggest complaints with DTS is how a lot of things aren't very intuitive, you just have to know how to do them.  If they've put properties and options in more intuitive places then I'll probably look forward to using SSIS when we finally move to SS2005.

  • I've used SQL 2005 to build a complete ITIL CMDB system that draws from 46 different databases that include MSMSQL, Oracle, Sybase, MySQL, Informix, and Ingres.  Since we are tying everything by hostname and each system uses a different way to store this information (FQDN vs NETBIOS, etc..) there is a need for several different transofrms tobe able to link the data.  SSIs works great for this purpose.  I am then able to provide great reports to management using Reporting Services and then can even write their own reports.  Try that on any other ETL system!  I am then able to use the CLR integration to write VB.NET code to write information from my CMDB back into the original sources so they are updated daily with correct information.  I process over 10GB of data each night and do it in a matter of a few hours.  The most amazing prt of all this is that it toook a team of only three people just five weeks to design and implement the entire system.  SQL2005 is the best ETL system on the market because of the integration with VS2005, Reporting Services and IIS, and the ability to talk to just about any other database right out of the box.

  • Yes it is a professional data integration tool.  It can hold its own with the big 100k+/seat apps, and it doesn't cost 100k+/seat so its possible to intentionally use it in conjunction with the approach Stephen mentions.

    I've mostly been exposed to Informatica.  I can do more faster and as complex with DTS than any Informatica developer I've seen.  (And that also drives the Oracle/Informatica guys where I work nuts.)  And I don't need to hire a separate DTS developer to do stuff. 

    So maybe you'd need one of those big apps, if you're a WalMart or Safeway and you're tracking hundreds of millions of transactions and hundreds of dimensions.  But for your basic Fortune 1000 company well-thought out DTS and well-managed hardware are more than sufficient.

     

     

  • Somebody help me out here, what does ETL stand for?

  • ETL: Extract, Transform, Load

     

    Also thought of as get the data, clean the data, and load the data.

  • I'm an Oracle guy, don't know the SQL world very well, but I think it is more a question of writing vs. drawing code. Writing code will always be more efficient and quicker.

    Anyone ever notice with the graphical tools how the developers always drop down to PL/SQL or TSQL or whatever whenever there's any kind of complicated logic? Otherwise, the diagram looks like a terrible circuit diagram.

  • I am a data warehouse developer.  Even most people denied it, but one of the data warehouse expert Sid Adelman considered SQL Server 2000 DTS was an ETL tool.  I took his class in the TDWI - data warehouse conference. So SSIS should be considered as ETL tool with much more functionality.

    In my old company, they bought Abinito (because the data warehouse team VP was anti-microsoft).   We ran an extract using Abinito vs DTS, the performance was the same, except Abinito was very complicated to set up, it required an administrator and cost so much for each license. DTS is free if you buy SQL Server.  

     

  • I am currently working on a BI project with SSIS. This gives me 2 things that was most wanted in DTS for me i.e. error handling and good flow control. The project may not be very big but we have seen good performance with big xml files. In my previous BI app I used SPs to do all the transformations. But now it is completely in SSIS packages and with good performance. Trying to achieve a million records transformation / hour on my 2.4GHz / 512 MB machine.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply