Jobs for Data Scientists

  • Comments posted to this topic are about the item Jobs for Data Scientists

  • As ever, a thought-provoking editorial - I've been trying to get into this soon-to-be-massive area for a while, but am not quite sure how to make the next step from BI to something more analytical. Bayesian Filtering and HMMs are often presented in a complicated way, but the concepts are not that bad - like explaining BCNF with lots of set algebra, when really what you need is a 'cookbook' approach with worked examples.
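
    As a flavour of what a 'cookbook' worked example might look like, here is a minimal sketch of one predict-and-update step of a discrete Bayes filter, which is also one step of the HMM forward algorithm. The two states, the transition matrix and the observation likelihoods are made-up illustration values, and `bayes_filter_step` is a hypothetical name rather than anything from a library.

```python
def bayes_filter_step(belief, transition, likelihood):
    """One predict + update step of a discrete Bayes filter.

    belief[i]        -- current probability of being in state i
    transition[i][j] -- probability of moving from state i to state j
    likelihood[j]    -- probability of the new observation given state j
    """
    n = len(belief)
    # Predict: push the current belief through the transition model.
    predicted = [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Update: weight each state by how well it explains the observation.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation is impossible under every state")
    # Normalize so the posterior sums to 1 (Bayes' rule denominator).
    return [u / total for u in unnormalized]


# Illustration values only: state 0 = "rainy", state 1 = "sunny".
belief = [0.5, 0.5]
transition = [[0.7, 0.3], [0.3, 0.7]]
saw_umbrella = [0.9, 0.2]  # P(umbrella | rainy), P(umbrella | sunny)

posterior = bayes_filter_step(belief, transition, saw_umbrella)
print(posterior)  # roughly [0.818, 0.182]
```

    Feeding each posterior back in as the next belief, one observation at a time, is all the "filtering" amounts to in the discrete case.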

    SciPy could be well worth a look for anyone interested in mangling big datasets, as well as all the cloud map/reduce implementations out there. IBM Infosphere Streams also looks frighteningly like skynet, and is the most 'productised' real-time analytics solution I've yet seen.

    Currently there's a lot of theory and PR around, but if anyone knows of any good 'try solving this problem with a publicly available dataset' sites please share!

  • We have several of these people in our workforce as project managers. They go through special training called Green Belt and Black Belt. They must use data to save the company money in order to get their certification.

  • Some more searching on this has turned up a very nice site for getting some practice in... with cash prizes for the overcompetent!

    http://www.kaggle.com/

  • I haven't really been posting much to SQLServerCentral, whether because of my lack of DB knowledge (super low level, maggot-sized, junior DBA), or my higher interest in the topic presented by Steve (my name as well...the coolest possibly given to a spartan child 😀 ).

    But after seeing this editorial, I knew I should chime in, as if my opinion was necessary :-). I see some of you speak of strong business intelligence, with intense products from IBM and the like, as well as Green Belt, Black Belt, quality assurance/logistics/secret ninjas in today's business world (they haven't fooled me yet). However, I wanted to point out that this new frontier field, filled with "data scientists," is actually a pure mix of both, with a little bit of everything.

    I find it more often referred to as analytics, and there is a growing number of larger corporations that are building analytics departments, with Senior Analytics Directors, Analytics programmers, and so on. SAS is one of the most well known, well concentrated of these companies (go figure the statistical analysis company would be at the top).

    I have been attempting to break into this exclusive, up and coming field for about 2 years now, and am almost there. I definitely agree with all of you and Steve: having some DBA background really puts you at the forefront for the next Analytics position, but many companies are now looking for education specifically in Analytics. This is where it gets interesting: The new up and coming career field of Analytics is also making changes in Academia. Look into NC State Institute of Analytics, or the University of Tennessee Analytics. Both of these institutes are offering intense programs for graduates to propel you into this wonderful new field, and accept a handful of mixed applicants, made up of working professionals, recent grads, etc. I am trying for my 2nd time at NC State, and hope to have the chance to step into Analytics and maybe change the world a little (personal goal; some say I am over zealous).

    -Stephen

    *Apologies for any misspellings, run-on sentences, etc. I write how I converse.

  • "... I think that being a DBA who knows how to query data and find patterns gives you a head start ..."

    Steve, where I am, Data Analysts are MBAs who are not afraid of SQL. I often help them with more complex or slow performing queries.

    Their favorite tools are Report Generator, PowerPivot and Excel.

  • stephen99999 (8/2/2011)


    SAS is one of the most well known, well concentrated of these companies (go figure the statistical analysis company would be at the top)

    SAS is great but requires lots of training to use. We have junior analysts that like Minitab because it's so much easier to use.

  • stephen99999 (8/2/2011)


    I am trying for my 2nd time at NC State, and hope to have the chance to step into Analytics and maybe change the world a little (personal goal; some say I am over zealous).

    Best of luck at NCState.

  • Wow, thanks! Gotta new title to add to my resume!

    Data Scientist... has a certain ring to it, doesn't it!

    I've been doing this for years now on data tables with over 5T rows!

    Before we build out a production database with new data we analyze each column for its static and dynamic attributes. Here are a few of the tests we perform:

    * Cardinality

    * Distinct values with count (group by)

    * Data value min/max lengths

    * Data type (does it contain alpha, numeric, special characters?)

    * Numeric ranges (for numeric type columns)

    * Special characters (/,*-%$#!& etc.)

    * Garbage characters (tabs, CR, LF, etc.)

    * Type specific tests (is it a valid State abbreviation, a valid US ZIP code, etc.)
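
    For anyone wanting to try the static checks above on a small extract, here is a rough sketch in Python. It is an illustration only, not our production tooling (that is C++, C# and T-SQL); `profile_column` and every key in the result are hypothetical names, and the ZIP pattern is a simple format check, not a lookup against real ZIP codes.

```python
import re
from collections import Counter

GARBAGE = re.compile(r"[\t\r\n]")          # tabs, CR, LF
SPECIAL = re.compile(r"[^A-Za-z0-9\s]")    # anything not alphanumeric or whitespace
US_ZIP = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def profile_column(values):
    """Return basic static attributes for one column given as a list of strings."""
    counts = Counter(values)
    lengths = [len(v) for v in values]

    # Collect the values that parse as numbers so a range can be reported.
    # Note: float() also accepts strings such as "nan" and "inf".
    numeric = []
    for v in values:
        try:
            numeric.append(float(v))
        except ValueError:
            pass

    return {
        "rows": len(values),
        "cardinality": len(counts),                 # number of distinct values
        "top_values": counts.most_common(5),        # distinct values with count
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "has_alpha": any(re.search(r"[A-Za-z]", v) for v in values),
        "has_digit": any(re.search(r"[0-9]", v) for v in values),
        "special_count": sum(1 for v in values if SPECIAL.search(v)),
        "garbage_count": sum(1 for v in values if GARBAGE.search(v)),
        "numeric_min": min(numeric) if numeric else None,
        "numeric_max": max(numeric) if numeric else None,
        "us_zip_like": sum(1 for v in values if US_ZIP.fullmatch(v)),
    }


if __name__ == "__main__":
    sample = ["TX", "CA", "tx\t", "12345", "N/A"]
    for key, value in profile_column(sample).items():
        print(f"{key}: {value}")
```

    On tables of any real size these checks belong in set-based queries on the server; the sketch is only meant to show what each test measures.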

    The dynamic aspect of this analysis is how one month's data columns compare to the subsequent month's data columns, i.e., how the data changes over time.
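
    A minimal sketch of that month-over-month comparison for a single column might look like the following. Again this is illustrative Python, not what we actually run; `compare_months` and the 10% threshold are assumptions picked for the example.

```python
from collections import Counter


def compare_months(previous, current, threshold=0.10):
    """Compare one column's values between two monthly loads.

    Reports values that appeared or disappeared, the change in cardinality,
    and values whose share of rows moved by at least `threshold`.
    """
    prev_counts, curr_counts = Counter(previous), Counter(current)
    # Guard against division by zero when a month has no rows.
    prev_total = len(previous) or 1
    curr_total = len(current) or 1

    shifted = {}
    for value in prev_counts.keys() & curr_counts.keys():
        delta = curr_counts[value] / curr_total - prev_counts[value] / prev_total
        if abs(delta) >= threshold:
            shifted[value] = round(delta, 4)

    return {
        "new_values": sorted(curr_counts.keys() - prev_counts.keys()),
        "dropped_values": sorted(prev_counts.keys() - curr_counts.keys()),
        "cardinality_change": len(curr_counts) - len(prev_counts),
        "shifted_shares": shifted,
    }


if __name__ == "__main__":
    last_month = ["TX", "TX", "CA", "NY"]
    this_month = ["TX", "CA", "CA", "CA", "FL"]
    print(compare_months(last_month, this_month))
```

    A sudden batch of new values or a large shift in shares is the kind of signal that suggests the incoming data has changed enough to revisit the schema.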

    We also keep track of how our clients search the data to eliminate indexes that are no longer being used (doesn't happen very often) and add new ones as needed.

    There is nothing static about data. Even read-only data tables get new data from month to month and that data can fundamentally change to the extent that a schema change may be in order!

    To do all of the above we use a judicious mix of C++, C#, and T-SQL to groom our data. Of all these processes the longest step is always the index phase (Microsoft, take notice...). Too bad they don't have an external batch index build routine that would build all your indexes with one pass through the data. You could specify that each index being built go to its own LUN for fragmentation and performance reasons. After this external process completes you would then "attach" these indexes to your database...

    Someone slap me and wake me up :blink:



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • Nice editorial. Very relevant topic.

    cengland0 (8/2/2011)


    We have several of these people in our workforce as project managers. They go through special training called Green Belt and Black Belt. They must use data to save the company money in order to get their certification.

    We too have green belts, black belts and master black belts, but they are all on the business side as data users. Even though many use Minitab and other tools, none of them are developers or under developer budgets. Until they are under IT budgets, I doubt we would have data scientists. The black belts really focus more on business processes, and the data scientist part of their jobs is just a portion of what they do. I would love to see some actual data scientists or data modelers on our teams providing more useful output. Those sorts of analytics require more time and training than most of our developers currently have. I do not see the black belts transitioning to that sort of position, as most of them are really more business process focused.

  • stephen99999 (8/2/2011)

    I am trying for my 2nd time at NC State, and hope to have the chance to step into Analytics and maybe change the world a little (personal goal; some say I am over zealous).

    Best of luck Stephen, both with NC State and with the world-changing. I often say similar things in pub rants and then feel mortified the next day, but there's definitely some truth in it somewhere!

  • Thank you all for the good luck. I am hoping to become a more active member of SQL Server Central, and to attend a SQL conference in the near future!

    -Stephen

  • how to make the next step from BI to something more analytical. Bayesian Filtering and HMMs are often presented in a complicated way, but the concepts are not that bad - like explaining BCNF with lots of set algebra, when really what you need is a 'cookbook' approach with worked examples.

    Bayesian methods are used in predictive modeling in BI, also known as data mining. When they are used by regulated businesses such as big pharmaceuticals, raw math is required, with mathematicians on staff to help as needed. I have worked on a project with a 500-plus-page use case document containing more than 70 pages of Gaussian math. I blame Microsoft for developers thinking BI and Bayesian are not related.

    And I am very confused about how a cookbook can replace the need for set algebra in SQL. SQL is used for everything from simple data querying, to the complex reporting used to run engineering departments in engineering companies, to banks' risk management and complex data cleansing. All of the above require actual knowledge of set algebra to create the needed solutions. The RDBMS vendors' extensions of SQL can be used without the algebra, but the algebra is needed to solve problems.

    And the belts are simple process certifications not related to actual implementable software development.

    Kind regards,
    Gift Peddie
