Data Mining

  • This has to be an interesting data mining project. With all of the controversy over digital music, DRM, piracy, etc., now we have an investigation into how music producers price their sales to online retailers.

    I know that most sales investigations involve someone wading through records, but these days I suspect that there's a DBA somewhere charged with loading various sets of data and then querying them. A perfect job for SQL Server and DTS/SSIS. I'm not sure SSIS suits the quick, ad-hoc loads I want to do, though. It still seems too complicated for quick and dirty loads, but it does have great flexibility to read a variety of formats that DTS doesn't.

    And how interesting a job would it be? Analyzing data for court cases, looking for patterns in seemingly unrelated sets of data, that would be cool. It's the kind of job I've wanted to do for a few companies, but they never felt that deep analysis of sales or marketing data was worth the time and cost.

    Course, my statistics background is a little weak, so I'm not sure I'd get the job. But maybe someone out there wants to start a business to do this? A data analysis company to mine data for lawyers? It would fly in the US, given all the lawsuits we have each year.

    And I don't even need a referral fee 🙂 Just a job and some training to work on some fun projects.

  • I used to have a job with a credit card company doing something very similar - analysing the credit card transactions looking for fraudulent or suspicious patterns. And yes, it was quite fascinating and rewarding.

    But I was eventually replaced by an expert system (a neural network running on Unix), and I suspect this type of system would also do the mining of music sales more efficiently than a human, too.

    Sad but true!

    John F.

  • I put forward that the samples they would use probably will not be exhaustive. Although they would need the tables of data, a few dozen clear examples of new music prices versus old music prices would suffice to demonstrate the strong point.

    Especially with iTunes using a fixed price no matter what year the song was released.

    On another note, I would feel pretty powerful were I to be replaced by a neural network.  In contrast to being replaced by a tame of monkeys or a voice mail system.

    It would look pretty good on a CV.

  • I think that the expert systems probably do a better job, but they tend not to "think outside the box" in new ways. They do their job, following their guidelines extremely well, but when you want to change the POV, it needs a human touch.

    Probably need to go get my PhD in stats, huh?

  • Just think in Chi square tests. 

    Which I think is where chi tea comes from.

    Chi delta tea, on the other hand, is the statistical rate of change of the cup of tea being emptied.

     

    http://www.georgetown.edu/faculty/ballc/webtools/web_chi_tut.html
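
    For anyone who wants to see what a chi-square test actually computes, here is a minimal sketch in plain Python. The counts are invented purely for illustration (they are not real music pricing data), and the function names `chi_square_statistic` and `p_value_one_dof` are hypothetical helpers, not part of any library. The p-value shortcut only holds for one degree of freedom, i.e. a 2x2 table.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts, e.g. [[30, 10], [10, 30]].
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_one_dof(statistic):
    """Upper-tail p-value, valid ONLY for 1 degree of freedom (a 2x2 table).

    With 1 degree of freedom the statistic is a squared standard normal,
    so P(X > s) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# Made-up counts: rows = new releases / older catalogue,
# columns = priced above / at-or-below some threshold.
observed = [[30, 10],
            [10, 30]]

stat, dof = chi_square_statistic(observed)
print(stat, dof, p_value_one_dof(stat))
```

    A large statistic (and small p-value) says the price band and the age of the track are unlikely to be independent in the sample; it says nothing about why.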

  • "tame of monkeys" ???

    http://www.dict.org/bin/Dict

    Regards
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  • Must be an English saying... I picked up along the way.

  • When I was in the front room where there wasn't enough room to swing a cat. But I might just be in a two and eight.

    If you want more fun like that, watch 'Lock, Stock and Two Smoking Barrels' and 'Snatch'. Embarrassingly, I can understand Brad Pitt from meeting gypsies across the pond in my youth. And by the way, he wanted a 'blue caravan for his Ma'.

  • And he asked if "ye licked dags"

  • Whenever I need a break from the tedium of routine I come to this site...and it never fails to pique my interest...I too homed in on "tame of monkeys" and moved on since it was irrelevant to the "subject"...funny how certain phrases catch your eye much more than others....

    **ASCII stupid question, get a stupid ANSI !!!**

  • Irrelevant or irreverent?

  • aye, meh licks dags too.

  • You're the most irreverent person I know so I'll nab "irrelevant"...always wise to stick to the things you know/do best eh..

  • I worked briefly with an old neural net called 4Thought.

    It "learnt" from the data that was supplied to it, but it really was a garbage in, garbage out thing.

    I used to do a lot of work with CHAID (Chi-squared Automatic Interaction Detection), which compared groups of characteristics and chose the most significant group. This was fascinating, but it still required a human to choose which characteristics were used.

    The danger would be that if you put in factors which were closely related to each other, you would get a false emphasis on certain characteristics. For example, the fuel economy of a car depends on size, weight, engine size, engine type, etc. However, size and weight are closely correlated, so you would run the risk of the model emphasising both at the expense of a better factor. It takes human judgement to specify a decent model.

    Another problem is that people tend to go with gut feel and get rather upset when the stats just don't back them up. Worse still, when the stats seem to confirm their prejudices, they overweight the importance of those stats, ignoring any anomalies.

    Science is littered with such examples, James Lovelock's Gaia theory being a good one.
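
    On the point above about closely related factors (size and weight of a car), one simple pre-check before handing characteristics to any model is to look at pairwise correlations. This is only a rough sketch in plain Python: the numbers are made up, the 0.9 cut-off is an arbitrary assumption, and `pearson` and `highly_correlated_pairs` are hypothetical helper names. It is not how CHAID or 4Thought worked internally, just a way to spot candidates for a human to review.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair of columns with |r| >= threshold.

    `columns` maps a characteristic name to its list of values.
    """
    flagged = []
    for name_a, name_b in combinations(columns, 2):
        r = pearson(columns[name_a], columns[name_b])
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Invented values for five hypothetical cars; weight tracks size exactly here.
cars = {
    "size": [1, 2, 3, 4, 5],
    "weight": [2, 4, 6, 8, 10],
    "engine": [3, 1, 4, 1, 5],
}

for name_a, name_b, r in highly_correlated_pairs(cars):
    print(name_a, name_b, round(r, 3))
```

    Any pair it flags still needs a human decision about which characteristic to keep, which is rather the point of the post above.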
