AI Data

At the Microsoft Ignite conference recently, I saw a talk that mentioned the Microsoft Garage Project, Trove, which is designed to help people provide data for AI projects in a new way. You can read more about it and get the app for Android mobile devices.

Trove is built to help AI researchers find images and use them in projects. However, the data they get is provided by users, who choose to include their data. This is different from many AI projects, where those doing the work often just gather data from various sources, sometimes without permission, and often without the individuals who own the data understanding where it is being used or for what purposes.

I like the idea here of people specifically giving permission for their data to be used. It's a good way for volunteers to provide data and keep some control over how and where that information is used. That doesn't mean this is necessarily a good model for the future. First, I'm not sure we can easily verify that the images someone submits are their own. If payments are made, I'm sure people will try to game the system and earn more money by submitting images they don't own. We already have problems with people publishing content they didn't create. I'm sure we'll have plenty more with something like Trove.

The other issue, and likely the bigger one, is that understanding what data many companies collect and how they use it is a challenge. Even when there is some disclosure, it can be difficult to understand what is being released. Even while reading this document on SQL Server data collection, I'm not sure what might be collected on my system that could be an issue.

I don't think this is malicious or deceitful on Microsoft's part; I'm just not sure I can understand the implications. That is where I feel we, as a society, and certainly with regard to regulations, are woefully immature. We don't have good controls, but I'm not sure we really know what we'd want.

This is a thorny problem, and one I know we need to find better solutions to over time, especially as we use more and more data for large-scale research and applications in areas such as Artificial Intelligence and Machine Learning.

Finding Legal Data

by Steve Jones

SQLServerCentral

Using data scraped from the web might be convenient, but is it legal? Perhaps more importantly, is it moral? Steve has a few thoughts.

2021-04-19

144 reads

Don't Get On This Page

by Steve Jones

SQLServerCentral

There is a page where GDPR fines are tracked. None of us want to get on that page.

2019-08-22

305 reads

Demo Data for Everyone

by Steve Jones

SQLServerCentral

Steve thinks having a known set of data for your system is one way to improve your software development process and make salespeople happy.

2019-08-14

323 reads

The AI Manager

by Steve Jones

SQLServerCentral

Artificial Intelligence (AI)

AI software is being used to manage the daily work of some employees. Is this a trend that is good or bad?

2019-07-18

180 reads

The Changing Nature of Data

by Steve Jones

SQLServerCentral

The way we look at data is changing, especially when data privacy and protection are considered. Today Steve has some thoughts on address data and the implications for cities as well as databases.

2019-07-16

201 reads
