A Group of Ex-NSA and Amazon Engineers Are Building a 'GitHub For Data' (techcrunch.com) 21
A group of engineers and developers with backgrounds at the National Security Agency, Google, and Amazon Web Services are working on Gretel, an early-stage startup that aims to help developers safely share and collaborate with sensitive data in real time. TechCrunch reports: It's not as niche a problem as you might think, said Alex Watson, one of the co-founders. Developers can face this problem at any company, he said. Often, developers don't need full access to a bank of user data -- they just need a portion or a sample to work with. In many cases, developers could make do with data that looks like real user data. "It starts with making data safe to share," Watson said. "There's all these really cool use cases that people have been able to do with data." He said companies like GitHub, a widely used source code sharing platform, helped to make source code accessible and collaboration easy. "But there's no GitHub equivalent for data," he said.
And that's how Watson and his co-founders, John Myers, Ali Golshan and Laszlo Bock, came up with Gretel. "We're building right now software that enables developers to automatically check out an anonymized version of the data set," said Watson. This so-called "synthetic data" is essentially artificial data that looks and works just like regular sensitive user data. Gretel uses machine learning to categorize the data -- like names, addresses and other customer identifiers -- and attach as many labels to the data as possible. Once that data is labeled, access policies can be applied to it. Then, the platform applies differential privacy -- a technique used to anonymize vast amounts of data -- so that it's no longer tied to customer information. "It's an entirely fake data set that was generated by machine learning," said Watson. The startup has already raised $3.5 million in seed funding. "Gretel said it will charge customers based on consumption -- a similar structure to how Amazon prices access to its cloud computing services," adds TechCrunch.
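For readers unfamiliar with the term, here is a rough sketch of the general idea behind differential privacy. It is not Gretel's method (the summary gives no implementation details); it is the textbook Laplace mechanism applied to a simple count query. The function names, the sample records and the epsilon value are all made up for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records matching `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    Hypothetical helper, not part of any real product's API.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Made-up records standing in for sensitive user data.
    users = [{"age": a} for a in (25, 41, 58, 33, 47)]
    rng = random.Random()
    # Smaller epsilon means more noise and stronger privacy.
    print(private_count(users, lambda u: u["age"] > 40, epsilon=1.0, rng=rng))
```

Each call returns the true count plus random noise, so no single answer reveals whether any one person is in the data set. This toy version uses ordinary floating-point randomness and covers one query only; a real system would also have to track the total privacy budget spent across queries.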
Will it (Score:2)
A policy on facial recognition?
The support of/for the US mil?
The ability to talk about DRM and crypto?
Re: (Score:2)
Re: (Score:2)
"The support of/for the US mil?"
If as a citizen I wish to enjoy the benefits and protection of the British empire it would be wrong of me not to help its defense.
Gandhi
China, FBI, IRS, Facebook and Google are greater threats than the US military.
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
This could be a great service... as long as you could take it and install it internally with no connection to the company supplying it.
It's not clear how they're planning for it to work, but based on the comparisons they're making, the idea that large companies with sensitive data are going to want to push all that data to an outside provider who will store it out of their control in order to allow their internal folks to get an anonymized version of it sort of misses the point of only allowing testing with
Re: (Score:1)
Use code in the wrong political/mil/nation/gov/city/police way years later?
Some US brands CoC will follow projects, code and resulting productive work around looking for political issues.
Just wait (Score:2)
Facebook will acquire it.
Re: (Score:1)
Ex-NSA wants fake data... riiiiiight. (Score:3)
1. Once NSA, always NSA. Shouldn't be allowed to work in the private sector. They're spying on you when they do.
2. Fake data my butt. How many times have we seen "anonymized" data get de-anonymized?
How about no.
Re: (Score:3)
When will people learn... (Score:2)
Re: (Score:3)
Disregarding the PII issues, how would this not be a honey-pot for dictatorships, organized crime, etc.?
I propose a Github for adult material... (Score:2)
..oh wait... we already got that.
Yeah. Ex-NSA. Sure! (Score:2)
Re: (Score:1)
Mitgefangen, mitgehangen! (Score:2)
It's called SQL (Score:1)
open (Score:2)
one of the reasons why git is so successful and popular is because it's open source.
i don't see this taking over the industry by storm, unless they make it open source too.
the product isn't even unique, i've had several product sales pitches from multiple vendors that promise this kind of functionality.
It needs a sexy name..em...The Cloud (Score:2)
Damn! it's taken!