Researchers Are Training Image-Generating AI With Fewer Labels (venturebeat.com)
An anonymous reader shares a report: Generative AI models have a propensity for learning complex data distributions, which is why they're great at producing human-like speech and convincing images of burgers and faces. But training these models requires lots of labeled data, and depending on the task at hand, the necessary corpora are sometimes in short supply.
The solution might lie in an approach proposed by researchers at Google and ETH Zurich. In a paper [PDF] published on the preprint server arXiv.org ("High-Fidelity Image Generation With Fewer Labels"), they describe a "semantic extractor" that can pull out features from training data, along with methods of inferring labels for an entire training set from a small subset of labeled images. These self- and semi-supervised techniques together, they say, can outperform state-of-the-art methods on popular benchmarks like ImageNet.
"In a nutshell, instead of providing hand-annotated ground truth labels for real images to the discriminator, we ... provide inferred ones," the paper's authors explained. In one of several unsupervised methods the researchers posit, they first extract a feature representation -- an automatically learned encoding of the raw data suitable for classification -- on a target training dataset using the aforementioned semantic extractor.
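The core idea above -- infer labels for an entire training set from a handful of hand-labeled examples, using features from a self-supervised extractor -- can be sketched in a few lines. This is not the paper's actual pipeline; it is a minimal toy in NumPy where random cluster data stands in for extracted features, and a nearest-centroid rule (an assumption for illustration) stands in for the label-inference step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for features from a self-supervised extractor:
# 3 well-separated clusters in a 16-dim feature space, 300 images total.
n_per_class, dim, n_classes = 100, 16, 3
centers = rng.normal(0.0, 5.0, size=(n_classes, dim))
features = np.concatenate(
    [c + rng.normal(0.0, 1.0, size=(n_per_class, dim)) for c in centers]
)
true_labels = np.repeat(np.arange(n_classes), n_per_class)

# Pretend only 5 images per class are hand-labeled.
labeled_idx = np.concatenate(
    [np.where(true_labels == k)[0][:5] for k in range(n_classes)]
)

# Infer labels for every image by nearest class centroid in feature space,
# where centroids come only from the small labeled subset.
centroids = np.stack(
    [features[labeled_idx][true_labels[labeled_idx] == k].mean(axis=0)
     for k in range(n_classes)]
)
dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
inferred = dists.argmin(axis=1)

accuracy = (inferred == true_labels).mean()
print(f"inferred-label accuracy: {accuracy:.2f}")
```

In the paper's setting, `inferred` is what gets handed to the GAN's discriminator in place of hand-annotated ground truth; the quality of the inferred labels depends entirely on how well the extractor's feature space separates the classes.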
Headline, meet story (Score:2)
Researchers Are Training Image-Generating AI...
Story:
The solution might lie in an approach proposed by researchers at Google and ETH Zurich...
The researchers aren't training anything. They just hypothesized that it might be possible to use AI to train AI. Then their heads exploded.
So statistical classifiers fail on complex things? (Score:2)
Who would have thought. Oh, right, I learned that about 30 years ago at university in my CS studies.
Re: (Score:2)
This type of "AI" is really nothing more than statistical classification. Non-statistical approaches are different, but about as "intelligent".
Algorithms all the way down (Score:3)
So we're training machine learning algorithms with data that was generated by machine learning algorithms?
And we're using those algorithms in situations where we didn't have much data, which may often mean they are complex situations?
This sounds like a bias-factory, breaking some kind of law of entropy.
one step forward, two steps back (Score:2)
My comprehension regressed upon encountering this sentence.
Firirre Turukcs (Score:2)
Those artificially generated fire trucks sure are funky looking. They immediately stand out as fire trucks, but as you look more closely, they have weird details in weird spots, and duplicate things that shouldn't be duplicated in practice. Like LSD or a transporter accident.