Google's Breast Cancer-Predicting AI Research is Useless Without Transparency, Critics Say (venturebeat.com)
An anonymous reader shares a report: Back in January, Google Health, the branch of Google focused on health-related research, clinical tools, and partnerships for health care services, released an AI model trained on over 90,000 mammogram X-rays that the company said achieved better results than human radiologists. Google claimed that the algorithm could recognize more false negatives -- the kind of images that look normal but contain breast cancer -- than previous work, but some clinicians, data scientists, and engineers take issue with that statement. In a rebuttal published today in the journal Nature, over 19 coauthors affiliated with McGill University, the City University of New York (CUNY), Harvard University, and Stanford University said that the lack of detailed methods and code in Google's research "undermines its scientific value."
Science in general has a reproducibility problem -- a 2016 poll of 1,500 scientists reported that 70% of them had tried but failed to reproduce at least one other scientist's experiment -- but it's particularly acute in the AI field. At ICML 2019, 30% of authors failed to submit their code with their papers by the start of the conference. Studies often provide benchmark results in lieu of source code, which becomes problematic when the thoroughness of the benchmarks comes into question. One recent report found that 60% to 70% of answers given by natural language processing models were embedded somewhere in the benchmark training sets, indicating that the models were often simply memorizing answers. Another study -- a meta-analysis of over 3,000 AI papers -- found that metrics used to benchmark AI and machine learning models tended to be inconsistent, irregularly tracked, and not particularly informative.
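As a rough illustration of the kind of leakage check behind that memorization finding (a naive sketch, not the cited study's actual method; the corpus and benchmark contents below are made-up placeholders):

```python
# Naive sketch of a train/test leakage check: flag benchmark answers
# that appear verbatim in the training corpus. All strings here are
# made-up placeholders, not data from any real benchmark.
training_corpus = [
    "the capital of france is paris",
    "water boils at 100 degrees celsius at sea level",
]
benchmark_answers = ["paris", "mount everest", "100 degrees celsius"]

corpus_text = " ".join(doc.lower() for doc in training_corpus)
leaked = [ans for ans in benchmark_answers if ans.lower() in corpus_text]

# If most answers are already embedded in the training text, high benchmark
# scores may reflect memorization rather than generalization.
print(f"{len(leaked)}/{len(benchmark_answers)} answers found verbatim: {leaked}")
```

A real audit would use n-gram overlap or fuzzy matching rather than exact substrings, but the idea is the same: if the answers are already in the training text, benchmark scores say little about generalization.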
Models memorizing answers (Score:2)
Pretty much what I expected. And that issue gets worse with more powerful "AI" computing and larger training data. GIGO applies.
Re: (Score:2)
These systems also tend to get limited data. I was working with another Big Tech firm that was trying to do something similar. I remember the company's "experts" rejecting my suggestion that we also feed in financial data, calling it irrelevant. But real doctors do see what insurance a patient has, as well as their balance, to help make the best health plan for them. If Drug X isn't covered but Drug Y is, and Drug X is only 5% more effective, Y may be the better choice.
Re: (Score:2)
Well, for the treatment, this is likely better. But think of the bad press this may generate and you can see why they do not want that. "AI" is at least partially a scam these days and those that push it do not want that exposed.
Re:Models memorizing answers (Score:4, Interesting)
And that issue gets worse with more powerful "AI" computing and larger training data.
No, this is wrong. The best cure for overfitting is more data. The problem does NOT get worse with larger training sets; it gets smaller.
There are also standard methods for quantifying overfitting: Segment your data. If you have 90,000 boob x-rays, you train on 80,000 and use the other 10,000 as a validation set. The validation set is not used in training and thus can't be "memorized" by the ANN.
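A minimal sketch of that held-out validation idea, using synthetic placeholder data and a generic scikit-learn classifier rather than anything from Google's pipeline (sizes are scaled down from 90,000/10,000 to keep the example fast):

```python
# Sketch: quantify overfitting with a held-out validation split.
# X and y are synthetic placeholders; in the real setting they would be
# mammogram features and biopsy-confirmed labels.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(9_000, 64))       # placeholder image features
y = rng.integers(0, 2, size=9_000)     # placeholder labels

# Hold out cases the model never sees during training.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=1_000, random_state=42, stratify=y)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

# With random labels the forest nearly memorizes the training set
# (train AUC close to 1.0) while validation AUC stays near 0.5;
# that gap is the overfitting signal.
print(f"train AUC: {train_auc:.3f}  validation AUC: {val_auc:.3f}")
```

With random labels the forest fits the training set almost perfectly while the validation AUC stays near 0.5; that gap is exactly the memorization the grandparent is worried about, and the validation cases cannot have been memorized.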
Re: (Score:2)
This is not simply overfitting.
Re: (Score:2)
This is not simply overfitting.
"Models memorizing answers" (your words) is exactly what overfitting is.
Next story (Score:1)
So-called 'AI' and 'transparency': doesn't exist (Score:4, Insightful)
Re: (Score:2)
Many humans experts also have difficulty explaining the exact steps they used to reach their conclusions. Intuition is often better than deliberation.
Would you prefer a medical treatment that works 50% of the time using a well-understood mechanism, or a treatment that works 99% of the time for reasons that are not understood?
Both humans and ANNs should be judged by their results, not by the transparency of the process.
Re: (Score:3)
Call it whatever you want (artificial intelligence, machine learning, computational filtering, Bob), the fact remains that you can't just write something off as "unreliable" because you're unable to explain it. Steel was reliable before chemists discovered why coal forging made iron stronger. The modern (dimpled) golf ball reliably produced longer shots than the smooth ones that preceded it before anyone could explain why. There are gobs of examples of medications that work even though nobody can fully explain why.
Re: (Score:2)
The very first example I gave addresses your point: blacksmiths didn't understand why steel was what it was, but it was still reliable. Devs today may not be able to tell you why a particular neural network works the way it does, but it can still be reliable.
You could make an argument for untrustworthy, perhaps, but reliability can be empirically measured, and we already know that many of these systems are reliable.
Re: (Score:2)
Devs today may not be able to tell you why a particular neural network works the way it does, but it can still be reliable.
Unreliable/untrustworthy/whatever. If they're relying on some shitty software to decide whether I have cancer or not, then I say "FUCK, NO!". Same goes for so-called 'self driving cars'. It's crappy, over-hyped software that really doesn't deliver when it needs to the most.
Transparent Breasts . . . ? (Score:3)
Do they want to sell it as a product? (Score:2)
Google does not really have a track record of turning the things it tries out into products, so I don't know where they want to go with this. But to me this looks like something they should sell.
Scientific Value vs Medical Treatment Value (Score:2)
It's clear that for this to have scientific value, it needs to provide a detailed discussion of the rules, methods, and so on that it uses.
However, to have medical value it doesn't need this. It can be a black box. All you need is data showing that the black box produces better results than radiologists, and of course this must be demonstrated on several data sets different from the ones the system was trained on.
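A rough sketch of what that black-box comparison might look like, with entirely hypothetical cohort names and randomly generated model scores and radiologist calls standing in for real data:

```python
# Sketch: evaluate a black-box model against radiologist reads on
# multiple external test cohorts. Cohort names, scores, and labels
# below are hypothetical stand-ins, not real clinical data.
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(model_scores, radiologist_calls, labels):
    """Compare the black-box model and radiologists on one held-out cohort."""
    model_auc = roc_auc_score(labels, model_scores)
    # Radiologists give binary recall/no-recall decisions, so report
    # their sensitivity and specificity instead of an AUC.
    tp = np.sum((radiologist_calls == 1) & (labels == 1))
    fn = np.sum((radiologist_calls == 0) & (labels == 1))
    tn = np.sum((radiologist_calls == 0) & (labels == 0))
    fp = np.sum((radiologist_calls == 1) & (labels == 0))
    return {
        "model_auc": model_auc,
        "radiologist_sensitivity": tp / (tp + fn),
        "radiologist_specificity": tn / (tn + fp),
    }

# Hypothetical external cohorts the model was never trained on
# (e.g. different hospitals, scanners, and populations).
rng = np.random.default_rng(1)
for site in ["site_A", "site_B", "site_C"]:
    n = 2_000
    labels = rng.integers(0, 2, size=n)              # biopsy-confirmed ground truth
    model_scores = rng.random(size=n)                # black-box model outputs
    radiologist_calls = rng.integers(0, 2, size=n)   # human recall decisions
    print(site, evaluate(model_scores, radiologist_calls, labels))
```

The point is that the evaluation never needs to open the box: consistent gains over radiologists across cohorts the model never trained on would be the evidence of medical value.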
Radiologists' Trade Union (Score:2)
They do not like their cozy entitlements threatened; they may be forced into smaller yachts. The cost of medical treatment may be lowered. The horror.
How do reviewers review? (Score:2)
"Ok, the authors say it's true, and the numbers look nice".
What a load of garbage.
Lots of things humans do without knowing how (Score:2)
My favorite is chick sexing. Chicken farmers like to know the sex of their chicks before it becomes obvious, so as to avoid feeding male chickens if they want eggs. But this can be very difficult. Some people, however, have a learned skill for telling the sex without being able to describe what they are doing.
As for AI involved in medical detection, it has a long history of disappointment. People keep creating AI that works in their lab, successfully beating doctors, but when they try it in the real world the results disappoint.
Google lying (Score:1)