AI Technology

Microsoft Developing a Tool To Help Engineers Catch Bias in Algorithms (venturebeat.com) 239

Microsoft is developing a tool that can detect bias in artificial intelligence algorithms with the goal of helping businesses use AI without running the risk of discriminating against certain people. From a report: Rich Caruana, a senior researcher on the bias-detection tool at Microsoft, described it as a "dashboard" that engineers can apply to trained AI models. "Things like transparency, intelligibility, and explanation are new enough to the field that few of us have sufficient experience to know everything we should look for and all the ways that bias might lurk in our models," he told MIT Technology Review. Bias in algorithms is an issue increasingly coming to the fore. At the Re-Work Deep Learning Summit in Boston this week, Gabriele Fariello, a Harvard instructor in machine learning and chief information officer at the University of Rhode Island, said that there are "significant ... problems" in the AI field's treatment of ethics and bias today. "There are real decisions being made in health care, in the judicial system, and elsewhere that affect your life directly," he said.
This discussion has been archived. No new comments can be posted.

  • Wrong Bias (Score:3, Insightful)

    by Anonymous Coward on Sunday May 27, 2018 @07:50PM (#56685752)

    Correctly read as: "Microsoft is developing a tool to help developers detect wrong bias in their algorithms."

    • Correctly read as: "Microsoft is developing a tool to help developers detect wrong bias in their algorithms."

      No that's bullshit, you're a fool for saying it and it's fools who modded you up.

      Unless you're claiming that all the input data is perfect, then either you lack the knowledge to comment on the topic or you have an ulterior motive for adopting the attitude you have.

    • More correctly read as: "Microsoft is developing a tool to help developers detect wrongthink bias in their algorithms."

      The article makes this clear;

      [...] data sets used to teach AI programs contain sexist semantic connections, for example considering the word "programmer" closer to the word "man" than "woman."

      Whatever the reasons, men make up the large majority of programmers. They want to purposefully make algorithms less accurate wherever they reflect a reality SJWs think shouldn't exist, even though it clearly does.

  • by Shemmie ( 909181 ) on Sunday May 27, 2018 @07:55PM (#56685772)

    Couldn't a tool developed to detect bias in algorithms, be used in an attempt to insert bias into algorithms, without detection?

    Just spit-balling here.

    • by ljw1004 ( 764174 )

      Couldn't a tool developed to detect bias in algorithms, be used in an attempt to insert bias into algorithms, without detection?

      Imagine an algorithm to roll a six-sided die, and we define bias as anything where a given number appears more than 1/6 of the time on average, and a tool to detect bias works by running the algorithm a lot and checking frequencies.

      No there's no way this tool could be used to insert bias into algorithms without detection, by definition.

      So it all depends on what they mean by "bias" and what kind of tool they're writing.
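As a rough sketch of the frequency-checking tool described above (an illustration only, not anything Microsoft has described), a chi-square goodness-of-fit statistic compares observed face counts with the uniform expectation. The function names and the 0.05 significance level are assumptions made for this example.

```python
from collections import Counter

# Chi-square critical value for 5 degrees of freedom at alpha = 0.05.
CRITICAL_5DF = 11.07


def chi_square_uniform(rolls, sides=6):
    """Chi-square statistic of observed face counts against a fair die."""
    counts = Counter(rolls)
    expected = len(rolls) / sides
    return sum(
        (counts.get(face, 0) - expected) ** 2 / expected
        for face in range(1, sides + 1)
    )


def looks_biased(rolls, critical=CRITICAL_5DF):
    """Flag the die only when the deviation exceeds the critical value."""
    return chi_square_uniform(rolls) > critical


# Hypothetical data: a perfectly even sample, a die where six shows up
# twice as often as any other face, and a tiny sample of six rolls.
even_sample = [1, 2, 3, 4, 5, 6] * 1000
loaded_sample = [1, 2, 3, 4, 5, 6, 6] * 1000
tiny_sample = [1, 1, 2, 3, 4, 6]

print(looks_biased(even_sample))
print(looks_biased(loaded_sample))
print(looks_biased(tiny_sample))
```

A check like this flags the loaded sample but not the six-roll sample with a missing face, since small samples deviate from 1/6 by chance alone. Whether any of this carries over depends, as the comment says, on what "bias" is defined to mean.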

      • by green1 ( 322787 )

        Except in reality it's probably more like an algorithm that rolls the dice 6 times, and complains that it's biased if it doesn't roll one of each of the 6 numbers. That's no bias, that's how random works.

        Thing is, the real world isn't random. And the people who make these things are likely to try to fit a random pattern on to non-random data. For instance, if you have 30000 males, and 10000 females in a particular data set, and you pick a random person from that data set 500 times, you'll likely pick approx

    • Equally, it can be used to avoid liability. You can say, "Maybe it's biased, but we did due diligence, it's not our fault!" Maybe though, maybe Microsoft is trying to avoid another Tay.
    • Couldn't a tool developed to detect bias in algorithms, be used in an attempt to insert bias into algorithms, without detection?

      Sure, but that's doing things on hard mode. Getting unbiased results out of machine learning is very, very hard as is, because machine learning is awfully good at picking out non-causative correlations. Unless your data is very good, it's easy to get out utter junk.

      Now try finding a dataset about humans which doesn't have all sorts of non-causative correlations in it.

  • by Citizen of Earth ( 569446 ) on Sunday May 27, 2018 @08:17PM (#56685854)
    The main problem with this endeavor is that the "bias" they are trying to suppress is actually the opposite of bias. They seek to treat people differently on the basis of identity politics instead of on their actual behavior. The AIs will naturally be confused by being disallowed to latch onto the strongest signals in the data.
    • by frank_adrian314159 ( 469671 ) on Sunday May 27, 2018 @09:09PM (#56686024) Homepage

      The AIs will naturally be confused by being disallowed to latch onto the strongest signals in the data.

      Uh not unless it's a really crappy AI. If you haven't noticed, chances are any human directive will be treated as that by the neural network - another signal that is larger/more salient because it is input by a human. Just the way that the system would be designed to do unless you want it completely independent of human control.

      In short, don't project your own human confusion about neural nets onto the technology just because you don't like the implications of human control of machines.

      • chances are any human directive will be treated as that by the neural network - another signal that is larger/more salient because it is input by a human.

        So you're basically saying the system will be unable to detect the explicitly fed-in bias.

      • by q_e_t ( 5104099 )
        I'm baffled that people who scream that correlation is not causation when it goes against their personal bias seem in favour of such confusion when it confirms it.
    • Except no (Score:3, Insightful)

      by bug_hunter ( 32923 )

      From the article:

      Northpointe’s Compas software, which uses machine learning to predict whether a defendant will commit future crimes, was found to judge black defendants more harshly than white defendants.

      So that was an existing algorithm that judged somebody on how they were born rather than their individual behavior.

      • More harshly by some metrics, equitable by others. In the end comparing blacks and whites is apples and oranges. Blacks recidivism rates is fundamentally higher than whites and that has some unexpected impact on the statistics. You could arbitrarily force the false positive or negative rate to be equal by making race an input and using affirmative action, but that would degrade fairness in other ways.
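The "more harshly by some metrics, equitable by others" point can be checked with a little arithmetic. The sketch below uses two hypothetical groups, A and B, with made-up base rates. It takes the difference in measured base rates as a given and says nothing about why they differ or whether the measurement itself is skewed. If a score has the same positive predictive value and the same false negative rate in both groups, the false positive rate is then fixed by the base rate.

```python
def false_positive_rate(base_rate, ppv, fnr):
    """False positive rate implied by base rate, PPV and false negative rate.

    Derivation: TP = base_rate * (1 - fnr), FP = TP * (1 - ppv) / ppv,
    and FPR = FP / (1 - base_rate).
    """
    true_positives = base_rate * (1 - fnr)
    false_positives = true_positives * (1 - ppv) / ppv
    return false_positives / (1 - base_rate)


# Hypothetical numbers: identical PPV and FNR, different base rates.
PPV = 0.6
FNR = 0.3

fpr_a = false_positive_rate(base_rate=0.3, ppv=PPV, fnr=FNR)
fpr_b = false_positive_rate(base_rate=0.5, ppv=PPV, fnr=FNR)

print(f"group A false positive rate: {fpr_a:.3f}")
print(f"group B false positive rate: {fpr_b:.3f}")
```

With these assumed inputs the two false positive rates come out different even though the score is equally "accurate" for both groups by the other two measures, which is why different fairness metrics can disagree about the same model.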

        • . In the end comparing blacks and whites is apples and oranges. Blacks recidivism rates is fundamentally higher than whites

          It's not fundamentally higher. It's higher for two reasons, one is socioeconomic (poverty is higher on average) and the other is simple racism (the justice system is harsher on black people than white).

          . You could arbitrarily force the false positive or negative rate to be equal by making race an input and using affirmative action, but that would degrade fairness in other ways.

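          It's not in any way fair to bake existing structural racism into the algorithm because that's the way things currently are.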

          • It's not in any way fair to bake existing structural racism into the algorithm because that's the way things currently are.

            If there's structural racism, that needs to be fixed, and then the algorithm will follow automatically.

            • If there's structural racism, that needs to be fixed, and then the algorithm will follow automatically.

              It will only follow if the algorithm is re-trained.

              At the moment, the algorithm trained with biased data is part of the problem.

              • by green1 ( 322787 )

                Any algorithm that isn't constantly updating its data is useless outside of a one-time use anyway. So I would hope that the algorithm would update as the situation changes, no matter what way the situation changes.

          • You don't have to be rich to get married, nearly three fucking quarters of black kids are born to an unmarried mother. If you think that won't have impact on criminal behavior you're dreaming. The culture of the average black is thoroughly poisoned (as is the one of the average white, but slightly less so). Blaming it all on systemic racism and poverty is silly.

            Regardless, any difference in recidivism rate will cause the imbalances seen in the Compas result. Pick your metric (false negative rate for instanc

            • You don't have to be rich to get married, nearly three fucking quarters of black kids are born to an unmarried mother. If you think that won't have impact on criminal behavior you're dreaming.

              I think you've just demonstrated the point of the article: that's a non-causative correlation. The underlying cause is the lack of a stable family. That commonly manifests as not being married, but not being married is the symptom, not the cause. It's perfectly possible to have a stable family without marriage and more

              • You haven't made a point, you mention that no racism should be baked into the algorithm ... but you refuse to mention what an unbiased algorithm and its result would look like. So I merely made a statement.

                I'll do so again. COMPAS is close to the best you are going to get without affirmative action (and with the current set of inputs). If the algorithm is unfair, it's because life is unfair, no possible way to "improve" it without just adding "if black, reduce recidivism likelihood".

                • but you refuse to mention what an unbiased algorithm and its result would look like.

                  Right, so because I, like the entire rest of the ML community, don't know how to go beyond the current state of the art, we should just not bother trying to correct flaws.

                  . COMPAS is close to the best you are going to get

                  You don't know that, because you don't know what algorithm it uses.

                  without affirmative action

                  This is the first time I've heard that not cracking down on black people merely because they're black called "aff

                  • You should have a relatively good idea what algorithm COMPAS uses from the independent attempts at replicating its results in your community.

                    https://www.ncbi.nlm.nih.gov/p... [nih.gov]

                    When you have two wildly different approaches (human jury and SVM) produce nearly the same results and the same "unfairness" I feel rather safe taking as a working hypothesis that it is perceptional and actually a result of the underlying statistics when you purposely try to ignore race. If you want to bring false positive rates closer

                    • When you have two wildly different approaches (human jury and SVM) produce nearly the same results and the same "unfairness" I feel rather safe taking as a working hypothesis that it is perceptional and actually a result of the underlying statistics when you purposely try to ignore race.

                      The link you posted demonstrates that COMPAS is a complete shitshow. It's no more accurate than lay people with no expertise in criminal justice.

              • You're trying to refute my argument by reading a more extreme one than the one I wrote, then refuting that instead. I didn't blame it *all*, just a large amount . . .

                Well, you did, actually. You said: It's higher for two reasons, one is socioeconomic (poverty is higher on average) and the other is simple racism (the justice system is harsher on black people than white). You stated that those were the two reasons; you neither stated that there were others, nor that there could be. If that's not blaming
      • From the article:

        Northpointe’s Compas software, which uses machine learning to predict whether a defendant will commit future crimes, was found to judge black defendants more harshly than white defendants.

        So that was an existing algorithm that judged somebody on how they were born rather than their individual behavior.

        What if the prediction is accurate, though?

        I mean, it's a statistical prediction. That's the whole point. Of course you can't truly know what an individual is going to do. But you can make statistical predictions. And on aggregate, they can be accurate or inaccurate, to some measurable degree.

        It seems the problem here is not that the algorithms are wrong, but that they are, embarrassingly, right. They draw correlations that we are culturally required to ignore.

      • Re:Except no (Score:5, Informative)

        by russotto ( 537200 ) on Sunday May 27, 2018 @11:04PM (#56686398) Journal

        The COMPAS algorithm, while opaque, does not have race as an input. It was found its accuracy could be matched by an algorithm with just two variables: age and prior convictions. Even this simple model shows the same "bias" that COMPAS is accused of. The bias isn't in the algorithm; it's in the real world.

        • Re: Except no (Score:5, Interesting)

          by phantomfive ( 622387 ) on Monday May 28, 2018 @02:09AM (#56686916) Journal
          I've found that to be a problem in my attempts to make neural networks: too often a complex network can be simplified to just a few variables that, once found, can be hard coded. In some ways it's really depressing.
          • What if the NN (presumably with proper tools) helps you find those variables more quickly? It could still be worth using if it saves you some thinking time.
            • That would be a benefit. In most cases I've found that neural networks have been wholly inadequate for the task I've chosen, and another approach is better (for example, a standard natural language processor [nltk.org] with a strong domain processor to rank resumes. It is true you will get a small improvement at recognizing verbs and nouns with a NN without actually understanding meaning, but the improvement potential of building a solid domain model will make the NN look like a rounding error. You might say that usi
        • Re: (Score:2, Insightful)

          by AmiMoJo ( 196126 )

          "Prior convictions" and "future convictions" are too simplistic.

          For example, getting a minor drug possession conviction is rather different to one for murder. And the system is known to be far more likely to give young black men convictions for minor drug offenses than it is to give them to older white guys, even when the crime and circumstances are identical.

          So we have a situation where the algorithm would need to understand the severity of each conviction, the circumstances in which it was given, and the

        • Re: (Score:2, Insightful)

          The COMPAS algorithm, while opaque, does not have race as an input. It was found its accuracy could be matched by an algorithm with just two variables: age and prior convictions.

          The joker in that is the "prior convictions." If there was bias in how the subject was convicted in earlier cases, then the algorithm will codify that bias.

          • If both prior convictions and the measure of recidivism are biased, the algorithm will correctly use the prior bias to predict the future bias. This is indistinguishable from the case where no bias exists. The case where black people are erroneously and consistently measured as more likely to commit crimes when they aren't produces the same data as if black people are correctly measured as more likely to commit crimes. No useful race-blind algorithm can fix that; either you have to fix the bias in the da

      • Re:Except no (Score:5, Informative)

        by bitkid ( 21572 ) * on Monday May 28, 2018 @03:08AM (#56687040) Journal

        Slight tangent: The article cites the ProPublica study on the Northpointe software in which journalists (not statisticians) reported the software as biased. What they left out is that an independent study found this study showing bias to be wrong.

        Source: Flores, Bechtel, Lowenkamp; Federal Probation Journal, September 2016, "False Positives, False Negatives, and False Analyses: A Rejoinder to “Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And it’s Biased Against Blacks.”", URL http://www.uscourts.gov/statis... [uscourts.gov]

        In fact the ProPublica analysis was so wrong that the authors wrote: "It is noteworthy that the ProPublica code of ethics advises investigative journalists that "when in doubt, ask" numerous times. We feel that Larson et al.'s (2016) omissions and mistakes could have been avoided had they just asked. Perhaps they might have even asked...a criminologist? We certainly respect the mission of ProPublica, which is to "practice and promote investigative journalism in the public interest." However, we also feel that the journalists at ProPublica strayed from their own code of ethics in that they did not present the facts accurately, their presentation of the existing literature was incomplete, and they failed to "ask." While we aren’t inferring that they had an agenda in writing their story, we believe that they are better equipped to report the research news, rather than attempt to make the research news."

        The authors of the ProPublica article are no longer with the organization, but this article shows up in any news article about AI bias. The fake story just doesn't want to die...

        With all that said, I have some hopes that algorithms will help make truly race-blind decisions in criminal justice. It's easier to test them for bias than humans, and decisions are made in a consistent, repeatable manner.

      • ... COMPAS software ... was found to judge black defendants more harshly than white defendants.

        ... that was an existing algorithm that judged somebody on how they were born rather than their individual behavior.

        No it wasn't. You are confusing data and process.

        The algorithm COULD NOT have arrived at that output *unless* the category of "race" was included in the data. If it had been excluded from the training data, then there's no way the algorithm could have associated "race==black" with higher criminality deserving of harsher punishments.

        If the DATA is scrubbed of bias, then the ONLY thing the algorithm can base its decision on is individual behaviour.

    • Except that we specifically need separate test data from the training data. Otherwise you 'overfit' the training data.

      When your algorithm decides who goes to jail... the training data is now just a reflection of the algorithm. It's difficult to determine where the training data ends and the algorithm begins.

      If only people named "John" are arrested for murder and 100% of murder convictions become named "John", suddenly there is a strong signal that only people named "John" should be investigated. Rinse an

      • by PPH ( 736903 )

        If only people named "John"

        Why is a defendant's name an input for a sentencing algorithm?

        Others have raised the point that ML, when not supplied with racial information, might begin to redline certain neighborhoods where people of minorities tend to live. So then why is one's residence or location of the crime used as input? A bank was robbed. Never mind where. The use of a weapon is an aggravating circumstance. The robber (anonymized to remove the Chad/Tyrone bias) has committed similar crimes on N occasions. Here's the sentence ..

  • ... sewing machines.

  • Remember, Citizen: Equality means including an equal number of every ethnic and minority group, no matter their relative numbers in society.

    • Remember, Citizen: Equality means including an equal

      No, citizen, equality means not giving you a harsher conviction simply because people who look like you have been convicted in the past. What I don't really get is why you're against true equality.

  • by misnohmer ( 1636461 ) on Sunday May 27, 2018 @10:48PM (#56686346)

    I've been reading stories in removing bias from algorithms but still don't get it. What is an algorithm bias? If the results don't have perfectly flat distribution across sex, race, religion, and other protected groups?

    • by Actually, I do RTFA ( 1058596 ) on Monday May 28, 2018 @12:46AM (#56686656)

      What is an algorithm bias?

      An algorithm that uses historic data, which was distorted by human bias, to predict future events. These reinforce human bias from the past. For instance, did you know that in 1864, practically no black people in the South ever paid a debt back? If you use that fact (which was, you know, caused by slavery) to figure that black people were higher credit risks, which meant higher rates, which meant more defaults, which meant worse credit, etc, your algorithm is biased.

      • Depends. If your algorithm determines credit score based on status as slave, that's perfectly reasonable. The problem is when it decides credit score on skin color.

      • How can the algorithm know that if race isn't used as an input?
        • There are algorithms that are 95% effective at determining race from name/age/zip code. Fact is, different groups have different ideas on good first names for babies, and tend to be geographically clustered.

          And, beyond that, there are a lot of ways to extrapolate race/gender/etc. from a dataset. Hell, knowing if you liked Glee on FB gets it right a significant percentage of the time.

          There either are confounds with race, or there are not. If there are no confounds, Microsoft's project will analyze the data

      • by Raenex ( 947668 )

        caused by slavery

        Ah, yes, slavery. White America's original sin. An eternal excuse for black crime, poverty, or whatever the grievance of the day is. Never mind that whites were also enslaved in the Barbary slave trade, or that Europe arose from the Dark Ages, or that any number of people from any number of shit times rose above their position despite being disadvantaged.

        Nope, it doesn't matter that Japanese were mass interned in World War II, and essentially lost all their property, but rebounded. Asians are "people of colo

        • . An eternal excuse for black crime, poverty, or whatever the grievance of the day is

          I mean, I was talking about 1864, when it was still a big issue. Not contemporary, sure, but also the example I was using. You know, cause easy to understand

        • by q_e_t ( 5104099 )
          That other people did Bad Stuff (TM) doesn't excuse other bad things.
          • by Raenex ( 947668 )

            That other people did Bad Stuff (TM) doesn't excuse other bad things.

            Indeed. So let's not hear about slavery anymore when talking about black crime, okay?

            • by q_e_t ( 5104099 )
              I think you've missed the point. Bad Things (TM) that happened in another country are unlikely to be relevant. An arc of history that led up today in the USA may still have relevance. I'd agree, though, that moving on and dealing with the causes (mostly poverty and discrimination) would make more sense, even if historical context can be useful sometimes. It's not an issue that can be fixed overnight, though.
              • by Raenex ( 947668 )

                I think you've missed the point. Bad Things (TM) that happened in another country are unlikely to be relevant.

                Why? You can trace everybody's arc of history and find some "Bad Things". The point is that we don't play the forever oppressed game, when people all over have risen above their shitty starting position.

                I'd agree, though, that moving on and dealing with the causes (mostly poverty and discrimination)

                That's your assumption and playing the victim, denying self-agency and assigning the blame to others.

    • by AmiMoJo ( 196126 )

      For example, black people are far more likely to be convicted over very minor drug offenses. White people are much more likely to be let off, sometimes by the cop choosing to ignore it or deal with it out of court. If it does get to court then the white person is likely to get a much more lenient punishment.

      The algorithm comes into this system as it is, full of existing systemic bias. If the algorithm wants to be fair and avoid perpetrating that bias, it is going to have to examine each case in great detail. At

    • I've been reading stories in removing bias from algorithms but still don't get it. What is an algorithm bias? If the results don't have perfectly flat distribution across sex, race, religion, and other protected groups?

      That's because calling it "algorithm bias" is a category error. Algorithms can't be biased (unless explicitly so...)

      What they really mean is "data bias" or GIGO - but because people don't understand the difference between process and data, they're erroneously targeting the process for correction

  • If you are developing algorithms to predict, let's say, possible criminal behavior, and it ultimately predicts more crime among those who actually commit more crime, then you have one of three choices: 1) keep it and use it responsibly, 2) throw it away and eat your development costs, or 3) neuter it to the point of it not working, in which case you fail.

  • by bangular ( 736791 ) on Sunday May 27, 2018 @11:26PM (#56686458)
    I think we have to be a little more formal with terminology. The summary and most articles these days use "algorithm" and "AI" interchangeably. You can use an algorithm to train a machine learning model, but the model isn't really an algorithm in the classical sense.

    The trained model can definitely have bias based on the training data. The classical example is: train a word2vec or GloVe model on the text of Wikipedia, then find the vector representations of doctor and nurse. You'll find that nurse is considered a female term while doctor is male.

    This may be acceptable for trivial things like advertising or movie suggestions, but machine learning is now being used for important things like job application screenings. Many times the model can be very opaque and this bias may not seem obvious. Even worse, it seems every company now wants to have AI in their product, and may have half-rate data scientists that graduated from a data science bootcamp.

    The research I've seen on this subject is serious work. In the case of the doctor/nurse vector representation, the goal would be to make the occupation gender neutral. The tricky part is that you'd still want the model to retain certain qualities, like mother being female and father being male.
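A toy sketch of the doctor/nurse idea, assuming hand-made three-dimensional vectors rather than a real word2vec or GloVe model: estimate a gender direction from a definitional pair, then project that direction out of the occupation words only. Every vector and helper name here is invented for illustration, and real embeddings have hundreds of dimensions and a much noisier gender direction.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def neutralize(v, direction):
    """Remove the component of v that lies along a unit-length direction."""
    weight = dot(v, direction)
    return [a - weight * d for a, d in zip(v, direction)]


# Made-up embeddings; the axes are roughly (gender, medical, parent).
vectors = {
    "man": [1.0, 0.0, 0.0],
    "woman": [-1.0, 0.0, 0.0],
    "doctor": [0.4, 1.0, 0.0],
    "nurse": [-0.5, 1.0, 0.0],
    "father": [1.0, 0.0, 1.0],
    "mother": [-1.0, 0.0, 1.0],
}

# Gender direction estimated from one definitional pair.
gender = unit([a - b for a, b in zip(vectors["man"], vectors["woman"])])

# Neutralize only the words that should be gender neutral.
for word in ("doctor", "nurse"):
    vectors[word] = neutralize(vectors[word], gender)

for word in ("doctor", "nurse", "father", "mother"):
    print(word, round(dot(vectors[word], gender), 3))
```

The occupation words end up with no component along the gender direction, while "mother" and "father" keep theirs because they were never passed to `neutralize`. Deciding which words belong on which list is the hard, judgment-laden part that the toy example skips.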
    • by q_e_t ( 5104099 )
      The press reporting of this sort of science makes someone like me with a background in the science cringe on a regular basis. Like percentage accuracy, and never a ROC curve to be seen.
  • Just as scientific racists hoped that science would justify and enable their racism, technoracists hope that technology will justify and enable theirs. Technoracism is only a couple of years old (about as old as this article [propublica.org]), it's only arisen following recent advances in machine learning. The technoracists hope to exploit layered neural networks' inherent ability to launder and obscure the human biases they were trained on, and portray the results of this GIGO effect as being purely logical and therefore s

    • by ewhenn ( 647989 )
      Whatever, I only care what the data shows. Algorithms like these only latch on to signals in the data. As long as the data is correct and not forged with an inherent bias, then the findings are valid. If a certain group doesn't like the findings, maybe they should figure out how to address the underlying causes and not call an accurate analysis "bias" or "discriminatory" or whatever other term they want to use because their feelings got hurt.

      For example the FBI crime data from 2016 (2017 data is not
      • You can have your opinions and we can share facts, but you can't use those facts to discriminate against someone based on an immutable trait like ethnicity. Unless you want to be a racist asshat.

      • by green1 ( 322787 )

        Ah but this is all so easy to fix.
        We just need to convict more white people of murder, or not convict some black people (regardless of whether they were guilty or not), and the same in reverse for sex crimes (again, ignore any actual evidence that might indicate you're convicting the wrong person). The heart disease one is harder though, but I'm sure if we try hard enough we can "unbias" that data too!

      • by q_e_t ( 5104099 )
        Signals are correlations, not necessarily causal.
  • When MS makes a product that doesn't suck...they'll have bought a vacuum cleaner manufacturer.

    The whole point of AI categorization systems is to uncover bias. We want the thing to make a decision for us, after all.

    This is basically saying that MS is trying to create tools to make AI that doesn't work. I give them a high probability of succeeding.
