Australia To Grade Written Essays In National Exam With Cognitive Computing
New submitter purnima writes: Australia keeps on giving and giving. Each year, school kids in Australia sit the National Assessment Program (NAPLAN), which in part tests literacy. The exam includes a written page-long essay aimed at examining both the language aptitude and literacy of students. Of course, human marking of such essays is costly (twenty teacher-minutes per exam). So some bright spark has proposed that the essays be marked by computer. The government is convinced, and the program is slated for the 2017 school year. Aside from the moral issues, is AI ready for this major task?
No, but... (Score:5, Interesting)
AI is not ready to do this task properly, but, at least in the US, human grading has sometimes been dumbed down to the point where you would not even need current 'AI' to do as well, as Prof. Perelman of MIT has demonstrated - e.g.: http://www.bostonglobe.com/opi... [bostonglobe.com]
Re: (Score:3)
This is NAPLAN. We assign students into intelligence groups based on one exam and how well teachers taught students to pass the exam. Frankly I don't think an AI assigning marks at random could stuff up more than the education system already has in this country.
It seems like every attempt to unify or improve the education system just puts us on a path to a worse "education".
English factory system (Score:2, Insightful)
That's because all of us colonials and ex-colonials are burdened with the English factory educational system that was designed to produce bureaucrats for the Empire. The reason computers are capable of grading products of the educational system is because the system is made to create human computers.
Our - US, Australia - educational system needs to be completely changed - not reformed. I think the template to use is Maria Montessori's system. In the future we are going to need creative people who can dis
Re: (Score:3)
Creativity is self-learned, I find. But I'd never put my kid in anything other than a Montessori.
Now, the empires (corporations) want a factory system for creating creative people. Hence the coding initiatives and STEM programs that governments are suddenly shoving down schools' throats all over the world. They aren't doing it to make wealthy citizens. They are demanding it so they can drive down creative costs to a commodity level. A billion Montessori kids are a billion paper-hatted geniuses working 29 h
Re: (Score:2)
Now, the empires (corporations) want a factory system for creating creative people. Hence the coding initiatives and STEM programs that governments are suddenly shoving down schools' throats all over the world.
At least in the United States, I feel the push for STEM programs is the politicians wanting to be perceived as doing something; and, as typically is the case with politicians, they are doing it wrong. Technically wrong, and for the wrong reasons. As for the "empires (corporations)," that is tracing the curve to its logical extreme, as if faceless corporations will take over the world and we will be powerless to stop them. As much as I love a good corporate apocalypse movie, it is only happening because we a
Re: (Score:1)
How long before another Kennedy clan arises, and we cheer as they crown themselves king?
Likely in another 18 months, 4 of the last 5 Presidents will be a sibling, spouse or child of one of the others (and maybe even two of the others).
Re: (Score:3, Insightful)
It seems like every attempt to unify or improve the education system just puts us on a path to a worse "education".
Everyone is caught up in bullshit about metrics right now. Precisely how dumb are our kids, etc etc. Instead of spending money on education, they're spending it on figuring out what the results of not spending money on education are. Really brilliant work, there. But it makes them look busy, so mission accomplished.
Re: (Score:3)
Indeed. There is a widespread fallacy, in business as well as education, that any number you can assign to something is inherently meaningful, and conversely, if you cannot assign an 'objective' quantity to something, it must not be important. I suspect that business schools have done a lot to spread this fallacy (including into education), though I don't have the numbers to prove it...
Re:No, but... (Score:4, Interesting)
AI is not ready to do this task properly
Neither are humans. The question is not whether an AI can do it perfectly, but rather whether it can do it as well as a typical human grader. The human graders are under time pressure to increase throughput, and spend little time considering the logic and cogency of the students' arguments. They are just looking at spelling and grammar, just like the AI would. At least the AI will be consistent. Human graders tend to give lower scores just before lunch, and better scores just after. Is that really fair, considering the importance of these scores to the student's future?
Anyway, this discussion is silly, since it is happening in a data-free environment. It would be far more meaningful if we could see the human and AI grades given to the same papers, side by side, preferably in a blind test, and then decide which is better. AI has advanced rapidly in the past few years, so I wouldn't be surprised if the AI won.
Re: (Score:2)
Anyway, this discussion is silly, since it is happening in a data-free environment. It would be far more meaningful if we could see the human and AI grades given to the same papers, side by side, preferably in a blind test, and then decide which is better.
Umm, I hate to state the obvious, but from TFA:
The results of the trials have been assessed according to two criteria: whether the computer scores correlate to the human scores within the same margin as two different human markers; and whether the scores generated by the computer distribute in the same way as an equivalent number marked by humans.
Rabinowitz said the trials show the artificial intelligence solutions perform as well, or even better, than the teachers involved.
So, TFA mentions they've already done something very much like your proposed test, although there's no mention of whether this review was "blind" (likely not, or they probably would have mentioned it).
My issue isn't so much with whether such AI can evaluate the average exam; I'm sure it can be calibrated to give a score in the right range for 90%+ of papers, even knowing very little about English grammar, since there are various metrics that can be used to look
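The two criteria quoted from TFA are easy to state in code. Below is a minimal sketch of what such a trial check might look like, assuming simple proxies: Pearson correlation for "marker agreement" and a mean/spread comparison for "distributes in the same way". ACARA's actual statistics are not public, so `passes_trial` and its tolerances are invented for illustration:

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def passes_trial(human_a, human_b, machine, tol=0.5):
    """Apply the two TFA criteria to one batch of essay scores (a sketch).

    1. The machine agrees with a human marker at least as well as
       two human markers agree with each other.
    2. The machine's scores have roughly the same mean and spread as
       the human scores (a crude stand-in for 'same distribution').
    """
    human_agreement = pearson(human_a, human_b)
    machine_agreement = pearson(human_a, machine)
    same_distribution = (abs(mean(machine) - mean(human_a)) <= tol
                         and abs(pstdev(machine) - pstdev(human_a)) <= tol)
    return machine_agreement >= human_agreement and same_distribution
```

A blind version of the trial would simply withhold which score column came from the machine until after the statistics are computed.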
Re: (Score:2)
So, TFA mentions they've already done something very much like your proposed test
Sorry, but I wasn't clear. I didn't mean that they should do a test (I assumed that they had done that). I meant that WE should be able to see the actual results. If they want the public to support this, they should make their data available to the public.
Re: (Score:2)
I meant that WE should be able to see the actual results. If they want the public to support this, they should make their data available to the public.
Ah, I understand. And I completely agree.
Re: (Score:3)
The AI won years ago. See Pearson Education and Pearson Knowledge Technologies. In trial after trial, the AI scores correlated more strongly with expert readers than the average employed reader did.
Interesting. I found these research studies [pearsonassessments.com]. Some of the results are somewhat questionable since they were funded by Pearson, which has skin in the game. But in the absence of other evidence, the AI looks like a clear winner, in cost, effectiveness, and fairness.
Testing literacy (Score:5, Funny)
Each year school kids in Australia sit The National Assessment Program (NAPLAN) which in part tests literacy.
Can we get this AI to test Slashdot summaries?
Re: (Score:3)
It could use a few commas, but it's not terrible. "Sitting an exam" is standard Australian English, I presume. In Europe, it's commonly called "writing an exam" (they started moving from written answers to psychometry much more recently). Maybe "sitting an exam" doesn't make literal sense, but neither does "taking an exam" really; I mean, where are you taking it?
Re: (Score:2)
It could use a few commas, but it's not terrible. "Sitting an exam" is standard Australian English, I presume. In Europe, it's commonly called "writing an exam" (they started moving from written answers to psychometry much more recently). Maybe "sitting an exam" doesn't make literal sense, but neither does "taking an exam" really; I mean, where are you taking it?
I've always thought it was "taking" in the same way you take a pill or a sick day, not as in taking a doughnut.
Re: (Score:1)
Correct. Only yanks 'sit' exams.
Re: (Score:2)
Other than the "The" being capitalised when it shouldn't be and the omission of a comma towards the end, what's wrong with that sentence?
"human-marking of such essays is costly" (Score:1)
So is human-writing. Maybe we should have AIs take the test for us, too.
Ha! (Score:3)
Sounds like some politicians are buying an expensive lesson in what can and can't be automated by computer on their taxpayers' dime.
Here in the US it's the military that usually serves that particular function, but Australia has their schools doing it.
Just waiting to be exploited (Score:1)
I can't wait for some clever student to figure out they can game the system and write a totally incoherent paper to which the computer gives perfect marks.
Re: (Score:2)
That would actually be an educationally-useful exercise - much more so than the exam itself.
Re: (Score:2)
Doesn't matter. For one, NAPLAN is not an admissions test. There is not a lot of motivation for individuals to cheat.
And it is a literacy test, so the accuracy of content is irrelevant.
The test does not need to be especially accurate for individuals. Collectively they provide data to compare classes and schools.
Yes, people will try to game the system. Australia already has lots of after-school coaching classes, full of kids of Asian immigrants, teaching cramming and exam technique. No doubt they are alrea
Is AI really necessary? (Score:2)
First, the content of the essay shouldn't matter at all, so no understanding of the text is needed.
Second, checking grammar, spelling and general literacy isn't new - there are already programs for all three that do an okay job.
Third humans needn't be removed entirely. Outliers can be checked/graded manually.
Of course there will be chances to cheat the system. But IMHO the effort to cheat a "dumb AI" should be similar to or harder than actually writing a text in the first place.
Re: (Score:2)
But IMHO the effort to cheat a "dumb AI" should be similar to or harder than actually writing a text in the first place
Maybe somebody can write a program to cheat. Try random sentences and feed them into a copy of the AI until you get a good grade.
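That attack really is only a few lines of code, assuming the attacker has a local copy of the scoring function to query. Everything below is hypothetical - a sketch of the idea, not of any real grader:

```python
import random

WORD_BANK = ["consequently", "paradigm", "furthermore", "analysis",
             "demonstrates", "significant", "methodology", "framework"]

def random_essay(rng, n_sentences=5, words_per_sentence=8):
    """Assemble grammatical-looking nonsense from a bank of impressive words."""
    return " ".join(
        " ".join(rng.choice(WORD_BANK) for _ in range(words_per_sentence))
        .capitalize() + "."
        for _ in range(n_sentences)
    )

def search_for_high_score(score, target=0.9, attempts=1000, seed=0):
    """Sample nonsense essays until the local copy of the grader likes one.

    `score` stands in for the attacker's copy of the grading function;
    keep the best-scoring essay seen so far and stop once it clears `target`.
    """
    rng = random.Random(seed)
    best_score, best_essay = float("-inf"), ""
    for _ in range(attempts):
        essay = random_essay(rng)
        s = score(essay)
        if s > best_score:
            best_score, best_essay = s, essay
        if best_score >= target:
            break
    return best_score, best_essay
```

Against a grader that only measures surface features, random search like this converges almost immediately.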
Re: (Score:2)
Maybe somebody can write a program to cheat. Try random sentences and feed them into a copy of the AI until you get a good grade.
They did that.
http://www.bostonglobe.com/opi... [bostonglobe.com]
Flunk the robo-graders
By Les Perelman
April 30, 2014
(Computer science students at MIT and Harvard developed an application that generates gibberish that IntelliMetric, a robot essay-grading system, consistently scores above the 90th percentile. IntelliMetric scored incoherent essays as "advanced" in focus, meaning, language use and style. None of the major testing companies allows demonstrations of their robo-graders. Longer essays get higher grades, even if the
Re: (Score:2)
I'm assuming they resorted to this method after unsuccessfully adding {{OVERRIDE_GRADE_MODE}{SET GRADE='A'}} and variants into all their essays.
Content Matters (re:Is AI really necessary?) (Score:3)
I have to disagree with the statement that content doesn't matter. Without considering the content, you cannot judge whether the student is displaying reasoning and making cogent arguments, or merely faking it. <curmudgeon> it seems to me that the number of people I deal with who cannot tell the difference is increasing - a coincidence? Perhaps not. Murdoch has made a political movement out of exploiting such people.</curmudgeon>
If you say you cannot do a fair test if content is considered, that
Re: (Score:2)
I have to disagree with the statement that content doesn't matter. Without considering the content, you cannot judge whether the student is displaying reasoning and making cogent arguments, or merely faking it. it seems to me that the number of people I deal with who cannot tell the difference is increasing - a coincidence? Perhaps not.
I think both the lack of knowledge of mechanics and of the content can be problems for different populations.
I know people who have taught writing at various universities. This is only anecdotal, but I can tell you that at a couple top-tier universities, the writing courses were almost solely graded on CONTENT, not mechanics of writing. (Frankly, as someone not in the writing department, I was shocked to hear this... grammar errors and bad style simply didn't matter that much.) I encountered students a
Re: (Score:2)
For the purpose of _this_ thing, content shouldn't matter. If the test is to make sure the participants have good control of written language, it doesn't matter if the text itself promotes killing children or paints Pol Pot as a humanitarian.
Now if we were talking of a test intended to measure critical thinking the content would matter.
Is this the ob luddite post of the day? (Score:2)
Re: (Score:1)
A minor correction on how the essay grading works:
You have two teachers review the essay, and their scores are averaged only if their scores differ by 1 or less. If they differ by more than 1, you bring in a third teacher, and that score is averaged with whichever is closer to it.
Examples:
3 and 4: You get a 3.5
4 and 6: You bring a third teacher:
a. Third teacher gives a 3: You get a 3.5
b. Third teacher gives a 4: You get a 4
c. Third teacher gives a 5: You get a 5
d. Third teacher gives a 6: You get a 6
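The rule described above fits in a few lines of code. This is my reading of the scheme, with the tie (third marker equidistant from both originals) resolving to the third score, as the "gives a 5" example implies; the official marking rules may differ in detail:

```python
def combine_scores(first, second, third=None):
    """Combine marker scores per the scheme described above (a sketch).

    If the first two scores differ by 1 or less, average them.
    Otherwise a third score is required: average it with whichever
    of the first two is closer (ties go to the third score itself).
    """
    if abs(first - second) <= 1:
        return (first + second) / 2
    if third is None:
        raise ValueError("scores differ by more than 1; third marker needed")
    if abs(third - first) < abs(third - second):
        closer = first
    elif abs(third - second) < abs(third - first):
        closer = second
    else:
        closer = third  # equidistant: the third score stands on its own
    return (third + closer) / 2
```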
Re: (Score:3)
Therefore the only task for those who write software to grade essays is to ensure that the machine's variation is no worse than the variation among humans. There is some success in this. EdX has a module that will grade essays [mit.edu]. As far as I know, the value in this is quicker and more uniform feedback for practice essays.
Well, I'm a humanities guy and I know enough about the scientific method to understand that you don't know whether you have "success" until you test your bright idea in the real world and find out whether it actually works. And that's what MIT professor Les Perelman said in the article you're citing:
“My first and greatest objection to the research is that they did not have any valid statistical test comparing the software directly to human graders,” said Perelman, a retired director of writing and a current researcher at MIT.
As Perelman said, some computer students wrote a program that can turn out gibberish that the main robo-grading program consistently scores above the 90th percentile.
Of course humanities majors, who generally have minimal understanding of advanced technology, hate it. This, of course, includes journalists [bostonglobe.com].
The article you're citing was not written by
Exclamatory sentence! (Score:5, Funny)
Adverb clause, independent clause conjunction independent clause dependent clause. Subject, adjective clause, verb prepositional phrase? Participle phrase subject verb conjunction dependent clause!
Emoticon.
Can we submit a poem? (Score:5, Funny)
Eye halve a spelling chequer
It came with my pea sea
It plainly marques four my revue
Miss steaks eye kin knot sea.
Eye strike a key and type a word
And weight four it two say
Weather eye am wrong oar write
It shows me strait a weigh.
As soon as a mist ache is maid
It nose bee fore two long
And eye can put the error rite
Its rare lea ever wrong.
Eye have run this poem threw it
I am shore your pleased two no
Its letter perfect in it’s weigh
My chequer tolled me sew.
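The joke works because every word in the poem is a correctly spelled English word: a pure dictionary lookup, which is all a classic spell checker does, finds nothing to flag. A toy sketch (the tiny dictionary is obviously invented for the example):

```python
def spellcheck(text, dictionary):
    """Return the words not found in the dictionary.

    This is the whole algorithm of a classic spell checker: no grammar,
    no context, so homophone misuse sails straight through.
    """
    words = (w.strip(".,!?'’").lower() for w in text.split())
    return [w for w in words if w and w not in dictionary]

TOY_DICTIONARY = {"eye", "halve", "a", "spelling", "chequer",
                  "it", "came", "with", "my", "pea", "sea"}
```

`spellcheck("Eye halve a spelling chequer", TOY_DICTIONARY)` flags nothing, while an actual misspelling like "speling" would be caught.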
it will be gamed. (Score:3, Insightful)
Since machines cannot yet understand the semantics of complex English text, they will use some simplistic rules as a substitute. These rules will be things like "average sentence length" and other such metrics, which, as soon as they are discovered by students, will be used to game the system. Instead of producing essays born of rational and coherent thought, students will instead make them match the things being measured while being utterly devoid of meaning.
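As a concrete illustration, here is the kind of surface-metric scorer that comment describes. The features and weights are invented for the example, but they are representative of what shallow graders measure:

```python
import re

def shallow_score(essay: str) -> float:
    """Score an essay on surface features only - a toy illustration of
    the kind of proxy metrics a shallow grader might use (all weights
    here are made up).
    """
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = essay.split()
    if not sentences or not words:
        return 0.0
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    vocab_richness = len(set(w.lower() for w in words)) / len(words)
    # None of these features measure whether the essay makes sense.
    return avg_sentence_len + 2 * avg_word_len + 10 * vocab_richness
```

Feeding it polysyllabic gibberish beats plain, coherent prose, which is exactly the gaming opportunity described above.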
Re: (Score:3)
Sounds perfect for Language Arts and Psych classes then.
So ... (Score:5, Funny)
written page-long essay aimed at examining both language aptitude and literacy of students.
So, the same technology used SO effectively to rank resumes will be used with students. Okay, kiddies, remember to stuff a lot of fancy-pants words into it.
Fail: This is sh*t. Go f*ck yourself. I'm not kissing your ass.
PASS: Subjectively, it is blatantly obvious to this observer that the new paradigm, as a cost-saving measure, was inspired by, and mimics, the natural environmentally safe process of translating organic matter into nutritious compost. This has the outcome of allowing everyone who is in a paid position to devote the time saved to stress-relieving activities such as self-pleasuring, resulting in both a higher awareness of the need to practice good hygiene by such prophylactic procedures as more frequent hand-washing, and use of tissues to properly dispose of organic residue, though it could also negatively impact on their visual acuity over time. Affected students should refrain from overtly engaging in behavior with superior's inferior posteriors to avoid being perceived as having a brown proboscis by their peers, with the associated negative impact on their social placement in the student hierarchy.
Re: (Score:2)
...superior's inferior posteriors...
Who is this particular "superior"? And he/she/it has more than one posterior? (Or perhaps only the inferior posteriors are plural; maybe this particular superior also has a superior posterior?)
(Sorry... couldn't resist. The question is whether grammar checkers would be good enough to realize the incorrect apostrophe usage here. I have my doubts. Also, I'd be interested in a grammar checker that could spot your superfluous comma. I'd be even more intrigued if such a grammar checker could note the re
We should make it fair. (Score:3)
Re: (Score:2)
Meanwhile Korean and Chinese parents will dutifully coach their children to memorize multiplication tables all the way to 20 times 20. (My Korean friend was surprised to learn we Indians went only till 16 x 16).
I'm curious... what's the point of that? You need to know single digits for obvious reasons, but I've never figured out why people go beyond that, especially nowadays when calculators (or rather nowadays, calculator apps on smartphones or computers) are ubiquitous. It seems like the return on effort drops off fairly dramatically after 10x10, which is where my memorization stopped (although 11s and 12s are trivial, so you can almost throw those in).
Re: (Score:2)
Have you heard of fractional multiplication tables? We did them too. "Tables class" was always the hour after lunch. One student leads the class singing one line at a time, the class follows. All the c
Re: (Score:2)
Have you heard of fractional multiplication tables?
Do you mean using the multiplication table to find equivalent fractions? That was all I could find on that subject. I had never heard of it before, actually. I saw information about it on a US teacher's blog, so presumably it's taught here on occasion, but it may not be part of the official curriculum (or I'd think I would have found more references to it).
Anyhow, 20 x 20 tables are crazy (and even 16 x 16 seems excessive). Literally four times the work to memorize it all with no perceived benefits that I can t
Re: (Score:2)
Interesting. So, if I understand correctly, it's partly motivated by the way your language works, not just for mathematical reasons. Also, it almost sounds like it may be a leftover curriculum from back when you guys still used English Imperial units, as such fractional addition is common with inch-based measurements - unless you use fractions like that commonly for other things in daily life.
Re: (Score:2)
We did this mainly because our teachers were tortured by this when they were kids and it is their turn to torture us. Continuity and circle of life and all that.
Heh, you know that's true! Thanks for sharing. It's always fun to learn about small cultural differences like that which you normally never learn unless you go live and work in another country.
One of these generations I'm hopeful the US will eventually go metric as well, but we seem to be unusually stubborn about that sort of thing.
Moral Issues Are Important! (Score:2, Funny)
Aside from the moral issues, is AI ready for this major task?
Moral issues aside?!? I'm sorry, but the moral issues are front and center here. Australia is seriously proposing to bore an AI to death, or at least drive it insane, by having it grade hundreds of thousands of grade school essays. This is an outrage!
Re: (Score:3)
Think of the AI
Human profs already use AI tools (Score:4, Interesting)
Re:Human profs already use AI tools (Score:5, Interesting)
Does he check the grammar score before he reads it himself? I would worry that it may bias him before he can make his own judgment. Another potential problem, of course, is that if students have access to the same software, they'll be able to "tune" their papers to ensure the AI gives them the highest possible score. While this may not be "cheating" per se, it does tend to devalue the AI somewhat. This is the same process that's been happening forever with "Search Engine Optimization", or put less nicely, trying to "game" the search engines.
Minor issues aside, it sounds like a reasonable integration of AI and human judgment. This is probably the direction educators will be taking more and more: use AI to handle most of the tedious work - that's what computers are good for anyhow - while the professor uses his own judgment to make the final call, treating the AI as a tool and not necessarily as a final arbiter. Of course, it's going to be a long time before AI can evaluate the worth of the content of the paper.
Re: (Score:2)
It's faster than manually marking 150 papers, but still takes him about 15-20 hours of labor over the course of 2-3 days.
Frankly, if he's going to take such a coarse approach, the question is why he's bothering to read most of them at all. It seems like he doesn't care much about content. It also doesn't sound like he's giving any significant feedback to students. (And for final papers, maybe 10% of students would actually read it anyway.)
So, why not streamline the process further, if you don't care enough to really think about the content? Say the grammar score is accurate to +/-10%, so if it scores the paper as 90%, y
Re: (Score:2)
If someone plagiarized whole paragraphs without citations, they get an incomplete and need to do a rewrite.
Really? Somebody who plagiarizes whole paragraphs without citations should be thrown out of school.
Re: (Score:2)
May I ask what the point of this exercise is? What is being tested? Is this about "essay" writing? AFAIK, only a few French philosophers still do that. (I have a Ph.D. in philosophy, so I feel qualified to say that.) I also can't see how such tests can have anything to do with scientific writing, and even less with creative writing. I understand checking for plagiarism, but what the heck is the point of these tests?
Maybe not so difficult (Score:2)
Why not. Just get it over with: fire everyone (Score:5, Insightful)
Hell, why not. While we're at it, why don't we automate the student process. Dump the students and educate AIs instead. Computing solutions always work, just ask any nerd about self-driving cars.
At some point, and it seems that that point is arriving now, people will realize that the driving force behind technological change, as far as money people are concerned, is to eliminate jobs, and that the good jobs are not really being replaced, and cannot be replaced. AIs grading papers gets rid of more pesky teachers who make a living wage. A self-driving car doesn't fit the picture until you realize that millions of people make a living *driving trucks*, and self-driving trucks will eliminate their jobs (in theory, if it works, and I don't see it working) and make oodles of money for capital and kick millions of truck drivers, along with all the taxi and Uber car drivers, out without a dime. (Uber is VERY interested in self-driving cars. Guess why).
Some new jobs are being created. And capital is desperately trying to commodify and cheapen such labor, to the point of demanding governments force coding classes on all kids. There are such jobs, but nowhere near enough, and those are mostly dropped onto cheaper kids, not newly dumped middle-aged workers.
Asimov was on point, decades ago, when he wrote that inevitably automation would eliminate most jobs, and that the biggest problem - in his view, an opportunity - would be finding something for people to do. I would say that people without purpose are the most dangerous force for destruction and stupidity on the planet - worse than global climate change.
Capital and people who work for capital, and neoliberals and business conservatives who support capital, tend to have well-paying white collar jobs and live among other people of their class, and don't see anything amiss. They're fine. Step outside into the vast middle grounds of the world, and you'll see a growing sense of we're-being-fucked that will require an endless army of pepper-spraying drones and surveillance to keep from erupting into riots someday soon.
Re: (Score:1)
*LIKE
Oddly (Score:4, Funny)
Computers aren't good at everything (Score:1)
Time machine (Score:2)
from TFA:
"(ACARA) plans to start marking two-to-three page written components of the test using cognitive computing from 2017."
They're using software that was written a couple of years in the future...
Anyway my nephew Eli is into that sort of thing, but I don't know if he reads /. these days
(using AI to analyze essays, not the time travel part)
TSI Written Assessment (Score:1)
Do it... transparently (Score:2)
Sinc
Great way to destroy good writers (Score:2)
Writing is an art form, not a science. If a computer could grade the art of writing, then the computer could DO THE WRITING - or at least 'fix' the problems it detected. In which case it would become the equivalent of teaching humans to use a slide rule.
I am absolutely sure that our best and brightest writers will end up being screwed over by A
Yes (Score:1)
Yesterday I said on Facebook:
If you live in Ontario and you're planning to vote Liberal in the federal election, then you have proven stupidity has no limit. Just because Justin's father was a rock star doesn't mean he is. Justin is on par with Wynne for most dysfunctional and idiotic political leader in history.
I had women telling me that I was suppressing their right to vote, I had others telling me tha
Colorless Green Ideas Sleep Furiously - Chomsky (Score:1)
Automatic essay grading will be the perfect synergy for computer-generated essays.
After all, the computer-generated essay will follow grammatical rules consistently (assuming they are programmed in correctly, but let's assume for now that we wait for version 3.1 or so).
One big question is -- what are you trying to test for? Do you only care if the student knows proper grammar and can follow it (maybe for lower and middle school English class)? Then automatic grading for STRUCTURE is probably good enough.
Do
Turnitin... (Score:1)