Slashdot
Using The Web For Linguistic Research
Posted by timothy on Sun Jan 23, 2005 03:50 AM
from the that's-rediculous dept.
prostoalex writes "The Economist says linguists are gradually adopting the World Wide Web as a useful corpus for linguistic research. Google is used, among other resources, to research how the written language evolves and how some non-standard examples of usage become more or less acceptable (The Economist quotes the phrase 'He far from succeeded,' where 'far from' is used as an adverb). LanguageLog is a resource linked in the article, where linguists discuss current peculiarities of the English language."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
They should probably avoid Slashdot (Score:5, Funny)
Re:They should probably avoid Slashdot (Score:2)
Re:They should probably avoid Slashdot (Score:2)
Re:They should probably avoid Slashdot (Score:3, Interesting)
Actually, I take that back.
It could actually be very interesting from a lexical or morphological point of view. The phenomenon of abbreviating words, such as "u" for "you" or "ur" for "you're" or "ru" for "are you." Language teachers in classrooms have been seeing it crop up in actual homework assignments. While reading such language may be like having glass
Re:They should probably avoid Slashdot (Score:3, Interesting)
One thing that's always been at the front of my mind: why aren't these kids learning how to type? Or at least to type with any reasonable amount of skill. The only computer I had as a child was a Commodore 64, and I was still faster than most of today's youth even with their abbreviations. I was somewhat lucky in that our schools somehow foresaw the advent of the home computer and made sure we kne
Re:They should probably avoid Slashdot (Score:2)
I'd still advance the argument that extensive use of computers by a larger portion of the population has contributed to the phenomenon. I remember seeing those abbreviations before cell-phone use became almost ubiquitous.
It's al
It looks like no one read the article (Score:3, Insightful)
Re:They should probably avoid Slashdot (Score:2)
Right now, yes. But in a generation or two, perhaps they'll lose the distinction.
Whoops.
Re:They should probably avoid Slashdot (Score:2)
Re:They should probably avoid Slashdot (Score:2)
Re:They should probably avoid Slashdot (Score:2)
WTFDTRLASF?
What the f**k does that really long acronym stand for?
Indeed (Score:4, Funny)
I rue the day... (Score:3, Funny)
Re:I rue the day... (Score:3, Interesting)
I've heard it done. I've also heard 'roffle' (an attempt at pronouncing ROTFL I guess). Bizarre, really, since those terms are attempts to turn physical real-life actions into a verbal-only form.
Re:I rue the day... (Score:2)
Re:I rue the day... (Score:3, Informative)
lol (de ~) 1 [inf.] plezier
(taken from www.vandale.nl, an authoritative Dutch dictionary)
Re:I rue the day... (Score:2)
Epiphany (Score:2, Funny)
Google does it again (Score:3, Interesting)
This is not the first time that Google (and search engines in general) has changed how we do things.
Nowadays copyright holders use Google to search for potential violations of their intellectual property. Plagiarism is easy to detect thanks to Google as well. Instead of using rather expensive [turnitin.com] systems to search for duplicate work, teachers are now one search away from distinguishing original work from the rest.
Re:Google does it again (Score:2)
*BSD be dyin' (Score:2, Funny)
One mo'e cripplin' bombshell hit da damn already beleaguered *BSD community when IDC confirmed dat *BSD market share gots dropped yet again, now waaay down t'less dan some fracshun uh 1 puh'cent uh all servers. Comin' on de heels uh a recent Netcraft survey which plainly states dat *BSD gots lost mo'e market share, dis news serves t'reinfo'ce whut we've knode all along. What it is, Mama! *BSD is collapsin' in complete disarray, as fittin'
Be carefull thought... (Score:3, Interesting)
native speakers.
In the European community the native English
speaking persons are by far a minority. That way
French expressions are pouring into the language
in an unstoppable way. Those expressions are then
used by native speaking politicians and are
broadcasted by television. That way they enter the
mainstream of the English language.
Regards
Re:Be carefull thought... (Score:2)
Now I know some people would be quite upset at the horrible "loss" of cultural diversity implied by a single global language. But we can be just as diverse in many other ways that don't cause us to be unable to communicate with each other on a basic level. And IMHO, being able to communicate is much more important
Re:Be carefull thought... (Score:2)
Now I know some people would be quite upset at the horrible "loss" of cultural diversity implied by a single global language. But we can be just as diverse in many other ways that don't cause us to be unable to communicate with each other on a basic level. And IMHO, being able to communicate is much more important than some academic's ideal of "cultural identity".
Okay... how about the complete loss of the ability to read any of the world's litera
Re:Be carefull thought... (Score:3, Insightful)
By the time we have a unified language, we'll have a whole new set of literature to go along with it. Today's literature will be like ancient Greek literature, and yes, it will only be readable by people with special training. It will need to be translat
Done: nous sommes desolés que notre president (Score:4, Insightful)
Those expressions are then
used by native speaking politicians and are
broadcasted by television.
Dude, it's worse: the French have already infiltrated as far as the advertising business and are using covert channels to spread some dangerous crack I heard was called La Liberte
http://french.about.com/b/a/081281.htm
Slightly more seriously
Apart from pointing out that your use of the word native is rather presumptive of geographic origin in this big wide internet thing, I wonder if this linguistic adoption is more one way towards English since the internet. OK, the French got Le Weekend, and tons of anglicised nouns, tried to ban them all and didn't manage. But I read Friday that a British pilot training firm lost a contract to a French one. The reason cited by the Asian airline was that, whilst the training had to be in English, the French trainers spoke better, clearer, more intelligible English than did the English. I can't argue with that. Sadly.
Re:Be carefull thought... (Score:2)
I've used the web for corpus linguistics research (Score:2, Informative)
I could have gotten a higher accuracy rate, but this was just a simple undergraduate project.
Compression Prize (Score:2)
Non-official English (Score:2, Informative)
Re:Non-official English (Score:2, Insightful)
I'd say you're fighting a losing battle on this one. I'm not too bothered by it, e
Language Lives! (Score:2)
silly move. The example reminds me of "To boldly go", which was not proper, but its elegance is hard to argue against.
'Language' == spoken || written? (Score:3, Insightful)
The article addresses this in a weird way: it first draws attention to the distinction, but once it reaches its crux, where Google is used as a tool, the distinction is ignored entirely; instead it opts to focus on stranger things.
OMG! (Score:2)
Popular usage != wanted usage (Score:3, Informative)
I was pretty taken aback when a council of linguists in Poland suddenly declared some widely-chastised and not even very popular errors to be valid usage. I've been brought up in circles of people who not only put a lot of stress on the language you use, but also cruelly point out every incorrect word or phrase you use -- and this made me quite intolerant of bad speech.
Being but a dirty foreigner, I know that my English can sound bad to the ears of native English speakers -- that's why I sometimes ask people to correct me if they spot errors.
In other words: some people find careless speech repulsive. Thus, we should do whatever we can to promote correct usage as opposed to legalising incorrect uses.
Three types of language (Score:4, Interesting)
I think that for most of the 20th century, English, like most languages in the industrialized world, was largely static, dominated by the written word, which was in turn dominated by proper grammar. Since WWII, popular culture and faster communications have increasingly exposed us to local vernaculars, mostly through radio and television. The written word lagged behind in its cultural evolution.
Thanks to the internet (initially email, BBSes and IRC, but more widely known on the Web), we now have a hybrid of the spoken and written word: the "typed word". This form of language evolves at the same rate as the spoken word, and injects its own vernacular as a side effect of the medium: acronym and abbreviation "words" (rofl, how r u), along with common misspellings (pwned), and mixing letters with numbers or punctuation (l33t, n00b). All of these serve at least one purpose, whether as a form of super shorthand, as an insult, or to give the appearance of being "cool", or else they are merely the result of laziness on the part of the author. Most typed-word terms don't transfer well when spoken.
One of my hobbies is studying (European) languages and how they are related. Sometimes I worry about the damage the typed word is causing to the spoken and written word (and any proper linguist should at least be interested in the phenomenon). Luckily, most typed-word expressions aren't pronounceable, and the ones that are sound absurd, because they are removed from their original context when spoken, and everyone recognizes gibberish when they hear it. How the typed word affects the written word remains to be seen. Yes, both are typed now, but only the written word has a chance of going through an editorial process. I think it will take a very long time for the formal lexicon and rules of grammar to embrace, however reluctantly if ever, the typed vernacular.
Re:Three types of language (Score:2)
You do realise that most of the 20th century happened after the second world war, don't you? A condition that became false
Google as a grammar checker (Score:2, Interesting)
I've had the chance to use Google as a grammar or style checker in my day job as a glorified copy editor. I type two nearly identical expressions X and Y in the search box. If expression X gets 10,100 hits and expression Y only 500 hits, I use expression X.
For example, as a non-native speaker, I found myself waffling between the expression (A) "run for mayor of" and the expression (B) "run as mayor of." Letting Google arbitrate, I found 14,900 hits for (A) and only 200 hits for (B). I chose (A).
I
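The hit-count comparison described above can be sketched in a few lines of Python. This is only an illustration, not the poster's actual tool: the function name pick_by_hits, the min_ratio threshold, and the idea of passing the counts in as a dictionary are all assumptions, and the numbers are the ones quoted in the comment rather than live search results.

```python
def pick_by_hits(hit_counts, min_ratio=2.0):
    """Return the phrase with the most hits, or None if there is no clear winner.

    hit_counts: dict mapping each candidate phrase to its reported hit count.
    min_ratio:  hypothetical threshold; the leader must have at least this many
                times the hits of the runner-up to be chosen.
    """
    ranked = sorted(hit_counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None  # nothing to compare, or no hits at all
    if len(ranked) == 1:
        return ranked[0][0]
    (best, best_n), (_, next_n) = ranked[0], ranked[1]
    if next_n == 0 or best_n / next_n >= min_ratio:
        return best
    return None  # too close to call on counts alone


# Counts quoted in the comment above (not fetched from any search engine).
counts = {"run for mayor of": 14900, "run as mayor of": 200}
choice = pick_by_hits(counts)
print(choice)
```

Requiring a margin rather than a bare majority is a design choice: hit counts are noisy estimates, so a near tie probably says little about which phrasing is preferred.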
Tongue Gymnastics (Score:2)
I love a bit of cunning linguistics.
Reminds me of "Meme Tree"... (Score:4, Informative)
Why a tree? Language and genealogy seem to have a common thread. Meaning is like genetics. Language is expressive. Information is a kind of tree whose branches grow as reality elaborates and past events accumulate. New terms need to be invented for the dynamics we perceive in reality, just as new names are given to individuals as they emerge into the world. Patterns, continuity, periodicity. Such things lie at the heart of material existence and provide the hooks for consciousness itself. Information theory is the next great frontier, along with particle physics. Already they have converged and diverged and converged again. And playing with artificial trees turns out to be a lot of fun.
As for the "Meme Tree" program
The theory is that the internal consistency of these various lexical maps should roughly reflect many aspects of associative meaning. You could think of the statistical map as a Gödelian bubble whose "truth" - if you will - is imposed by the laws governing the statistical associations. We don't derive the laws of language and meaning from these exercises, but we create an internally-complete map that reflects something about the nature of meaning.
There is a practical aim as well. If you can derive the strength of equivalence and the various levels and colors of associative meaning you could in theory build a "Truth Machine" capable of answering any question with a high degree of accuracy. The result of any question could be computed as any other information retrieval problem would be.
I never got around to having my little Meme Tree programs scrape the internet for random sentences. However, this should be a very simple thing to do. Google has had programming contests in the past - programs that use the Google database in interesting ways. Statistical analysis of language is basically what they do. Research projects on their data could provide stunning insights into the nature of information itself, its relation to language and to reality, and likely into our very nature as linguistic beings.
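One very simple reading of a "statistical map" of associations is a table of how often two words appear in the same sentence. The sketch below is an assumption about what such a map could look like, not the Meme Tree program itself; the function names and the three sample sentences are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how many sentences each unordered word pair appears in together."""
    pair_counts = Counter()
    for sentence in sentences:
        # Deduplicate and sort so each pair has one canonical (a, b) form.
        words = sorted(set(sentence.lower().split()))
        pair_counts.update(combinations(words, 2))
    return pair_counts


def strongest_associates(pair_counts, word, top_n=3):
    """Return the top_n words that co-occur most often with `word`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == word:
            scores[b] += n
        elif b == word:
            scores[a] += n
    return scores.most_common(top_n)


# Hypothetical sample sentences, standing in for text scraped from the web.
sample_sentences = [
    "the linux kernel",
    "linux kernel source",
    "free software kernel",
]
pair_counts = cooccurrence_counts(sample_sentences)
print(strongest_associates(pair_counts, "kernel", top_n=1))
```

Raw counts like these favour common words, so a real experiment would presumably need some normalisation before the numbers say much about meaning.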
BBC voices (Score:2, Informative)
Writing in Japanese (Score:4, Insightful)
Linguistics 101 (Score:2, Insightful)
Measuring how the internet changes world languages is only a small part o
Using Google as a tagged linguistical data store (Score:2)
operating system
linux kernel
free software
And citations linked to those pairs such as:
Linus Torvalds as the moving force behind the operating system t
Re:Hey (Score:2)
Re:Hey (Score:2)
So, yes, it does seem a fracking uselessly mistyped/misknown/miswrote/misthought way to express oneself. But that is changed now, because some people with too much time on their hands think it is a new form of expression and this is the way the English language is changing. So now we are supposed to treat these insentient ideas as the new ways? Bahh, get lost.
Re:inner city teens (Score:3, Insightful)
I'm glad they're telling the youth what is proper; you're clearly incompetent to do so.
using words... is becoming more than just the normal, it is becoming the standard.
Is that right? Using words is "becoming more than just the normal"? I've been using words for years now; I'm glad to hear that's becoming the standard. Your post is a perfect example of why people should learn to write in something a
Re:inner city teens (Score:3, Interesting)
Incomprehension often has very little to do with that. A friend of mine moved to MA from NC at the same time as I moved from CA. She could not understand most people there, most people there could not understand her. I could, on the other hand, understand both of them. I've be
Re:HAMMER REVOLUTION --; (Score:2)
Programmer grammar (Score:3, Insightful)