The Internet Science

Using The Web For Linguistic Research

prostoalex writes "The Economist says linguists are gradually adopting the World Wide Web as a useful corpus for linguistic research. Google is used, among other resources, to research how the written language evolves and how some non-standard examples of usage become more or less acceptable (The Economist quotes the phrase 'He far from succeeded,' where 'far from' is used as an adverb). LanguageLog is a resource linked in the article, where linguists discuss current peculiarities of the English language."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Sunday January 23, 2005 @05:31AM (#11446770)
    There are more non native speakers on the web then
    native speakers.

    Of course, non-native speakers have generally less trouble distinguishing "then" from "than" than the so-called "native" speakers do. You might speak it natively, but remember, you don't write it natively.

  • by adam31 ( 817930 ) <adam31.gmail@com> on Sunday January 23, 2005 @05:49AM (#11446804)
    How do you even pronounce 'pwn3d' ? Google is not a tool to study speech patterns, and there's nothing to say that speech even resembles written text.

    The article addresses this in a weird way, where it first draws attention to the distinction, but once it reaches its crux, where google is used as a tool, the distinction is ignored entirely; instead it opts to focus on stranger things.

  • by Kafir ( 215091 ) <qaffir@hotmail.com> on Sunday January 23, 2005 @06:17AM (#11446857)
    i countinously question my co-workers (social workers) in telling the youth what is propper and not.

    I'm glad they're telling the youth what is proper; you're clearly incompetent to do so.

    using words... is becoming more than just the normal, it is becoming the standard.

    Is that right? Using words is "becoming more than just the normal"? I've been using words for years now; I'm glad to hear that's becoming the standard. Your post is a perfect example of why people should learn to write in something approaching standard English. Your meaning is barely intelligible, and you sound like an idiot.

  • by Kafir ( 215091 ) <qaffir@hotmail.com> on Sunday January 23, 2005 @06:44AM (#11446920)
    From Merriam-Webster Online [m-w.com]:
    real (3, adverb): VERY (he was real cool -- H. M. McLuhan)
    usage Most handbooks consider the adverb real to be informal and more suitable to speech than writing. Our evidence shows these observations to be true in the main, but real is becoming more common in writing of an informal, conversational style. It is used as an intensifier only and is not interchangeable with really except in that use.

    I'd say you're fighting a losing battle on this one. I'm not too bothered by it, either; the English language has other words that function both as adjectives and as adverbs, despite the existence of a distinct adverb form - near dead and nearly dead are both standard, for instance.
  • by Anonymous Coward on Sunday January 23, 2005 @06:45AM (#11446924)

    One thing that's always been at the front of my mind, why aren't these kids learning how to type?

    Because, unlike the parent's assumption, the phenomenon isn't related to computers. It's related to text messaging. It might be just as fast to type "you" instead of "u" with a keyboard, but it's noticeably slower on mobile phones, especially before predictive text became popular.

    Furthermore, there is a limit on how many characters you can send in a single message. Most service providers automatically split long messages into multiple parts, but in the case where you are just scraping the limit, it might actually cost twice as much to send a text message that says "you are" instead of "u r".

    I'm not excusing it. I hate reading it myself: it makes people look illiterate and, sadly, in many cases people really aren't able to express themselves in normal English. I know people in their mid-twenties who type "his" when they mean "he is", and, to use an example I received recently through email, "gess how i sore." when they meant "guess who I saw?". No, I didn't make it up, and no, the person wasn't joking.

  • by new500 ( 128819 ) on Sunday January 23, 2005 @07:40AM (#11447059) Journal
    . . .

    Those expressions are then
    used by native speaking politicians and are
    broadcasted by television.


    Dude, it's worse, the French have already infiltrated as far as the advertising business and are using covert channels to spread some dangerous crack i heard was called La Liberte :

    http://french.about.com/b/a/081281.htm

    Slightly more seriously :

    Apart from pointing out that your use of the word native is rather presumptive of geographic origin in this big wide internet thing, i wonder if this linguistic adoption is more one way towards English since the internet. OK the French got Le Weekend, and tons of anglicised nouns, tried to ban them all and didn't manage. But i read Friday that a British pilot training firm lost a contract to a French one. The reason cited by the Asian airline was that, whilst the training had to be in English, the French trainers spoke better, clearer, more intelligible English than did the English. I can't argue with that. Sadly.
  • by Spy Hunter ( 317220 ) on Sunday January 23, 2005 @08:49AM (#11447208) Journal
    You're overdramatizing. This is a process that will take hundreds if not thousands of years, even with technology helping to accelerate it. It's not like we'll wake up 10 years from now with a unified language and forget how to read today's literature!

    By the time we have a unified language, we'll have a whole new set of literature to go along with it. Today's literature will be like ancient Greek literature, and yes, it will only be readable by people with special training. It will need to be translated, just like ancient Greek is today. What's the big deal? The biggest difference is that only one translation would be needed, and therefore all the translation work could be focused on that.

    Furthermore, nobody will be forced to adopt a unified language. It will simply evolve. Words will travel from one language to another. Phrases will creep in from other languages. Languages will become closer, and eventually merge. You can see it happening today; at least the beginnings. It will only continue even faster, as the Internet is here to stay and the growth of the global marketplace shows no signs of slowing.

    Academics care about linguistic diversity in an abstract sense, but normal people really don't. People care about it, but in a much more practical sense of everyday communication. People will accept gradual, evolutionary changes to their language, as long as they can express themselves in a way they like. Academics often fight against change, because their theories were all developed to explain the old ways of doing things. They will fight against language unification; luckily I believe they will not be able to prevent it, or even slow it very much. [Note: this is a gross generalization about "academics", please remember that all generalizations are false.]

    You ask what's so great about a global language? The removal of all language barriers from everything! Duh!

    Maybe you don't personally notice any language barriers right now, but that doesn't mean you couldn't benefit from their removal. Maybe there are some really cool people in China right now doing brilliant work in your field that you just don't know about because it's all in Chinese. Maybe you would benefit from the increased efficiency of a global economy without language barriers. I think it's an indisputable fact that removing language barriers is a great thing.

  • by minairia ( 608427 ) on Sunday January 23, 2005 @09:04AM (#11447234)
    I am American but have to write in Japanese for work. No matter how much you learn in school, when you write in a foreign language, you'll hit a point of wondering whether what you wrote is how native speakers would say it, or is even understandable. Whenever I hit a point like that, I put the sentence in question (or key fragments thereof) into a Google search. If nothing comes up, I know I have to rewrite. If only a few links come up, I know what I wrote might be a little weird, but is at least understandable. If I get pages and pages of links, I'm golden.
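
The rule of thumb in the comment above could be sketched roughly as follows. This is only an illustration: it assumes the hit count has already been read off a results page by hand (no search API is called), and the `few_threshold` cutoff is a hypothetical number, since the comment only distinguishes "nothing", "a few links", and "pages and pages".

```python
def judge_phrase(hit_count, few_threshold=10):
    """Classify a phrase by how many search results it returned.

    few_threshold is a hypothetical cutoff between "a few links"
    and "pages and pages of links"; the original comment gives no number.
    """
    if hit_count < 0:
        raise ValueError("hit_count must be non-negative")
    if hit_count == 0:
        return "rewrite"                  # nobody seems to say it this way
    if hit_count <= few_threshold:
        return "understandable but odd"   # attested, but rare
    return "natural"                      # widely attested


# Example with made-up counts for three candidate phrasings.
for count in (0, 4, 25_000):
    print(count, "->", judge_phrase(count))
```

Raw hit counts are a crude signal, so a sensible cutoff would depend on the language and on how long the searched fragment is.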
  • Linguistics 101 (Score:2, Insightful)

    by DingerX ( 847589 ) on Sunday January 23, 2005 @09:31AM (#11447295) Journal
    I use search engines all the time for linguistics research: when I'm reading or translating from one language to another, and I run into an odd usage, I just type the phrase in the magic box and *poof*, I get hundreds of contextual examples. Likewise, if I'm writing in a foreign language, and I need to know if a preposition or a construction is correct (and not simply words), again all I have to do is type it in and see what comes out.

    Measuring how the internet changes world languages is only a small part of what the 'net offers those interested in linguistics and linguistic usage. Most of the web data archived on Google does not consist of ROTFLMAOs and pwn3ds; it consists of everyday usage, and a good deal of that is from the last decade. Much of linguistics deals precisely with that: how the language is used on a daily basis. That's also how dictionaries come about: they're descriptive accounts of usage (which is why the high school journalistic trick of beginning an article with "Webster's defines fistula as..." doesn't work. Dictionaries don't lay down the law, they describe it).

    Of course, some people have been arguing that this gives room for errors and abuses. Of course it does! Just 'cos something doesn't play by the rules doesn't mean it's not in common usage. And just because people don't follow rules of orthography, grammar and style doesn't excuse us from teaching these things, or trying to follow them. After all, language is about communication, and these corruptions hinder our ability to communicate, especially communicating complex thoughts.

    So yeah, "to impact" is to make an active verb out of a passive participle, and "to impinge" should be used instead. There are plenty of uses of "bonified" out there. Google finds about 20,000 such occurrences. That doesn't make it correct. Nor does that make Google's suggested correction "bonafide" correct either (306,000 occurrences). The correct spelling is "bona fide" (1,050,000 occurrences).
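
As a rough illustration of the comparison above, the three hit counts can be turned into relative shares. The numbers are the ones quoted in this comment, not fresh measurements, and the function names are made up for the sketch.

```python
# Hit counts exactly as quoted in the comment above (Google, January 2005).
# They are not re-measured here; current numbers would differ.
VARIANT_HITS = {
    "bonified": 20_000,
    "bonafide": 306_000,
    "bona fide": 1_050_000,
}


def usage_shares(hits):
    """Return each variant's share of the combined hit count."""
    total = sum(hits.values())
    if total == 0:
        raise ValueError("no hits to compare")
    return {variant: count / total for variant, count in hits.items()}


def most_common(hits):
    """Return the variant with the highest hit count."""
    return max(hits, key=hits.get)


shares = usage_shares(VARIANT_HITS)
for variant, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{variant!r}: {share:.1%}")
```

Here the most frequent form happens to be the standard one, but as the comment argues, the shares only describe usage; they don't settle which spelling is correct.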

    And don't worry too much about purely textual forms appearing in speech. LOL is just this decade's SOB. A spoken "I R0XX0R, J00 5UXX0R" shouldn't alarm us too much when we consider all those medical shows where doctors run around yelling "Get me a boron enema STAT!", pompous academics actually say "such economic perturbations may affect the governance of a certain cryptodictatorship, VIZ the United States", and we all drop down to the pharmacist to "Fill an RX", all spoken forms of what are written Latin abbreviations (statim -- immediately, videlicet -- that is, Rx -- Respondeo, although some classicists may insist it's the symbol for Jupiter).

    One linguistic area that is interesting is the gradual adoption of worldwide slang. We hear Americans these days using terms like "Bog Standard" and "Arsed".

    What's the point of this rant? Teh intardnet is a great resource for linguistic usage, beyond the navel-reflection of IT professionals. Disciplines like linguistics deal in examples of usage, and the internet is a great stockpile of everyday language. Descriptive grammar and descriptive dictionaries are not an excuse for ignoring arbitrary rules. Most of the linguistic phenomena we see with internet usage are not new.
  • Programmer grammar (Score:3, Insightful)

    by cbr2702 ( 750255 ) on Sunday January 23, 2005 @11:11AM (#11447627) Homepage
    Adding or changing characters in a literal string seems like misquoting. Traditionally in handwritten work the comma went almost directly under the quotation mark. When people shifted to typewriters and then computers, an arbitrary choice was made to put the comma first. Most programmers I meet seem to have reversed that choice.
  • by Anonymous Coward on Sunday January 23, 2005 @03:25PM (#11449085)
    A user signing as phaln on Slashdot today remarks, apropos of a comment exchange about using the entire web as a corpus (the way we often do here at Language Log Plaza), which led to some comments on the sort of random slangy stuff on the web that might make that a bad idea for grammarians seeking information about English:

    It came to me that the English language was in deep trouble when people started saying "rotfl" and "lol" in person.

    Now, the user is being humorous, of course. But it is remarkable how often people say this sort of thing. It reaches newspaper columns and magazines as well as everyday conversations about language ("Oh, you're a linguist? What do you think about the way Internet slang is changing the language?"). I've heard a half-hour radio discussion about it on the BBC World Service (in the middle of the night; it was a real yawn, a perfect fix for my insomnia). It seems likely that at least some people really do think English might be altered radically by the intrusion of email abbreviations for phrases like "[I'm] rolling on the floor laughing" or "[I'm] laughing out loud" into regular spoken English.

    Don't worry. Nothing radical or even slightly significant will happen. Suppose, say, "rotfl" (pronounced "rotfull") became quite common in speech (which seems unlikely, since if your interlocutor falls down and rolls on the floor laughing it generally needs no comment; but maybe as a metaphor, or on the phone). What would have changed? One interjection (a word grammatically like "ouch") added. Total effect on language: utterly trivial. Not even noise level. Interjections are so unimportant to the fabric of the language that they are almost completely ignored in grammars. There's almost nothing to say. They have no syntactic properties at all -- you pop one in when the spirit moves you. And their basic meaning is simply expressive of a transitory mental state ("Ouch!" means something like "That hurt!"). Don't worry about English. It will do fine. Not even floods of email-originated phrases entering the lexicon would change it in any significant way. If phaln were to suggest such a thing seriously I would be LOL.

    From: http://itre.cis.upenn.edu/~myl/languagelog/archives/001829.html#more

    Also, for anyone interested, Pullum's crusade against Dan Brown is simply delightful. A good entry about it (Pullum posts about Dan Brown all the time):

    http://itre.cis.upenn.edu/~myl/languagelog/archives/001628.html

  • by JoeBuck ( 7947 ) on Sunday January 23, 2005 @04:35PM (#11449583) Homepage
    It's troubling to read so many comments that worry that the linguistic researchers will find "bad language", and worse, that people have moderated such comments up. It reflects a misunderstanding of what linguists do: they want to get a description of the language as it is used, and as it changes, and historically speaking, usages that start in the gossip of teenage girls often become mainstream a couple of generations later. They need it all, and they probably need the crappy stuff most of all, because it is closer to spoken English.
