Google Admits to Using Sohu Database 209

prostoalex writes "A few days ago a Chinese company, Sohu.com, alleged Google improperly tapped its database for its Pinyin IME product, stirring controversy on whether the two databases were similar just due to the normal research process. Today Google admitted that its new product for the Chinese market 'was built leveraging some non-Google database resources.' 'The dictionaries used with both software from Google and Sohu shared several common mistakes, where Chinese characters were matched with the wrong Pinyin equivalents. In addition, both dictionaries listed the names of engineers who had developed Sohu's Sogou Pinyin IME.'"

  • This reminds me of (Score:5, Interesting)

    by Diordna ( 815458 ) on Monday April 09, 2007 @07:57PM (#18669475) Homepage
    "Stolen from Apple Computer" (whole story [folklore.org])
  • So... (Score:5, Interesting)

    by Anonymous Coward on Monday April 09, 2007 @07:58PM (#18669491)
    When caught making a mistake, they admit it, work to resolve it, and move on?
    I think there are a few other companies who could learn from that approach ...
  • I wonder... (Score:2, Interesting)

    by flyboy81 ( 698817 ) on Monday April 09, 2007 @08:04PM (#18669531)
    Is this a single isolated incident or simply the first one of more coming from the company that does no evil?
  • Re:Do no evil (Score:2, Interesting)

    by maxume ( 22995 ) on Monday April 09, 2007 @08:41PM (#18669767)
    There is no way to tell if the copying was done by 'Google' or if it was done by some engineer on their own. Sure, 'Google' needs to take steps to make sure that what they put out meets some sort of standard, but the backpedaling and whatnot is pretty much the response you would get no matter how the copying was initiated, so there isn't much reason to assume where the responsibility for the copying lies.
  • by QuantumG ( 50515 ) <qg@biodome.org> on Monday April 09, 2007 @09:56PM (#18670235) Homepage Journal
    meh, the argument for why compilations of public domain "facts" should be considered a copyrightable work is that it is work to compile those facts. Why people can't understand that not all work results in property is beyond me, but there's ya reasoning.
  • by wrook ( 134116 ) on Monday April 09, 2007 @10:03PM (#18670265) Homepage
    I've been thinking about this. Throwing the evilness of Google aside for a moment, why should someone be able to copyright a listing of the phonetic pronunciation of an alphabet?

    Let's just imagine how I might create this list. I would have to hire people who spoke Chinese. Then I would ask them to record the pronunciation of each character that they know. This is pretty easy because in Chinese each character has only one pronunciation (per dialect, anyway). There are about 3500 characters that you need to know in order to be literate. And all of these people would have learned these at school.

    But how did they learn them? Well, they had a textbook and they memorized the list from the textbook.

    Wait. I can't just memorize a list from one book and put it in another book. That's copyright infringement. In order for it not to be copyright infringement, I need to make sure that my sources all memorized the pronunciations from different sources. That's going to be difficult.

    But let's say I do that. Now I have a list of the 3500 most common characters. And with that, I've probably got 99% of everything that's in a newspaper. But that's probably not good enough. I probably want a list of say 60,000 characters. Otherwise it's pretty useless in a general sense. Uncommon characters are uncommon, but you *will* bump into the words over time.

    So where do I find these characters? Can I hire some guy that knows them all? It would be very difficult. The best place to look is in a book. But wait... what am I going to do? Every time I find a character my people don't know, look it up in a book? Why don't I just copy it from the book in the first place? That's just copyright infringement again.

    Really, the task of creating this list authoritatively without infringing copyright is monumental. Probably the *only* way to do it is with a community project where people just submit the pronunciations they know.

    But if I'm going to have a community project like this, what the heck do I need copyright for? What am I protecting? If everyone is going to contribute, everyone should benefit.

    So, personally, I don't think one should have copyright on this kind of material (same thing for spelling). It's just not in the public interest. This goes doubly so now that we have the internet and creating these kinds of projects is very inexpensive.

    OK, I've gone on long enough... But one more rant. What's with this "do no evil" thing? Isn't that setting the bar a little low? If I told my parents that I'd work hard not to be evil, I think they'd be somewhat disappointed in me. If Google wanted to actually "do some good" rather than "do no evil", they could start a community project to collect this data and share it with the world.

    Sigh... I guess we'll have to wait for some guy in his garage (but here's betting that someone has already started something).
  • Re:Do no evil? (Score:4, Interesting)

    by setagllib ( 753300 ) on Monday April 09, 2007 @11:00PM (#18670643)
    They're significantly reducing the lock-in to Microsoft products, by encouraging, buying and thereafter funding web application projects that often overlap with what is currently locked in to Microsoft. They even brew some of their own sometimes. They continue the development of Linux and Python with a wide adoption of both. All of these things are creating wealth for everyone, and crippling Microsoft little by little, which we know is what we want. I'd much rather have a Google & Microsoft duopoly if it means Microsoft would finally have to clean up its shit and accommodate whatever open source platform Google would support in that scenario.
  • Re:Is this... (Score:5, Interesting)

    by 808140 ( 808140 ) on Tuesday April 10, 2007 @01:43AM (#18671923)
    No, actually, "gook" is a term that originated in the Korean war for Korean people. Because many of the soldiers who fought in the Korean war were officers in the Vietnam war, their racial slurs were adopted and modified by a new generation, leading to great confusion about the origins of the term.

    The etymology of the word gook is interesting, because it may be one of the few racial slurs that originated with a people's term for themselves. In Korean, guk means "country" and by extension a country's people; when it is not modified (cf. waiguk, outside country, foreigner) it is understood to be Korea or its peoples. Speakers of Chinese will recognize the word as having Sinitic origin (guó, country, and wàiguó, foreign country, respectively, in Mandarin).

    The term was appropriated by the Americans during the Korean war and used as a racial slur for Korean people in general, which must have been confusing to the Koreans (imagine someone using "American" as a slur for Americans to get an idea). Then, in Vietnam, the old "Asians are all the same" mentality prompted GIs to extend its meaning (imagine "American" being a racial slur for all white people, for example -- yes, I know many Americans aren't white, it's not a perfect analogy, deal with it).
