
Google Faces Plagiarism Questions Over Chinese Software

Posted by Zonk on Sunday April 08, 2007 @03:03PM from the i'll-just-take-a-look-and-yoink dept.

yaohua2000 writes "Google's laboratory in China has launched its first product, a Pinyin Input Method Editor. The software lets users type romanized Pinyin on a QWERTY keyboard and converts it to Chinese characters. Users soon discovered that the data Google used for the product was unusually similar to the data used by a Chinese rival, Sogou. Google has evaded the question about software similarities, reports PC World. 'The similarities, which included an error involving the name of a celebrity, were noted on a Google Labs discussion board about its Pinyin IME. Users noted that entering the Pinyin pinggong into the Google IME incorrectly produced the name of Feng Gong, an actor and comedian.'"

This discussion has been archived. No new comments can be posted.

Google Should Defend Themselves the OpenBSD Way (Score:5, Funny)

by Anonymous Coward writes: on Sunday April 08, 2007 @03:08PM (#18656981)

Blame the Sogou authors, and call them inhuman. Also say it isn't plagiarism because it's beta.

- The plagiarism has been confirmed by Google (Score:3, Informative)
  
  by gam3cub3 ( 1085777 ) writes:
  
  Plagiarism has been confirmed officially by Google, Sohu and IDG news reporter Sumner Lemon.
  
  Google admits word database came from third party - Network World
  
  http://www.networkworld.com/news/2007/040907-update-google-admits-word-database.html [networkworld.com]
  
  An earlier report by the same reporter: Sohu to Google: Take down copycat software
  http://www.networkworld.com/news/2007/040707-sohu-to-google-take-down.html [networkworld.com]
  
  Google China's Official Apology to Sohu.com (in Chinese)
  http://googlechinablog.com/2007/04/blog [googlechinablog.com]
- - Re: (Score:2, Informative)
    
    by Anonymous Coward writes:
    
    The story didn't come from Sohu/Sogou. The copying was originally discovered by bloggers and BBS posters, and Sohu only made their statement once the story had crossed over into the mainstream media and they were being asked about it by journalists. They didn't give any comment at all for the first couple of days.
I'm a stupid American, so... (Score:5, Funny)

by Anonymous Coward writes: on Sunday April 08, 2007 @03:09PM (#18656991)

Let me be the first to say... WHAT?

- Re: (Score:1, Funny)
  
  by alexhard ( 778254 ) writes:
  
  I'm a stupid American, so...
  This brought to you by: the redundancy department of redundancy!
- Input method (Score:5, Interesting)
  
  by DrYak ( 748999 ) writes: on Sunday April 08, 2007 @07:09PM (#18658441) Homepage
  
  Just fucking google it ;) [justfuckinggoogleit.com]
  
  Chinese is a complex language to write. It doesn't use an alphabet (like most Western languages). It doesn't even use a syllabary (like, for example, two of the Japanese writing systems). It uses logographs: oversimplifying, there is one symbol for every different word/idea/etc.
  That makes for thousands of different symbols (according to Wikipedia, a little less than 50k variants in the Kangxi dictionary).
  
  This ISN'T something you can put on a regular Western 107-key keyboard.
  
  Therefore you have several solutions [wikipedia.org]:
  
  - Custom keyboards:
  Use special keyboards that carry the couple of thousand most frequently used symbols.
  Not very practical (symbols are harder to find than a letter on a 107-key keyboard). Wikipedia has a picture.
  
  - By shape of characters:
  Either by handwriting recognition, or by decomposing characters (into their different strokes) and putting those on a regular keyboard layout.
  
  - By sound of words:
  Either by using something like Zhuyin [wikipedia.org], which is a system that was invented to help teach Chinese. It has 37 symbols, roughly one for each consonant or vowel sound in Chinese. As such, it can be used for other purposes, like putting it on a keyboard: the person types the sound and the software guesses the corresponding word/logogram.
  Or an alternative method is Pinyin [wikipedia.org]: it uses Latin letters to write the sound. (And thus is interesting for computers, on which Latin keyboards are widespread.)
  
  The mapping of sounds to logographs isn't completely straightforward. For example, Chinese is a tonal language, but some systems don't require the writer to specify tones using marks. Some software work is required, and this software isn't infallible.
  
  Google released such a piece of software. Users can type Chinese phonetically on any Western keyboard using (toneless) pinyin, and the software tries to convert it to actual Chinese characters.
  This software produces the same correct results as another popular one. (Hopefully. If the Google software didn't give the correct results, there would be problems: it wouldn't be a functional pinyin input system.)
  Sometimes the software hesitates and offers a choice of possibilities, most of the time the same ones as the competitor. (That can be explained by the fact that both programs have to process the same user input, using the same pronunciation system, which is ambiguous.)
  But sometimes the Google software is plain wrong, and produces the same errors as the competitor. And THIS is suspicious, because it suggests that some part of the software uses pieces from the competitor (part of the algorithm? statistical data?).
  
  The company is suing Google on the grounds that if both programs behave the same down to the bugs, some part may have been illegally copied.
  
  Meanwhile, adepts of Google Seppuku [wikipedia.org] rejoiced worldwide at a cheap and easy-to-find piece of software that could also be used to produce random Chinese characters to be subsequently fed into Google as kanji.
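
For anyone who wants to see the sound-based approach concretely, here is a minimal sketch of a toneless pinyin lookup. It is not Google's or Sogou's code: the lexicon, the usage counts, and the names `LEXICON`, `build_index`, and `candidates` are all made up for illustration. It only shows why one typed string can map to several characters and why a ranking is needed.

```python
from collections import defaultdict

# Hypothetical toy lexicon: (word, toneless pinyin, made-up usage count).
LEXICON = [
    ("是", "shi", 950),
    ("事", "shi", 400),
    ("十", "shi", 350),
    ("市", "shi", 300),
    ("北京", "beijing", 800),
    ("背景", "beijing", 650),
]


def build_index(lexicon):
    """Group words by toneless pinyin, most frequent first."""
    buckets = defaultdict(list)
    for word, pinyin, count in lexicon:
        buckets[pinyin].append((count, word))
    return {
        pinyin: [word for _, word in sorted(entries, reverse=True)]
        for pinyin, entries in buckets.items()
    }


def candidates(index, typed):
    """Return ranked candidates for what the user typed (spaces and case ignored)."""
    key = typed.lower().replace(" ", "")
    return index.get(key, [])


INDEX = build_index(LEXICON)
print(candidates(INDEX, "bei jing"))  # ['北京', '背景']
print(candidates(INDEX, "shi"))       # ['是', '事', '十', '市']
```

Because the tones are dropped, the ranking comes entirely from the frequency data, which is why two products built on the same data would offer the same candidates in the same order, mistakes included.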
  
  - Re: (Score:2)
    
    by mochan_s ( 536939 ) writes:
    
    Or an alternative method is the Pinyin : it uses latin letters to write the sound. (And thus is interesting for computers on which latin keyboards are widespread).
    
    It's all nice and good until you have to share an office with a Chinese and listen to him beat the spacebar to death trying to write his e-mail in Chinese.
Identical typos... (Score:5, Insightful)

by pedantic bore ( 740196 ) writes: on Sunday April 08, 2007 @03:11PM (#18657019)

Funny, that's how we catch students who plagiarize, too.
Coming up with the same algorithm isn't terribly unlikely. Structuring it in the same way is not uncommon either. Making exactly the same mistakes, however, is hard to believe.
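
A rough sketch of that check, assuming you have a trusted reference answer for each input: list the inputs where two systems give the same wrong output. The function name `shared_errors` and all of the data below are hypothetical; only the `pinggong` example is taken from the story, and `"expected-A"` is a placeholder for whatever the correct top candidate should be.

```python
def shared_errors(reference, first, second):
    """Return the inputs where both systems give the same wrong answer."""
    shared = []
    for key, correct in reference.items():
        a, b = first.get(key), second.get(key)
        if a is not None and a == b and a != correct:
            shared.append(key)
    return sorted(shared)


# Placeholder data for illustration only.
reference = {"pinggong": "expected-A", "beijing": "北京", "shi": "是"}
system_one = {"pinggong": "冯巩", "beijing": "北京", "shi": "是"}
system_two = {"pinggong": "冯巩", "beijing": "北京", "shi": "是"}
independent = {"pinggong": "expected-A", "beijing": "背景", "shi": "是"}

print(shared_errors(reference, system_one, system_two))   # ['pinggong']
print(shared_errors(reference, system_one, independent))  # []
```

Agreement on correct answers proves nothing, and one shared error is weak evidence on its own; the more identical mistakes turn up, the harder independent work is to believe.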

- Re:Identical typos... (Score:5, Insightful)
  
  by Plutonite ( 999141 ) writes: on Sunday April 08, 2007 @04:46PM (#18657647)
  
  Not really. I'm not defending Google here, but you seem to be talking about an essay not an algorithm. If you have algorithms that are similar enough, they do not even need to be "structured the same way" to produce the same output (errors included). Anybody who has been to an ACM contest will tell you this.
  
  As such this story is useless. The internet needs no more speculation as it is; it's hard enough arguing what is wrong or right when concrete evidence is available [slashdot.org]. Our flamewars should be founded on solid ground.
  
  - Re: (Score:2)
    
    by pedantic bore ( 740196 ) writes:
    
    Just to clarify, I'm only talking about computer programs.
    Detecting plagiarism in essays is even easier -- there are more degrees of freedom in prose than in computer code (especially when the code is written to conform to some style guide).
  - Re: (Score:1)
    
    by Ziwcam ( 766621 ) writes:
    
    Our flamewars should be founded on solid ground.
    
    You must be new here...
  - Re: (Score:2)
    
    by microbee ( 682094 ) writes:
    
    : but you seem to be talking about an essay not an algorithm.
    
    Why do you even talk about algorithm here? There is no algorithm involved. It's a dictionary that is copied, not code.
    - Re: (Score:2)
      
      by Plutonite ( 999141 ) writes:
      
      Did you expect me to RTFA or something? An article about Chinese characters?! You are definitely new here :)
      
      That said, I was responding to a particular point made by the parent. You are probably right, this seems to be an issue of indexing, but since mapping roman characters to others is generally not possible through 1-to-1 transformations alone (i.e. there is a small amount of additional logic), it is possible that there is a little code involved. Even if it was purely an index, you cannot make claims witho
- Re:Identical typos... (Score:5, Insightful)
  
  by ReallyEvilCanine ( 991886 ) writes: on Sunday April 08, 2007 @04:56PM (#18657683) Homepage
  
  According to TFA, Sohu has patents in several areas related to how popular Internet search terms can be used for predictive text input. Google does, too. And unlike most others, Google constantly tweaks algorithms. Have you noticed how the Google Toolbar now predicts your search terms? And every time you deviate, they do modifications for you personally and tabulate in general to see if others are also going after similar versions.
  I work in I18N and deal with IMEs all the time, from the basic, non-learning MS Windows versions to the ones which come with the NJ Star and give preference to lesser-used terms previously selected to various other proprietary variants. There are only so many ways to write an IME, and there are only so many ways to do good prediction. If I type "go" in Japanese, my first choice will usually be "5" followed by the symbol for "language" and the game "Go", then various other possibilities. Only when I next type a "z" or a "g" do the symbols for a.m. and p.m. move to the front. Now if I'd written an IME and wanted to protect it I might have it always bring up "Mifune Go" as the fifth selection or, more subtly, bring up "Go" as the fifth possibility if you typed a "G" or "Go" after "Mifune". This isn't the case here.
  With Google's work and implementation of prediction methods, I find it hard to accuse the company of plagiarism for having the same bug (which comes as a result of predictive methods) as some other company. This is a bug, not some zyzzyx or easter egg which a programmer included to catch thieves. It was unintentional on Sogou's part and likely equally unintentional on Google's.
  Then again, there's a lot of pressure to excel at Google and maybe someone gave in to temptation despite working for a company that knows more about data than anyone else out there. Unlikely, but possible... and if Google issue a statement that someone did indeed plagiarise Sohu's work, fine. It could happen anywhere. Doesn't make Google bad, only one programmer. It makes the company culpable, but it hardly looks malicious.
  
Ironic, isn't it? (Score:2, Insightful)

by catdevnull ( 531283 ) writes:

Of all the countries in the world to bitch about someone stealing or copying...
- Re: (Score:1, Insightful)
  
  by Anonymous Coward writes:
  
  why is such a blatantly prejudiced comment modded insightful?
  shall i make a comment about the us being nothing but greedy lying bullies because microsoft or diebold is from the us?
  - Re: (Score:3, Funny)
    
    by Zarel ( 900479 ) writes:
    
    why is such a blatantly prejudiced comment modded insightful?
    shall i make a comment about the us being nothing but greedy lying bullies because microsoft or diebold is from the us?
    Well, if you did make such a comment, and it was relevant, it probably would get modded insightful.
  - Re: (Score:1)
    
    by Lord Balto ( 973273 ) writes:
    
    The "us"? Shift key broken on your computer?
  - Re:Ironic, isn't it? (Score:5, Insightful)
    
    by ScrewMaster ( 602015 ) writes: on Sunday April 08, 2007 @03:58PM (#18657301)
    
    Why is it that saying anything negative about another country is always turned into a discussion about racism and bigotry? It immediately poisons further dialog when it is applied without reason. If you have some reason to think the OP is prejudiced I'd like to hear it, because I didn't read that into his comment. I hear a lot of negative comments about the United States on Slashdot (yours, for one, which is interesting) but I don't immediately conclude that prejudice is the root of it. Sometimes it is, but it's nice to find that out first before jumping to any conclusions.
    
    The unfortunate fact of the matter is that China's government and industry are completely unconcerned about the source of the technology that they mass-produce and sell to everyone. They just don't care, period, and I suppose when you get right down to it there's no reason they should. On the other hand, that just means there's no reason why we should respect their "intellectual property" either, and when their scientists and engineers come up with something good they damn well shouldn't expect us to concern ourselves over their rights either. If Google did indeed rip off their Chinese counterparts my feeling is ... more power to 'em.
    
    So, it's not a statement of prejudice (e.g. "I dislike Chinese people because they are Chinese, or have yellow skin, or slanted eyes, or talk funny") but a legitimate observation on the state of affairs in that country.
    
    Just watch it when you start playing the race card without a good reason ... it prejudices any argument you make after that point.
    
    - Re: (Score:1)
      
      by iminplaya ( 723125 ) writes:
      
      If Google did indeed rip off their Chinese counterparts my feeling is ... more power to 'em.
      
      Umm, I'm kinda new here, so I'm not sure if this will come out right:
      2wrongs != 1right? Who knows with this "new math"...
      
      What the hell, let's go way off topic with a possibly suitable analogy:
      
      China invades Tibet. So that makes it ok to invade Iraq? I wouldn't put it past Bush to use such a pretext [slashdot.org]. There. I turned it away from racism and bigotry and into US bashing and good old political mud slinging.
      
      But if the subje
      - Re: (Score:2)
        
        by PastaLover ( 704500 ) writes:
        
        China invades Tibet. So that makes it ok to invade Iraq? I wouldn't put it past Bush to use such a pretext [slashdot.org]. There. I turned it away from racism and bigotry and into US bashing and good old political mud slinging.
        If you can't make a decent analogy, just don't. To fix this one, I'd suggest it's more like China invading Hawaii, and the US subsequently invading Taiwan (or Tibet).
    - Re: (Score:2, Insightful)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
    - Re: (Score:2)
      
      by microbee ( 682094 ) writes:
      
      : The unfortunate fact of the matter is that... They just don't care, period...
      
      And this is "insightful" on slashdot.
      
      Enough said.
    - Re: (Score:2)
      
      by kabocox ( 199019 ) writes:
      
      The unfortunate fact of the matter is that China's government and industry are completely unconcerned about the source of the technology that they mass-produce and sell to everyone. They just don't care, period, and I suppose when you get right down to it there's no reason they should. On the other hand, that just means there's no reason why we should respect their "intellectual property" either, and when their scientists and engineers come up with something good they damn well shouldn't expect us to concer
    - - Re: (Score:2)
        
        by fuzzix ( 700457 ) writes:
        
        It is sorta like picking a keyboard like qwerty as a starting point, it may be arbitrary but it supposedly was designed by the popularity of letter use and other so-called reasons.
        QWERTY was designed to move the hammers of a typewriter most commonly hit in sequence further apart - efficiency was its goal, but its efficiency was reliant on clunky hardware.
        
        Now that we don't have the problem of typewriter hammers sticking, why is QWERTY still in use?
- Re: (Score:3, Insightful)
  
  by fermion ( 181285 ) writes:
  
  OTOH, Google is desperately trying to show that it offers an original and innovative product, and does not in fact owe its profit to stealing and repackaging the content of others. The lifting of code sort of indicates that the case is the latter and not the former, and may tend to have an impact in cases where Google is claiming it need not make royalty payments.
- Re: (Score:1)
  
  by Sanguis Mortuum ( 581999 ) writes:
  
  Because two wrongs make a right?
- Re: (Score:2)
  
  by catdevnull ( 531283 ) writes:
  
  I'm just saying--the Chinese have a pretty bad reputation for industrial espionage and pirating. I found it ironic that a company there was crying foul--Pot. Kettle. Black.
  
  I was making no value judgements--just calling it like I see it.
  
  I'm really surprised by the number of hater tots jumping in on this.
- - Re:Ironic, isn't it? (Score:4, Insightful)
    
    by ScrewMaster ( 602015 ) writes: on Sunday April 08, 2007 @03:39PM (#18657191)
    
    Strange how you wouldn't have said this if it was Microsoft.
    
    You're definitely new here. We complain about Microsoft pinching other people's work continuously here on Slashdot, mainly because Microsoft does, continuously. We also regularly bitch about how the current patent and copyright systems here in the United States are seriously flawed. And the OP is correct in pointing out that China has always been, shall we say, less than respectful of others' rights in this regard ("blatantly ripping them off" is as good a description as any.)
    
    What was your complaint again?
    
    - Or, basically... (Score:5, Insightful)
      
      by mattgreen ( 701203 ) writes: on Sunday April 08, 2007 @04:15PM (#18657421)
      
      "This is our groupthink, it doesn't need to make sense. Now shut up and conform so you get your mod points!"
      
      - Hmm... (Score:5, Funny)
        
        by mattgreen ( 701203 ) writes: on Sunday April 08, 2007 @05:12PM (#18657771)
        
        This confirms it: meta-discussion of Slashdot makes for karma whoring. Now, can I recurse again and have that be the case?
        
    - - Re: (Score:3, Interesting)
        
        by Anonymous Coward writes:
        
        Right, of course. It's perfectly ok to discriminantly refer to the Chinese based on a broad generalization. I mean.. any decisions a corporation in China makes is obviously the representation of the entire country. Just like Diebold and Microsoft are for the US.
        The Chinese government has refused on multiple occasions to enforce copyright of others and blatantly turns a blind-eye to this sort of behavior. If someone in China were to take the Microsoft source and re-sell it as a Chinese OS, the government would probably smile and buy the OS and say they were "supporting the Chinese economy" or "supporting the Chinese developers". This happened to Cisco, when a Chinese company stole their source and re-sold the exact same product. The government didn't do a damn
    - - Re: (Score:2)
        
        by fbjon ( 692006 ) writes:
        
        I work for Microsoft, and trust me, we are so disorganised we don't even know that there exist teams in one office doing the SAME product as another team in a different office but the SAME BUSINESS UNIT. I don't think we can organize a lightbulb replacement party.
        Sue the other guys for plagiarism.
not saying it's the case (Score:5, Insightful)

by creativeHavoc ( 1052138 ) writes: on Sunday April 08, 2007 @03:21PM (#18657077) Homepage

while i am not insisting that it is the case, it seems like it could easily be the same logic flaw. Different algorithms and code can produce the same mistake if you are using the same misguided logic behind the problem. That's why you see the same bugs in students' code in university, even when worked on separately during a lab.

- Re: (Score:1)
  
  by obarel ( 670863 ) writes:
  
  Ask twenty students to implement a function that reverses a linked list, and you'll find that those that make mistakes probably make the same mistakes (not dealing with zero-length lists, for example).
  
  It's not the same situation, though, because Google only employs people who can reverse a linked list - there must be some other explanation.
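
As a concrete version of the exercise mentioned above, here is a minimal sketch of reversing a singly linked list that also handles the zero-length case. `Node`, `reverse`, `from_values`, and `to_values` are illustrative names, not from any particular course or codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def reverse(head):
    """Reverse a singly linked list in place and return the new head.

    An empty list is represented by None and comes back as None.
    """
    prev = None
    while head is not None:
        following = head.next  # remember the rest of the list
        head.next = prev       # point this node backwards
        prev = head
        head = following
    return prev


def from_values(values):
    """Build a linked list from a Python sequence (helper for the demo)."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def to_values(head):
    """Collect node values into a Python list (helper for the demo)."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


print(to_values(reverse(from_values([1, 2, 3]))))  # [3, 2, 1]
print(reverse(None))                               # None
```

The classic shared mistake is touching `head.next` before checking whether `head` exists at all; the loop condition above avoids that, so the empty list needs no special case.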
- Re:not saying it's the case (Score:4, Informative)
  
  by eggstone ( 957547 ) writes: on Sunday April 08, 2007 @03:53PM (#18657269)
  
  Well, if it were some kind of programming bug, then that reasoning would be fine. However, Google is simply using Sogou's dictionary. In fact, Sogou's dictionary contains several developers' names, which are produced as the first choice if you input them, such as Tong Zi Jian, Zhao Li Yang, Lv Jie Yong, and Ru Li Yun. It is impossible for Google to have Sogou's developers' names in its dictionary unless they simply copied the whole dictionary. Note that although those names were in Google's Pinyin input 1.0.15.0, they were removed in the newer version 1.0.16.0.
  
- Re: (Score:2, Interesting)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
- Re: (Score:2)
  
  by microbee ( 682094 ) writes:
  
  : while i am not insisting that it is the case
  
  So should this be modded OT?
This is big news in China (Score:5, Informative)

by Anonymous Coward writes: on Sunday April 08, 2007 @03:29PM (#18657131)

Unfortunately, since the IME is only used by Chinese speakers, most reports and discussions about this are in Chinese as well. For example, Sina has published an announcement (in Chinese) [sina.com.cn] from Google admitting that they indeed "used data from non-Google sources" during the testing stage.

There was actually much more evidence than the PC World article mentioned, the most convincing being that Google IME included many names of the developers of Sogou IME.

According to other users, though (I don't use Google Pinyin myself now, or Windows for that matter), the error has been fixed, and those developer names have been removed, in the most recent version of Google IME (1.0.17.0).

Ming

- Re: (Score:1)
  
  by skippybosco ( 897134 ) writes:
  
  or the (ironically google) translated version of the article can be found here:
  
  http://translate.google.com/translate?u=http%3A//tech.sina.com.cn/i/2007-04-08/18351454194.shtml&hl=en&langpair=zh%7Cen&tbb=1&ie=GB2312/ [google.com] ...most interesting? "Google accused illegal use thesaurus and expressed strong indignation."
  
  wow.. ;-)
- Re:This is big news in China (Score:5, Interesting)
  
  by epine ( 68316 ) writes: on Sunday April 08, 2007 @06:27PM (#18658175)
  
  I was involved in a very early effort to develop a pinyin based IME. Think 4.77Mhz. It worked quite well, in fact. Good dictionaries are hard to come by. Back then, not easy at all. In fact, we liberated data quite freely from any resource we could obtain. I made it a policy that each dictionary term had to come from at least two independent sources (sources unlikely to have stolen from each other). The singletons had to be manually reviewed by a qualified linguist. It's like that old saying: stealing from one source is plagiarism, stealing from multiple sources is research.
  
  Eventually I found an extremely effective compression method (the IME portion of our system fit into 128K including dictionary) using a hash table approach. Collisions in the hash table generated spurious terms. The spurious terms that conflicted with legitimate terms were suppressed by a "phantom dictionary". The rest of the phantoms were allowed to remain. These only came up for pinyin bigrams (almost always bigrams) that were non-productive in the stock dictionary. The user supplied dictionary took priority over the system dictionary (and the phantoms it contained) so conflicts didn't arise.
  
  Because of the way the hash table was constructed, our dictionary generated an exponentially increasing number of phantoms with increasing phrase length. By the time you got to four character phrases, the phantoms vastly outnumbered the legitimate vocabulary. Note that our system distinguished 8000 hanzi characters for the input system, so the space of possible four character phrases was up in the trillions, and the phantoms were extremely sparse by that metric, and never seen in the wild.
  
  Any competitor who had decided to enumerate our dictionary (I could have suggested several practical ways to achieve this) would have ended up with barrels of nonsense, unless they also devoted the resources, as we had, to "research" rather than plagiarise.
  
  Nor was it possible to copy our dictionary directly in its compressed format, as the hash function was tied to a hardware dongle. I never heard that the algorithm embedded in the dongle was ever cracked directly, but I do know that the vendor's recommended algorithm for feeding the dongle was awful, and failed most of my statistical tests. We beefed up the routine until many (but far from all) of the statistical tests for randomness were satisfied, and then ran the device ten times overspec to get the performance we required. Fun times.
  
  A funny story is that our software was listed as "cracked" on some hacker site because some l33t dude had removed the code to test for the presence of a functioning dongle, and the message we displayed "where's your dongle?" (OK, it wasn't quite like that) without noticing that with the dongle absent, the pinyin input method used white noise as the dictionary hash function, and produced nothing but chicken soup for the hanzi output text. To successfully change the hash function and maintain the dictionary compression ratio, you had to solve a bipartite graph matching problem and then recompute the phantom table, and none of that code shipped with the product.
  
  In this era, with the amount of data you can scrape off the internet on the barest whim, I'm a bit shocked that anyone still stoops to our tried and true "research" methodologies from the mid eighties. My involvement ended around 1991 as it became apparent that Windows 3.x was going to take over the world. My joy in life at that time was writing bug-free code, and I didn't see any way to achieve that the way the world was turning. If someone tapped me on the shoulder and woke me up after my fifteen year snooze, I could probably suggest many fascinating IME features I had planned back then that still haven't been implemented, though I haven't checked on this in a long while. We already had simplified/classical, Mandarin/Cantonese working from a single dictionary. It wasn't proper dialectic Cantonese though, that was something I wished to do, but never completed. We did all this pre-Unicode, so we had to invent our own Unicode, too. Anyone need a first edition Unicode standard? I think I've got three.
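
The comment above does not spell out the actual compression scheme, so the following is only a toy stand-in for the general idea: keep the occupied hash slots instead of the terms, accept that collisions create "phantom" terms, and suppress the troublesome ones with a separate phantom list. The weak hash, the table size, and the names `slot` and `HashedDictionary` are assumptions chosen so the collisions are easy to check by hand.

```python
from itertools import product

TABLE_SIZE = 101  # number of slots in the toy table


def slot(term):
    """Deliberately weak hash: sum of code points modulo the table size."""
    return sum(ord(ch) for ch in term) % TABLE_SIZE


class HashedDictionary:
    """Stores only which hash slots are occupied, never the terms themselves."""

    def __init__(self, terms, suppressed=()):
        self.occupied = {slot(t) for t in terms}
        self.suppressed = set(suppressed)  # the "phantom dictionary"

    def __contains__(self, term):
        return term not in self.suppressed and slot(term) in self.occupied


REAL_TERMS = {"ab", "cd"}
ALL_BIGRAMS = {"".join(p) for p in product("abcd", repeat=2)}

plain = HashedDictionary(REAL_TERMS)
phantoms = {t for t in ALL_BIGRAMS if t in plain} - REAL_TERMS
print(sorted(phantoms))  # ['ba', 'dc']

patched = HashedDictionary(REAL_TERMS, suppressed={"ba"})
print("ba" in patched, "dc" in patched, "ab" in patched)  # False True True
```

Anyone enumerating such a dictionary from the outside gets the phantoms mixed in with the real terms, which is the "barrels of nonsense" effect described above.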
  - - Re: (Score:2)
      
      by HungWeiLo ( 250320 ) writes:
      
      For Cantonese, the Wade-Giles or Yale systems are still widely used. IIRC, The Yale system is considered to be more academically correct. Although, there is technical debate on what is "real" Cantonese. There are literally hundreds of different dialects of Cantonese, but they're overshadowed by the Hong Kong flavor as they have the most cultural hegemony.
a dark secret revealed... (Score:1)

by malevolentjelly ( 1057140 ) writes:

I guess this explains google's amazing capability and seemingly flawless record- no company could be that clever!

They were pirates all along! I knew their original idea of 'searching the web' seemed oddly similar to their rival yahoo...

I'll be watching you, google!
- Of course they're pirates! (Score:4, Funny)
  
  by Etherwalk ( 681268 ) writes: on Sunday April 08, 2007 @04:10PM (#18657373)
  
  They have a copy of the internet! A COPY! How much of that do you think is copyrighted?
  
Pot calling Kettle (Score:1, Flamebait)

by Original Replica ( 908688 ) writes:

Did I read this right, to understand that the Chinese are complaining about someone stealing IP ?
- Re: (Score:1)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
- Re: (Score:2)
  
  by Original Replica ( 908688 ) writes:
  
  Yes, to blame all Chinese for the actions of the few is painting with too broad a brush. My bad. I will be curious to see if Google is pursued with significantly more vigor than other, lower-profile IP offenders. The PRC having a very different set of IP rules for foreign companies could have deep consequences for innovation coming from multinationals.
hits++ (Score:3, Funny)

by no-body ( 127863 ) writes: on Sunday April 08, 2007 @03:39PM (#18657195)

Bashing, or trying to bash, Google is a sure way to increase hits these days.

Maybe more to the story (Score:1, Flamebait)

by icepick72 ( 834363 ) writes:

Did Google claim to create the product from scratch, or is the media insinuating it?
Did Google say they did not hire another company to program the software, or is the media insinuating it?
Did Google say who their Chinese rivals vs. allies are, or did the media tell us?
One possible situation: What if Google hired another company to create the software? What if that 3rd party company stole IP? What if Google is looking into the issue right now and therefore won't comment to the public media? (pure spe
- Re: (Score:1)
  
  by Lord Balto ( 973273 ) writes:
  
  "You see, Google has not answered many questions yet and this does not an admission of guilt."
  
  Shouldn't that be "You see, Google has not answered many questions yet and this does not an admission of guilt make it"? ;-)
- Re: (Score:1)
  
  by DDLKermit007 ( 911046 ) writes:
  
  Google's primary communication has been with CHINESE speakers. Ya know, the people that would use it. They have supposedly rectified the problem, and effectively made this a non-issue.
  - Re: (Score:2)
    
    by icepick72 ( 834363 ) writes:
    
    They have supposedly rectified the problem
    
    Please post link to relevant information. I want to read it for myself. TIA.
    - Re: (Score:2)
      
      by DDLKermit007 ( 911046 ) writes:
      
      http://tech.sina.com.cn/i/2007-04-08/18351454194.shtml [sina.com.cn]
      There you go, and as other posters have noted on here, things appear to be rectified in the latest beta release (1.0.17.0).
      - Re: (Score:2)
        
        by icepick72 ( 834363 ) writes:
        
        For the benefit of non-Chinese speaking users reading this thread, for what the BETA translation is worth [google.com].
On the Bright Side (Score:2, Funny)

by Anonymous Coward writes:

Google code search (http://www.google.com/codesearch [google.com]) Obviously Works!
This wouldn't be the first time... (Score:3, Interesting)

by Anonymous Coward writes: on Sunday April 08, 2007 @03:45PM (#18657229)

This wouldn't be the first time that Google used other people's software in their live services without due credit.

Another example is the spell checkers that Google's Gmail has for the dozen or so languages it supports. Nowhere to be found is an explanation of where these spell-checkers come from, so it would be safe to assume that Google wrote them themselves, or at least bought them from some company that allowed them not to give them credit? Well, the reality is more sad. It turns out that Google actually uses the free-software project, aspell, to do its spell-checking, and the dozens of person-years that went into writing the actual dictionaries for aspell were simply co-opted by Google. When you spell-check in some language X, you do not see any credit for the person who wrote the dictionary, or to aspell. Even if you look very hard in the documentation, this credit is nowhere to be found. It's all very legal under the GPL, but ugly behavior, especially for scientists (like most of the Google who's-who) who are used to giving credit where credit is due.

And how do I know that Google's Gmail uses free-software spell-checkers? Well, I used a method very similar to that described in the article. I'm the author of one of the dictionaries that Google "adopted", and I deliberately inserted some "misspelled" (aka "easter-egg") words into the dictionary, so I can immediately recognize a spell-checker based on my dictionary - and it turns out that Google's Gmail spell-checker is indeed based on my dictionary.

So it's great that Google reuses other software - free-software and commercial software - but they should learn to give credit where credit is due. It doesn't have to be the google.com homepage (of course) - even in some deep-down help page would do.
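
The "easter-egg" check described above can be sketched roughly as follows. This is only a minimal illustration, not the actual method or data used: the planted words, the `shared_canaries` helper and the stand-in word list are all hypothetical, and a real test would query the spell-checker under suspicion instead of a Python set.

```python
def shared_canaries(canaries, suspect_accepts):
    """Return the planted words that the suspect spell-checker accepts as valid."""
    return sorted(word for word in canaries if suspect_accepts(word))


# Hypothetical planted entries; these are NOT real aspell dictionary words.
CANARIES = {"qwertzian", "flibbergast", "zzyxxon"}

# Stand-in for the checker under test: a plain set of accepted words.
suspect_wordlist = {"hello", "world", "qwertzian", "zzyxxon"}

# A word is "accepted" here if it is in the stand-in word list.
hits = shared_canaries(CANARIES, suspect_wordlist.__contains__)
print(hits)
```

The more planted words a checker accepts, the less plausible independent authorship becomes, since a word that exists in no real vocabulary is unlikely to appear in two dictionaries by coincidence.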

- Re: (Score:2, Insightful)
  
  by limecat4eva ( 1055464 ) writes:
  
  If you didn't want society as a whole to benefit from your code, why did you release it under an open source license in the first place? God almighty, you GPL whiners are the best argument going for BSD-style licenses.
- Re: (Score:2)
  
  by Watson Ladd ( 955755 ) writes:
  
  So you're the one who made me lose points on that paper!
- Re: (Score:1, Interesting)
  
  by Dominic_Mazzoni ( 125164 ) writes:
  
  I work at Google. Email me with more information and I'll pass it on to the Gmail team.
  - Re: (Score:2)
    
    by Dominic_Mazzoni ( 125164 ) writes:
    
    Why was this modded as flamebait? Look me up, I really do work at Google. I'd be happy to pass this on to the Gmail team, I think it deserves a response.
- Re:This wouldn't be the first time... (Score:5, Insightful)
  
  by Anonymous Coward writes: on Sunday April 08, 2007 @04:23PM (#18657477)
  
  the dozens of person-years that went into writing the actual dictionaries for aspell were simply co-opted by Google.
  Get off your high horse - you're just another holy roller.
  
  Thousands of people donate their time, money, and code to GPL-licensed projects. As one of those contributors, I can tell you that I don't believe that Google is doing anything wrong at all with aspell. The terms of the license are clear. Users are in no way required to give attribution. In fact, there is not even a suggestion, hint, or implication that attribution would be nice. Your suggesting that it should be that way is fine, but to state that aspell was "co-opted" is factually incorrect and falsely implies that Google is doing something against the GPL license.
  
  If you, as a contributor to aspell, don't like aspell's license terms, you are free to start another project with similar goals under different license terms.
  
- Bullshit (Score:2)
  
  by microbee ( 682094 ) writes:
  
  You are totally comparing apples to oranges. aspell is a released package for people to use - much like any software components such as libc for people to build software upon. It's understandable that Google does not credit it on their website - it would be too many of those.
  
  sogou's dictionary is a totally different story. It's never released separately with the intent for others to reuse freely. This is bloody copyright violation. I am really surprised that Google has done this.
  
  I remember before that googl
- But (Score:2)
  
  by Colin Smith ( 2679 ) writes:
  
  You released it under the GPL... Didn't you read the license? If you want recognition, which yes is a valid requirement, shouldn't you use a license which demands attribution?
- Re:This wouldn't be the first time... (Score:5, Interesting)
  
  by cubic6 ( 650758 ) writes: <tom@nOSpaM.losthalo.org> on Sunday April 08, 2007 @04:45PM (#18657631) Homepage
  
  Care to release those words that prove that Google uses Aspell? I don't see any proof in your post, just claims that are impossible to verify because you give very little information. You're an author of some dictionary that's used in Aspell, you put intentionally misspelled words in your dictionary, but you don't tell us which dictionary or which words, so what do we have to go by? Why is your post any more trustworthy than any other AC post? Furthermore, it's pretty suspicious that you claim that you INTENTIONALLY put incorrect words in your dictionary to catch people using it as part of a larger project, when such use is perfectly legal. Things like that undermine Aspell's credibility as a reference tool, which, as a contributor, I would think you'd care about.
  
  - - Re: (Score:2)
      
      by cubic6 ( 650758 ) writes:
      
      '"Things like that" have been done for thousands of years in dictionaries, encyclopedias, and even maps. All of this backed by the idea that if you give something to someone, you somehow still have power over it because it's somehow your "property".' The difference is that Aspell is under an open license that encourages people to use it with very few restrictions. There's no need to be paranoid about someone else using your work, you've EXPLICITLY ALLOWED it by releasing it under that license.
  - - Re: (Score:2, Funny)
      
      by pepsee ( 6891 ) writes:
      
      WTF?! So that's why "lkjsdflkjsaf" made it into my term paper!
- Re: (Score:1)
  
  by noidentity ( 188756 ) writes:
  
  Why don't you simply make giving credit a requirement in your licensing terms? It doesn't make sense that you'd have this implicit requirement that you don't state, but throw a fit if it isn't met.
- Re: (Score:2)
  
  by Bogtha ( 906264 ) writes:
  
  I'm the author of one of the dictionaries that Google "adopted"
  
  Then if you wanted attribution, you should have included it in the license. As far as I know, the Googlebot doesn't index people's minds yet, so if you want something, you need to ask for it.
Pinyan Input Method (Score:1, Troll)

by dtfinch ( 661405 ) * writes:

I misread pinyin and just about lost it.
http://en.wikipedia.org/wiki/Kenneth_Pinyan [wikipedia.org]
- Re: (Score:2)
  
  by ColdWetDog ( 752185 ) writes:
  
  Ewwww, Gross!
  Next time, use cut and paste...
evidence not very clear (Score:2)

by belmolis ( 702863 ) writes:

Without more evidence it isn't clear to me whether Google has done anything wrong. The fact is, there are various files around pairing Chinese characters and words written in Chinese characters with their Roman equivalents, many of them "free" for various notions of "free". Fairly comprehensive lists of this type have been around long enough that it is unlikely that anybody would start from scratch. You'd get some existing lists, combine them, review them for errors, and look for things that need to be add
- - Re: (Score:2)
    
    by belmolis ( 702863 ) writes:
    
    So what? That MIGHT be evidence that Google incorporated Sogou's data, but as I said, we need to know more about where Sogou got its data. Furthermore, it isn't even clear evidence that Google borrowed from Sogou. They might have put in those names as a compliment, or as part of a long list of personal names that they included. (The total number of Chinese personal names of any frequency is actually not that large.)
    - Re: (Score:2)
      
      by Achromatic1978 ( 916097 ) writes:
      
      I love how you've dismissively waved your hand and blown this off as either a derivative work (Hint, taking a dictionary of 100,000 words and adding 5,000 of your own does not create a derivative work of 105,000 words that is all covered and okay now), but my favorite is this laughable gem, "they might have put in the [Sogou developers] names as a compliment". Perhaps, or perhaps the Sogou developers put their names in a dictionary to catch out people using their work. Sitting where I am, I see the latter
that's not plagiarism (Score:1)

by nanosquid ( 1074949 ) writes:

"Plagiarism" is an academic and creative concept. Plagiarism is about denying an individual credit for his creativity and misrepresenting someone else's ideas as your own. Using a text input method file from another product in your own product is not plagiarism because it doesn't involve creative ideas and because creating this kind of product is not about giving credit to individuals. Google may or may not be violating copyright (but, then, this is China).

That is not to say that a company can never commi
Combing (Score:5, Insightful)

by eMbry00s ( 952989 ) writes: on Sunday April 08, 2007 @04:39PM (#18657583)

Everybody who says something along the lines of "bah, chinese complaining about stealing" should note that all Chinese are not connected into one single conscious entity, but are different individuals.

The people who own this IP need not have stolen any other IP.

It is as dumb as saying that all Americans are christian, guntouting, fat fuckasses.

- Re: (Score:1)
  
  by Seumas ( 6865 ) writes:
  
  all Chinese are not connected into one single conscious entity, but are different individuals.
  Well, duh. That's the definition of communism.
  
  Wait. I mean... what the fuck?!
- Re: (Score:2)
  
  by drinkypoo ( 153816 ) writes:
  
  It is as dumb as saying that all Americans are christian, guntouting, fat fuckasses.
  Well, I'm not christian, but I am gun-touting - I tout the virtues of guns frequently. I admit I am fat, but if you want to find out if I'm a fuckass you'll have to drop trou and bend over.
  Seriously though, China's entire industrial base and manufacturing technology, and its start in the industrialized world, is directly based on copying others' IP. China is famous for turning out copies of other companies' equipment so ide
Plagiarism is all too common in China (Score:1)

by Diamonddavej ( 851495 ) writes:

I read recently that plagiarism is common in China. It extends not just to software (apparently) but to journal publications, where it is due to English being a second language. Researchers, who often have an incomplete grasp of English, copy large sections of text from western papers in order to help them write their papers. I think this is a bit of a grey area. I have myself been utterly stuck for words and resorted to using a phrase or colloquialism, just a few words long, from another paper (I always re-write
Sohu has patents?? (Score:1)

by iminplaya ( 723125 ) writes:

So what's all this talk about China not having legal restrictions on the distribution of ideas? (People get mad when I call it IP law)
Clarifications. (Score:3, Informative)

by MaWeiTao ( 908546 ) writes: on Monday April 09, 2007 @12:11AM (#18660017)

There seem to be a few misunderstandings here regarding Chinese text entry. First, this is China, and the official language is Mandarin Chinese. This means there are 37 distinct syllables, not the hundreds some have claimed. The distinction is that in addition to those there are 5 tones. This doesn't mean there are that many syllables times the number of tones. Think of tones as accents. Additionally, certain syllables only appear in certain places in a word. So it isn't quite as overwhelming a task to type Chinese on a computer as you'd think.

The keyboards used in China, Taiwan, Singapore and even Japan are almost always QWERTY, but that's irrelevant. Virtually nobody except Westerners uses that to type. Printed on Chinese keyboards are 4 sets of characters. The first set is our alphabet, and the next 3 sets include characters for different text entry methods.

I don't know about China, but in Taiwan one of the sets is Zhuyin fuhao. That system, as I've seen mentioned here, is a set of simple characters, each corresponding to a distinct sound, 21 consonants and 16 syllables. It's the closest thing to a Chinese alphabet in existence. It's only really used for educational purposes, but I don't see why it isn't widely adopted in the same way the Japanese use hiragana or katakana.

Anyway, that system is comparable to Pin Yin, which is more or less a romanized version of the same thing and it's what is used for signage in China, and now in Taiwan as well. This is the method a westerner is more likely to use to type Chinese.

The funny thing about Chinese is that the same word could have many different meanings, each of which has a distinct character. So you type the word, including the appropriate tones, and up comes a list with all the corresponding characters. Then one character is chosen from the list. It's kind of like predictive text. In some cases, when a set of characters produces a meaning, upon entering the first character the user is given a list of additional characters. It's all done, obviously, to speed up the typing process.
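
A rough sketch of that candidate lookup, for illustration only: the `PinyinLexicon` class and the frequency numbers are invented here, tones are ignored for brevity, and a real input method would rank candidates using far more context than a single count.

```python
from collections import defaultdict


class PinyinLexicon:
    """Toy lookup table from a romanized syllable to candidate characters."""

    def __init__(self):
        # pinyin string -> list of (frequency, text) pairs
        self._entries = defaultdict(list)

    def add(self, pinyin, text, freq):
        self._entries[pinyin].append((freq, text))

    def candidates(self, pinyin):
        # Most frequent candidates come first, like the pick list an IME shows.
        ranked = sorted(self._entries.get(pinyin, []), key=lambda e: -e[0])
        return [text for _, text in ranked]


lexicon = PinyinLexicon()
# The frequencies below are made-up numbers, not measured usage data.
lexicon.add("ma", "妈", 50)
lexicon.add("ma", "马", 80)
lexicon.add("ma", "吗", 120)

print(lexicon.candidates("ma"))
```

The word database at issue in this story plays the role of the entries in such a table, which is why a wrong entry copied from one product would surface as the same wrong suggestion in another.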

So, this input method can be sufficiently quick, comparable to typing English. However, there are other entry methods, based on different factors, which can be more precise and significantly quicker. I have no idea how to use any of those, but it's my impression that typing in those methods can be quite a bit faster than most people typing in English.

Of course, this begs the question, why did Google bother coming up with their own system? Things are always a bit of a mess with all the options out there.

As for the possibility of code being plagiarized, I'm really not surprised at all. This is one of the consequences of outsourcing. The company might have a policy against this sort of thing, but the programmer clearly didn't care. He probably thought he could save himself a bit of trouble and ultimately saw nothing wrong with it. I've experienced similar things first hand. Unless you have a team you trust, there needs to be a lot of oversight and careful management.

- - Re: (Score:3, Informative)
    
    by MaWeiTao ( 908546 ) writes:
    
    You might be right. I can't speak for everyone in China. In Taiwan, and other regions, however, no one uses Pinyin or any other romanized systems, not even with mobile phones.
Google evolves (Score:5, Funny)

by Lorean ( 756656 ) writes: on Monday April 09, 2007 @01:00AM (#18660197)

Google has learned how to do business in China.
Congrats to them.

Re: (Score:2)

by account_deleted ( 4530225 ) writes:

Comment removed based on user account deletion
Plagiarism? (Score:2)

by PMuse ( 320639 ) writes:

There is no action for plagiarism. Either this is a copyright infringement issue, or it's nothing.
Every day (Score:2)

by stonecypher ( 118140 ) writes:

For a company whose motto is "don't be evil," it strikes me as hilarious that Google is now responsible for more corporate piracy than Microsoft.
- Re: (Score:1)
  
  by Threni ( 635302 ) writes:
  
  > Then again, you had to be really stupid to believe that to begin with.
  
  This sort of thing always reminds me of Lewis Carroll's excellent "Alice in Wonderland":
  
  `I like the Walrus best,' said Alice: `because he was a little sorry for the poor oysters.'
  
  `He ate more than the Carpenter, though,' said Tweedledee. `You see he held his handkerchief in front, so that the Carpenter couldn't count how many he took: contrariwise.'
  
  `That was mean!' Alice said indignantly. `Then I like the Carpenter best -- if he didn
  - Re: (Score:3, Funny)
    
    by linuxmop ( 37039 ) writes:
    
    That's funny, because your excerpt reminds me of Lewis Carroll's Through the Looking Glass. :)
    - Re: (Score:1)
      
      by Threni ( 635302 ) writes:
      
      Yes, I realised that immediately after I posted (I have no idea why you can't edit your posts here). I think I made the mistake because I never read one book without reading the other so it's all one big story in my head!
- Re: (Score:1)
  
  by ChromeAeonium ( 1026952 ) writes:
  
  How long has laboratory been a verb?
  - Re: (Score:1, Informative)
    
    by Anonymous Coward writes:
    
    How long has laboratory been a verb?
    The title previously read "Google's Faces Plagiarism Questions Over Chinese Software"
- Re: (Score:2)
  
  by Aladrin ( 926209 ) writes:
  
  Using http://www.google.com/search?hl=en&q=use%20possessive%20with%20verb&btnG=Google+Search [google.com] you find http://usawocc.army.mil/IMI/wg6.htm [army.mil] to be the first result. At the very end of the page, what's that say? Oh yeah, hmm. You CAN do that.
  
  In this case it's not only wrong, but the s doesn't belong there at all. If you're going to grammar-nazi, at least do it well.
  - Re: (Score:1, Informative)
    
    by Anonymous Coward writes:
    
    The link you mentioned specifically refers to gerunds. A gerund (verb ending in "ing") is not the same thing as a standard (not ending in "ing") verb. If you're going to correct the grammar police, at least make sure you've got your own grammar correct...
- Re: (Score:2)
  
  by I(rispee_I(reme ( 310391 ) writes:
  
  Here's [google.com] my suggestion.
  
  Back on topic, if IP doesn't exist in China, then why does this matter?
  Contrariwise, if IP is recognized in China, why doesn't somebody tell the Chinese? ;)
  - Re: (Score:3, Insightful)
    
    by microbee ( 682094 ) writes:
    
    This is not just about China. Both GOOG and SOHU are NASDAQ companies, and the software is released to the world (including US). So SOHU could sue GOOG.
    
    If GOOG or whatever US companies think a Chinese company infringed on their rights, they can sue, instead of whining on online forums.
    
    So, what's your exact point?
- true perspective (Score:4, Interesting)
  
  by WindBourne ( 631190 ) writes: on Sunday April 08, 2007 @07:55PM (#18658775) Journal
  
  It is Chinese stealing from other Chinese. Not really surprising, since they have no qualms about stealing from any company and then trying to claim it as their own work.
  
  It is also partially why you do not want to use China to do any IP type work. They will steal from others and leave your company at risk, as well as allow other Chinese companies to steal from yours.
  
  Understand that this is simply a big part of who they are now. They have been taught for the last 60 years that all the property belongs to the state and the community. It will be difficult for them to consider private ownership of anything for a number of generations. I am guessing that it will end about the time that China considers itself a superpower (which will happen). Sadly, that may be when a war occurs between China and (America|Russia|Europe|India). Offhand, I am guessing Russia. They will need a number of their resources (land, water, oil, etc).
  
  - Re: (Score:2)
    
    by aminorex ( 141494 ) writes:
    
    Given the surplus of males in the populations of both India and China, an apocalyptic conflagration seems inevitable, in order to cull and rebalance those two populations.
    - Re: (Score:2)
      
      by WindBourne ( 631190 ) writes:
      
      I agree. That is what is going to lead China to invade others. The other big issue that they will have is that if the climate models come true, then western China, and major parts of eastern China, will have great difficulty with water, and subsequently, food. They will need to obtain it from somewhere. It will either be the Himalayas or it will be Russia.
  - - Re: (Score:2)
      
      by WindBourne ( 631190 ) writes:
      
      Actually, I do know your spacecraft, very well. I have followed your program with interest for some time. I have worked on NASA projects as well as for other US gov. groups (during the mid 80's and then post 9/11), so I have a very strong interest in China and other groups (of course, back in the 80's, it was the USSR). Even now, I still read some of your papers (it is a bit like reading foxnews or pravda, but still worth it for trying to understand the leadership of China). [xinhuanet.com]
      
      In addition, I have taught and wo
  - - MOD GP UP, please (Score:2)
      
      by WindBourne ( 631190 ) writes:
      
      China, in many ways, is the ultimate in Capitalism. In fact, had the American founders not implemented small IP rights, we probably would have been very similar in our business approach.
      
      I know that lots of theft goes on between the businesses (it is obviously not just Chinese stealing from outsiders).
      
      But as you point out, they improve on the design. It is interesting to see some of the designs that come out. In particular, I have seen a number of toys that were made here be heftier and last much longer with bet
- Translation of Google China's Official Blog (Score:2)
  
  by amerinese ( 685318 ) writes:
  
  Since the announcement of Google's Pinyin Input System on April 4th, 2007, Google has received large amounts of feedback and suggestions. Among those, we are especially concerned about suspicions of the origins of the vocabulary database for Google's Input System. During the testing stage, indeed, it included non-Google data. We are willing to face this problem and, thus, apologize to both our users and Sohu. We have simultaneously taken action, this Sunday, April 8th, 2007, noontime, we have completed
