Massive Yandex Code Leak Reveals Russian Search Engine's Ranking Factors (arstechnica.com) 24
An anonymous reader quotes a report from Ars Technica: Nearly 45GB of source code files, allegedly stolen by a former employee, have revealed the underpinnings of Russian tech giant Yandex's many apps and services. It also revealed key ranking factors for Yandex's search engine, the kind almost never revealed in public. [...] While it's not clear whether there are security or structural implications of Yandex's source code revelation, the leak of 1,922 ranking factors in Yandex's search algorithm is certainly making waves. SEO consultant Martin MacDonald described the hack on Twitter as "probably the most interesting thing to have happened in SEO in years" (as noted by Search Engine Land). In a thread detailing some of the more notable factors, researcher Alex Buraks suggests that "there is a lot of useful information for Google SEO as well."
Yandex, the fourth-ranked search engine by volume, purportedly employs several ex-Google employees. Yandex tracks many of Google's ranking factors, identifiable in its code, and competes heavily with Google. Google's Russian division recently filed for bankruptcy after losing its bank accounts and payment services. Buraks notes that the first factor in Yandex's list of ranking factors is "PAGE_RANK," which is seemingly tied to the foundational algorithm created by Google's co-founders.
As detailed by Buraks (in two threads), Yandex's engine favors pages that:
- Aren't too old
- Have a lot of organic traffic (unique visitors) and less search-driven traffic
- Have fewer numbers and slashes in their URL
- Have optimized code rather than "hard pessimization," with a "PR=0"
- Are hosted on reliable servers
- Happen to be Wikipedia pages or are linked from Wikipedia
- Are hosted or linked from higher-level pages on a domain
- Have keywords in their URL (up to three)
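To make the three URL-related bullets above concrete, here is a minimal Python sketch that counts digits and slashes in a URL path and caps keyword matches at three. The function names, the substring matching, and the weights are all hypothetical illustrations. Nothing here is taken from the leaked Yandex code, and the summary does not give the real factors' definitions or weights.

```python
from urllib.parse import urlparse

# The summary says keywords in the URL help "up to three"; the cap is the
# only number here that comes from the text.
MAX_URL_KEYWORDS = 3


def url_features(url, query_keywords):
    """Return toy URL features loosely mirroring the URL bullets above."""
    parsed = urlparse(url)
    path = parsed.path
    haystack = (parsed.netloc + path).lower()

    digits = sum(ch.isdigit() for ch in path)  # "fewer numbers"
    slashes = path.count("/")                  # "fewer slashes"
    # Naive substring match; a real engine would tokenize properly.
    hits = sum(1 for kw in set(query_keywords) if kw.lower() in haystack)

    return {
        "digits": digits,
        "slashes": slashes,
        "keyword_hits": min(hits, MAX_URL_KEYWORDS),
    }


def toy_url_score(url, query_keywords):
    """Combine the features with made-up weights (assumption, not Yandex's)."""
    f = url_features(url, query_keywords)
    return f["keyword_hits"] - 0.1 * f["digits"] - 0.2 * f["slashes"]


if __name__ == "__main__":
    for u in ("https://example.com/yandex-leak",
              "https://example.com/2023/01/27/p/12345"):
        print(u, url_features(u, ["yandex", "leak"]))
```

Under these made-up weights, a short keyword-bearing URL scores higher than a deep, number-heavy one, which is the direction the bullets describe. The magnitudes mean nothing.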
left out the top ranking factor (Score:5, Funny)
Putin endorsement level.
Re: (Score:2, Interesting)
I know you mean to be funny, but it would be bigger news if it turns out that Yandex actually serves more ethical and accurate results than Google.
Re:left out the top ranking factor (Score:5, Informative)
> it would be bigger news if it turns out that Yandex actually serves more ethical and accurate results than Google.
According to leaks, "Yandex tried to make it impossible to stumble upon the image of the current Russian president when searching for individual words: pizdabol, bald, fucker // and also phrases: bunker grandfather, master thief, condom of all Rus', dick in a suit, fuck in the hole // and whole sentences: what do pedophiles look like, when he dies, strange creature waving". https://breached.vc/Thread-Yan... [breached.vc]
On Google, the first result for "bunker grandfather" is the image of Putin from the Wikipedia page explaining the phrase. So at least for this one, Yandex is less accurate than Google, since "bunker grandfather" is an actual (albeit irreverent) phrase used to refer to Putin and should be returned as the first result.
Google does not remove these disrespectful phrases used to refer to top leaders. In Google Images, "Orange man" returns images of Donald Trump, and "President of the Rich" returns Emmanuel Macron. (At least where I am; someone could check from another country or with a different search history.)
Re: (Score:2)
It's only natural to expect that every business will try to not get shut down by powerful entities.
It's the reason why Google worked so hard to remove copyright-infringing search results from its system.
Re: (Score:3)
They're the same on the angle of being something that if you don't remove, you get shut down, regardless of your opinion on the matter.
Re: (Score:1)
Copyright is mostly evil. Getting rid of it would do more good than harm.
If Biden wasn't corrupt and so old the end of the world is coming for him personally due to his old age ( someone that corrupt is unlikely to care about any other human being - he is like the thief who doesn't hesitate to break a 500 dollar window to steal $3.50 on the seat ) , I wonder if sufficient politicians would find it worth it to fight a proxy war in Ukraine to keep Europe energy independent from Russia when they couldn't be b
Re: (Score:3)
Cool, now do a crazy rant about how Biden doesn't show up as the first result when you type in "senile president", "ukraine corruption godfather", "the big guy", or "hairsniffing pedophile" into Google...
Except they all do, apart from the second one, which returns "Ukraine Authorities Arrest Suspected 'Godfather of Contraband'", and that is obviously a better result.
Re: (Score:2)
> Cool, now do a crazy rant about how Biden doesn't show up as the first result when you type in "senile president", "ukraine corruption godfather", "the big guy", or "hairsniffing pedophile" into Google...
I did the Google Images search and got a full page of Biden pictures for "senile president". Google Images even suggests adding the keyword "Dementia", and the thumbnail for "Dementia" is another Biden picture. The keyword "Democrat" is also suggested.
Re: (Score:3)
A lot of mistranslations there, or at least, some nuance is lost
> pizdabol
A very vulgar word for a habitual liar. They also block its relation to Medvedev.
> fucker
Literally "dickhead". Also an actual reference to Putin thanks to Ukrainians IIRC.
> bald
Baldheaded in the sense of partial hair loss and with an insulting undertone.
Especially interesting is the fact that they not only block the association with Putin, but also all the associations with any other irreverent terms for him like "rat", "moth
Re: (Score:2)
It does in most cases empirically. I haven't found one where it doesn't but I haven't looked that hard. Could be a don't be evil loss leader
Of course this 'hack' is almost certainly an intentional leak of the code they want you to see, and may not be the code they actually run.
Continual dictatorship (Score:2)
People who breed animals say that it takes about 10 generations before a wild animal becomes tame.
The Russian revolution was in 1917, and a human generation (age of first child) is about 25 years in Russia [cia.gov].
Russians have been eliminating dissidents from their population for a little more than 4 generations. Recent protests against the Ukraine war led to a few thousand (like, three thousand) dissidents being arrested - and news sources report that those people will simply be "disappeared".
I conclude that Russia will
Re: (Score:1)
Now do google... (Score:2)
> Nearly 45GB of source code files, allegedly stolen by a former employee, have revealed the underpinnings of Russian tech giant Yandex's many apps and services. It also revealed key ranking factors for Yandex's search engine, the kind almost never revealed in public. [...]
Good. Now do Google.
Re: (Score:3)
Given how Russia "innovates", they probably have.
SEO (Score:2)
Bad timing for Google (Score:3)
They no longer have enough staff to pore over the code and see what they can use from it.