The Internet Google Privacy Security

Massive Yandex Code Leak Reveals Russian Search Engine's Ranking Factors (arstechnica.com) 24

An anonymous reader quotes a report from Ars Technica: Nearly 45GB of source code files, allegedly stolen by a former employee, have revealed the underpinnings of Russian tech giant Yandex's many apps and services. It also revealed key ranking factors for Yandex's search engine, the kind almost never revealed in public. [...] While it's not clear whether there are security or structural implications of Yandex's source code revelation, the leak of 1,922 ranking factors in Yandex's search algorithm is certainly making waves. SEO consultant Martin MacDonald described the hack on Twitter as "probably the most interesting thing to have happened in SEO in years" (as noted by Search Engine Land). In a thread detailing some of the more notable factors, researcher Alex Buraks suggests that "there is a lot of useful information for Google SEO as well."

Yandex, the fourth-ranked search engine by volume, purportedly employs several ex-Google employees. Yandex tracks many of Google's ranking factors, identifiable in its code, and competes heavily with Google. Google's Russian division recently filed for bankruptcy after losing its bank accounts and payment services. Buraks notes that the first factor in Yandex's list of ranking factors is "PAGE_RANK," which is seemingly tied to the foundational algorithm created by Google's co-founders.

As detailed by Buraks (in two threads), Yandex's engine favors pages that:
- Aren't too old
- Have a lot of organic traffic (unique visitors) and less search-driven traffic
- Have fewer numbers and slashes in their URL
- Have optimized code rather than "hard pessimization," with a "PR=0"
- Are hosted on reliable servers
- Happen to be Wikipedia pages or are linked from Wikipedia
- Are hosted or linked from higher-level pages on a domain
- Have keywords in their URL (up to three)
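
As a rough illustration of the URL-related items in the list above (fewer numbers and slashes, keywords counted up to three), a toy feature extractor might look like the following. This is a minimal sketch and not taken from the leaked code: the function name `url_features`, the tokenization, and the returned field names are all hypothetical.

```python
import re
from urllib.parse import urlparse


def url_features(url, keywords):
    """Toy URL features loosely inspired by the factors listed above.

    Hypothetical illustration only; nothing here comes from Yandex's code.
    """
    path = urlparse(url).path
    # Count digits and slashes in the path portion of the URL.
    digits = sum(ch.isdigit() for ch in path)
    slashes = path.count("/")
    # Split the path into lowercase alphabetic tokens and count keyword hits.
    tokens = set(re.findall(r"[a-z]+", path.lower()))
    hits = len(tokens & {k.lower() for k in keywords})
    return {
        "digits": digits,
        "slashes": slashes,
        # The summary says keywords count "up to three", so cap the hits at 3.
        "keyword_hits": min(hits, 3),
    }


features = url_features(
    "https://example.com/blog/2023/01/seo-ranking-factors",
    ["seo", "ranking", "factors", "yandex"],
)
```

How such features would be weighted against each other is not stated in the summary, so the sketch stops at extraction rather than guessing at a score.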

This discussion has been archived. No new comments can be posted.

  • by backslashdot ( 95548 ) on Monday January 30, 2023 @08:09PM (#63252391)

    Putin endorsement level.

    • Re: (Score:2, Interesting)

      by Narcocide ( 102829 )

      I know you mean to be funny, but it would be bigger news if it turns out that Yandex actually serves more ethical and accurate results than Google.

      • by test321 ( 8891681 ) on Monday January 30, 2023 @09:42PM (#63252553)

        it would be bigger news if it turns out that Yandex actually serves more ethical and accurate results than Google.

        According to leaks, "Yandex tried to make it impossible to stumble upon the image of the current Russian president when searching for individual words: pizdabol, bald, fucker // and also phrases: bunker grandfather, master thief, condom of all Rus', dick in a suit, fuck in the hole // and whole sentences: what do pedophiles look like, when he dies, strange creature waving". https://breached.vc/Thread-Yan... [breached.vc]

        When using Google, the first result for "bunker grandfather" is the image of Putin associated with the Wikipedia page explaining the phrase. So at least for this one, Yandex is less accurate than Google, since "bunker grandfather" is an actual (albeit irreverent) phrase used to refer to Putin and should be returned as the first result.

        Google does not remove these disrespectful phrases used to refer to top leaders. In Google Images, "Orange man" returns images of Donald Trump, and "President of the Rich" returns Emmanuel Macron. (At least where I am; someone could check from another country with a different search history.)

        • by Z80a ( 971949 )

          It's only natural to expect that every business will try to not get shut down by powerful entities.
          It's the reason why Google worked so hard to remove copyright-infringing search results from its system.

        • A lot of mistranslations there, or at least, some nuance is lost

          > pizdabol
          A very vulgar word for a habitual liar. They also block its relation to Medvedev.

          > fucker
          Literally "dickhead". Also an actual reference to Putin thanks to Ukrainians IIRC.

          > bald
          Baldheaded in the sense of partial hair loss and with an insulting undertone.
          Especially interesting is the fact that they not only block the association with Putin, but also all the associations with any other irreverent terms for him like "rat", "moth

        It does in most cases, empirically. I haven't found one where it doesn't, but I haven't looked that hard. Could be a "don't be evil" loss leader.

        Of course this 'hack' is almost certainly an intentional leak of the code they want you to see, and may not be the code they actually run.

  • Nearly 45GB of source code files, allegedly stolen by a former employee, have revealed the underpinnings of Russian tech giant Yandex's many apps and services. It also revealed key ranking factors for Yandex's search engine, the kind almost never revealed in public. [...]

    Good. Now do Google.

  • Interesting that they have an SEO factor, "FI_IS_SEO" 134, that is no longer used. This seems like a really important factor for ignoring SEO content. It's interesting that they are able to identify SEO content. I wonder how accurately they can identify it.
  • by berchca ( 414155 ) on Monday January 30, 2023 @10:38PM (#63252653) Homepage

    They no longer have enough staff to pore over the code and see what they can use from it.
