

Article

10 Aug 2023

Author:
Matt Burgess, Wired

Leaked Yandex code exposes privacy concerns as company splits & faces increased Russian government influence

"Leaked Yandex Code Breaks Open the Creepy Black Box of Online Advertising", 10 August 2023

If you live in Russia, there’s no avoiding Yandex. The tech giant, often referred to as “Russia’s Google,” is part of daily life for millions of people. It dominates online search, ride-hailing, and music streaming, while its maps, payment, email, and scores of other services are popular. But as with all tech giants, there’s a downside to Yandex being everywhere: It can gobble up huge amounts of data.

In January, Yandex suffered the unthinkable. It became the latest in a short list of high-profile firms to have its source code leaked. ...The trove, which is said to have come from a disgruntled employee, doesn’t include any user data but provides an unparalleled view into the operation of its apps and services. Yandex’s search engine, maps, AI voice assistant, taxi service, email app, and cloud services were all laid bare.

The leak also included code from two of Yandex’s key systems: its web analytics service, which captures details about how people browse, and its powerful behavioral analytics tool, which helps run its ad service that makes millions of dollars.

Now, an in-depth analysis of the source code belonging to these two services, by Kaileigh McCrea, a privacy engineer at cybersecurity firm Confiant, is shedding light on how the systems work. Yandex’s technologies collect huge volumes of data about people, and this can be used to reveal their interests when it is “matched and analyzed” with all of the information the company holds, Confiant’s findings say.

McCrea says the Yandex code shows how the company creates household profiles for people who live together and predicts people's specific interests. From a privacy perspective, she says, what she found is “deeply unsettling.” ...The findings also reveal that Yandex has one technology in place to share some limited information with Rostelecom, the Russian-government-backed telecoms company.

Yandex’s chief privacy officer, Ivan Cherevko, in detailed written answers to WIRED’s questions, says the “fragments of code” are outdated, are different from the versions currently used, and that some of the source code was “never actually used” in its operations.

However, the analysis comes as Russia’s tech giant is going through significant changes. Following Russia’s full-scale invasion of Ukraine in February 2022, Yandex is splitting its parent company, based in the Netherlands, from its Russian operations. Analysts believe the move could see Yandex in Russia become more closely connected to the Kremlin, with data being put at risk.

“They have been trying to maintain this image of a more independent and Western-oriented company that from time to time protested some repressive laws and orders, helping attract foreign investments and business deals,” says Natalia Krapiva, tech-legal counsel at digital rights nonprofit Access Now. “But in practice, Yandex has been losing its independence and caving in to the Russian government demands...”

Data Harvesting

The Yandex leak is huge. The 45 GB of source code covers almost all of Yandex’s major services, offering a glimpse into the work of its thousands of software engineers.

McCrea manually inspected two parts of the code: Yandex Metrica and Crypta. Metrica is the firm’s equivalent of Google Analytics: software that places code on participating websites and, through AppMetrica, in apps, and that can track visitors down to every mouse movement. Last year, AppMetrica, which is embedded in more than 40,000 apps in 50 countries, caused national security concerns with US lawmakers after the Financial Times reported the scale of data it was sending back to Russia.

“The amount of data that Yandex has through the Metrica is so huge, it's just impossible to even imagine it,” says Grigory Bakunov, a former Yandex engineer and deputy CTO who left the company in 2019. “It's enough to build any grouping, or segmentation of the audience.” The segments created by Crypta appear to be highly specific and show how powerful data about our online lives is when it is aggregated.

While the leaked source code offers a detailed view of how Yandex’s systems may operate, it is not the full picture. Artur Hachuyan, a data scientist and AI researcher in Russia who started his own firm doing analytics similar to Crypta, says that when he inspected the code he did not find any pretrained machine learning models, or any references to data sources or external databases of Yandex’s partners. It’s also not clear, for instance, which parts of the code were not used.

McCrea’s analysis says Yandex assigns people household IDs.

The code also shows how Yandex can combine data from multiple services. McCrea says in one complex process, an adult’s search data may be pulled from the Yandex search tool, AppMetrica, and the company’s taxi app to predict whether they have children in their household. Some of the code categorizes whether children may be over or under 13.

One element within the Crypta code indicates just how all of this data can be pulled together.

Government Influence

Yandex is going through a breakup. In November 2022, the company’s Netherlands-based parent organization, Yandex NV, announced it would separate itself from the Russian business, following Russia’s invasion of Ukraine. Internationally, the company, which will change its name, is planning to develop self-driving technologies and cloud computing, while divesting itself of search, advertising, and other services in Russia. Various Russian businessmen have been linked to the potential sale.

While the uncoupling is being worked out, Russia has been trying to consolidate its control of the internet and has been increasing censorship.

These nationalization efforts, coupled with the planned ownership change at Yandex, are creating concerns that the Kremlin may soon be able to use data gathered by the company. Stanislav Shakirov, the CTO of Russian digital rights group Roskomsvoboda and founder of tech development organization Privacy Accelerator, says that historically Yandex has tried to resist government demands for data and has proved better at doing so than other firms. However, Shakirov says he thinks things are changing.

Bakunov, the former Yandex engineer, who reviewed some of McCrea’s findings at WIRED’s request, says he is scared by the potential for the misuse of data going forward. He says it looks like Russia is a “new generation” of a “failed state,” highlighting how it may use technology. “Yandex here is the big part of these technologies,” he says.

But the leaked code shows, in one small instance, that Yandex may already share limited information with one Russian government-linked company.

Overall, McCrea says that whatever happens with the company, there are lessons about collecting too much data and what can happen to it over time when circumstances change. “Nothing stays harmless forever,” she says.