Commentary on Political Economy

Tuesday 22 September 2020


TikTok’s Algorithm Can’t Be Trusted

If it operates like other recommendation engines, it can be used for good or for evil.

Prepare to be influenced. Photographer: Brent Lewin/Bloomberg

Everybody’s worried about the algorithm behind TikTok, the wildly popular Chinese-run short-video app. President Donald Trump thinks it poses a threat to national security (or maybe to his campaign rallies), and so is demanding that its U.S. operations be sold to an American company. China wants to keep it out of U.S. hands. TikTok insists it’s friendly and open for all to see.

So what is it? As best I can tell, it’s like any other algorithm, which means it can be used for evil or for good.

Like other social-media apps, TikTok makes money by keeping people watching, so it can serve them more ads. When users sign up, the app gathers “metadata” such as birthdate, interests and location, then tracks behavior to figure out what kinds of videos get their attention. When videos get uploaded, TikTok categorizes them using information such as captions, hashtags and certain elements of sound. It then tracks engagement data such as views and shares to understand what kinds of people the videos tend to attract.

I’m not sure exactly how TikTok’s algorithm matches people and videos: The company’s “Transparency Center” hasn’t responded to my request to examine the source code. That said, I can offer an educated guess.

Such recommendation engines tend to store information about users and content as short arrays of numbers, typically around 20 entries long, each of which I’ll loosely call a “digit.” So a person would have an array describing her “tastes,” and a video would have one describing its “qualities.”

Each digit refers to a specific and statistically independent characteristic, and placement matters: The most powerful predictor of engagement comes first. If, for instance, the most important trait were gender, the first number would be a 1 for people who always like things that women like, and range all the way down to -1 for people who lean completely male. The corresponding digit for a video might range from 1 for a video that women love and men don’t, down to -1 for a video that men love and women don’t.

The second digit might be violence -- how much a person is attracted to violence (assuming this is independent of gender), and how much a video appeals to people who are attracted to violence -- and so on in descending importance. In most cases, 20 digits is enough to describe a person or a thing. (To be clear, nobody would be a perfect 1 or -1, and the categories wouldn’t be as clear-cut as gender or violence. This was just for illustration.)

To decide which videos to recommend, the algorithm takes a user’s array and the array of each video and performs a sort of multiplication known as a dot product: It multiplies the two arrays position by position and adds up the results. That produces higher scores for videos whose numbers agree in sign with the user’s (positive with positive, negative with negative), especially in the more important positions. There’s probably also some editing to ensure people don’t keep seeing the same (or too similar) videos. As people watch more and different things, their arrays and the arrays of the videos they watch are updated.
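To make that guess concrete, here is a minimal Python sketch of dot-product scoring. It is an illustration under stated assumptions, not TikTok's actual code: the function names, the clip names and every number are invented, and the arrays have three entries instead of 20 to keep it readable.

```python
# Illustrative sketch only: all names and numbers here are hypothetical.

def dot(user, video):
    """Multiply two arrays position by position and add up the results."""
    return sum(u * v for u, v in zip(user, video))


def rank_videos(user, videos):
    """Return video names ordered from highest to lowest dot-product score."""
    scores = {name: dot(user, array) for name, array in videos.items()}
    return sorted(scores, key=scores.get, reverse=True)


# A made-up user and three made-up videos, each described by three numbers
# between -1 and 1 (a real system would likely use about 20).
user = [0.8, -0.5, 0.1]
videos = {
    "clip_a": [0.9, -0.4, 0.0],   # agrees with the user in the first two positions
    "clip_b": [-0.7, 0.6, 0.2],   # disagrees with the user in the first two positions
    "clip_c": [0.1, 0.1, 0.9],    # mostly unrelated to the user's tastes
}

ranking = rank_videos(user, videos)
print(ranking)
```

With these invented numbers, clip_a scores about 0.92, clip_c about 0.12 and clip_b about -0.84, so the user would be shown clip_a first and clip_b last.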

So far so good. But if I’m right about how this works, the algorithm also has the power to favor videos with certain types of content. Anti-vaxxing clips will have a characteristic 20-digit array, as will QAnon clips and clips aimed at undermining voting in the 2020 presidential election. Whoever manages the algorithm will be able to squelch or magnify the impact of those videos by suppressing or boosting the relevant qualities – akin to adding an ad hoc coding tweak that multiplies all the gender scores by zero or 1000.
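The kind of tweak described above can be sketched in a few lines of Python. Again, this is a hypothetical illustration rather than anything known about TikTok's code: the `weights` array, the function name and all the numbers are assumptions made up for the example.

```python
# Illustrative sketch only: all names and numbers here are hypothetical.

def weighted_score(user, video, weights):
    """Dot product in which each position is first multiplied by a weight."""
    return sum(w * u * v for w, u, v in zip(weights, user, video))


user = [0.8, -0.5, 0.1]
video = [0.9, -0.4, 0.0]

neutral = [1.0, 1.0, 1.0]          # no tweak: an ordinary dot product
suppress_first = [0.0, 1.0, 1.0]   # multiply the first quality by zero
boost_first = [1000.0, 1.0, 1.0]   # multiply the first quality by 1000

print(weighted_score(user, video, neutral))
print(weighted_score(user, video, suppress_first))
print(weighted_score(user, video, boost_first))
```

In this made-up case the untweaked score is about 0.92. Zeroing the first quality drops it to about 0.2, and multiplying that quality by 1000 pushes it to roughly 720, enough to swamp everything else the user cares about.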

This isn’t science fiction. It happens a lot, sometimes for perfectly good reasons. For example, Meetup tweaked its recommendation engine to be less sexist. But tweaks can also promote the most dangerous and divisive kinds of content, for profit or political ends.

In short, even if TikTok is transparent about how its algorithm works, and even if it has been acting benignly so far, I wouldn’t assume that it can be trusted not to engage in damaging manipulation. Why else would China want to keep control over the code?
