Reflecting on Spotify’s Recommender System

by Jada E. Watson

For the last few months, I have been thinking and reading about artificial intelligence, and in particular about the algorithms underlying the recommender systems of music streaming platforms. Because of the type of research that I do (working with radio and chart data), I often get asked about streaming. This has not been my focus, and I point everyone to Liz Pelly’s 2018 study of Spotify’s prime brand playlists. Through a partnership with Smirnoff, Spotify developed “The Smirnoff Equalizer” and sought to “analyze users’ listening habits and ‘equalize’ the gender ratio of their listening experience.” But Pelly’s results show anything but an equalizing experience. In fact, her assessment of Spotify’s prime brand playlists revealed that the popular streaming service maintains the gender imbalance perpetuated in country radio, with “Hot Country” offering the “lowest percentage of women overall.” While Pelly’s work set a lot of my reading and thinking in motion, it was Martina McBride’s recent experience that got me in front of my computer to look at what was going on for myself. This blog post reflects on the recommender system and shares some results of an experiment that we did last week. It’s not a study of the algorithm, but a study of what the algorithm produced.

But first, some context.


On September 9, 2019, Martina McBride published an Instagram Story about her experience trying to build a playlist using the Spotify recommender system. She titled her playlist “Country Music” and was startled to see that the service was recommending only songs by male artists. McBride refreshed her recommendations 13 times before a song by a woman appeared in her list of options: a song by Carrie Underwood, after 135 songs by male artists. McBride’s Instagram Story went viral, and she had an opportunity to speak directly with a representative of Spotify, who was unaware of the lack of female artists in the service’s recommender system (see Reuter 2019).

NewsChannel5, “Martina McBride calls out Spotify over lack of female representation” (September 9, 2019)

With an interest in seeing what happens for users beyond 13 clicks (and whether there had been any changes since McBride’s experience), SongData decided to make its own “country music” playlist. On Tuesday, September 24, 2019, we followed McBride’s experiment: we created a playlist called “country music” and refreshed the recommendations until we came to the first song by a female artist, and then continued until the recommender reset. The results were similar to McBride’s. After 12 refreshes, the first song by a female artist was generated: Miranda Lambert’s “Mama’s Broken Heart” at #122. Within the first 200 songs (19 refreshes), only 6 songs (3%) were by women and 5 (3%) by male-female ensembles, all of them appearing after 121 songs by male artists.
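The counting step in this experiment can be sketched in a few lines of Python. This is only an illustration, not SongData’s actual workflow (the data was logged by hand in Excel). The function names and the toy list are hypothetical, and the codes M, F and M-F stand for male artists, female artists and male-female ensembles.

```python
from typing import Optional, Sequence


def first_position(codes: Sequence[str], target: str) -> Optional[int]:
    """Return the 1-based playlist position of the first song coded `target`."""
    for position, code in enumerate(codes, start=1):
        if code == target:
            return position
    return None  # no song with that code was recommended


def share_within(codes: Sequence[str], target: str, first_n: int) -> float:
    """Return the percentage of the first `first_n` recommendations coded `target`."""
    window = codes[:first_n]
    if not window:
        return 0.0
    return 100 * sum(code == target for code in window) / len(window)


# Toy input (hypothetical): 121 songs by men followed by one song by a woman,
# mirroring only the opening of the run described above.
toy_codes = ["M"] * 121 + ["F"]

print(first_position(toy_codes, "F"))  # 122
```

Applied to a full log of recommendations, `share_within(codes, "F", 200)` would give the kind of percentage reported above for the first 200 songs.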

The resulting recommendations for the first 19 refreshes (200 songs) in the recommender system (September 24, 2019).

We continued to refresh the recommendations 23 more times (to a total of 42 refreshes, or 430 songs) until the recommender reset and Luke Bryan’s “Play It Again” returned to the generator window. The data were captured in an Excel sheet, cleaned, and coded by ensemble type and by the gender of the lead artist or ensemble, using the same coding system as previous SongData studies: M for songs with a lead male artist or all-male ensemble, F for songs with a lead female artist or all-female ensemble, and M-F for male-female ensembles (who are often coded as “females” in programming).
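For readers who prefer to see the coding scheme written out, here is a minimal Python sketch. It is an assumption about how such a log could be structured, not a description of the Excel sheet itself: the `Recommendation` class, its `lead` field and the `tally` helper are all hypothetical. The two example rows use only songs named in this post.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Recommendation:
    title: str
    artist: str
    lead: str  # hypothetical field: "male", "female" or "mixed"


# Maps the hypothetical `lead` field onto the M / F / M-F codes.
CODES = {"male": "M", "female": "F", "mixed": "M-F"}


def code_for(rec: Recommendation) -> str:
    """Return the code for one recommended song."""
    return CODES[rec.lead]


def tally(recs: Iterable[Recommendation]) -> Counter:
    """Count recommended songs per code."""
    return Counter(code_for(rec) for rec in recs)


sample = [
    Recommendation("Play It Again", "Luke Bryan", "male"),
    Recommendation("Mama's Broken Heart", "Miranda Lambert", "female"),
]

print(tally(sample))
```

Keeping the code as a derived value, rather than typing M, F or M-F by hand on every row, makes it harder for the coding to drift over several hundred songs.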

The results presented here offer a snapshot of this experiment, showing the average experience when using the genre label “country music” to curate a playlist. The graphics use the same colour-coding system as previous SongData studies: grey for male artists, plum for female artists, and yellow for male-female ensembles.

These results are not surprising, to be sure. But they are still startling. Not only are there few songs by women, but there are also very few women performing the recommended songs. Just 40 songs by 20 female artists are recommended by the algorithm, against 372 songs by 148 individual male artists. Male-female ensembles are hardest hit, with 18 songs by just 8 ensembles. These numbers are hard to stomach in 2019.
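The proportions behind these totals are simple arithmetic, sketched below in Python. The numbers are the ones reported above, while the `REPORTED` table and the `summarize` function are hypothetical names introduced for illustration.

```python
from typing import Dict, Tuple

# Totals as reported in the text: code -> (songs, distinct artists or ensembles).
REPORTED: Dict[str, Tuple[int, int]] = {
    "M": (372, 148),
    "F": (40, 20),
    "M-F": (18, 8),
}


def summarize(totals: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[float, float]]:
    """Return code -> (percentage share of all songs, songs per artist)."""
    total_songs = sum(songs for songs, _ in totals.values())
    return {
        code: (100 * songs / total_songs, songs / artists)
        for code, (songs, artists) in totals.items()
    }


for code, (share, per_artist) in summarize(REPORTED).items():
    print(f"{code}: {share:.1f}% of songs, {per_artist:.2f} songs per artist")
```

On these totals, songs by male artists make up roughly 86.5% of the 430 recommendations, songs by women about 9.3%, and songs by male-female ensembles about 4.2%.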

If all genders were treated equally by the recommender algorithm, we would not see peaks in representation. Men, women, and male-female ensembles would be represented by a solid (or stable) line across the playlist as graphed in the 7th slide. But this slide shows us that only male artists hold a stable or constant place within the Spotify recommender system, while female artists are pushed to the back end and male-female ensembles peak toward the middle section. Even after women and male-female ensembles are introduced by the recommender system, male artists continue to maintain a stable place throughout the back end of the playlist. Women and male-female ensembles are pushed to the margins of the genre’s streaming ecosystem, and the implications — for the artists, for the industry, for the fans — are profound.
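One way to look for the “stable line” described above is to count each code per batch of recommendations. The Python sketch below assumes ten songs per refresh, which is an inference from the totals reported in this post (430 songs over 42 refreshes plus the initial screen). The function name and the toy data are hypothetical, and this is not how the slides themselves were produced.

```python
from collections import Counter
from typing import List, Sequence


def counts_per_refresh(codes: Sequence[str], songs_per_refresh: int = 10) -> List[Counter]:
    """Split the coded playlist into consecutive batches and count codes in each."""
    return [
        Counter(codes[start:start + songs_per_refresh])
        for start in range(0, len(codes), songs_per_refresh)
    ]


# Toy input (hypothetical): two all-male batches, then one evenly mixed batch.
toy_codes = ["M"] * 20 + ["F", "M"] * 5

for batch_number, counts in enumerate(counts_per_refresh(toy_codes), start=1):
    print(batch_number, counts["M"], counts["F"], counts["M-F"])
```

If every group were treated alike, the per-batch counts for each code would stay roughly level from the first batch to the last, rather than sitting at zero for long stretches and then peaking.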


The writing of Cathy O’Neil (Weapons of Math Destruction) and Safiya Umoja Noble (Algorithms of Oppression) has reshaped how I think about algorithms and their impact on cultural spaces. Noble speaks of technological redlining, and the ways in which algorithms directly or indirectly use criteria like gender, race, ethnicity (and more) to make assessments (or recommendations). While her work centres on Google’s search engine, much of what she writes about algorithmic or data discrimination resonates in cultural spheres.

Streaming, like Google, is often viewed as the “great equalizer”: a format that offers a level playing field for musical ideas, forms and identities. But algorithmic discrimination is a very real socio-cultural problem. Noble argues that algorithms are anything but benign, neutral or objective. The mathematical formulations that drive such automated decision-making are made by humans, and their work is embedded within a much larger cultural ecosystem, one riddled with inequalities at all artistic and administrative/industry levels. In the same way that Noble is concerned about the gender and racial biases of artificial intelligence that result from the combination of private interests and the monopoly status of a small number of Internet search engines, we should be concerned about the ways in which private interests and programming perpetuate gender imbalances in genre cultures.

While an adjustment to the algorithm is certainly in order, it is not a simple task or one that can be done in haste. “What is needed,” as Stuart Dredge recently reflected, “is a clear understanding of why this case happened: what the factors were that drove those 14 refreshes of man-filled country tracks for McBride, and what levers might be pulled to deliver a better experience.” The algorithm seems predisposed to male artists, but why and how? If, as McBride reported, Spotify seemed unaware that there was so significant a lack of female representation in its recommender system, what is going on within the algorithm for these results to occur and recur?

We rely on data as a marker of historical value, as an archive of a genre’s evolution, changing dynamics, and (eventual) canon of (presumably) influential songs and artists. And indeed, we are inundated by various forms of data: weekly popularity charts, sales statistics and streaming recommendations. In a world in which this constantly evolving and expanding data influences decision-making within the industry and impacts how labels sign, produce and promote artists, discrimination embedded within algorithms plays a vital role in the broader cultural space of the genre. The resulting data reinforce pre-existing inequalities and discriminatory practices within the industries that rely on them (O’Neil 2017). The results presented here (and in McBride’s experiment) show that artificial intelligence privileges male artists and disadvantages everyone else.

Noble believes that AI will be a major human rights issue of this century. “We are only beginning to understand the long-term consequences of these decision-making tools in both masking and deepening social inequality,” she states in the introduction of her book. As streaming becomes a preferred point of access to music, we need to start considering more critically the broader cultural ecosystem that data and algorithms are curating. What is lost? Who is harmed? How are data impacting the vitality and diversity of the genre?

The results are also available as a PDF document here.


References

Dredge, Stuart. 2019. “Martina McBride criticises Spotify again over playlist algorithm (and Spotify agrees).” Musically, 17 September.

NewsChannel5. 2019. “Martina McBride Calls Out Spotify Over Lack of Female Representation.” YouTube video, 2:15. Posted by NewsChannel5.

Noble, Safiya Umoja. 2018. Algorithms of Oppression: How Search Engines Reinforce Racism. New York: New York University Press. 

O’Neil, Cathy. 2017. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York: Broadway Books.

Pelly, Liz. 2018. “Discover Weakly: Sexism on Spotify.” The Baffler, 4 June.

Reuter, Annie. 2019. “Martina McBride ‘Felt Like We’d Been Erased’ When Spotify Didn’t Recommend a Single Female Artist.” Billboard.com, 16 September.