Barriers to academic data science research by digital platforms

In a publication in the prestigious journal Nature Machine Intelligence, Prof. David Martens and his co-authors Travis Greene and Galit Shmueli discuss the consequences of these barriers and encourage academics to take on new roles in promoting platform transparency and public debate.

Big data and behavior modification

A growing community of researchers is using behavioral data from platforms for applied and methodological research. Think of all the posts or pages you like on Facebook, the webpages you visit online, or the Wi-Fi networks you log into. This set of digital breadcrumbs you leave behind makes for valuable and predictive behavioral big data (BBD). Large platforms that generate, and hence have access to, such data include Facebook, Instagram, YouTube and TikTok, to name just a few.

These platforms have started to shield this data, even from academic researchers who obtain informed consent from the end user. Worse still, these platforms actively engage in behavior modification (BMOD): showing ads, deciding which stories to show you, and so on. The combination of restricted behavioral data and opaque behavior modification leaves academics more isolated than ever.

As described in the paper, consider Facebook’s News Feed algorithm, which decides what users see in their feeds. Initially, the algorithm might use generic, hard-coded rules for displaying and ranking content in the News Feed, but, after collecting implicit feedback on how users interact with the displayed content (such as which posts were clicked, liked, shared or dwelt on), the algorithm’s parameters are updated to reflect what the algorithm has learned about which content achieves the platform’s goal of, say, engagement. In the next iteration, the algorithm might display a slightly different set of items because they are associated with improved engagement, or even decide to randomly display other sub-optimal items so as to gather more information about users’ underlying preferences.

A feedback loop based on sequential human–machine interaction drives the learning and adaptation process: what the machine shows users influences their behaviour, which in turn impacts the machine’s predictions, which determine its next action, and so on.
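To make the loop more concrete, here is a minimal sketch in Python. It is not Facebook's News Feed algorithm and it is not taken from the paper; it swaps in a simple epsilon-greedy bandit as a stand-in for "show the best-known item most of the time, but occasionally show another one to learn more". The function name `run_feed_loop`, the engagement probabilities and the parameter values are all hypothetical, chosen only for illustration.

```python
import random


def run_feed_loop(true_engagement, rounds=5000, epsilon=0.1, seed=0):
    """Simulate a toy show -> react -> update loop (illustrative only).

    true_engagement: hypothetical probability that a user engages with
    each item. The "platform" never sees these values directly; it only
    observes clicks.
    """
    rng = random.Random(seed)
    n_items = len(true_engagement)
    shows = [0] * n_items        # how often each item was displayed
    estimates = [0.0] * n_items  # learned engagement rate per item

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: show a random, possibly sub-optimal item
            item = rng.randrange(n_items)
        else:
            # Exploit: show the item with the best estimate so far
            item = max(range(n_items), key=lambda i: estimates[i])

        # Implicit feedback: did the simulated user engage?
        engaged = 1 if rng.random() < true_engagement[item] else 0

        # Update the running average for the item that was shown
        shows[item] += 1
        estimates[item] += (engaged - estimates[item]) / shows[item]

    return shows, estimates


if __name__ == "__main__":
    shows, estimates = run_feed_loop([0.02, 0.05, 0.20])
    print("times shown:", shows)
    print("learned engagement rates:", [round(e, 3) for e in estimates])
```

Even in this toy version, what the system learns depends on what it chose to show, and what it shows depends on what it has learned. An outside researcher who only sees the resulting clicks, without the display decisions and parameter updates, cannot fully disentangle user preferences from the algorithm's influence.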

Barriers to data access

This whole interplay between human behavioral data and the machine data used for behavior modification and prediction is obscured and unavailable to academics. In the paper, the authors focus on the mechanics of algorithmic BMOD and on the scientific challenges caused by the widening legal, scientific and methodological barriers between academic data science researchers and commercial platforms.

The authors argue that if we want effective public policy, we need unbiased and reliable scientific knowledge about what individuals see and do on platforms, and how they are influenced by algorithmic BMOD. The importance of bridging this gap, and the ways we might be able to get there, are described in the paper.

Former Facebook data scientist and whistleblower Frances Haugen echoes the importance of transparency and independent researcher access to platforms. In her recent US Senate testimony, she states:

“….No one can understand Facebook’s destructive choices better than Facebook, because only Facebook gets to look under the hood. A critical starting point for effective regulation is transparency: full access to data for research not directed by Facebook….As long as Facebook is operating in the shadows, hiding its research from public scrutiny, it is unaccountable….Left alone Facebook will continue to make choices that go against the common good, our common good.”


As of today, the role of academic data scientists in this new realm is still unclear. New positions and responsibilities for academics are emerging: participating in independent audits and cooperating with regulatory bodies to oversee platform BMOD, developing new methodologies to assess the impact of BMOD, and leading public discussions in both popular media and academic outlets.

Breaking down the barriers may require moving beyond traditional academic data science practices, but the collective scientific and social costs of academic isolation in the era of algorithmic BMOD are too great to ignore.

This post is based on the original paper ‘Barriers to academic data science research in the new realm of algorithmic behaviour modification by digital platforms’ and a summary by Travis Greene.
