A large number of data marketplaces (DMs) have appeared in the last few years to help owners monetize their data and to help data buyers optimize their marketing campaigns, train their ML models, and carry out other data-driven decision processes. Even in Europe, which took the lead in the data protection race with GDPR, several new initiatives such as the European Data Spaces, Gaia-X, and the Data Governance Act show that we have entered the post-privacy era, in which data may be sold as a product so long as certain rules are obeyed. While such rules are being discussed, the market is moving fast to come up with new business models, technologies, and actual data products that are offered for purchase as we speak.
In our recent report, we present a first-of-its-kind measurement study of the growing DM ecosystem and shed light on several previously unknown facts about it. For example, we show that the median price of live data products sold under a subscription model is around US $1,400 per month. For one-off purchases of static data, the median price is around US $2,200. At the extreme of the pricing spectrum, we see data products that reach up to US $1 million. We analyze the prices of different categories of data and show that products about telecommunications, manufacturing, automotive, and gaming command the highest prices. We also develop classifiers for comparing prices across different DMs, and use regression analysis to reveal the features that correlate with data product prices.
Read more in: S. Andres Azcoitia, C. Iordanou, N. Laoutaris, “What Is the Price of Data? A Measurement Study of Commercial Data Marketplaces,” [arXiv:2111.04427].
What if, instead of having to implement controversial user tracking techniques, Internet advertising and marketing companies explicitly asked to be granted access to user data by name and category, such as Alice–>Mobility–>05-11-2020? The technology for implementing this already exists: it is none other than the Information Centric Networks (ICN) developed for over a decade in the framework of Next Generation Internet (NGI) initiatives.
Beyond named access to personal data, ICN’s in-network storage capability can be used as a substrate for retrieving aggregated, anonymized data, or even for executing complex analytics within the network, with no personal data leaking outside. In this opinion article, we discuss how ICNs can be combined with trusted execution environments and digital watermarking to build a personal data overlay inter-network in which users will be able to control who gets access to their personal data, know where each copy of that data is, negotiate payments in exchange for data, and even claim ownership and establish accountability for data leakages due to malfunctions or malice.
Of course, coming up with concrete designs for achieving all of the above will require a huge effort from a dedicated community willing to change how personal data are handled on the Internet. Our hope is that this opinion article can plant some initial seeds in this direction. For more details, check out our opinion column in ACM CCR.
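To make the idea of named, per-category access a little more concrete, here is a minimal sketch in Python. It is not an ICN implementation and does not use any real ICN library: the `/` separator, the `GrantTable` class, and the requester name `ad-network.example` are all hypothetical choices made for illustration. It only shows how a user-controlled table of name prefixes could decide whether a request for a named piece of data is allowed.

```python
# Illustrative sketch only: hierarchical data names and prefix-based grants.
# The "/" separator, GrantTable, and all names below are assumptions,
# not part of any ICN standard or library.

def parse_name(text):
    """Split a hierarchical name such as 'Alice/Mobility/05-11-2020' into components."""
    parts = tuple(part.strip() for part in text.split("/"))
    if any(not part for part in parts):
        raise ValueError(f"empty component in name: {text!r}")
    return parts


def covers(prefix, name):
    """Return True if `prefix` is a leading sub-sequence of `name`."""
    return len(prefix) <= len(name) and name[:len(prefix)] == prefix


class GrantTable:
    """Maps each requester to the set of name prefixes the user has granted."""

    def __init__(self):
        self._grants = {}

    def grant(self, requester, prefix):
        self._grants.setdefault(requester, set()).add(parse_name(prefix))

    def revoke(self, requester, prefix):
        self._grants.get(requester, set()).discard(parse_name(prefix))

    def may_access(self, requester, name):
        target = parse_name(name)
        return any(covers(p, target) for p in self._grants.get(requester, ()))


table = GrantTable()
# Alice grants a (hypothetical) advertiser access to her mobility data only.
table.grant("ad-network.example", "Alice/Mobility")

print(table.may_access("ad-network.example", "Alice/Mobility/05-11-2020"))
print(table.may_access("ad-network.example", "Alice/Health/05-11-2020"))
```

Granting by prefix rather than by full name means one grant covers a whole category, while revoking the prefix withdraws access to everything under it in one step.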
UPDATE: An extended version of this post appears as an opinion column in IEEE Internet Computing. Link to it here.
Data, and the economy around it, are said to be driving the fourth industrial revolution. Interestingly, the people — whose data are what moves the new economy — have a rather passive role in this economy, as they are left outside the direct value flow that transforms raw data into huge monetary benefits. This is a consequence of a de facto understanding (or, one may say, misunderstanding) between people and companies that the former receive unpaid access to online services in exchange for unpaid access to their personal data. This situation is increasingly being challenged by various voices calling for the establishment of a new and renegotiated relationship between users and services.
For example, technologist, philosopher, and writer Jaron Lanier argues in his 2013 book “Who Owns the Future?” that online services should return some of their revenue to the people whose data feed their business models and AI algorithms. Lanier’s arguments include the following:
— Sustainability: The current economic model imposes serious privacy risks on individuals and society at large, has led to market failures in the form of large data monopolies and oligopolies, and may, in fact, even be a threat to employment in the future due to job loss from data-driven automation. Paying people for their data could, therefore, be an alternative to labor-based compensation in a future in which most work will be done by machines. Indeed, it was estimated recently that, if fair remuneration algorithms are set in place, a family of four could earn up to $20,000 per year from their data (Posner, 2018). This figure may seem too small to be a full alternative to labor-based compensation, but it can only increase as more and more sectors are transformed by automation.
— Fairness: Paying people for their data, or the related idea of (universal) guaranteed minimum income are potential remedies for modern societal ailments such as increased income disparity, increased unemployment, and other labor-related challenges emerging in the context of machine learning automation, robots, 3D printing, self-driving cars, and other employment-threatening technologies. Critics of guaranteed minimum income are rejecting it as a non-sustainable form of charity. Instead, Lanier has argued that paying people for their data is an altruism-free idea compatible with modern capitalism for achieving the positive objectives of guaranteed minimum income without harming but instead benefiting the market, innovation, and investment in technology. The fundamental argument behind this position is a simple one: business models and machine learning algorithms have zero value without data, and, therefore, paying for those data is not charity but rather neoclassical economics. As Lanier notes “We have invented only half of the data driven industrial revolution — the part that compensates users in kind (i.e., service); we need to invent the other half that will provide explicit (monetary) benefits.”
But the arguments in favor of what I will henceforth refer to as a Human-Centric Data Economy go beyond those mentioned above.
First, paying for data opens up the pathway to getting more data and data of higher quality, thereby increasing the size of the data economy. In other words, the data economy is not a zero-sum game between users and data collectors; paying the former does not have to harm the latter. It is not surprising, therefore, that the idea of paying or being taxed for data has been positively received by many, including industry leaders such as Bill Gates (Delaney, 2017), Elon Musk (Thomas, 2017), and Mark Zuckerberg (Gillespie, 2017).
Second, paying for data creates economic pressure on online services to apply data minimization principles. Currently, collecting and processing data costs close to zero and, therefore, services greedily collect all the data that they can, even when they actually need much less. The resulting privacy-related tragedy of the commons of the web can be avoided if online services a) collect the minimum amount of data that they need, and b) do so only when the benefit that they create for society outweighs, and can therefore compensate for, the risks that they create. In other words, in the same way that factories and private cars pay for the amount of pollution that they impose on the environment, online services should pay for the privacy risks that they impose on people. Currently, data minimization is just a principle quoted by data protection laws. Realizing it in practice will require establishing the currently missing economic signals that would push the market in the right direction. Paying people for data is a direct way of achieving this. Two important related questions are, therefore:
What part of a service’s revenue should be returned to its users?
How should the total returned payoff be split among different users?
To answer the first question, we are modeling the data value chain using tools such as Nash Bargaining, which has been used before to model the interconnection value chain on the Internet. To answer the second, we are using the Shapley Value to compute the relative importance of different users’ data to the decisions taken by machine learning algorithms for tasks such as movie recommendation.
Stay tuned for more thoughts, ideas, and papers around Human-Centric Data Economies, and please do contact me if you are interested in working or doing a PhD in this area.
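For readers unfamiliar with the two tools, here is a small Python sketch of both, under heavily simplified assumptions. It is not the model from our papers: the revenue function `toy_revenue`, the three users, and the choice of splitting the users' share in proportion to their Shapley values are all made up for illustration. `nash_bargaining_split` uses the textbook symmetric solution for a transferable surplus, and `shapley_values` computes exact values by brute force, which is only feasible for a handful of users.

```python
from itertools import permutations
from math import factorial


def nash_bargaining_split(total, d_service=0.0, d_users=0.0):
    """Symmetric Nash bargaining over a transferable surplus.

    Each side keeps its disagreement payoff and receives half of the
    remaining gain, which maximizes (x - d_service) * (total - x - d_users).
    """
    gain = total - d_service - d_users
    if gain < 0:
        raise ValueError("no agreement can make both sides better off")
    return d_service + gain / 2, d_users + gain / 2


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            extended = coalition | {p}
            totals[p] += value(extended) - value(coalition)
            coalition = extended
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


def toy_revenue(coalition):
    """Hypothetical value of a set of users' data (an assumption, not real data).

    Alice and Bob are each worth 10 alone and 30 together;
    Carol's data add nothing in this toy example.
    """
    useful = coalition & {"alice", "bob"}
    bonus = 10 if len(useful) == 2 else 0
    return 10 * len(useful) + bonus


# Question 1: how much of a (hypothetical) revenue of 100 goes back to users?
service_share, users_share = nash_bargaining_split(100.0)

# Question 2: how is the users' share divided among individual users?
phi = shapley_values(["alice", "bob", "carol"], toy_revenue)
phi_sum = sum(phi.values())
payouts = {p: users_share * v / phi_sum for p, v in phi.items()}

print(service_share, users_share)
print(phi)
print(payouts)
```

In this toy setting Alice and Bob each get a Shapley value of 15 and Carol gets 0, so the values sum to the revenue of the full coalition (30). The exact computation enumerates all n! orderings, so real recommender-scale settings need sampling or other approximations.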
References
Delaney, K. J. (2017). The robot that takes your job should pay taxes, says Bill Gates. Retrieved from https://qz.com/911968/bill-gates-the-robot-that-takes-your-job-should-pay-taxes/
Gillespie, P. (2017). Mark Zuckerberg supports universal basic income. What is it? Retrieved from https://money.cnn.com/2017/05/26/news/economy/mark-zuckerberg-universal-basic-income/index.html
Posner, E. A. (2018). Radical Markets: Uprooting Capitalism and Democracy for a Just Society. Princeton University Press.
Thomas, L. (2017). Universal basic income debate sharpens as observers grasp for solutions to inequality. Retrieved from https://www.cnbc.com/2017/03/25/universal-basic-income-debate-sharpens.html
Last week I had the honor and privilege of organizing the 11th Annual Workshop of IMDEA Networks. The event gathered an excellent set of keynote speakers and panelists who, together with IMDEA researchers and other colleagues from Madrid and beyond, sat to review the status of networking research and discuss its future.