DTL Award Grants’17 announced!

Proud to announce the DTL Award Grant winners for 2017. Our latest batch of funded projects covers new and upcoming areas in data transparency such as: Detection of Algorithmic Bias, Location Privacy, Privacy in Home IoT Devices, Online-Offline Data Fusion, and others. Full list here.

Congratulations to the winners and a big thanks to everyone who participated in the program.

10 thoughts that stuck with me after attending a data protection event almost every month for the last two years

1. Privacy is not hype.

The uncontrolled erosion of privacy is not a “victimless crime”. The cost is just shifted to the future. It could be paid tomorrow, in the form of an offending ad, or in a few years, when a record of your assumed health status leaks to an insurance company.

2. People don’t currently care much about it but this can change fast.

Indeed, people don’t seem to care that much right now, certainly not enough to give up any of the conveniences of the web. But none of this is written in stone. Some other things that people didn’t use to care about: smoking, car safety, airport security, dangerous toys, racial or sexual discrimination. Societies evolve … privacy discussions and debates have started reaching the wider public.

3. Privacy problems will only get worse.

Privacy vs. web business models is a textbook example of a Tragedy of the Commons. The financial temptation is just too great to be ignored, especially by companies that have nothing to risk or lose. Just find a niche for data that somebody would be willing to pay good money for and go for it. Even if all the big companies play totally by the book, there’s still a long tail of thousands of medium to small trackers/data aggregators that can destroy consumer and regulator trust.

4. The web can actually break due to (lack of) privacy.

The web, as big and successful as it is, is not indestructible. It too can fall from grace. Other media that once were king are no longer. Newspapers and TV are nowhere near their prior glory. Loss of trust is the Achilles’ heel of the web.

5. Privacy extremism or wishful thinking are not doing anybody any good.

Extremists at both ends of the spectrum are not doing anybody any good. Stopping all leakage is both impossible and unnecessary. Similarly, believing that the market will magically find its way without any special effort or care is wishful thinking. There are complex tradeoffs in the area to be confronted. That’s fine and nothing new really. Our societies have dealt with similar situations again and again in the past. From financial systems, to transportation, and medicine, there are always practical solutions for maximizing the societal benefits while minimising the risks for individuals. They just take time and effort to reach, with lots of trial and error along the way.

6. Complex technology can only be tamed by other, equally advanced, technology.

Regulation and self-regulation have a critical role in the area but are effectively helpless without specialised technology for auditing and testing for compliance, whether pro-actively or reactively. Have you taken your car in for service lately? What did you see? A mechanic nowadays merely connects one computer to another and checks the car by running a barrage of tests. Then he analyses and interprets the results. A doctor does a similar thing but for humans. If the modern mechanic and doctor depend on technology for their daily job, why should a lawyer or a judge be left alone to make sense of privacy and data protection on the internet with only paper and a briefcase at hand?

7. Transparency software is the catalyst for trusting the web again.

Transparency software is the catalyst that can empower regulators and DPAs while creating the right incentives and market pressures to expedite the market convergence to a win-win state for all. But hold on a second … What is this “Transparency software”? Well, it’s just what its name suggests: simple-to-use software for checking (aha “transparency”) for information uses that users or regulators don’t like. You know, things like targeting minors online, targeting ads to patients, or making arbitrary assumptions about one’s political or religious beliefs, or sexual preference.

A simple but fundamental idea here is that since it is virtually impossible to stop all information leakage (this would break the web faster than privacy), we can try to reduce it and then keep an eye open for controversial practices. A second important idea is to resist the temptation of finding holistic solutions and instead start working on specific data protection problems in given contexts. Context can improve many of our discussions and lead to tangible results faster and more easily. If such tangible results don’t start showing up in the foreseeable future, it’s only natural to expect that everyone will eventually be exhausted and give up on the whole privacy and data protection matter altogether. So why don’t we start interleaving some more grounded discussions with our abstract ones? Pick one application/service at a time, see what (if anything) is annoying people about it, and fix it. Solving specific issues in specific contexts is not as glamorous as magic general solutions, but guess what: we can solve PII leakage issues in a specific website in a matter of hours and we can come up with tools to detect PII leakages in six months to a year, whereas coming up with a general purpose solution for all matters of privacy may take too long.
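To give a feel for how small a context-specific check can be, here is a minimal Python sketch of a PII leakage test for a single website. It is an illustration only, not the code of any DTL tool: the function names (`pii_encodings`, `find_pii_leaks`), the set of encodings searched for, and the crude rule for deciding what counts as a third party are all assumptions made for this example. The input is assumed to be the list of request URLs a browser issued while a logged-in user visited the site.

```python
import base64
import hashlib
from urllib.parse import quote, urlparse


def pii_encodings(value):
    """Return a few common encodings under which a PII string may appear in a URL."""
    raw = value.encode("utf-8")
    return {
        "plain": value,
        "urlencoded": quote(value, safe=""),
        "base64": base64.b64encode(raw).decode("ascii"),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha1": hashlib.sha1(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_pii_leaks(first_party, request_urls, pii):
    """List (third-party host, encoding) pairs where the PII shows up in a request URL."""
    needles = pii_encodings(pii)
    leaks = []
    for url in request_urls:
        host = urlparse(url).hostname or ""
        # Assumption: a host is first-party if it equals the visited domain
        # or is one of its subdomains. Real tools need a better rule.
        if host == first_party or host.endswith("." + first_party):
            continue
        for name, needle in needles.items():
            if needle in url:
                leaks.append((host, name))
    return leaks
```

Calling `find_pii_leaks("shop.example", urls, "alice@example.com")` returns the third-party hosts that received the address, together with the encoding in which it was found. A real tool would also have to inspect request bodies, cookies, and headers, and to handle nested or case-varying encodings, which this sketch ignores.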

8. Transparency works. Ask the telcos about Network Neutrality.

Transparency has in the past proved to be quite effective. Indeed, almost a decade ago the Network Neutrality debate was ignited by reports that some Telcos were using Deep Packet Inspection (DPI) equipment to delay or block certain types of traffic, such as peer-to-peer (P2P) traffic from BitTorrent and other protocols. Unnoticed among scores of public statements and discussions, groups of computer scientists started building simple to use tools to check whether a broadband connection was being subjected to P2P blocking. Similarly, tools were built to test whether a broadband connection matched the advertised speed. All a user had to do to check whether his ISP was blocking BitTorrent was to visit a site and click on a button that launches a series of tests and … voila. Verifying actual broadband speeds was made equally simple. The existence of such easy to use tools seems to have created the right incentives for Telcos to avoid blocking while making sure they deliver on speed promises.

9. Market, self-regulation, and regulation, in that order.

Most of the work for fixing data protection problems should be undertaken by the market. Regulators should always be there to raise the bottom line and scare the bad guys. Independent audit makes sure self-regulation is effective. It gives it more credibility, since independent parties can check that it delivers on its promises.

10. The tools are not going to build themselves. Get busy!

Building the tools is not easy. Are we prepared? Do we have enough people with the necessary skills to build such tools? Questionable. Our $heriff tool for detecting online price discrimination took more than 2 years and very hard work from some very talented and committed PhD students and researchers. The same goes for our new eyeWnder tool for detecting behavioural targeting. Luckily the Data Transparency Lab community of builders is growing fast. Keep an eye out for our forthcoming call for proposals and submit your ideas.

Thoughts on reviewing and selection committees


At last, after a very intense month of running DTL Award Grants’16 I can sit back on the balcony, have a coffee, and relax without getting Comment Notifications from HotCRP, or urgent emails every 2 minutes from my super-human co-chair Dr Balachander Krishnamurthy (aka bala ==> Spanish translation ==> bullet … not a coincidence if you ask me or if you’ve been in my shoes).

We’ve selected 6 wonderful groups to fund for this year, the work is done, the notifications have been sent, and the list of winners is already online.

So what am I doing writing about DTL Grants on my first calm Saturday morning after a month? Well, I guess I cannot disconnect yet. But there’s more to it. Having done this for a second year in a row I am starting to see things that were not apparent before, despite the numerous committees on which I have served in the past. So I guess this is a log of things that I have learned in these two years. In no particular order:

Selection CAN be lean and fast

We asked applicants to describe their idea for transparency software in 3 pages. We asked each TPC member to review 9 submissions. Given the length of the proposals we estimated this to be a day’s worth of work. Additional time was set aside for online discussions but this again lasted only 4 days, while the final TPC phone meeting (no need to fly to the other side of the world) lasted ** exactly ** 1 hour. Last but not least, we announced the winners 5 weeks after receiving the submissions.

Why and What vs. How

Anyone with some experience of either paper selection (SIGCOMM etc.) or grant selection (H2020, FP7, etc.) committee work surely understands that the above numbers are far from usual. The load we put on both the applicants and the committee was modest and we returned the results in record time. Assuming that the produced result is of high quality (it is, but I’ll come to this next) the natural question is “what’s the trick?”.

Well, if you haven’t already noticed, the answer is in the heading above and in bold. We focused on the “Why” and the “What” instead of the “How”. Technical people love the How. It’s what got them their PhD and professional recognition, and it’s what they actually love doing and are great at: finding smart solutions to difficult problems. But the mission of the committee was not to have in-depth How-discussions. We do that all the time and in various contexts. What we wanted was to hear from the community about important problems at the intersection of transparency/privacy/discrimination (the Why) and ideas about new software to address those problems (the What). The How would be nice to know in detail but this would come at a high cost in terms of workload for both the applicants and the committee. And remember we wanted to run a lean and fast process. So we set the bar low on How. You just needed to show that it is doable and that you have the skills to do it and we would trust that you know How.

Good enough vs. perfect rank

OK, let’s be clear about this: our ranking is NOT perfect. There are several reasons for this, including: we cannot define perfect; even if we could, we would probably be fooling ourselves with the definition; we cannot understand perfect; we don’t have enough time or energy for perfect; we are not perfect ourselves; and last but not least … we don’t need perfect. Yes, we don’t need perfect because our job is not to bring perfection and justice to an otherwise imperfect and unjust world. This is not what we set out to do.

What we set out to do is to find proposals addressing interesting transparency problems and maximise the probability that they may actually deliver some useful software in the end. This means that our realistic objective was to select ** good enough proposals **. Thus our biggest priority was to quickly identify proposals that did not target an interesting enough problem or that failed to convince us that they would deliver useful software. Notice here that most proposals that failed on the latter did so because they didn’t even bother to explain the “What?”. I don’t remember a single rejection due to the “How?”, but there were several because it was not clear what would be delivered in the end.

Having filtered those, we were left with several good enough proposals. From there to final acceptance little things could make a big difference in terms of outcome. Things like “has this already been funded?”, “would I install/use this software?”, “is this better for a scientific paper instead of a Grant?”, “is this something that would appeal to non-researchers?”. We did our best to make the right calls but there should be no doubt that the process is imperfect. Given our objectives, however, 1) good enough vs. perfect, and 2) lean vs. heavy, I believe we have remained true to our principles.

I’ll stop here by thanking once more all the applicants, the committee, Bala, and the board for all their hard work.

This has been fun and I am very optimistic that it will also be useful.

See you next year.


DTL Award Grants’16 notifications sent

We are done!

6 great new proposals selected to receive funding this year. List and details coming online at the Data Transparency Lab web site in the coming days.

Congratulations to the winning proposals and a big thanks to all applicants, the selection committee, my co-chair Dr Balachander Krishnamurthy and the members of DTL’s board.

New transparency software on its way!


54 proposals in DTL Award Grants’16

This past weekend marked the deadline for submitting proposals for the second DTL Award Grants. We received 54 proposals in total: US (19), EU (23), Asia/Oceania (3), and joint teams (9). Time for the committee to get busy selecting the best ones. More great transparency software on its way to complement our first batch from last year.

Cows, Privacy, and Tragedy of the Commons on the Web

As part of my keynote during the inaugural workshop of the Data Transparency Lab (Nov 20, 2014, Barcelona) I hinted that a Tragedy of the Commons around privacy might be the greatest challenge and danger for the future sustainability of the web and the business models that keep it going. With this note I would like to elaborate on this statement and maybe explain why my slides were full of happy, innocent looking cows.

What is the Tragedy of the Commons?

According to Wikipedia:

The tragedy of the commons is an economic theory by Garrett Hardin, which states that individuals acting independently and rationally according to each’s self-interest behave contrary to the best interests of the whole group by depleting some common resource. The term is taken from the title of an article written by Hardin in 1968, which is in turn based upon an essay by a Victorian economist on the effects of unregulated grazing on common land.

In the classical Tragedy of the Commons, individual cattle farmers acting selfishly keep releasing more cows onto a common parcel of land despite knowing that a disproportionate number of animals will eventually deplete the land of all grass and inevitably drive everyone out of business. All farmers share this common knowledge but still do nothing to avoid the obvious impending disaster. For an explanation of this “paradox” one has to consider human selfishness and self-illusion.

Selfishness dictates that it is better for a farmer to reap the immediate benefits of having more cows, diverting the damage to others and pushing the consequences to the future. Self-illusion refers to the utopian belief that he can keep accumulating cows without ever facing the tragedy because, miraculously, others will self-restrain and reduce the size of their respective herds, thereby saving the field from depletion. Unfortunately, everyone thinks alike and thus, eventually, the field is overgrazed to destruction.

Are there any cows on the Web?

There are several.

Not only in .jpeg, .gif or .tiff but also in other formats that, unlike the aforementioned graphics standards, can lead to (non-grass related) tragedies. In my talk I hinted at the following direct analogy between the fictitious cow-related metaphor and the very real public concern around the erosion of privacy on the web.

Farmer: A company having a business model around the monetization of personal information of users. This includes online advertisers, recommenders, e-commerce sites, data aggregators, etc.

Cow:  A technology for tracking users online without their explicit consent or knowledge. Tracking cookies in browsers and apps, analytics code, browser and IP fingerprinting, leakage of Personally Identifiable Information (PII), etc.

Grass: The trust that we as individuals have in the web, or more accurately, our hope and expectation that the web and its free services are doing “more good than bad.”

The main point here is that if the aforementioned business models (farmers) and technologies (cows) eat away user trust (grass) faster than its replenishment rate (free services that make us happy), then at some point the trust will be damaged beyond repair and users … will just abandon the web. What’s even worse, such loss of trust can be caused by the actions of a minority of companies (even small ones) that by engaging in questionable and offending data collection practices may harm the entire industry, including a majority of companies that are sensitive to users’ privacy requirements.
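The depletion-versus-replenishment argument can be made concrete with a toy simulation. The Python sketch below is purely illustrative and not a model taken from the talk: the logistic regrowth rule, the function name `simulate_trust`, and every parameter value (`regrowth`, `damage_per_tracker`, `collapse_at`) are assumptions chosen only to show the threshold effect.

```python
def simulate_trust(trackers_per_step, regrowth=0.05, damage_per_tracker=0.002,
                   trust=1.0, collapse_at=0.1):
    """Toy commons model: trust regrows logistically and is eaten away by trackers.

    Returns the trust level after each step. Once trust falls below
    `collapse_at`, users are assumed to leave and trust stays at zero.
    """
    history = []
    for trackers in trackers_per_step:
        if trust < collapse_at:
            trust = 0.0  # damaged beyond repair: no more regrowth
        else:
            growth = regrowth * trust * (1.0 - trust)      # replenishment
            loss = damage_per_tracker * trackers * trust   # "grazing"
            trust = max(0.0, min(1.0, trust + growth - loss))
        history.append(trust)
    return history
```

With these made-up numbers, a steady load of 10 trackers lets trust settle at a lower but stable level, while a load of 30 trackers outpaces the regrowth and trust collapses to zero. The point of the sketch is only the qualitative behaviour: below a certain load the commons survives, above it the commons is lost for everyone, including the well-behaved farmers.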

As extreme as it may sound that users may one day abandon the web for another medium, the reader needs to be reminded that other immensely popular media have been dethroned in the past. Print newspapers are nowhere near as popular as they used to be in, say, the 30’s. Broadcast television is nowhere near its prominence in the 60’s (think the moon-landing, JFK’s assassination, etc.).

The signs of quickly decaying public trust in the web are already here.

– More than 60% of web traffic was recently measured going over encrypted HTTP (HTTPS), and all reports agree that the trend is accelerating.

–  AdBlock Plus is the #1 add-on for both Chrome and Firefox with close to 50 million users and a 41% annual growth in the last year. Other browser or mobile app marketplaces are heavily populated with anti-tracking add-ons and services.

–  Mainstream press is increasingly covering the topic on front pages and prime time, sometimes revealing truly shocking news.

–  Regulators and privacy activists on both sides of the Atlantic are mobilizing to address privacy related challenges.

If ignored, the mounting concern around online privacy and tracking on the web can lead to mass adoption of tracking and advertisement blocking tools. Removing advertising profits from the web directly leads to the demise of free services that we currently take for granted.

This impacts negatively on innovation, investment in services and network infrastructure, tech employment, etc.

Last but not least, let’s not forget that advertisement and recommendation is something desired and appreciated by most people so long as it does not cross any red lines in terms of privacy.

What constitutes a red line may change from person to person but certain categories are obvious candidates (health, sexual preference, political or religious beliefs).

In a recent study we have shown that it is possible to detect Online Behavioral Advertising (OBA) driven by personal data, including very sensitive ones. Our methodology is based on training artificial “personas”, i.e., clean web-browsers on freshly installed operating systems, which we use to imitate human users interested in particular categories and then test whether these categories are targeted on web-sites that the persona visits. Surprisingly (or maybe not) we found strong evidence that even very sensitive categories were indeed targeted (see slide 30 here for a list).
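The persona idea can be sketched in a few lines of Python. This is not the methodology of the study, only an illustration of the comparison it implies: the ads shown to a trained persona are compared with those shown to an untrained control browser, and we ask whether the trained category is over-represented. The function name `targeting_evidence`, the use of a one-sided two-proportion z-test, and the assumption that every ad has already been labelled with a category are all choices made for this example.

```python
import math


def targeting_evidence(persona_ads, control_ads, category):
    """One-sided two-proportion z-test: is `category` over-represented for the persona?

    Each argument is a non-empty list of category labels, one per ad observed.
    Returns (persona_share, control_share, p_value).
    """
    n1, n2 = len(persona_ads), len(control_ads)
    x1 = sum(1 for c in persona_ads if c == category)
    x2 = sum(1 for c in control_ads if c == category)
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if variance == 0:
        return p1, p2, 1.0  # category in none (or all) of the ads: no evidence
    z = (p1 - p2) / math.sqrt(variance)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) under the null
    return p1, p2, p_value
```

A small p-value suggests that the persona sees ads of its trained category more often than chance would explain. In practice one would repeat the experiment across many sites and personas and correct for multiple comparisons, none of which is shown here.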

Is there something we can do to avoid a tragedy of the commons around privacy?

“Sunlight is the best disinfectant”

The famous quote of U.S. Supreme Court Justice Louis Brandeis may have found yet another application in dealing with the privacy challenges of the web.

Despite the buzz around the topic, the average citizen is in the dark when it comes to issues relating to how his personal information is gathered and used online without his explicit authorization.

A few years ago we demonstrated that Price Discrimination seems to have already crept into e-commerce. This means that the price that one sees on his browser for a product or service may be different from the one observed at the same time by a user in a different location.

Even at the same location, the personal traits of a user, such as his browsing history, may impact the price offered.

To permit users to test for themselves whether they are being subjected to price discrimination we developed (the price) $heriff, a simple to use browser add-on that shows, in real time, how the price seen by a user compares with the prices seen by other users or fixed measurement proxies around the world.
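The core comparison behind such a tool can be illustrated with a short Python sketch. It is not the $heriff implementation: the function name `compare_price`, the use of the median as the reference price, and the 5% tolerance are assumptions made for this example, and the prices are assumed to be already converted to a single currency.

```python
from statistics import median


def compare_price(user_price, other_prices, tolerance=0.05):
    """Compare the price one user sees with prices seen from other vantage points.

    `other_prices` maps a vantage-point label to the price observed there,
    already converted to the user's currency. Returns the median reference
    price, the user's relative deviation from it, and a flag.
    """
    reference = median(other_prices.values())
    deviation = (user_price - reference) / reference
    return {
        "reference": reference,
        "deviation": deviation,
        "flagged": abs(deviation) > tolerance,
    }
```

For example, `compare_price(120.0, {"ES": 100.0, "US": 100.0, "BR": 110.0})` reports a deviation of about 20% above the median and flags it. A flag by itself is not proof of discrimination: taxes, shipping costs, and exchange-rate timing can all produce legitimate differences and would need to be accounted for.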

Researchers at Columbia University, Northeastern University, and INRIA have, in a similar spirit, developed tools and methodologies that permit end users to test whether the advertisements or recommendations they received have been specifically targeted at them, or if they are just random or location dependent.

Tools like $heriff and X-ray improve the transparency around online personal data. This has multifold benefits for all involved parties:

– End users can exercise choice and decide for themselves whether they want to use ad blocking software and when.

– Advertising and analytics companies can use the tools to self regulate and prove that they abstain from practices that most users find offensive.

– Regulators and policy makers can use the tools to obtain valuable data that point to the real problems and help in drafting the right type of regulation for a very challenging problem.

Transparency has in the past proved to be quite effective in steering the Internet in the right direction. Indeed, almost a decade ago the Network Neutrality debate was ignited by reports that some Telcos were using Deep Packet Inspection (DPI) equipment to delay or block certain types of traffic, such as peer-to-peer (P2P) traffic from BitTorrent and other protocols. Lost somewhere among scores of public statements and discussions, groups of computer scientists started building simple to use tools to check whether a broadband connection is being subjected to P2P blocking. Similarly, tools were built to test whether a broadband connection matches the advertised speed or not. These tools made it very simple for end users to understand technical details about their broadband connection that would otherwise be far beyond their reach and seem to have created the right incentives for Telcos to avoid such practices.

In a similar way, we believe that the development of transparency tools around privacy and data protection can only help the Internet ecosystem move again in the right direction. For this reason, in November 2014 we founded the Data Transparency Lab with the mission to develop software tools that would shed light on data collection and processing on the Internet, and by doing so make sure that the previously described tragedy of the commons stays a metaphor and does not become reality.