nikbcn – Page 2 – Personal homepage

DTL Award Grants’17 announced!

Proud to announce the DTL Award Grant winners for 2017. Our latest batch of funded projects covers new and upcoming areas in data transparency such as: Detection of Algorithmic Bias, Location Privacy, Privacy in Home IoT Devices, Online-Offline Data Fusion, and others. Full list here.

Congratulations to the winners and a big thanks to everyone who participated in the program.

A brief farewell after 10 years

I read Malcolm Gladwell’s Outliers shortly after I joined Telefonica as a researcher 10 years ago. He said that it takes approximately 10000 hours or 10 years to become good at something. I was on my way back from the US. I was leaving academia after 10+ years and I was heading to my first real job. I had a recent PhD, a good number of publications, and a growing number of scientific collaborations under my belt. My viewpoint on things was more or less as follows:

— A Network was a Graph

— Competition was a Strategic Game

— Investment was a Facility Location problem

— Complexity was Combinatorial

— A good solution had to be non-trivial

Today is my last day with Telefonica.

— A Network is a mindboggling mess of cables, boxes, buildings, antennas, people, and companies that run around it like bees to keep it running. It’s so amazing that it works most of the time. Nobody fully understands why.

— Competition is an even worse monster. Companies collaborate in one place and compete in another. They are friends today and enemies tomorrow. Regulation, public opinion, and random events can change the game from one day to the next. Good luck trying to make sense of it through Game Theory.

— Cost structures, CAPEX, OPEX are so complex that even producing an accurate bill of something like our total electricity consumption is a highly non-trivial task. You can try to optimize one thing here only to find out that you are breaking 10 things there.

— Complexity is still combinatorial, not so much in the number of links as in the number of business units, business models, and assumptions that one makes about the future.

— It’s really great when a solution is trivial

Telefonica has been a great school for me. I saw more cable than I ever believed existed. I switched off rows of modems to save energy and was surprised to see the remaining ones locking at a higher bit-rate due to reduced crosstalk. I participated in building real stuff, from CDNs to WiFi aggregators, and from ride-sharing systems to browser add-ons for privacy. I was given access to tons of numbers about how much things cost and how much traffic goes through them. I got to work with regulators, investment planners, strategy departments, innovation departments, communication and PR people, HR, and almost any other specialty that you can imagine. I was allowed to create an NGO.

Throughout all this I managed to remain a researcher. I don’t know if I managed to fit Gladwell’s predictions, but I am sure I stand far more firmly on my feet today than when I walked in. I am grateful for the great opportunities I was given and for the wonderful people that I got to work with these 10 years.

Telefonica is a fantastic place for any young researcher who wants to take a walk on the real side of things.

Online advertising, data protection, and privacy concerns of users, industry, and regulators (Video)

A video of our panel at CPDP’17

10 thoughts that stuck with me after attending a data protection event almost every month for the last two years

1. Privacy is not hype.

The uncontrolled erosion of privacy is not a “victimless crime”. The cost is just shifted to the future. It could be paid tomorrow (an offending ad) or in a few years (a record of your assumed health status leaking to an insurance company).

2. People don’t currently care much about it but this can change fast.

Indeed, people don’t seem to care that much right now, certainly not enough to give up any of the conveniences of the web. But nothing about it is written in stone. Some other things that people didn’t use to care about: smoking, car safety, airport security, dangerous toys, racial or sexual discrimination. Societies evolve … privacy discussions and debates have started reaching the wider public.

3. Privacy problems will only get worse.

Privacy vs. web business models is a textbook example of a Tragedy of the Commons. The financial temptation is just too great to be ignored, especially by companies that have nothing to risk or lose. Just find a niche for data that somebody would be willing to pay good money for and go for it. Even if all the big companies play totally by the book, there’s still a long tail of thousands of medium to small trackers/data aggregators that can destroy consumer and regulator trust.

4. The web can actually break due to (lack of) privacy.

The web, as big and successful as it is, is not indestructible. It too can fall from grace. Other media that once were king are no longer. Newspapers and TV are nowhere near their prior glory. Loss of trust is the Achilles’ heel of the web.

5. Privacy extremism or wishful thinking are not doing anybody any good.

Extremists at both ends of the spectrum are not doing anybody any good. Stopping all leakage is both impossible and unnecessary. Similarly, believing that the market will magically find its way without any special effort or care is wishful thinking. There are complex tradeoffs in the area to be confronted. That’s fine and nothing new really. Our societies have dealt with similar situations again and again in the past. From financial systems to transportation and medicine, there are always practical solutions for maximizing the societal benefits while minimising the risks for individuals. They just take time and effort before they can be reached, with lots of trial and error along the way.

6. Complex technology can only be tamed by other, equally advanced, technology.

Regulation and self-regulation have a critical role in the area but are effectively helpless without specialised technology for auditing and testing for compliance, whether pro-actively or reactively. Have you taken your car in for service lately? What did you see? A mechanic nowadays is merely connecting a computer to another that checks it by running a barrage of tests. Then he analyses and interprets the results. A doctor is doing a similar thing but for humans. If the modern mechanic and doctor depend on technology for their daily job, why should a lawyer or a judge be left alone to make sense of privacy and data protection on the internet with only paper and a briefcase at hand?

7. Transparency software is the catalyst for trusting the web again.

Transparency software is the catalyst that can empower regulators and DPAs while creating the right incentives and market pressures to expedite the market convergence to a win-win state for all. But hold on a second … What is this “Transparency software”? Well, it’s just what its name suggests: simple-to-use software for checking (hence “transparency”) for information uses that users or regulators don’t like. You know, things like targeting minors online, targeting ads to patients, or making arbitrary assumptions about one’s political or religious beliefs or sexual preference.

A simple but fundamental idea here is that since it is virtually impossible to stop all information leakage (this would break the web faster than privacy), we can try to reduce it and then keep an open eye for controversial practices. A second important idea is to resist the temptation of finding holistic solutions and instead start working on specific data protection problems in given contexts. Context can improve many of our discussions and lead to tangible results faster and easier. If such tangible results don’t start showing up in the foreseeable future, it’s only natural to expect that everyone will eventually be exhausted and give up the whole privacy and data protection matter altogether. So why don’t we start interleaving some more grounded discussions with our abstract ones? Pick one application/service at a time, see what (if anything) is annoying people about it, and fix it. Solving specific issues in specific contexts is not as glamorous as magic general solutions, but guess what: we can solve PII leakage issues in a specific website in a matter of hours and we can come up with tools to detect PII leakages in six months to a year, whereas coming up with a general-purpose solution for all matters of privacy may take too long.

8. Transparency works. Ask the telcos about Network Neutrality.

Transparency has in the past proved to be quite effective. Indeed, almost a decade ago the Network Neutrality debate was ignited by reports that some Telcos were using Deep Packet Inspection (DPI) equipment to delay or block certain types of traffic, such as peer-to-peer (P2P) traffic from BitTorrent and other protocols. Unnoticed among scores of public statements and discussions, groups of computer scientists started building simple-to-use tools to check whether a broadband connection was being subjected to P2P blocking. Similarly, tools were built to test whether a broadband connection matched the advertised speed. All a user had to do to check whether his ISP was blocking BitTorrent was to visit a site and click on a button that launches a series of tests and … voilà. Verifying actual broadband speeds was made equally simple. The existence of such easy-to-use tools seems to have created the right incentives for Telcos to avoid blocking while making sure they deliver on speed promises.

9. Market, self-regulation, and regulation, in that order.

Most of the work for fixing data protection problems should be undertaken by the market. Regulators should always be there to raise the bottom line and scare the bad guys. Independent audit makes sure self-regulation is effective. It gives it more credibility, since independent parties can check that it delivers on its promises.

10. The tools are not going to build themselves. Get busy!

Building the tools is not easy. Are we prepared? Do we have enough people with the necessary skills to build such tools? Questionable. Our $heriff tool for detecting online price discrimination took more than 2 years and very hard work from some very talented and committed PhD students and researchers. The same goes for our new eyeWnder tool for detecting behavioural targeting. Luckily the Data Transparency Lab community of builders is growing fast. Keep an eye out for our forthcoming call for proposals and submit your ideas.

DTL Award Grants’16 summary

A few slides from my talk at Columbia University during DTL Conf’16. Summary of the review process and some advice on what to do and what to avoid in future calls. Slides here.

“Oh … but people don’t care about privacy”

If only I had a penny for every time I’ve heard this aphorism!

True, most typology studies out there as well as our own experiences verify that currently most of us act like the kids that rush to the table and grab the candy in the classic delayed gratification marshmallow experiment: convenience rules over our privacy concerns.

But nothing is written in stone about this. Given enough information and some time to digest it, even greedy kids learn. Just take a look at some other things we didn’t use to care about:

Airport security

Never had the pleasure of walking directly into a plane without a security check, but from what I hear there was a time when this was how it worked. You would show up at the airport with ticket in hand. The check-in assistant would verify that your name is on the list and check your ID. Then you would just walk past the security officer and go directly to the boarding gate. Simple as that.

Then came hijackers and ruined everything. Between 1968 and 1972, hijackers took over a commercial aircraft every other week, on average. So long to speedy boarding, and farewell to smoking on planes 20 years later. If you want to get nostalgic, here you go:

Smoking

Since we are on the topic of smoking, and given that lots of privacy concerns are caused by personal data collection practices in online advertising, I cannot avoid thinking of Betty and Don Draper with cigarettes at hand at work, in the car, or even at home with the kids.

To be honest I don’t have to go as far as the Mad Men heroes to draw examples. I am pretty, pretty, pretty sure I’ve seen some of this in real life.

Dangerous toys

Where do I start here? I could list some of my own but they are nowhere near as fun as some that I discovered with a quick search around the web. Things like:

Glass blowing kit
Lead casting kit
Working electric power tools for kids
The kerosine train
Magic killer guns that impress, burn, or knock down your friends.

Pictures are louder than words. Just take a look at The 8 Most Wildly Irresponsible Vintage Toys. Last in this list is the “Atomic Energy Lab” which brings us to:

Recreational uses of radioactive materials

I love micro-mechanics and there’s nothing more lovable about it than mechanical watches. There is a magic in listening to the ticking sound of a mechanical movement while observing the seconds hand sweep smoothly above the dial. You can even do it in the dark, because modern watches use Super-LumiNova to illuminate watch dial markings and hands.

But it was not always like that. Before Super-LumiNova, watches used Tritium and before that … Radium.

Swiss Military Watch Commander model with tritium-illuminated face

Radium watch hands under ultraviolet light

I am stretching dangerously beyond my field here, but from what I gather, Tritium, a radioactive material, needs to be handled very carefully. Radium is downright dangerous. I mean “you are going to die” dangerous. Just read a bit about what happened to the “Radium Girls” who used to apply radium on watch dials in an assembly line in the ’20s.

But we are not done yet. Remember, the title of the section is about recreational uses of radioactive materials. Watch dials are just the tip of the iceberg. Being able to read the time in the dark is more useful than recreational (with some exceptions). Could society stomach the dangers for workers? Who knows? It doesn’t really matter, because there are these other uses that were truly recreational (in the beginning at least), for which I hope the answer is pretty clear. Here goes the list:

Radium chocolate
Radium water
Radium toothpaste
Radium spa

Details and imagery at 9 Ways People Used Radium Before We Understood the Risks.

Anyhow, I can go on for hours on this, talk about car safety belts, car seat headrests, balconies, furniture design, etc., but I think what I am getting at is clear: Societies evolve.

It takes some time and some pain but they evolve. Especially in our time, with the ease with which information spreads, they evolve fast. Mark my words, it won’t be long before we look back and laugh at the way we approached privacy in the happy days of the web.

Dagstuhl seminar on Online Privacy and Web Transparency

This April at Schloss Dagstuhl. Info here: https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=17162

User profiling in the time of HTTPS

Check out our latest paper in ACM IMC’16 demonstrating that visited pages, and hence user interests, can be profiled despite HTTPS. Transport layer fingerprinting is the magic word(s).
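To give a feel for why encryption alone does not hide what you browse, here is a toy sketch of traffic fingerprinting in general. It is not the method from the paper: the page names, the byte counts, the bucket width, and the helper names (`bucket`, `jaccard`, `guess_page`) are all made up for illustration. The only assumption is that an observer can still see how many bytes each encrypted response carries.

```python
def bucket(sizes, width=500):
    """Coarsen response sizes (in bytes) so small variations still match."""
    return {size // width for size in sizes}


def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def guess_page(observed_sizes, known_pages, width=500):
    """Return the known page whose size fingerprint best matches the trace."""
    observed = bucket(observed_sizes, width)
    return max(
        known_pages,
        key=lambda name: jaccard(observed, bucket(known_pages[name], width)),
    )


# Hypothetical fingerprints collected beforehand by loading each page.
known_pages = {
    "news": [1200, 45300, 8700],
    "health": [900, 15200, 230400, 3100],
}

# Hypothetical encrypted trace: contents are hidden, sizes are not.
observed_trace = [950, 15150, 230100, 3300]
best_guess = guess_page(observed_trace, known_pages)
print(best_guess)
```

Real traffic is far noisier than this, so take it only as an intuition pump: the payload is hidden, but its shape is not.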

Workshop on Data and Algorithmic Transparency (DAT’16)

By popular demand, here comes DAT, a venue dedicated to transparency-related research. Co-located this year with the main DTL Conf and FatML. Nov 19, Columbia University, New York.

The role and importance of Context and Verifiability in Data Protection

Over the last 18 months I’ve been attending a Data Protection/Privacy event almost every month. It has been a pretty rewarding experience; one that is very different to the usual round of CS conferences that I have been following for the better part of my career.

I’ve been listening to policy makers, lawyers, marketeers, journalists, and occasionally engineers, discussing and debating the perils from the “erosion of privacy”, the measures to be taken, and the need to find a balance between innovation and growth on one side, and civil rights, on the other.

In addition to the events that I have attended myself, I have also read several reports on the outcomes of other consultations on the topics (for example the “bridges” and “shifts” reports). With this post I would like to discuss two issues that have been spinning in my head since the earliest days of my involvement with privacy and data protection. I am sure that these are thoughts that must have occurred to others as well, but I haven’t seen them spelled out clearly, hence the post.

Context (or lack of)

I’ve always enjoyed discussing abstract ideas: fairness, transparency, reputation, information, privacy. There’s something inherently tempting in discussing such abstract notions (I’ll try to avoid using the “ph” word). Maybe it is the hope that a breakthrough at this abstract layer will automatically solve innumerable specific and practical problems relating to each one of these abstract ideas. Whoever makes such a contribution certainly has a claim (and a chance) on immortality.

I am tempted to believe that this might be the underlying reason that the huge majority of the discussions that I have attended stay at this very high, very abstract level. “A general solution to the privacy issue”, “the value of private information”, “the danger from privacy leakage”. All these statements provide good and natural starting points for debates in the area. But to make a well-founded argument, and hopefully reach some useful conclusion, one that stands a chance to have an impact on real world technologies and services, you need to have a handle, something concrete enough to build upon. I call this “Context”. My main point here is that most discussions that I have attended stay at a very abstract level and thus lack concrete Context.

Having Context can improve many of our discussions and lead to tangible results faster and easier. If such tangible results don’t start showing up in the foreseeable future, it’s only natural to expect that everyone will eventually get fed up, become bored and exhausted, and forget about the whole privacy and data protection matter altogether. So why don’t we start interleaving some more grounded discussions with our abstract ones? Pick one application/service at a time, see what (if anything) is annoying people about it, and fix it. Solving specific issues in specific contexts is not as glamorous as magic general solutions, but guess what: we can solve PII leakage issues in a specific website in a matter of hours and we can come up with tools to detect PII leakages in six months to a year, whereas coming up with a general-purpose solution for all matters of privacy may take too long.
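To make the “specific context” point concrete, here is a minimal sketch of what checking one website for PII leakage could look like. It assumes you have already recorded the URLs that a page requests while a test user is logged in, and that you know that user’s PII (an email address here). The function names, the hosts, and the choice of encodings to look for are illustrative assumptions, not a description of any existing tool.

```python
import base64
import hashlib
from urllib.parse import unquote, urlsplit


def pii_encodings(value):
    """Return common encodings under which a PII value may show up in a URL."""
    raw = value.strip().lower().encode("utf-8")
    return {
        "plain": raw.decode("utf-8"),
        "base64": base64.b64encode(raw).decode("ascii"),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha1": hashlib.sha1(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_pii_leaks(first_party, request_urls, pii_values):
    """Flag third-party requests whose URL carries a known PII value."""
    leaks = []
    for url in request_urls:
        host = urlsplit(url).hostname or ""
        # Requests to the site itself (or its subdomains) are not leaks.
        if host == first_party or host.endswith("." + first_party):
            continue
        decoded = unquote(url)  # undo %40 and similar URL escapes
        for value in pii_values:
            for encoding, token in pii_encodings(value).items():
                # Base64 is case sensitive; the other forms are lower case.
                haystack = decoded if encoding == "base64" else decoded.lower()
                if token in haystack:
                    leaks.append((url, value, encoding))
    return leaks


# Hypothetical requests observed while a test user browses shop.example.
email_md5 = hashlib.md5(b"alice@example.com").hexdigest()
demo_urls = [
    "https://shop.example/account?email=alice@example.com",
    "https://tracker.example/pixel?uid=" + email_md5,
    "https://ads.example/sync?e=alice%40example.com",
    "https://cdn.example/lib.js",
]
leaks = find_pii_leaks("shop.example", demo_urls, ["Alice@Example.com"])
for url, value, encoding in leaks:
    print(f"{encoding:>6}  {value}  ->  {urlsplit(url).hostname}")
```

A sketch like this only catches PII sent in URLs under a handful of encodings. Request bodies, cookies, and chained or salted hashes would need extra work, which is part of why a real tool takes months rather than hours.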

Making tangible progress, even in specific contexts, is good for morale. It’s also the best chance that we have to eventually develop a general solution (if such a thing is possible anyway).

In a follow-up post I’ll touch upon Verifiability, which is the second idea that I have not seen in most public discussions around data protection.