The three types of research papers and how I learned to recognise them

After 15+ years of reading, writing, presenting, reviewing, selecting, discussing, stapling, and doodling in the margins of papers, I have concluded that there are three large families of research papers:


About-papers

About-papers are usually written, read, discussed, championed, and sent as attachments by people who care about an area. They love the area so much that everything that has anything to do with it immediately becomes an interesting read. You can write an about-paper about a dataset that you managed to get hold of, about a trial you ran with real users, about the latest research infrastructure that you are developing, or about your favourite new technology. I love reading well-written about-papers. I just prefer reading them in magazines, newspapers, blogs, newsletters, etc. I definitely don’t like waking up in the middle of the night to review them, especially on weekends and during holidays. The hallmark of an about-paper is its general interest in the area and its relative disinterest in specific contributions and questions within that area.


Concept-papers

A concept is a magic lens through which complex things become simple, helping us finally understand them. Think of price of anarchy, differential privacy, betweenness centrality, power-usage efficiency. Great concept-papers can have a profound positive impact on our understanding of the world. They cut across areas and problems and reveal hidden underlying truths and structures. Unfortunately, most concept-papers are not of the great type. It’s really tempting to think that you’ve come across the silver bullet that will pierce through any type of steel and concrete. Bad concept-papers confuse and distract. Instead of being a means, they become an end in themselves. In the process they divert our attention from real problems and waste huge amounts of time. The easiest way to write the wrong concept-paper is to believe too much in genius and divine intervention. Although both are more than welcome, neither is a strict prerequisite for a concept-paper. Experience and domain expertise are often all it takes to come up with a great concept, after having observed a common structure across different fields and problems. A special case of concept-paper craziness is the technology-concept-paper: using BitTorrent to send people to Mars, Bitcoin to cure cancer, and TCP to alleviate traffic jams in Beijing.


Question-papers

  • Is location-based price discrimination happening in e-commerce?
  • Which advertisers place targeted ads driven by sensitive personal data?
  • How much cross-subsidization exists between heavy and light consumers of residential broadband?
  • What percentage of online advertising revenues go to fake clicks?
  • Who starts fake news campaigns in social media?
  • Can we build sub-10 ms delay networks?

Question-papers are all about answering a clear and easy-to-understand question about something that is important and hard to guess without doing some work first. Surely you can find questions in both about- and concept-papers. The difference is that in question-papers the question leads the entire effort, as opposed to taking the back seat as in the other two. A clear and important question is an infallible compass for finding your way among the myriad alternatives arising during any research effort. Putting the question in the driver’s seat makes everything else fall easily into place: the dataset that you need, the expertise required for answering it, the right definition, the right algorithm, the right system, the results to show.

Over the years I have written papers of all three types, but I must admit that lately I only care about question-papers. I would love to write a good concept-paper in the area where I currently work, but I am afraid I still have some questions to ask and answer before being ready to do so.

The role and importance of Context and Verifiability in Data Protection

Over the last 18 months I’ve been attending a Data Protection/Privacy event almost every month. It has been a pretty rewarding experience; one that is very different to the usual round of CS conferences that I have been following for the better part of my career.

I’ve been listening to policy makers, lawyers, marketeers, journalists, and occasionally engineers, discussing and debating the perils from the “erosion of privacy”, the measures to be taken, and the need to find a balance between innovation and growth on one side, and civil rights, on the other.

In addition to the events that I have attended myself, I have also read several reports on the outcomes of other consultations on the topic (for example the “bridges” and “shifts” reports). With this post I would like to discuss two issues that have been spinning in my head since the earliest days of my involvement with privacy and data protection. I am sure these thoughts must have occurred to others as well, but I haven’t seen them spelled out clearly, hence the post.

Context (or lack of)

I’ve always enjoyed discussing abstract ideas — fairness, transparency, reputation, information, privacy. There is something inherently tempting in discussing such abstract notions (I’ll try to avoid using the “ph” word). Maybe it is the hope that a breakthrough at this abstract layer will automatically solve innumerable specific and practical problems relating to each one of these abstract ideas. Whoever makes such a contribution certainly has a claim on (and a chance at) immortality.

I am tempted to believe that this might be the underlying reason why the huge majority of the discussions that I have attended stay at this very high, very abstract level. “A general solution to the privacy issue”, “the value of private information”, “the danger from privacy leakage”. All these statements provide good and natural starting points for debates in the area. But to make a well-founded argument, and hopefully reach some useful conclusion, one that stands a chance of having an impact on real-world technologies and services, you need to have a handle, something concrete enough to build upon. I call this “Context”. My main point here is that most discussions that I have attended stay at a very abstract level and thus lack concrete Context.

Having Context can improve many of our discussions and lead to tangible results faster and more easily. If such tangible results don’t start showing up in the foreseeable future, it’s only natural to expect that everyone will eventually get fed up, become bored and exhausted, and forget about the whole privacy and data protection matter altogether. So why don’t we start interleaving some more grounded discussions with our abstract ones? Pick one application/service at a time, see what (if anything) is annoying people about it, and fix it. Solving specific issues in specific contexts is not as glamorous as magic general solutions, but guess what — we can solve PII leakage issues on a specific website in a matter of hours, and we can come up with tools to detect PII leakages in six months to a year, whereas coming up with a general-purpose solution for all matters of privacy may take too long.

Making tangible progress, even in specific contexts, is good for morale. It’s also the best chance we have of eventually developing a general solution (if such a thing is possible at all).

In a follow-up post I’ll touch upon Verifiability, the second idea that I have not seen in most public discussions around data protection.


Mathematical fixed points, scale invariance, and poetry

I know a tiny bit of mathematics and even less of poetry, but I recognise fixpoints and scale invariance when I read them:

What is Good? What is Evil?:

“A point A point
and on it you find balance and exist
and beyond it turmoil and darkness
and before it the roar of angels

A point A point
and on it you can progress infinitely
otherwise, nothing else exists any more”

“And the Scales which, stretching my arms,
seemed to balance light and instinct, were

this small world the great!”

“Axion Esti”
Odysseus Elytis

O. Elytis
B. Mandelbrot
L. E. Brouwer
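For readers who want the mathematical side of the parallel spelled out, here is a minimal illustrative sketch (my own, not from the poems): a fixed point is a value a map leaves unchanged — f(x) = x, a “point on which you find balance and exist” — and scale invariance means an object looks the same after rescaling, “this small world the great”.

```python
import math

# Fixed point: iterating cos from any starting value converges to the
# unique x with cos(x) == x (the so-called Dottie number, ~0.739).
x = 1.0
for _ in range(100):
    x = math.cos(x)
print(abs(math.cos(x) - x) < 1e-9)  # x is (numerically) a fixed point

# Scale invariance: a power law f(s) = s**-a keeps its shape under
# rescaling, since f(k*s) = k**-a * f(s) -- zooming in or out only
# multiplies by a constant.
a, k, s = 2.0, 10.0, 3.0
print(abs((k * s) ** -a - k ** -a * s ** -a) < 1e-12)
```

Brouwer’s fixed-point theorem guarantees such balance points for continuous maps of a ball to itself; Mandelbrot’s fractals are the classic geometric face of scale invariance.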

Thoughts on reviewing and selection committees


At last, after a very intense month of running the DTL Award Grants ’16, I can sit back on the balcony, have a coffee, and relax without getting comment notifications from HotCRP, or urgent emails every 2 minutes from my super-human co-chair Dr Balachander Krishnamurthy (aka bala ==> Spanish translation ==> bullet … not a coincidence if you ask me, or if you’ve been in my shoes).

We’ve selected 6 wonderful groups to fund this year; the work is done, the notifications have been sent, and the list of winners is already online.

So what am I doing writing about DTL Grants on my first calm Saturday morning after a month? Well, I guess I cannot disconnect yet. But there’s more to it. Having done this for a second year in a row I am starting to see things that were not apparent before, despite the numerous committees on which I have served in the past. So I guess this is a log of things that I have learned in these two years. In no particular order:

Selection CAN be lean and fast

We asked applicants to describe their idea for transparency software in 3 pages. We asked each TPC member to review 9 submissions. Given the length of the proposals, we estimated this to be a day’s worth of work. Additional time was set aside for online discussions, but these lasted only 4 days, while the final TPC phone meeting (no need to fly to the other side of the world) lasted **exactly** 1 hour. Last but not least, we announced the winners 5 weeks after receiving the submissions.

Why and What vs. How

Anyone with some experience of either paper selection (SIGCOMM, etc.) or grant selection (H2020, FP7, etc.) committee work surely understands that the above numbers are far from usual. The load we put on both the applicants and the committee was modest, and we returned the results in record time. Assuming that the produced result is of high quality (it is, but I’ll come to this next), the natural question is “what’s the trick?”.

Well, if you haven’t already noticed, the answer is in the heading above, in bold. We focused on the “Why” and the “What” instead of the “How”. Technical people love the How. It’s what got them their PhD and professional recognition, and it is what they actually love doing and are great at — finding smart solutions to difficult problems. But the mission of the committee was not to have in-depth How-discussions. We do that all the time and in various contexts. What we wanted was to hear from the community about important problems at the intersection of transparency/privacy/discrimination (the Why) and about ideas for new software to address those problems (the What). The How would be nice to know in detail, but this would come at a high cost in terms of workload for both the applicants and the committee. And remember, we wanted to run a lean and fast process. So we set the bar low on the How. You just needed to show that it is doable and that you have the skills to do it, and we would trust that you know How.

Good enough vs. perfect rank

OK, let’s be clear about this — our ranking is NOT perfect. There are several reasons for this, including: we cannot define perfect; even if we could, we would probably be fooling ourselves with the definition; we cannot understand perfect; we don’t have enough time or energy for perfect; we are not perfect ourselves; and last but not least … we don’t need perfect. Yes, we don’t need perfect, because our job is not to bring perfection and justice to an otherwise imperfect and unjust world. That is not what we set out to do.

What we set out to do is find proposals addressing interesting transparency problems and maximise the probability that they will actually deliver some useful software in the end. This means that our realistic objective was to select **good enough proposals**. Thus our biggest priority was to quickly identify proposals that did not target an interesting enough problem, or that failed to convince us that they would deliver useful software. Notice here that most proposals that failed on the latter did so because they didn’t even bother to explain the “What?”. In no case that I remember was there a rejection due to the “How?”, but there were several because it was not clear what would be delivered in the end.

Having filtered those out, we were left with several good enough proposals. From there to final acceptance, little things could make a big difference in terms of outcome. Things like “has this already been funded?”, “would I install/use this software?”, “is this better suited to a scientific paper than a Grant?”, “is this something that would appeal to non-researchers?”. We did our best to make the right calls, but there should be no doubt that the process is imperfect. Given our objectives, however, 1) good enough vs. perfect, and 2) lean vs. heavy, I believe we have remained true to our principles.

I’ll stop here by thanking once more all the applicants, the committee, Bala, and the board for all their hard work.

This has been fun and I am very optimistic that it will also be useful.

See you next year.