Your tools shouldn’t spy on you

Let’s start by stating something I believe is so obvious it shouldn’t need to be stated:

You should not have to worry about your tools spying on you.

You should be able to run a command that doesn’t use the network, knowing that it won’t open a network port. You should be confident that your tool is doing its best for you, not reporting back on you to someone else. In short, you should be able to run software without it looking over your shoulder like a voyeur with a clipboard.

But nearly a year ago, that kind of spyware is just what Microsoft and the .NET Foundation added to the dotnet command line.

I’ve been using dotnet core since well before then, and I never knew about this. And I’m one of the few people I know who tries to keep up with this kind of nonsense! I feel foolish and embarrassed for not knowing about this spyware when it was added. And maybe my embarrassment at having been spied upon for months is colouring my judgement a little. But I still believe it’s wrong.

I’m sure they’ll say that it’s to improve the tools, but - while I have my doubts that’s true - it does raise the question:

Would you prefer a tool you can just trust, or a tool that may have better features but that you constantly have to check to verify isn’t doing anything it shouldn’t?

I’d rather be able to trust my tools. I just don’t like the idea of a voyeur with a clipboard watching over my shoulder, sating its prurient interest by taking notes and gathering statistics.

This dotnet voyeur then sends these notes and statistics to Microsoft without asking the user.

Your only chance of opting out is knowing the special environment variable incantation to use.
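For the record, that incantation is the DOTNET_CLI_TELEMETRY_OPTOUT environment variable - set it before running any dotnet command:

```shell
# Opt out of dotnet CLI telemetry for the current shell session.
export DOTNET_CLI_TELEMETRY_OPTOUT=1

# Confirm it's actually set before trusting it.
echo "$DOTNET_CLI_TELEMETRY_OPTOUT"
```

Of course, this only covers the current session - forget it in one shell, one container, one CI job, and the voyeur is back.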

But maybe they’ve tweaked it so that today it’s sending files as well? They managed to sneak the first change past me, so have I missed another? No? Maybe not. But tomorrow? I can’t know, since they’ve demonstrated I can’t trust them or the tool they created.

What used to be a simple ‘dotnet run’ command has turned into something that has me watching my back. Why are they so interested in my typos that they’ve paid someone to sit down and write code to capture them? If they actually want to improve the product, why not have that developer writing code that adds new features rather than spying on me?

And that’s why it’s not a minor thing. I’m not (quite) so arrogant that I think Microsoft is targeting me. I don’t even think they’re especially interested in the telemetry from ‘dotnet run’. It’s that they’re seeking to normalise this spying that makes it more than a minor problem.

We’ve seen this with Windows 10 hoovering up all the data it can get, just like Facebook, Google, Apple and Amazon. It’s in all their interests to have us become inured to this constant surveillance. And I don’t like it.

Homebrew faced a similar issue around the same time dotnet introduced their telemetry. I noticed the Homebrew debacle but didn’t notice the introduction of telemetry in the tool I use all the time. (I’m still embarrassed by that.) To show I’m not the only person concerned about telemetry-gathering tools, here’s a blog post about Homebrew - ‘Homebrew betrayed us all to Google’. It starts with the summary:

    1. Open-source is about trust. Trust is undermined by things like tracking.
    2. Do not track your users. In the rare case you really need anonymous data, ask your users first.
    3. Never use Google products (or any other “big data” company that relies on making money out of the data you provide) to track your users.
    4. Using Google’s tracking and then calling it “anonymous” is a lie. Google collects tons of information on its users and even non-users. There’s no way to know what data Google will relate internally. Even if you don’t get to see all of the collected information, Google still has it.
    5. Opt-out is never an excuse. It always excludes most users (which either don’t care, or have more severe things to care about than protecting their privacy in every random app they’re using).

(Source: ‘Homebrew betrayed us all to Google’)

Homebrew backed down a little and provided a better opt-out mechanism, but it annoyed a lot of people. (More, probably, than are annoyed at Microsoft. Let’s hear it for low expectations!)

Opt-out mechanisms aren’t really enough though. For one thing, why should I have to opt out when I didn’t opt in in the first place? For another, that may fix it for me, but I don’t want your tools spying on you either. For a third, the opt-out procedure is (deliberately?) awkward.

It’s not something you set just once - it needs to be set for every user, on every machine, in every shell and every container. And you need to get it right every single time, or else the tool will assume it can report back on what you’re doing.

Opting everyone in automatically as Microsoft have done is just plain dishonest. There’ll always be some portion of users who’d opt in, some portion who’d opt out, and some who’d go with the default. But you know what, Microsoft? Those people who wouldn’t have opted in but who haven’t opted out? They’re the ones whose data you’re taking without permission. You just don’t have permission to take that data. (Don’t start me on EULAs when the person agreeing to the EULA may not be the person running the software…) You don’t have informed consent here, because you didn’t actually ask. Worse - you know that if you asked for informed consent, you might not get it. That’s an argument against spying on people, not an argument for spying and not asking.

And that’s before you get to people like me who - despite what you consider ‘transparency’ - didn’t even know there was a possibility of a voyeur with a clipboard looking over my shoulder.

So how could Microsoft fix the issue?

There’s really only one fix I’d like - take the spying code out of the tool completely. If there are people who really want to send their telemetry to Microsoft, by all means find a way to accommodate them. But don’t put spying code into the tool. Keep it clean. Have the telemetry spyware in a separate module that has to be explicitly downloaded and installed. (Call it ‘Voyeur.DLL’ if you like.) Keep the core pure.

And have a strong ‘Private By Default’ policy. Allow people to feel safe using your tools. It’s hard enough keeping up with the latest in technology without having to keep up with the latest in obnoxious business practices.

Private By Default would mean guaranteeing that it never gathered any information on you, even in aggregate. That it never sent any data that you didn’t explicitly ask it to send. That it never opened any network connections you didn’t ask it to open. That it never did anything not explicitly to do with carrying out the user’s intent.

In the absence of that, what can I do to stop it spying on me?

  1. I could just not use dotnet. For me this is the easiest and the hardest approach. It’d be easy because just walking away from dotnet would mean it’s not my problem any more. There’d be no voyeur looking over my shoulder. It’d be hard for me too though. I’m getting to the point where a large side project is becoming useful, and it’s based on dotnet. It’d be difficult just to walk away from that.
  2. I could block telemetry traffic on the router or firewall. Here’s someone’s (not my) best guess at the hosts to which it sends data. I like the idea of ISPs blocking all those hosts - being denied access because of Microsoft’s telemetry-gathering could be hilarious.
  3. I could wrap the dotnet command in a script that automatically sets the environment variable for every single invocation of the command. Here’s one way to do it:

    #!/bin/bash
    echo "Trying to run a non-spying version of dotnet..."
    DOTNET_CLI_TELEMETRY_OPTOUT=true exec /usr/local/share/dotnet/dotnet "$@"

    (That’s bash on OS X. Note "$@" rather than $*, so quoted arguments pass through intact, and exec so the wrapper doesn’t linger as an extra process. If you call the script ‘dotnet’, make sure it’s on your $PATH ahead of /usr/local/share/dotnet/dotnet.)
  4. I could add the environment variable to every single RC file for every single shell for every single user. And every single docker file. For every single development machine and server.
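Concretely, that last option means lines like these scattered everywhere (the file paths here are just the usual suspects for a Linux/macOS setup):

```shell
# In every shell RC file (~/.bashrc, ~/.zshrc, /etc/profile, ...):
export DOTNET_CLI_TELEMETRY_OPTOUT=true

# And the equivalent line in every Dockerfile (Dockerfile syntax,
# shown here as a comment):
#   ENV DOTNET_CLI_TELEMETRY_OPTOUT=true
```

Miss any one of those files and that user, shell or container silently reverts to reporting back.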

I’ll be doing a combination of all those things. I might keep using dotnet for existing projects, but I’m fucked if I’m starting any new dotnet projects now.

The ‘tech stack’ conversation has come up at $WORK a few times recently. Where before I’d have talked about dotnet core, I’m sure as hell not going to now. I won’t just stop talking it up - I’ll be actively talking it down and discussing alternatives.

From a wider perspective, what could I do to fix the root of the dotnet spying problem?

  1. Rewrite the part of the tool that calls the spying code. It’d be easy enough for me to fix (it’s right here), but that wouldn’t solve the problem of Microsoft writing tools that spy on users, it would just stop my version of the tool from spying on me. Your version could still spy on you.
  2. Send the code change to Microsoft as a ‘pull request’. I think we both know what would happen with that.
  3. ‘Fork’ the code, and provide a binary distribution of the fixed/improved code so that everyone who wants it can use it.
  4. Start a ‘Private By Default’ campaign in the hope we can shame Microsoft into behaving better.

But you know what? I’m not going to do any of that. I’m just going to point out why I think it’s wrong, then try moving on to using better, more trustworthy tools. I’ll still use it for current projects but I’ll be trying to move away from the platform.

Today I was planning on settling down to read the new AssemblyLoadContext design document pull request and delving a lot deeper into that area. My dotnet project needs to generate and load assemblies in different contexts and it has got as far as it can without this kind of functionality. I might even have written a blog post about it. After all, it’s an area not well served by others and the documentation doesn’t go into a lot of detail about how to use the API.

Instead I’m writing about how dotnet has managed to shatter my trust.

I’ve no enthusiasm for working with dotnet now. No desire to watch the weekly ASP.Net standups. No desire to write C#. No desire to work on my side project built on dotnet core MVC. I keep looking around for the voyeur with the clipboard.

Tags: Clueless Idiocy
Created by OpinionatedGeek Ltd.


The blogs you read shouldn’t spy on you either, yet that hasn’t stopped you from including Google Analytics on this page. Why is it okay for you to feed my browsing data to a third party company, but when Microsoft wants to see what their own users are doing it’s completely out of the question?

Created by RyanAKearney on Monday, 24 July 2017 at 2:42PM

Is it?

Thanks for your comment. I’m certainly open to changing the use of Google Analytics here if that's appropriate.

You ask me why I think they’re different. It seems to me there’s a qualitative difference between ‘asking my computer to run a command’ and ‘asking someone else’s computer for something they wrote’. One can perhaps be done independently of sharing any data, and one cannot. To my mind that changes the user’s model of what happens when performing each action.

If a browser asks my server for a web page, I can keep a note of that request. (I may want to do so to, for example, charge for something, to provide an audit trail in case someone is accessing something they shouldn’t, or to monitor activity to see if certain network addresses are repeatedly trying to download large files.)

Is this my data, given it’s my server sending my content? If so, is there a qualitative difference between me uploading my data to Google Analytics, and me putting the Google Analytics code on my pages?

And taking the example in the other direction, is there a qualitative difference between me having the data as a result of responding to the request, and me forcing an action to make a request that I can then use?

To complicate matters further, some pages on this site are interactive tools supported by adverts. Again, I think what happens matches the user’s model of that interaction - some data is collected, some screen real-estate is used to display an advert, and in return the tool is available for use.

Would I have bothered to create the tools without being able to recoup some costs from the adverts? I don’t know, but that’s another area I’m open to persuasion.

I’m genuinely interested in the answers to these questions. I really am (always!) just trying to do what’s right.

Created by OpinionatedGeek on Monday, 24 July 2017 at 6:38PM (source)

In the interest of being constructive

I hear your concerns and to some degree share them. However, as you would well know, there are two phases to the development of any good software - the initial creative phase and the longer-term refinement phase. Whilst the former is often a largely creative exercise, the latter is best served when based on data. Anyone who’s had the misfortune of relying on NPS would be acutely aware that there is provably no correlation between what people say and what people do.

So here we have the software developer stuck firmly on the horns of a dilemma when addressing the refinement phase for their product. As someone who’s been writing software for over 40 years on many different platforms, I’m firmly of the opinion that software has always spied on its users - it’s just that we used to call it logging. We didn’t tell anyone we were logging, but were expected to have adequate logs to assist in identifying problems - the most basic form of refinement.

I completely agree that the accumulation of large amounts of ‘logged’ data is a concern, but I’m equally of the opinion that having that data provides the best possible insight into people’s use, successful or unsuccessful, of your product, and therefore the best foundation for knowing where best to put the effort to improve your product. I’m personally encouraged by the effort being made by many organisations to ensure that this data is de-identified, and see publishing it as a good step in the right direction.

During my career I have worked for a number of large software vendors and have never personally experienced the ‘evil intent’ that you seem so concerned about, but am not naïve enough to believe that no-one has this intent. So, what to do? For me, I’m happy to live with either an opt-in or an opt-out model, simply because this is now the norm, for the reasons stated here, and it’s something I’m aware of, so will always pay attention to it.
I do think that believing that there is any software out there that is not doing some form of spying is naïve, so am interested in what you propose the broader industry should do? The clock is not going to be turned back and the ever increasing complexity of the software we use today, demands that decisions are based on telemetry, not guesswork. Finally, I also agree with you that this is an important debate that needs to be held in a non-emotive, public forum, so will wait with interest for your constructive suggestions on how best to deal with this dilemma.

Created by NigelGPage on Monday, 31 July 2017 at 8:32PM

Informed Consent

I think there are a couple of important differences between logging and telemetry:

  1. Purpose. Logging is typically used for debugging - to provide a trace of what a program was doing when something went wrong. That’s not what Microsoft is doing with its telemetry - that’s purely focused on spying on what the user is doing when the program is working properly.
  2. Control. If someone needs logs from my machine to debug an issue, I can send them the logs. Or I can keep them and not help debug the issue (possibly meaning it will never be addressed). Or I can look at the logs and remove anything I don’t want public before sending them. Telemetry is just automatically sent, in a largely invisible process (unless you’re actively watching for network connections with something like Little Snitch).

As for a constructive answer to the problem, it doesn’t seem that difficult: informed consent.

If you really, really want to spy on the user, ask them upfront, in an unambiguous way, if that’s OK, and tell them exactly what information you’ll be taking.

I was struck by how easy this is when I installed Atom a few days ago. Both Atom and Visual Studio Code share Electron as a common base, but they have two different approaches to telemetry.

Atom asks for permission to send telemetry the first time you run it, requiring a Yes or a No. It’s a very big screen and very hard to miss. I’m pretty sure someone clicking Yes or No has understood what they’re doing. (It would be better if the page specified exactly what data was collected.)

VSCode doesn’t take this approach. If you want to turn telemetry off you have to hand-edit a JSON file. And in the past even if you did that, it didn’t work - it used to send telemetry to Microsoft saying you’d turned telemetry off!

I’m increasingly concerned that EULAs fail the informed consent test. It seems to me that very, very few people ever read the EULA, and if it isn’t read then how is there ‘informed consent’?

I don’t think informed consent is hard or bad, as Atom shows. If a program wants to watch what users do, should it not be up front and honest about it?

Created by OpinionatedGeek on Tuesday, 1 August 2017 at 6:43AM (source)