This blog post draws heavily from my talk delivered on 3rd May 2019 at TDFCon at Teesside University, Middlesbrough. Get the slides from the talk.

Let's set the scene...

The client is breathing down your manager's neck and your manager is breathing down yours. There's a deadline you're racing to meet and there's one thing left to do. The client wants the page to load an analytics pixel after a delay of a couple of seconds to track user bounce rates.

There's a function in JavaScript to execute some code one time only after a delay... was it setInterval? No... that executes what you give it over and over... you're not on frontend very often, so you rack your brain. The function name is right on the tip of your tongue as your manager peeks around the corner: "Client says she'll be here in 5 minutes, that demo ready?" Your train of thought scatters to the four winds.

"Working on it, 2 minutes," you reply, as it dawns on you that there truly is no more time. You Google "load image after delay js", pull up the first answer and glance over it in a hurry. It all comes flooding back: that function is setTimeout, of course! In fact, this code looks like exactly what you need. In one fluid motion... copy, paste, a few tweaks and... done. The meeting goes smoothly, the client loves it (especially your shiny new analytics pixel script) and you grab Thai food on the way home to congratulate yourself on a job well done.

A few months pass uneventfully, new clients come and go, you've forgotten all about that analytics pixel, until one day the project manager bursts in. "Remember that client from a few months back? Website's been compromised and credit card numbers are being lifted from the payment details page. We also think there's a cryptocurrency miner in there."

It's all hands on deck to find out what's going on. There is indeed a cryptocurrency mining script and keylogger being injected into every page. But where is it coming from? All your dependencies are clean... the team has combed over them again and again by hand and there's nothing to be found. You stumble across your copied and pasted analytics pixel script from a while back, here it is:

var element = '<img src="/demo.png" onload="">';
setTimeout(function() {
    document.body.innerHTML += element
        .replace(/\u200b/g, '0')
        .replace(/\u200c/g, '1')
        .replace(/\u200d/g, ' ')
        .replace(/\d+\s?/g, x => String.fromCharCode('0b' + x));
}, 2000);

Nothing looks alarming, certainly no malware to be seen. What's with these calls to replace() though? Something to do with URL encoding surely? You take them out, nothing changes with the pixel, but the script is no longer being injected.

Oh no.

You check the file size in a panic: too big. Way too big for a simple little script. You copied. You pasted. You were unlucky, and now you have to explain to your manager (and maybe even the client herself) that not only has this malicious script been present for months and months, but you are the one who put it there by copying and pasting invisible code you found online.

What Happened?

Go ahead and download this HTML file (harmless, I promise). Open it up in an editor. Can you see the payload? Now open it in your browser. Yeah.


There is a script in there, encoded as binary using zero-width Unicode characters. Specifically, \u200b (zero-width space) represents 0, \u200c (zero-width non-joiner) represents 1 and \u200d (zero-width joiner) separates one character code from the next. This string of binary is then decoded back into ASCII text and appended to the page inside the onload event handler of an innocent-looking img tag, which the browser executes as soon as the image loads.
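
To make the scheme concrete, here is a minimal sketch of an encoder and a matching decoder. It assumes exactly the mapping described above (one run of zero-width characters per character code, joined by \u200d). encodeZeroWidth and decodeZeroWidth are hypothetical names chosen for illustration, not the code from the demo repository. The decoder mirrors the replace() chain from the pasted snippet, using parseInt(bits, 2) in place of the '0b' + x trick; both read the digits as base 2.

```javascript
// Sketch of the zero-width binary scheme (hypothetical helper names):
// U+200B -> bit 0, U+200C -> bit 1, U+200D -> separator between character codes.
const ZERO = '\u200b';      // ZERO WIDTH SPACE stands for bit 0
const ONE = '\u200c';       // ZERO WIDTH NON-JOINER stands for bit 1
const SEPARATOR = '\u200d'; // ZERO WIDTH JOINER separates character codes

// Turn visible text into a string made only of zero-width characters.
function encodeZeroWidth(text) {
  return text
    .split('') // UTF-16 code units, matching what String.fromCharCode rebuilds
    .map(ch => ch.charCodeAt(0)
      .toString(2)          // e.g. 'c' (99) -> "1100011"
      .replace(/0/g, ZERO)
      .replace(/1/g, ONE))
    .join(SEPARATOR);
}

// Reverse the encoding with the same replace() chain as the pasted snippet.
function decodeZeroWidth(hidden) {
  return hidden
    .replace(/\u200b/g, '0')
    .replace(/\u200c/g, '1')
    .replace(/\u200d/g, ' ')
    .replace(/\d+\s?/g, bits => String.fromCharCode(parseInt(bits, 2)));
}

const hidden = encodeZeroWidth("console.log('hi')");
console.log(decodeZeroWidth(hidden)); // prints: console.log('hi')
```

Logging hidden itself prints what looks like an empty line in most terminals, even though every character of the payload turns into several zero-width characters (each 3 bytes in UTF-8). That overhead is why the file size gives the game away.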

Because the characters are zero-width, most editors just won't show them to you, giving the malware author essentially unlimited free real estate to add hidden code. Notepad++ and Atom certainly don't show any kind of indication by default that there are invisible characters present, and I'd wager these two extremely popular options are not the exception. If you'd like to mess around with generating your own pages like this, I have a GitHub repository containing everything I used to put the demo together.
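
If your editor won't show these characters, you can make them show themselves. The sketch below is a hypothetical helper, not a feature of any particular editor. It rewrites every character in Unicode's "format" category (Cf), which includes U+200B, U+200C and U+200D, as a visible <U+XXXX> marker. It assumes a JavaScript runtime with Unicode property escapes (ES2018 or later, e.g. Node 10+).

```javascript
// Hypothetical helper: replace invisible format characters (Unicode category Cf)
// with visible <U+XXXX> markers so they can be spotted by eye.
function revealInvisible(text) {
  // \p{Cf} requires the u flag; g replaces every occurrence.
  return text.replace(/\p{Cf}/gu, ch =>
    `<U+${ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')}>`);
}

const line = `var element = '<img src="/demo.png" onload="\u200b\u200c\u200d">';`;
console.log(revealInvisible(line));
// prints: var element = '<img src="/demo.png" onload="<U+200B><U+200C><U+200D>">';
```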

Spooky, What About Mitigation?

There is no easy answer to this. Sure, you could scan for zero-width characters (which, by the way, do have legitimate uses), but they're far from the only way to disguise a malicious script. If you copy and paste code from the internet into serious projects, you put yourself at risk of exactly this happening to you. Code found online should never be trusted by default, and it should especially not be hastily copied into your software after only a cursory glance.
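
As a starting point rather than a solution, a scan like the one below reports every format (Cf) character in a piece of text, along with its line and column. findFormatChars is a hypothetical helper, and the sketch assumes Node 12+ for String.prototype.matchAll. Expect false positives: Cf also covers characters with legitimate uses, such as the zero-width joiner inside emoji sequences and the byte order mark (U+FEFF), so every hit needs a human look. It also does nothing about the many other ways to disguise a script.

```javascript
// Hypothetical helper: list every Unicode format character (category Cf)
// with a 1-based line and column so a human can review each one.
function findFormatChars(text) {
  const findings = [];
  text.split('\n').forEach((lineText, i) => {
    for (const match of lineText.matchAll(/\p{Cf}/gu)) {
      findings.push({
        line: i + 1,               // 1-based line number
        column: match.index + 1,   // 1-based column, counted in UTF-16 code units
        codePoint: 'U+' + match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, '0'),
      });
    }
  });
  return findings;
}

const pasted = "setTimeout(load, 2000);\nvar element = 'onload=\"\u200b\u200c\"';";
for (const f of findFormatChars(pasted)) {
  console.log(`line ${f.line}, column ${f.column}: ${f.codePoint}`);
}
// prints:
// line 2, column 24: U+200B
// line 2, column 25: U+200C
```

Running something like this over pasted snippets, or wiring it into a pre-commit hook, is one option; that wiring is left out here.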

As developers, we write a lot of code. After a long day, I think you can probably relate when I say it's possible to reach the point where reading code "just to get the gist of it" (i.e. not in any great detail) starts to feel a bit like osmosis. You just get a feel for what the code is doing, whether or not it's safe, and so on. Just because a piece of code doesn't look alarming doesn't mean nothing else is going on in it that you might have missed. If you must copy and paste something from a webpage into your project, you need to know exactly what every single byte of that code contributes to its function, as well as you would if you had written it yourself. Nobody wants to track invisible malware all over their beautiful project.