‘Trojan Source’ Hides ‘Invisible’ Bugs within Source Code!

The RLO trick, which exploits how Unicode handles script ordering, & a related homoglyph attack can invisibly disguise the real name of malware.

Researchers have found a new way to encode potentially malicious source code, such that human reviewers see a harmless version & compilers see the invisible, bad version.

Unicode

Named “Trojan Source attacks,” the method “exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed.

“This leads to vulnerabilities that cannot be perceived directly by human code reviewers,” Cambridge University researchers Nicholas Boucher & Ross Anderson stated in a paper (PDF) published on Monday.

Immediate Threat

Boucher & Anderson explained that the attacks jeopardise all source code, posing “an immediate threat both to first-party software & of supply-chain compromise across the industry.”

They’ve published working proofs of concept (PoCs) of attacks in the C, C++, C#, JavaScript, Java, Rust, Go & Python programming languages, & they suspect that the attack will also work against “most other modern languages.”

Co-ordinated Disclosure – 2x CVEs

The researchers have co-ordinated disclosure with 19 organisations, many of which are now releasing updates to address the security weakness in code compilers, interpreters, code editors & repositories. Some of those organisations dismissed the notification because it did not match vulnerabilities with which they are more familiar, the researchers noted.

There are 2 CVEs involved, both of which MITRE issued against the Unicode specification. What the researchers called a “potentially devastating” attack against the Unicode bidirectional algorithm (BiDi) through version 14.0 is tracked as CVE-2021-42574.

Arabic

BiDi manages the order in which text displays – for example, from left to right with the Latin alphabet, or from right to left with Arabic or Hebrew characters.

A related attack, tracked as CVE-2021-42694, relies on the use of visually similar characters, known as homoglyphs.

BiDi Algorithm

With regard to the BiDi attack, the paper explains that computer systems need a way to resolve conflicting directionality when scripts with different display orders are mixed, e.g., Latin text mixed in with Arabic.

In Unicode, that conflict is typically managed by the BiDi algorithm. But sometimes the algorithm does not suffice, in which case Unicode provides override control characters: invisible characters that switch the display ordering of the characters around them.

Old Unicode Right-to-Left Override

The Unicode BiDi override method – known as the right-to-left override (RLO) technique – is an old attack that keeps getting used.

The overrides enable even single-script characters to be displayed in an order that’s different from their logical encoding, the researchers explained – a fact that’s previously been exploited to disguise the real name of a malicious executable spread via email or, in one 2013 attack, a registry key.
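To make the file-name trick concrete, here is a minimal Python sketch. The file name is hypothetical, & `naive_display` is only a rough stand-in for a BiDi-aware user interface: it assumes everything after the first RLO character is shown reversed, which is a large simplification of the real Unicode BiDi algorithm.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE, an invisible control character


def naive_display(name: str) -> str:
    """Roughly approximate how a BiDi-aware UI might render `name`.

    Assumption for illustration only: all text after the first RLO is
    displayed in reverse. The real BiDi algorithm is far more involved.
    """
    head, sep, tail = name.partition(RLO)
    return head + tail[::-1] if sep else name


# Hypothetical file name: the stored (logical) name really ends in ".exe".
stored = "holiday_" + RLO + "gpj.exe"

print(naive_display(stored))    # holiday_exe.jpg
print(stored.endswith(".exe"))  # True
```

The name a user sees appears to end in “.jpg”, while the name the operating system acts on still ends in “.exe”.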

Zero-Day Vulnerability

More recently, in 2018, attackers used RLO to deliver crypto mining malware by exploiting a zero-day vulnerability in the Telegram messaging application, as Kaspersky researchers detailed at the time.

Most “well-designed” programming languages reject arbitrary control characters in source code, since they would interfere with the logic, the researchers explained.

Override Characters

Random BiDi override characters will typically result in a compiler or interpreter syntax error – errors that are avoided by tucking them into comments or strings, whose contents compilers & interpreters do not parse as code.

“While both comments & strings will have syntax-specific semantics indicating their start & end, these bounds are not respected by Bidi overrides,” according to the writeup.

“Therefore, by placing Bidi override characters exclusively within comments & strings, we can smuggle them into source code in a manner that most compilers will accept.”
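Because the override characters hide in comments & strings, a plain text scan can surface them even when the compiler stays silent. The sketch below is one possible check, not the researchers’ own tooling: the function name `find_bidi_controls` & the sample snippet are illustrative assumptions, & the set covers the nine embedding, override & isolate control characters (U+202A to U+202E, U+2066 to U+2069).

```python
import unicodedata

# Embedding, override & isolate controls: LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def find_bidi_controls(source: str):
    """Return (line, column, character name) for every BiDi control found."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


# Hypothetical snippet with an RLO character tucked into a comment.
sample = "x = 1\n# harmless note \u202e hidden\n"

# The interpreter accepts the snippet: the control character sits in a comment.
compile(sample, "<sample>", "exec")

print(find_bidi_controls(sample))  # [(2, 17, 'RIGHT-TO-LEFT OVERRIDE')]
```

A check along these lines could run in a pre-commit hook or a CI step, flagging any hit for manual review rather than rejecting it outright, since right-to-left scripts can use these characters legitimately.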

Supply-Chain Attack

Put it all together, the researchers suggested, & you get perfectly valid, perfectly malicious source code that could be used to mount a novel supply-chain attack.

“By injecting Unicode Bidi override characters into comments & strings, an adversary can produce syntactically-valid source code in most modern languages for which the display order of characters presents logic that diverges from the real logic,” they wrote. “In effect, we anagram program A into program B.”

Source Code

Such an attack would be hard for a human code reviewer to detect, given how genuine the rendered source code looks.

“If the change in logic is subtle enough to go undetected in subsequent testing, an adversary could introduce targeted vulnerabilities without being detected,” they continued.

It gets worse, the paper cautioned: BiDi override characters persist in copy-&-paste functions on most modern browsers, editors & operating systems, meaning that “any developer who copies code from an untrusted source into a protected code base may inadvertently introduce an invisible vulnerability.”

Security Exploits

That kind of dangerous code copying has happened before in real-world security exploits, the researchers noted.

One example was in June 2020, when at least 26 open-source code repositories were found to be infected with Octopus Scanner malware, which targets the Apache NetBeans Java integrated development environment (IDE) & was found nesting in GitHub source-code repositories, just waiting to take over developer machines.

The sheer amount of copying & pasting from GitHub, Stack Overflow & other repositories makes this a real possibility as an attack vector, experts say.

Attack Flow

John Bambenek, Principal Threat Hunter at digital IT & security operations company Netenrich, outlined that it would be a “fairly difficult attack flow” to maintain discreetly, but the threat players that can poison a supply chain are the ones slick enough to worry about.

“Software engineering companies should … update their compilers as soon as possible because the groups that engage in supply chain compromise are the exact groups who both have the sophistication to manage this attack flow & the desire to use such techniques,” he explained.

Mitigate What’s Invisible?

Jon Gaines, Senior Application Security Consultant at application security provider nVisium, warned that this scenario demonstrates how unwise it can be to copy & paste code.

It is always better to rewrite it yourself, he outlined on Monday, & suggested configuring your IDE or text editor to display Unicode characters that would otherwise be invisible.

Alternatively, if you are copying & pasting code, Gaines suggested opening up the code you copied & pasted within a hex editor to check it.
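For those without a hex editor to hand, a few lines of Python can do a similar job. This is a minimal sketch, assuming the pasted code is available as a string; `describe_non_ascii` is a hypothetical helper name, & the pasted line is invented for illustration.

```python
import unicodedata


def describe_non_ascii(text: str):
    """List each non-ASCII character as (code point, UTF-8 bytes, name)."""
    report = []
    for ch in text:
        if ord(ch) > 127:
            report.append((
                f"U+{ord(ch):04X}",
                ch.encode("utf-8").hex(" "),
                unicodedata.name(ch, "<unnamed>"),
            ))
    return report


# Hypothetical pasted line that hides an RLO character.
pasted = "total = price\u202e * 2"

for entry in describe_non_ascii(pasted):
    print(entry)  # ('U+202E', 'e2 80 ae', 'RIGHT-TO-LEFT OVERRIDE')
```

The byte sequence `e2 80 ae` is what the same character would look like in a hex editor when the file is saved as UTF-8.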

“Hopefully, patches will be promptly released for most compilers, but in the interim, this would be an effective short-term solution,” he suggested.

Homoglyph Attacks Worse

The Trojan Source attacks that rely on BiDi RLO can become even worse if an attacker switches to using homoglyphs, the researchers noted. An early example is a July 2020 campaign in which spammers tried to trick users into disclosing their PayPal passwords by switching the lowercase “l” in the brand name to the visually similar uppercase “I.”

“These domain attacks become even more severe with the introduction of Unicode, which has a much larger set of visually similar characters, or homoglyphs, than ASCII,” the researchers warned – making homoglyph attacks a favourite of spammers à la the “Paypai” scammers.
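A rough way to see the Unicode side of the problem is to compare two strings that look alike but mix scripts. The sketch below is a heuristic only: it assumes the first word of a character’s Unicode name identifies its script, & the function names `scripts_used` & `looks_mixed` are hypothetical.

```python
import unicodedata


def scripts_used(identifier: str) -> set:
    """Approximate the scripts in `identifier` from Unicode character names."""
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            # Names start with the script, e.g. "LATIN SMALL LETTER A".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(identifier: str) -> bool:
    """Flag identifiers whose letters come from more than one script."""
    return len(scripts_used(identifier)) > 1


genuine = "paypal"
spoofed = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A

print(genuine == spoofed)    # False
print(looks_mixed(genuine))  # False
print(looks_mixed(spoofed))  # True
```

Note that this heuristic would not catch the “Paypai” trick itself, because a lowercase “l” & an uppercase “I” are both Latin letters.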

Recognised Danger

Homoglyphs being used in URLs is a recognised danger – one that Unicode has focused on in its security reports.

“The fact that the Trojan Source vulnerability affects almost all computer languages makes it a rare opportunity for a system-wide & ecologically valid cross-platform & cross-vendor comparison of responses,” the researchers noted.

“As powerful supply-chain attacks can be launched easily using these techniques, it is essential for organisations that participate in a software supply chain to implement defences.”

Not Surprising

Matthew Green, an Associate Professor at the Johns Hopkins Information Security Institute in the US, told Krebs On Security that the possibility of exploiting Unicode isn’t surprising, but the fact that so many compilers “happily parse Unicode without any defences, & how effective their right-to-left encoding technique is at sneaking code into codebases,” does take him aback.

“That’s a really clever trick I didn’t even know was possible. Yikes,” he told security journalist Brian Krebs.

No Evidence

On the plus side, the researchers conducted a widespread vulnerability scan that did not turn up any evidence that the security weakness has been exploited so far.

More worryingly, there are no defences against Trojan Source yet, Green cautioned, so we should all pray that compiler & code editor developers patch quickly.

 
