tags: #publish
links: [[Unicode]], [[Security]]
created: 2021-11-02 Tue
---
# Trojan Source
https://trojansource.codes/
A security attack.
*Use Unicode characters to fool a human reviewer into thinking the code does something different from what the compiler actually interprets.*
Examples: https://github.com/nickboucher/trojan-source
- **Commenting-out**: Use bidirectional control characters so that a critical line appears, in the rendered source, to sit *outside* a line or range comment, when the toolchain actually treats it as *inside* the comment, so the code never runs.
- **Stretched string**: Make a critical string-equality check seem to compare against one string literal, when the literal actually contains something else (reversed text, or invisible characters).
- **Homoglyph function**: Two functions appear to have the same name, but one secretly uses look-alike or additional characters in its name, so you're not calling what you think. Handy for inserting seemingly correct permission-check calls which actually call a function that does nothing.
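The bidirectional and invisible-character tricks above all rely on Unicode "format" characters that editors do not draw. A minimal sketch of a scanner for them, assuming it is enough to flag every character in the Unicode `Cf` category (the function name `find_format_characters` is made up for this note, not part of any tool):

```python
import unicodedata


def find_format_characters(source: str):
    """Return (line, column, name) for every Unicode format (Cf) character.

    Assumption: flagging the whole Cf category is acceptable. It covers the
    bidi embeddings, overrides and isolates (U+202A..U+202E, U+2066..U+2069)
    as well as zero-width characters such as U+200B.
    """
    hits = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                hits.append((line_no, col, unicodedata.name(ch, "UNKNOWN")))
    return hits


if __name__ == "__main__":
    # Escapes are used here so this example itself contains no hidden characters.
    sample = 'if access != "user\u202e \u2066// admin only\u2069 \u2066":\n    pass\n'
    for line_no, col, name in find_format_characters(sample):
        print(f"line {line_no}, col {col}: {name}")
```

This does not catch homoglyphs, since a Cyrillic letter is an ordinary letter as far as Unicode categories go.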
A possible solution is to take a strict position: have your compiler, language standard and tooling be intolerant and reject any literal non-ASCII characters *at all*. Other characters can still be encoded in string literals via escape sequences, and the source code symbols layer is probably not the place for them.
However, this is a rather English-centric viewpoint: it doesn't allow for (say) in-source comments in other languages, and may make it difficult to work with output text in other languages.
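The strict ASCII-only position could be sketched as a pre-commit style check like the one below. It is only an illustration: the function name `non_ascii_positions` is hypothetical, and the suggested replacement uses Python's own escape syntax, which other languages spell differently.

```python
def non_ascii_positions(source: str):
    """Return (line, column, escape) for every literal non-ASCII character.

    The third field is the Python escape sequence that could replace the
    character inside a string literal, keeping the source file pure ASCII.
    """
    problems = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                escape = ch.encode("unicode_escape").decode("ascii")
                problems.append((line_no, col, escape))
    return problems


if __name__ == "__main__":
    # The function name below hides a Cyrillic letter (written as an escape here).
    sample = "def is_\u0430dmin():\n    return True\n"
    for line_no, col, escape in non_ascii_positions(sample):
        print(f"line {line_no}, col {col}: replace with {escape} or remove")
```

A check like this catches homoglyph identifiers as well as bidirectional controls, at the cost described above: comments and literals in other languages are rejected too.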