Encoding Is Not Magic

Published: 01 Dec 2024

When talking with aspiring hackers, bug bounty hunters, or application security engineers, it often feels that there’s some misunderstanding around encoding. Many seem to view it as a "magic wand" that can bypass filters and open the gates to exploitation—some kind of magical hacking fairy dust you sprinkle on your input to make the "hacking" work.

Let’s be clear: Encoding is not magic.

Encoding transforms data into a different representation, often for safe transport or storage. But for encoded data to be meaningful, something in the application stack must decode it. If you double-encode something, you need something to decode it twice—or two components to decode it once each. Without decoding, encoding alone achieves nothing.
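The double-encoding point can be sketched in a few lines of Python. This is only an illustration using the standard library's urllib.parse; the variable names are made up for the example, and the payload is the same %2e%2e%2f used later in this post.

```python
from urllib.parse import quote, unquote

# '../' URL-encoded once (written by hand, since quote() leaves '.' untouched).
once = "%2e%2e%2f"

# Encoding a second time turns every '%' into '%25'.
twice = quote(once, safe="")
print(twice)                    # %252e%252e%252f

# One decode only peels one layer: the value is still encoded.
print(unquote(twice))           # %2e%2e%2f

# Only a second decode (here, or in another component) yields '../'.
print(unquote(unquote(twice)))  # ../
```

If nothing in the stack performs that second decode, the double-encoded value stays an inert string.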

Encoding Requires Decoding

Your application or its underlying systems won’t magically interpret encoded data. Let’s take an example of path traversal: attempting to access sensitive files by encoding ../ into %2e%2e%2f (URL encoding).

What happens without decoding?

You can try it in your favorite language. If the value stays encoded, the call simply fails:

  • In PHP:

    <?php
    echo file_get_contents('%2e%2e%2f%2e%2e%2fetc%2fpasswd');
    ?>

    Output:

    Warning: file_get_contents(%2e%2e%2f%2e%2e%2fetc%2fpasswd): Failed to open stream: No such file or directory
  • In Ruby:

    irb(main):001:0> File.read('%2e%2e%2f%2e%2e%2fetc%2fpasswd')

    Output:

    Errno::ENOENT (No such file or directory @ rb_sysopen - %2e%2e%2f%2e%2e%2fetc%2fpasswd)
  • In Python:

    f = open("%2e%2e%2f%2e%2e%2fetc%2fpasswd", "r")

    Output:

    FileNotFoundError: [Errno 2] No such file or directory: '%2e%2e%2f%2e%2e%2fetc%2fpasswd'

Without something decoding the value, it doesn't work.

Decoding in Action

Now, if something in the application stack is doing the decoding, we can see that it works:

require 'cgi'
filename = '%2e%2e%2f%2e%2e%2fetc%2fpasswd'

filename = CGI.unescape(filename) # <- This decodes '%2e%2e%2f' into '../'

File.read(filename) # Now it works, if the system permits the access

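The same applies to other types of encoding and decoding. Unicode bypasses work because something in the application stack is normalizing the value. %bf%27 can yield an unescaped single quote because the bytes get decoded, an escaping routine inserts a backslash before the quote, and a charset that isn't properly set (a multibyte one such as GBK is the classic case) then reads 0xbf and the backslash as a single character. No magic, just code.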
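Both cases can be reproduced with Python's standard library. The sketch below is an illustration, not a description of any particular application: it assumes NFKC as the normalization form, GBK as the mis-set charset, and a simple byte-level replace standing in for an addslashes-style escaping routine.

```python
import unicodedata
from urllib.parse import unquote_to_bytes

# --- Unicode normalization ---
# Fullwidth full stop (U+FF0E) and fullwidth solidus (U+FF0F) are not ASCII.
raw = "\uff0e\uff0e\uff0f"
looks_clean = "../" not in raw          # a naive filter sees nothing to block
normalized = unicodedata.normalize("NFKC", raw)
print(looks_clean, normalized)          # True ../

# --- %bf%27 with a multibyte charset ---
data = unquote_to_bytes("%bf%27")       # bytes 0xbf 0x27
# Stand-in for byte-level escaping: put a backslash (0x5c) before the quote.
escaped = data.replace(b"'", b"\\'")    # bytes 0xbf 0x5c 0x27
# Decoded as GBK, 0xbf 0x5c forms one character, so the backslash disappears
# and the quote is left unescaped.
text = escaped.decode("gbk")
print(len(text), text.endswith("'"), "\\" in text)
```

In both halves the payload only becomes dangerous at the step where some component reinterprets the bytes; before that step it is harmless data.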

Final Thoughts

Encoding isn’t a magic trick. It works because something, somewhere, performs the opposite operation (decoding) and the filtering doesn't account for it. Understanding how applications decode data is key to both exploiting vulnerabilities and securing systems.
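A minimal sketch of that mismatch, in Python: the is_safe function below is a hypothetical, deliberately naive filter written for this example, and the only thing that changes between the two checks is whether it runs before or after decoding.

```python
from urllib.parse import unquote


def is_safe(value: str) -> bool:
    """Hypothetical naive filter: reject any value containing '../'."""
    return "../" not in value


user_input = "%2e%2e%2f%2e%2e%2fetc%2fpasswd"

# Filter first, decode later: the encoded value passes the check...
passed_before_decoding = is_safe(user_input)
# ...and a later component then turns it into a traversal path.
decoded = unquote(user_input)

# Decode first, then filter the value that will actually be used.
passed_after_decoding = is_safe(decoded)

print(passed_before_decoding)  # True
print(decoded)                 # ../../etc/passwd
print(passed_after_decoding)   # False
```

The filter itself is identical in both cases; what matters is whether it looks at the same representation that the file access will eventually see.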

Written by Louis Nyffenegger
Founder and CEO @PentesterLab