/i considered harmful

Published: 26 Mar 2020

After reading this blog post on a bug in GitHub and Unicode, I started playing more and more with Unicode (I even bought two domains).

Recently, I had a eureka moment while camping and started wondering: “What is the impact of those uppercase and lowercase transformations on regular expressions?”

And the answer is straightforward: it depends!

First, let’s say your website wants to ensure that a URL provided by a user is part of a list of trusted URLs (to avoid SSRF or as part of a CORS policy). Your website can use a list of predefined URLs, but this quickly gets tedious. So after a while, you decide to move to a regular expression that checks that the host in the URL ends with your domain. Your code looks something like this:

host =~ /domain.tld$/

For whatever reason, you decide to add the i flag (re.IGNORECASE in Python) to make sure both domain.tld and DOMAIN.TLD will work (and even DoMaIn.Tld). Your regular expression ends up looking like:

host =~ /domain.tld$/i

The same pattern could also be used to ensure an email address belongs to your domain:

email =~ /domain.tld$/i

Now, suppose a malicious user buys the domain domaın.tld. What happens?

The domain domaın.tld contains a LATIN SMALL LETTER DOTLESS I (U+0131) in place of the i.
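To see why case transformations matter here, the following minimal Python sketch inspects that character using only the standard library. The names DOTLESS_I and round_trip are illustrative and not from the original post. It shows that uppercasing the dotless i gives a plain ASCII I, so an uppercase-then-lowercase round trip turns it into a regular i.

```python
import unicodedata

# U+0131, the character used in place of "i" in the lookalike domain.
DOTLESS_I = "\u0131"

# Official Unicode name of the character.
print(unicodedata.name(DOTLESS_I))

# Uppercasing the dotless i yields the ASCII capital "I" ...
print(DOTLESS_I.upper() == "I")

# ... so upper() followed by lower() lands on the ASCII "i".
round_trip = DOTLESS_I.upper().lower()
print(round_trip == "i")
```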

The answer depends on the programming language used (and the version).

In Python 3.8.1, domaın.tld will match 'domain.tld$', re.IGNORECASE. ſ will match s and K (Kelvin sign) will match k.
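The Python behaviour can be reproduced with a short sketch like the one below. The variable names are illustrative. The second pattern adds re.ASCII, which is not mentioned in the original post: it is included here on the assumption that you may want to keep case-insensitive matching while restricting it to ASCII letters.

```python
import re

# Lookalike host with U+0131 (dotless i) in place of "i".
LOOKALIKE_HOST = "doma\u0131n.tld"

# Default str patterns use Unicode case-insensitive matching.
unicode_pattern = re.compile(r"domain.tld$", re.IGNORECASE)

# Assumption: re.ASCII restricts IGNORECASE to ASCII letters only.
ascii_pattern = re.compile(r"domain.tld$", re.IGNORECASE | re.ASCII)

unicode_match = unicode_pattern.search(LOOKALIKE_HOST) is not None
ascii_match = ascii_pattern.search(LOOKALIKE_HOST) is not None
uppercase_match = ascii_pattern.search("DOMAIN.TLD") is not None

print(unicode_match)    # lookalike accepted by the Unicode-aware pattern
print(ascii_match)      # lookalike rejected once matching is ASCII-only
print(uppercase_match)  # ordinary uppercase input is still accepted
```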

In Ruby 2.7.0, domaın.tld will NOT match /domain.tld$/i. However, ſ will match s and K (Kelvin sign) will match k.

In Golang 1.13.8, domaın.tld will NOT match '(?i)domain.tld$'. However, ſ will match s and K (Kelvin sign) will match k.

In node 13.8.0, domaın.tld will NOT match /domain.tld$/i, ſ will not match s and K (Kelvin sign) will not match k.
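Since the result differs per language and version, it helps to have lookalike inputs ready when testing a given target. The sketch below is one possible way to generate them. The function name lookalike_variants and the LOOKALIKES table are hypothetical helpers, and the table only covers the three characters discussed in this post (lowercase i, s and k).

```python
from itertools import product

# ASCII letter -> lookalike discussed in this post.
LOOKALIKES = {
    "i": "\u0131",  # LATIN SMALL LETTER DOTLESS I
    "s": "\u017f",  # LATIN SMALL LETTER LONG S
    "k": "\u212a",  # KELVIN SIGN
}


def lookalike_variants(text):
    """Yield every variant of text with at least one i/s/k swapped."""
    # For each character, offer the original and (if any) its lookalike.
    choices = [
        (ch, LOOKALIKES[ch]) if ch in LOOKALIKES else (ch,)
        for ch in text
    ]
    for combo in product(*choices):
        candidate = "".join(combo)
        if candidate != text:
            yield candidate


for variant in lookalike_variants("domain.tld"):
    print(variant)
```

Each generated value can then be submitted wherever the application validates a host or an email address.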

My advice: if you are a developer, try to avoid the i or IGNORECASE flag if you can. If you are a pentester or bug bounty hunter, make sure you test for this!
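For developers, one way to drop the flag entirely is to normalise the host and compare it as a plain string. The sketch below is only an illustration under a few assumptions: is_trusted_host and trusted_domain are hypothetical names, non-ASCII hosts are rejected outright (so internationalised names would have to arrive in their ASCII form), and the check is stricter than the original regular expression because it also requires a dot before the trusted domain.

```python
def is_trusted_host(host: str, trusted_domain: str = "domain.tld") -> bool:
    """Return True if host is trusted_domain or one of its subdomains."""
    # Reject non-ASCII input first, so lookalikes such as U+0131 or
    # U+212A never reach the case folding step below.
    if not host.isascii():
        return False

    # Safe to lowercase now: only ASCII letters can be present.
    normalized = host.lower()

    # Exact match, or a subdomain separated by a dot.
    return normalized == trusted_domain or normalized.endswith("." + trusted_domain)


print(is_trusted_host("DoMaIn.Tld"))
print(is_trusted_host("doma\u0131n.tld"))
```

Checking for ASCII before lowercasing matters: lowercasing the Kelvin sign produces a plain k, so normalising first would reintroduce the problem.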

Written by Louis Nyffenegger

Founder and CEO @PentesterLab
