Introduction to code review
This exercise covers the different ways to perform code review. It also contains a simple application to review to help you get started.
The application we are going to work on is a simple PHP application that allows users to upload and download files. Think of it as an over-simplified Dropbox.
Why bother doing a code review? There are multiple reasons:
- It can be faster than penetration testing. Some issues are really easy to spot during a code review (for example, a weak encryption algorithm), where others can take a lot more time (an XSS for example).
- Compliance may require you to perform a security code review (for example, as part of PCI DSS 6.3.2).
- After doing penetration testing for a while; you want to do something different.
- You want to find better bugs. Some bugs you will find during a code review can be surprisingly hard to discover with black-box testing.
- You want to check if some code is backdoored (which is actually really hard to do).
- You want to write an exploit for a bug.
There are plenty of ways to perform a code review. You can find some methods below:
- String matching/Grep for bugs.
- Following user inputs.
- Reading source code randomly.
- Read all the code.
- Check one functionality at a time (login, password reset...).
This is probably the fastest way to find low-hanging fruits; you just try to find patterns of known vulnerabilities. For example, you can use grep
to find calls to the PHP system
function:
$ grep -R 'system($_' *
You can find a list of regular expressions to try on your code base in the GRaudit project.
This approach suffers from many limitations:
- You don't get much coverage/assurance on the quality (and therefore security) of the source code. You only know that, based on your list of patterns, you couldn't find any issue.
- You need to know all the dangerous functions/patterns.
- You may end up using very complex regular expressions.
Another way to approach a review is to follow all the user-controlled inputs and identify all the ways to access the application (the routes/URIs available), for example in PHP:
$_POST
/$_GET
/$_REQUEST
.$_COOKIE
/$_SERVER
.- Data coming from the database (for stored XSSs and second-order injections for example).
- Data read from a file or a cache.
- ...
This method provides good coverage. However, you will need a good understanding of the framework/language used. Finally, you may end up reviewing the same function again and again if it's called multiple times.
The more time-consuming way: just start reading the code one file at the time. A better way to do this is to try to find weaknesses, not vulnerabilities. From there, you can see if the weaknesses can become vulnerabilities on their own or by chaining them.
This method is obviously the most laborious way of working, but it brings excellent coverage. It's crucial to keep good notes when using this approach.
Another common way to do a review is to pick one functionality (at a time), for example:
- "Password reset".
- "Database access".
- "Authentication".
And review all the code associated with this functionality. This works especially well if you do this across multiple applications/frameworks, as they will all have different behaviors.
This approach gives you an excellent coverage of the functionality reviewed and will teach you what mistakes people typically make for a given functionality. However, you only have coverage of what you reviewed.
When doing a review, you need to look for everything:
- Weird behaviors.
- Missing checks.
- Complexity.
- Security checks already in place.
- Differences between 2 functions/methods/classes.
- Comparisons and conditions (if/else).
- Regular expressions/string matching.
- What is missing?
You will probably encounter functions, classes and methods you aren't currently aware of. To solve this issue, you need to:
- Google them.
- Test their behavior.
It's going to take time (especially early on), but the more code you review, the easier it gets.
Make sure you create a snippet with the function/class/method to test its behavior. This will make future reviews far simpler. To test it, you will need to run the snippet locally and try to find some edge cases that the developers may have missed.
For this exercise, you have a really simple web application. We are going to go with the "read everything" approach as there isn't that much source code to read.
To get started, don't focus on finding vulnerabilities, try to find weaknesses, some thoughts to keep in mind are:
- "What doesn't seem right".
- "What happens if I put [X] here".
Then, try to see if these weaknesses can become vulnerabilities. Either on their own or by chaining them.
The application is straightforward and has a dozen security issues. As a user, you can only:
- Register/Login/Logout.
- Upload and retrieve files.
To get started, you can get the code:
- Using git:
git clone https://github.com/PentesterLab/cr
- As a Zip file: https://github.com/PentesterLab/cr/archive/master.zip.
You can use the tool cloc to get a better idea of the size of the application:
% cloc .
14 text files.
13 unique files.
2 files ignored.
github.com/AlDanial/cloc v 1.72 T=0.11 s (120.6 files/s, 46503.2 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
CSS 2 676 11 3973
PHP 10 48 4 289
SQL 1 5 0 5
-------------------------------------------------------------------------------
SUM: 13 729 15 4267
-------------------------------------------------------------------------------
You can find below the list of issues present in the application:
- Hardcoded credentials or secrets.
- Information leak.
- Missing security flags.
- Weak password hashing mechanism.
- Cross-Site Scripting.
- No CSRF protection.
- A directory listing.
- A cryptographic issue.
- A signature bypass.
- An authentication bypass.
- An authorization bypass.
- A remote Code Execution.
This section is intentionally left blank.
It's pretty easy to spot the hardcoded credentials or secrets if you read all the code. A modern web application should store all secrets outside of its source code. Here we can see that the following files contain some sensitive information:
classes/db.php
:
$lnk = mysql_connect("127.0.0.1", "pentesterlab", "pentesterlab");
classes/jwt.php
:
return hash("sha256","donth4ckmebr0".$data);
During the deployment of the application, secrets and credentials should be loaded into the application (via environment variables or a configuration file). This will limit access to production secrets/credentials for developers.
To find this issue, you need to think about the way the application will get deployed. The web root of the application is the main directory of the source code. Therefore, the following files and directories will be publicly available:
.git
: this can be used to rebuild the full source code of the application.deploy.sql
: this contains the database password in cleartext and the hashedadmin
password.
The application uses setcookie
to set cookies for authentication. However, it doesn't set the secure
and HttpOnly
flags in the following files:
login.php
:
setcookie("auth", User::createcookie($_POST['username'], $_POST['password']));
register.php
:
setcookie("auth", User::createcookie($_POST['username'], $_POST['password']));
If you look at the register function in classes/user.php
:
public static function register($user, $password) {
$sql = "INSERT INTO users (login,password) values (\"";
$sql.= mysql_real_escape_string($user);
$sql.= "\", md5(\"";
$sql.= mysql_real_escape_string($password);
$sql.= "\"))";
$result = mysql_query($sql);
You can see that the application uses md5
to hash passwords and the hashing is done without any seed
. Furthermore, the hashing of the passwords is done by the database (as opposed to the application). This means that the customers' passwords are likely to end up in the database logs or transmitted in cleartext between the database and the application. Instead, the application should use scrypt
, bcrypt
or PBKDF2
to store passwords.
The application uses the h
function to escape data coming from the users. This function is just an alias for htmlentities
defined in classes/phpfix.php
:
function h() {
return call_user_func_array("htmlentities", func_get_args());
}
We can see that h
is not used in the following places:
index.php
:
<span class="text text-danger"><b><?php echo $error; ?></b></span>
login.php
:
<span class="text text-danger"><b><?php echo $error; ?></b></span>
register.php
:
<span class="text text-danger"><b><?php echo $error; ?></b></span>
However, $error
is not under the user's control. Therefore, those are only weaknesses, not vulnerabilities. They should still get fixed to future-proof the application. The exploitable XSS is located in index.php
:
<form action="<?php echo $_SERVER['PHP_SELF']; ?>" method="post" enctype="multipart/form-data">
As $_SERVER['PHP_SELF']
is user-controlled, it can be used to trigger an XSS (using a path like /index.php/">script...
).
There is also another XSS in classes/user.php
that can be triggered by registering the same username multiple times (due to the call to mysql_error
).
We can also see that none of the forms are protected by a CSRF token. This is an example of the "what is missing ?" bugs, you cannot see them because they are not there.
When users retrieve the list of their files, the following code is called (in classes/user.php
):
public static function getfiles($user) {
$base = "files/".$user;
if (!file_exists($base)) {
mkdir($base);
}
return array_diff(scandir($base), array('..', '.'));
}
This function gets the list of files (scandir
) in files/[USERNAME]
, and removes the parent (..
) and current (.
) directory from the list using array_diff
.
Nothing in this code (or during registration) prevents an attacker from using the following usernames: ..
, ../..
, ../../..
. This will allow the attacker to retrieve a directory listing of any directory on the server hosting the application.
The cryptographic issue is located in classes/jwt.php
in the signing function:
public static function signature($data) {
return hash("sha256","donth4ckmebr0".$data);
}
Here we can see that the application is using a hash
instead of an HMAC
. This creates multiple issues:
- It's likely to be vulnerable to Length Extension.
- It's not the RFC way to sign a
JWT
token. This implementation will not work with other applications.
The signature bypass is located in classes/jwt.php
:
public static function verify($auth) {
list($h64,$d64,$sign) = explode(".",$auth);
if (!empty($sign) and (JWT::signature($h64.".".$d64) != $sign)) {
die("Invalid Signature");
}
We can see that the application gets three elements from the $auth
variable:
$h64
.$d64
.$sign
.
Then it checks the signature $sign
using: JWT::signature($h64.".".$d64) != $sign
. However, it only checks the signature if one is provided: if (!empty($sign) and...
.
An attacker can provide a malicious $auth
variable that doesn't contain a signature. This will allow them to bypass the signature mechanism.
The authentication bypass is based on a real vulnerability in the Play framework covered in another PentesterLab exercise: Play Session Injection.
Due to the weird JSON
parser:
public static function parse_json($str) {
$data = explode(",",rtrim(ltrim($str, '{'), '}'));
$ret = array();
foreach($data as $entry) {
list($key, $value) = explode(":",$entry);
$key = rtrim(ltrim($key, '"'), '"');
$value = rtrim(ltrim($value, '"'), '"');
$ret[$key] = $value;
}
return $ret;
}
and the way the JSON
is created:
$header = str_replace("=","",base64_encode('{"alg":"HS256","iat":'.time().'}'));
$token = "{";
foreach($data as $key=>$value) {
$token.= '"'.$key.'":"'.$value.'",';
}
$token .= "}";
It's possible to create a malicious user named test","username":"admin
to become the admin
in the application.
The authorization bypass is very simple. When a file is uploaded, it gets saved in the directory: /files/[USERNAME]/[FILENAME]
:
public static function addfile($user) {
$file = "files/".$user."/".basename($_FILES["file"]["name"]);
if (!preg_match("/\.pdf/", $file)) {
return "Only PDF are allowed";
} elseif (!move_uploaded_file($_FILES["file"]["tmp_name"], $file)) {
return "Sorry, there was an error uploading your file.";
}
It's trivial for an attacker to guess [USERNAME]
and [FILENAME]
to gain access to other users' files. The developer should always implement some form of access control, or make the links impossible to guess.
The RCE is located in the addfile
function:
public static function addfile($user) {
$file = "files/".$user."/".basename($_FILES["file"]["name"]);
if (!preg_match("/\.pdf/", $file)) {
return "Only PDF are allowed";
} elseif (!move_uploaded_file($_FILES["file"]["tmp_name"], $file)) {
return "Sorry, there was an error uploading your file.";
}
We already know that the uploaded files are stored in the web root. We can also see that the developer restricts the uploadable file type to PDF
(.pdf
). However, the regular expression is missing a $
to match the end of the string. As a result, it's possible for an attacker to upload a file named exec.pdf.php
. As the file's extension is .php
and the file is in the web root, it will get executed when accessed by the attacker.
This exercise showed you how to perform your first code review to find potential security issues.
I hope the course provides you with enough information and the methodology to get started. As always with security, practice makes perfect. Try to regularly review some code, this could be from:
- Changes/patches introduced to fix a CVE.
- Applications you use every day.
- Applications in a weird language.
Now it's time to apply what you've just learned!
I hope you enjoyed learning with PentesterLab.