Blog of technology, infosec and other miscellaneous things.
Posted on 06 March 2012.
It seems that many penetration testers do not do much to test cryptographic vulnerabilities. I've always been interested in cryptography, so I've made it a goal of mine to understand how web application developers misuse crypto, and how to exploit those flaws.
In January I had some time at work to do some independent research, so I decided to figure out how to perform hash length extension attacks against poorly implemented message authentication codes (MACs). I found several good research papers and blog posts discussing how these attacks work in a very general sense, but not much that explained what was going on under the hood. In this post, I'll be explaining what is happening in a length extension attack.
Message authentication codes (MACs) are a way to verify the authenticity of a message. In the more naive implementation of a MAC, the server has a secret key that it concatenates with a message, and then hashes the combination with an algorithm such as MD5 or SHA1. Take for example, an application that is designed to give a user the ability to download specific files, so long as the user has authorization to do so. The site may create a MAC for the filename like this:
def create_mac(key, fileName)
return Digest::SHA1.hexdigest(key + fileName)
end
The resulting URL might look something like:
http://example.com/download?file=report.pdf&mac=563162c9c71a17367d44c165b84b85ab59d036f9
When a user sends the request to download a file, the following function is executed:
def verify_mac(key, fileName, userMac)
validMac = create_mac(key, filename)
if (validMac == userMac) do
initiateDownload()
else
displayError()
end
end
With this code, the server should only call initiateDownload if the user has not tampered with the filename... or so the theory goes. In reality, this method of creating a MAC leaves the site vulnerable to an attack where an attacker can append their own content to the end of the file parameter.
Cryptographic hash functions such as MD5, SHA1, SHA2 etc are based upon a construct known as Merkle–Damgård. An interesting issue with this type of hash function is that if you have a message that was concatenated with some secret and the resulting hash of the concatenated value (the MAC), and you know the length of the secret, you can add your own data to the message and calculate a value that will pass the MAC check, without knowing the secret.
Example:
message + padding + extension
Continuing the example from above, an extension attack against the hypothetical file download MAC would look like this:
http://example.com/download?file=report.pdf%80%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%A8/../../../../../../../etc/passwd&mac=ee40aa8ec0cfafb7e2ec4de20943b673968857a5
To understand why this attack works, you must first understand some of what happens inside a hash function.
Hash functions work on blocks of data. 512 bits is the block length for both MD5, SHA1 and SHA256. Most messages that are hashed don't have a length that is evenly divisible by a hash function block length, so the message must be padded to match a multiple of the block length. So with the file download MAC example, the message after padding would look like this (the 'x's represent the secret key):
xxxxxxxxxxxreport.pdf\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xA8
In SHA1, the algorithm being used in this example, a hash consists of a series of five integers. When displayed, these integers are usually in hexadecimal format and concatenated together. The initial value, aka the registers, are set to this value when the algorithm is ran: 67452301, EFCDAB89, 98BADCFE, 10325476, C3D2E1F0. Then once the message has been padded, it is broken up into 512 bit blocks. The algorithm goes through these blocks and does a series of calculations with the blocks that update the registers. Once these calculations are completed, the contents of the registers are the resulting hash of the message.
The first step in calculating an extension is creating a new MAC. To do this, we must hash the contents of what we are extending the hash with, '/../../../../../../../etc/passwd' in our example. However, when we perform this hash, we must override the initial registers with the MAC from the original message (SHA1(key+message)). You can think of this as making our SHA1 function start off at the state where the server's hash function left off.
Attacker's MAC = SHA1(extension + padding) <- but with overridden registers
For this attack to work, the extension must be in it's own block when it goes into the server's hash function. The second step is to calculate enough padding such that key + message + padding == some multiple of 512 bits. In this example, the key is 11 characters long. Therefore, the padded message would look like this:
report.pdf\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xA8
We then send our padded and extended message to the server, with our new MAC:
http://example.com/download?file=report.pdf%80%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%A8/../../../../../../../etc/passwd&mac=ee40aa8ec0cfafb7e2ec4de20943b673968857a5
What the server hashes when it hashes the attacker's hacked message is this: secret + message + padding to the next block + extension + padding to the end of the block. The result of the server's hash will be ee40aa8ec0cfafb7e2ec4de20943b673968857a5, which is what we got when we hashed our extension while overriding the registers with the original MAC, because we essentially started off with our hash where the server is at when it is half way through hashing out attack.
For simplicity, in this example I revealed that the key length was 11 characters. In a real world attack, an attacker won't know the length of the key beforehand. Therefore, the attacker needs a way to determine the key length.
Continuing the example, lets say that our vulnerable website returns different errors (HTTP response codes, error messages in a response body, etc) when a MAC validation fails vs when validation succeeded but the file was not found. An attacker can then calculate multiple extensions, one for each possible key length, and send each to the server. When the server responds with an error indicating the file was not found, then a length extension vulnerability has been found, and the attacker is free to calculate new extensions aimed at gaining unauthorized access to sensitive files on the server.
The solution to this vulnerability is to use an algorithm known as HMAC. Instead of just hashing the key concatenated with the message, HMAC does something like this:
MAC = hash(key + hash(key + message))
How HMAC actually works is a bit more complicated, but that is the general idea. The important part is that since the key is hashed into the message twice, it is not vulnerable to the extension attack described in this post. HMAC was first published in 1996, and has since been implemented in just about every programming language's standard library.
Though there are still some crazy people out there that write their own cryptographic algorithms, people have gradually gotten the idea that writing your own crypto is a bad idea. However, it is not enough to use publicly vetted crypto algorithms: you've got to use those algorithms in the right way. Unless you really know how the algorithms you are using work and how to use them correctly, it is better to rely on professionally vetted high level libraries that take care of the low level stuff for you.
Posted on 26 February 2012.
As I've not updated this site in well over a year, and do not feel particularly attached to any of my old content, I recently decided to delete the old site and start from scratch. Also, I've gotten tired of always updating my WordPress installation. This, along with the fact that I'm sufficiently paranoid that I worry about having a database and server-side code running on my VPS, seemed like the perfect excuse to play with some new tools that I've been meaning to try. In this post I'll outline the tools I used to create crashsystems.net 2.0.
The best way to describe Jekyll is that it is like having a WordPress theme without the the rest of WordPress, and without having to bother with writing PHP. Templates are created with the Liquid templating language, and content is written with Markdown, a simple system for expressing content formatting in plaintext. Jekyll converts the Markdown content to HTML, runs the Liquid templates to assemble the site, and then outputs static HTML.
This setup has several advantages, the first being security. Since I'm not running a database server or any server-side scripting language on this site, there is much less for an attacker to target. Secondly, not running database queries means my site runs much faster. Theoretically, my site performance is only limited by the web server I'm running (more on that in a bit), disk I/O and connection throughput. Thirdly, my content is plain text stored in files, so I can edit it in any program I want, and I don't have to worry about database backups.
Comments don't work with static files, so I'm outsourcing comments to Disqus. Also, for site search I'm using a bit of JavaScript to redirect users to a Google search using the site: parameter.
Foundation is a CSS framework for creating responsive website layouts, meaning that one site template can automatically adjust to varying device form factors. Basically, I created this layout for desktops, and got a mobile site automatically, with no extra effort. In addition to simplifying the development of layouts, Foundation also abstracts many of the tedious aspects of using CSS. I'm far from being a graphics designer, but with Foundation I was able to quickly construct a decent site design.
Apache is a very versatile web server, but it also seems somewhat slow and resource intensive. After I did some research, I found that the Nginx web server seems to be one of the fastest servers, especially for serving static content. The Nginx installation process was quite simple, and this guide was quite useful for performance tuning the server.
The source code for my site can be found on GitHub. Now that the site is set up, I plan on posting somewhat more frequently, about technology, information security, crypto and perhaps a bit of photography.
Previous Page: 1 of 1 Next