Yesterday, I joined the 0x00sec IRC channel and, as many other times, @dtm came up with an interesting concept… and I had to try it. The idea was pretty challenging and I have not come up with a full implementation, but I managed to get a minimal Proof of Concept program to illustrate the concept and, maybe, to be used as a starting point for other people more interested in the topic than me.
The IBI Crypter
Sure, we were talking about crypters and I named mine IBI crypter. IBI stands for Instruction by Instruction Crypter, and that is the concept. Instead of decrypting the whole application at once or even a block, this crypter tries to decrypt just the instruction that has to be executed at any time. After running that instruction, the code is immediately crypted again so a live memory dump of the process will still be encrypted. This is why I’m referring to this concept as a JIT (Just in Time) crypter.
During our discussion we realised that the problem is not trivial, and I have to admit that it is more challenging than I initially expected. I’ll let you know about the issues at the end of this paper.
Before diving into the code, let me explain how the crypter works.
An Embedded Debugger
The idea is pretty simple. We start with a program with some crypted sections. The program starts tracing itself and sets a break point at the first function to run. Once that function is executed, the program execution will stop and the embedded debugger will take control. Then it will decrypt the next instruction to run and, after that, it will execute that instruction.
Once the instruction is executed, the debugger takes control of the process again, crypts the instruction just executed and repeats the process until the code is completely executed.
The concept is pretty simple but the implementation is not that trivial as we will see in a while.
To follow the rest of the paper, it may be useful for you to also read this other text:
https://0x00sec.org/t/linux-infecting-running-processes/1097
Breakpoints on Intel Processors
As we already know how to trace a program and access its memory and registers (yes, you should read the link I have just mentioned), we only need to know how to set a break point. Again the concept itself is pretty simple, at least for the traditional way of setting breakpoints.
Breakpoints make use of the processor instruction int 3 (opcode 0xcc). This is a 1 byte processor instruction and stops the current processor execution and runs some code defined by us. So, the process to set a break point is as follows:
- Store the content at the memory address where we want to set our break point.
- Write the int 3 instruction at that address.
- Run the program.
Whenever the program reaches the int 3 instruction, the program will stop at that point and our code will take control and it can do whatever it needs to do, in our case, decrypt the code to execute.
Once we are done, we have to do a couple more things to restore the execution of the original program:
- Copy back the original byte (the one we stored when we set the break point) to its original position.
- Decrease the IP (Instruction Pointer). As you know, the IP always points to the next instruction to be executed. In this case, this is the address of our breakpoint plus 1 (the size of the int 3 instruction). We want to decrease it in order to run the original instruction (that we broke injecting the int 3 opcode).
- Give control back to the original process, or ask the process to just run the next instruction.
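The two lists above can be modelled without any debugger at all. The sketch below is only an illustration: it treats a plain byte array as the program memory and a size_t as the IP, and the names struct breakpoint, bp_set and bp_clear are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define INT3 0xcc              /* opcode of the 1-byte int 3 instruction */

/* Hypothetical bookkeeping for one breakpoint. */
struct breakpoint {
    size_t        addr;        /* where the int 3 was written */
    unsigned char saved;       /* original byte at that address */
};

/* Store the original byte, then overwrite it with int 3. */
struct breakpoint bp_set(unsigned char *mem, size_t addr)
{
    struct breakpoint bp = { addr, mem[addr] };
    mem[addr] = INT3;
    return bp;
}

/* Undo the breakpoint after it has been hit: put the original byte back
 * and move the IP from "just after int 3" back to the original instruction. */
void bp_clear(unsigned char *mem, const struct breakpoint *bp, size_t *ip)
{
    assert(*ip == bp->addr + 1);   /* the trap leaves the IP one byte past int 3 */
    mem[bp->addr] = bp->saved;
    *ip -= 1;
}
```

In a real debugger the array accesses become reads and writes of the traced process memory, and the IP comes from its register set, but the bookkeeping stays the same.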
Overall the concept is pretty straightforward.
SingleStep Execution
Fortunately for us, the ptrace interface offers a request to execute a single instruction. Otherwise we would have to add some code to figure out the size of the current instruction (it can be 1 to 15 bytes) in order to know where to set up our next breakpoint.
The PTRACE_SINGLESTEP request will do that for us. It will just run one instruction and give us control back. This is actually the last piece of the puzzle to build our simple proof of concept.
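To get a feeling for PTRACE_SINGLESTEP in isolation, here is a minimal sketch (Linux only) that is not part of the IBI crypter. A child asks to be traced with PTRACE_TRACEME and stops itself, and the parent steps it one instruction at a time until it exits. The function name count_steps is made up for this example.

```c
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, single-step it until it exits, and return the number of
 * steps taken (or -1 on error). */
long count_steps(void)
{
    int status;
    long steps = 0;
    pid_t pid = fork();

    if (pid < 0) return -1;
    if (pid == 0) {
        /* Child: ask to be traced, then stop so the parent can take over. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0) _exit(1);
        raise(SIGSTOP);
        _exit(0);
    }

    /* Parent: wait for the child's initial SIGSTOP. */
    if (waitpid(pid, &status, 0) < 0) return -1;

    while (WIFSTOPPED(status)) {
        /* Run exactly one instruction in the child, then regain control.
         * The last argument is 0, so no signal is delivered to the child. */
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) < 0) return -1;
        if (waitpid(pid, &status, 0) < 0) return -1;
        steps++;
    }
    return steps;
}
```

Calling count_steps() from main and printing the result shows how many instructions the child ran between stopping itself and exiting. The exact number depends on the libc and the machine.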
Let’s look into it.
The Proof of Concept
To illustrate the technique we have chosen a very basic application that checks the validity of a user-provided code. The function that does the check will be crypted (and only that function) and it will be executed using the awesome IBI Crypter :P.
Let’s start with the main program and our check function. We have called it target.c
#include <stdio.h>
#include "stub.h"
#define CRYPT_ME __attribute__((section(".secure")))
// This function is crypted
CRYPT_ME int
check_key (unsigned char *str)
{
int i;
unsigned char *p = str;
while (*p) {*p -= '0'; p++;};
if (str[0] + str[1] != 5) return 1;
if (str[2] * str[3] != 10) return 1;
return 0;
}
int
main (int argc, char*argv[])
{
_stub (check_key); // Setup run environment
printf ("Code is %s\n", check_key (argv[1]) ? "INCORRECT": "CORRECT");
}
As you can see, the check_key function in the code above does a couple of stupid checks on the key it receives as a parameter, and returns 0 if the key is valid or 1 otherwise. The main function is also pretty simple. It first runs our _stub and then just prints CORRECT or INCORRECT based on the result of the check_key function.
The off-line crypter I will be using is the same one described in this paper:
https://0x00sec.org/t/a-simple-linux-crypter/537
We push the functions to secure into a separate section for easy identification (see the CRYPT_ME macro). Then we XOR it, as described in the paper I have just mentioned.
The interesting stuff is in the _stub function. Let’s look at it.
The _stub Function
The _stub function has two main parts. First, we set up a breakpoint in the crypted function we want to run (check_key in this case). Once we get there, we will start stepping over the function instruction by instruction.
Let’s go with the breakpoint
int
_stub (void *ep)
{
void *bp_ip;
long ip1, op1, op2;
struct user_regs_struct regs;
int status, cnt;
printf ("%s", "0x00pf IbI Crypter Stub\n");
// Start debugging!!!
if ((_pid = fork ()) < 0) PERROR("fork:");
if (_pid == 0) return 0; // Child process just keeps running
else
{
// Father starts debugging child
if ((ptrace (PTRACE_ATTACH, _pid, NULL, NULL)) < 0) PERROR ("ptrace_attach:");
printf ("%s", "+ Waiting for process...\n");
wait (&status);
bp_ip = ep;
// Set the breakpoint and run until we get there...
op1 = ptrace (PTRACE_PEEKTEXT, _pid, bp_ip);
DPRINTF("BP: %p 1 Opcode: %lx\n", bp_ip, op1);
if (ptrace (PTRACE_POKETEXT, _pid, bp_ip,
(op1 & 0xFFFFFFFFFFFFFF00) | 0xcc) < 0) PERROR ("ptrace_poke:");
// Run until breakpoint is reached.
if (ptrace (PTRACE_CONT, _pid, 0, 0) < 0) PERROR("ptrace_cont:");
wait (&status);
ptrace (PTRACE_GETREGS, _pid, 0, &regs);
DPRINTF ("Breakpoint reached: RIP: %llx\n", regs.rip);
regs.rip--;
ptrace (PTRACE_SETREGS, _pid, 0, &regs);
// Restore opcode
ptrace (PTRACE_POKETEXT, _pid, bp_ip, op1);
I hope the code is easy to understand. It creates a new process and starts debugging it. Immediately we set the breakpoint (opcode 0xcc) at the address received as a parameter, which in our case is the check_key function (check the main function above).
Once the break point is set, we just let the program run using PTRACE_CONT and wait for the program to hit the break point… i.e. we wait until the function we want to decrypt gets executed.
The program will eventually call the check_key function (that is actually the next line in the main function) and the _stub code will take control back. Then we have to get the IP register, decrease it by one byte (as explained above), set the register value back and also restore the opcode where the int 3 was inserted.
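One detail of the listing is worth spelling out: PTRACE_PEEKTEXT and PTRACE_POKETEXT transfer a whole word (8 bytes on x86-64), not a single byte, so the int 3 has to be merged into the word that was read. The helper names below are made up for this sketch, which assumes a little-endian 64-bit target.

```c
#include <assert.h>
#include <stdint.h>

/* Replace the lowest byte of a 64-bit word with int 3 (0xcc).
 * On little-endian x86-64 the lowest byte of the word is the byte stored
 * at the address the word was read from, so the other 7 bytes of code
 * are left untouched. */
uint64_t with_breakpoint(uint64_t word)
{
    return (word & ~UINT64_C(0xff)) | UINT64_C(0xcc);
}

/* The byte that with_breakpoint() overwrites, i.e. the one to restore. */
uint8_t first_byte(uint64_t word)
{
    return (uint8_t)(word & 0xff);
}
```

The stub above sidesteps the second helper by keeping the whole original word in op1 and writing it back unchanged.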
Time to run the function.
Decrypting, Running, Encoding and again
At this point we have stopped the application just at the beginning of the check_key function and in order to continue the execution we have to decrypt it as we go.
This is the code that does the trick
// Start step by step debugging
ip1 = (long) ep;
cnt = 0;
while (WIFSTOPPED (status))
{
cnt ++;
// Read up to 16 bytes to get the longest instruction possible
// Decode and write back the decoded code to execute it
op1 = ptrace (PTRACE_PEEKTEXT, _pid, ip1);
op2 = ptrace (PTRACE_PEEKTEXT, _pid, ip1 + 8);
DPRINTF ("%lx :: OPCODES : %lx %lx\n", ip1, op1, op2);
XOR(op1);
XOR(op2);
DPRINTF ("%lx :: DOPCODES: %lx %lx\n", ip1, op1, op2);
ptrace (PTRACE_POKETEXT, _pid, ip1, op1);
ptrace (PTRACE_POKETEXT, _pid, ip1 + 8, op2);
/* Make the child execute another instruction */
if (ptrace(PTRACE_SINGLESTEP, _pid, 0, 0) < 0) PERROR ("ptrace_singlestep:");
wait(&status);
// Re-encode the instruction just executed so we do not have
// to count how many bytes got executed
XOR(op1);
XOR(op2);
ptrace (PTRACE_POKETEXT, _pid, ip1, op1);
ptrace (PTRACE_POKETEXT, _pid, ip1 + 8, op2);
// Get the new IP
ptrace (PTRACE_GETREGS, _pid, 0, &regs);
ip1 = regs.rip;
// If code is outside .secure section we stop debugging
if ((void*)ip1 < secure_ptr || (void*)ip1 > secure_ptr + secure_len)
{
printf ("Leaving .secure section... %d instructions executed\n", cnt);
break;
}
}
ptrace (PTRACE_CONT, _pid, 0, 0);
wait (&status);
}
printf ("DONE\n");
exit (1);
}
The function is a bit verbose but conceptually very simple. The XOR macro just applies the XOR encoding to a long (8 bytes) with a predefined key. You can check the details in the full source code (check at the end). At this point, it is not relevant.
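As an illustration only, this is roughly what such an encoding could look like as a function. The key value and the name xor_word are assumptions of this sketch, not taken from the real source. A single repeated key byte is assumed because it makes the result independent of where the 8-byte word starts in memory, which matters here since the IP can point anywhere.

```c
#include <assert.h>
#include <stdint.h>

#define KEY_BYTE 0xa5   /* hypothetical key, not the one used in the real code */

/* XOR every byte of a 64-bit word with KEY_BYTE.
 * Applying the function twice gives back the original word, so the same
 * code both decodes and re-encodes. */
uint64_t xor_word(uint64_t w)
{
    /* 0x0101010101010101 * KEY_BYTE repeats the key byte in all 8 positions. */
    return w ^ (UINT64_C(0x0101010101010101) * KEY_BYTE);
}
```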
If you check the format of the Intel opcodes, you will find out that an instruction on a 64-bit architecture may take up to 15 bytes. As we do not know (and we do not really want to know) anything about the next instruction to run, or in other words, we do not want to decode the opcodes ourselves, we have to decrypt up to 16 bytes to cover the longest possible instruction. Yes, in general, at any given time there is more than one instruction decoded in memory.
This is why we do two PTRACE_PEEKTEXT calls, to read the current address and that address plus 8 bytes (longs are 8 bytes long). Once we have read the 16 bytes that contain the next instruction to run, we just decrypt them by applying our XOR macro, and we update the memory using PTRACE_POKETEXT so that the next program instruction is now correct.
At this point we can just run the next instruction using PTRACE_SINGLESTEP and wait until the instruction is executed and control gets back to us.
Then we just need to encode again the 16 bytes of memory we decoded. This is not just to keep the program encrypted most of the time, but also to avoid some tricky logic to keep at least 16 bytes decoded in the program memory.
The final check in the while loop checks whether the current IP is still in the .secure section or has moved into another executable section… more on this at the end of the text.
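That final check can be isolated into a small helper. The sketch below is one possible way to write it, with a made-up name (in_section) and a half-open range. Note that the listing above compares with `>`, so there the first byte after the section still counts as inside; the version below excludes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr lies inside the half-open range [start, start + len).
 * Subtracting first avoids any overflow in start + len. */
bool in_section(uintptr_t addr, uintptr_t start, uintptr_t len)
{
    return addr >= start && addr - start < len;
}
```

With start and len taken from secure_ptr and secure_len, the loop would stop stepping as soon as in_section returns false for the new IP.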
Testing
Testing the program is not that straightforward. For the time being we have to do some manual tasks to make it work. It is not hard to fully automatise the process so I leave it as an exercise to you.
First we have to compile the target program:
$ gcc -o target target.c stub.c
Then we have to crypt the .secure section, using the crypter_rt tool (provided with the code).
$ ./crypter_rt target
Finally we need to manually fix the section information in stub.c and redo the process. To get the information stub needs about the section just run this command:
$ readelf -S target | grep secure
[14] .secure PROGBITS 0000000000400e22 00000e22
The last two numbers in this line have to go into the variables secure_ptr and secure_len in stub.c. This information is used to figure out when the execution leaves the secure section and reaches code that is not crypted.
Recompile and rebuild and you should be able to run the program like this:
$ ./target 1425
0x00pf B3Crypt Stub (Byte By Byte)
+ Waiting for process...
Leaving .secure section... 74 instructions executed
Code is CORRECT
DONE
$ ./target 1426
0x00pf B3Crypt Stub (Byte By Byte)
+ Waiting for process...
Leaving .secure section... 75 instructions executed
Code is INCORRECT
DONE
The Tricky part
This example is very simple on purpose. Making a usable version will require some effort and I do not really have a need for this tool so I do not think I will go further on this implementation. However these are a couple of things to do, in order to extend this PoC into a usable tool:
- The first thing to do is to extend the stub to access the ELF header and get the information associated to the .secure section, so you do not need to update the source code and re-compile.
- The second is more tricky. You have to detect jumps/calls to functions outside the .secure section, as for instance the standard C library (have you seen a single printf in the check_key function?). In those cases we have to set a break point just after the call in order to run the function without decoding it (those functions are not encoded) and to restart the decoding when the function returns… The last check in the function may give you some hints on how to proceed.
- I haven’t extensively tested the program. I just made it work on my machine, it is not optimised and it may have some timing issues.
- This only works on 64-bit platforms… should be easy to make it work on 32-bit… but in any case it is x86 specific. For ARM or MIPS you need to figure out how to set breakpoints.
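For the first item in the list, a starting point could look like the sketch below. It is not part of the PoC: the function name find_section is made up, it only handles 64-bit ELF files, and it returns the address stored in the section header, which matches the run-time address only for non-PIE binaries (for a PIE the load base would have to be added).

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up a section by name in a 64-bit ELF file.
 * Returns 0 and fills addr/size on success, -1 otherwise. */
int find_section(const char *path, const char *name,
                 unsigned long *addr, unsigned long *size)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    char *names = NULL;
    size_t nlen, i;
    int rc = -1;
    FILE *f = fopen(path, "rb");

    if (!f) return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1) goto out;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) goto out;
    if (eh.e_ident[EI_CLASS] != ELFCLASS64) goto out;
    if (eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum) goto out;
    if (eh.e_shentsize != sizeof(Elf64_Shdr)) goto out;

    /* Load the section header table. */
    sh = malloc((size_t)eh.e_shnum * sizeof *sh);
    if (!sh) goto out;
    if (fseek(f, (long)eh.e_shoff, SEEK_SET) != 0) goto out;
    if (fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum) goto out;

    /* Load the string table that holds the section names. */
    nlen = sh[eh.e_shstrndx].sh_size;
    if (nlen == 0) goto out;
    names = malloc(nlen);
    if (!names) goto out;
    if (fseek(f, (long)sh[eh.e_shstrndx].sh_offset, SEEK_SET) != 0) goto out;
    if (fread(names, 1, nlen, f) != nlen) goto out;
    names[nlen - 1] = '\0';   /* make sure strcmp cannot run off the end */

    /* Walk the sections and compare names. */
    for (i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < nlen &&
            strcmp(names + sh[i].sh_name, name) == 0) {
            *addr = sh[i].sh_addr;
            *size = sh[i].sh_size;
            rc = 0;
            break;
        }
    }

out:
    free(names);
    free(sh);
    fclose(f);
    return rc;
}
```

On Linux the stub could call it as find_section("/proc/self/exe", ".secure", &addr, &size) at start-up and use the results instead of the hard-coded secure_ptr and secure_len.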
Well, this is it. I think this proves the concept is feasible and it is up to you to make it work. I see this more as a SW protection mechanism than as a malware development technique… WTF… they are the same thing
As usual you can get the complete source code from my github repo
Any comments are welcome.