# Web Content Discovery

To find hidden directories or files, we can enumerate them manually or with automated tools.

### Manual Discovery <a href="#manual-discovery" id="manual-discovery"></a>

```shellscript
# Settings files
/robots.txt
/security.txt
/.well-known/security.txt
/.well-known/apple-app-site-association
/.well-known/assetlinks.json
/sitemap.xml
/sitemaps.xml

# JavaScript files
/main.js
/script.js
/js/jquery.min.js
/js/main.js
/js/script.js

# CGI scripts
/cgi-bin/example.cgi

# Tilde-prefixed directories
/~files/
/~hidden/

# PHP files
/index.php
/config.php
/403.php
/404.php

# Python files
/main.py
/module.py
/module/__init__.py
/modules/__init__.py
/__init__.py
/config.ini
/project.wsgi

# Archives
/example.zip
/backup.zip
/backups.zip

# Backup files
/example.bak
/example.jpg.bak
/images/example.jpg.bak

# Directories
/admin/
/blog/

# Sensitive information
/.env

# Git and GitHub
/README.md
/.git
/.github
/.gitignore

# Apache Tomcat
/manager

# ASP.NET
/trace.axd
/example.asp
/example.aspx
/example.aspx/trace.axd
/web.config

# If you know which users manage the website, try their usernames
/admin
/administrator
/john
/michael

# API endpoints
/api/login
/api/signin
/api/user
/api/user/1
/api/users
/api/v1/
/api/v2/

# If we found a secret keyword during investigation, we can try to access the following paths.
/<keyword>
/<keyword>.html
/<keyword>.txt
/<keyword>.php
/<keyword>.py
/?<keyword>=test

# We might be able to access directories by using keywords we found.
/<site_title>
/<site_theme>
/<site_author>
/<image_theme>
/?<post_param>=test
```
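The keyword-based paths in the list above can be generated rather than typed by hand. The following is a minimal sketch, assuming a keyword was already found during investigation and that `curl` is installed. The function names `keyword_paths` and `probe_paths` are hypothetical helpers written for this example, not part of any tool.

```shell
# keyword_paths: print the candidate paths for one keyword (hypothetical helper).
keyword_paths() {
  local kw="$1" ext
  printf '/%s\n' "$kw"
  for ext in html txt php py; do
    printf '/%s.%s\n' "$kw" "$ext"
  done
  printf '/?%s=test\n' "$kw"
}

# probe_paths: read paths from stdin and print "<status> <url>" for each one.
# $1 is the base URL without a trailing slash (an assumption of this sketch).
probe_paths() {
  local base="$1" path code
  while IFS= read -r path; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "${base}${path}")
    printf '%s %s%s\n' "$code" "$base" "$path"
  done
}
```

For example, `keyword_paths secret | probe_paths https://example.com` requests each candidate once and prints its status code.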

### Wordlists <a href="#wordlists" id="wordlists"></a>

#### CeWL <a href="#cewl" id="cewl"></a>

[CeWL](https://github.com/digininja/CeWL) is a custom wordlist generator that builds wordlists from websites.

```
# -d: Depth (default: 2)
# -w: Write the output to the file
cewl -d 3 https://example.com/ -w output.txt
```
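The generated list may contain mixed-case duplicates and very short words. The following is one possible cleanup step, not something CeWL requires. The helper name `normalize_wordlist` and the default minimum length of 3 are assumptions for illustration.

```shell
# normalize_wordlist: lowercase stdin, drop words shorter than MIN characters,
# then sort and de-duplicate. MIN defaults to 3 (an arbitrary choice).
normalize_wordlist() {
  local min="${1:-3}"
  tr '[:upper:]' '[:lower:]' \
    | awk -v min="$min" 'length($0) >= min' \
    | sort -u
}
```

Usage: `normalize_wordlist 4 < output.txt > wordlist.txt`. Lowercasing can hide case-sensitive paths, so it may be worth keeping the original file as well.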

#### SecLists <a href="#seclists" id="seclists"></a>

[SecLists](https://github.com/danielmiessler/SecLists) is a collection of multiple types of lists.\
It is usually located in */usr/share/seclists/* on Linux.

```
less /usr/share/seclists/Discovery/Web-Content/common.txt
```
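Some wordlists, for example the DirBuster-derived directory lists, start with `#` comment lines that would otherwise be sent as requests. The following sketch strips them before fuzzing. `strip_wordlist_comments` is a hypothetical helper name.

```shell
# strip_wordlist_comments: drop lines that start with '#' and blank lines.
# Note: a real entry beginning with '#' would also be removed.
strip_wordlist_comments() {
  grep -v -e '^#' -e '^[[:space:]]*$'
}
```

Usage: `strip_wordlist_comments < wordlist.txt > clean.txt`. When using ffuf, the `-ic` flag (ignore wordlist comments) achieves a similar result without a separate step.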

### Automation <a href="#automation" id="automation"></a>

#### Ffuf <a href="#ffuf" id="ffuf"></a>

For bug bounty programs, lower the request rate with the **`-t`** (threads) and **`-p`** (delay) flags, or cap it directly with **`-rate`**.

```shellscript
# Avoid rate limiting
# -rate: Requests per second
# -t: The number of threads
ffuf -u https://example.com/FUZZ -w wordlist.txt -rate 1 -t 1

# FUZZ Variations
ffuf -u https://example.com/FUZZ -w wordlist.txt
ffuf -u https://example.com/.FUZZ -w wordlist.txt
ffuf -u https://example.com/FUZZ.txt -w wordlist.txt
ffuf -u https://example.com/FUZZ.php -w wordlist.txt
# Quote URLs containing '?' so the shell does not treat it as a glob
ffuf -u 'https://example.com/index.php?FUZZ=test' -w wordlist.txt

# -X POST: Send POST requests
ffuf -u https://example.com/FUZZ -X POST -w wordlist.txt

# -t: Number of threads, e.g. 5
# -p: Delay between requests in seconds, e.g. 0.1
ffuf -u http://example.com/FUZZ -w wordlist.txt -t 5 -p 0.1

# Custom header (-H)
ffuf -H "Cookie: key=value" -u https://example.com/FUZZ -w wordlist.txt 

# -mc: Match HTTP status code
ffuf -u http://example.com/FUZZ -w wordlist.txt -mc 200
# 422 status code
ffuf -u https://example.com/FUZZ -w wordlist.txt -mc 422
# -ms: Match HTTP response size
ffuf -u http://example.com/FUZZ -w wordlist.txt -ms 1234
ffuf -u http://example.com/FUZZ -w wordlist.txt -ms 50-300

# -fc: Filter HTTP status code
ffuf -u http://example.com/FUZZ -w wordlist.txt -fc 302
# -fs: Filter HTTP response size
ffuf -u http://example.com/FUZZ -w wordlist.txt -fs 1234
ffuf -u http://example.com/FUZZ -w wordlist.txt -fs 50-300

# File extensions
ffuf -u https://example.com/FUZZ -e .html,.txt,.js,.php,.py,.asp,.json -w wordlist.txt
```
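To see how the thread count and the delay combine, a rough upper bound on the request rate is threads divided by delay. This is only an approximation: it assumes each thread waits for the delay between its own requests and that responses arrive instantly, which is not confirmed here as ffuf's exact behavior. `max_rps` is a hypothetical helper name.

```shell
# max_rps: rough upper bound on requests per second.
# $1 = number of threads, $2 = delay in seconds (must be greater than 0).
max_rps() {
  awk -v t="$1" -v p="$2" 'BEGIN { printf "%g\n", t / p }'
}
```

Under these assumptions, `max_rps 5 0.1` gives 50, which may still be too fast for a program with strict limits. Setting `-rate` caps the rate directly.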

For fuzzing with numbers, we can use the following commands.

```
for i in {0..255}; do echo $i; done | ffuf -u 'http://example.com/?id=FUZZ' -w -

seq 0 255 | ffuf -u 'http://example.com/?id=FUZZ' -w -
```
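Some applications use fixed-width identifiers such as `007`. The following sketch generates zero-padded numbers for that case. `padded_ids` is a hypothetical helper, and the width is an assumption that depends on the target.

```shell
# padded_ids: print integers FIRST..LAST zero-padded to WIDTH digits.
# Pass FIRST and LAST without leading zeros (e.g. 8, not 008).
padded_ids() {
  local first="$1" last="$2" width="$3" i
  i="$first"
  while [ "$i" -le "$last" ]; do
    printf "%0${width}d\n" "$i"
    i=$((i + 1))
  done
}
```

Usage: `padded_ids 0 255 3 | ffuf -u 'http://example.com/?id=FUZZ' -w -`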

#### Dirsearch <a href="#dirsearch" id="dirsearch"></a>

[Dirsearch](https://github.com/maurosoria/dirsearch) is a web path scanner.\
For bug bounty programs, set the **`-t`** and **`--max-rate`** flags to decrease requests per second.

```shellscript
dirsearch -u https://example.com/

# -w: wordlist
dirsearch -u https://example.com/ -w wordlist.txt

# -t: number of threads
# --max-rate: max requests per second
dirsearch -u https://example.com/ -t 1 --max-rate=1

# -m: Method
dirsearch -m POST -u https://example.com/

# Extensions
dirsearch -u https://example.com -e html,txt,js,php,py,asp,json -w wordlist.txt
```

#### Gobuster <a href="#gobuster" id="gobuster"></a>

```
gobuster dir -u https://example.com -w wordlist.txt
```

#### Dirb <a href="#dirb" id="dirb"></a>

```shellscript
dirb https://example.com/
dirb https://example.com/ wordlist.txt

# Custom header (-H); options go after the wordlist
dirb https://example.com/ wordlist.txt -H "Authorization: Basic {token}"
# File Extensions (-X)
dirb https://example.com/ -X .txt
```

#### FeroxBuster <a href="#feroxbuster" id="feroxbuster"></a>

[**FeroxBuster**](https://github.com/epi052/feroxbuster) is a recursive content discovery tool.

```shellscript
feroxbuster -u https://vulnerable.com

# Specify extensions (-x)
feroxbuster -u https://vulnerable.com -x html,js,php
# No recursion (-n)
feroxbuster -u https://vulnerable.com -n
# Custom header (-H)
feroxbuster -u https://vulnerable.com -H "Authorization: Bearer {token}"
```

#### Hakrawler <a href="#hakrawler" id="hakrawler"></a>

[**Hakrawler**](https://github.com/hakluke/hakrawler) is a simple web crawler designed for quick discovery of endpoints and assets within a web application.

```
echo https://vulnerable.com | hakrawler
```
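Crawler output may include duplicate URLs and links to third-party hosts. The following sketch keeps only URLs on the target host or its subdomains. `in_scope` is a hypothetical helper, and it assumes one absolute URL per line with no userinfo (`user@host`) part.

```shell
# in_scope: keep URLs from stdin whose host equals HOST or is a subdomain of it,
# then sort and de-duplicate the result.
in_scope() {
  awk -F/ -v host="$1" '
    {
      h = $3                   # "scheme://host:port/path": field 3 is host[:port]
      sub(/:[0-9]+$/, "", h)   # drop an optional port
      suffix = "." host
      n = length(h) - length(suffix)
      if (h == host || (n > 0 && substr(h, n + 1) == suffix))
        print
    }' | sort -u
}
```

Usage: `echo https://vulnerable.com | hakrawler | in_scope vulnerable.com`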

#### Wfuzz <a href="#wfuzz" id="wfuzz"></a>

```shellscript
# -w: wordlist (alias for -z file,wordlist)
wfuzz -w wordlist.txt https://example.com/FUZZ
# -z: payload
wfuzz -z file,wordlist.txt https://example.com/FUZZ
```

### Framework Detection from Favicon <a href="#framework-detection-from-favicon" id="framework-detection-from-favicon"></a>

We can identify the framework a website uses from the hash of its favicon.

```
curl https://vulnerable.com/images/favicon.ico | md5sum
```

Then look up the hash in the [OWASP Favicon Database](https://wiki.owasp.org/index.php/OWASP_favicon_database) to identify the framework used by the website.
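Piping `curl` straight into `md5sum` also hashes error pages when the download fails. The following sketch saves the file first and only hashes it on success. It assumes `curl` and `md5sum` are available; `md5_of_file` and `favicon_md5` are hypothetical helper names.

```shell
# md5_of_file: print only the MD5 hex digest of a file.
md5_of_file() {
  md5sum "$1" | cut -d' ' -f1
}

# favicon_md5: download URL to a temporary file and print its MD5 digest.
# -f makes curl fail on HTTP errors, -L follows redirects, -s silences progress.
favicon_md5() {
  local tmp
  tmp="$(mktemp)" || return 1
  if curl -sfL -o "$tmp" "$1"; then
    md5_of_file "$tmp"
  else
    echo "download failed: $1" >&2
    rm -f "$tmp"
    return 1
  fi
  rm -f "$tmp"
}
```

Usage: `favicon_md5 https://vulnerable.com/images/favicon.ico`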

### Parsing .DS\_Store <a href="#parsing-ds_store" id="parsing-ds_store"></a>

[ds\_store\_exp](https://github.com/lijiejie/ds_store_exp) is a tool that parses a .DS\_Store file and recursively downloads the files it lists.

```
pip3 install ds-store
python3 ds_store_exp.py https://example.com/.DS_Store
```
