# Captcha Bypass with OCR

If the website has Image Captcha for preventing brute force in login page, we might be able to recognize programmatically using OCR. This article is that I learned while solving the TryHackMe’s Capture Returns room.

### Shape Detection <a href="#shape-detection" id="shape-detection"></a>

Awesome Script: <https://github.com/Kript0r3x/caturereturns/blob/main/captcha-solver.py>

Assume that a captcha image is embedded in the `src` attribute of the `img` tag in HTML response. For example, `<img src="data:image/png;base64,iVBORw0KGg..." />` . We detect a shape of the image using Python script as below:

```shellscript
import base64
from bs4 import BeautifulSoup
from io import BytesIO
import numpy as np
import requests

url = "https://example.com/login"

# 1. Send request to get HTML response.
resp = requests.get(url)

# 2. Parse HTML to extract an img element to be detected the shape.
soup = BeautifulSoup(resp.text, 'html.parser')
img_tag = soup.find('img')
img_src = img_tag.get('src')

# 3. Decode Base64 and retrieve an image data.
_, base64_data = img_src.split(',')
img_data = base64.b64decode(base64_data)
image = np.array(Image.open(BytesIO(img_data)))

# 4. Detect a shape of the image
shape = ""
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 1.5)
thresh = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
# 4-a. Detect circle
circle = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, 1, 20, param1=50, param2=30, minRadius=0, maxRadius=0)
if circle is not None:
  shape = "circle"
else:
# 4-b. Detect other shapes
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for cnt in contours:
    perimeter = cv2.arcLength(cnt, True)
    approx = cv2.approxPolyDP(cnt, 0.04 * perimeter, True)
    if len(approx) == 3:
      shape = "triangle"
      break
    elif len(approx) == 4:
      x, y, w, h = cv2.boundingRect(approx)
      aspect_ratio = float(w) / h
      if 0.95 <= aspect_ratio <= 1.05:
          shape = "square"
        break

# 5. Use this 'shape' value for resolving captcha...
```

### Math Equation Solving <a href="#math-equation-solving" id="math-equation-solving"></a>

To solve mathematical equation that is contained in an image, we need to recognize text and evaluate it with `eval` function in Python script.

For doing that, we use `pytesseract` Python library. We need to install **Tesseract** in our machine:

```
# For Linux
sudo apt install tesseract-ocr
```

Then write Python script to solve that.

```shellscript
import base64
from bs4 import BeautifulSoup
import numpy as np
import pytesseract
import requests

url = "https://example.com/login"

# 1. Send request to get HTML response.
resp = requests.get(url)

# 2. Parse HTML to extract an img element to be solved as mathematical equation.
soup = BeautifulSoup(resp.text, 'html.parser')
img_tag = soup.find('img')
img_src = img_tag.get('src')

# 3. Decode Base64 and retrieve an image data.
_, base64_data = img_src.split(',')
img_data = base64.b64decode(base64_data)
image = np.array(Image.open(BytesIO(img_data)))

# 4. Extract the math equation and solve it.
text = pytesseract.image_to_string(image, config="--psm 6")
text = re.sub(r'[^0-9+\-*/(). ]', '', text).strip()
result = eval(text)

# 5. Use the 'result' value for resolving captcha...
```

### References <a href="#references" id="references"></a>

* [TryHackMe](https://tryhackme.com/r/room/capturereturns)
* [Kript0r3x](https://github.com/Kript0r3x/caturereturns/blob/main/captcha-solver.py)
