Module `SICAR.drivers.tesseract`

Tesseract OCR Driver Module.

This module implements the Captcha driver with Tesseract OCR, which the driver uses to extract text from captcha images.

Note

This driver requires both the pytesseract Python library and the Tesseract OCR engine to be installed.
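
A quick way to confirm that both dependencies are present before using the driver is sketched below. The helper name `check_ocr_dependencies` is hypothetical and not part of SICAR. It assumes only that the Python package is importable as `pytesseract` and that the Tesseract executable is named `tesseract` and is on the PATH.

```python
import importlib.util
import shutil


def check_ocr_dependencies() -> dict:
    """Report whether pytesseract and the Tesseract binary can be found.

    Hypothetical helper for illustration; not part of SICAR.
    """
    return {
        # True if the pytesseract package can be located for import.
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # True if an executable named "tesseract" is on the PATH.
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, found in check_ocr_dependencies().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If Tesseract is installed somewhere outside the PATH, this check reports it as missing even though pytesseract can be pointed at the binary explicitly.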

Classes

Tesseract: Implementation of the Captcha driver using Tesseract OCR.

Classes

class Tesseract

Implementation of the Captcha driver using Tesseract OCR.

This driver uses Tesseract OCR to extract text from captcha images.

Note

This driver requires both the pytesseract Python library and the Tesseract OCR engine to be installed.

Ancestors

Captcha
abc.ABC
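
The ancestor chain suggests an abstract base class that each captcha driver subclasses. A minimal sketch of that pattern follows. The `Captcha` body and the `FixedAnswer` driver are stand-ins written for illustration, not SICAR's actual source; only the `get_captcha` method name comes from this page.

```python
from abc import ABC, abstractmethod


class Captcha(ABC):
    """Stand-in for the driver base class (assumed shape, not SICAR's source)."""

    @abstractmethod
    def get_captcha(self, captcha) -> str:
        """Return the text read from a captcha image."""


class FixedAnswer(Captcha):
    """Hypothetical driver that ignores the image and returns a fixed string."""

    def __init__(self, answer: str):
        self.answer = answer

    def get_captcha(self, captcha) -> str:
        return self.answer
```

Because `get_captcha` is abstract in this sketch, instantiating `Captcha` directly raises `TypeError`, while a subclass that implements the method can be created and called normally.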

Methods

def get_captcha(self, captcha: Image) -> str

Extract text from the provided captcha image.

Parameters

captcha (Image): The captcha image.

Returns

str: The extracted text from the captcha.

Note

This method preprocesses the captcha image to improve its legibility, then calls pytesseract's image_to_string function to perform optical character recognition. The recognized text is cleaned with a regular expression that removes non-alphanumeric characters.
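
The flow described in this note can be sketched as follows. This is an illustration under stated assumptions, not the driver's actual code: the function names `clean_text` and `read_captcha` are hypothetical, the preprocessing steps (grayscale, 2x upscale, fixed threshold) are guesses at what "improving quality" might involve, and the regular expression assumes that "alphanumeric" means ASCII letters and digits. Only `pytesseract.image_to_string` is named in the source.

```python
import re


def clean_text(raw: str) -> str:
    """Drop every character that is not an ASCII letter or digit."""
    return re.sub(r"[^A-Za-z0-9]", "", raw)


def read_captcha(image) -> str:
    """Preprocess a PIL image, run OCR, and clean the result.

    The preprocessing below is an illustrative assumption, not the
    driver's actual pipeline.
    """
    # Imported here so clean_text stays usable without the OCR dependencies.
    import pytesseract
    from PIL import Image

    gray = image.convert("L")  # convert to 8-bit grayscale
    big = gray.resize((gray.width * 2, gray.height * 2), Image.LANCZOS)
    binary = big.point(lambda p: 255 if p > 128 else 0)  # fixed threshold
    return clean_text(pytesseract.image_to_string(binary))
```

For example, `clean_text("a B-3\n")` gives `"aB3"`: the whitespace and punctuation that OCR output often contains are stripped, leaving only the characters a captcha answer is expected to hold.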