Module SICAR.drivers.tesseract
Tesseract OCR Driver Module.
This module implements the Captcha driver using Tesseract OCR, which extracts text from captcha images.
Note
This driver requires the pytesseract library and Tesseract OCR to be installed.
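The requirement above can be checked before the driver is used. The following is a minimal sketch, not part of the SICAR package: the helper name check_tesseract_requirements is hypothetical, and it assumes the Tesseract executable is named tesseract and is on PATH.

```python
import importlib.util
import shutil


def check_tesseract_requirements() -> dict:
    """Report whether both dependencies of the driver appear to be present.

    Hypothetical helper for illustration only; it is not provided by SICAR.
    """
    return {
        # The Python wrapper must be importable.
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # Assumption: the OCR engine binary is called "tesseract" and is on PATH.
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, found in check_tesseract_requirements().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If the binary is installed somewhere that is not on PATH, this check reports it as missing even though pytesseract can still be pointed at it through pytesseract.pytesseract.tesseract_cmd.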
Classes
Tesseract: Implementation of the Captcha driver using Tesseract OCR.
Classes
class Tesseract
Implementation of the Captcha driver using Tesseract OCR.
This driver utilizes Tesseract OCR to extract text from captcha images.
Note
This driver requires the pytesseract library and Tesseract OCR to be installed.
Ancestors
- Captcha
- abc.ABC
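The ancestor list suggests that drivers share an abstract base class. The sketch below shows how such a hierarchy typically looks in Python. The Captcha class here is a stand-in written for illustration, and EchoDriver is hypothetical; the real base class in SICAR may define more than this one method.

```python
from abc import ABC, abstractmethod


class Captcha(ABC):
    """Stand-in for the base driver; the real SICAR class may differ."""

    @abstractmethod
    def get_captcha(self, captcha) -> str:
        """Return the text read from a captcha image."""


class EchoDriver(Captcha):
    """Hypothetical driver that always returns the same answer."""

    def get_captcha(self, captcha) -> str:
        return "abc12"


driver = EchoDriver()
print(driver.get_captcha(None))
```

Because get_captcha is abstract, instantiating Captcha directly raises TypeError; only subclasses that implement the method can be created.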
Methods
def get_captcha(self, captcha: Image) -> str
Extract text from the provided captcha image.
Parameters
captcha (Image): The captcha image.
Returns
str: The extracted text from the captcha.
Note
This method preprocesses the captcha image to improve its quality, then uses pytesseract's image_to_string function to perform optical character recognition. The extracted text is then cleaned with a regular expression that removes non-alphanumeric characters.
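The steps in this note can be sketched as follows. This is an illustration under stated assumptions, not the driver's actual code: the preprocessing steps (grayscale, autocontrast, 2x upscale), the regular expression, and the names preprocess, clean_text, and get_captcha_sketch are all assumptions. Only pytesseract.image_to_string is named by the documentation.

```python
import re


def preprocess(captcha: "Image.Image") -> "Image.Image":
    """Assumed cleanup: grayscale, stretch contrast, then double the size."""
    # Imported lazily so clean_text stays usable without Pillow installed.
    from PIL import ImageOps

    gray = ImageOps.grayscale(captcha)
    gray = ImageOps.autocontrast(gray)
    return gray.resize((gray.width * 2, gray.height * 2))


def clean_text(raw: str) -> str:
    """Drop every character that is not an ASCII letter or digit."""
    return re.sub(r"[^A-Za-z0-9]", "", raw)


def get_captcha_sketch(captcha: "Image.Image") -> str:
    """Preprocess the image, run OCR, and clean the result."""
    # Requires the pytesseract package and the Tesseract OCR engine.
    import pytesseract

    raw = pytesseract.image_to_string(preprocess(captcha))
    return clean_text(raw)
```

A call would look like get_captcha_sketch(Image.open("captcha.png")), where Image comes from PIL and the file name is a placeholder. The cleaning step matters because OCR output usually carries whitespace and a trailing newline, and sometimes stray punctuation.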