DecodeCaptcha

Decode captchas with Python.

Ref: [http://slid.es/jingchaohu/decoding-weibo-captcha-in-python]

[http://www.boyter.org/decoding-captchas/]

Introduction

In this post, we investigate methods for decoding captchas. For simplicity, we currently assume that the characters in the captcha do not overlap one another.

Generally speaking, there are several steps:

Remove Noise
Separate Characters
Extract Characters
Classify Features, data training
Predict characters

Remove Noise

This post uses the Image module (from PIL). There are two ways to remove noise from a captcha.

Convert to L mode

Convert the image to black and white:

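# Turn light pixels (> 128) and pure-black pixels (0) white; keep the rest.
im = im.point(lambda x: 255 if x > 128 or x == 0 else x)
# Turn every remaining non-white pixel black.
im = im.point(lambda x: 0 if x < 255 else 255)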
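
To see what the two passes above do to individual pixel values, here is a minimal sketch on a synthetic three-pixel image. The helper name `binarize` and the sample values are assumptions made for illustration; only `convert` and `point` come from PIL.

```python
from PIL import Image

def binarize(im):
    """Apply the two point() passes to a grayscale ("L") copy of the image."""
    im = im.convert("L")
    # Pass 1: light pixels and pure-black pixels become white.
    im = im.point(lambda x: 255 if x > 128 or x == 0 else x)
    # Pass 2: whatever is still not white becomes black.
    return im.point(lambda x: 0 if x < 255 else 255)

# Hypothetical sample: one pure-black, one mid-dark and one light pixel.
demo = Image.new("L", (3, 1))
for x, value in enumerate((0, 100, 200)):
    demo.putpixel((x, 0), value)

result = [binarize(demo).getpixel((x, 0)) for x in range(3)]
print(result)  # [255, 0, 255]
```

Note that pure-black input pixels end up white, so only the mid-dark range (1 to 128) survives as text. That fits captchas where the noise is drawn darker than the characters.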

Extract characters according to color

Extract the characters from the image according to their color. The color of the text must be known in advance, so it is hard-coded here.
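
A minimal sketch of color-based extraction, assuming the text color is already known. The function name `extract_by_color` and the colors in the demo are hypothetical and not taken from the repository's scripts.

```python
from PIL import Image

def extract_by_color(im, text_colors):
    """Return a black-on-white "L" image keeping only pixels whose RGB color is in text_colors."""
    rgb = im.convert("RGB")
    out = Image.new("L", rgb.size, 255)  # start from an all-white canvas
    width, height = rgb.size
    for y in range(height):
        for x in range(width):
            if rgb.getpixel((x, y)) in text_colors:
                out.putpixel((x, y), 0)  # text pixel becomes black
    return out

# Synthetic stand-in for a captcha: grey background, one red "text" pixel,
# one blue "noise" pixel.
demo = Image.new("RGB", (3, 1), (200, 200, 200))
demo.putpixel((0, 0), (255, 0, 0))
demo.putpixel((2, 0), (0, 0, 255))

clean = extract_by_color(demo, {(255, 0, 0)})
```

Matching exact colors only works when the captcha uses flat colors. With anti-aliased text a tolerance around the target color would probably be needed.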

Other techniques include image filters that blur the image horizontally and then look for darker areas (the blur causes the text to stand out), edge detection filters that leave only the outline of the text, pattern detection, and colour extraction.
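
The filter-based ideas are not implemented in this post. As a rough sketch of what they could look like with PIL's `ImageFilter` module, the kernel below averages each pixel with its left and right neighbours, which is one possible reading of "blur horizontally". The synthetic image and the kernel choice are assumptions.

```python
from PIL import Image, ImageFilter

# Synthetic 5x5 white image with one black pixel in the centre
# (a stand-in for a real captcha).
demo = Image.new("L", (5, 5), 255)
demo.putpixel((2, 2), 0)

# 3x3 kernel that averages a pixel with its horizontal neighbours only.
horizontal_blur = ImageFilter.Kernel((3, 3), (0, 0, 0, 1, 1, 1, 0, 0, 0), scale=3)
blurred = demo.filter(horizontal_blur)

# Built-in edge detection filter.
edges = demo.filter(ImageFilter.FIND_EDGES)
```

After the blur, the pixels to the left and right of the black dot are darkened while the pixels above and below it are not, so dark marks smear along the row they sit in.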

Separate Characters

Based on the previous step, the extent of each character can be detected. Since only two colors (black and white) remain in the captcha after noise removal, the boundary of a character can be recognized by counting the number of black pixels, for example column by column: a column with no black pixels marks a gap between characters. This gives us the coordinates of each character's boundary.
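
One way to do the counting, sketched under the assumption that the image is already binarized and that characters do not touch. The function name `find_character_columns` is hypothetical.

```python
from PIL import Image

def find_character_columns(im):
    """Return (left, right) x-ranges of consecutive columns containing black pixels.

    `right` is exclusive, so it can be passed straight to a crop box.
    """
    width, height = im.size
    ranges, start = [], None
    for x in range(width):
        black = sum(1 for y in range(height) if im.getpixel((x, y)) == 0)
        if black and start is None:
            start = x                  # first column of a character
        elif not black and start is not None:
            ranges.append((start, x))  # first empty column after a character
            start = None
    if start is not None:              # character touches the right edge
        ranges.append((start, width))
    return ranges

# Synthetic binarized image: black pixels in columns 1-2 and column 5.
demo = Image.new("L", (8, 3), 255)
for x in (1, 2, 5):
    demo.putpixel((x, 1), 0)

columns = find_character_columns(demo)
print(columns)  # [(1, 3), (5, 6)]
```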

Extract Characters

Crop the characters out of the captcha using the coordinates of the character boundaries.

im.crop((left, top, right, bottom))
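
A short sketch of cropping every character in one pass. The `boundaries` list is a hypothetical result from the separation step, and the blank image stands in for a real captcha. PIL's `crop` takes a single 4-tuple box `(left, upper, right, lower)`.

```python
from PIL import Image

# Hypothetical inputs: a binarized captcha and the x-ranges of its characters.
captcha = Image.new("L", (8, 3), 255)
boundaries = [(1, 3), (5, 6)]  # (left, right), right exclusive

# Each character spans the full height of the captcha.
characters = [
    captcha.crop((left, 0, right, captcha.height))
    for left, right in boundaries
]
```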

Classify Features, data training

There are many algorithms for training on the data:

Neural network
Vector space model

Currently, this is the method implemented in this post.
SVM (support vector machine)
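
As an illustration of the vector space approach, the sketch below treats each character image as a flat vector of pixel values and picks the training sample with the highest cosine similarity. The function names and the tiny 3x3 "glyphs" are assumptions, and the repository's own scripts may build their vectors differently.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify(vector, training_set):
    """Return the label of the training vector most similar to `vector`."""
    best = max(training_set, key=lambda item: cosine_similarity(vector, item[1]))
    return best[0]

# Hypothetical training data: 3x3 glyphs flattened row by row, 1 = black pixel.
training_set = [
    ("1", [0, 1, 0,
           0, 1, 0,
           0, 1, 0]),
    ("7", [1, 1, 1,
           0, 0, 1,
           0, 0, 1]),
]

# An unknown glyph: a "1" with one extra noisy pixel.
guess = classify([0, 1, 0,
                  0, 1, 0,
                  0, 1, 1], training_set)
print(guess)  # 1
```

In practice the vectors would come from the cropped character images, all resized to a common size so that the vectors have equal length.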

TODO:

Image filters and edge detection
Use sklearn.svm to train on the data
Handle captchas whose characters overlap
...
