Automated way of solving a Fun Captcha

Fun captcha is a verification mechanism that many websites use to check whether a human is interacting with the site's features. It is an extension of the regular captchas that display distorted characters which have to be typed in to pass the verification gate.


Fun captcha shows an image rotated by some angle with respect to its actual default position. The user is given controls to rotate the image both clockwise and anti-clockwise, visually inspects it, and brings it back to the default position.

The purpose of this blog is simply to give a fun explanation of how to solve such a captcha without human interference, not to encourage any wrong practices. (Most sites do not give the user script access to manipulate the captcha images from the front-end.)

The concepts and tools used here are Python, OpenCV, Mathematics and Statistics. 

Approach- 

The major challenge here is to understand the current orientation of the image. As humans, we have a reference image of a particular object stored in our head; when we solve the captcha, we compare against that reference and rotate the image accordingly. So the first part of the solution is to recognise what is in the image. Even a human who has no idea what the image contains cannot solve the captcha.

To detect the object in the captcha, torchvision's fasterrcnn_resnet50_fpn is used, which has been trained on the COCO dataset (around 330,000 images covering 80 object categories).
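For reference, here is a minimal sketch of loading this detector with torchvision (the full notebook presumably does something similar; frncc and device are the names the conversion function further below expects):

import torch
import torchvision

# Faster R-CNN with a ResNet-50 FPN backbone, pretrained on COCO.
# On newer torchvision versions, pass weights="DEFAULT" instead of pretrained=True.
frncc = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
frncc.eval()  # inference mode: the model returns boxes, labels and scores

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
frncc = frncc.to(device)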

The next trick to be solved is that the images in the captcha never look exactly the way the objects do in reality.

As in the above case of the dog, the outlines are not exactly what a real dog looks like. We humans can make sense of it with our tremendous brain computational power, but it is tricky for the model to recognise. Sometimes the head of the animal is even cropped out and the captcha shows just the body.

Hence, the object detection model detects multiple classes. In the above dog captcha, it may detect a dog, a cat or something else. Some cross-verification logic is needed to check whether the detected object is correct.
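As a rough illustration, reusing the frncc model loaded above (COCO stands for the label list defined in the full notebook, and the file path is made up), the competing classes and their scores for a single captcha can be inspected like this:

import cv2

# List every class the detector considers plausible for one image,
# so a cross-check (e.g. dog vs. cat) can be applied downstream.
img = cv2.cvtColor(cv2.imread("captcha.png"), cv2.COLOR_BGR2RGB)  # example path
img_tensor = torch.tensor(img, dtype=torch.float).permute(2, 0, 1) / 255.0

with torch.no_grad():
    pred = frncc([img_tensor.to(device)])[0]

for label, score in zip(pred["labels"], pred["scores"]):
    if score > 0.3:  # arbitrary threshold, for illustration only
        print(COCO[int(label)], float(score))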

Baseline of approaching the solution- 

Once the object is detected (as a single class or as multiple classes), the next step is to rotate the image from its current position back to the default position. Here again we try to mimic how the human mind works. The question is: how do we know by how many degrees the image orientation is off? We first know what the default position is, and with reference to that we judge that the image has to be rotated by a certain number of degrees. Our brain back-propagates from the solution to the question to find the way to solve it.

This is not directly possible for the machine. The model needs some way to establish what the default position is, and only then can it rotate the image by the required number of degrees.

A way to solve this is to notice when we as humans recognise an object best: when the object/image is in its default position. The same applies to the machine.

Final Solution- 

The image/captcha is rotated through a full 360 degrees from its initial position, in intervals of 5 degrees. At every step, the rotated captcha is fed to the object detection model, and the probabilities of the detected classes are stored. The class and rotation angle with the highest probability are the ones best recognised by the detector, so that position can be assumed to be the default position of the image/captcha. That angle is the amount of rotation needed to bring the captcha back to its default position.
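The conversion function further below calls a rotate_image helper that is defined in the full notebook; a minimal OpenCV version of such a helper might look like this:

import cv2

def rotate_image(img, degree):
    # Rotate img by `degree` degrees about its centre, keeping the same size.
    h, w = img.shape[:2]
    centre = (w / 2, h / 2)
    rot_mat = cv2.getRotationMatrix2D(centre, degree, 1.0)  # angle in degrees, scale 1
    return cv2.warpAffine(img, rot_mat, (w, h))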

There is one more issue to be solved with this technique. As mentioned earlier, the captcha images are not exactly the same as the animals/objects look in the real world, so multiple object classes get predicted, and quite often the wrong class carries the highest probability. Rotating by the angle of that false peak will not bring the image to its default position.

To solve this, some statistical and mathematical methods are used. First, the five highest probability values are selected, and for each one a window of six adjacent values is taken around its position (in the code: the three values before the peak, the peak itself, and the two after it).

The standard deviation of these six probabilities is used to eliminate false peaks: a lonely spike has a high spread, while a genuine peak is surrounded by other high values, so the window with the lowest standard deviation has the best chance of lying close to the default image position.

Two further factors are used to cross-verify false peaks. One is the difference between the window's maximum and its mean, stored in a variable "diff"; another is the difference between the average of the first two and the last two values of the window ("diff_end" in the code). These differences and the standard deviation are multiplied together and stored as a "final deciding factor" (the influence_factor in the code).

The peak with the lowest "final deciding factor" is the one used further. The degree associated with that probability value is the rotation needed to bring the captcha back to its default position.
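As a toy illustration of this scoring (the probability values are made up), for one candidate peak's window of six values:

import numpy as np

# A hypothetical window of six detector probabilities around one candidate peak.
window = np.array([0.41, 0.55, 0.78, 0.95, 0.80, 0.60])

std = window.std()                                    # spread of the window
diff = window.max() - window.mean()                   # peak vs. window mean
diff_end = (window[0] + window[1]) / 2 - (window[-1] + window[-2]) / 2

influence_factor = abs(std * diff * diff_end * 1000)  # lower = more trustworthy peak
print(influence_factor)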

Test case

The Conversion Function- 

import cv2
import numpy as np
import pandas as pd
import torch
import matplotlib.pyplot as plt

# Assumes rotate_image, the detector frncc, the COCO label list COCO and
# device are defined as in the sketches above / in the full notebook.

def fun_captcha(path):

    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Rotate through 360 degrees in 5-degree steps; at each step record the
    # top detected class and its probability.
    prob = []
    for degree in range(0, 360, 5):
        test_img = rotate_image(img, degree)
        img_tensor = torch.tensor(test_img, dtype=torch.float)
        img_tensor = img_tensor.permute(2, 0, 1)    # HWC -> CHW
        img_tensor = img_tensor.unsqueeze(0)        # add a batch dimension
        img_tensor = img_tensor / img_tensor.max()  # scale to [0, 1]
        img_tensor = img_tensor.to(device)
        with torch.no_grad():
            pred = frncc(img_tensor)
        dictt = {}
        dictt[COCO[pred[0]['labels'][pred[0]['scores'].argmax()]]] = float(pred[0]['scores'].max())
        dictt['degree'] = degree
        prob.append(dictt)

    # Flatten to a plain list of probabilities; index i corresponds to i*5 degrees.
    prob_val = []
    for value in prob:
        for key, val in value.items():
            if key != 'degree':
                prob_val.append(val)

    # Score a six-value window around each of the five highest peaks.
    n = len(prob_val)
    order = np.argsort(prob_val)
    influence = []
    for rank in range(1, 6):
        idx = order[-rank]  # index of the rank-th highest peak
        window = np.array([prob_val[j % n] for j in range(idx - 3, idx + 3)])  # wraps at 0/355
        std = window.std()
        diff = window.max() - window.mean()
        diff_end = (window[0] + window[1]) / 2 - (window[-1] + window[-2]) / 2
        influence_factor = abs(std * diff * diff_end * 1000)
        influence.append({'influence_factor': influence_factor, 'degree': idx})

    df = pd.DataFrame(influence)

    influence_lst = [item['influence_factor'] for item in influence]
    lowest = min(influence_lst)
    infl_std = np.array(influence_lst).std()

    # Prefer one of the two highest-probability peaks if its influence factor,
    # scaled by half the spread, is still no larger than the overall minimum;
    # otherwise fall back to the minimum itself.
    factor = lowest
    for i in influence_lst[:2]:
        if i * (infl_std / 2) <= lowest:
            factor = i
            break

    final_degree = int(df[df['influence_factor'] == factor]['degree'].iloc[0]) * 5
    degrees_to_rotate = final_degree + 5  # 5-degree step offset used by the notebook
    test_img = rotate_image(img, degrees_to_rotate)
    return plt.imshow(test_img), degrees_to_rotate
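
Calling the function on a saved captcha (the file name here is hypothetical) displays the corrected image and returns the rotation applied:

_, degrees = fun_captcha("dog_captcha.png")  # hypothetical file name
print(f"Rotate the captcha by {degrees} degrees")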

The full code is available at the Google Colab link below-

https://colab.research.google.com/drive/1czsoOhPye3MeRI4BGS9J13fnHsS-Fjrd?usp=sharing
