Optical Character Recognition on Android – OCR

by Mohit Gupt
November 6, 2016 | September 1, 2019
Android

Android is a capable OS, yet for a long time it lacked a basic feature: text recognition. That has changed. With the official text recognition API in the Mobile Vision library, Android can now perform OCR quickly and accurately. I ran a very basic feature test of the new functionality and found it fast and easy to use. In this Optical Character Recognition (OCR) example for Android, I will simply import the library, take a picture of a piece of text, and look for text blocks in it. Before doing so, let's take a quick look at the Android Mobile Vision API to better understand how the Text API works.

Android Mobile Vision Library

Consider a scenario where you wish to scan an image or a video stream and detect faces, barcodes, QR codes or text in it on Android. Surprisingly, until recently no such official framework existed on Android. Google has now introduced the Mobile Vision APIs. This set of APIs provides easy-to-use programming interfaces through which we can scan faces, barcodes, QR codes and text without writing a huge amount of code. The best part is that they can be used offline as well: the required dependencies are downloaded once, when the app is installed, and after that an internet connection is no longer required. This feature is delivered on Android through Google Play services, so to enable your app to use the Mobile Vision APIs you need to add this dependency to your build.gradle file:

compile 'com.google.android.gms:play-services-vision:11.4.0'

Please Note: Learn more on how to set up Google Play Services, or learn more about the Mobile Vision API.

Introducing an Android OCR Library – Text Recognition API

Ever since Android reached production devices, Optical Character Recognition has been a common area of research. The Text Recognition API of the Mobile Vision suite makes much of that effort unnecessary for everyday use. This Google-powered API recognizes multiple languages, such as English, French, German, Spanish, or any other Latin-based text. Text can also be parsed from a stream of frames, i.e. a video, and displayed on the screen in real time, as shown in the image above. To keep this Android OCR library example within scope, we will keep things simple and scan text from a single image only, as this tutorial is targeted at beginners. The interesting part is that all of this is done offline by Google Play services itself, i.e. no internet connection is required once it has been set up in the app (shown in the steps ahead). When it comes to structuring the text, this Android OCR library not only recognizes the text but also divides the captured text into the following categories:

Block – TextBlock – A top level object where a scanned paragraph or column is captured.
Line – Line – A line of text captured from a block of text.
Word – Element – A single word recognized in a Line.
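To picture how these three levels nest, here is a small standalone sketch. It does not use the Mobile Vision classes: the `Node` class is a hypothetical stand-in that only mimics the `getValue()` and `getComponents()` methods of the library's `Text` interface, and the sample text is made up, so the snippet runs on a plain JVM without Android.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class TextHierarchyDemo {

    /** Hypothetical stand-in for the library's Text interface (getValue / getComponents). */
    static final class Node {
        private final String value;
        private final List<Node> components;

        Node(String value, Node... components) {
            this.value = value;
            this.components = Collections.unmodifiableList(Arrays.asList(components));
        }

        String getValue() { return value; }

        List<Node> getComponents() { return components; }
    }

    /** Builds a line node whose components are its space-separated words. */
    static Node line(String text) {
        String[] parts = text.split(" ");
        Node[] words = new Node[parts.length];
        for (int i = 0; i < parts.length; i++) {
            words[i] = new Node(parts[i]);
        }
        return new Node(text, words);
    }

    /** Made-up block with two lines, standing in for one recognized paragraph. */
    static Node sampleBlock() {
        return new Node("Hello world\nfrom OCR", line("Hello world"), line("from OCR"));
    }

    /** Walks block -> lines -> elements and collects every word in reading order. */
    static List<String> wordsOf(Node block) {
        List<String> words = new ArrayList<>();
        for (Node line : block.getComponents()) {
            for (Node element : line.getComponents()) {
                words.add(element.getValue());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        Node block = sampleBlock();
        for (Node line : block.getComponents()) {
            System.out.println("Line: " + line.getValue());
        }
        System.out.println("Words: " + wordsOf(block));
    }
}
```

With the real API the same two nested loops apply: the outer loop runs over a block's components (lines) and the inner loop over a line's components (words).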

Android OCR Example

Now that we have a basic understanding of the Android OCR library, particularly the Text Recognition API, I will demonstrate it with an example where we simply take a picture and scan it for text. As a rule of thumb, we should first set up our Android app to download the Play services dependency for Optical Character Recognition. To do so, include the block below in your manifest to instruct the installer to download the OCR dependency when the app is installed.

<meta-data
            android:name="com.google.android.gms.vision.DEPENDENCIES"
            android:value="ocr"/>

Please Note: This is not a mandatory step, but it helps to download the dependencies beforehand. Also, the link to the full source code is at the end of this tutorial.

Next, let's define a layout to display the scanned results:

<?xml version="1.0" encoding="utf-8"?>
<android.support.constraint.ConstraintLayout
    android:id="@+id/activity_main"
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context="com.truiton.mobile.vision.ocr.MainActivity">

    <ImageView
        android:id="@+id/imageView"
        android:layout_width="200dp"
        android:layout_height="200dp"
        android:layout_marginTop="16dp"
        app:layout_constraintLeft_toLeftOf="parent"
        app:layout_constraintRight_toRightOf="parent"
        app:layout_constraintTop_toTopOf="parent"
        app:srcCompat="@mipmap/truiton"
        tools:layout_constraintLeft_creator="1"
        tools:layout_constraintRight_creator="1"/>

    <Button
        android:id="@+id/button"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginBottom="8dp"
        android:text="Scan Text"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintLeft_toLeftOf="parent"
        app:layout_constraintRight_toRightOf="parent"
        tools:layout_constraintLeft_creator="1"
        tools:layout_constraintRight_creator="1"
        />

    <TextView
        android:id="@+id/textView"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="40dp"
        android:text="Scan Results:"
        android:textAllCaps="false"
        android:textStyle="normal|bold"
        app:layout_constraintLeft_toLeftOf="parent"
        app:layout_constraintRight_toRightOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/imageView"
        tools:layout_constraintLeft_creator="1"
        tools:layout_constraintRight_creator="1"/>

    <ScrollView
        android:layout_width="0dp"
        android:layout_height="0dp"
        android:layout_marginTop="8dp"
        android:paddingLeft="5dp"
        android:paddingRight="5dp"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintHorizontal_bias="1.0"
        app:layout_constraintLeft_toLeftOf="parent"
        app:layout_constraintRight_toRightOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/textView"
        tools:layout_constraintTop_creator="1"
        tools:layout_constraintRight_creator="1"
        tools:layout_constraintBottom_creator="1"
        tools:layout_constraintLeft_creator="1">

        <LinearLayout
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:orientation="vertical">

            <TextView
                android:id="@+id/results"
                android:layout_width="wrap_content"
                android:layout_height="wrap_content"
                android:layout_marginTop="8dp"
                app:layout_constraintLeft_toLeftOf="parent"
                app:layout_constraintRight_toRightOf="parent"
                tools:layout_constraintLeft_creator="1"
                tools:layout_constraintRight_creator="1"
                tools:layout_constraintTop_creator="1"/>
        </LinearLayout>
    </ScrollView>
</android.support.constraint.ConstraintLayout>

I was playing around with ConstraintLayout in Android, hence this layout is built with it, although ConstraintLayout is not a requirement for this Android OCR library example. The layout above contains a ScrollView so that long scanned text can be scrolled and read in full. Next, let's define the main Activity:

package com.truiton.mobile.vision.ocr;

import android.Manifest;
import android.content.Context;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.net.Uri;
import android.os.Bundle;
import android.os.Environment;
import android.provider.MediaStore;
import android.support.annotation.NonNull;
import android.support.v4.app.ActivityCompat;
import android.support.v4.content.FileProvider;
import android.support.v7.app.AppCompatActivity;
import android.util.Log;
import android.util.SparseArray;
import android.view.View;
import android.widget.Button;
import android.widget.TextView;
import android.widget.Toast;

import com.google.android.gms.vision.Frame;
import com.google.android.gms.vision.text.Text;
import com.google.android.gms.vision.text.TextBlock;
import com.google.android.gms.vision.text.TextRecognizer;

import java.io.File;
import java.io.FileNotFoundException;

public class MainActivity extends AppCompatActivity {
    private static final String LOG_TAG = "Text API";
    private static final int PHOTO_REQUEST = 10;
    private TextView scanResults;
    private Uri imageUri;
    private TextRecognizer detector;
    private static final int REQUEST_WRITE_PERMISSION = 20;
    private static final String SAVED_INSTANCE_URI = "uri";
    private static final String SAVED_INSTANCE_RESULT = "result";

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        Button button = (Button) findViewById(R.id.button);
        scanResults = (TextView) findViewById(R.id.results);
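        if (savedInstanceState != null) {
            // The URI is only saved once a photo has been requested, so it may be absent.
            String savedUri = savedInstanceState.getString(SAVED_INSTANCE_URI);
            if (savedUri != null) {
                imageUri = Uri.parse(savedUri);
            }
            scanResults.setText(savedInstanceState.getString(SAVED_INSTANCE_RESULT));
        }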
        detector = new TextRecognizer.Builder(getApplicationContext()).build();
        button.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View view) {
                ActivityCompat.requestPermissions(MainActivity.this, new
                        String[]{Manifest.permission.WRITE_EXTERNAL_STORAGE}, REQUEST_WRITE_PERMISSION);
            }
        });
    }

    @Override
    public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions, @NonNull int[] grantResults) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults);
        switch (requestCode) {
            case REQUEST_WRITE_PERMISSION:
                if (grantResults.length > 0 && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
                    takePicture();
                } else {
                    Toast.makeText(MainActivity.this, "Permission Denied!", Toast.LENGTH_SHORT).show();
                }
        }
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        if (requestCode == PHOTO_REQUEST && resultCode == RESULT_OK) {
            launchMediaScanIntent();
            try {
                Bitmap bitmap = decodeBitmapUri(this, imageUri);
                if (detector.isOperational() && bitmap != null) {
                    Frame frame = new Frame.Builder().setBitmap(bitmap).build();
                    SparseArray<TextBlock> textBlocks = detector.detect(frame);
                    String blocks = "";
                    String lines = "";
                    String words = "";
                    for (int index = 0; index < textBlocks.size(); index++) {
                        //extract scanned text blocks here
                        TextBlock tBlock = textBlocks.valueAt(index);
                        blocks = blocks + tBlock.getValue() + "\n" + "\n";
                        for (Text line : tBlock.getComponents()) {
                            //extract scanned text lines here
                            lines = lines + line.getValue() + "\n";
                            for (Text element : line.getComponents()) {
                                //extract scanned text words here
                                words = words + element.getValue() + ", ";
                            }
                        }
                    }
                    if (textBlocks.size() == 0) {
                        scanResults.setText("Scan Failed: Found nothing to scan");
                    } else {
                        scanResults.setText(scanResults.getText() + "Blocks: " + "\n");
                        scanResults.setText(scanResults.getText() + blocks + "\n");
                        scanResults.setText(scanResults.getText() + "---------" + "\n");
                        scanResults.setText(scanResults.getText() + "Lines: " + "\n");
                        scanResults.setText(scanResults.getText() + lines + "\n");
                        scanResults.setText(scanResults.getText() + "---------" + "\n");
                        scanResults.setText(scanResults.getText() + "Words: " + "\n");
                        scanResults.setText(scanResults.getText() + words + "\n");
                        scanResults.setText(scanResults.getText() + "---------" + "\n");
                    }
                } else {
                    scanResults.setText("Could not set up the detector!");
                }
            } catch (Exception e) {
                Toast.makeText(this, "Failed to load Image", Toast.LENGTH_SHORT)
                        .show();
                Log.e(LOG_TAG, e.toString());
            }
        }
    }

    private void takePicture() {
        Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
        File photo = new File(Environment.getExternalStorageDirectory(), "picture.jpg");
        imageUri = FileProvider.getUriForFile(MainActivity.this,
                BuildConfig.APPLICATION_ID + ".provider", photo);
        intent.putExtra(MediaStore.EXTRA_OUTPUT, imageUri);
        startActivityForResult(intent, PHOTO_REQUEST);
    }

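    @Override
    protected void onSaveInstanceState(Bundle outState) {
        // Persist the photo URI and the current results across configuration changes.
        if (imageUri != null) {
            outState.putString(SAVED_INSTANCE_URI, imageUri.toString());
            outState.putString(SAVED_INSTANCE_RESULT, scanResults.getText().toString());
        }
        super.onSaveInstanceState(outState);
    }

    @Override
    protected void onDestroy() {
        super.onDestroy();
        // Free the resources held by the recognizer when the Activity goes away.
        detector.release();
    }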

    private void launchMediaScanIntent() {
        Intent mediaScanIntent = new Intent(Intent.ACTION_MEDIA_SCANNER_SCAN_FILE);
        mediaScanIntent.setData(imageUri);
        this.sendBroadcast(mediaScanIntent);
    }

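    private Bitmap decodeBitmapUri(Context ctx, Uri uri) throws FileNotFoundException {
        // Target dimensions, in pixels, for the downsampled image.
        int targetW = 600;
        int targetH = 600;
        BitmapFactory.Options bmOptions = new BitmapFactory.Options();
        // First pass: read only the image dimensions without allocating pixel memory.
        bmOptions.inJustDecodeBounds = true;
        BitmapFactory.decodeStream(ctx.getContentResolver().openInputStream(uri), null, bmOptions);
        int photoW = bmOptions.outWidth;
        int photoH = bmOptions.outHeight;

        // Integer division: a photo smaller than the target yields 0, so clamp to 1 (no scaling).
        int scaleFactor = Math.max(1, Math.min(photoW / targetW, photoH / targetH));
        // Second pass: decode the actual pixels, subsampled by scaleFactor.
        bmOptions.inJustDecodeBounds = false;
        bmOptions.inSampleSize = scaleFactor;

        return BitmapFactory.decodeStream(ctx.getContentResolver()
                .openInputStream(uri), null, bmOptions);
    }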
}

In the above piece of code, I simply initialize a TextRecognizer and ask the user to grant permission to store the captured image on disk. After that, when an image like the one shown below is captured, the decodeBitmapUri method downsamples it to a smaller size so that it can be scanned faster. Once the image is scaled, we check whether the TextRecognizer is operational. If it is, we extract the text from the picture.
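To make the downsampling arithmetic concrete, here is a small standalone sketch of the scale-factor calculation that decodeBitmapUri performs. The names `SampleSizeDemo`, `computeSampleSize` and `effectiveSampleSize` are hypothetical and not part of the tutorial's source. The second helper reflects the note in the `BitmapFactory.Options.inSampleSize` documentation that the decoder rounds the value down to the nearest power of two; treat it as an approximation of what the decoder does, not an exact model.

```java
final class SampleSizeDemo {

    /** Same integer arithmetic as decodeBitmapUri, clamped so the result is never below 1. */
    static int computeSampleSize(int photoW, int photoH, int targetW, int targetH) {
        int scaleFactor = Math.min(photoW / targetW, photoH / targetH);
        // Values below 1 mean "no subsampling"; the clamp makes that explicit.
        return Math.max(1, scaleFactor);
    }

    /** Assumption: models the documented rounding down to the nearest power of two. */
    static int effectiveSampleSize(int requested) {
        return Integer.highestOneBit(Math.max(1, requested));
    }

    public static void main(String[] args) {
        int requested = computeSampleSize(4000, 3000, 600, 600);
        System.out.println("requested = " + requested);                     // requested = 5
        System.out.println("effective = " + effectiveSampleSize(requested)); // effective = 4
        System.out.println("small photo = " + computeSampleSize(500, 400, 600, 600)); // small photo = 1
    }
}
```

For a 4000x3000 photo and a 600x600 target, the requested factor is 5, so the decoded image may end up noticeably larger than the target if the decoder rounds down to 4. If recognition accuracy is poor on small print, raising the target dimensions is one thing to try.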

When the Android OCR library, Mobile Vision, returned the text from the picture above, it was very accurate. Have a look at the result below:

For full source code, please refer to the link below:

Full Source Code

I believe this is one of the best OCR libraries available for Android to date, as it offers offline text scanning without compromising on quality. So if you have to build an app that needs to scan text and process it, use the Text API of Mobile Vision for Optical Character Recognition. For more posts like this, please connect with us on Twitter, Facebook and Google+. Hope this helped.

Mohit Gupt

Born in New Delhi, India. A software engineer by profession, an Android enthusiast and mobile development evangelist. My motive here is to create a group of skilled engineers who can build better software, because programming is my passion and it feels good to make a device do something you want. Professionally I have worked with many software engineering and product development firms, and I am currently employed as a senior engineer in a leading tech company. In total I may have worked on more than 20 projects professionally, but whenever I get spare time I share my thoughts here at Truiton.

16 thoughts on “Optical Character Recognition on Android – OCR”

José January 2, 2017 at 12:58 am
Hi Mohit,
First of all congratulations for the great post. There isn’t so much information about this topic on internet and you explained it so well.
I’ve been following your step-by-step guide and also deployed your code. However, even using a new Galaxy, the accuracy of the text detected is so poor (not even a word) . I think the problem could be in the decode function because I have deployed de google labs and that one works fine.
Could you please can give me some advice or telling how to fine the results.
Thanks in advance and again, great great job!
Jay May 30, 2017 at 5:46 pm
Hi.. Mohit,
Its work very fine in Asus Zenfone 2 (Ver. 5.0) but not working in Samsung Galaxy S3 (V. 4.3) and LG nexus 5x (Ver. 7.0) in Samsung it takes picture but error going on: “Scan Failed: Found nothing to scan” and in Nexus 5x error going on: “crash the app”.
So what should i do for that solving error and how to solve their issue ?
Help as soon as possible.
Thanks in Advance.
Ahmed June 29, 2017 at 5:07 pm
how use this tutorial to support Arabic and other languages not just for English
1. oza san August 24, 2017 at 8:58 pm
  Dear Mohit,
  first of all thank you for your great work.next how i can use it for other langues which didn’t use Latin characters?and how i can use images taken from videos?
  thanks in advance!!!
Eleazer July 13, 2017 at 4:17 pm
Hello Mohit,
This is a nice tutorial, I’d tried this and it works fine but there’s only one bug:
The blocks are not correctly arrange sometimes, the blocks are shuffled if there are 3 or more blocks,
I don’t know why.
Could you help on that, thanks in advance
1. Mark Jones August 9, 2017 at 8:37 am
  I’m having the same problem, any help would be greatly appreciated
pourya October 22, 2017 at 2:25 am
hey thank for the code
is there any way to scan texts in real time??(not using intent for camera open and results)
thanks
Sarounas October 31, 2017 at 12:29 am
Is there a way to recognize digits? For example numbers that exist in the text….
Shoaib February 10, 2018 at 10:22 am
Sir how i can add translation in this program?
kiran February 15, 2018 at 10:53 am
whether the text can be edited
1. Mohit Gupt February 17, 2018 at 7:56 am
  yes
Nandang Duryat March 3, 2018 at 8:02 pm
Thanks its work like a charm, but is it possible to do realtime detect text?
supriya chauhan May 3, 2018 at 4:19 pm
Hi,
This code works so well with clarity of scanned words, but can we customize the camera view since this time a complete screen is used in scanning. Is it possible to reduce the scanning window size.
samuel wainaina November 30, 2018 at 3:04 pm
thank for such a clear and understandable guide. the problem i have is that can i scan two sided file like an ID and store it in one file with their corresponding text file. The storage will be a remote server.
s Chatterjee February 15, 2019 at 4:49 pm
its not scanning hand written text properly.
Vempati Satya Suryanarayana August 9, 2020 at 8:50 am
Very helpful article. Thanks for the post