In August 2015, Google announced the release of the Android Mobile Vision API. At that time the API had three main components: face detection, barcode scanning, and real-time tracking of objects. Some bugs were later found in its implementation, and Google restricted access to the API to existing users only. In June 2016 these issues were resolved and the Mobile Vision APIs were relaunched, this time with a few new features such as OCR and scanning of Aztec barcodes. In the next sections, let's understand what exactly the new Mobile Vision API is and how we can use it to build an efficient code base.
What is the Android Mobile Vision API?
Ever wondered how to detect a face, a QR code or a barcode on an Android device? If so, you might have heard of or used the FaceDetector.Face API of the Android framework, or the OpenCV SDK, or maybe you opted for a cloud-based solution like the Cloud Vision API, which sends requests to a web server and fetches the scan results. The new Android Mobile Vision API does not make any requests to a web server. Instead, it performs real-time image and video scanning on the device itself. This may sound a little inefficient, but it is not. The Mobile Vision API is very efficient and deeply integrated into the Android system by means of the Google Play Services SDK. This gives it an added advantage over the other solutions: as a developer you do not need to integrate any third-party SDK to perform media analysis. All you need to do is integrate Google Play Services properly and start building on it. As of now, the Android Mobile Vision API performs three types of image/stream detection, as shown in the next sections.
Setting up Android Mobile Vision Library
The only downside of this multipurpose offline image recognition library is that it has to be set up before use. This does not involve any manual configuration, but a tag needs to be added to the application manifest so that the installer knows an extra package has to be downloaded while the app is being installed. As of now the Mobile Vision API has three components, so all three dependencies can be declared in a single meta-data tag inside the application tag, as shown:
<meta-data android:name="com.google.android.gms.vision.DEPENDENCIES" android:value="barcode,face,ocr"/>
This instructs the OS to download the packages for all three types of offline image analysis. Next, let's take a brief look at each feature of this powerful Mobile Vision API.
1. Barcode and QR Code scanning on Android
The Barcode Scanner API is used to scan various types of barcodes and QR codes. Some of them are:
- 1D barcodes: EAN-13, EAN-8, UPC-A, UPC-E, Code-39, Code-93, Code-128, ITF, Codabar
- 2D barcodes: QR Code, Data Matrix, PDF-417, Aztec
Using the Mobile Vision API for barcodes is very simple, even with so many supported formats. All you need to do is write around 10 lines of code to parse a barcode or QR code. One of the most interesting things about this API is that it also works out what kind of content the scanned barcode or QR code holds. This information is available in the valueFormat field of the Barcode object, alongside the scanned value itself (the barcode symbology is reported separately, in the format field).
To learn more about the usage of the Barcode Scanner API of the Android Mobile Vision API, please refer to the dedicated tutorial.
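As a rough illustration of how the valueFormat information might be used once a code has been scanned, here is a minimal sketch in plain Java. ScannedCode, its ValueFormat enum and BarcodeRouter are hypothetical stand-ins written for this example so that it runs without Android; the real library reports valueFormat as an integer constant on its Barcode class, so a real app would switch on those constants instead.

```java
/** Hypothetical stand-in for the library's Barcode object (not the real class). */
final class ScannedCode {
    /** Hypothetical mirror of some content types reported through valueFormat. */
    enum ValueFormat { URL, WIFI, PHONE, EMAIL, PRODUCT, TEXT }

    final String rawValue;          // the decoded payload
    final ValueFormat valueFormat;  // what kind of content the payload is

    ScannedCode(String rawValue, ValueFormat valueFormat) {
        this.rawValue = rawValue;
        this.valueFormat = valueFormat;
    }
}

/** Chooses a follow-up action based on the content type of a scanned code. */
final class BarcodeRouter {
    static String suggestAction(ScannedCode code) {
        switch (code.valueFormat) {
            case URL:     return "Open in browser: " + code.rawValue;
            case WIFI:    return "Offer to join the network";
            case PHONE:   return "Dial: " + code.rawValue;
            case EMAIL:   return "Compose an email";
            case PRODUCT: return "Look up product: " + code.rawValue;
            default:      return "Show text: " + code.rawValue;
        }
    }
}
```

The point of the design is that the app never has to guess the content type by inspecting the raw string; it simply branches on the type the scanner already determined.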
2. Optical Character Recognition (OCR) on Android
The Mobile Vision Text API is another great feature packed into this library. Its purpose is straightforward: text recognition in an image or a stream of frames. It can recognize most Latin-character-based languages, such as English, French, Italian and Dutch. If your app needs to extract text from an image, this is a very good fit, because once its dependencies are downloaded you no longer need an internet connection; all the character recognition takes place on the device itself.
You may wonder how the Text API delivers large results, since scanning a page of a book could produce a lot of words. Thankfully, the API returns text in a structured way. It divides text into three levels:
- Block: Top-level structure, a contiguous group of lines such as a paragraph or a column.
- Line: Mid-level structure, a single line of text within a block.
- Word: Low-level structure, a single word within a line.
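The three-level structure above can be pictured with a small plain-Java sketch. OcrBlock, OcrLine, OcrWord and TextWalker are hypothetical types written for this example so that it runs without Android; in the real library the corresponding classes are, as far as I know, TextBlock, Line and Element, and their accessors differ from the simple fields used here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for the three levels of recognized text. */
final class OcrWord {
    final String text;
    OcrWord(String text) { this.text = text; }
}

final class OcrLine {
    final List<OcrWord> words;
    OcrLine(List<OcrWord> words) { this.words = words; }
}

final class OcrBlock {
    final List<OcrLine> lines;
    OcrBlock(List<OcrLine> lines) { this.lines = lines; }
}

final class TextWalker {
    /** Flattens the hierarchy into a list of words, in reading order. */
    static List<String> collectWords(List<OcrBlock> blocks) {
        List<String> out = new ArrayList<>();
        for (OcrBlock block : blocks) {
            for (OcrLine line : block.lines) {
                for (OcrWord word : line.words) {
                    out.add(word.text);
                }
            }
        }
        return out;
    }

    /** Rebuilds the page: words joined by spaces, lines by newlines, blocks by a blank line. */
    static String render(List<OcrBlock> blocks) {
        StringBuilder out = new StringBuilder();
        for (int b = 0; b < blocks.size(); b++) {
            if (b > 0) out.append("\n\n");
            List<OcrLine> lines = blocks.get(b).lines;
            for (int l = 0; l < lines.size(); l++) {
                if (l > 0) out.append('\n');
                List<OcrWord> words = lines.get(l).words;
                for (int w = 0; w < words.size(); w++) {
                    if (w > 0) out.append(' ');
                    out.append(words.get(w).text);
                }
            }
        }
        return out.toString();
    }
}
```

Because the result is nested rather than one long string, an app can work at whichever level suits it, for example highlighting whole blocks on screen while searching individual words.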
To learn more about the Text API, please refer to the dedicated tutorial.
3. Face Detection on Android
This is arguably the most powerful API of the three, as it can detect human faces. It is well suited to any face filter or camera app, since it performs the analysis on the device itself once the package is downloaded. Interestingly, it not only detects a face but can also locate the features of that face, such as the eyes, nose and mouth. The API does not currently support face recognition, so it cannot tell whether two faces belong to the same person, but it can still classify features such as whether the eyes are open. The functionalities that the Mobile Vision Face Detection API supports are:
- Landmark Detection: The Face API understands the human face in terms of landmarks. In simple terms, face landmarks are points of interest such as the nose, mouth, left eye and right eye. When a face is scanned, the API can report the position of each of these landmarks.
- Classification: The API not only finds a face but can also determine certain characteristics of it. For example, with this feature we can find out whether the eyes are open, and we can get the probability that the face is smiling.
- Tracking: This is the most interesting feature of the API: it can identify a face in a video sequence and follow it from frame to frame. Once again, this is not face recognition; the face is tracked through its movement in the video.
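To make the classification feature a little more concrete, here is a minimal plain-Java sketch of how an app might interpret the probabilities it gets back. FaceClassification, its Verdict enum and the 0.5 threshold are assumptions made for this example. The -1 sentinel mirrors what I understand to be the library's "uncomputed" value, reported when a probability could not be determined; in a real app the inputs would come from the detected Face object rather than from literals.

```java
/** Hypothetical helper that turns classification probabilities into decisions. */
final class FaceClassification {
    /** Assumed sentinel meaning "the probability was not computed". */
    static final float UNCOMPUTED = -1.0f;
    /** Arbitrary cut-off chosen for this sketch; tune it for your app. */
    static final float THRESHOLD = 0.5f;

    enum Verdict { YES, NO, UNKNOWN }

    /** Maps one probability in [0, 1] (or the negative sentinel) to a verdict. */
    static Verdict classify(float probability) {
        if (probability < 0f) {
            return Verdict.UNKNOWN;
        }
        return probability >= THRESHOLD ? Verdict.YES : Verdict.NO;
    }

    /** Both eyes must be confidently open; any unknown eye makes the result unknown. */
    static Verdict eyesOpen(float leftEyeOpen, float rightEyeOpen) {
        Verdict left = classify(leftEyeOpen);
        Verdict right = classify(rightEyeOpen);
        if (left == Verdict.UNKNOWN || right == Verdict.UNKNOWN) {
            return Verdict.UNKNOWN;
        }
        return (left == Verdict.YES && right == Verdict.YES) ? Verdict.YES : Verdict.NO;
    }
}
```

A camera app could, for instance, delay the shutter until eyesOpen returns YES and the smile probability also classifies as YES.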
To learn more about Face Detection, please refer to the dedicated tutorial.
Tracking Faces, Barcodes and QR Codes Simultaneously
Interestingly, if you want to scan and track multiple faces simultaneously, even that is possible through this API. All you need to do is initialize a MultiProcessor, which lets you track each detected face separately. The same applies if you want to track multiple barcodes or QR codes, as all of this is part of the Mobile Vision API, backed by Google Play Services. Even more interesting is that you can track barcodes, QR codes and faces in a single frame by using the MultiDetector class of the same API suite. This rich feature set makes the Mobile Vision APIs stand out among comparable libraries, as they are light, easy to integrate and very easy to use.
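The idea behind tracking several items at once can be sketched in plain Java. As I understand it, the library gives every detected item an ID that stays stable across frames and routes each ID to its own tracker. ItemTrackerRegistry below is a hypothetical, heavily simplified model of that bookkeeping, not the library's MultiProcessor: it only records "new", "update" and "done" events, and it treats an item as done the first time it is missing from a frame.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of per-item tracking across frames. */
final class ItemTrackerRegistry {
    // Item ID -> number of frames in which the item has been seen so far.
    private final Map<Integer, Integer> framesSeen = new HashMap<>();
    // Log of lifecycle events, e.g. "new:1", "update:1", "done:1".
    private final List<String> events = new ArrayList<>();

    /** Feeds the IDs detected in one frame and updates the per-item state. */
    void onFrame(Set<Integer> detectedIds) {
        // Items that were tracked but are absent from this frame are finished.
        Iterator<Map.Entry<Integer, Integer>> it = framesSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Integer> entry = it.next();
            if (!detectedIds.contains(entry.getKey())) {
                events.add("done:" + entry.getKey());
                it.remove();
            }
        }
        // Items in this frame are either newly discovered or updated.
        for (Integer id : detectedIds) {
            Integer seen = framesSeen.get(id);
            if (seen == null) {
                framesSeen.put(id, 1);
                events.add("new:" + id);
            } else {
                framesSeen.put(id, seen + 1);
                events.add("update:" + id);
            }
        }
    }

    List<String> events() { return events; }

    int activeCount() { return framesSeen.size(); }
}
```

Keeping state per ID is what lets an app draw a separate overlay for each face or barcode and remove it cleanly once that item leaves the frame.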