We are very excited to announce the release of LinkApic Beta on the Android Market. Please check the Android App page for all you’d like to know about the app.
The digital revolution has changed the world for the better in less than two decades. Do you remember 1996? From dial-up to 4G mobile Internet, we sure have made great progress in a span of less than 15 years. The problem, however, is that the two worlds, real and digital, are still by and large separate. Need we say that the digital world (read Internet) has seen tremendous growth, with billions of websites and information about almost anything? Still, the real world is bigger. Much bigger. We live in the real world. The digital world still lives inside the laptop or the iPad or the Android phone.
So far we have brought many of the good things from the real world to the digital. It started with being able to write letters (i.e., email). Later we started shopping for everything online. And today we can socialize in the digital world. So now it is payback time: time to bring the good things from the digital world to the real. Wouldn’t it be cool to be able to navigate the real world by “clicking” on hyperlinks placed by someone? OK, let’s back up a step. We believe that the physical and the digital worlds should be much more connected. The real world should not be totally disconnected from the digital, the way it is now. The interaction should be smooth and rewarding.
And we are playing our part. With computer vision and image recognition technology, we wish to enable a user-generated, searchable visual database. We aim to create a vast repository of tags and hyperlinks placed in the real world, let you navigate them, and create a democratic platform for you to decide what you see. Think Wikipedia of the real world.
And this is happening today! Did we tell you that LinkApic Beta is now available on the Android Market? Go ahead, play around, and send us your feedback to info{at}linkapic.com!
Computer vision encapsulates several problems related, obviously, to making computers see. Object recognition focuses on making computers recognize the objects in a scene (car, person, building, flower, and so on). Image registration tries to match (or register) two images into a common plane of reference, allowing generation of panoramas, for example. Image segmentation tries to form neighboring groups of pixels (segments) that each represent one homogeneous object. Activity recognition tries to find what is happening in a video sequence.
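To make the segmentation idea concrete, here is a minimal sketch, assuming a tiny grayscale image stored as a list of lists. It groups neighboring pixels of similar intensity with a simple flood fill. Real segmentation algorithms are far more sophisticated; the function name `label_segments` and the `tolerance` parameter are illustrative assumptions, not part of any library.

```python
from collections import deque


def label_segments(image, tolerance=0):
    """Label 4-connected groups of similar pixels in a 2-D grid.

    A pixel joins a segment when its intensity differs from an adjacent
    pixel already in that segment by at most `tolerance` (hypothetical
    parameter). Returns the label grid and the number of segments.
    """
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]  # -1 means "not yet labeled"
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue
            # Start a new segment and grow it breadth-first from this seed.
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and abs(image[ny][nx] - image[y][x]) <= tolerance):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels, next_label


# Toy 3x3 "image" with three flat regions (values are made up).
toy_image = [
    [0, 0, 9],
    [0, 9, 9],
    [5, 5, 9],
]
toy_labels, toy_count = label_segments(toy_image)
```

Each distinct label in `toy_labels` marks one segment. Raising `tolerance` lets pixels with slightly different intensities merge into the same segment.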
What can be done with computer vision technology today? There are several things that we are now fairly good at.
1) Image or video registration and panorama generation: A bunch of pictures (or a bunch of video frames) of the same scene can be stitched together using image registration technology in a very reliable and robust manner. Thanks largely to SIFT and SIFT-like local features (see my previous post) along with robust estimation algorithms, there are several commercial products available based on this technology. Want to stitch two or more images very accurately to form a mosaic? Try Mayachitra AIPR. Want a quick and dirty panorama on your iPhone? Try 360 Panorama. Of course, there are many in-betweens, but you can be assured that this technology is maturing.
2) Image recognition via matching: Recognizing images by matching is a comparatively simple technique that leverages the basic ideas from above (image registration). Have you been awed by the recognition capabilities of Google Goggles, kooaba Visual Search, or oMoby? Loosely speaking, matching is what they do. It is easier to recognize a book cover or a landmark by matching (or registering) it with a picture of the same item in a database. This is how a book cover, the Golden Gate Bridge, a wine label, or a famous painting can be recognized: (again loosely speaking) there are pictures of the same item already in their database. Since matching works reliably using image registration-like technology, recognition is reliable and robust for items/objects that satisfy two conditions: (i) their pictures are available to pre-ingest, and (ii) their geometry is uniquely defined (e.g., different chairs will have different geometry, but the Tim Ferriss book will always look the same). Book/CD/DVD covers, wine labels, famous artwork, and major landmarks all fall in this category. And this is how the apps you love work.
3) Face detection and recognition in a constrained setup: Thanks to the sheer amount of effort that has been put into face detection and recognition over the past couple of decades, we have made good progress on this inherently difficult problem. If you have used Picasa or iPhoto, you have seen how the computer can find faces in your photo collection and ask for your approval. By using key descriptions of your eyes, nose, mouth, and facial structure (geometry), the algorithms can recognize faces from a limited collection. The bottom line today is that finding faces in your photo collection is reliable (e.g., 40-100 faces in a collection of tens of thousands of pictures), but finding and recognizing faces from ALL of Facebook is still very hard. We sure are making progress.
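The "recognition via matching" idea in point 2 can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any of the apps named above actually work: the descriptors are made-up 3-number vectors standing in for real local features, the function names are hypothetical, and the 0.75 ratio threshold is just a commonly used value for a nearest-neighbor ratio check.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query, reference, ratio=0.75):
    """Return (query_index, reference_index) pairs that pass a ratio check.

    A query descriptor is matched to its nearest reference descriptor only
    if that distance is clearly smaller than the distance to the
    second-nearest one, which filters out ambiguous matches.
    """
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((euclidean(q, r), ri) for ri, r in enumerate(reference))
        if len(ranked) < 2:
            continue  # the ratio check needs at least two candidates
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches


def recognize(query, database, min_matches=2):
    """Return the database item with the most matches, or None if too few."""
    best_name, best_count = None, 0
    for name, descriptors in database.items():
        count = len(match_descriptors(query, descriptors))
        if count > best_count:
            best_name, best_count = name, count
    return best_name if best_count >= min_matches else None


# Pre-ingested "pictures", each reduced to a few toy descriptors.
database = {
    "book_cover": [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    "wine_label": [(5.0, 5.0, 5.0), (9.0, 1.0, 4.0), (2.0, 8.0, 6.0)],
}
# A slightly noisy snapshot of the book cover.
snapshot = [(0.1, 0.0, 0.9), (0.9, 0.1, 0.0), (0.0, 0.9, 0.1)]
result = recognize(snapshot, database)
```

Here `result` is the name of the best-matching item. A real system would typically also check that the matched points agree on a single geometric transformation (the robust estimation step mentioned in point 1) before declaring a match.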
This concludes my trilogy of posts on the possibilities with computer vision technology today. Watch out for more technology-related posts, and leave comments on what you’d like to hear about.

Professor David Lowe, co-inventor of SIFT
Mobile computer vision is set to touch our lives in a tangible way. To continue the parallel, the big bang has started and the universe is expanding fast enough for us to experience the magic. There are three primary factors, in my view, that have contributed to the recent advance.
1) Powerful local descriptors: The early 2000s marked an exciting development in the field of image recognition that has now touched every aspect of computer vision. The publication of SIFT descriptors will certainly go down in the history of computer vision in the same light as that of Turbo codes in digital communications. Using a SIFT or SIFT-like framework, engineers can now robustly and accurately describe and match local regions in an image or video. A big leap forward, thanks to David Lowe and his team!
2) Machine learning: Although artificial intelligence, with its rule-based deduction, did not deliver on its 1980s promise of solving all our problems, a related discipline, machine learning, has come to our rescue. Not depending on hand-coded rules, and instead letting machines learn by looking at many examples and solving large optimization problems, turns out to be the way to go. And we are figuring this out now!
3) Faster machines: Computer vision and image analysis problems are among the few that are always hungry for computing resources. Any computer vision researcher will tell you that having a powerful desktop is better than a laptop, having a cluster is better than a desktop, and having a cloud with thousands of computers is like having a vacation home on the moon. Needless to say, Moore’s law has helped.
This is just the beginning. There are many unsolved problems and exciting challenges. This post gives the perspective, and I’ll defer the question of what can and can’t be done today to my next post.
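The contrast in point 2, learning from examples instead of hand-coding rules, can be illustrated with a toy perceptron. This is a minimal sketch under stated assumptions: the function names, the learning rate, the number of passes, and the tiny training set (the logical AND of two inputs) are all made up for illustration, and real vision systems learn from vastly more data with far richer models.

```python
def predict(weights, bias, features):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Learn weights from (features, label) pairs with the perceptron rule.

    No rule about the data is hand-coded: the weights start at zero and
    are nudged every time the current model mislabels an example.
    """
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)
            if error != 0:
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
    return weights, bias


# Toy training set: the label is 1 only when both inputs are 1.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
learned_weights, learned_bias = train_perceptron(examples)
predictions = [predict(learned_weights, learned_bias, f) for f, _ in examples]
```

After training, `predictions` can be compared against the labels in `examples` to see whether the model picked up the pattern purely from the data.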

A still from The Big Bang Theory
Have you seen the latest episode of The Big Bang Theory? Despite its name (or due to its name), The Bus Pants Utilization was an amazingly entertaining episode. What caught my attention and had me writing this blog post was the fact that it was about smartphone applications! And more so because both the apps mentioned in the episode are (hold your breath) about mobile computer vision! Whoa! Interesting.
Leonard, Sheldon, Howard, and Rajesh are sitting in the cafe and Leonard comes up with an interesting app idea (that clicked with my nerdy mind, for sure): Create an app that can read in a mathematical equation by snapping a picture of its handwritten form. Then use handwriting recognition followed by symbolic mathematics tools to solve the equation, let the user plug in variables, and in general, play around with it. I wonder, what are the chances that some real geek is already working on this?
Then, later in the episode, Penny comes up with another idea that appeals to her (and likely to lots of girls): shopping for shoes by snapping a picture of someone wearing them. Here is another challenging computer vision problem that anyone in the field has surely heard and thought of before.
This incited mixed feelings in me, a long-time proponent and “well-wisher” of computer vision (CV) technology for mainstream consumers. At first I thought, “This is awesome. Mainstream media is making this field hip. People like what computer vision can do!” But the next moment I thought, “But… People expect these algorithms to work perfectly out of the box.” I and other computer vision researchers may have to go hide ourselves somewhere, because the field is still evolving and things work only in a cleverly constrained setup. Problems such as finding shoes for Penny are still difficult…
We are seeing some very interesting CV-based applications that have come out in recent times (Google Goggles, kooaba, SnapTell, and now the Amazon.com app, thanks to A9). You would be amazed how these apps (with some differences among them) can effortlessly recognize the cover of a book at any angle you snap it, or a CD/DVD cover, billboard ads, print ads/pages, and in the case of Goggles, art, paintings, and so on.
If you are not in the CV field, you are probably wondering what works and what doesn’t, right? Why can kooaba recognize book covers, but not shoes? Why would like.com allow pattern and color search for shoes, but still not recognize the exact shoe you like (that you saw someone wearing on the street)? Why can SnapTell recognize a shoe ad on a billboard, but not a shoe on the street? For the answers, you’ll have to wait for my upcoming posts, where I’ll cover what works and what doesn’t today in computer vision.
There is no denying that there are exciting times ahead for mobile computer vision! This is a new dawn for computer vision, a field that has remained out of consumer limelight for two decades now. Watch out for apps that will take your breath away!
