Ambitions in A.I. - Google Photos, why now?

At Google I/O 2015, Google announced Google Photos, a free photo storage service. The main limitations of the service concern quality: photos are capped at 16 megapixels and videos at 1080p, but that is well within acceptable bounds for storing photos and videos for home use.

So why would Google, seemingly out of nowhere and out of step with the current hype cycles, start a free photo storage service? Some speculate that Google simply wants more information about its users. Although there is obviously some truth to that, others have noted that there are further applications.

As we all know, Google has been working on artificial intelligence for some time, across a number of different applications. The most obvious one is image recognition. This started with reCAPTCHA and simple, relatively small samples: the challenges were snippets from books and articles that were hard to read with Google's state-of-the-art OCR of the time. Google also ventured into speech recognition, using recordings from Google Talk and (probably) Google Hangouts as training data. The result is solid speech recognition for voice commands in various products, such as Chrome. From there, Google extended into more advanced feats of computer vision, such as recognizing traffic signs and other relevant aspects of traffic situations, a prerequisite for autonomous driving, as in Google's driverless car experiments.

Others have noted this, and it is not too hard to see what Google could do with users' photos. However, I do not believe the effort is limited to getting a bit more insight into the user. I think that would be too short-sighted for a company like Google, which has shown us that it thinks on quite an ambitious scale. (Although I do not contest that the gained user information has its uses.)

It is easy to see how Google could use speech recognition and computer vision to discover more information about the user, but discovering information in general may turn out to be far more important. Speech, that is, conversation, can be used to teach a computer to understand a conversation in context; it is also a way to gather data from the conversations themselves. However, speech by itself is not enough to understand many aspects of the physical world. By venturing into the world of complex visual data, visual representations of the world as we experience it, a computer can learn far more. And note that this comes on top of the ability to understand audio and the other data that is already available.

Google Photos may prove to be quite a significant missing link in training an artificial intelligence. Photos (almost always) provide clear, unmodified visual data. At the resolutions that are now common, a photo is detailed enough to distinguish any object that would also be visible to the naked eye. Furthermore, the metadata attached to photos provides location data, the approximate date and time, the parameters with which the photo was taken, and other useful information.
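To illustrate how readily machine-readable that metadata is, here is a minimal sketch that pulls a few EXIF fields out of a photo using Python and the Pillow library. The file name is a hypothetical example, and which fields are actually present varies per camera.

```python
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

def read_photo_metadata(path):
    """Extract a few interesting EXIF fields from a photo."""
    exif = Image.open(path).getexif()
    # Map numeric EXIF tag IDs to human-readable names.
    named = {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}

    info = {
        "timestamp": named.get("DateTime"),  # approximate date and time
        "camera": named.get("Model"),        # which device took the photo
    }

    # GPS coordinates live in a nested IFD of the EXIF block (tag 0x8825).
    gps_ifd = exif.get_ifd(0x8825)
    if gps_ifd:
        gps = {GPSTAGS.get(t, t): v for t, v in gps_ifd.items()}
        info["latitude"] = gps.get("GPSLatitude")    # degrees/minutes/seconds
        info["longitude"] = gps.get("GPSLongitude")
    return info

print(read_photo_metadata("vacation.jpg"))  # "vacation.jpg" is hypothetical
```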

Google Photos already offers a number of "easy A.I. tricks" for improving photos for the user: it makes intelligent choices when enhancing sharpness, brightness, color balance and other properties, so users do not have to figure these out manually. But again, that is just a gimmick compared to feats such as synthesizing a video from a number of photos.
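For a feel of what the most naive version of such auto-enhancement looks like, here is a minimal sketch using Pillow. The fixed factors are placeholders picked by hand; the whole point of the "intelligent" version is presumably that the adjustments are chosen per photo instead.

```python
from PIL import Image, ImageEnhance, ImageOps

def auto_enhance(path, out_path):
    """Naive, one-size-fits-all photo enhancement."""
    img = Image.open(path).convert("RGB")
    img = ImageOps.autocontrast(img, cutoff=1)      # stretch the brightness histogram
    img = ImageEnhance.Color(img).enhance(1.15)     # slightly richer colors
    img = ImageEnhance.Sharpness(img).enhance(1.3)  # mild sharpening
    img.save(out_path)

auto_enhance("vacation.jpg", "vacation_enhanced.jpg")  # hypothetical file names
```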

I expect that quite soon Google will be able to demonstrate some impressive multi-disciplinary intelligent operations. For example: imagine how many times the Eiffel Tower has been captured in a photo. Let's make a conservative guess and say at least 1 million times. Now suppose all of these pictures end up in Google Photos. What could we do with this?
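One toy answer, using nothing but the metadata: select every photo taken near the tower and bucket it by time of day, which already turns a million holiday snapshots into a crowd-sourced record of one spot on Earth. A minimal sketch, assuming the photos have been reduced to (latitude, longitude, timestamp) records as extracted above:

```python
from math import radians, sin, cos, asin, sqrt
from collections import Counter

EIFFEL = (48.8584, 2.2945)  # latitude/longitude of the Eiffel Tower

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def photos_per_hour(photos, center=EIFFEL, radius_km=1.0):
    """Count photos taken within radius_km of center, bucketed by hour of day.

    `photos` is assumed to be an iterable of (lat, lon, datetime) records,
    e.g. extracted from EXIF as sketched earlier.
    """
    hours = Counter()
    for lat, lon, taken_at in photos:
        if haversine_km((lat, lon), center) <= radius_km:
            hours[taken_at.hour] += 1
    return hours
```

With a million such records, the histogram alone tells you when the site is busy and when it is deserted; the pixels layered on top of that are where it gets really interesting.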

It is quite amazing what we can derive given sufficient data: sound and vision, position in space (i.e. where on Earth) and time. The Eiffel Tower is just an arbitrary example that I thought of, by no means exceptional; if anything, it is probably quite modest. Would it be a complex problem to give an A.I. a natural motivation to understand and a drive to pursue, as opposed to a static one hardcoded in by hand? Something like an unexplained shadow might provide a sufficient initial trigger, but some form of curiosity in the A.I. is needed to actually start looking.

Note that this is my own opinion, based on guesswork, extrapolation from known applications, a bit of imagination and my own impression of Google's past efforts and capabilities. I have not been in contact with Google about any of this.