Facebook’s Artificial Intelligence Has The Ability to Search Photos by Content

The term artificial intelligence was coined 60 years ago, but only now is it starting to deliver. Lumos, Facebook’s computer vision platform, was initially used to improve the experience for visually impaired members of the Facebook community. Lumos is now powering image content search for all users. What does this mean for you? You can now search for images on Facebook with keywords that describe the contents of a photo, rather than being limited to tags and captions.

How does this work? It starts with the huge task of computational training. For the object recognition used in Facebook’s image search, the artificial intelligence (AI) system started with a small set of 130,000 public photos shared on Facebook. Using the annotated photos, the system could learn which pixel patterns correspond to particular subjects. It then went on to use the tens of millions of photos on Facebook. In other words, the caption-reading technology trained a deep neural network through public photos shared on Facebook. The model essentially matches search descriptors to features pulled from photos, with some degree of probability. You can now search for photos based on Facebook AI’s assessment of their content, not just on how humans happened to describe the photos in text when they posted them.
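To make the "matches search descriptors to features with some degree of probability" idea concrete, here is a minimal sketch in Python. It is not Facebook's implementation. The photo IDs, labels, probabilities, the `search_photos` function, and the `threshold` parameter are all invented for illustration, and the hard-coded scores simply stand in for whatever a trained vision model would output.

```python
# Hypothetical per-photo label probabilities, standing in for the output of
# a trained vision model. These numbers are invented for illustration.
PHOTO_LABELS = {
    "photo_1": {"dress": 0.92, "person": 0.88, "stage": 0.10},
    "photo_2": {"bagel": 0.81, "plate": 0.65},
    "photo_3": {"dress": 0.40, "horse": 0.77, "person": 0.90},
}


def search_photos(query, photo_labels, threshold=0.5):
    """Return (photo_id, probability) pairs whose label matches the query.

    Only photos whose probability for the query label is at or above
    `threshold` are returned, sorted from most to least confident.
    """
    label = query.strip().lower()
    hits = [
        (photo_id, labels[label])
        for photo_id, labels in photo_labels.items()
        if labels.get(label, 0.0) >= threshold
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(search_photos("dress", PHOTO_LABELS))
```

Lowering the threshold trades precision for recall: with `threshold=0.3`, the less certain `photo_3` would also show up in a search for "dress". A real system would work from learned features rather than a lookup table, but the ranking-by-confidence step is the same in spirit.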

How could this be used? Say you were searching for a dress you really liked in a video. The search could relate it back to something on Marketplace, or even connect you directly with an ad partner, improving the customer experience while keeping revenue growth afloat. So it seems it can help customers, the companies selling things, and ad partners alike.

What else is new? Facebook released a text-to-speech tool last April so that visually impaired users could understand the contents of photos. Back then, the system could tell you that a photo involved a stage and lights, but it wasn’t very good at relating actions to objects. Now the Facebook team has improved that by painstakingly labeling 130,000 photos pulled from the platform and training a computer vision model to identify 12 actions happening in the photos. So, for instance, instead of just hearing “a stage,” a blind person would hear “people playing instruments” or “people dancing on a stage” or “people walking” or “people riding horses.” This provides contextual relevance that was not possible before.

You could imagine one day uploading a photo of your morning bagel and having this technology identify the nutritional value of that bagel, because it can detect, segment, and identify what is in the picture.

So it seems the race is on for services not just for image recognition, but also for speech recognition, machine-driven translation, natural language understanding, and more. What’s your favorite AI vendor?

@Drnatalie, VP, Program Executive, Salesforce ITC
