How Google Ranks Images in Search

Google has its own PageRank for images. Discover how image search ranking works (alt text, proximity to text, frequency, image size, feature levels) and how to optimise your images for Google Images.

Published on 19 February 2023. Reading time: 20 min. By Stan De Jesus Oliveira

Definition: How Google Ranks Images in Search

How does Google decide the order in which images appear in Google Images?

To rank images in image search, a search engine relies largely on the text associated with them: the alternative (alt) text attached to the image, a caption, or other text that appears near the image on the page.

Other signals can also be used to rank images, such as how relevant the page on which an image appears is to the search query, and the quantity and quality of links pointing to that page.
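To make the idea concrete, here is a toy scoring function. It is a sketch under stated assumptions, not Google's formula: the weights for alt text, caption and nearby text are invented for illustration, and `page_score` is a hypothetical value between 0 and 1 standing in for page relevance and link signals.

```python
import re

# Hypothetical weights: nothing public says how Google weights these text sources.
TEXT_WEIGHTS = {"alt": 3.0, "caption": 2.0, "nearby": 1.0}


def tokenize(text: str) -> set[str]:
    """Lower-cased set of words, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def image_score(query: str, texts: dict[str, str], page_score: float) -> float:
    """Combine query/text overlap for each text source with a page-level score in [0, 1]."""
    terms = tokenize(query)
    if not terms:
        return 0.0
    # Fraction of query terms found in each source, scaled by that source's weight.
    text_score = sum(
        weight * len(terms & tokenize(texts.get(source, ""))) / len(terms)
        for source, weight in TEXT_WEIGHTS.items()
    )
    # Page relevance and link signals act as a multiplier on textual relevance.
    return text_score * (0.5 + 0.5 * page_score)


texts = {"alt": "A red bicycle", "caption": "Bicycle for sale"}
print(image_score("red bicycle", texts, page_score=1.0))
```

Multiplying rather than adding the page signal means a strong page cannot rescue an image whose surrounding text does not match the query at all.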

For a practical guide to optimising your images for Google, see the article on optimising images for SEO.

Machine Perception

Machine perception is the science of helping machines understand images, videos, and sounds.

Like information retrieval, the science behind search engines, machine perception aims to make content understandable to machines; in this case the content is images, video, and sound. It therefore complements information retrieval for search engines like Google.

Inception, Google's convolutional neural network architecture for image classification, is a good example of this science, particularly for object recognition.

A PageRank for Images

Google’s “PageRank for Product Image Search” patent concerns the use of Google’s PageRank algorithm to improve product image search results.

PageRank is the link-analysis algorithm Google uses to estimate the importance of web pages. It treats a link as a vote: a page scores higher when it receives links from many pages, and especially from pages that themselves score highly, with each page's score shared among its outbound links.

Under this approach, Google applies a modified version of PageRank to the images themselves. Instead of hyperlinks between pages, the "links" are visual similarities between images: an image that shares many visual features with other images returned for the same query, particularly with images that are themselves highly ranked, is treated as more representative and ranks higher. The accompanying research paper calls this method VisualRank.
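A minimal sketch of PageRank over images, assuming (as in the VisualRank research paper) that visual similarity between images plays the role of links. The similarity matrix is a toy, and the damping factor of 0.85 and the iteration count are assumptions; a real system would derive similarities from local image features.

```python
def visual_rank(similarity: list[list[float]], damping: float = 0.85,
                iterations: int = 50) -> list[float]:
    """PageRank power iteration where edge weights are visual similarities."""
    n = len(similarity)
    # Total similarity each image shares with the others (its "outbound links").
    out_totals = [sum(similarity[j][i] for i in range(n) if i != j) for j in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Image i receives score from each image j in proportion to their similarity.
            incoming = sum(
                scores[j] * similarity[j][i] / out_totals[j]
                for j in range(n)
                if j != i and out_totals[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


# Toy matrix: images 0 and 1 look alike; image 2 is an outlier.
similarity = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.1], [0.1, 0.1, 0.0]]
print([round(s, 3) for s in visual_rank(similarity)])
```

Images 0 and 1 reinforce each other and end up with equal scores well above the outlier. In this simplified version, an image with no similarity to any other simply leaks its score instead of redistributing it.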

This patent aims to improve product image search results by highlighting the most relevant and useful product images for users. This can help users more easily find the products they are looking for, which benefits businesses selling those products and users seeking to purchase them.

Image Ranking: A Microsoft Patent

Image Ranking for Web Image Retrieval

Assigned to Microsoft

Under this patent, a web crawler collects images along with the text that appears on the same pages. It can store all of the text from those pages in a database, or only the text found within a certain distance of each image. Search engines often run dedicated crawlers for images; Google's is Googlebot-Image.

Search engines first associate images with keywords that could be used as search queries. Although this patent belongs to Microsoft rather than Google, it offers interesting insights for holistic SEO practitioners and is worth reading.

Some Image Ranking Factors That May Be Used in Image Search

Number of websites containing an identical image

Images that appear on several websites may be treated as more relevant for a query term than images that appear on only one website or, depending on the implementation, as less relevant.

The simplest way to decide whether images are identical is to check whether the images displayed on different pages share the same URL.

Identical images hosted at different URLs can be compared by computing a hash value for each file and comparing the hashes.

This method is often used to combat duplicate content and improve the relevance of image search results.

Images considered identical can be grouped and ranked together, which can help provide a more consistent and higher-quality user experience.
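The exact-duplicate check can be sketched with Python's standard `hashlib`: hash each file's bytes and group the URLs whose digests match. SHA-256 is an assumed choice; any cryptographic hash behaves the same way for this purpose, and the URLs below are placeholders.

```python
import hashlib
from collections import defaultdict


def group_identical(images: dict[str, bytes]) -> list[list[str]]:
    """Group image URLs whose file contents are byte-for-byte identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values()]


images = {
    "https://a.example/x.jpg": b"same bytes",
    "https://b.example/y.jpg": b"same bytes",
    "https://c.example/z.jpg": b"other bytes",
}
print(group_identical(images))
```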

Number of websites containing a similar image

By the same reasoning, the text associated with similar versions of an image on different pages may either reinforce or weaken the image's relevance to that text, depending on how consistent the text is across those pages.

A similar image is a copy of another image that has been enlarged or shrunk, cropped to show only part of the original, or given a border.

Similarity between images can also be assessed by comparing hash values. Note, however, that an exact hash changes as soon as a single pixel changes, so images that look almost the same but produce different hash values may still be treated as distinct images.
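Because an exact hash changes with any pixel change, near-duplicate detection usually relies on a perceptual hash instead. The sketch below implements a simple average hash (aHash), an illustrative technique swapped in here rather than the one named in the patent: the image, given as a hypothetical grid of grayscale values, is shrunk to 8x8 cells, and each bit records whether a cell is brighter than the mean. Similar images then differ in only a few bits, i.e. a small Hamming distance.

```python
def average_hash(pixels: list[list[int]], size: int = 8) -> int:
    """aHash: shrink a grayscale image to size x size cells by block averaging,
    then set one bit per cell brighter than the mean of all cells.
    Assumes the image is at least `size` pixels in each dimension."""
    h, w = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        for c in range(size):
            rows = range(r * h // size, (r + 1) * h // size)
            cols = range(c * w // size, (c + 1) * w // size)
            block = [pixels[y][x] for y in rows for x in cols]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (value > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


gradient = [[x * 16 for x in range(16)] for _ in range(16)]
upscaled = [[gradient[y // 2][x // 2] for x in range(32)] for y in range(32)]
print(hamming(average_hash(gradient), average_hash(upscaled)))
```

Here a 16x16 gradient and a 2x upscaled copy produce the same hash, even though their file bytes, and therefore any exact hash, would differ. Real images would first be decoded and converted to grayscale.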

Image Size

According to the patent, images containing more pixels could be ranked higher because users are more likely to click on them. However, the patent also leaves open the opposite case, in which images with fewer pixels rank higher than images with many pixels.

Frequency of an Image on a Website

The ranking of an image for certain keywords can be influenced either positively or negatively by the number of times it is used on the same website, whether across multiple pages or several times on the same page. However, if an image is part of the site's graphic design, such as a list bullet, rather than having a meaning of its own, it could be ranked lower. Conversely, if the image has its own meaning, such as the site's logo, it could receive a better ranking.
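One way to picture the distinction between decorative and meaningful images is the heuristic below. Both the counting and the thresholds (32 pixels, 10 uses) are hypothetical: the patent does not say how such images are identified.

```python
from collections import Counter


def image_frequency(pages: dict[str, list[str]]) -> Counter:
    """Count how many times each image URL is used across all pages of a site."""
    return Counter(src for images in pages.values() for src in images)


def looks_decorative(width: int, height: int, frequency: int,
                     max_side: int = 32, min_frequency: int = 10) -> bool:
    """Hypothetical heuristic: tiny images reused very often (bullets, spacers)."""
    return max(width, height) <= max_side and frequency >= min_frequency


pages = {
    "/a": ["bullet.png", "bullet.png", "logo.png"],
    "/b": ["bullet.png", "logo.png", "photo.jpg"],
}
freq = image_frequency(pages)
print(freq.most_common())
```

A logo is also reused on every page, but its larger dimensions keep it out of the decorative bucket, which mirrors the bullet versus logo contrast in the patent.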

Image Feature Levels

An image's characteristics, such as resolution, format, file size, entropy, and image gradient, can affect its ranking. Although it is not known precisely how these factors feed into image relevance, they may be used to measure the quality or importance of an image, in line with the idea that high-quality images improve the user experience.
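Two of the features named above, entropy and image gradient, are easy to compute. The sketch assumes the image has already been decoded into a grid of 0 to 255 grayscale values; how, or whether, a search engine weights these measures is unknown.

```python
import math
from collections import Counter


def entropy(pixels: list[list[int]]) -> float:
    """Shannon entropy, in bits, of the grayscale intensity histogram."""
    values = [v for row in pixels for v in row]
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_gradient(pixels: list[list[int]]) -> float:
    """Average absolute horizontal plus vertical intensity change (a crude detail measure)."""
    h, w = len(pixels), len(pixels[0])
    total, count = 0, 0
    for y in range(h - 1):
        for x in range(w - 1):
            dx = pixels[y][x + 1] - pixels[y][x]
            dy = pixels[y + 1][x] - pixels[y][x]
            total += abs(dx) + abs(dy)
            count += 1
    return total / count if count else 0.0


flat = [[128] * 4 for _ in range(4)]
checker = [[0, 255], [255, 0]]
print(entropy(flat), mean_gradient(flat))
print(entropy(checker), mean_gradient(checker))
```

A flat grey image has zero entropy and zero gradient; a high-contrast checkerboard scores 1 bit of entropy and a large gradient.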

One could also argue that an image edited to catch the eye would be favoured by algorithms, as it would improve the user experience, in particular the click-through rate and potentially the dwell time.

Other Image Ranking Factors

Image ranking can be influenced by various metrics related to their presence on a web page, including:

The total number of images present on the page
The number of images linked to a specific page
The number of thumbnail images present on the same page as the image in question
The number of links pointing to the image’s URL.
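Three of these metrics can be read straight from a page's HTML. The sketch below uses the standard library's `html.parser`; treating any image whose `width` attribute is 150 pixels or less as a thumbnail is an assumed cutoff, and the target image URL is hypothetical.

```python
from html.parser import HTMLParser


class ImageMetrics(HTMLParser):
    """Collects page-level image metrics from one HTML page."""

    def __init__(self, target_src: str, thumb_max: int = 150):
        super().__init__()
        self.target_src = target_src
        self.thumb_max = thumb_max  # hypothetical thumbnail cutoff, in pixels
        self.total_images = 0
        self.thumbnails = 0
        self.links_to_target = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self.total_images += 1
            width = attrs.get("width") or ""
            if width.isdigit() and int(width) <= self.thumb_max:
                self.thumbnails += 1
        elif tag == "a" and attrs.get("href") == self.target_src:
            # A link pointing directly at the image's URL.
            self.links_to_target += 1


parser = ImageMetrics(target_src="/b.jpg")
parser.feed('<img src="/a.jpg" width="100"><img src="/b.jpg" width="800"/>'
            '<a href="/b.jpg">Full size</a>')
print(parser.total_images, parser.thumbnails, parser.links_to_target)  # 2 1 1
```

Self-closing tags such as `<img ... />` are also routed to `handle_starttag` by the default `handle_startendtag`, so both forms are counted.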

Weighting Text Based on Its Distance from an Image

Text that is closer to an image on a web page can be considered more relevant to the subject of the image than text that is further away.

To calculate this distance, various elements can be examined, such as:

The number of words between the text and the image
The number of sentence-ending punctuation marks, such as “.”, “?” and “!”, between the text and the image
The number of intermediate table data tags (<td>) between the text and the image
The number of intermediate table row tags (<tr>) between the text and the image.
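A sketch of how these separators might be counted and turned into a weight. The tokeniser (a regular expression that splits HTML into tags, sentence-ending punctuation and words) and the penalty for each separator type in `proximity_weight` are assumptions for illustration; the patent does not publish a formula.

```python
import re

# Tags, sentence-ending punctuation, or words.
TOKEN_RE = re.compile(r"<[^>]+>|[.!?]|[^\s<.!?]+")
STOPS = {".", "!", "?"}


def distance_features(html: str, word: str) -> dict[str, int]:
    """Count what separates the first occurrence of `word` from the first <img> tag."""
    tokens = TOKEN_RE.findall(html)
    img_pos = next(i for i, t in enumerate(tokens) if t.lower().startswith("<img"))
    word_pos = next(i for i, t in enumerate(tokens) if t.lower() == word.lower())
    between = tokens[min(img_pos, word_pos) + 1:max(img_pos, word_pos)]
    return {
        "words": sum(1 for t in between if not t.startswith("<") and t not in STOPS),
        "stops": sum(1 for t in between if t in STOPS),
        "td": sum(1 for t in between if re.match(r"<td\b", t, re.IGNORECASE)),
        "tr": sum(1 for t in between if re.match(r"<tr\b", t, re.IGNORECASE)),
    }


def proximity_weight(features: dict[str, int]) -> float:
    """Hypothetical penalties: closer text gets a weight nearer 1."""
    penalty = (0.05 * features["words"] + 0.3 * features["stops"]
               + 0.5 * features["td"] + 1.0 * features["tr"])
    return 1.0 / (1.0 + penalty)


html = ('<table><tr><td><img src="bike.jpg"></td><td>Red bike. Great price</td></tr>'
        '<tr><td>Contact us about delivery</td></tr></table>')
for word in ("bike", "delivery"):
    print(word, proximity_weight(distance_features(html, word)))
```

"bike", in the cell next to the image, gets a much higher weight than "delivery", which sits a sentence, two cells and a table row away.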

Does Googlebot Read Text in Images?

Google acquired the facial and object recognition company Neven Vision in 2006, as well as several other companies capable of recognising images.

In 2007, Google obtained a patent, “Database assisted OCR for street scenes and other images”, that used OCR (Optical Character Recognition) to verify postal addresses in business listings and thereby verify those businesses in Google Maps.

In 2011, Google published a patent application, “User Interface for Presenting Search Results for Multiple Regions of a Visual Query”, that used a range of recognition features (objects, faces, barcodes, landmarks, text, products, and named entities linked to the Google Knowledge Graph) to search and understand visual queries. It appears to have underpinned the Google Goggles app, released in September 2010: the visual query patent was filed in August 2010, and the short interval between the filing and the launch of Google Goggles supports the idea that the two are related.

Google Goggles was a mobile image recognition app developed by Google that ran searches based on photos taken with mobile devices. For example, photographing a famous landmark would search for information about it, and photographing a product barcode would search for information about that product.

Google obtained a similar patent in 2012 that reads signs inside buildings from Street View images. https://patents.google.com/patent/US8280891

2007: Method and apparatus for automatically annotating images. This patent searches for images similar to a given image and, when it finds them, uses the text associated with those similar images to create an annotation for the original image.
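The annotation approach can be sketched as a nearest-neighbour lookup. The feature vectors are hypothetical stand-ins for whatever visual descriptors a real system would use; the sketch takes the `k` most similar images by cosine similarity and returns the most frequent words in their associated text.

```python
import math
from collections import Counter


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def annotate(query_vec: list[float], collection: list[tuple[list[float], str]],
             k: int = 2, top_terms: int = 3) -> list[str]:
    """collection: (feature_vector, associated_text) pairs for already-indexed images."""
    neighbours = sorted(collection, key=lambda item: cosine(query_vec, item[0]),
                        reverse=True)[:k]
    # Borrow the text of the most similar images as candidate annotation terms.
    terms = Counter(word for _, text in neighbours for word in text.lower().split())
    return [term for term, _ in terms.most_common(top_terms)]


collection = [
    ([1.0, 0.0], "eiffel tower paris"),
    ([0.9, 0.1], "Eiffel Tower at night"),
    ([0.0, 1.0], "golden retriever"),
]
print(annotate([1.0, 0.05], collection))
```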

2012: Clustering queries for image search. An image search can be run to find similar images; the results can be pre-grouped or ranked by visual and semantic similarity and then clustered. Each cluster can be associated with search terms that can then be used as annotations.
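A minimal sketch of grouping image results and naming each group, assuming each result already has a combined visual/semantic vector and a non-empty list of associated terms. Greedy single-pass clustering against each cluster's first member is a simplification chosen for brevity, and the 0.8 similarity threshold is arbitrary.

```python
import math
from collections import Counter


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_results(results: list[tuple[list[float], list[str]]],
                    threshold: float = 0.8) -> list[tuple[str, int]]:
    """Return (label, size) per cluster; the label is the cluster's most frequent term."""
    clusters: list[dict] = []
    for vector, terms in results:
        best = max(clusters, key=lambda c: cosine(vector, c["seed"]), default=None)
        if best is not None and cosine(vector, best["seed"]) >= threshold:
            best["size"] += 1
            best["terms"].update(terms)
        else:
            clusters.append({"seed": vector, "size": 1, "terms": Counter(terms)})
    return [(c["terms"].most_common(1)[0][0], c["size"]) for c in clusters]


# An ambiguous query such as "jaguar": car photos and animal photos.
results = [
    ([1.0, 0.0], ["car"]),
    ([0.95, 0.05], ["car", "jaguar"]),
    ([0.0, 1.0], ["cat", "jaguar"]),
    ([0.1, 0.9], ["cat"]),
]
print(cluster_results(results))
```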

Google patent on analysing text in images with OCR technology

Identifying canonical documents corresponding to a visual query and in accordance with geographic information https://patents.google.com/patent/US20120134590A1/en

This patent indicates that Google can run searches on images of documents and return matching results: the text in the queried document is processed with OCR (Optical Character Recognition), and the recognised words are searched to find matching documents on the web. This would mean that Google could start indexing images of text on the web.
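Once a document image has been through OCR, matching it to web documents becomes a text retrieval problem. The sketch below assumes the OCR text is already available and ranks candidate documents by Jaccard overlap of their word sets, a deliberately simple stand-in for a real ranking function; the URLs are placeholders.

```python
import re


def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def match_documents(ocr_text: str, documents: dict[str, str],
                    top_n: int = 3) -> list[tuple[str, float]]:
    """Rank web documents by Jaccard overlap with the words recognised in a document image."""
    query = words(ocr_text)
    scored = []
    for url, text in documents.items():
        doc = words(text)
        union = query | doc
        scored.append((url, len(query & doc) / len(union) if union else 0.0))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


documents = {
    "https://example.com/menu": "Lunch menu: soup, salad and bread",
    "https://example.com/about": "About our bakery",
}
print(match_documents("LUNCH MENU soup salad", documents))
```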

If so, Google could also start treating addresses shown in images as business locations, just as it does for addresses written as text. It could also begin to read text in navigation images and create sitelinks from it where it had not done so before.

Google Patent for Image Classification When Identifying Landmarks in Image Collections

Google obtained a patent focused on identifying popular landmarks in large collections of digital images.

The patent's background notes that no known system could automatically extract information such as the most popular tourist destinations from these large collections. Because so many new photographs are constantly added to these digital image collections, users cannot realistically label them all by hand to make the collections more useful. What is needed, the patent concludes, are systems and methods that automatically identify and label popular landmarks in large collections of digital images.

How Might This Play Into Image Classification and Landmark-Related Search?

The patent is:
Automatic Discovery of Popular Landmarks

Filed: 3 October 2016
Assignee: Google LLC

The patent describes how its image classification system can enhance user queries that retrieve images of landmarks, in the following steps:

Receive the user's query.
Identify one or more keywords in the user’s query.
Select one or more matching tags from a landmark database corresponding to the keyword(s).
Complete the user’s query with the matching tag(s), thereby generating a completed query.
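The four steps can be sketched as follows. The landmark database here is a hypothetical two-entry dictionary; in the patent, tags would come from the geographic and visual clusters.

```python
import re

# Hypothetical landmark database: keyword -> matching tags.
LANDMARK_TAGS = {
    "eiffel": ["eiffel tower", "paris"],
    "liberty": ["statue of liberty", "new york"],
}


def complete_query(query: str, landmark_tags: dict[str, list[str]] = LANDMARK_TAGS) -> str:
    keywords = re.findall(r"\w+", query.lower())         # identify keywords
    tags: list[str] = []
    for keyword in keywords:
        for tag in landmark_tags.get(keyword, []):        # select matching tags
            if tag not in tags and tag not in query.lower():
                tags.append(tag)
    return " ".join([query] + tags)                       # complete the query


print(complete_query("eiffel at night"))
```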

In addition to this, the image classification patent suggests it could also be used to automatically tag new digital images by performing the following actions:

Compare the new digital image with images in a landmark image database, which contains visual clusters of images of one or more landmarks.
Label the new digital image with at least one label based on at least one of these visual clusters.
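Tagging a new image by comparison with visual clusters can be sketched as a nearest-centroid lookup. The feature vectors, cluster centroids and distance cutoff are assumptions; a real system would compare learned image descriptors.

```python
import math
from typing import Optional


def label_new_image(features: list[float], clusters: dict[str, list[float]],
                    max_distance: float = 1.0) -> Optional[str]:
    """Return the label of the nearest visual-cluster centroid, if it is close enough."""
    best_label, best_dist = None, math.inf
    for label, centroid in clusters.items():
        dist = math.dist(features, centroid)
        if dist < best_dist:
            best_label, best_dist = label, dist
    # Images far from every cluster stay unlabelled rather than get a wrong tag.
    return best_label if best_dist <= max_distance else None


clusters = {"Eiffel Tower": [0.9, 0.1], "Big Ben": [0.1, 0.9]}
print(label_new_image([0.85, 0.2], clusters))
```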

Abstract

In one embodiment, the present invention is a method of populating and updating a landmark image database, comprising geographically clustering geo-tagged images based on geographic proximity to generate one or more geographic clusters, and visually clustering the geographic cluster(s) according to image similarity to generate one or more visual clusters. In another embodiment, the present invention is a system for identifying landmarks from digital images, comprising the following components: a database of geo-tagged images, a landmark database; a geographic clustering module; and a visual clustering module. In other embodiments, the present invention can enhance user queries to retrieve images of landmarks, or a method of automatically tagging a new digital image with text labels.
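The two-stage clustering in the abstract, geographic first and then visual, can be sketched with a great-circle (haversine) distance for the first stage and a feature-vector distance for the second. The greedy clustering, the 1 km radius and the visual threshold are assumptions for illustration, not values from the patent.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def greedy_cluster(items, distance, threshold):
    """Put each item in the first cluster whose first member is within threshold."""
    clusters = []
    for item in items:
        for cluster in clusters:
            if distance(item, cluster[0]) <= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def build_landmark_clusters(photos, geo_radius_km=1.0, visual_threshold=0.5):
    """Stage 1: cluster geo-tagged photos by location. Stage 2: split each group by appearance."""
    geo = greedy_cluster(photos, lambda p, q: haversine_km(p["latlon"], q["latlon"]),
                         geo_radius_km)
    return [
        visual
        for group in geo
        for visual in greedy_cluster(
            group, lambda p, q: math.dist(p["features"], q["features"]), visual_threshold
        )
    ]


photos = [
    {"latlon": (48.8584, 2.2945), "features": [1.0, 0.0]},
    {"latlon": (48.8586, 2.2950), "features": [0.9, 0.1]},
    {"latlon": (48.8590, 2.2940), "features": [0.0, 1.0]},
    {"latlon": (51.5007, -0.1246), "features": [1.0, 0.0]},
]
print([len(c) for c in build_landmark_clusters(photos)])
```

The three photos near the Eiffel Tower form one geographic cluster, which the visual stage splits into the two similar shots and one dissimilar photo; the London photo forms its own cluster.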

Even Smarter Image Classification of Landmarks?

This system appears capable of identifying popular landmarks in photo collections on the web and storing them in a landmark database, possibly grouped by geographic location. Google's image classification does not stop at integrating these landmark images into image search results: another patent, “Methods and systems for classifying images using semantic and aesthetic models”, suggests further ways to classify images.

Google's patent indicates that images are classified according to an ontology based on their subjects. For example, a Google image search for a landmark such as the Washington Monument shows several classification labels at the top of the results, which you can click to refine the results by specific aspects of the monument. Image classification can thus cover specific monuments as well as finer-grained categories, giving Google a smarter way to classify landmark images and to label them more meaningfully.

Disambiguating Image Queries at Google

Better Understanding Image Queries
This patent is: Contextual Query Disambiguation

Assignee: Google LLC
Grant: 18 February 2020
Filed: 20 March 2017

Abstract

The invention relates to methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for contextual query disambiguation. In one aspect, a method comprises receiving an image presented on a screen of a computing device and a transcription of an utterance spoken by a user of the computing device, identifying a particular sub-image that is included in the image, and based on performing image recognition on the particular sub-image, determining one or more first labels that indicate a context of the particular sub-image. The method also comprises, based on performing text recognition on a portion of the image other than the particular sub-image, determining one or more second labels that indicate the context of the particular sub-image; and, based on the transcription, the first labels, and the second labels, generating a search query and providing the search query for output.

Google received a patent for displaying image results that identify objects present in photographs and videos. Search engines can struggle to understand queries phrased in natural language, so the patent focuses on resolving ambiguities in image queries.

The example given in the patent is as follows: a user may ask a question such as “What is this?” about a photograph they are looking at on a computing device. This method can work for image, text, or video queries, or for a combination of these elements.

Thus, to respond to an image identification request, a computing device can capture the image in question, transcribe the question, and transmit this transcription along with the image to a server.

What the Server Can Do with Image Queries

The server can receive the transcription and image from the computing device and then perform the following steps:

Identify the visual and textual content of the image
Generate labels for elements in the image such as places, entities, names, animal types, etc.
Recognise a specific sub-image within the image, which may be a photograph or a drawing.

In a first step, the server can:

Identify the part of the image of main interest to the searcher, such as a historical landmark.
Perform image recognition on the sub-image to generate labels for that sub-image.
Generate labels for text in the image, such as comments on the sub-image, using text recognition on a portion of the image that is not the sub-image.
Create a search query based on the transcription and the generated labels.
Provide this query to a search engine.

The Process Behind Visual Query Disambiguation

The described process involves the following steps:

Receive an image presented on a computing device, or corresponding to a part of its screen
Understand the transcription of a request spoken by a searcher when presenting the image
Identify a sub-image included in the image and perform image recognition on it
Determine first labels that show the context of the particular sub-image
Perform text recognition on a portion of the image other than the particular sub-image to determine second labels that show the context of the sub-image
Compile a search query based on the transcription, the first labels, and the second labels
Provide the search query as output.

Other Aspects of Executing Such Image Query Searches May Involve:

The process can weight the first and second labels differently. When the search query is built, one or more of the first or second labels may be substituted for terms in the transcription. Each label receives a confidence score, i.e. the probability that it corresponds to the sub-image of main interest to the user; first and second labels are selected according to their respective confidence scores, and the search query is built from the selected labels.
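The confidence-based selection and substitution can be sketched as below. Everything here is an assumption for illustration: the 0.5 confidence threshold, the 0.8 weight applied to text-recognition (second) labels, and the list of deictic words that a label may replace.

```python
# Hypothetical deictic words in a transcription that a label can replace.
DEICTIC = {"this", "that", "it"}


def select_labels(labels: dict[str, float], min_confidence: float = 0.5) -> list[str]:
    """Keep labels whose confidence clears the threshold, best first."""
    kept = [label for label, conf in labels.items() if conf >= min_confidence]
    return sorted(kept, key=lambda label: labels[label], reverse=True)


def build_query(transcription: str, first_labels: dict[str, float],
                second_labels: dict[str, float], second_weight: float = 0.8) -> str:
    """Substitute the best-scoring label for the first deictic word in the transcription."""
    combined = dict(first_labels)
    for label, conf in second_labels.items():
        # Hypothetical weighting: text-recognition labels count slightly less.
        combined[label] = max(combined.get(label, 0.0), conf * second_weight)
    best = select_labels(combined)
    if not best:
        return transcription  # nothing confident enough to disambiguate with
    words = transcription.rstrip("?!.").split()
    for i, word in enumerate(words):
        if word.lower() in DEICTIC:
            words[i] = best[0]
            return " ".join(words)
    return " ".join(words + [best[0]])


print(build_query("What is this?", {"Golden Gate Bridge": 0.9, "bridge": 0.6},
                  {"San Francisco": 0.7}))
```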

The process also uses historical query data to generate candidate search queries from the transcription and labels. Candidate queries are evaluated in terms of confidence score, i.e. the probability that they precisely match the transcription. A search query is chosen from the candidates by comparing historical query data to the candidate search queries.
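Candidate generation and selection against historical queries might look like this. Using raw query frequency from a hypothetical query log as the comparison is an assumption; the patent describes confidence scores without giving a formula.

```python
import re
from collections import Counter


def candidate_queries(transcription: str, labels: list[str]) -> list[str]:
    """One candidate per label: the label replaces 'this', or is appended if absent."""
    base = transcription.rstrip("?!. ")
    candidates = []
    for label in labels:
        if re.search(r"\bthis\b", base, flags=re.IGNORECASE):
            candidates.append(re.sub(r"\bthis\b", lambda _m: label, base,
                                     count=1, flags=re.IGNORECASE))
        else:
            candidates.append(f"{base} {label}")
    return candidates


def pick_query(candidates: list[str], history: Counter) -> str:
    """Choose the candidate seen most often in the (hypothetical) historical query log."""
    return max(candidates, key=lambda q: history[q.lower()])


history = Counter({"what is golden gate bridge": 120, "what is san francisco": 30})
cands = candidate_queries("What is this?", ["Golden Gate Bridge", "San Francisco"])
print(pick_query(cands, history))
```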

Furthermore, the process can identify the images contained in the screen image, score each one by the probability that it is the image of main interest to the user, and select the sub-image based on those confidence scores.

The received data can include the selection of a control event on the computing device that identifies the sub-image. This selection can be triggered by detecting a predefined keyword, which causes the computing device to capture the image and the audio data corresponding to the utterance.

Contextual Query Disambiguation Also Requires

The following process consists of:

Receiving an additional image from the computing device along with an additional transcription of an utterance spoken by a user of the computing device.
Identifying an additional sub-image included in the additional image.
Determining additional first labels that indicate the context of the additional sub-image by performing image recognition on it.
Determining additional second labels that indicate the context of the additional sub-image by performing text recognition on a portion of the additional image other than the additional sub-image.

Based on the additional transcription and the additional first and second labels, generating a command and executing it. Executing the command may include:

Store the additional image in memory.
Save the sub-image in memory.
Upload the additional image to a server.
Send the sub-image to the server.
Embed the additional image in an application on the computing device.
Retrieve the sub-image from the application on the computing device.
Generate metadata for the sub-image based on the first labels that indicate its context.

The Benefits of Following the “Image Queries” Process May Encompass:

Facilitating the processing of natural language requests by determining the context of an image corresponding to a part of the display of a computing device
Selecting image and/or text recognition
Rewriting a transcription of a user’s utterance
Recognising that the user is referring to the photo displayed on the computing device
Extracting information about the photo to determine the context of the photo, as well as the context of other parts of the image that do not contain the photo, such as where the photo was taken.

Summary on Image Ranking

Google’s image ranking is a complex process for classifying and displaying image search results to users. It uses sophisticated algorithms to understand the content of images and match them to natural language search queries. Images are ranked according to their relevance to search queries, using criteria such as descriptive text, visual attributes, and user behaviour. The “image queries” process can be used to facilitate the processing of natural language queries by determining the context of an image on the display of a computing device. It can also be used to extract information about the photo, such as where it was taken, in order to better understand the context of the photo and provide more relevant search results to users. If you are still here, you may wish to read the article on advanced image optimisation, which revisits some of the concepts covered here.