How Search Engines Find And Rank Your Pages
Posted by Md Ashikquer Rahman
For many people, Google sounds like magic.
It indexes the entire Internet and returns the most relevant results in a flash. How does it know what we want to see?
How Google accomplishes such a feat is actually not too hard to wrap your head around.
First, you have to learn the technique it uses. Once you understand it, you will know how websites are "indexed".
Additionally, it will help you understand how to build your website so that it is readable by Google bot (and other search engine robots).
You will be able to speak the language of Google, and this will help you generate more traffic to your site.
How the ranking works
Website rankings begin with an army of robots (or computer software).
Google and other search engines send millions of "bots" to "crawl" Internet websites.
When a bot crawls a website, it collects hundreds of data points about that website and stores them. These help to determine:
- A) what your website is about
- B) what types of information people can find there.
All this information that the bots have collected is then "indexed" or filed in Google's massive filing cabinet.
When a user searches on Google or Bing, the search engine rifles through this filing cabinet to pull out the most relevant results.
If you type "cat video" for example, Google goes to its cat video filing tab and shows you what is indexed there.
So how do you ensure that your webpage about cat videos appears in those results?
Google uses more than 200 "ranking factors" - small pieces of information on your webpage that tell Google what your webpage is about.
This includes things like keywords, titles, URLs, and PageRank.
All of these points matter (some more than others), and when you use them strategically, you can help the army of robots know what your website offers.
What is crawling?
"Army of robots" can conjure an ominous mental picture. In fact, this "army of bots" is a network of computers that gather information from billions of web pages.
These computers scan your webpage and take notes about what it contains.
This is known as "crawling", and afterwards all of the gathered information is "indexed". Using machine-learning algorithms, Google and Bing tell their bots which pages to index, how often they should be re-indexed, and how many pages from your site should be indexed.
When a crawler visits your website, you can help it find what it is looking for.
In fact, there are two files that you can include on your website that help bots crawl your site:
Robots.txt - a file that tells Google bot and other crawlers what to do: which pages they may index, and how they should behave while crawling.
You can set a crawl delay - to prevent a bot from crawling your site too fast and slowing down your server - or you can keep a particular bot from crawling your site at all (e.g. allow Google and Bing, but withhold Yahoo).
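For illustration, a robots.txt along those lines might look something like the sketch below (the crawl-delay value and the choice of which crawlers to allow or block are just example choices; note that Google's crawler ignores the Crawl-delay directive, so the delay only applies to bots that honor it):

# Allow Google's crawler everywhere
User-agent: Googlebot
Allow: /

# Allow Bing's crawler, but ask it to wait 10 seconds between requests
User-agent: Bingbot
Allow: /
Crawl-delay: 10

# Keep Yahoo's crawler (Slurp) off the site entirely
User-agent: Slurp
Disallow: /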
Search engines use bots to determine whether a website provides the product or information that someone is searching for.
If a website passes the test, it is indexed for that subject. When a website is indexed, it appears as a search result.
A simple example is a search for deep dish pizza in Chicago.
The results of this search will be websites such as Yelp and restaurant websites that offer that popular dish.
The Google bot has already indexed those restaurants as selling the product being searched for: deep dish pizza in Chicago.
In this way, search results appear almost instantly. What does it mean for a search engine bot to crawl a web page?
These bots search millions of web pages to direct the consumer to the right website.
This is a process called crawling. Google web crawling bots scrape the web for new pages to be added to Google's index.
They also re-check web pages that are already in the index for updates.
What happens if a webpage is not ready to be indexed because it is still under development?
Blocking bots is one way to keep an unfinished website from being indexed incorrectly or showing up broken in search results.
How to block a bot from crawling a website: robots.txt is the first file a bot looks for when it visits a website.
Website owners place a robots.txt file on their server to keep bots or crawlers out of certain sections of their website.
Other reasons to use a robots.txt file to keep bots from crawling a website or page:
- Block content from search engines. Duplicate content, private content, admin sections, or pages under development can be blocked so that bots cannot access or index them.
- Stop bots from crawling ads. Bots can read paid links and advertisements and accidentally index a site for that content. Since these links are not always related to the website's niche, a website owner may want to tell the bots that it is an advertiser's link.
- Allow only reputable bots on the site. A non-reputable bot is one created by hackers looking for websites that answer specific search queries. Unlike Google bots, these bots steal the information they find and exploit out-of-date software and plugins to hack the website. If these non-reputable bots visit the website continuously, they can also significantly reduce page speed.
- Each of the reasons above calls for blocking a bot's access without having to block it at the web server itself; a sketch of the corresponding robots.txt directives follows this list.
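As a rough sketch (the paths and the bot name here are only example placeholders, not taken from any real site), the directives for those cases could look like this:

# Keep duplicate, private, and in-development sections out of the index
User-agent: *
Disallow: /admin/
Disallow: /drafts/
Disallow: /print-versions/

# Shut out a specific unwanted crawler entirely
User-agent: BadBot
Disallow: /

Keep in mind, as discussed below, that these directives are only a request; a bot is free to ignore them.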
When a website is ready to be indexed, there is a way to create a virtual map to make it easier for the bots to read the content.
This virtual map is a sitemap and translates information on a website into an easy-to-read format for the bot.
The simplest version of the robots.txt file is
User-agent: *
Allow: /
Sitemap: http://www.example.com/sitemap.xml
Note that the URL of your sitemap is included in your robots.txt file; many websites fail to do this.
This version allows all crawlers and bots to crawl every page on your website.
To be clear, a robots.txt file is only a guide for crawlers; having one lets you point bots toward the content you want indexed.
However, a crawler or bot can ignore your robots.txt file; it is not obliged to do what the file says.
So you should always make sure that your website is secure.
Sitemap.xml - Like the robots.txt file, your sitemap is a useful tool for web crawlers. It tells the bot about the organization of your web pages.
A sitemap can also include metadata - small bits of data that help the crawler understand what it is crawling.
Sitemaps are particularly important for large, multi-page websites, new sites without many external links, or sites with many rich types of media.
It can be difficult for a bot to read a site with hundreds of products, let alone keep up to date with changing content.
A sitemap packages that information in a format called XML.
This coding creates trackable URLs and informs bots how often to check back for new content.
Each entry in a sitemap contains information for a single webpage.
For example, an entry can point to a landing page, a product page, or a page with a blog article.
Sitemaps are then submitted to search engines, which will be explained in more detail later. A complete sitemap looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2018-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
The <loc> tag above (along with the surrounding <urlset> and <url> tags) must appear in every sitemap.
<lastmod>, <changefreq>, and <priority> are optional tags that can provide additional, but not essential, information to search engine bots.
A sitemap does not need to include the date of the last revision, how often the content will be updated, or a priority level.
It might look like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
  </url>
</urlset>
Since sitemaps help the bot understand website content faster, it is best to provide as much information as possible.
For the same website, each sitemap does not have to be submitted individually.
You can submit two or more sitemaps at once by listing them in a sitemap index file, like this:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2018-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml.gz</loc>
    <lastmod>2018-01-01</lastmod>
  </sitemap>
</sitemapindex>
The only restriction here is that each sitemap file can contain at most 50,000 URLs.
The above example is a sitemap index file that references two sitemaps.
Sitemaps help to answer the questions that search engine bots are asking when they crawl a web page.
To help search engine bots index your content - especially content that is updated often - there are several important factors to focus on.