Search Engine Optimization (SEO)

How Does a Search Engine Work

A search engine works in three phases: crawling, indexing, and serving results.

Crawling

Google constantly looks for new and updated pages and adds them to its list of known pages. This process is called "URL discovery". Some pages are known because Google has already visited them. Other pages are discovered when Google follows a link from a known page to a new page: for example, a hub page, such as a category page, links to a new blog post. Still other pages are discovered when you submit a list of pages (a sitemap) for Google to crawl.
Once Google discovers a page's URL, it may visit (or "crawl") the page to find out what's on it. Google uses a huge set of computers to crawl billions of pages on the web. The program that does the fetching is called Googlebot (also known as a crawler, robot, bot, or spider).
Googlebot doesn't crawl every page it discovers. Some pages may be disallowed for crawling by the site owner, and others may not be accessible without logging in to the site.
During the crawl, Google renders the page and runs any JavaScript it finds using a recent version of Chrome, similar to how your browser renders pages you visit. Rendering is important because websites often rely on JavaScript to bring content onto the page (client-side rendering).
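
The URL discovery step described above can be sketched as a breadth-first walk over links. This is only a minimal illustration, not how Googlebot is actually implemented: the `discover_urls` function, the `LinkExtractor` class, and the in-memory `FAKE_WEB` dictionary (which stands in for real HTTP requests) are all hypothetical names made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def discover_urls(start_url, fetch):
    """Breadth-first URL discovery.

    fetch(url) is assumed to return the page HTML, or None when the
    page is disallowed or not accessible.
    """
    known = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that cannot be crawled
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in known:
                known.add(absolute)
                queue.append(absolute)
    return known


# Hypothetical in-memory "web" used instead of real network requests.
FAKE_WEB = {
    "https://example.com/": '<a href="/blog/">Blog</a>',
    "https://example.com/blog/": '<a href="/blog/new-post">New</a> <a href="/">Home</a>',
    "https://example.com/blog/new-post": "<p>Hello</p>",
}

found = discover_urls("https://example.com/", FAKE_WEB.get)
print(sorted(found))
```

Here the hub page `/blog/` is what makes `/blog/new-post` discoverable, which mirrors the category-page example in the text.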

Indexing

After a page is crawled, Google tries to understand what the page is about. This stage is called indexing and it includes processing and analyzing the textual content and key content tags and attributes, such as <title> elements and alt attributes, images, videos, and more.
During indexing, Google determines whether a page is a duplicate of another page on the internet or the canonical version. To do this, it groups pages with similar content and then selects the one that's most representative of the group as the canonical. The other pages in the group are alternate versions that may be served in different contexts.
Google also collects signals about the canonical page and its contents, which may be used in the next stage, where the page is served in search results. Some signals include the language of the page, the country the content is local to, and the usability of the page. The collected information may be stored in the Google index, a large database hosted on thousands of computers.
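
To make the kind of information collected during indexing more concrete, here is a small sketch that pulls a few of the tags and attributes mentioned above out of an HTML page. It is an illustration only: the `PageSignals` class and the `SAMPLE` page are hypothetical, and reading the canonical URL from a `<link rel="canonical">` tag is an assumption about one common way a page can hint at its canonical version, not a description of Google's internal process.

```python
from html.parser import HTMLParser


class PageSignals(HTMLParser):
    """Collects the title, image alt texts, canonical hint and language."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.alt_texts = []
        self.canonical = None
        self.lang = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "img" and attrs.get("alt"):
            self.alt_texts.append(attrs["alt"])
        elif tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "html":
            self.lang = attrs.get("lang")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept in this sketch.
        if self._in_title:
            self.title += data


# Hypothetical sample page.
SAMPLE = """<html lang="en"><head><title>My Blog</title>
<link rel="canonical" href="https://example.com/blog/">
</head><body><img src="cat.png" alt="A sleeping cat"></body></html>"""

signals = PageSignals()
signals.feed(SAMPLE)
print(signals.title, signals.alt_texts, signals.canonical, signals.lang)
```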

Serving

When a user enters a query, Google's machines search the index for matching pages and return the results they judge to be the highest quality and most relevant to the user's query. Relevancy is determined by hundreds of factors, which could include information such as the user's location, language, and device (desktop or phone).
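
The idea of "searching the index for matching pages" can be illustrated with a toy inverted index. This is a deliberately simplified sketch: the `build_index` and `search` functions and the `PAGES` data are hypothetical, and the score used here (the number of query words a page contains) is a stand-in for the hundreds of real relevancy factors.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return matching URLs, best match first.

    A page's score is the number of query words it contains;
    ties are broken alphabetically by URL.
    """
    scores = defaultdict(int)
    for word in query.lower().split():
        for url in index.get(word, ()):
            scores[url] += 1
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical pages and their text content.
PAGES = {
    "/a": "python web crawler tutorial",
    "/b": "web design basics",
    "/c": "cooking recipes",
}

index = build_index(PAGES)
results = search(index, "web crawler")
print(results)
```

Page `/a` contains both query words and `/b` only one, so `/a` is listed first, while `/c` does not match at all.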

Registration

To help users find your website through a search engine, register the site with that search engine (for Google, this is done through Google Search Console).

Sitemap

A sitemap tells search engines which pages your website contains.
It is especially important for large websites.
It does not affect ranking; it is only a map that lets search engines know which pages exist on your website.

sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<!--  created with Free Online Sitemap Generator www.xml-sitemaps.com  -->
<url>
    <loc>https://petercheng-blog.vercel.app/</loc>
    <lastmod>2021-06-10T18:28:24+00:00</lastmod>
</url>
<url>
    <loc>https://petercheng-blog.vercel.app/project/superwhackamole</loc>
    <lastmod>2021-06-10T18:28:24+00:00</lastmod>
</url>
<url>
    <loc>https://petercheng-blog.vercel.app/project/iefyp</loc>
    <lastmod>2021-06-10T18:28:24+00:00</lastmod>
</url>
<url>
    <loc>https://petercheng-blog.vercel.app/project/friendchat</loc>
    <lastmod>2021-06-10T18:28:24+00:00</lastmod>
</url>
</urlset>
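
A file like the one above does not have to be written by hand or with an online generator. The following is a minimal sketch of producing the same structure with Python's standard library, assuming you already have a list of page URLs and last-modified timestamps. The `build_sitemap` function is a hypothetical helper, and the optional `xsi:schemaLocation` attribute is left out for brevity.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML text from (loc, lastmod) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    # The XML declaration must be the very first thing in the file.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Two of the entries from the sitemap shown above.
entries = [
    ("https://petercheng-blog.vercel.app/", "2021-06-10T18:28:24+00:00"),
    ("https://petercheng-blog.vercel.app/project/friendchat", "2021-06-10T18:28:24+00:00"),
]

sitemap_xml = build_sitemap(entries)
print(sitemap_xml)
```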

Ranking

Many factors affect SEO ranking, including:
Content that contains popular keywords
Links to popular or high-ranking websites
A mobile-friendly design
Links from other pages to your page (backlinks)
A good title that contains popular keywords

robots.txt

A robots.txt file is a set of instructions telling search engines which pages should and shouldn't be crawled on a website. It guides crawler access, but it shouldn't be used to keep pages out of Google's index.
It can also discourage web scrapers from fetching certain pages, although the file is only advisory and a scraper can choose to ignore it.

User-agent: *
Disallow: /private/
Allow: /public/

User-agent: Googlebot
Disallow: /no-google/

Sitemap: https://www.example.com/sitemap.xml
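
To see how a well-behaved crawler might interpret the rules above, here is a short sketch using `urllib.robotparser` from Python's standard library. The bot name `SomeBot` and the page paths are made up for illustration, and real crawlers may differ in details of how they match rules.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/

User-agent: Googlebot
Disallow: /no-google/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://www.example.com"

# "SomeBot" is a hypothetical crawler that falls under the "*" group.
print(parser.can_fetch("SomeBot", BASE + "/private/data.html"))
print(parser.can_fetch("SomeBot", BASE + "/public/index.html"))

# Googlebot has its own group, so only that group's rules apply to it.
print(parser.can_fetch("Googlebot", BASE + "/no-google/page.html"))

# Sitemap URLs listed in the file (available in Python 3.8+).
print(parser.site_maps())
```

Note that the check happens on the crawler's side: nothing in robots.txt stops a client that simply never reads the file.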

References

In-Depth Guide to How Google Search Works | Google Search Central | Documentation | Google for Developers

從 Sitemap的應用，談SEO的學習 (Learning SEO through the use of sitemaps) | Harris先生

SEO 101: Everything You Need To Know About Metadata | it'seeze
