"Crawler" (sometimes also called a "robot" or "spider") is a generic term for any program that is used to automatically discover and scan websites by following links from one webpage to another. Google's main crawler is called Googlebot. This table lists information about the common Google crawlers you may see in your referrer logs, and how to specify them in robots.txt, the robots meta tags, and the X-Robots-Tag HTTP directives.

The following table shows the crawlers used by various products and services at Google. The AdsBot crawlers check ad quality on Android web pages, iPhone web pages, desktop web pages, and Android app pages, and they obey the AdsBot-Google robots rules.

Overview of Google crawlers (user agents)
Crawlers
APIs-Google
User agent token
APIs-Google
Full user agent string
APIs-Google (+https://developers.google.com/webmasters/APIs-Google.html)
AdsBot Mobile Web Android
User agent token
AdsBot-Google-Mobile
Full user agent string
Mozilla/5.0 (Linux; Android 5.0; SM-G920A) AppleWebKit (KHTML, like Gecko) Chrome Mobile Safari (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)
AdsBot Mobile Web
User agent token
AdsBot-Google-Mobile
Full user agent string
Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)
AdsBot
User agent token
AdsBot-Google
Full user agent string
AdsBot-Google (+http://www.google.com/adsbot.html)
AdSense
User agent token
Mediapartners-Google
Full user agent string
Mediapartners-Google
Googlebot Image
User agent tokens
Googlebot-Image
Googlebot
Full user agent string
Googlebot-Image/1.0
Googlebot News
User agent tokens
Googlebot-News
Googlebot
Full user agent string
The Googlebot-News user agent uses the various Googlebot user agent strings.
Googlebot Video
User agent tokens
Googlebot-Video
Googlebot
Full user agent string
Googlebot-Video/1.0
Googlebot Desktop
User agent token
Googlebot
Full user agent strings
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36
Googlebot/2.1 (+http://www.google.com/bot.html)
Googlebot Smartphone
User agent token
Googlebot
Full user agent string
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mobile AdSense
User agent token
Mediapartners-Google
Full user agent string
(Various mobile device types) (compatible; Mediapartners-Google/2.1; +http://www.google.com/bot.html)
Mobile Apps Android
User agent token
AdsBot-Google-Mobile-Apps
Full user agent string
AdsBot-Google-Mobile-Apps
Feedfetcher
User agent token
FeedFetcher-Google
Full user agent string
FeedFetcher-Google; (+http://www.google.com/feedfetcher.html)
Google Read Aloud
User agent token
Google-Read-Aloud
Full user agent strings
Current agents:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; +https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)
Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36 (compatible; Google-Read-Aloud; +https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)
Former agent (deprecated): google-speakr
Google Favicon
User agent token
Full user agent string
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36 Google Favicon
Google StoreBot
User agent token
Storebot-Google
Full user agent strings
Desktop agent: Mozilla/5.0 (X11; Linux x86_64; Storebot-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36
Mobile agent: Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012; Storebot-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Mobile Safari/537.36
Google Site Verifier
User agent token
Google-Site-Verification
Full user agent string
Mozilla/5.0 (compatible; Google-Site-Verification/1.0)
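One practical use of the user agent tokens above is classifying crawler hits in your server logs. The following is only an illustrative sketch: the token list is abbreviated, and the `crawler_token` helper is a hypothetical example, not part of any Google tooling.

```python
# Abbreviated list of user agent tokens from the table above.
# More specific tokens come first, because some tokens (e.g. AdsBot-Google)
# are substrings of others (e.g. AdsBot-Google-Mobile).
TOKENS = [
    "AdsBot-Google-Mobile-Apps",
    "AdsBot-Google-Mobile",
    "AdsBot-Google",
    "Mediapartners-Google",
    "Googlebot-Image",
    "Googlebot-News",
    "Googlebot-Video",
    "Storebot-Google",
    "Googlebot",
]

def crawler_token(user_agent):
    """Return the first matching Google crawler token in a UA string, or None."""
    for token in TOKENS:
        if token in user_agent:
            return token
    return None

ua = ("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
      "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 "
      "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
      "+http://www.google.com/bot.html)")
print(crawler_token(ua))  # Googlebot
```

Note that a matching user agent string alone does not prove a request came from Google, since any client can claim these strings.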
A note about Chrome/W.X.Y.Z in user agents
Wherever you see the string Chrome/W.X.Y.Z in the user agent strings in the table, W.X.Y.Z is a placeholder that represents the version of the Chrome browser used by that user agent: for example, 41.0.2272.96. This version number increases over time to match the latest Chromium release version used by Googlebot. If you are searching your logs or filtering your server for a user agent with this pattern, use wildcards for the version number rather than specifying an exact version number.
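As a sketch of the wildcard approach, the following regular expression matches the Googlebot smartphone user agent from the table regardless of the Chrome version number (the pattern itself is an illustrative assumption, not an official one):

```python
import re

# Match any Chrome version (the W.X.Y.Z placeholder) instead of a fixed one.
pattern = re.compile(
    r"Chrome/[\d.]+ Mobile Safari/537\.36 \(compatible; Googlebot/2\.1;"
)

ua = ("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
      "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 "
      "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
      "+http://www.google.com/bot.html)")

print(bool(pattern.search(ua)))  # True
```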
User agents in robots.txt
Where several user agents are recognized in the robots.txt file, Google will follow the most specific. If you want all of Google to be able to crawl your pages, you don't need a robots.txt file at all. If you want to block or allow all of Google's crawlers from accessing some of your content, you can do this by specifying Googlebot as the user agent. For example, if you want all your pages to appear in Google Search, and if you want AdSense ads to appear on your pages, you don't need a robots.txt file. Similarly, if you want to block some pages from Google altogether, blocking the Googlebot user agent will also block all of Google's other user agents.
But if you want more fine-grained control, you can get more specific. For example, you might want all your pages to appear in Google Search, but you don't want images in your personal directory to be crawled. In this case, use robots.txt to disallow the Googlebot-Image user agent from crawling the files in your personal directory (while allowing Googlebot to crawl all files), like this:
User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow: /personal

To take another example, say that you want ads on all your pages, but you don't want those pages to appear in Google Search. Here, you'd block Googlebot, but allow the Mediapartners-Google user agent, like this:

User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow:

Some pages use multiple robots meta tags to specify directives for different crawlers, like this:

<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex">

In this case, Google will use the sum of the negative directives, and Googlebot will follow both the noindex and nofollow directives. For more detail, see the documentation on controlling how Google crawls and indexes your site.
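You can sanity-check rules like the first example with Python's standard urllib.robotparser. This is only a sketch: the rules are fed in directly rather than fetched, and note that Python's parser applies the first matching group in file order (and matches agents by substring), whereas Google itself follows the most specific user agent regardless of order, so the more specific group is listed first here.

```python
from urllib.robotparser import RobotFileParser

# Googlebot-Image group first, so the Python parser doesn't match
# "Googlebot-Image" against the broader "Googlebot" group.
rules = """
User-agent: Googlebot-Image
Disallow: /personal

User-agent: Googlebot
Disallow:
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Googlebot may crawl everything; Googlebot-Image is blocked from /personal.
print(parser.can_fetch("Googlebot", "/personal/photo.jpg"))        # True
print(parser.can_fetch("Googlebot-Image", "/personal/photo.jpg"))  # False
print(parser.can_fetch("Googlebot-Image", "/index.html"))          # True
```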
Controlling crawl speed
Each Google crawler accesses sites for a specific purpose and at different rates. Google uses algorithms to determine the optimal crawl rate for each site. If a Google crawler is crawling your site too often, you can reduce the crawl rate.
Retired Google crawlers
The following Google crawlers are no longer in use, and are only noted here for historical reference.
Duplex on the web
Supported the Duplex on the web service.

Web Light
Checked for the presence of the no-transform header whenever a user clicked your page in search under appropriate conditions. The Web Light user agent was used only for explicit browse requests of a human visitor, and so it ignored robots.txt rules, which are used to block automated crawling requests.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2022-12-19 UTC.