
5 common mistakes with rel=canonical

Webmaster Level: Intermediate to Advanced

Including a rel=canonical link in your webpage is a strong hint to search engines about which version you’d prefer to have indexed among duplicate pages on the web. It’s supported by several search engines, including Yahoo!, Bing, and Google. The rel=canonical link consolidates indexing properties from the duplicates, like their inbound links, and also specifies which URL you’d like displayed in search results. However, rel=canonical can be a bit tricky because it’s not very obvious when there’s a misconfiguration.


While the webmaster sees the “red velvet” page on the left in their browser, search engines notice the webmaster’s unintended “blue velvet” rel=canonical on the right.

We recommend the following best practices for using rel=canonical:
  • A large portion of the duplicate page’s content should be present on the canonical version.
  • One test is to imagine you don’t understand the language of the content—if you placed the duplicate side-by-side with the canonical, does a very large percentage of the words of the duplicate page appear on the canonical page? If you need to speak the language to understand that the pages are similar (for example, if they’re only topically similar but not extremely close in exact words), the canonical designation might be disregarded by search engines.
  • Double-check that your rel=canonical target exists (it’s not an error or “soft 404”)
  • Verify the rel=canonical target doesn’t contain a noindex robots meta tag
  • Make sure you’d prefer the rel=canonical URL to be displayed in search results (rather than the duplicate URL)
  • Include the rel=canonical link in either the <head> of the page or the HTTP header
  • Specify no more than one rel=canonical for a page. When more than one is specified, all rel=canonicals will be ignored.
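
To make the <head> and HTTP-header options above concrete, a canonical designation looks like this in each location (the URL is illustrative):

<link rel="canonical" href="http://example.com/cupcakes/red-velvet" />

Or, equivalently, in the HTTP response header (handy for non-HTML files such as PDFs):

Link: <http://example.com/cupcakes/red-velvet>; rel="canonical"
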
Mistake 1: rel=canonical to the first page of a paginated series

Imagine that you have an article that spans several pages:
  • example.com/article?story=cupcake-news&page=1
  • example.com/article?story=cupcake-news&page=2
  • and so on
Specifying a rel=canonical from page 2 (or any later page) to page 1 is not correct use of rel=canonical, as these are not duplicate pages. Using rel=canonical in this instance would result in the content on pages 2 and beyond not being indexed at all.


Good content (e.g., “cookies are superior nutrition” and “to vegetables”) is lost when specifying rel=canonical from component pages to the first page of a series.

In cases of paginated content, we recommend either using a rel=canonical from component pages to a single-page version of the article, or using rel=”prev” and rel=”next” pagination markup.


rel=canonical from component pages to the view-all page


If rel=canonical to a view-all page isn’t designated, paginated content can use rel=”prev” and rel=”next” markup.
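
For example, if a single-page version of the cupcake-news article exists, page 2 could declare it as the canonical (the view-all URL below is made up for illustration):

<!-- On example.com/article?story=cupcake-news&page=2 -->
<link rel="canonical" href="http://example.com/article-view-all?story=cupcake-news" />
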

Mistake 2: Absolute URLs mistakenly written as relative URLs


The <link> tag, like many HTML tags, accepts both relative and absolute URLs. Relative URLs include a path “relative” to the current page. For example, “images/cupcake.png” means “from the current directory go to the “images” subdirectory, then to cupcake.png.” Absolute URLs specify the full path—including the scheme like http://.

Specifying <link rel=canonical href=“example.com/cupcake.html” /> (a relative URL since there’s no “http://”) implies that the desired canonical URL is http://example.com/example.com/cupcake.html even though that is almost certainly not what was intended. In these cases, our algorithms may ignore the specified rel=canonical. Ultimately this means that whatever you had hoped to accomplish with this rel=canonical will not come to fruition.
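
In other words, if http://example.com/cupcake.html is the intended canonical, the difference looks like this:

Problematic (relative URL, resolved against the current site):
<link rel="canonical" href="example.com/cupcake.html" />

Intended (absolute URL, including the scheme):
<link rel="canonical" href="http://example.com/cupcake.html" />
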

Mistake 3: Unintended or multiple declarations of rel=canonical

Occasionally, we see rel=canonical designations that we believe are unintentional. In very rare circumstances we see simple typos, but more commonly a busy webmaster copies a page template without thinking to change the target of the rel=canonical. Now the site owner’s pages specify a rel=canonical to the template author’s site.


If you use a template, check that you didn’t also copy the rel=canonical specification.

Another issue is when pages include multiple rel=canonical links to different URLs. This happens frequently in conjunction with SEO plugins that often insert a default rel=canonical link, possibly unbeknownst to the webmaster who installed the plugin. In cases of multiple declarations of rel=canonical, Google will likely ignore all the rel=canonical hints. Any benefit that a legitimate rel=canonical might have offered will be lost.

In both these types of cases, double-checking the page’s source code will help correct the issue. Be sure to check the entire <head> section as the rel=canonical links may be spread apart.
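
As a hypothetical illustration of what to look for, a page with conflicting declarations might contain something like this in its <head> (the second link having been inserted by a plugin or a copied template):

<head>
  <link rel="canonical" href="http://example.com/pastry/red-velvet-cupcakes" />
  ...
  <link rel="canonical" href="http://template-author-example.com/sample-page" />
</head>

Since the two targets disagree, both hints would likely be ignored.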


Check the behavior of plugins by looking at the page’s source code.

Mistake 4: Category or landing page specifies rel=canonical to a featured article

Let’s say you run a site about desserts. Your dessert site has useful category pages like “pastry” and “gelato.” Each day the category pages feature a unique article. For instance, your pastry landing page might feature “red velvet cupcakes.” Because the “pastry” category page has nearly all the same content as the “red velvet cupcake” page, you add a rel=canonical from the category page to the featured individual article.

If we were to accept this rel=canonical, then your pastry category page would not appear in search results. That’s because the rel=canonical signals that you would prefer search engines display the canonical URL in place of the duplicate. However, if you want users to be able to find both the category page and featured article, it’s best to only have a self-referential rel=canonical on the category page, or none at all.


Remember that the canonical designation also implies the preferred display URL. Avoid adding a rel=canonical from a category or landing page to a featured article.

Mistake 5: rel=canonical in the <body>

The rel=canonical link tag should only appear in the <head> of an HTML document. Additionally, to avoid HTML parsing issues, it’s good to include the rel=canonical as early as possible in the <head>. When we encounter a rel=canonical designation in the <body>, it’s disregarded.

This is an easy mistake to correct. Simply double-check that your rel=canonical links are always in the <head> of your page, and as early as possible if you can.


rel=canonical designations in the <head> are processed, not the <body>.

Conclusion

To create valuable rel=canonical designations:
  • Verify that most of the main text content of a duplicate page also appears in the canonical page.
  • Check that rel=canonical is only specified once (if at all) and in the <head> of the page.
  • Check that rel=canonical points to an existent URL with good content (i.e., not a 404, or worse, a soft 404).
  • Avoid specifying rel=canonical from landing or category pages to featured articles as that will make the featured article the preferred URL in search results.
And, as always, please ask any questions in our Webmaster Help forum.

A new opt-out tool

Webmasters have several ways to keep their sites' content out of Google's search results. Today, as promised, we're providing a way for websites to opt out of having their content that Google has crawled appear on Google Shopping, Advisor, Flights, Hotels, and Google+ Local search.

Webmasters can now choose this option through our Webmaster Tools, and crawled content currently being displayed on Shopping, Advisor, Flights, Hotels, or Google+ Local search pages will be removed within 30 days.

We created a first steps cheat sheet for friends & family


Webmaster level: beginner
Everyone knows someone who just set up their first blog on Blogger, installed WordPress for the first time, or has had a website for some time but never gave search much thought. We came up with a first steps cheat sheet for just these folks. It’s a short how-to list with basic tips on search engine-friendly design that can help Google and others better understand the content and increase your site’s visibility. We made sure it’s available in thirteen languages. Please feel free to read it, print it, share it, copy and distribute it!

We hope this content will help those who are just about to start their webmaster adventure or who have so far not paid much attention to search engine-friendly design. Over time, as you gain experience, you may want to have a look at our more advanced Google SEO Starter Guide. As always, we welcome all webmasters and site owners, new and experienced, to join discussions on our Google Webmaster Help Forum.


Posted by Kaspar Szymanski, Search Quality Strategist, Dublin


Configuring URL Parameters in Webmaster Tools

Webmaster Level: Intermediate to Advanced

We recently filmed a video (with slides available) to provide more information about the URL Parameters feature in Webmaster Tools. The URL Parameters feature is designed for webmasters who want to help Google crawl their site more efficiently, and who manage a site with -- you guessed it -- URL parameters! To be eligible for this feature, the URL parameters must be configured in key/value pairs like item=swedish-fish or category=gummy-candy in the URL http://www.example.com/product.php?item=swedish-fish&category=gummy-candy.


Guidance for common cases when configuring URL Parameters. Music in the background masks the ongoing pounding of my neighbor’s construction!

URL Parameter settings are powerful. By telling us how your parameters behave and the recommended action for Googlebot, you can improve your site’s crawl efficiency. On the other hand, if configured incorrectly, you may accidentally recommend that Google ignore important pages, resulting in those pages no longer being available in search results. (There's an example provided in our improved Help Center article.) So please take care when adjusting URL Parameters settings, and be sure that the actions you recommend for Googlebot make sense across your entire site.

Website testing & Google search

Webmaster level: Advanced

We’ve gotten several questions recently about whether website testing—such as A/B or multivariate testing—affects a site’s performance in search results. We’re glad you’re asking, because we’re glad you’re testing! A/B and multivariate testing are great ways of making sure that what you’re offering really appeals to your users.

Before we dig into the implications for search, a brief primer:
Website testing is when you try out different versions of your website (or a part of your website), and collect data about how users react to each version. You use software to track which version causes users to do-what-you-want-them-to-do most often: which one results in the most purchases, or the most email signups, or whatever you’re testing for. After the test is finished you can update your website to use the “winner” of the test—the most effective content.

A/B testing is when you run a test by creating multiple versions of a page, each with its own URL. When users try to access the original URL, you redirect some of them to each of the variation URLs and then compare users’ behavior to see which page is most effective.

Multivariate testing is when you use software to change different parts of your website on the fly. You can test changes to multiple parts of a page—say, the heading, a photo, and the ‘Add to Cart’ button—and the software will show variations of each of these sections to users in different combinations and then statistically analyze which variations are the most effective. Only one URL is involved; the variations are inserted dynamically on the page.

So how does this affect what Googlebot sees on your site? Will serving different content variants change how your site ranks? Below are some guidelines for running an effective test with minimal impact on your site’s search performance.
  • No cloaking.
    Cloaking—showing one set of content to humans, and a different set to Googlebot—is against our Webmaster Guidelines, whether you’re running a test or not. Make sure that you’re not deciding whether to serve the test, or which content variant to serve, based on user-agent. An example of this would be always serving the original content when you see the user-agent “Googlebot.” Remember that infringing our Guidelines can get your site demoted or removed from Google search results—probably not the desired outcome of your test.
  • Use rel=“canonical”.
    If you’re running an A/B test with multiple URLs, you can use the rel=“canonical” link attribute on all of your alternate URLs to indicate that the original URL is the preferred version. We recommend using rel=“canonical” rather than a noindex meta tag because it more closely matches your intent in this situation. Let’s say you were testing variations of your homepage; you don’t want search engines to not index your homepage, you just want them to understand that all the test URLs are close duplicates or variations on the original URL and should be grouped as such, with the original URL as the canonical. Using noindex rather than rel=“canonical” in such a situation can sometimes have unexpected effects (e.g., if for some reason we choose one of the variant URLs as the canonical, the “original” URL might also get dropped from the index since it would get treated as a duplicate).
  • Use 302s, not 301s.
    If you’re running an A/B test that redirects users from the original URL to a variation URL, use a 302 (temporary) redirect, not a 301 (permanent) redirect. This tells search engines that this redirect is temporary—it will only be in place as long as you’re running the experiment—and that they should keep the original URL in their index rather than replacing it with the target of the redirect (the test page). JavaScript-based redirects are also fine.
  • Only run the experiment as long as necessary.
    The amount of time required for a reliable test will vary depending on factors like your conversion rates, and how much traffic your website gets; a good testing tool should tell you when you’ve gathered enough data to draw a reliable conclusion. Once you’ve concluded the test, you should update your site with the desired content variation(s) and remove all elements of the test as soon as possible, such as alternate URLs or testing scripts and markup. If we discover a site running an experiment for an unnecessarily long time, we may interpret this as an attempt to deceive search engines and take action accordingly. This is especially true if you’re serving one content variant to a large percentage of your users.
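
Putting the rel=“canonical” and 302-redirect advice above together, a test variant might look something like the following sketch (the URLs are illustrative, not from any real experiment):

<!-- Served at http://www.example.com/homepage-variant-b -->
<head>
  <link rel="canonical" href="http://www.example.com/" />
</head>

And the redirect from the original URL to that variant should answer with a temporary status code:

HTTP/1.1 302 Found
Location: http://www.example.com/homepage-variant-b
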
The recommendations above should result in your tests having little or no impact on your site in search results. However, depending on what types of content you’re testing, it may not even matter much if Googlebot crawls or indexes some of your content variations while you’re testing. Small changes, such as the size, color, or placement of a button or image, or the text of your “call to action” (“Add to cart” vs. “Buy now!”), can have a surprising impact on users’ interactions with your webpage, but will often have little or no impact on that page’s search result snippet or ranking. In addition, if we crawl your site often enough to detect and index your experiment, we’ll probably index the eventual updates you make to your site fairly quickly after you’ve concluded the experiment.

To learn more about website testing, check out these articles on Content Experiments, our free testing tool in Google Analytics. You can also ask questions about website testing in the Analytics Help Forum, or about search impact in the Webmaster Help Forum.

New Crawl Error alerts from Webmaster Tools

Webmaster level: All

Today we’re rolling out Crawl Error alerts to help keep you informed of the state of your site.

Since Googlebot regularly visits your site, we know when your site exhibits connectivity issues or suddenly spikes in pages returning HTTP error response codes (e.g. 404 File Not Found, 403 Forbidden, 503 Service Unavailable, etc). If your site is timing out or is exhibiting systemic errors when accessed by Googlebot, other visitors to your site might be having the same problem!

When we see such errors, we may send alerts -- in the form of messages in the Webmaster Tools Message Center -- to let you know what we’ve detected. Hopefully, given this increased communication, you can fix potential issues that may otherwise impact your site’s visitors or your site’s presence in search.

As we discussed in our blog post announcing the new Webmaster Tools Crawl Errors feature, we divide crawl errors into two types: Site Errors and URL Errors.

Site Error alerts for major site-wide problems

Site Errors represent an inability to connect to your site; they reflect systemic issues rather than problems with specific pages. Here are some issues that might cause Site Errors:
  • Your DNS server is down or misconfigured.
  • Your web server itself is firewalled off.
  • Your web server is refusing connections from Googlebot.
  • Your web server is overloaded, or down.
  • Your site’s robots.txt is inaccessible.
These errors are global to a site, and in theory should never occur for a well-operating site (and don’t occur for the large majority of the sites we crawl). If Googlebot detects any appreciable number of these Site Errors, regardless of the size of your site, we’ll try to notify you in the form of a message in the Message Center:

Example of a Site Error alert
The alert provides the number of errors Googlebot encountered crawling your site, the overall crawl error connection rate for your site, a link to the appropriate section of Webmaster Tools to examine the data more closely, and suggestions as to how to fix the problem.

If your site shows a 100% error rate in one of these categories, it likely means that your site is either down or misconfigured in some way. If your site has an error rate less than 100% in any of these categories, it could just indicate a transient condition, but it could also mean that your site is overloaded or improperly configured. You may want to investigate these issues further, or ask about them on our forum.

We may alert you even if the overall error rate is very low — in our experience a well configured site shouldn’t have any errors in these categories.

URL Error anomaly alerts for potentially less critical issues

Whereas any appreciable number of Site Errors could indicate that your site is misconfigured, overloaded, or simply out of service, URL Errors (pages that return a non-200 HTTP code, or incorrectly return an HTTP 200 code in the case of soft 404 errors) may occur on any well-configured site. Because different sites have different numbers of pages and different numbers of external links, a count of errors that indicates a serious problem for a small site might be entirely normal for a large site.

That’s why for URL Errors we only send alerts when we detect a large spike in the number of errors for any of the five categories of errors (Server error, Soft 404, Access denied, Not found or Not followed). For example, if your site routinely has 100 pages with 404 errors, we won’t alert you if that number fluctuates minimally. However we might notify you when that count reaches a much higher number, say 500 or 1,000. Keep in mind that seeing 404 errors is not always bad, and can be a natural part of a healthy website (see our previous blog post: Do 404s hurt my site?).

A large spike in error count could be because something has changed on your site — perhaps a reconfiguration has changed the permissions for a section of your site, or a new version of a script is crashing regularly, or someone accidentally moved or deleted an entire directory, or a reorganization of your site causes external links to no longer work. It could also just be a transient spike, or could be because of external causes (someone has linked to non-existent pages), so there might not even be a problem; but when we see an unusually large number of errors for your site, we’ll let you know so you can investigate:

Example of a URL Error anomaly alert
The alert describes the category of web errors for which we’ve detected a spike, gives a link to the appropriate section of Webmaster Tools so that you can see what pages we think are problematic, and offers troubleshooting suggestions.

Enable Message forwarding to send alerts to your inbox

We know you’re busy, and that routinely checking Webmaster Tools just to check for new alerts might be something you forget to do. Consider turning on Message forwarding. We’ll send any Webmaster Tools messages to the email address of your choice.

Let us know what you think, and if you have any comments or suggestions on our new alerts please visit our forum.

Recommendations for building smartphone-optimized websites

Webmaster level: All

Every day more and more smartphones get activated and more websites are producing smartphone-optimized content. Since we last talked about how to build mobile-friendly websites, we’ve been working hard on improving Google’s support for smartphone-optimized content. As part of this effort, we launched Googlebot-Mobile for smartphones back in December 2011, which is specifically tasked with identifying such content.

Today we’d like to give you Google’s recommendations for building smartphone-optimized websites and explain how to do so in a way that gives both your desktop- and smartphone-optimized sites the best chance of performing well in Google’s search results.

Recommendations for smartphone-optimized sites

The full details of our recommendations can be found on our new help site; we summarize them below.

When building a website that targets smartphones, Google supports three different configurations:

  1. Sites that use responsive web design, i.e. sites that serve all devices on the same set of URLs, with each URL serving the same HTML to all devices and using just CSS to change how the page is rendered on the device. This is Google’s recommended configuration.

  2. Sites that dynamically serve all devices on the same set of URLs, but each URL serves different HTML (and CSS) depending on whether the user agent is a desktop or a mobile device.

  3. Sites that have separate mobile and desktop sites.

Responsive web design

Responsive web design is a technique for building web pages that alter how they look using CSS3 media queries. That is, there is one set of HTML for the page regardless of the device accessing it, but its presentation changes using CSS media queries, which specify which CSS rules apply for the browser displaying the page. You can learn more about responsive web design from this blog post by Google's webmasters and in our recommendations.
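
As a small sketch of the idea, a stylesheet might hide a sidebar and enlarge tap targets only on narrow screens (the breakpoint and class names here are just examples):

@media only screen and (max-width: 640px) {
  .sidebar { display: none; }   /* hide secondary content on small screens */
  .nav a { padding: 12px; }     /* larger tap targets for touch devices */
}
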

Using responsive web design has multiple advantages, including:

  • It keeps your desktop and mobile content on a single URL, which is easier for your users to interact with, share, and link to, and easier for Google’s algorithms to assign indexing properties to your content.

  • Google can discover your content more efficiently as we wouldn't need to crawl a page with the different Googlebot user agents to retrieve and index all the content.

Device-specific HTML

However, we appreciate that for many situations it may not be possible or appropriate to use responsive web design. That’s why we support having websites serve equivalent content using different, device-specific, HTML. The device-specific HTML can be served on the same URL (a configuration called dynamic serving) or different URLs (such as www.example.com and m.example.com).

If your website uses a dynamic serving configuration, we strongly recommend using the Vary HTTP header to communicate to caching servers and our algorithms that the content may change for different user agents requesting the page. We also use this as a crawling signal for Googlebot-Mobile. More details are here.
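
For example, an HTTP response for a dynamically served page might include (a sketch only; the exact headers your server sends will vary):

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Vary: User-Agent
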

As for the separate mobile site configuration, since there are many ways to do this, our recommendation introduces annotations that communicate to our algorithms that your desktop and mobile pages are equivalent in purpose; that is, the new annotations describe the relationship between the desktop and mobile content as alternatives of each other and should be treated as a single entity with each alternative targeting a specific class of device.

These annotations will help us discover your smartphone-optimized content and help our algorithms understand the structure of your content, giving it the best chance of performing well in our search results.
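
As a rough sketch of the bidirectional annotations described in the full recommendation (the URLs are illustrative), the desktop page points to its smartphone alternative, and the smartphone page points back with a canonical link:

On http://www.example.com/page-1:
<link rel="alternate" media="only screen and (max-width: 640px)" href="http://m.example.com/page-1" />

On http://m.example.com/page-1:
<link rel="canonical" href="http://www.example.com/page-1" />
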

Conclusion

This blog post is only a brief summary of our recommendation for building smartphone-optimized websites. Please read the full recommendation and see which supported implementation is most suitable for your site and users. And, as always, please ask on our Webmaster Help forums if you have more questions.

1000 Words About Images

Webmaster level: All

Creativity is an important aspect of our lives and can enrich nearly everything we do. Say I'd like to make my teammate a cup of cool-looking coffee, but my creative batteries are empty; this would be (and is!) one of the many times when I look for inspiration on Google Images.


The images you see in our search results come from publishers of all sizes — bloggers, media outlets, stock photo sites — who have embedded these images in their HTML pages. Google can index image types formatted as BMP, GIF, JPEG, PNG and WebP, as well as SVG.

But how does Google know that the images are about coffee and not about tea? When our algorithms index images, they look at the textual content on the page the image was found on to learn more about the image. We also look at the page's title and its body; we might also learn more from the image’s filename, anchor text that points to it, and its "alt text;" we may use computer vision to learn more about the image and may also use the caption provided in the Image Sitemap if that text also exists on the page.

 To help us index your images, make sure that:
  • we can crawl both the HTML page the image is embedded in, and the image itself;
  • the image is in one of our supported formats: BMP, GIF, JPEG, PNG, WebP or SVG.
Additionally, we recommend:
  • that the image filename is related to the image’s content;
  • that the alt attribute of the image describes the image in a human-friendly way;
  • and finally, it also helps if the HTML page’s textual contents as well as the text near the image are related to the image.
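
Putting those recommendations together, an embedded image might look something like this (the filename and alt text are made up for illustration):

<img src="images/latte-art-rosetta.jpg"
     alt="Rosetta pattern poured in the foam of a latte" />
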
Now some answers to questions we’ve seen many times:


Q: Why do I sometimes see Googlebot crawling my images, rather than Googlebot-Image?
A: Generally this happens when it’s not clear that a URL will lead to an image, so we crawl the URL with Googlebot first. If we find the URL leads to an image, we’ll usually revisit with Googlebot-Image. Because of this, it’s generally a good idea to allow crawling of your images and pages by both Googlebot and Googlebot-Image.

Q: Is it true that there’s a maximum file size for the images?
A: We’re happy to index images of any size; there’s no file size restriction.

Q: What happens to the EXIF, XMP and other metadata my images contain?
A: We may use any information we find to help our users find what they’re looking for more easily. Additionally, information like EXIF data may be displayed in the right-hand sidebar of the interstitial page that appears when you click on an image.


Q: Should I really submit an Image Sitemap? What are the benefits?
A: Yes! Image Sitemaps help us learn about your new images and may also help us learn what the images are about.
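
A minimal Image Sitemap entry might look like the following sketch (the URLs and caption are illustrative); the caption helps most when the same text also appears on the page itself:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://www.example.com/coffee-art.html</loc>
    <image:image>
      <image:loc>http://images.example.com/latte-art-rosetta.jpg</image:loc>
      <image:caption>Rosetta pattern poured in the foam of a latte</image:caption>
    </image:image>
  </url>
</urlset>
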


Q: I’m using a CDN to host my images; how can I still use an Image Sitemap?
A: Cross-domain restrictions apply only to the Sitemap’s <loc> tag. In Image Sitemaps, the <image:loc> tag is allowed to point to a URL on another domain, so using a CDN for your images is fine. We also encourage you to verify the CDN’s domain name in Webmaster Tools so that we can inform you of any crawl errors that we might find.


Q: Is it a problem if my images can be found on multiple domains or subdomains I own — for example, CDNs or related sites?
A: Generally, the best practice is to have only one copy of any type of content. If you’re duplicating your images across multiple hostnames, our algorithms may pick one copy as the canonical copy of the image, which may not be your preferred version. This can also lead to slower crawling and indexing of your images.


Q: We sometimes see the original source of an image ranked lower than other sources; why is this?
A: Keep in mind that we use the textual content of a page when determining the context of an image. For example, if the original source is a page from an image gallery that has very little text, it can happen that a page with more textual context is chosen to be shown in search. If you feel you've identified very bad search results for a particular query, feel free to use the feedback link below the search results or to share your example in our Webmaster Help Forum.

SafeSearch

Our algorithms use a great variety of signals to decide whether an image — or a whole page, if we’re talking about Web Search — should be filtered from the results when the user’s SafeSearch filter is turned on. In the case of images some of these signals are generated using computer vision, but the SafeSearch algorithms also look at simpler things such as where the image was used previously and the context in which the image was used. 
One of the strongest signals, however, is self-marked adult pages. We recommend that webmasters who publish adult content mark up their pages with one of the following meta tags:

<meta name="rating" content="adult" />
<meta name="rating" content="RTA-5042-1996-1400-1577-RTA" />

Many users prefer not to have adult content included in their search results (especially if kids use the same computer). When a webmaster provides one of these meta tags, it helps to provide a better user experience because users don't see results which they don't want to or expect to see. 

As with all algorithms, sometimes it may happen that SafeSearch filters content inadvertently. If you think your images or pages are mistakenly being filtered by SafeSearch, please let us know using the following form.

If you need more information about how we index images, please check out the section of our Help Center dedicated to images, read our SEO Starter Guide, which contains lots of useful information, and if you have more questions please post them in the Webmaster Help Forum.

How to move your content to a new location

Webmaster level: Intermediate

While maintaining a website, webmasters may decide to move the whole website or parts of it to a new location. For example, you might move content from a subdirectory to a subdomain, or to a completely new domain. Changing the location of your content can involve a bit of effort, but it’s worth doing it properly.

To help search engines understand your new site structure better and make your site more user-friendly, make sure to follow these guidelines:
  • It’s important to redirect all users and bots that visit your old content location to the new content location using 301 redirects. To highlight the relationship between the two locations, make sure that each old URL points to the new URL that hosts similar content. If you’re unable to use 301 redirects, you may want to consider using cross-domain canonicals for search engines instead (see the sketch after this list).
  • Check that you have both the new and the old location verified in the same Google Webmaster Tools account.
  • Make sure to check if the new location is crawlable by Googlebot using the Fetch as Googlebot feature. It’s important to make sure Google can actually access your content in the new location. Also make sure that the old URLs are not blocked by a robots.txt disallow directive, so that the redirect or rel=canonical can be found.
  • If you’re moving your content to an entirely new domain, use the Change of address option under Site configuration in Google Webmaster Tools to let us know about the change.
Change of address option in Google Webmaster Tools
Tell us about moving your content via Google Webmaster Tools
  • If you've also changed your site's URL structure, make sure that it's possible to navigate it without running into 404 error pages. Google Webmaster Tools may prove useful in investigating potentially broken links. Just look for Diagnostics > Crawl errors for your new site.
  • Check your Sitemap and verify that it’s up to date.
  • Once you've set up your 301 redirects, you can keep an eye on visitors to your 404 error pages to check that users are being redirected to new pages and not accidentally ending up on broken URLs. When a user comes to a 404 error page on your site, try to identify which URL they were trying to access, work out why this user was not redirected to the new location of your content, and then make changes to your 301 redirect rules as appropriate.
  • Have a look at the Links to your site in Google Webmaster Tools and inform the important sites that link to your content about your new location.
  • If your site’s content is specific to a particular region you may want to double check the geotargeting preferences for your new site structure in Google Webmaster Tools.
  • As a general rule of thumb, try to avoid running two crawlable sites with completely or largely identical content without a 301 redirect or a rel=”canonical” designation.
  • Lastly, we recommend not implementing other major changes when you’re moving your content to a new location, like large-scale content, URL structure, or navigational updates. Changing too much at once may confuse users and search engines.
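
As mentioned in the first point above, each old URL should answer with a permanent redirect to its new counterpart; at the HTTP level, a sketch of the exchange looks like this (the new domain and path are illustrative):

GET /recipes/red-velvet-cupcakes HTTP/1.1
Host: www.example.com

HTTP/1.1 301 Moved Permanently
Location: http://www.example.net/recipes/red-velvet-cupcakes

If a server-side redirect isn't possible, the old page can instead carry a cross-domain canonical:

<link rel="canonical" href="http://www.example.net/recipes/red-velvet-cupcakes" />
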
We hope you find these suggestions useful. If you happen to have further questions on how to move your content to a new location we’d like to encourage you to drop by our Google Webmaster Help Forum and seek advice from expert webmasters.

Video about pagination with rel=“next” and rel=“prev”

Webmaster Level: Beginner to Intermediate

If you’re curious about the rel=”next” and rel=”prev” markup for paginated content that we announced several months ago, we filmed a video covering more of the basics of pagination to help answer your questions. Paginated content includes things like an article that spans several URLs/pages, or an e-commerce product category that spans multiple pages. With rel=”next” and rel=”prev” markup, you can provide a strong hint to Google that you would like us to treat these pages as a logical sequence, thus consolidating their linking properties and usually sending searchers to the first page. Feel free to check out our presentation for more information:


This video on pagination covers the basics of rel=”next” and rel=”prev” and how it could be useful for your site.


Slides from the pagination video
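
As a quick refresher before the resources below, the markup on the middle page of a three-page series might look like this (the filenames are illustrative; the first page would carry only rel=”next” and the last page only rel=”prev”):

<!-- On page-2.html -->
<link rel="prev" href="http://www.example.com/page-1.html" />
<link rel="next" href="http://www.example.com/page-3.html" />
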

Additional resources about pagination include:
  • Webmaster Central Blog post announcing support of rel=”next” and rel=”prev”
  • Webmaster Help Center article with more implementations of rel=”next” and rel=”prev”
  • Webmaster Forum thread with our answers to the community’s in-depth questions, such as:

    Does rel=next/prev also work as a signal for only one page of the series (page 1 in most cases?) to be included in the search index? Or would noindex tags need to be present on page 2 and on?

    When you implement rel="next" and rel="prev" on component pages of a series, we'll then consolidate the indexing properties from the component pages and attempt to direct users to the most relevant page/URL. This is typically the first page. There's no need to mark page 2 to n of the series with noindex unless you're sure that you don't want those pages to appear in search results.

    Should I use the rel next/prev into [sic] the section of a blog even if the two contents are not strictly correlated (but they are just time-sequential)?

    In regard to using rel=”next” and rel=”prev” for entries in your blog that “are not strictly correlated (but they are just time-sequential),” pagination markup likely isn’t the best use of your time -- time-sequential pages aren’t nearly as helpful to our indexing process as semantically related content, such as pagination on component pages in an article or category. It’s fine if you include the markup on your time-sequential pages, but please note that it’s not the most helpful use case.

    We operate a real estate rental website. Our files display results based on numerous parameters that affect the order and the specific results that display. Examples of such parameters are “page number”, “records per page”, “sorting” and “area selection”...

    It sounds like your real estate rental site encounters many of the same issues that e-commerce sites face... Here are some ideas on your situation:

    1. It’s great that you are using the Webmaster Tools URL parameters feature to more efficiently crawl your site.

    2. It’s possible that your site can form a rel=”next” and rel=”prev” sequence with no parameters (or with default parameter values). It’s also possible to form parallel pagination sequences when users select certain parameters, such as a sequence of pages where there are 15 records and a separate sequence when a user selects 30 records. Paginating component pages, even with parameters, helps us more accurately index your content.

    3. While it’s fine to set rel=”canonical” from a component URL to a single view-all page, setting the canonical to the first page of a parameter-less sequence is considered improper usage. We make no promises to honor this implementation of rel=”canonical.”

Remember that if you have paginated content, it’s fine to leave it as-is and not add rel=”next” and rel=”prev” markup at all. But if you’re interested in pagination markup as a strong hint for us to better understand your site, we hope these resources help answer your questions!

Crawl Errors: The Next Generation

Webmaster level: All

Crawl errors is one of the most popular features in Webmaster Tools, and today we’re rolling out some very significant enhancements that will make it even more useful.

We now detect and report many new types of errors. To help make sense of the new data, we’ve split the errors into two parts: site errors and URL errors.

Site Errors

Site errors are errors that aren’t specific to a particular URL—they affect your entire site. These include DNS resolution failures, connectivity issues with your web server, and problems fetching your robots.txt file. We used to report these errors by URL, but that didn’t make a lot of sense because they aren’t specific to individual URLs—in fact, they prevent Googlebot from even requesting a URL! Instead, we now keep track of the failure rates for each type of site-wide error. We’ll also try to send you alerts when these errors become frequent enough that they warrant attention.

View site error rate and counts over time

Furthermore, if you don’t have (and haven’t recently had) any problems in these areas, as is the case for many sites, we won’t bother you with this section. Instead, we’ll just show you some friendly check marks to let you know everything is hunky-dory.

A site with no recent site-level errors

URL errors

URL errors are errors that are specific to a particular page. This means that when Googlebot tried to crawl the URL, it was able to resolve your DNS, connect to your server, fetch and read your robots.txt file, and then request this URL, but something went wrong after that. We break the URL errors down into various categories based on what caused the error. If your site serves up Google News or mobile (CHTML/XHTML) data, we’ll show separate categories for those errors.

URL errors by type with full current and historical counts

Less is more

We used to show you at most 100,000 errors of each type. Trying to consume all this information was like drinking from a firehose, and you had no way of knowing which of those errors were important (your homepage is down) or less important (someone’s personal site made a typo in a link to your site). There was no realistic way to view all 100,000 errors—no way to sort, search, or mark your progress. In the new version of this feature, we’ve focused on trying to give you only the most important errors up front. For each category, we’ll give you what we think are the 1000 most important and actionable errors.  You can sort and filter these top 1000 errors, let us know when you think you’ve fixed them, and view details about them.

Instantly filter and sort errors on any column

Some sites have more than 1000 errors of a given type, so you’ll still be able to see the total number of errors you have of each type, as well as a graph showing historical data going back 90 days. For those who worry that 1000 error details plus a total aggregate count will not be enough, we’re considering adding programmatic access (an API) to allow you to download every last error you have, so please give us feedback if you need more.

We've also removed the list of pages blocked by robots.txt, because while these can sometimes be useful for diagnosing a problem with your robots.txt file, they are frequently pages you intentionally blocked. We really wanted to focus on errors, so look for information about roboted URLs to show up soon in the "Crawler access" feature under "Site configuration".

Dive into the details

Clicking on an individual error URL from the main list brings up a detail pane with additional information, including when we last tried to crawl the URL, when we first noticed a problem, and a brief explanation of the error.

Details for each URL error

From the details pane you can click on the link for the URL that caused the error to see for yourself what happens when you try to visit it. You can also mark the error as “fixed” (more on that later!), view help content for the error type, list Sitemaps that contain the URL, see other pages that link to this URL, and even have Googlebot fetch the URL right now, either for more information or to double-check that your fix worked.

View pages which link to this URL

Take action!

One thing we’re really excited about in this new version of the Crawl errors feature is that you can really focus on fixing what’s most important first. We’ve ranked the errors so that those at the top of the priority list will be ones where there’s something you can do, whether that’s fixing broken links on your own site, fixing bugs in your server software, updating your Sitemaps to prune dead URLs, or adding a 301 redirect to get users to the “real” page. We determine this based on a multitude of factors, including whether or not you included the URL in a Sitemap, how many places it’s linked from (and if any of those are also on your site), and whether the URL has gotten any traffic recently from search.

Once you think you’ve fixed the issue (you can test your fix by fetching the URL as Googlebot), you can let us know by marking the error as “fixed” if you are a user with full access permissions. This will remove the error from your list.  In the future, the errors you’ve marked as fixed won’t be included in the top errors list, unless we’ve encountered the same error when trying to re-crawl a URL.

Select errors and mark them as fixed

We’ve put a lot of work into the new Crawl errors feature, so we hope that it will be very useful to you. Let us know what you think and if you have any suggestions, please visit our forum!

Using schema.org markup for videos

Webmaster level: All

Videos are one of the most common types of results on Google and we want to make sure that your videos get indexed. Today, we're also launching video support for schema.org. Schema.org is a joint effort between Google, Microsoft, Yahoo! and Yandex and is now the recommended way to describe videos on the web. The markup is very simple and can be easily added to most websites.

Adding schema.org video markup is just like adding any other schema.org data. Simply define an itemscope, an itemtype=”http://schema.org/VideoObject”, and make sure to set the name, description, and thumbnailUrl properties. You’ll also need either the embedURL — the location of the video player — or the contentURL — the location of the video file. A typical video player with markup might look like this:

<div itemscope itemtype="http://schema.org/VideoObject">
  <h2>Video: <span itemprop="name">Title</span></h2>
  <meta itemprop="duration" content="T1M33S" />
  <meta itemprop="thumbnailUrl" content="thumbnail.jpg" />
  <meta itemprop="embedURL"
    content="http://www.example.com/videoplayer.swf?video=123" />
  <object ...>
    <embed type="application/x-shockwave-flash" ...>
  </object>
  <span itemprop="description">Video description</span>
</div>


Using schema.org markup will not affect any Video Sitemaps or mRSS feeds you're already using. In fact, we still recommend that you also use a Video Sitemap because it alerts us of any new or updated videos faster and provides advanced functionality such as country and platform restrictions.

Since this means that there are now a number of ways to tell Google about your videos, choosing the right format can seem difficult. In order to make the video indexing process as easy as possible, we’ve put together a series of videos and articles about video indexing in our new Webmasters EDU microsite.

For more information, you can go through the Webmasters EDU video articles, read the full schema.org VideoObject specification, or ask questions in the Webmaster Help Forum. We look forward to seeing more of your video content in Google Search.

Introducing smartphone Googlebot-Mobile

Webmaster level: All

With the number of smartphone users rapidly rising, we’re seeing more and more websites providing content specifically designed to be browsed on smartphones. Today we are happy to announce that Googlebot-Mobile now crawls with a smartphone user-agent in addition to its previous feature phone user-agents. This is to increase our coverage of smartphone content and to provide a better search experience for smartphone users.

Here are the main user-agent strings that Googlebot-Mobile now uses:

  • Feature phones Googlebot-Mobile:

    • SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
    • DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
  • Smartphone Googlebot-Mobile:

    • Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)

The content crawled by smartphone Googlebot-Mobile will be used primarily to improve the user experience on mobile search. For example, the new crawler may discover content specifically optimized to be browsed on smartphones as well as smartphone-specific redirects.

One new feature we’re also launching that uses these signals is Skip Redirect for Smartphone-Optimized Pages. When we discover a URL in our search results that redirects smartphone users to another URL serving smartphone-optimized content, we change the link target shown in the search results to point directly to the final destination URL. This removes the extra latency the redirect introduces, leading to a savings of 0.5 to 1 second on average when visiting the landing page for such search results.

Since all Googlebot-Mobile user-agents identify themselves as a specific kind of mobile device, please treat each Googlebot-Mobile request as you would a human user with the same phone user-agent. This and other guidelines are described in our previous blog post, and they still apply, except for those referring to smartphones, which we are updating today. If your site has treated Googlebot-Mobile specially based on the fact that it previously crawled only with feature phone user-agents, we strongly recommend reviewing this policy and serving the appropriate content based on Googlebot-Mobile’s user-agent, so that both your feature phone and smartphone content will be indexed properly.

If you have more questions, please ask on our Webmaster Help forums.

GET, POST, and safely surfacing more of the web

Webmaster Level: Intermediate to Advanced

As the web evolves, Google’s crawling and indexing capabilities also need to progress. We improved our indexing of Flash, built a more robust infrastructure called Caffeine, and we even started crawling forms where it makes sense. Now, especially with the growing popularity of JavaScript and, with it, AJAX, we’re finding more web pages requiring POST requests -- either for the entire content of the page or because the pages are missing information and/or look completely broken without the resources returned from POST. For Google Search this is less than ideal, because when we’re not properly discovering and indexing content, searchers may not have access to the most comprehensive and relevant results.

We generally advise using GET for fetching resources a page needs, and this is by far our preferred method of crawling. We’ve started experiments to rewrite POST requests to GET, and while this remains a valid strategy in some cases, often the contents returned by a web server for GET vs. POST are completely different. Additionally, there are legitimate reasons to use POST (e.g., you can attach more data to a POST request than to a GET). So, while GET requests remain far more common, to surface more content on the web, Googlebot may now perform POST requests when we believe it’s safe and appropriate.

We take precautions to avoid performing any task on a site that could result in executing an unintended user action. Our POSTs are primarily for crawling resources that a page requests automatically, mimicking what a typical user would see when they open the URL in their browser. This will evolve over time as we find better heuristics, but that’s our current approach.

Let’s run through a few POST request scenarios that demonstrate how we’re improving our crawling and indexing to evolve with the web.

Examples of Googlebot’s POST requests
  • Crawling a page via a POST redirect
    <html>
      <body onload="document.foo.submit();">
        <form name="foo" action="request.php" method="post">       <input type="hidden" name="bar" value="234"/>
        </form>
      </body>
    </html>
  • Crawling a resource via a POST XMLHttpRequest
    In this step-by-step example, we improve both the indexing of a page and its Instant Preview by following the automatic XMLHttpRequest generated as the page renders.

    1. Google crawls the URL, yummy-sundae.html.
    2. Google begins indexing yummy-sundae.html and, as a part of this process, decides to attempt to render the page to better understand its content and/or generate the Instant Preview.
    3. During the render, yummy-sundae.html automatically sends an XMLHttpRequest for a resource, hot-fudge-info.html, using the POST method.
      <html>
        <head>
          <title>Yummy Sundae</title>
          <script src="jquery.js"></script>
        </head>
        <body>
          This page is about a yummy sundae.
          <div id="content"></div>
          <script type="text/javascript">
            $(document).ready(function() {
              $.post('hot-fudge-info.html', function(data)
                {$('#content').html(data);});
            });
          </script>
        </body>
      </html>
    4. The URL requested through POST, hot-fudge-info.html, along with its data payload, is added to Googlebot’s crawl queue.
    5. Googlebot performs a POST request to crawl hot-fudge-info.html.
    6. Google now has an accurate representation of yummy-sundae.html for Instant Previews. In certain cases, we may also incorporate the contents of hot-fudge-info.html into yummy-sundae.html.
    7. Google completes the indexing of yummy-sundae.html.
    8. User searches for [hot fudge sundae].
    9. Google’s algorithms can now better determine how yummy-sundae.html is relevant for this query, and we can properly display a snapshot of the page for Instant Previews.
Improving your site’s crawlability and indexability

General advice for creating crawlable sites is found in our Help Center. For webmasters who want to help Google crawl and index their content and/or generate the Instant Preview, here are a few simple reminders:
  • Prefer GET for fetching resources, unless there’s a specific reason to use POST.
  • Verify that we're allowed to crawl the resources needed to render your page. In the example above, if hot-fudge-info.html is disallowed by robots.txt, Googlebot won't fetch it. More subtly, if the JavaScript code that issues the XMLHttpRequest is located in an external .js file disallowed by robots.txt, we won't see the connection between yummy-sundae.html and hot-fudge-info.html, so even if the latter is not disallowed itself, that may not help us much. We've seen even more complicated chains of dependencies in the wild. To help Google better understand your site it's almost always better to allow Googlebot to crawl all resources.

    You can test whether resources are blocked through Webmaster Tools “Labs -> Instant Previews.”
  • Make sure to return the same content to Googlebot as is returned to users’ web browsers. Cloaking (sending different content to Googlebot than to users) is a violation of our Webmaster Guidelines because, among other things, it may cause us to provide a searcher with an irrelevant result -- the content the user views in their browser may be a complete mismatch from what we crawled and indexed. We’ve seen numerous POST-request examples where a webmaster non-maliciously cloaked (which is still a violation), and their cloaking -- on even the smallest of changes -- then caused JavaScript errors that prevented accurate indexing and completely defeated their reason for cloaking in the first place. Summarizing, if you want your site to be search-friendly, cloaking is an all-around sticky situation that’s best to avoid.

    To verify that you're not accidentally cloaking, you can use Instant Previews within Webmaster Tools, or try setting the User-Agent string in your browser to something like:

    Mozilla/5.0 (compatible; Googlebot/2.1;
      +http://www.google.com/bot.html)

    Your site shouldn't look any different after such a change. If you see a blank page, a JavaScript error, or if parts of the page are missing or different, that means that something's wrong.
  • Remember to include important content (i.e., the content you’d like indexed) as text, visible directly on the page and without requiring user-action to display. Most search engines are text-based and generally work best with text-based content. We’re always improving our ability to crawl and index content published in a variety of ways, but it remains a good practice to use text for important information.
Controlling your content

If you’d like to prevent content from being crawled or indexed for Google Web Search, traditional robots.txt directives remain the best method. To prevent the Instant Preview for your page(s), please see our Instant Previews FAQ which describes the “Google Web Preview” User-Agent and the nosnippet meta tag.

Moving forward

We’ll continue striving to increase the comprehensiveness of our index so searchers can find more relevant information. And we expect our crawling and indexing capability to improve and evolve over time, just like the web itself. Please let us know if you have questions or concerns.

View-all in search results

Webmaster level: Intermediate to Advanced

User testing has taught us that searchers much prefer the view-all, single-page version of content over a component page containing only a portion of the same information with arbitrary page breaks (which cause the user to click “next” and load another URL).


Searchers often prefer the view-all vs. paginated content with arbitrary page breaks and worse latency.

Therefore, to improve the user experience, when we detect that a content series (e.g. page-1.html, page-2.html, etc.) also contains a single-page version (e.g. page-all.html), we’re now making a larger effort to return the single-page version in search results. If your site has a view-all option, there’s nothing you need to do; we’ll work to do it on your behalf. Also, indexing properties, like links, will be consolidated from the component pages in the series to the view-all page.

However, high latency can make the view-all less preferred

Interestingly, the cases when users didn’t prefer the view-all page were correlated with high latency (e.g., when the view-all page took a while to load, say, because it contained many images). This makes sense because we know users are less satisfied with slow results. So while a view-all page is commonly desired, as a webmaster it’s important to balance this preference with the page’s load time and overall user experience.

Best practices for a series of content
  1. If your site includes view-all pages

    We aim to detect the view-all version of your content and, if available, its associated component pages. There’s nothing more you need to do! However, if you’d like to make it more explicit to us, you can include rel=”canonical” from your component pages to your view-all to increase the likelihood that we detect your series of pages appropriately.


    rel=”canonical” can specify the superset of the content (i.e., the view-all page, in this case page-all.html) as the canonical version of the same information in a series of URLs.

    Why does this work?

    In the diagram, page-2.html of the series may specify page-all.html as its canonical target because page-all.html is a superset of page-2.html's content. When a user searches for a query term and page-all.html is selected in search results, even if the query is most relevant to page-2.html, we know the user will still see page-2.html's relevant information within page-all.html.


    On the other hand, page-2.html shouldn’t designate page-1.html as the canonical because page-2.html’s content isn’t included on page-1.html. It’s possible that a user’s search query is relevant to content on page-2.html, but if page-2.html’s canonical is set to page-1.html, the user could then select page-1.html in search results and find herself in a position where she has to further navigate to a different page to arrive at the desired information. That’s a poor experience for the user, a suboptimal result from us, and it could also bring poorly targeted traffic to your site.


    However, if you strongly desire your view-all page not to appear in search results: 1) make sure the component pages in the series don’t include rel=”canonical” to the view-all page, and 2) mark the view-all page as “noindex” using any of the standard methods.
  2. If you’d like to surface individual, component pages (or there’s no view-all available)

    It may be the case that one or both of the situations below apply to your site:

    • The view-all page is undesirable as a search result (e.g., its load time is too high, or it's too difficult for users to navigate).
    • Your users prefer the multi-page experience and would rather be directed to a component page in search results than to the view-all page.

    If so, you can use standard HTML rel=”next” and rel=”prev” elements to specify a relationship between the component pages in your series of content. If done correctly, Google will generally strive to:

    • Consolidate indexing properties, such as links, between the component pages/URLs.
    • Send users to the most relevant page/URL from the component pages. Typically, the most relevant page is the first page of your content, but our algorithms may point users to a different component page in the series.

It’s not uncommon for webmasters to incorrectly use rel=”canonical” from component pages to the first page of their series (e.g. page-2.html with rel=”canonical” to page-1.html). We recommend against this implementation because the component pages don’t actually contain duplicate content. Using rel=”next” and rel=”prev” is far more appropriate.
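
As a minimal sketch of the recommended markup (using the page names from the diagrams above and assuming the pages live at http://www.example.com/), a component page such as page-2.html would point its canonical at the view-all page, not at page-1.html:

<!-- In the <head> of page-2.html: point the canonical at the superset page -->
<link rel="canonical" href="http://www.example.com/page-all.html" />
<!-- Avoid: rel="canonical" to page-1.html, which isn't a superset of page-2.html's content -->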

Summary

Because users generally prefer the view-all option in search results, we're making more of an effort to properly detect and serve this version to searchers. If you have a series of content, there's nothing more you need to do. If you'd like to give Google a stronger hint about how best to serve your information to users:
  1. To better optimize your view-all page, you can use rel=”canonical” from component pages to the single-page version; otherwise,
  2. If a view-all page doesn’t provide a good user experience for your site, you can use the rel=”next” and rel=”prev” attributes as a strong hint for Google to identify the series of pages and still surface a component page in results.

Questions?

As always, feel free to ask in our Webmaster Help Forum.

Pagination with rel=“next” and rel=“prev”

Webmaster level: Intermediate to Advanced

Much like rel=”canonical” acts as a strong hint for duplicate content, you can now use the HTML link elements rel=”next” and rel=”prev” to indicate the relationship between component URLs in a paginated series. Throughout the web, a paginated series of content may take many shapes—it can be an article divided into several component pages, or a product category with items spread across several pages, or a forum thread divided into a sequence of URLs. Now, if you choose to include rel=”next” and rel=”prev” markup on the component pages within a series, you’re giving Google a strong hint that you’d like us to:
  • Consolidate indexing properties, such as links, from the component pages/URLs to the series as a whole (i.e., links should not remain dispersed between page-1.html, page-2.html, etc., but be grouped with the sequence).
  • Send users to the most relevant page/URL—typically the first page of the series.


The relationship between component URLs in a series can now be indicated to Google through rel=”next” and rel=”prev”.

There’s an exception to the rel=”prev” and rel=”next” implementation: If, alongside your series of content, you also offer users a view-all page, or if you’re considering a view-all page, please see our post on View-all in search results for more information. Because view-all pages are most commonly preferred by searchers, we do our best to surface this version when appropriate in results rather than a component page (component pages are more likely to surface with rel=”next” and rel=”prev”).

If you don’t have a view-all page or you’d like to override Google returning a view-all page, you can use rel="next" and rel="prev" as described in this post.

For information on paginated configurations that include a view-all page, please see our post on View-all in search results.

Outlining your options

Here are three options for a series:
  1. Leave whatever you have exactly as-is. Paginated content exists throughout the web and we’ll continue to strive to give searchers the best result, regardless of the page’s rel=”next”/rel=”prev” HTML markup—or lack thereof.
  2. If you have a view-all page, or are considering a view-all page, see our post on View-all in search results.
  3. Hint to Google the relationship between the component URLs of your series with rel=”next” and rel=”prev”. This helps us more accurately index your content and serve to users the most relevant page (commonly the first page). Implementation details below.

Implementing rel=”next” and rel=”prev”

If you prefer option 3 (above) for your site, let’s get started! Let’s say you have content paginated into the URLs:

http://www.example.com/article?story=abc&page=1
http://www.example.com/article?story=abc&page=2
http://www.example.com/article?story=abc&page=3
http://www.example.com/article?story=abc&page=4

On the first page, http://www.example.com/article?story=abc&page=1, you’d include in the <head> section:
<link rel="next" href="http://www.example.com/article?story=abc&page=2" />

On the second page, http://www.example.com/article?story=abc&page=2:
<link rel="prev" href="http://www.example.com/article?story=abc&page=1" />
<link rel="next" href="http://www.example.com/article?story=abc&page=3" />

On the third page, http://www.example.com/article?story=abc&page=3:
<link rel="prev" href="http://www.example.com/article?story=abc&page=2" />
<link rel="next" href="http://www.example.com/article?story=abc&page=4" />

And on the last page, http://www.example.com/article?story=abc&page=4:
<link rel="prev" href="http://www.example.com/article?story=abc&page=3" />

A few points to mention:
  • The first page only contains rel=”next” and no rel=”prev” markup.
  • Pages two to the second-to-last page should be doubly-linked with both rel=”next” and rel=”prev” markup.
  • The last page only contains markup for rel=”prev”, not rel=”next”.
  • rel=”next” and rel=”prev” values can be either relative or absolute URLs (as allowed by the <link> tag). And, if you include a <base> link in your document, relative paths will resolve according to the base URL (see the sketch after this list).
  • rel=”next” and rel=”prev” only need to be declared within the <head> section, not within the document <body>.
  • We allow rel=”previous” as a syntactic variant of rel=”prev” links.
  • rel="next" and rel="prev" on the one hand and rel="canonical" on the other are independent concepts; both declarations can be included in the same page. For example, http://www.example.com/article?story=abc&page=2&sessionid=123 may contain:

    <link rel="canonical" href="http://www.example.com/article?story=abc&page=2" />
    <link rel="prev" href="http://www.example.com/article?story=abc&page=1&sessionid=123" />
    <link rel="next" href="http://www.example.com/article?story=abc&page=3&sessionid=123" />

  • rel=”prev” and rel=”next” act as hints to Google, not absolute directives.
  • When implemented incorrectly, such as omitting an expected rel="prev" or rel="next" designation in the series, we'll continue to index the page(s), and rely on our own heuristics to understand your content.
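
To illustrate the point about relative URLs and <base> (a minimal sketch; the base URL shown is only an example), the following markup in the <head> of page 2 resolves the relative hrefs against the base URL, so rel="next" points to http://www.example.com/article?story=abc&page=3:

<base href="http://www.example.com/" />
<link rel="prev" href="article?story=abc&page=1" />
<link rel="next" href="article?story=abc&page=3" />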

Questions?

You can find more information in our Help Center, or join the conversation in our Webmaster Help Forum!