Cloaking As It Stands Today From Google’s Viewpoint

Hi everyone,

The idea for this week's posts stems from an interview between Eric Enge and John Mueller, a Webmaster Trends Analyst at Google Zurich. It is an eye-opener in many ways. The dynamic nature of the search engine optimisation field keeps practitioners on their toes all the time. Let us revisit the concept of cloaking as it stands today.

Cloaking in John Mueller’s words: The standard definition of cloaking is to show Googlebot something different from what you would show users. So, you could show Googlebot a decent, friendly homepage on your site, but when users come to your site, they end up seeing something totally different.
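
To make the definition concrete, here is a minimal Python sketch of the check it implies: request the same path once as Googlebot and once as an ordinary browser, and compare what comes back. The two handlers are hypothetical stand-ins for a site's page renderer, and the browser User-Agent string is just an example. A real audit would have to allow for legitimately dynamic content (timestamps, rotating banners), and some cloakers key on Googlebot's IP addresses rather than the User-Agent, so byte-for-byte equality is a simplification.

```python
from typing import Callable

# Googlebot's desktop User-Agent; the browser string is an arbitrary example.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

Handler = Callable[[str, str], str]  # (path, user_agent) -> HTML

def cloaking_handler(path: str, user_agent: str) -> str:
    # Hypothetical anti-pattern: sniffs the User-Agent and shows Googlebot
    # the beauty-products page while everyone else gets something else.
    if "Googlebot" in user_agent:
        return "<h1>Beauty products</h1>"
    return "<h1>Unrelated ads</h1>"

def honest_handler(path: str, user_agent: str) -> str:
    # Hypothetical well-behaved handler: the User-Agent is ignored.
    return "<h1>Beauty products</h1>"

def serves_same_content(handler: Handler, path: str) -> bool:
    """True if Googlebot and a regular browser receive identical HTML."""
    return handler(path, GOOGLEBOT_UA) == handler(path, BROWSER_UA)

print(serves_same_content(cloaking_handler, "/"))  # False
print(serves_same_content(honest_handler, "/"))    # True
```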

It is a cheap trick that leaves the user fuming. She clicked through from the Google SERPs after seeing a nice title and description proclaiming that your site sells beauty products, yet she sees ads for Viagra and Cialis when she lands on your homepage. This goes against the very foundation of Google’s ideology, in which search results give the user the best experience in terms of targeting and relevance.

Eric then asks about the different levels of cloaking that arise, especially on database-driven sites. If you own an ecommerce site selling polycotton shirts, for example, users can sort the shirts by price, choose different colours and so on. All these choices at the product level can be tracked using a session ID or a cookie.

These additional parameters do not need to be tacked on to the URL, as they can make the URL structure very complex and give rise to duplicate content. However, if you are using a Content Management System (CMS) that does not have this provision, you are going to find it very expensive to fix the problem.
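
To see how quickly parameters multiply, the sketch below lists the URLs under which one shirt listing could be crawled once sort order and session ID ride along in the query string. The parameter names sort and sessionid and their values are hypothetical; sort orders only reshuffle the same products, and session IDs change nothing at all, so every extra URL is largely duplicate content.

```python
from itertools import product
from urllib.parse import urlencode

BASE = "http://www.yoursite.com/clothing/polycotton-shirts"

# Hypothetical values for illustration: three sort orders, and two visitors
# who were each handed their own session ID by the CMS.
SORT_ORDERS = ["price-asc", "price-desc", "newest"]
SESSION_IDS = ["a1b2c3", "d4e5f6"]

def crawlable_variants(base: str) -> list[str]:
    """List every URL under which the same shirt listing can be reached."""
    urls = [base]
    for sort, sid in product(SORT_ORDERS, SESSION_IDS):
        query = urlencode({"sort": sort, "sessionid": sid})
        urls.append(base + "?" + query)
    return urls

for url in crawlable_variants(BASE):
    print(url)
# 1 clean URL + 3 sort orders x 2 session IDs = 7 URLs for one listing
```

With real traffic every new visitor adds another session ID, so the count keeps growing without bound.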

It is important that both users and Googlebot see the same content for a given URL. You may have to edit the URL that Googlebot sees by removing the unnecessary extra parameters to avoid the problems mentioned above, and this is going to cost time and money. John says that the best method in this scenario is the rel="canonical" link element, which avoids both redirects and changes to the URL structure.

So, if a user selects the cheapest black polycotton shirt on your site, then the colour and price are additional parameters. The ideal source page would be a product listing of all your polycotton shirts, whose URL could read www.yoursite.com/clothing/polycotton-shirts. In the specific case above, the URL could read www.yoursite.com/clothing/polycotton-shirts?colour=black&price=cheapest.
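
Working out the canonical URL boils down to dropping the parameters that only filter, sort or track. Here is a minimal sketch using Python's urllib.parse; the set of parameters treated as non-canonical is an assumption you would adjust for your own site, and an http:// scheme is added to the example URLs so they parse as absolute URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only filter, sort or track the visitor, so
# they should never create separately indexed pages. Adjust per site.
NON_CANONICAL_PARAMS = {"colour", "price", "sort", "sessionid"}

def canonical_url(url: str) -> str:
    """Strip non-canonical query parameters (and any fragment) from url."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CANONICAL_PARAMS
    ]
    # An empty query string means urlunsplit emits no "?" at all.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(canonical_url(
    "http://www.yoursite.com/clothing/polycotton-shirts?colour=black&price=cheapest"
))
# http://www.yoursite.com/clothing/polycotton-shirts
```

Parameters that genuinely change the content, such as a page number, pass through untouched.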

You can now place the rel=canonical tag in the head section of this parameter-loaded page, canonicalising this specific instance to www.yoursite.com/clothing/polycotton-shirts. This is a clear signal (not a command) to Googlebot about which source page should be indexed.
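
The tag itself is a single link element in the page's head section. A minimal Python sketch that builds it, HTML-escaping the URL so an ampersand in a query string cannot break the markup; the helper name is hypothetical, and it rejects relative URLs because Google's documentation recommends absolute ones.

```python
from html import escape
from urllib.parse import urlsplit

def canonical_link_tag(canonical_url: str) -> str:
    """Build the <link rel="canonical"> element for a page's head section."""
    if not urlsplit(canonical_url).scheme:
        # Relative or scheme-less URLs are rejected; use an absolute URL.
        raise ValueError("use an absolute URL such as http://www.yoursite.com/...")
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'

print(canonical_link_tag("http://www.yoursite.com/clothing/polycotton-shirts"))
# <link rel="canonical" href="http://www.yoursite.com/clothing/polycotton-shirts">
```

The same tag goes on every parameter variant of the listing, so they all point back to one URL.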

It is really similar to a 301 redirect, except that it does not take the visitor to that source page. It is a super sweet fix: programmers avoid writing redirects, and you also avoid content duplication. It saves you time and money as a site owner.

John also mentions a potential problem: if you put a canonical tag on every page of your site pointing to your home page, it is like 301 redirecting all the pages of your site to the home page. The net result is that you could remove your website from the Google index completely. But again, it is not exactly a 301 redirect, and Google analyses whether the home page really is canonical for the other page or whether the site owner has made a mistake. One signal it uses is whether the content on both pages is nearly identical.
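
Because a single template bug can point every page at the home page in one go, a quick self-check before deploying is cheap insurance. The sketch below is a crude heuristic of our own, not how Google evaluates canonicals: given a hypothetical map of page URL to the canonical URL that page declares, it flags any page outside the site root whose canonical is the root.

```python
from urllib.parse import urlsplit

def suspicious_canonicals(canonicals: dict[str, str]) -> list[str]:
    """Flag non-root pages whose declared canonical is the site root.

    canonicals maps each page URL to the canonical URL it declares."""
    flagged = []
    for page, target in canonicals.items():
        page_path = urlsplit(page).path or "/"
        target_path = urlsplit(target).path or "/"
        if target_path == "/" and page_path != "/":
            flagged.append(page)
    return flagged

pages = {
    "http://www.yoursite.com/": "http://www.yoursite.com/",
    "http://www.yoursite.com/?sessionid=abc": "http://www.yoursite.com/",
    "http://www.yoursite.com/clothing/polycotton-shirts": "http://www.yoursite.com/",
    "http://www.yoursite.com/clothing/polycotton-shirts?colour=black":
        "http://www.yoursite.com/clothing/polycotton-shirts",
}
print(suspicious_canonicals(pages))
# ['http://www.yoursite.com/clothing/polycotton-shirts']
```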

John feels that the ideal approach in the above scenario would be cookie-based session ID tracking. But if that is too expensive to implement, the canonical tag is a good backup solution. Also, it is wise to ensure that whatever page you serve is the same for both users and Googlebot. A mismatch can arise if you have been serving certain content on one URL to Googlebot and on a different URL to users.
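
Cookie-based tracking keeps the session out of the URL entirely, so users and Googlebot share the same clean URLs. Here is a minimal sketch using Python's standard http.cookies module; the cookie name sid and both helpers are hypothetical, and a real CMS would also store the session data server-side and add the Secure attribute when serving over HTTPS.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional, Tuple

SESSION_COOKIE = "sid"  # hypothetical cookie name

def session_id_from_cookie(cookie_header: str) -> Optional[str]:
    """Read the session ID from a request's Cookie header, if present."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(SESSION_COOKIE)
    return morsel.value if morsel is not None else None

def start_session() -> Tuple[str, str]:
    """Create a session ID plus the Set-Cookie header value that carries it.

    The page URL is left untouched, so every visitor shares one clean URL."""
    sid = secrets.token_hex(16)
    cookie = SimpleCookie()
    cookie[SESSION_COOKIE] = sid
    cookie[SESSION_COOKIE]["path"] = "/"
    cookie[SESSION_COOKIE]["httponly"] = True
    return sid, cookie[SESSION_COOKIE].OutputString()
```

On a visitor's first request the server sends the second value of start_session() as a Set-Cookie header; on later requests it reads the ID back with session_id_from_cookie() from the Cookie header.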

So, if new users come to your site through a session-ID-riddled URL they found somewhere on the web, you are best off serving them a canonicalised page. This will help the well-being of your site in the long run. John also says that existing sites showing neat URLs to Googlebot should not panic as long as their intent is genuine. But for sites undergoing a redesign or similar work, it is advisable to implement this strategy.
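
One way to read "serve them a canonicalised page", assuming a redirect is acceptable on your site, is to 301 any URL carrying a session ID to the same URL without it. A session ID in a link found elsewhere on the web belongs to somebody else's old visit, so the sketch drops it rather than reusing it. The parameter name sessionid and the handle_request helper are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAM = "sessionid"  # hypothetical: whatever name your CMS uses

def strip_session_id(url: str) -> tuple[str, bool]:
    """Return (url without the session parameter, whether one was present)."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key != SESSION_PARAM]
    clean = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
    return clean, any(key == SESSION_PARAM for key, _ in pairs)

def handle_request(url: str) -> tuple[int, str]:
    """301 a session-ID URL to its clean form; serve anything else as-is."""
    clean, had_session = strip_session_id(url)
    if had_session:
        return 301, clean  # the stale session ID is discarded, not reused
    return 200, url
```

If a visitor cannot accept cookies and your CMS relies on the URL session ID to keep her cart, this redirect would lose her session; in that case the rel=canonical tag on its own is the safer fallback.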

It is true that most CMSs fall back on session IDs in the URL if they cannot use cookies. If a user visiting your site does not have a cookie, the canonical approach can at least wean her away from the session ID approach. The end result should be that both the user and Googlebot see the same content under any circumstances.

We will look at cloaking that arises from the use of JavaScript and Flash in future posts.

At Netconcepts, we build squeaky-clean, search engine friendly ecommerce sites and content management systems (CMS) that adopt best practices. We have built impressive sites, continue to do so, and our list of satisfied customers keeps growing. Feel free to check it out.
