Wednesday, March 18, 2009

Afraid of being banned by Google for 'cloaking'?

Cloaking, basically, is the idea of presenting different content to normal users and to bots. This malicious technique is used by bad people to raise their ranking at Google and get traffic, while showing users content they did not actually ask for. There is some good news at the end of the article :)

Example of cloaking (do not do this!)
Here is just a rough example. Imagine there's a page on the web with the URL http://exampledomain.url
If the Google crawler gets to this page, it is presented with one version of the page: clean, structured, full of text, headings, etc. This version of the page has lots of keywords, say 'free software download'. Googlebot ranks it well.
It is possible to tell who is 'knocking on the door', a machine or a real user.
So, if an internet user comes to this page, he sees something else entirely: the face of some PR company showing up.
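
To make the 'knocking on the door' part concrete, here is a rough sketch of how a server could guess whether a request comes from a crawler by looking at the User-Agent header. The token list and the names is_crawler and choose_page are made up for illustration. The branch in choose_page is exactly the thing you should NOT do.

```python
# Illustrative substrings only; real crawlers advertise many different tokens.
CRAWLER_TOKENS = ("googlebot", "msnbot", "slurp")


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header looks like a known crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_page(user_agent: str) -> str:
    """Hypothetical cloaking handler: same URL, different content per visitor."""
    if is_crawler(user_agent):
        # Served to bots: keyword-rich text that ranks well.
        return "keyword-rich text page"
    # Served to people: something they never searched for.
    return "PR company landing page"
```

Detecting a bot is not the problem by itself. Returning different content from choose_page for the same URL is what makes it cloaking.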

This is considered cloaking, and the site may be removed from Google's index by Google staff once they discover such bad behavior.

Problems for good guys
OK, this anti-cloaking policy makes search indexes cleaner and more reliable. But what if you have a fully dynamic site? Yes, a site which is fully powered by JavaScript or Flash?
If its navigation is done with JavaScript, you have problems.
If your application is built as a Single Page Interface (SPI), there's no correct way to make it indexable. (This is how the guys from Redmond understand SPI.)

Good news (for good guys only)
"The only hard and fast rule is to show Googlebot the exact same thing as your users."
This is very good news (yes, for me it's news, though I know that post was published back in '07). It means that your site will not get banned if you show the same content for the same URLs, but in a different way.
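
A minimal sketch of what 'same content, different way' could look like, assuming a hypothetical ARTICLE record and two made-up renderers: one plain, one wrapped in extra markup for a script-driven interface. The check compares only the text a visitor would actually read, so the two versions may differ in markup but not in content. This is my own illustration, not something taken from Google's guidelines.

```python
from html.parser import HTMLParser

# Hypothetical content record; in a real site this would come from storage.
ARTICLE = {"title": "Free software", "body": "A list of free tools."}


def render_plain(article):
    """Plain version: bare headings and paragraphs."""
    return f"<h1>{article['title']}</h1><p>{article['body']}</p>"


def render_enhanced(article):
    """Enhanced version: the same text inside a container a script can hook into."""
    return (
        '<div id="app" class="spi">'
        f"<h1>{article['title']}</h1><p>{article['body']}</p>"
        "</div>"
    )


class TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def same_content(html_a, html_b):
    """True if both fragments carry the same text in the same order."""
    return visible_text(html_a) == visible_text(html_b)
```

Running same_content(render_plain(ARTICLE), render_enhanced(ARTICLE)) is a cheap sanity check that both versions of a URL still say the same thing.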
So, some links:
Ok. Gone for now - will post something on this topic soon...

4 comments:

Dmitry said...

Some popular web resources use cloaking and do not think that it is bad:
http://www.seonews.ru/analytics/detail/9078.php

Dmitry said...

Today I found this on Google's site:

Quality guidelines - specific guidelines
Avoid hidden text or hidden links.

Don't use cloaking or sneaky redirects.

Don't send automated queries to Google.

Don't load pages with irrelevant keywords.

Don't create multiple pages, subdomains, or domains with substantially duplicate content.

Don't create pages with malicious behavior, such as phishing or installing viruses, trojans, or other badware.

Avoid "doorway" pages created just for search engines, or other "cookie cutter" approaches such as affiliate programs with little or no original content.

If your site participates in an affiliate program, make sure that your site adds value. Provide unique and relevant content that gives users a reason to visit your site first.

Alexander said...

http://news.bbc.co.uk/1/hi/technology/4685750.stm

Google banned BMW for cloaking (they used a Flash version for users and a text version for bots and for browsers not supporting Flash).

AFAIK the basic idea is not to go too far with cloaking.

What I am going to do is a browser-inside-a-browser, so to speak. It would display the same content, but would additionally give good navigation, history control, additional tools, etc.

As for the Google webmaster guidelines, the good words are:
"Make pages primarily for users, not for search engines. Don't deceive your users or present different content to search engines than you display to users, which is commonly referred to as 'cloaking.'"

http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35769

Alexander said...

By the way.

A Russian gaming site uses a 'bot trap' link to decide who is a bot.
(The site is in Cyrillic.)
http://www.ag.ru/

In the bottom navigation section, there is a link called 'Ловушка для ботов' ('Bot trap').

If you navigate with this link, AG will consider you a bot.
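
A rough sketch of how such a trap link could work on the server side. I have no idea how AG actually implements it; the TRAP_PATH value and the BotTrap class are invented for illustration.

```python
# Hypothetical URL of the trap link; real users are not expected to click it.
TRAP_PATH = "/bot-trap"


class BotTrap:
    """Remembers which clients have followed the trap link."""

    def __init__(self):
        self.flagged = set()

    def record(self, client_id: str, path: str) -> None:
        """Call once per request; flags the client if it hit the trap."""
        if path == TRAP_PATH:
            self.flagged.add(client_id)

    def is_bot(self, client_id: str) -> bool:
        return client_id in self.flagged


# Usage: client "10.0.0.7" follows the trap link, "10.0.0.8" does not.
trap = BotTrap()
trap.record("10.0.0.7", "/news")
trap.record("10.0.0.7", TRAP_PATH)
trap.record("10.0.0.8", "/news")
```

After these requests only the first client is treated as a bot.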