A robots.txt file is no bulletproof way of excluding your site from crawlers: it is merely a request that misbehaving bots are free to ignore. Well-behaved crawlers like Googlebot do respect it, though.
If you want to exclude your staging site from Google using robots.txt without running the risk of forgetting to delete the file once you go live, name the file robots.exclude.txt instead.
In your Apache vhost config, rewrite requests for /robots.txt on the staging host only:
RewriteEngine On
# Only rewrite requests that arrive on the staging host
RewriteCond %{HTTP_HOST} ^staging\.project\.com$
# In virtual host context the matched path includes the leading slash
RewriteRule ^/robots\.txt$ /robots.exclude.txt [L]
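If you keep your rewrites in a .htaccess file instead of the vhost, note that Apache strips the leading slash in per-directory context, so a sketch like this should do (same hostnames as above):

RewriteEngine On
RewriteCond %{HTTP_HOST} ^staging\.project\.com$
# No leading slash here: in .htaccess context the pattern matches the path
# relative to the current directory
RewriteRule ^robots\.txt$ /robots.exclude.txt [L]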
Your robots.exclude.txt looks like this:
# This file is returned for /robots.txt on staging servers
User-agent: *
Disallow: /
Important note: If your setup is broken and /robots.txt is not served correctly on your staging host, there is no protection at all, so verify it!
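A quick curl check will do (hostnames taken from the example above; www.project.com stands in for your production host):

# On staging, this should print the disallow-everything rules
curl -s http://staging.project.com/robots.txt
# Expected:
#   User-agent: *
#   Disallow: /

# On production, this should print your regular robots.txt (or return a 404)
curl -s http://www.project.com/robots.txt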
Posted by Henning Koch to makandra dev (2010-09-09 07:51)