[Rack] robots.txt
Jeff Tchang
jeff.tchang at gmail.com
Tue Apr 30 21:44:17 UTC 2013
Googlebot (but not all search engine crawlers) respects some pattern matching in robots.txt.
- To match a sequence of characters, use an asterisk (*). For instance,
to block access to all subdirectories that begin with private:
User-agent: Googlebot
Disallow: /private*/
So in your example,
User-Agent: *
Disallow: /wiki/Special*
will work for Google. I am not sure whether bingbot obeys the wildcard.
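As for Andy's question below: per the robots exclusion protocol, Disallow rules are plain prefix matches, not directory-based, so "Disallow: /wiki/Special:" should exclude "/wiki/Special:RecentChanges" even for crawlers that ignore wildcards. A quick sketch using Python's stdlib parser to check this (the hostname is the one from this thread; any compliant parser should give the same answers):

```python
from urllib import robotparser

# Rules from Ben's robots.txt, trimmed to the one in question.
# Disallow is a prefix match, so "/wiki/Special:" also covers
# "/wiki/Special:RecentChanges".
rules = """\
User-agent: *
Disallow: /wiki/Special:
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Blocked: the path starts with the disallowed prefix.
print(rp.can_fetch("bingbot", "https://noisebridge.net/wiki/Special:RecentChanges"))
# Allowed: no rule matches this path.
print(rp.can_fetch("bingbot", "https://noisebridge.net/wiki/Main_Page"))
```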
On Tue, Apr 30, 2013 at 2:38 PM, Andy Isaacson <adi at hexapodia.org> wrote:
> On Tue, Apr 30, 2013 at 02:31:37PM -0700, Ben Kochie wrote:
> > I added a robots.txt to https://noisebridge.net
> >
> > User-agent: *
> > Disallow: /wiki/Help
> > Disallow: /wiki/MediaWiki
> > Disallow: /wiki/Special:
> > Disallow: /wiki/Template
> > Disallow: /wiki/skins/
> >
> > I noticed bingbot is uselessly crawling the entire contents of
> > Special:RecentChanges.
>
> Is robots.txt a prefix, or a directory based exclusion scheme? Will
> "Disallow: /wiki/Special:" cause bingbot to skip
> "/wiki/Special:RecentChanges"?
>
> -andy
> _______________________________________________
> Rack mailing list
> Rack at lists.noisebridge.net
> https://www.noisebridge.net/mailman/listinfo/rack
>