[Rack] robots.txt

Jeff Tchang jeff.tchang at gmail.com
Tue Apr 30 21:44:17 UTC 2013


Googlebot (but not all search engines) respects some pattern matching.

   - To match a sequence of characters, use an asterisk (*). For instance,
   to block access to all subdirectories that begin with private:

   User-agent: Googlebot
   Disallow: /private*/



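A rough way to model that wildcard behavior (my own sketch of Google's documented semantics — an asterisk matches any run of characters, and a trailing `$` anchors the end — not Googlebot's actual code):

```python
import re

def pattern_to_regex(pattern):
    # Sketch of Googlebot-style matching (assumption based on Google's
    # robots.txt documentation): '*' matches any sequence of characters,
    # a trailing '$' anchors the match at the end of the path.
    escaped = re.escape(pattern)          # escape regex metacharacters
    regex = escaped.replace(r"\*", ".*")  # restore '*' wildcard semantics
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"          # restore trailing '$' anchor
    return re.compile("^" + regex)        # rules match from the path start

rule = pattern_to_regex("/private*/")
print(bool(rule.match("/private-stuff/docs")))  # True  (blocked)
print(bool(rule.match("/public/")))             # False (not blocked)
```

So "/private*/" blocks any path that starts with "/private" and contains a later slash, which is why it covers all subdirectories beginning with "private".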

So in your example

User-Agent: *
Disallow: /wiki/Special*

will work for Google. I am not sure whether bingbot obeys it.
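To Andy's question below: the original robots.txt convention is plain path-prefix matching, not directory-based exclusion, so "Disallow: /wiki/Special:" should cover "/wiki/Special:RecentChanges" even for crawlers that ignore wildcards. A quick check with Python's stdlib parser (which implements the prefix-matching convention):

```python
from urllib import robotparser

# Parse Ben's non-wildcard rule; Disallow is treated as a path prefix.
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /wiki/Special:
""".splitlines())

# The rule is a prefix of the Special:RecentChanges path, so it is blocked.
print(rp.can_fetch("bingbot", "https://noisebridge.net/wiki/Special:RecentChanges"))  # False
print(rp.can_fetch("bingbot", "https://noisebridge.net/wiki/Main_Page"))              # True
```

Whether bingbot in practice honors the rule is a separate question, but the rule as written should suffice for any prefix-matching crawler.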

On Tue, Apr 30, 2013 at 2:38 PM, Andy Isaacson <adi at hexapodia.org> wrote:

> On Tue, Apr 30, 2013 at 02:31:37PM -0700, Ben Kochie wrote:
> > I added a robots.txt to https://noisebridge.net
> >
> > User-agent: *
> > Disallow: /wiki/Help
> > Disallow: /wiki/MediaWiki
> > Disallow: /wiki/Special:
> > Disallow: /wiki/Template
> > Disallow: /wiki/skins/
> >
> > I noticed bingbot is uselessly crawling the entire contents of
> > Special:RecentChanges.
>
> Is robots.txt a prefix, or a directory based exclusion scheme?  Will
> "Disallow: /wiki/Special:" cause bingbot to skip
> "/wiki/Special:RecentChanges"?
>
> -andy
> _______________________________________________
> Rack mailing list
> Rack at lists.noisebridge.net
> https://www.noisebridge.net/mailman/listinfo/rack
>