[Rack] robots.txt

Ben Kochie ben at nerp.net
Tue Apr 30 21:54:25 UTC 2013


The Disallow: /wiki/Special: line came from the MediaWiki examples.

I added the additional /wiki/Special* rule.
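
For anyone who wants to sanity-check rules like these before deploying, Python's stdlib robotparser can be used as a quick test harness. A sketch (the URLs are just illustrative); note that the stdlib parser implements the original prefix-only robots.txt spec, so a "*" inside a Disallow path is treated as a literal character, and the wildcard rule only helps crawlers (like Googlebot) that extend the spec:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical copy of the rules under discussion, for local testing.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Special:
Disallow: /wiki/Special*
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Disallow is a path-prefix match, not a directory match, so
# "/wiki/Special:" also covers "/wiki/Special:RecentChanges".
print(rp.can_fetch("bingbot", "https://noisebridge.net/wiki/Special:RecentChanges"))  # False
print(rp.can_fetch("bingbot", "https://noisebridge.net/wiki/Noisebridge"))            # True

# With only the wildcard rule, the stdlib parser matches "*" literally,
# so this rule alone would NOT block the page for spec-only crawlers.
rp2 = RobotFileParser()
rp2.parse(["User-agent: *", "Disallow: /wiki/Special*"])
print(rp2.can_fetch("bingbot", "https://noisebridge.net/wiki/Special:RecentChanges"))  # True
```

Keeping both the prefix rule and the wildcard rule is harmless and covers both kinds of crawler.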

Also, would someone who likes doing this kind of thing update our 
MediaWiki:

http://lists.wikimedia.org/pipermail/mediawiki-announce/2013-April/000127.html
http://lists.wikimedia.org/pipermail/mediawiki-announce/2013-April/000129.html

-ben

On Tue, 30 Apr 2013, Jeff Tchang wrote:

> 
> Googlebot (but not all search engines) respects some pattern matching.
>
>  *  To match a sequence of characters, use an asterisk (*). For instance, to block access to all
>     subdirectories that begin with private:
> 
> User-agent: Googlebot
> Disallow: /private*/
> 
> 
> So in your example
> 
> User-Agent: *
> Disallow: /wiki/Special*
> 
> will work for Google. I am not sure whether bingbot obeys it.
> 
> On Tue, Apr 30, 2013 at 2:38 PM, Andy Isaacson <adi at hexapodia.org> wrote:
>       On Tue, Apr 30, 2013 at 02:31:37PM -0700, Ben Kochie wrote:
>       > I added a robots.txt to https://noisebridge.net
>       >
>       > User-agent: *
>       > Disallow: /wiki/Help
>       > Disallow: /wiki/MediaWiki
>       > Disallow: /wiki/Special:
>       > Disallow: /wiki/Template
>       > Disallow: /wiki/skins/
>       >
>       > I noticed bingbot is uselessly crawling the entire contents of
>       > Special:RecentChanges.
>
>       Is robots.txt a prefix-based or a directory-based exclusion scheme?  Will
>       "Disallow: /wiki/Special:" cause bingbot to skip
>       "/wiki/Special:RecentChanges"?
>
>       -andy
>       _______________________________________________
>       Rack mailing list
>       Rack at lists.noisebridge.net
>       https://www.noisebridge.net/mailman/listinfo/rack

