[Rack] robots.txt

Ben Kochie ben at nerp.net
Tue Apr 30 21:56:15 UTC 2013


One additional amusing thing I found.

I blocked the bingbot IP range with a DROP rule.  Within a minute of dropping
bingbot's IP range, "msnbot" came by and scraped the new robots.txt from 
another IP range.

I removed the block, and it looks to be working correctly now.
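
For the archive, here is a minimal Python sketch of the two matching
behaviours discussed in the quoted thread below. It assumes the original
robots.txt semantics (a Disallow value is a plain path prefix) and a
Googlebot-style "*" wildcard. The function names and the RULES list are
illustrative only and are not taken from any crawler's actual code.

```python
import re

# The Disallow values from the robots.txt quoted below.
RULES = [
    "/wiki/Help",
    "/wiki/MediaWiki",
    "/wiki/Special:",
    "/wiki/Template",
    "/wiki/skins/",
]


def blocked_by_prefix(path, rules=RULES):
    """Plain prefix semantics: a rule blocks every path that starts with it."""
    return any(path.startswith(rule) for rule in rules if rule)


def blocked_by_wildcard(path, rule):
    """Wildcard semantics (assumed): '*' matches any run of characters,
    and the rule is still anchored at the start of the path."""
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(pattern, path) is not None


if __name__ == "__main__":
    print(blocked_by_prefix("/wiki/Special:RecentChanges"))
    print(blocked_by_prefix("/wiki/Main_Page"))
    print(blocked_by_wildcard("/wiki/Special:RecentChanges", "/wiki/Special*"))
```

Under these assumptions, a crawler that does not understand wildcards
would read "/wiki/Special*" as a literal prefix ending in an asterisk,
which would not match /wiki/Special:RecentChanges. Keeping the plain
"/wiki/Special:" line alongside it should cover both kinds of crawler.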

-ben

On Tue, 30 Apr 2013, Ben Kochie wrote:

> The "Disallow: /wiki/Special:" rule came from the MediaWiki examples.
>
> I added the additional "/wiki/Special*" rule.
>
> Also, would someone who likes doing this kind of thing update our MediaWiki?
>
> http://lists.wikimedia.org/pipermail/mediawiki-announce/2013-April/000127.html
> http://lists.wikimedia.org/pipermail/mediawiki-announce/2013-April/000129.html
>
> -ben
>
> On Tue, 30 Apr 2013, Jeff Tchang wrote:
>
>> 
>> Googlebot (but not all search engines) respects some pattern matching.
>>
>>  *  To match a sequence of characters, use an asterisk (*). For instance,
>>     to block access to all subdirectories that begin with private:
>> 
>> User-agent: Googlebot
>> Disallow: /private*/
>> 
>> 
>> So in your example
>> 
>> User-Agent: *
>> Disallow: /wiki/Special*
>> 
>> will work for Google. I am not sure whether bingbot obeys it.
>> 
>> On Tue, Apr 30, 2013 at 2:38 PM, Andy Isaacson <adi at hexapodia.org> wrote:
>>       On Tue, Apr 30, 2013 at 02:31:37PM -0700, Ben Kochie wrote:
>>       > I added a robots.txt to https://noisebridge.net
>>       >
>>       > User-agent: *
>>       > Disallow: /wiki/Help
>>       > Disallow: /wiki/MediaWiki
>>       > Disallow: /wiki/Special:
>>       > Disallow: /wiki/Template
>>       > Disallow: /wiki/skins/
>>       >
>>       > I noticed bingbot is uselessly crawling the entire contents of
>>       > Special:RecentChanges.
>>
>>       Is robots.txt a prefix-based or a directory-based exclusion scheme?  Will
>>       "Disallow: /wiki/Special:" cause bingbot to skip
>>       "/wiki/Special:RecentChanges"?
>>
>>       -andy
>