[Rack] robots.txt
Ben Kochie
ben at nerp.net
Tue Apr 30 21:54:25 UTC 2013
The "Disallow: /wiki/Special:" line came from the MediaWiki examples.
I added the additional "/wiki/Special*" pattern.
Also, would someone who likes doing this kind of thing please update our
MediaWiki:
http://lists.wikimedia.org/pipermail/mediawiki-announce/2013-April/000127.html
http://lists.wikimedia.org/pipermail/mediawiki-announce/2013-April/000129.html
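If anyone wants to sanity-check the prefix behavior locally, here's a quick
sketch using Python's stdlib urllib.robotparser. Note it implements plain
prefix matching from the original robots.txt convention, not Google's "*"
wildcard extension, so it says nothing about what bingbot actually does --
it just shows that the "Special:" rule already covers the subpages:

```python
import urllib.robotparser

# Parse the same rules served from https://noisebridge.net/robots.txt
rp = urllib.robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /wiki/Help
Disallow: /wiki/MediaWiki
Disallow: /wiki/Special:
Disallow: /wiki/Template
Disallow: /wiki/skins/
""".splitlines())

# Disallow is plain prefix matching, so the Special: rule already
# blocks every Special: subpage, RecentChanges included.
print(rp.can_fetch("bingbot", "https://noisebridge.net/wiki/Special:RecentChanges"))  # False
print(rp.can_fetch("bingbot", "https://noisebridge.net/wiki/Main_Page"))              # True
```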
-ben
On Tue, 30 Apr 2013, Jeff Tchang wrote:
>
> Googlebot (but not all search engines) respects some pattern matching.
>
> * To match a sequence of characters, use an asterisk (*). For instance, to block access to all
> subdirectories that begin with private:
>
> User-agent: Googlebot
> Disallow: /private*/
>
>
> So in your example:
>
> User-Agent: *
> Disallow: /wiki/Special*
>
> will work for Google. I am not sure whether bingbot obeys it.
>
> On Tue, Apr 30, 2013 at 2:38 PM, Andy Isaacson <adi at hexapodia.org> wrote:
> On Tue, Apr 30, 2013 at 02:31:37PM -0700, Ben Kochie wrote:
> > I added a robots.txt to https://noisebridge.net
> >
> > User-agent: *
> > Disallow: /wiki/Help
> > Disallow: /wiki/MediaWiki
> > Disallow: /wiki/Special:
> > Disallow: /wiki/Template
> > Disallow: /wiki/skins/
> >
> > I noticed bingbot is uselessly crawling the entire contents of
> > Special:RecentChanges.
>
> Is robots.txt a prefix-based or a directory-based exclusion scheme? Will
> "Disallow: /wiki/Special:" cause bingbot to skip
> "/wiki/Special:RecentChanges"?
>
> -andy
> _______________________________________________
> Rack mailing list
> Rack at lists.noisebridge.net
> https://www.noisebridge.net/mailman/listinfo/rack