Wikipedia Account Request System – Password Storage

The current ACC system has some really useless bits which are hard to change, such as the password storage system. At the moment, the database is filled with “securely” stored passwords, such as “5f4dcc3b5aa765d61d8327deb882cf99”. Any quick Google search will quickly tell you exactly how the passwords are currently stored, a simple MD5 hash. This is quite clearly inadequate, so as part of the rewrite I’ve been aiming to store the passwords much more securely.

In all the examples, I’m going to use the password “password”.

At the moment, it’s simple to set a password, just store

md5("password");

into the database. It’s also simple to check the password, just check

md5($suppliedpassword) === $storedpassword

However, I was wanting to store the passwords with a salt, a different salt for each user – hence making cracking the MD5 hash much less feasible.

The function I’m now using to encrypt a password is this:

The $2$ at the front indicates the version of the password hash for later use. For a password “password” and a username “username”, this gives the encrypted result $2$8c6e7b658b4be4bb325870a1764ca4fb

When a password is checked, the code looks at the first three chars of the stored password, and determines if it matches $2$ or not. If it does, the provided password is encrypted with the new hashing function, and compared to the stored password. If they match, it’s the right password.

If the first three chars are not $2$, then it hashes the password using the old method, compares it, and if it matches, takes the provided password, hashes it with the new function, saves it to the database, and returns that it’s the right password.

This has the effect of being transparent to the user, but increasing the security of their password the first time they log in to the new system.

Wikipedia Account Request System

I thought it was about time I did a bit of a technical post on the new Wikipedia Account Request System that’s been sat around slowly being worked on over what’s nearly a year(!) now.

It’s still a long way off, but I’ve not had time to actually buckle down and do work on it, so I’m hoping that I’ll be able to spend a bit more time with it in the near future.

Since the migration to GitHub, I’ve been doing quite a bit of development work on it, and have recently (semi) finalised the database, which will hopefully speed things up a bit, and stop me from saying “ooh, let’s do this with the database”, “nah, nevermind”, “ooh, let’s do this instead”, etc.

The database finalisation comes after writing the conversion script to convert the database from the current format into the new format – there’s roughly 35 operations to be done to make the database sort-of OK, 28 of which are done on one single database table.

I’m taking this opportunity to make these somewhat huge database changes to the core of the system as there’s not much that’s using the database at the moment in the new system, and a huge migration would have to happen in order to swap from one system to another anyway, so I’m not too fussed about making more changes like this.

As the developers of the current system will know, the code is quite frankly shocking. I’m pretty certain that SQL injection and XSS attacks are prevented, but only because we apply about 15000 sanitisation operations to the input data, mangling anything that’s remotely cool such as unicode chars – to cite a recent example: • – into a mess that MIGHT be displayed correctly on the tool, but any other areas just don’t work. In this case, MediaWiki rejected it as a bad title, because it was passed • instead of •.

The new system should hopefully solve some of these issues.

For starters, all the database quote escaping is going – I’m not even going to do database input sanitising – and I’m going to actively reject any change that adds it.

There’s a reason for this, and that is because of the database abstraction layer I’m using for this new system – PDO.

PDO handles all the database connection details for me automatically, and supports both raw SQL queries, and prepared statements. Where the former requires sanitisation to be secure, the latter doesn’t. You simply pass in place-holders (called parameters) to the query where your input goes. You can then bind values or variables to the parameters, and execute the query. Because the query and parameters are passed separately to the server, no sanitisation ever needs to happen because it’s just impossible to inject anything in the first place.

The really cool thing that I’m planning to (ab)use a lot is the ability to retrieve a database row from the database as an instance of a class you’ve previously defined.

The above is an actual excerpt from the User class of WARS at the moment, and the database structure of the acc_user table.

As you can see, the class has a set of fields which exactly match the names of the columns in the table. This is a key part of making the code work – all you need to do is create a query which pulls out all the columns for one row in the database, pass it the parameter which tells it which row to return, and then tell it to fetch an object, telling it which class to instantiate. A simple four-line function dealing with the searching and retrieval from the database, and instantiating a class with the relevant data – it’s actually beautiful! :D

My plan is to use this structure of data access objects for all the other database tables, and then I should be able to deal with the entire system on a purely object-based level, rather than constantly mashing in database queries here and there.

The worst possible way to guard against SQL injections

I shouldn’t need to stress the importance of sanitising user input on web forms. I also shouldn’t need to stress this importance of government websites being secure.

I also shouldn’t need to stress the insecurity of client-side code.

However, it seems Cadw (“the historic environment service of the Welsh Assembly Government”) seems to be stuck a bit too far in the past before people started exploiting websites for fun or profit, as I recently discovered from this tweet:

Now, don’t get me wrong – JavaScript is a really useful way to make websites look better and provide cool interactive experiences.

However, all too often I see JavaScript being used in one or both of the worst possible uses for it:

  1. Security
  2. Adding functionality

Both of these points obviously need exploring further.

Adding functionality

JavaScript is commonly used to add functionality to websites without problems. For example, JavaScript is used to provide most editing toolbars on web-based editors like Google Mail, Wikipedia, and WordPress. This functionality is “extra”, not critical to the operation of the site – you can survive and use the site properly even on a non-JavaScript capable browser (these days, it’s mainly screen readers which fall into this category).

However, when you start adding functionality which doesn’t have a non-JS fallback, such as Facebook (just try using it with JavaScript disabled in your browser), the site becomes completely unusable to some people – a huge discrimination against those who are not as able as others to browse the web (for example, those with screen readers). Therefore, using JavaScript to add critical functionality is a very bad idea if you want your site to be open to all and actually usable.

Security

More importantly, JavaScript code is downloaded to the client computer, and executed there.

The client has the option to execute it or not, or even to modify the code first then execute it (with a little know-how).

This means any security checks you put into the page with JavaScript CANNOT be relied upon to actually work.

As the source code of the above site is sanitising user input with JavaScript in a rather poor way, there are several potential ways around this. Let’s take a look at their web site’s source code:

Several things immediately spring to mind:

  1. SQL keywords and syntax in a “bad list”:
    "select", "drop", ";", "--",
     "insert", "delete", "update", "char(", "`", "varchar"
  2. Weird stuff, possibly passwords or other language constructs?
    "/", ":", "?", "|", "declare", "convert", "@@",
     "2D2D", "4040", "00400040", "[", "]"
  3. “xp_” – perhaps a computer name prefix for systems running Windows XP?

Addressing the points in reverse order, a quick bit of poking (just a standard HTTP HEAD request!) the web server reveals:

Trying 62.254.243.215...
Connected to www.cadw.wales.gov.uk.
Escape character is '^]'.
HEAD / HTTP/1.1
Host: www.cadw.wales.gov.uk

HTTP/1.1 200 OK
Date: Wed, 30 Mar 2011 04:29:53 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Content-Length: 20796
Content-Type: text/html; Charset=ISO8859-1
Set-Cookie: ASPSESSIONID......GDEM; path=/
Cache-control: private

(I’ve removed the actual cookie set :P)

Ooh look! We’re running IIS 6.0 as the web server. This gives is two likely suspects for the operating system of the server: Windows Server 2003 (aka WinXP server edition), or Windows XP Professional x64 Edition. Basically, XP.

With only talking to their web server, I’ve now got a likely prefix on machine names – chances are the names are just numbered after that, and given their network is running Windows servers, it’s likely to be on a Windows domain. That simple knowledge gives me the hostname of a large number of workstations: xp_1.cadw.wales.gov.uk (or maybe xp_01.cadw.wales.gov.uk or xp_001.cadw.wales.gov.uk, or perhaps even xp_01.wales.gov.uk etc). It would be trivial to find out which of these naming schemes existed – probably by just pinging their DNS server.


At this point, this information is getting scary. I’d like to remind my readers that everything I have done so far, I have documented here. I have done nothing else. I’d also like to remind folks that this is a government computer system, and any vulnerabilities I find I am not going to touch, as I don’t have permission to do so. Information I have found so far is either public information that they may or may not have inadvertently published (such as POTENTIAL machine names), or information that would be retrieved by software such as web browsers every time you loaded the site. Getting that information manually by simulating a (poor and slow) browser just happens to be easier than messing around inside my browser (chrome) config at the moment (for firefox users, the extension Firebug will nicely show this information for you). If you choose to use the information I have published here, then you do so at your own risk. My aim in this is to point out bad security practice in the hope that others will heed the warnings and not make the same mistakes.

The weird stuff which makes up my second point could be anything, a bit of googling might tell you why they’re dangerous, or explicitly prevented.

Lastly, the first point. Let’s take a look at the main items from the SQL-specific part of their “naughty words”:

"select", "drop", ";", "--", "insert", "delete", "update", "`"

(I’m going to quickly point out they convert the words to lowercase to check them against the list.)

So, we can’t retrieve or modify the data. We can’t delete data from the table, but truncate table isn’t restricted. We can’t use comments. We CAN use quotes, but not the table-style backtick quotes. We can’t drop tables or columns, but we could add new tables and columns if we wanted.

The fact that only the backtick (`) is in the list could be an indication of a style of quoting, which we could make use of.

Of course, this is all before we make the obvious suggestion: “IT’S JAVASCRIPT! Turn it off!”.

Oops, did we just turn off all protection against SQL injection attacks on your database for ourselves, with a simple checkbox in the browser settings? How inconvenient of me!

Usability for tech noobs

One last point just to round off the whole thing, one on usability.

Let’s say I want to search their site (using their tiny search box) for “how do I select a place to visit?”, my search query gets cut off at “…vis”. Assuming the user is smart enough to realise the computer doesn’t like long searches, they might rephrase to “how do I select a place?”.

This is the error message I get:

Would you understand it if you were a tech noob? I doubt it. After all, what’s wrong with “select”?

Conclusion

YOU’RE DOING IT WRONG.

xkcd.com - Voting Machines

CADW: If you ever see this post, fire your web developer, and take your site offline until it can be fixed by someone who’s actually competent.

Please, please, PLEASE let this be a lesson to other people how sanitising user input is a Good Thing™.

xkcd.com - Exploits of a Mom

I only hope that the Government in Westminster hasn’t made the same mistake, or this could be very costly… it appears someone has filled in the census form in a rather interesting way…

Account security and all that jazz

So, a couple of weeks ago we came across a new user, who seemed to be acting newish, but after a couple of days seemed to be acting much more like an experienced editor, albeit slightly IMHO childish. I first became suspicious when he requested rollback, and claimed to have had rollback before on a different account, to which he lost the password to, and had forgotten the username, and also lost access to his email account.

As “sockpuppetry” (using multiple accounts) isn’t allowed on Wikipedia except in a very select set of circumstances, suspicions quickly arose as to who this person could be. It wasn’t until another editor questioned who it might be and made a suggestion did I start properly looking into it.

Helpmebot’s IRC logs showed that he’d joined IRC a few times without getting a hostname/IP-hiding cloak, so I had a hostname, resolved it to an IP address, and performed a geolocate: Liverpool. The suggested user I happen to know from previous experience is in Arizona.

Eventually, he manages to “remember” the account, a previous antivandalism account with rollback unused for just over a year. Already being suspicious, I jump to the conclusion that he’s claiming an old account to gather trust.

Password resets seem to fail on that account, because it’s going to an email account that appeared to have been compromised, even the security questions had been changed. Sending password-type information such as this to a compromised email account by definition compromises the enwiki account too – something another admin appeared to have a hard time understanding.

Anyway, it turns out he was typing the wrong email address in, and the security questions belonged to a different account. Regaining access to the email account, he regained access to his old account, and we moved stuff over to his new account, which he’s now using.

Frape: short for facebook rape. this is where someone changes someone elses status without them knowing.
urbandictionary.com

On another security note, it appears one of my uni friends isn’t the best at this whole security thing either – he left his laptop unlocked next to me for a while, after logging out of facebook etc (so I couldn’t frape him). He didn’t lock his entire laptop as a secondary precautionary measure, as I was “unable” to get into his account to frape him.

When he came back and deleted the frape I managed to slip in, he spent 5-10 minutes trying to figure out how I did it. When he eventually found that a version of firefox was saving his password, he thought he’d solved it – until I kindly let him know that I didn’t actually find that hole, and that there was another one sat around.

Because he deleted the frape, he also deleted crucial evidence that would have helped him to close the hole a lot quicker – I’d fraped him from TweetDeck, and the deleted frape showed that – but he didn’t realise because he’d deleted the frape before looking at where it came from.

Lesson: don’t delete evidence quickly cos you never know how useful it might be in closing a security hole. Another lesson: don’t assume a system is secure. Logging out of everything you can think of is one thing, but you’ll probably forget something. Maybe another lesson? A second layer of security probably doesn’t hurt.

National Organization for Marriage and some hotlinking

This is just something someone posted in an online chat an hour or two ago, thought I’d write something up about it.

I’ll start off with a bit of background about a couple of things first:

From Wikipedia (link):

The National Organization for Marriage (NOM) is a non-profit organization that seeks to prevent the legal recognition and acceptance of marriage and civil unions for same-sex couples. NOM’s stated mission is “to protect marriage and the faith communities that sustain it.”

So, basically, they’re a homophobic group against gay marriage.

The slightly more techy thing is hotlinking – basically it’s including someone else’s image in your own webpage without taking a copy of it first – basically displaying the image of someone else’s server. This is bad for the server the image is stored on, because it’s using bandwidth that’s not helping the server in any way (ie: the site hosted on that server doesn’t get any traffic for the bandwidth because it’s only showing the image for someone else’s site).

Hotlinking is pretty dangerous, as the owner of the site you’re hotlinking from has complete control over the image, so frequently if hotlinking is detected they’ll move the image or something like that so the image isn’t a valid link any more. However, the owner could even replace the image entirely with something completely different.

In this case, the N.O.M. (http://nomblog.com/ – appears down for maintenance at the moment. Coincidence? I think not :D ) hotlinked an image from http://www.smbc-comics.com/ , and unfortunately for them, the owner of the site was a pro-gay anti-hotlinking kinda guy.

http://twitpic.com/3w7b1j/full

I just had to laugh. :D