Wikipedia Account Request System

I thought it was about time I did a bit of a technical post on the new Wikipedia Account Request System that’s been sat around slowly being worked on over what’s nearly a year(!) now.

It’s still a long way off, but I’ve not had time to actually buckle down and do work on it, so I’m hoping that I’ll be able to spend a bit more time with it in the near future.

Since the migration to GitHub, I’ve been doing quite a bit of development work on it, and have recently (semi) finalised the database, which will hopefully speed things up a bit, and stop me from saying “ooh, let’s do this with the database”, “nah, nevermind”, “ooh, let’s do this instead”, etc.

The database finalisation comes after writing the conversion script to convert the database from the current format into the new format – there’s roughly 35 operations to be done to make the database sort-of OK, 28 of which are done on one single database table.

I’m taking this opportunity to make these somewhat huge database changes to the core of the system as there’s not much that’s using the database at the moment in the new system, and a huge migration would have to happen in order to swap from one system to another anyway, so I’m not too fussed about making more changes like this.

As the developers of the current system will know, the code is quite frankly shocking. I’m pretty certain that SQL injection and XSS attacks are prevented, but only because we apply about 15000 sanitisation operations to the input data, mangling anything that’s remotely cool such as unicode chars – to cite a recent example: • – into a mess that MIGHT be displayed correctly on the tool, but any other areas just don’t work. In this case, MediaWiki rejected it as a bad title, because it was passed • instead of •.

The new system should hopefully solve some of these issues.

For starters, all the database quote escaping is going – I’m not even going to do database input sanitising – and I’m going to actively reject any change that adds it.

There’s a reason for this, and that is because of the database abstraction layer I’m using for this new system – PDO.

PDO handles all the database connection details for me automatically, and supports both raw SQL queries, and prepared statements. Where the former requires sanitisation to be secure, the latter doesn’t. You simply pass in place-holders (called parameters) to the query where your input goes. You can then bind values or variables to the parameters, and execute the query. Because the query and parameters are passed separately to the server, no sanitisation ever needs to happen because it’s just impossible to inject anything in the first place.

The really cool thing that I’m planning to (ab)use a lot is the ability to retrieve a database row from the database as an instance of a class you’ve previously defined.

The above is an actual excerpt from the User class of WARS at the moment, and the database structure of the acc_user table.

As you can see, the class has a set of fields which exactly match the names of the columns in the table. This is a key part of making the code work – all you need to do is create a query which pulls out all the columns for one row in the database, pass it the parameter which tells it which row to return, and then tell it to fetch an object, telling it which class to instantiate. A simple four-line function dealing with the searching and retrieval from the database, and instantiating a class with the relevant data – it’s actually beautiful! :D

My plan is to use this structure of data access objects for all the other database tables, and then I should be able to deal with the entire system on a purely object-based level, rather than constantly mashing in database queries here and there.