Arrival at Zurich – Wikimedia Hackathon 2014

I’ve arrived!

After some hiccups with API, and my shoes setting off the metal detector (which earned me a full body scan and all the rest from the security guards), the flight here was thankfully relatively uneventful. SWISS provided free drinks and lemon cake in-flight, which was unexpected but welcome.

The sight of the Alps as the plane was coming in to land in Zurich reminded me how much I love being in the mountains. Although Zurich is not really in the Alps, it’s not too far away, and just the overall feel of Switzerland (France too, but I’m not in France) is really relaxing and calming.

I’m quite glad to be back on the continent; it’s just a completely different atmosphere. Unfortunately, I won’t be able to get out and explore Zurich much while I’m here, as I have a lot of coding to do for the Wikimedia Hackathon!

It’s good to see again faces I’ve not seen for a year, and also to put faces to names I didn’t meet last year in Amsterdam. While the Hackathon hasn’t officially started yet, a fair few people are already getting down to work, including myself. I’ve been making sure that the code I’m going to be working on is in a fit state to have new features added, so that I’m not building new features on top of existing bugs. I’m also attempting to replicate the working copy of the code on my desktop machine onto my laptop. I thought I had all the pieces I needed, then I remembered how long it has been since I last did software development on my laptop!

Power was initially a problem too. I thought I had a Swiss power adaptor, but all of the adaptors I had were in fact only European ones. A mad dash around the shops before I left proved fruitless, so I had to buy one at the airport. UK and EU plugs are larger than Swiss plugs, and Swiss sockets are elegant enough to fit three plugs into a space only slightly larger than a single UK or EU plug, so when an adaptor is plugged directly into a Swiss socket, the other two Swiss sockets are more or less unusable. Thankfully, a fair few people (myself included) have brought extension leads (mainly European), which others can plug into without problems.

My main aim while I’m here is to get a framework in place for supporting OAuth within the account creation tool, which will let us replace a few key pieces of functionality whose current implementations are not ideal. Firstly, I want to replace our confirmation of on-wiki usernames, then work towards having the welcome bot post using the account creator’s actual account, rather than faking their identity with a bot. Lastly, it would be awesome if the account creation tool could actually create the accounts itself, so we don’t need to redirect to the account creation web form.

I’m hoping to get through as much of that list as possible with help from the OAuth extension’s development team, but overall I’m aiming to get far enough with a working framework that I can complete the rest without assistance. If I can also fix some of the extension’s outstanding issues, and learn more about MediaWiki in the process, even better.

Wikimedia Hackathon 2014 – Zurich, Switzerland

In a few weeks’ time, I’m going to be heading to the Wikimedia Hackathon in Zurich, Switzerland.

I’ve been granted a scholarship by Wikimedia UK to cover my travel costs and accommodation, so I can spend a long weekend hacking on code for Wikipedia-related things in the company of a hundred or so other developers from around the world.

I’m heading there to work on integrating the Account Creation tool with the Wikimedia login system, so we can replace our broken welcome bot with posts from the creator of an account. In the past we have faked this by making the bot sign as a different user, but we’re hoping to let the bot edit as other users using the new OAuth tools.

I’m also hoping to learn a lot more about the internals of MediaWiki, the software which powers sites like Wikipedia, so that I can get much more involved in the development of this remarkable piece of software, both the core application itself and its extensions. I’ve already done a bit of work on extensions, but I’ve hardly done anything with core, and I’d love to change that.

One of the other things I’m looking forward to is the OpenPGP/GnuPG key signing party that’s been suggested, where people can get together and verify each other’s identities, then go away and sign their keys as being valid. It’ll be the first time that I’ve ever been to something like this, and it will be good to get a few more signatures on my key!

It’s going to be really good to meet people, learn, and importantly hack on code to try and get something worthwhile done!

…and so history repeats itself.

In 2008, two Wikipedia administrators (PeterSymonds and Chet B Long) allowed another user (Steve Crossin) to access their administrator accounts. As expected, this was eventually discovered, and trout was used extensively across the board. While those users learnt their lesson, and many other users learnt from their mistake, it seems not everyone has studied that piece of history.

A couple of weeks ago, two high-profile users were found to have done something similar: Riley Huntley asked Gwickwire to make an edit from his account while he was unable to do so. While Riley didn’t have administrator permissions on enwiki, he did on Wikidata, and thanks to the Single-User Login system that the WMF wikis use, this would have logged Gwickwire into Riley’s Wikidata account at the same time.

Aftermath

A fair few people who have been around on Wikipedia for a long time have rolled their eyes, having seen this happen before and what followed. Gwickwire and Riley have both left, which I feel is an over-reaction, especially since Peter and Chet only lost their administrator privileges, and even then only for 5 months or so. Peter is now even a steward. Yes, it was a mistake, and a foolish thing to do (especially given the history), but that history is 5 years old now, which is a decent amount of time. I think both Riley and Gwickwire will probably come back at some point, but under different names.

This whole thing is a shame, as Wikipedia has lost two good editors who were good at what they did. ACC and #wikipedia-en-help have both suffered a bit from their loss.

ACCBot’s recent breakage

A couple of weeks ago, the IRC bot that we use over at Wikipedia’s Account Creation Assistance project decided it would stop giving notifications to the IRC channel. Previously, it used UDP as the transport between the web interface and the IRC bot. However, for some unknown reason, this stopped working.

After seemingly endless messing around with PHP, netcat, more PHP and a bit of telnet, I came to the conclusion that it was fucked, and there was no way to recover it with any ease.

Previously, the code looked something like this:

For receiving notifications over UDP, that was all we needed – it worked and was semi-secure. However, when it stopped working, I took a more radical approach.
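The original snippet isn’t reproduced here. As a rough idea of what a UDP receiver of this kind involves, here is a minimal PHP sketch using PHP’s built-in stream functions; the function names, the address and the port are assumptions for illustration, not ACCBot’s actual code.

```php
<?php
// Hypothetical sketch of a UDP notification receiver, NOT the original ACCBot code.
// Function names and the address/port used by callers are assumptions.

/**
 * Open a UDP socket bound to $address, e.g. "udp://127.0.0.1:9001".
 */
function openUdpListener(string $address)
{
    // UDP is connectionless, so the socket is only bound (no STREAM_SERVER_LISTEN).
    $socket = stream_socket_server($address, $errno, $errstr, STREAM_SERVER_BIND);
    if ($socket === false) {
        throw new RuntimeException("Could not bind $address: $errstr ($errno)");
    }
    return $socket;
}

/**
 * Wait up to $timeoutSeconds for one datagram.
 * Returns its payload, or null if nothing arrived in time.
 */
function receiveNotification($socket, int $timeoutSeconds = 1): ?string
{
    $read = [$socket];
    $write = null;
    $except = null;

    // stream_select() lets us give up after a timeout instead of blocking forever.
    if (stream_select($read, $write, $except, $timeoutSeconds) < 1) {
        return null;
    }

    $payload = stream_socket_recvfrom($socket, 65535);
    return ($payload === false || $payload === '') ? null : $payload;
}
```

Because UDP is fire-and-forget, a datagram sent while nothing is listening (or that is dropped along the way) is simply lost, with no error on either side, which makes this kind of transport hard to debug when it silently stops working.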

You’ve probably heard of Amazon Web Services (AWS) by now; if you haven’t, I recommend you take a look.

One of the AWS services is the Simple Notification Service (SNS), which seems to be exactly what I want: a notification system. However, the only notification endpoints are HTTP pings, e-mail, or an SQS endpoint.

SQS, another of Amazon’s services (the Simple Queue Service), is what I eventually chose. It has the “advantage” of queuing all the notifications, so if you’re offline you will still get them all. For our case this isn’t ideal, but the bot isn’t usually down for long when it does go down. So, I decided to go for SNS->SQS over HTTPS as the transport for the notifications, rather than UDP.

Of course, the code needed changing. At first I thought it would need to change drastically, but it turned out to be a much smaller change than I anticipated:

It looks small: just one more explicit check to see if we actually received anything. That is, until you realise that I wrote another function to move some of the work off to one side.
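The changed code and the new helper aren’t shown here. A minimal sketch of the kind of helper involved, assuming the bot has already fetched one message body from SQS (the fetching itself, e.g. via the AWS SDK, is left out): by default SNS wraps each notification it delivers to SQS in a JSON envelope whose Message field holds the original text. The function name extractSnsMessage is hypothetical.

```php
<?php
// Hypothetical helper, not the ACCBot original. Given the body of one SQS message
// (or null when the queue returned nothing), pull out the notification text that
// SNS wrapped in its JSON envelope. Polling SQS itself is not shown.

function extractSnsMessage(?string $sqsBody): ?string
{
    // The explicit "did we actually receive anything?" check.
    if ($sqsBody === null || $sqsBody === '') {
        return null;
    }

    $envelope = json_decode($sqsBody, true);

    // An SNS notification envelope is a JSON object with "Type": "Notification"
    // and the original text in "Message"; anything else is ignored.
    if (!is_array($envelope) || ($envelope['Type'] ?? null) !== 'Notification') {
        return null;
    }

    return isset($envelope['Message']) ? (string) $envelope['Message'] : null;
}
```

Returning null for "nothing to report" keeps the main bot loop to a single check before it posts anything to IRC.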

Not much has changed, but it was an interesting technical challenge :P The only noticeable difference is the lag between a notification being generated and it appearing on IRC, which can be up to about 5 seconds if you’re unlucky!

EyeInTheSky

EyeInTheSky is one of my newer projects, an idea which I’ve stolen from two other people.

Wikipedia has so many modifications being made to it that it’s just not possible to keep an eye on everything you want to watch. While the MediaWiki software has a feature known as a watchlist, it’s neither flexible nor easy to use in my opinion.

EyeInTheSky is an IRC bot (which seems to be my speciality!) that monitors the Wikimedia IRC recent changes feed, compares every entry to a set of regular expressions, and reports the matching entries to a different network.

It’s possible to set up the bot with an entire XML tree of regular expressions matching on the username, edit summary, and page title. There are also logical constructs which allow a more-or-less unlimited number of regexes to be combined to specify exactly what you want to watch.

For example, with this tree, I could specify I wanted to stalk all the edits which are made by someone with “the” in their name, “and” or “or” but not “xor” in the page title, and with “train” in the edit summary:
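The XML tree itself isn’t reproduced here. To show the idea, here is a hypothetical PHP sketch that expresses the same stalk as nested arrays rather than EyeInTheSky’s XML (the bot’s real format and node names may well differ) and evaluates it against a single change.

```php
<?php
// Hypothetical re-creation of an EyeInTheSky-style match tree using nested PHP arrays
// instead of the bot's XML format. Node types "and", "or", "not", "match" are assumptions.

/**
 * Evaluate a match tree against one recent-changes entry.
 * $edit has the keys "user", "page" and "summary".
 */
function matches(array $node, array $edit): bool
{
    switch ($node['type']) {
        case 'and':
            foreach ($node['children'] as $child) {
                if (!matches($child, $edit)) {
                    return false;
                }
            }
            return true;
        case 'or':
            foreach ($node['children'] as $child) {
                if (matches($child, $edit)) {
                    return true;
                }
            }
            return false;
        case 'not':
            return !matches($node['child'], $edit);
        case 'match':
            // Regex test against a single field; flags such as /i live in the pattern.
            return preg_match($node['regex'], $edit[$node['field']]) === 1;
        default:
            throw new InvalidArgumentException("Unknown node type {$node['type']}");
    }
}

// "the" in the username, "and" or "or" but not "xor" in the page title, "train" in the summary.
$stalk = [
    'type' => 'and',
    'children' => [
        ['type' => 'match', 'field' => 'user', 'regex' => '/the/i'],
        ['type' => 'or', 'children' => [
            ['type' => 'match', 'field' => 'page', 'regex' => '/and/i'],
            ['type' => 'match', 'field' => 'page', 'regex' => '/or/i'],
        ]],
        ['type' => 'not', 'child' => ['type' => 'match', 'field' => 'page', 'regex' => '/xor/i']],
        ['type' => 'match', 'field' => 'summary', 'regex' => '/train/i'],
    ],
];
```

Log entries fit the same structure: a match node on the page field with a pattern such as /^Special:Log\/delete$/ would pick up the deletion log.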

I can also set a flag, which the bot will say for every stalked edit, and which I can then set my IRC client to respond to.

Of course, it’s not just edits that can be stalked this way: log entries are sent to the IRC RC feed in exactly the same way. To get the deletion log, for example, it’s just a case of specifying Special:Log/delete as the page title. The entire log entry, except for the time and user, is sent in the edit summary field, so summary regexes work on log entries too.

The bot logs all stalked edits and can email the entire log to me. That way I can clear the log when I disconnect from IRC and, when I get back on, have it emailed to me, go through what I’ve missed, and catch up.

I’m planning on making it multi-channel too, probably with multiple people able to tell it to email them the log. I can already tell it not to email certain stalks, which is useful as some of the stalks that have been set up interest other people rather than me. I just ignore those when it reports them, and have it set not to email me for those stalks.

There’s quite a lot this little bot can do. If you want to learn more, I’d recommend taking a look over the source code and seeing what you think!

The source code is available on GitHub here.

Wikipedia Account Request System – Password Storage

The current ACC system has some really useless bits which are hard to change, such as the password storage system. At the moment, the database is filled with “securely” stored passwords such as “5f4dcc3b5aa765d61d8327deb882cf99”. A quick Google search will tell you exactly how the passwords are currently stored: that string is simply the unsalted MD5 hash of “password”. This is clearly inadequate, so as part of the rewrite I’ve been aiming to store the passwords much more securely.

In all the examples, I’m going to use the password “password”.

At the moment, it’s simple to set a password, just store

md5("password");

into the database. It’s also simple to check the password, just check

md5($suppliedpassword) === $storedpassword

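However, I wanted to store the passwords with a salt, a different salt for each user. That way two users with the same password no longer end up with the same hash, and precomputed lookups (like that Google search) stop working, making cracking the MD5 hashes much less feasible.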

The function I’m now using to hash a password is this:

The $2$ at the front indicates the version of the password hashing scheme, for later use. For a password “password” and a username “username”, this gives the hashed result $2$8c6e7b658b4be4bb325870a1764ca4fb

When a password is checked, the code looks at the first three characters of the stored password and determines whether they match $2$. If they do, the provided password is hashed with the new hashing function and compared to the stored password. If they match, it’s the right password.

If the first three chars are not $2$, then it hashes the password using the old method, compares it, and if it matches, takes the provided password, hashes it with the new function, saves it to the database, and returns that it’s the right password.

This has the effect of being transparent to the user, but increasing the security of their password the first time they log in to the new system.
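The function itself isn’t reproduced here, so the sketch below assumes a salting formula (MD5 over the username and password) purely for illustration; it won’t reproduce the example hash above. What it does show is the version check and the transparent upgrade-on-login described above. hash_equals() is swapped in for a timing-safe comparison, and the stored value is updated by reference as a stand-in for saving it to the database.

```php
<?php
// Hypothetical sketch of a versioned, per-user-salted password scheme with
// upgrade-on-login. The "$2$" prefix matches the post; the salting formula
// (MD5 over username + password) is an ASSUMPTION, not the real ACC function.

const HASH_V2_PREFIX = '$2$';

/** Hypothetical version-2 hash: MD5 salted with the username, tagged "$2$". */
function hashPasswordV2(string $username, string $password): string
{
    return HASH_V2_PREFIX . md5($username . ':' . $password);
}

/**
 * Check $supplied against $stored. If $stored is an old unsalted MD5 hash and the
 * password is right, $stored is replaced (by reference) with a version-2 hash,
 * standing in for "save it to the database".
 */
function checkPassword(string $username, string $supplied, string &$stored): bool
{
    if (substr($stored, 0, 3) === HASH_V2_PREFIX) {
        // New-style hash: hash the supplied password the same way and compare.
        return hash_equals($stored, hashPasswordV2($username, $supplied));
    }

    // Old-style hash: plain unsalted MD5.
    if (!hash_equals($stored, md5($supplied))) {
        return false;
    }

    // Correct password under the old scheme: upgrade the stored hash transparently.
    $stored = hashPasswordV2($username, $supplied);
    return true;
}
```

PHP 5.5 and later also provide password_hash() and password_verify(), which generate a random salt and use a deliberately slow algorithm (bcrypt by default); the same version-check-and-upgrade pattern works with them too.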

Wikipedia Account Request System

I thought it was about time I did a bit of a technical post on the new Wikipedia Account Request System that’s been sat around slowly being worked on over what’s nearly a year(!) now.

It’s still a long way off, as I’ve not had time to really buckle down and work on it, so I’m hoping that I’ll be able to spend a bit more time with it in the near future.

Since the migration to GitHub, I’ve been doing quite a bit of development work on it, and have recently (semi) finalised the database, which will hopefully speed things up a bit, and stop me from saying “ooh, let’s do this with the database”, “nah, never mind”, “ooh, let’s do this instead”, etc.

The database finalisation comes after writing the script to convert the database from the current format into the new format. There are roughly 35 operations to be done to make the database sort-of OK, 28 of which are done on one single database table.

I’m taking this opportunity to make these somewhat huge database changes to the core of the system as there’s not much that’s using the database at the moment in the new system, and a huge migration would have to happen in order to swap from one system to another anyway, so I’m not too fussed about making more changes like this.

As the developers of the current system will know, the code is quite frankly shocking. I’m pretty certain that SQL injection and XSS attacks are prevented, but only because we apply about 15000 sanitisation operations to the input data, mangling anything that’s remotely cool, such as Unicode characters (to cite a recent example, •), into a mess that MIGHT be displayed correctly on the tool, but doesn’t work anywhere else. In this case, MediaWiki rejected it as a bad title, because it was passed an HTML entity for • rather than the character itself.
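The post doesn’t say exactly which sanitisation functions are applied, but assuming htmlentities() is among them, this small PHP illustration shows how a bullet turns into an entity, and how a second pass escapes that entity again:

```php
<?php
// Illustration only, not the ACC code: HTML-escaping input before it is stored
// turns Unicode characters into entities, and each extra pass mangles them further.

$title = "•"; // U+2022 BULLET

$once = htmlentities($title, ENT_QUOTES, 'UTF-8');
$twice = htmlentities($once, ENT_QUOTES, 'UTF-8');

echo $once, "\n";  // &bull;
echo $twice, "\n"; // &amp;bull;
```

Anything downstream that expects the real character, such as MediaWiki’s title validation, then receives escaping meant for an HTML page instead.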

The new system should hopefully solve some of these issues.

For starters, all the database quote escaping is going – I’m not even going to do database input sanitising – and I’m going to actively reject any change that adds it.

There’s a reason for this: the database abstraction layer I’m using for this new system, PDO.

PDO handles all the database connection details for me automatically, and supports both raw SQL queries and prepared statements. Where the former requires sanitisation to be secure, the latter doesn’t. You simply put placeholders (called parameters) in the query where your input goes, then bind values or variables to the parameters and execute the query. Because the parameter values are kept separate from the SQL text (sent to the server separately, or quoted by PDO itself when it emulates prepared statements), no sanitisation ever needs to happen: the input can never be interpreted as part of the query, so there’s nothing to inject.
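A minimal sketch of a prepared statement with PDO, using an in-memory SQLite database so it runs standalone (this assumes the pdo_sqlite extension is available; the request table is hypothetical):

```php
<?php
// Minimal PDO prepared-statement sketch using an in-memory SQLite database
// (requires pdo_sqlite). The "request" table and its columns are hypothetical.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE request (id INTEGER PRIMARY KEY, name TEXT)');

// ":name" is a placeholder; the value is bound separately, never spliced into the SQL.
$insert = $pdo->prepare('INSERT INTO request (name) VALUES (:name)');

$hostile = "Robert'); DROP TABLE request; --";
$insert->bindValue(':name', $hostile, PDO::PARAM_STR);
$insert->execute();

// Values can also be passed straight to execute(); Unicode goes in untouched.
$insert->execute([':name' => '•']);

$rows = $pdo->query('SELECT name FROM request ORDER BY id')->fetchAll(PDO::FETCH_COLUMN);
```

The hostile-looking string is stored verbatim as data, the table survives, and the bullet character is stored as-is with no escaping or entities.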

The really cool thing that I’m planning to (ab)use a lot is the ability to retrieve a row from the database as an instance of a class you’ve previously defined.

The above is an actual excerpt from the User class of WARS at the moment, and the database structure of the acc_user table.

As you can see, the class has a set of fields which exactly match the names of the columns in the table. This is a key part of making the code work – all you need to do is create a query which pulls out all the columns for one row in the database, pass it the parameter which tells it which row to return, and then tell it to fetch an object, telling it which class to instantiate. A simple four-line function dealing with the searching and retrieval from the database, and instantiating a class with the relevant data – it’s actually beautiful! :D
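The User class excerpt isn’t reproduced here. A cut-down, hypothetical version of the same pattern might look like this, using PDOStatement::fetchObject() with an in-memory SQLite table (assumes pdo_sqlite; the three columns are made up for the example and are not the real acc_user schema):

```php
<?php
// Sketch of PDO's fetch-into-class feature. The User fields and acc_user columns
// are a hypothetical cut-down version, not the real WARS User class.

class User
{
    // Property names deliberately match the column names of acc_user.
    public $id;
    public $username;
    public $email;

    /** Look up one user by id; returns a User, or null if there is no such row. */
    public static function getById(PDO $db, int $id): ?User
    {
        $statement = $db->prepare('SELECT * FROM acc_user WHERE id = :id');
        $statement->bindValue(':id', $id, PDO::PARAM_INT);
        $statement->execute();

        // PDO creates the User and fills in each property from the matching column.
        $user = $statement->fetchObject(User::class);
        return $user === false ? null : $user;
    }
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE acc_user (id INTEGER PRIMARY KEY, username TEXT, email TEXT)');
$db->exec("INSERT INTO acc_user (username, email) VALUES ('Example', 'example@example.org')");
```

Calling setFetchMode(PDO::FETCH_CLASS, User::class) and then fetch() or fetchAll() does the same thing, which is handy when loading many rows at once.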

My plan is to use this structure of data access objects for all the other database tables, and then I should be able to deal with the entire system on a purely object-based level, rather than constantly mashing in database queries here and there.

So, I’m building a contribs tool…

So, I’ve decided to build a contributions tool in a similar style to the popular Huggle anti-vandalism tool.

Initially, I was asked to review the contribs of a specific user who was considering running for adminship. My lazy brain decided that I couldn’t be bothered reviewing contrib after contrib in tab after tab, so instead I wrote an app to load the contribs and load each diff for me.

I’ve still not started reviewing the contribs, but it’s pretty cool for an hour or two’s worth of coding and tapping the MediaWiki API.
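The tool’s code isn’t published here, but the first API call it needs is straightforward. A hypothetical PHP helper that builds a list=usercontribs query, oldest edit first (ucdir=newer), might look like this:

```php
<?php
// Hypothetical helper, not the tool's actual code: build a MediaWiki API URL
// listing a user's contributions in chronological order (oldest first).

function buildContribsUrl(string $apiUrl, string $user, int $limit = 50): string
{
    $params = [
        'action'  => 'query',
        'list'    => 'usercontribs',
        'ucuser'  => $user,
        'ucdir'   => 'newer',                       // oldest edit first
        'uclimit' => $limit,
        'ucprop'  => 'ids|title|timestamp|comment', // ids gives revid and parentid
        'format'  => 'json',
    ];

    // http_build_query() URL-encodes each value (spaces become "+", "|" becomes "%7C").
    return $apiUrl . '?' . http_build_query($params);
}
```

With ucprop including ids, each contribution comes back with its revid and parentid, which can then be passed to action=compare as fromrev and torev to fetch the diff.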
Screenshot of Chronological Contributions Walker

This is just an example setup going through some of Dusti’s contribs, randomly clicking skip and flag until I got a good screenshot, but I’m planning on adding an open in browser option, and an export flagged option too.

Eventually it’ll probably find its way into a Subversion repo on this server somewhere (I’m kinda surprised it hasn’t already, actually), and I’ll probably release it for general use sooner or later. It’s pretty cool though, for not much development time :)

Account security and all that jazz

So, a couple of weeks ago we came across a new user who seemed to be acting newish, but after a couple of days seemed to be acting much more like an experienced editor, albeit (IMHO) a slightly childish one. I first became suspicious when he requested rollback and claimed to have had rollback before on a different account, for which he had lost the password and forgotten the username, and had also lost access to the email account.

As “sockpuppetry” (using multiple accounts) isn’t allowed on Wikipedia except in a very select set of circumstances, suspicions quickly arose as to who this person could be. It wasn’t until another editor questioned who it might be and made a suggestion that I started properly looking into it.

Helpmebot’s IRC logs showed that he’d joined IRC a few times without a hostname/IP-hiding cloak, so I had a hostname, resolved it to an IP address, and geolocated it: Liverpool. I happen to know from previous experience that the suggested user is in Arizona.

Eventually, he manages to “remember” the account: an old anti-vandalism account with rollback, unused for just over a year. Already being suspicious, I jump to the conclusion that he’s claiming an old account to gain trust.

Password resets seem to fail on that account, because they go to an email account that appears to have been compromised; even the security questions had been changed. Sending password-type information such as this to a compromised email account by definition compromises the enwiki account too, something another admin appeared to have a hard time understanding.

Anyway, it turns out he was typing the wrong email address in, and the security questions belonged to a different account. Regaining access to the email account, he regained access to his old account, and we moved stuff over to his new account, which he’s now using.

Frape: short for facebook rape. this is where someone changes someone elses status without them knowing.
urbandictionary.com

On another security note, it appears one of my uni friends isn’t the best at this whole security thing either. He left his laptop unlocked next to me for a while, after logging out of Facebook etc. (so I couldn’t frape him). He didn’t lock the laptop itself as a second precaution, as I was supposedly “unable” to get into his account to frape him.

When he came back and deleted the frape I managed to slip in, he spent 5-10 minutes trying to figure out how I did it. When he eventually found that a copy of Firefox was saving his password, he thought he’d solved it, until I kindly let him know that I hadn’t actually used that hole, and that there was another one sat around.

Because he deleted the frape, he also deleted crucial evidence that would have helped him close the hole a lot quicker: I’d fraped him from TweetDeck, and the frape itself showed that, but he didn’t realise because he’d deleted it before looking at where it came from.

Lesson: don’t delete evidence in a hurry, because you never know how useful it might be in closing a security hole. Another lesson: don’t assume a system is secure. Logging out of everything you can think of is one thing, but you’ll probably forget something. Maybe another lesson? A second layer of security probably doesn’t hurt.