Elite: Dangerous stats on WordPress blog posts

One of the things you may have noticed recently on the Elite: Dangerous posts on this site is the little statistics panel at the bottom of each post – this post is about those panels.

They are generated by a custom extension to the WordPress theme that I’m currently using, backed by a MySQL database. The database schema I’m using can be found here for those interested.

Internet censorship doesn’t work. At all.

When David Cameron announced that he was planning to force all the ISPs to implement automatic filtering of porn on the internet, a lot of people said it was a good idea in principle. And a lot of people who know how the internet and/or filtering tech works said it would never work.

Let me clear something up. I don’t think kids should be looking at porn. I also don’t think there’s a damn thing (on a technical level) that can be done to stop them.

Filters generally work with blacklists, whitelists, and heuristic patterns. Broadly speaking: some sites will never be blocked, others will always be blocked, and the heuristics will likely work from keyword lists – so if a page contains a word like XXX or a phrase like “hot hard-core action”, the filter will probably block it.

Enter the problem: now that this post contains those phrases, and this isn’t one of the huge well-known sites, it could be blocked by those heuristics. Other sites – LGBT sites, rape support sites, even teen puberty help sites – could find themselves blocked, and since not all porn sites contain those phrases, some will inevitably slip through. What’s more, ISPs could find themselves breaking the law by following the law: the LibDem LGBT site was one of those caught in the crossfire, and blocking the website of a political party around election time could be seen as something close to electoral fraud.
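To make those failure modes concrete, here’s a minimal sketch of that kind of keyword matching (the word list is hypothetical, not taken from any real ISP’s filter):

# A minimal sketch of keyword-based filtering and both of its failure
# modes. The word list is hypothetical, not taken from any real ISP.
BLOCKED_KEYWORDS = ["xxx", "hot hard-core action", "porn"]

def is_blocked(page_text):
    text = page_text.lower()
    return any(keyword in text for keyword in BLOCKED_KEYWORDS)

# Overblocking: an innocent support page tripped up by one word.
print(is_blocked("Teen advice: talking to your parents about porn"))  # True

# Underblocking: an adult site that simply avoids the listed words.
print(is_blocked("Tasteful galleries for the discerning adult"))      # False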

Apparently parents can override these filters, but what if parents leave their more knowledgeable kids to manage the net connection? Or don’t want to disable the filters because they’re oblivious to the issues? Or, even worse, what if the kid is trying to get support regarding parental sexual abuse?

There is so much that could go wrong with this (and has gone wrong already), and so much damage that could be done to vulnerable people who are trying to get support. It’s a real kick in the teeth for charitable organisations who are doing their best to help people, only for the government to come along and push this through.

This is one of many examples of why I feel so strongly about government intervention in technical matters such as internet governance. If you don’t understand the technology, don’t try to legislate for it. Learn how it works at a basic level first – in the case of filtering, learn how filters work, what sorts of things get filtered, and what the advantages, disadvantages, and problems are. Don’t just assume the industry will work out the problems, Mr Cameron.

Google Blog JavaScript balls

So, I’ve just seen http://googleblog.blogspot.com/2012/01/ipv6-countdown-to-launch.html – if you don’t know what I’m on about, go and have a look; it’s pretty cool! Just mouse-over the balls and have a play! :D

Unfortunately, they’ve hidden the code to do it, so there’s no easy way of finding out what they’ve done. It looks like each ball is tied to a specific point, and they just elastically bounce around that point. Still looks cool though :D
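My best guess at the technique is a damped spring: each frame, accelerate the ball towards its anchor point and bleed off some velocity, which gives exactly that elastic wobble. The sketch below is purely a guess – Google’s actual code isn’t visible – and all the numbers are made up:

# A guess at the effect: each ball is a point mass on a damped spring
# anchored to its home position. None of this is Google's actual code.
STIFFNESS = 0.1   # spring constant (made-up value)
DAMPING = 0.85    # fraction of velocity kept each frame (made-up value)

anchor_x, anchor_y = 100.0, 100.0   # the ball's home position
x, y = 160.0, 70.0                  # where the mouse has pushed it
vx, vy = 0.0, 0.0

for frame in range(60):             # roughly one second at 60fps
    vx = (vx + STIFFNESS * (anchor_x - x)) * DAMPING
    vy = (vy + STIFFNESS * (anchor_y - y)) * DAMPING
    x += vx
    y += vy

print(round(x, 1), round(y, 1))     # settles back near (100.0, 100.0)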

Front page redesign!

I’ve just redesigned the front page of the site – see what you think!

It replaces the navigation bar with a much blockier navigation that takes up most of the front page, letting you get to an interesting bit easily – it looks cleaner too.

The worst possible way to guard against SQL injections

I shouldn’t need to stress the importance of sanitising user input on web forms. I also shouldn’t need to stress the importance of government websites being secure.

I also shouldn’t need to stress the insecurity of client-side code.

However, it seems Cadw (“the historic environment service of the Welsh Assembly Government”) is stuck a bit too far in the past – back before people started exploiting websites for fun or profit – as I recently discovered from this tweet:

Now, don’t get me wrong – JavaScript is a really useful way to make websites look better and provide cool interactive experiences.

However, all too often I see JavaScript being used for one or both of the worst possible purposes:

  1. Security
  2. Adding functionality

Both of these points obviously need exploring further.

Adding functionality

JavaScript is commonly used to add functionality to websites without any problems. For example, it provides most of the editing toolbars on web-based editors like Google Mail, Wikipedia, and WordPress. This functionality is “extra”, not critical to the operation of the site – you can still use the site properly on a non-JavaScript-capable browser (these days, it’s mainly screen readers that fall into this category).

However, when you start adding functionality without a non-JS fallback, as Facebook does (just try using it with JavaScript disabled in your browser), the site becomes completely unusable for some people – a huge discrimination against those who are less able than others to browse the web (for example, screen-reader users). Using JavaScript for critical functionality is therefore a very bad idea if you want your site to be open to all and actually usable.

Security

More importantly, JavaScript code is downloaded to the client computer, and executed there.

The client has the option to execute it or not, or even to modify the code first then execute it (with a little know-how).

This means any security checks you put into the page with JavaScript CANNOT be relied upon to actually work.
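To see why, consider that the browser doesn’t even have to be involved. Here’s a minimal sketch (hypothetical URL and field name) that submits a form exactly as a browser would – except no JavaScript ever runs, so every client-side check is skipped:

# Submitting a form directly, without a browser: every JavaScript
# "security" check on the page simply never runs. The URL and field
# name here are hypothetical placeholders.
import urllib.parse
import urllib.request

payload = urllib.parse.urlencode({"searchterm": "anything at all"})
request = urllib.request.Request(
    "http://www.example.com/search.asp",  # hypothetical form target
    data=payload.encode("ascii"),         # POST body, raw and unfiltered
)
with urllib.request.urlopen(request) as response:
    print(response.status, response.reason)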

As the site above sanitises user input with JavaScript, and does so rather poorly, there are several potential ways around it. Let’s take a look at the relevant part of their page source:

Several things immediately spring to mind:

  1. SQL keywords and syntax in a “bad list”:
    "select", "drop", ";", "--",
     "insert", "delete", "update", "char(", "`", "varchar"
  2. Weird stuff, possibly passwords or other language constructs?
    "/", ":", "?", "|", "declare", "convert", "@@",
     "2D2D", "4040", "00400040", "[", "]"
  3. “xp_” – the prefix for Microsoft SQL Server’s extended stored procedures (the infamous xp_cmdshell among them), which rather gives away what’s running on the back end

Addressing the points in reverse order: a quick bit of poking at the web server (just a standard HTTP HEAD request!) reveals:

Trying 62.254.243.215...
Connected to www.cadw.wales.gov.uk.
Escape character is '^]'.
HEAD / HTTP/1.1
Host: www.cadw.wales.gov.uk

HTTP/1.1 200 OK
Date: Wed, 30 Mar 2011 04:29:53 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Content-Length: 20796
Content-Type: text/html; Charset=ISO8859-1
Set-Cookie: ASPSESSIONID......GDEM; path=/
Cache-control: private

(I’ve removed the actual cookie set :P)

Ooh look! We’re running IIS 6.0 as the web server. This gives us two likely suspects for the operating system of the server: Windows Server 2003 (essentially the server edition of WinXP), or Windows XP Professional x64 Edition. Basically, XP.

So, just from talking to their web server and reading their page source, I’ve got a pretty complete picture of the stack: IIS 6.0 on Windows Server 2003, ASP.NET alongside classic ASP (note the ASPSESSIONID cookie), and – thanks to that “xp_” entry in the blacklist – almost certainly Microsoft SQL Server behind it all. Every one of those details narrows down exactly which known vulnerabilities would be worth an attacker’s time.


At this point, this information is getting scary, so I’d like to remind my readers that everything I have done so far, I have documented here – I have done nothing else. I’d also like to remind folks that this is a government computer system, and any vulnerabilities I find I am not going to touch, as I don’t have permission to do so. The information I’ve found so far is either public information they may or may not have inadvertently published (such as the software stack they’re LIKELY running), or information that is retrieved by web browsers every time you load the site.

Getting that information manually by simulating a (poor and slow) browser just happens to be easier than messing around inside my browser (Chrome) config at the moment (Firefox users: the Firebug extension will show you this information nicely). If you choose to use the information I have published here, you do so at your own risk. My aim is to point out bad security practice, in the hope that others will heed the warnings and not make the same mistakes.

As for the weird stuff in my second point: it isn’t so mysterious after all. “declare” and “convert” are T-SQL keywords (more evidence of SQL Server), and “2D2D” and “4040” are just hex encodings of “--” and “@@” – presumably an attempt to catch attackers who hex-encode their payloads to sneak them past exactly this sort of filter (“00400040” is “@@” again, in UTF-16).
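You can verify those decodings in a couple of lines:

# Decoding the "weird stuff" from the blacklist.
print(bytes.fromhex("2D2D").decode("ascii"))         # -- (SQL comment)
print(bytes.fromhex("4040").decode("ascii"))         # @@ (T-SQL global variables)
print(bytes.fromhex("00400040").decode("utf-16-be")) # @@ again, as UTF-16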

Lastly, the first point. Let’s take a look at the main items from the SQL-specific part of their “naughty words”:

"select", "drop", ";", "--", "insert", "delete", "update", "`"

(I’ll quickly point out that they convert the input to lowercase before checking it against the list.)

So, we can’t retrieve or modify the data. We can’t delete data from the table, but truncate table isn’t restricted. We can’t use comments. We CAN use quotes, but not the table-style backtick quotes. We can’t drop tables or columns, but we could add new tables and columns if we wanted.

The fact that the backtick (`) is the only quote character in the list is telling: ordinary single quotes – the bread and butter of SQL injection – go straight through.
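To see just how porous this is even when the script does run, here’s their check re-implemented (in Python rather than their JavaScript, but the logic is the same):

# Their blacklist check, re-implemented: lowercase the input, then
# reject it if any "naughty word" appears anywhere in it.
NAUGHTY_WORDS = ["select", "drop", ";", "--", "insert", "delete",
                 "update", "char(", "`", "varchar", "/", ":", "?",
                 "|", "declare", "convert", "@@", "2d2d", "4040",
                 "00400040", "[", "]", "xp_"]

def is_rejected(user_input):
    text = user_input.lower()
    return any(word in text for word in NAUGHTY_WORDS)

print(is_rejected("how do I select a place?"))  # True  – a legitimate search, blocked
print(is_rejected("' OR 'a'='a"))               # False – a classic injection, allowed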

Of course, this is all before we make the obvious suggestion: “IT’S JAVASCRIPT! Turn it off!”.

Oops – did we just turn off all the protection against SQL injection attacks on your database, for ourselves, with a simple checkbox in the browser settings? How inconvenient of me!
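For the record, the real fix lives on the server, not the browser: use parameterised queries, so user input travels as data and can never become SQL syntax. Their stack looks like classic ASP over SQL Server; here’s the same principle sketched in Python with sqlite3:

# The actual defence: a parameterised query. The ? placeholder sends
# user input as data, never as SQL syntax, so no blacklist is needed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (name TEXT)")
conn.execute("INSERT INTO sites VALUES ('Caerphilly Castle')")

search = "how do I select a place?"  # hostile or not, it makes no difference

rows = conn.execute(
    "SELECT name FROM sites WHERE name LIKE ?",
    ("%" + search + "%",),
).fetchall()
print(rows)  # [] – no match found, and no injection possible either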

Usability for tech noobs

One last point, just to round off the whole thing: usability.

Let’s say I want to search their site (using their tiny search box) for “how do I select a place to visit?”. My query gets cut off at “…vis”. Assuming the user is smart enough to realise the computer doesn’t like long searches, they might rephrase to “how do I select a place?”.

This is the error message I get:

Would you understand it if you were a tech noob? I doubt it. After all, what’s wrong with “select”?

Conclusion

YOU’RE DOING IT WRONG.

xkcd.com - Voting Machines

CADW: If you ever see this post, fire your web developer, and take your site offline until it can be fixed by someone who’s actually competent.

Please, please, PLEASE let this be a lesson to other people how sanitising user input is a Good Thing™.

xkcd.com - Exploits of a Mom

I only hope that the Government in Westminster hasn’t made the same mistake, or this could be very costly… it appears someone has filled in the census form in a rather interesting way…

National Organization for Marriage and some hotlinking

This is just something someone posted in an online chat an hour or two ago; I thought I’d write something up about it.

I’ll start off with a bit of background about a couple of things first:

From Wikipedia (link):

The National Organization for Marriage (NOM) is a non-profit organization that seeks to prevent the legal recognition and acceptance of marriage and civil unions for same-sex couples. NOM’s stated mission is “to protect marriage and the faith communities that sustain it.”

So, basically, they’re a homophobic group against gay marriage.

The slightly more techy thing is hotlinking – including someone else’s image in your own webpage without taking a copy of it first, so it’s served directly off their server. This is bad for the server the image is stored on: it burns bandwidth without bringing the hosting site any traffic in return, since the image is only being shown on someone else’s pages.

Hotlinking is also pretty risky for the hotlinker, because the owner of the site being hotlinked from has complete control over the image. If they detect hotlinking, they’ll often just move the image so the link breaks – but they could equally replace it with something completely different.
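For the curious, the swap is easy to do server-side: check the Referer header on image requests and serve something else when it isn’t your own site. A minimal sketch with Flask (all the filenames are hypothetical):

# Swapping an image for hotlinkers, based on the Referer header.
# Filenames are hypothetical; this is a sketch, not anyone's real setup.
from flask import Flask, request, send_file

app = Flask(__name__)

@app.route("/comics/<name>")
def comic(name):
    referer = request.headers.get("Referer", "")
    # Image embedded in someone else's page? Serve a surprise instead.
    if referer and "smbc-comics.com" not in referer:
        return send_file("surprise.png")
    # NB: real code must also sanitise `name` against ../ path tricks.
    return send_file("comics/" + name)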

In this case, the N.O.M. (http://nomblog.com/ – appears down for maintenance at the moment; coincidence? I think not :D ) hotlinked an image from http://www.smbc-comics.com/, and unfortunately for them, the owner of that site was a pro-gay, anti-hotlinking kinda guy.

http://twitpic.com/3w7b1j/full

I just had to laugh. :D

Apple’s horrific code insanity

OK, so I was recently asked to help a newbie to Wikipedia (I’m going to call him Alex, as an arbitrary name) who was having a few issues accessing Wikipedia articles from “Dictionary”, the application produced by Apple for their Mac OS X operating system. When I asked Alex what the problem was, he told me that ever since the “new features” had been enabled on Wikipedia, it hadn’t worked for him. Apparently, he’d tried the fixes suggested by various articles (eg: http://reviews.cnet.com/8301-13727_7-20006290-263.html ), most of which involved disabling the new features and/or reverting to the older “monobook” skin.

At this point, I grew a little concerned. Could this be a MAJOR bug in the new features that had actually broken the Wikipedia API (Application Programming Interface) – the interface specifically designed for external programs and websites to use when interacting with Wikipedia?

Well, apparently not. Alex didn’t appear to be technically capable enough to give me the information to prove or disprove my theory, so I did a little digging and thinking. The new “features” actually added zero functionality to MediaWiki, the software that powers sites like Wikipedia – they’re purely usability improvements. The API was never touched, because there’s no need to make an API user-friendly: it’s only of real interest to programmers and developers, who should be able to understand the reasonably straightforward documentation available on the API itself, and the linked documentation on mediawiki.org.

It’s really easy to get the raw wikitext of a page, via a simple URL such as http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&titles=Application%20programming%20interface. Appending &format=xml to the end even gets you it in XML format (alternatives include json, php, wddx and yaml, as well as HTML representations of those to aid debugging). You can get the parsed HTML of the page content even more easily with this URL: http://en.wikipedia.org/w/api.php?action=parse&page=Application%20programming%20interface. Both of these examples request the Application programming interface article. It’s kinda easy, huh?
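As a concrete illustration, here’s that first query wrapped up in a few lines of Python (it uses the same legacy revisions format those URLs return; the User-Agent string is my own invention):

# Fetching raw wikitext through the MediaWiki API - no scraping needed.
import json
import urllib.parse
import urllib.request

def fetch_wikitext(title):
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    })
    request = urllib.request.Request(
        "https://en.wikipedia.org/w/api.php?" + params,
        headers={"User-Agent": "wikitext-demo/0.1"},  # hypothetical UA
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    page = next(iter(data["query"]["pages"].values()))
    return page["revisions"][0]["*"]  # legacy format: content under "*"

print(fetch_wikitext("Application programming interface")[:200])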

There are another two ways of getting the article content, the API above being the first:

  2. Grab it directly out of the MySQL database behind the software. This is possible for some tools running on the Wikimedia Toolserver, which has a read-only copy of the actual database behind Wikipedia, but it’s impossible for the vast majority of users, so we’ll forget about this one.
  3. Screen-scrape the content of the live articles.

I’ll admit, screen scraping is sometimes necessary to extract data from websites built only on older technologies. But most modern sites that developers care about offer some means of accessing their data, such as an API (Wikipedia, Facebook, and Twitter among them) or RSS feeds for changing content (bug trackers, news sites, etc). On these sites there’s no need to screen scrape – and doing so means relying on the user interface staying exactly as it is, forever. Any small change to the design of the site and your code will likely stop working, whereas an API or RSS feed is designed to be stable, with breaking changes avoided unless absolutely necessary. Screen scraping also means adding a metric ton of code just to figure out what you need to parse, where it is, and what any special characters mean – it generally takes a long time and a lot of effort (and hence money) to write. A documented API which rarely changes should be quick and easy to write code for, and most of the time it is.

I’ve not even mentioned the server side of it either… the API contains (or should contain) none of the user-interface code, so all the fancy page-rendering work is skipped entirely. That makes the API quicker to return a result to the client, and it saves bandwidth and server CPU time, since none of the CSS/JS and hardly any of the HTML has to be generated and transmitted alongside the information actually requested.

I guess what I’m trying to say is: USING AN API FOR A WEB APP IS BETTER FOR EVERYONE!

Which nicely leads me back to my original point: Apple’s app broke when a user interface changed – a user interface that’s never involved in the API. They’re screen scraping, and their app suffered quite a major failure when the folks on the Usability Initiative team at Wikimedia rolled out their UI changes. It looked like Wikipedia had broken, and all the Apple fanatics decided it must be Wikipedia’s fault. Actually, it’s the fault of Apple’s developers, for not using the API they were given.

So, that’s where I stood until a few hours ago. However, thanks to the weekly Wikipedia newsletter known as the Signpost, I learnt about bug 23602 on Wikimedia’s bugtracker. Apparently, Apple have fixed this breakage now, but the question is, have they learnt their lesson about screen scraping and fixed it properly by using the API?

Apparently not.

GET /w/index.php?title=pagename&useskin=monobook

They’re forcing the skin back to the monobook skin, and continuing to screen scrape.

To all Apple users: the programming quality here appears to be ridiculously low, and you paid huge sums of money for it. Apparently Apple’s developers don’t understand the basics of programming against web services such as Wikipedia – and rather than learn from their mistakes, they’re patching their code with nasty hacks to make it work again instead of fixing it properly.

In case you were wondering, I do have a problem with Apple generally, but even trying to look at this in a positive light – a company with Apple’s reputation seriously can’t deal with a simple problem like this properly… I fear for the future.