I came across a post a while ago, and I have to say that I agree with it – it nicely sums up a lot of the mistakes that people make when designing and developing web sites.
This is something I’ve known for quite a long time, but if you’re ever thinking “it’s not that much extra power to add progress reports”, just make a mental note of exactly how many progress reports you’re making, and try commenting them all out.
BeebMaze generated a new maze in mere seconds, but at the same time, that was seconds too long. As part of refactoring, I took out the progress reports with the intent of putting them back in, since there wasn’t that many.
It now generates a new maze in about the time it takes to paint the screen once. :D
(Download this build from the build server now: Build #18)
I shouldn’t need to stress the importance of sanitising user input on web forms. I also shouldn’t need to stress this importance of government websites being secure.
I also shouldn’t need to stress the insecurity of client-side code.
However, it seems Cadw (“the historic environment service of the Welsh Assembly Government”) seems to be stuck a bit too far in the past before people started exploiting websites for fun or profit, as I recently discovered from this tweet:
The worst possible way to guard against SQL injections: http://www.cadw.wales.gov.uk/ [see source and then cry]
— Tim Dobson (@tdobson) March 28, 2011
- Adding functionality
Both of these points obviously need exploring further.
The client has the option to execute it or not, or even to modify the code first then execute it (with a little know-how).
Several things immediately spring to mind:
- SQL keywords and syntax in a “bad list”:
"select", "drop", ";", "--", "insert", "delete", "update", "char(", "`", "varchar"
- Weird stuff, possibly passwords or other language constructs?
"/", ":", "?", "|", "declare", "convert", "@@", "2D2D", "4040", "00400040", "[", "]"
- “xp_” – perhaps a computer name prefix for systems running Windows XP?
Addressing the points in reverse order, a quick bit of poking (just a standard HTTP HEAD request!) the web server reveals:
Trying 18.104.22.168... Connected to www.cadw.wales.gov.uk. Escape character is '^]'. HEAD / HTTP/1.1 Host: www.cadw.wales.gov.uk HTTP/1.1 200 OK Date: Wed, 30 Mar 2011 04:29:53 GMT Server: Microsoft-IIS/6.0 X-Powered-By: ASP.NET Content-Length: 20796 Content-Type: text/html; Charset=ISO8859-1 Set-Cookie: ASPSESSIONID......GDEM; path=/ Cache-control: private
(I’ve removed the actual cookie set :P)
Ooh look! We’re running IIS 6.0 as the web server. This gives is two likely suspects for the operating system of the server: Windows Server 2003 (aka WinXP server edition), or Windows XP Professional x64 Edition. Basically, XP.
With only talking to their web server, I’ve now got a likely prefix on machine names – chances are the names are just numbered after that, and given their network is running Windows servers, it’s likely to be on a Windows domain. That simple knowledge gives me the hostname of a large number of workstations: xp_1.cadw.wales.gov.uk (or maybe xp_01.cadw.wales.gov.uk or xp_001.cadw.wales.gov.uk, or perhaps even xp_01.wales.gov.uk etc). It would be trivial to find out which of these naming schemes existed – probably by just pinging their DNS server.
At this point, this information is getting scary. I’d like to remind my readers that everything I have done so far, I have documented here. I have done nothing else. I’d also like to remind folks that this is a government computer system, and any vulnerabilities I find I am not going to touch, as I don’t have permission to do so. Information I have found so far is either public information that they may or may not have inadvertently published (such as POTENTIAL machine names), or information that would be retrieved by software such as web browsers every time you loaded the site. Getting that information manually by simulating a (poor and slow) browser just happens to be easier than messing around inside my browser (chrome) config at the moment (for firefox users, the extension Firebug will nicely show this information for you). If you choose to use the information I have published here, then you do so at your own risk. My aim in this is to point out bad security practice in the hope that others will heed the warnings and not make the same mistakes.
The weird stuff which makes up my second point could be anything, a bit of googling might tell you why they’re dangerous, or explicitly prevented.
Lastly, the first point. Let’s take a look at the main items from the SQL-specific part of their “naughty words”:
"select", "drop", ";", "--", "insert", "delete", "update", "`"
(I’m going to quickly point out they convert the words to lowercase to check them against the list.)
So, we can’t retrieve or modify the data. We can’t delete data from the table, but truncate table isn’t restricted. We can’t use comments. We CAN use quotes, but not the table-style backtick quotes. We can’t drop tables or columns, but we could add new tables and columns if we wanted.
The fact that only the backtick (`) is in the list could be an indication of a style of quoting, which we could make use of.
Oops, did we just turn off all protection against SQL injection attacks on your database for ourselves, with a simple checkbox in the browser settings? How inconvenient of me!
Usability for tech noobs
One last point just to round off the whole thing, one on usability.
Let’s say I want to search their site (using their tiny search box) for “how do I select a place to visit?”, my search query gets cut off at “…vis”. Assuming the user is smart enough to realise the computer doesn’t like long searches, they might rephrase to “how do I select a place?”.
Would you understand it if you were a tech noob? I doubt it. After all, what’s wrong with “select”?
YOU’RE DOING IT WRONG.
CADW: If you ever see this post, fire your web developer, and take your site offline until it can be fixed by someone who’s actually competent.
Please, please, PLEASE let this be a lesson to other people how sanitising user input is a Good Thing.
I only hope that the Government in Westminster hasn’t made the same mistake, or this could be very costly… it appears someone has filled in the census form in a rather interesting way…
This is just something someone posted in an online chat an hour or two ago, thought I’d write something up about it.
I’ll start off with a bit of background about a couple of things first:
From Wikipedia (link):
The National Organization for Marriage (NOM) is a non-profit organization that seeks to prevent the legal recognition and acceptance of marriage and civil unions for same-sex couples. NOM’s stated mission is “to protect marriage and the faith communities that sustain it.”
So, basically, they’re a homophobic group against gay marriage.
The slightly more techy thing is hotlinking – basically it’s including someone else’s image in your own webpage without taking a copy of it first – basically displaying the image of someone else’s server. This is bad for the server the image is stored on, because it’s using bandwidth that’s not helping the server in any way (ie: the site hosted on that server doesn’t get any traffic for the bandwidth because it’s only showing the image for someone else’s site).
Hotlinking is pretty dangerous, as the owner of the site you’re hotlinking from has complete control over the image, so frequently if hotlinking is detected they’ll move the image or something like that so the image isn’t a valid link any more. However, the owner could even replace the image entirely with something completely different.
In this case, the N.O.M. (http://nomblog.com/ – appears down for maintenance at the moment. Coincidence? I think not :D ) hotlinked an image from http://www.smbc-comics.com/ , and unfortunately for them, the owner of the site was a pro-gay anti-hotlinking kinda guy.
I just had to laugh. :D
OK, so I was recently asked to help a newbie to Wikipedia (I’m gonna call them Alex, as an arbitrary name) who was having a few issues accessing Wikipedia articles from the application “Dictionary“, produced by Apple for their Mac OS X operating system. After asking Alex what the problem was, he told me that ever since the “new features” had been enabled on Wikipedia, it hadn’t worked for him. Apparently, he’d tried various articles (eg: http://reviews.cnet.com/8301-13727_7-20006290-263.html ) which suggested fixes, most of which involved disabling the new features and/or reverting to the older “monobook” skin.
At this point, I grew a little concerned. Could this be a MAJOR bug in the new features that actually broke the Wikipedia API (Application Programming Interface), that was specifically designed for external programs/websites to use to ease the interaction with Wikipedia?
Well, apparently not. Alex didn’t appear to be technically capable enough to give me the information to prove or disprove my theory, so I did a little digging and thinking. All of the new “features” actually added zero functionality to MediaWiki, the software that powers sites like Wikipedia. All it is is usability improvements etc. The API was never touched, cos there’s no need to make the API user friendly, cos it’s only of real interest to programmers and developers, who should be able to understand the reasonably straightforward documentation available on the API itself, and the linked documentation on mediawiki.org. It’s really easy to get the raw wikitext of a page, via a simple url such as http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&titles=Application%20programming%20interface. Appending &format=xml to the end even gets you it in XML format (alternatives include json, php, wddx and yaml, as well as html representations of those to aid debugging). You can get the parsed HTML of the page content even easier with this url: http://en.wikipedia.org/w/api.php?action=parse&page=Application%20programming%20interface. Both of these examples use the Application programming interface article as the requested article. It’s kinda easy, huh?
There’s another two ways of getting the article content.
2) Grab it directly out of the MySQL database behind the software, which is possible for some tools running on the Wikimedia Toolserver, which has a read-only copy of the actual database behind Wikipedia. However, this is impossible for the vast majority of users, so we’ll forget about this one.
3) Screen-scrape the content of the live articles.
I’ll admit, screen scraping is sometimes necessary to extract data from various websites that only have older technologies in them. Most modern sites that developers would care about accessing generally have some means to access the data on them, such as an API (Wikipedia, Facebook, and Twitter are among these), or RSS feeds for changing content (such as bug trackers, news sites, etc). On these sites, there’s no need to screen scrape, and doing so means you have to rely on the user interface staying exactly as it is all the time. Any small change to the design of the site, and your code will likely no longer work. An API/RSS feed is designed to be stable, so changes break stuff, unless absolutely necessary. Screen scraping also means you have to add a metric ton of code just to figure out what you need to parse, where it is, what any special chars mean, etc, and generally takes a long time and a lot of effort (and hence money) to actually write. A documented API which rarely changes should be quick and easy to write code for, and most of the time it is.
I’ve not even mentioned the server side of it either… the API (should) contains none of the user interface code, so all the fancy page rendering stuff won’t even be run, so the API will be quicker to return a result to the client, as well as saving bandwidth and server cpu time, as all of the CSS/JS and most of the HTML code will not have to be transmitted to the client, as well as the information required.
I guess what I’m trying to say is: USING AN API FOR A WEB APP IS BETTER FOR EVERYONE!
Which nicely leads me back to my original point: Apple’s app broke when a user interface changed. A user interface that’s not actually ever involved in the API. They’re screen scraping, and their app suffered quite a major error when the folks on the Usability Initiative team at Wikimedia decided to roll out their UI changes. It looked like Wikipedia had broken, and all the Apple fanatics decided it must be Wikipedia’s fault. However, it’s actually the developers at Apple’s fault for not using the API when it was provided.
So, that’s where I stood until a few hours ago. However, thanks to the weekly Wikipedia newsletter known as the Signpost, I learnt about bug 23602 on Wikimedia’s bugtracker. Apparently, Apple have fixed this breakage now, but the question is, have they learnt their lesson about screen scraping and fixed it properly by using the API?
They’re forcing the skin back to the monobook skin, and continuing to screen scrape.
To all Apple users – the programming quality appears to be ridiculously low. You paid huge sums of money for it. Apparently, they don’t understand the basics of programming with web services such as Wikipedia. And they’re refusing to learn from their mistakes by patching their code with nasty hacks to make it work again instead of fixing it properly.
In case you were wondering, I do have a problem with Apple generally, but even if I try and look at this from a positive light – a company with Apple’s reputation that seriously can’t deal with a simple problem like this properly….. I fear for the future.