I was going to call this definitive form protection, but as soon as you call something definitive someone else writes something much better. And this isn’t really that definitive anyway, it’s just a collection of ideas I’ve picked up about how to protect your forms.
So, to business. You have a form on your website. It does what almost all forms do – allows users to submit data for processing, be it searching a database, sending a message, updating some information etc. However, forms by their very nature are pretty insecure, and users – as we all know – are not to be trusted. So we should make sure that what is being selected or typed into the form is safe for use. If we don’t, then we risk being wide open to, among other things, SQL injection attacks.
Traditionally web developers have used JavaScript as a way of checking that users are typing what they should into boxes. Using a JavaScript function to check that each form field has the right kind of data in it when the user submits the form, telling them that they’ve forgotten to enter their name, that their email address doesn’t look right, or that they need to select their favourite colour, for example, is great. It means the user gets instant feedback on any mistakes they have made. Because, let’s face it, if there is opportunity for a mistake to be made on a website, it will be made by someone. You might think your system is foolproof, but there’s always a better fool than you out there.
The best script I’ve seen to check form input using JavaScript is this one, which not only adds a little warning icon to each field you get wrong, but turns that field red. Oh, and it gives a message back on the screen as well. Fantastic work.
But (there’s always a but) JavaScript checks on their own aren’t very secure. People could copy your form, paste it into their own page without your classy JavaScript, and still submit whatever info they want. So what you need is a second line of defence, and a touch of server side processing does the trick quite nicely.
When your form processing script receives the information, you need to run a couple of checks on it. Firstly, make sure that the page request type is “POST” (all forms should be sent using POST rather than GET, as it’s a bit safer: the submitted data stays out of the URL). Secondly, you should check that the form has come from where you expect it to, using the referrer information. If the form data comes from somewhere weird, or isn’t a POST request, then you can either display a message to the user, stop processing entirely, redirect the user to somewhere else, or create a clever system to jump out of the user’s CD drive and splat them in the face with a custard pie. That last one might only be available on Internet Explorer, though.
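Here’s a rough sketch of those two checks. The post is about PHP-style form handlers, but the idea is the same in any language, so this one is in Python. The host name `www.example.com` and the function name are made up for illustration. Bear in mind that the referrer is sent by the browser, so it can be missing or forged; treat this as a filter for lazy attackers rather than proof of anything.

```python
from urllib.parse import urlparse

# Hypothetical host name; replace with the site that serves your form.
EXPECTED_HOST = "www.example.com"


def request_looks_legitimate(method, referrer):
    """Return True only for POST requests whose referrer is our own site."""
    if method.upper() != "POST":
        return False
    if not referrer:
        # No referrer at all: we cannot tell where the form came from.
        return False
    # Compare the parsed host rather than searching for a substring, so that
    # "http://evil.test/?www.example.com" does not slip through.
    return urlparse(referrer).hostname == EXPECTED_HOST
```

If the function returns False, that’s the point to show a message, stop, or redirect (custard pie optional).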
Then, once you think your data is coming from the right place in the right way, you should check each field to make sure it is in the format you expect. This is where regular expressions are invaluable. Make sure that email addresses have an @ symbol and a full stop. If you offer a select list in your form, make sure the value submitted is one of the available ones. If you offer the user an input text box or textarea, make sure you escape any dodgy characters. PHP has a fantastic range of string functions to allow you to sanitise text in many different ways.
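As a minimal sketch of those three checks (again in Python rather than PHP), something like the following would do. The colour list, the pattern and the function names are all assumptions for the sake of the example. The email pattern is deliberately loose: it only checks for an @ and a full stop, as described above. Note that `html.escape` makes text safe to display in a page; anything going into a database query needs its own escaping or, better, parameterised queries.

```python
import html
import re

# Deliberately loose: something, an @, something, a full stop, something.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Hypothetical options offered by the form's select list.
ALLOWED_COLOURS = {"red", "green", "blue"}


def is_plausible_email(value):
    """True if the whole value looks roughly like an email address."""
    return EMAIL_PATTERN.fullmatch(value) is not None


def is_allowed_colour(value):
    """True only if the value is one of the options we actually offered."""
    return value in ALLOWED_COLOURS


def clean_text(value):
    """Trim whitespace and escape &, <, > and quotes for display in HTML."""
    return html.escape(value.strip())
```

For example, `clean_text("  <b>Hi</b> ")` gives back `&lt;b&gt;Hi&lt;/b&gt;`, and `is_allowed_colour("purple")` is False because purple was never in the list.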
Make sure that any textual input isn’t longer than it needs to be, and don’t create any database connections until you absolutely have to. If in doubt, do not process anything. Don’t allow people to send you a 500 character field value if all they are entering is a username. Your primary concern is with protecting your system, but nicely-formatted messages letting a user know that they entered something wrong can be very helpful. Remember, all browsers have a back button that allows the user to try again.
There’s loads more you can do to protect your forms, but in general implementing these ideas will give you good protection against nasty data.
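A length check can be as simple as a table of limits. This is only a sketch: the field names and the numbers are invented, so pick ones that suit your own form.

```python
# Hypothetical per-field limits; choose values that match your own form.
MAX_LENGTHS = {"username": 30, "email": 254, "message": 2000}


def oversized_fields(form):
    """Return the sorted names of fields that are longer than their limit.

    Fields with no declared limit get a limit of 0, so any unexpected
    field that contains data is reported as well.
    """
    return sorted(
        name
        for name, value in form.items()
        if len(value) > MAX_LENGTHS.get(name, 0)
    )
```

Run it before you open a database connection, and if the list that comes back isn’t empty, stop there and tell the user which fields to fix.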
UPDATE: I’ve recently used, on a high-profile site that’s had problems with internet nasties submitting duff data through forms, an additional system for form validation. The field names for each field are randomly generated each time the page is requested, and the field names are passed through to the receiving page in an obfuscated manner. Obfuscation is the act of making something mixed-up, confused and non-obvious, but in a manner from which you can get the real data back. It’s like encryption without encryption.
The receiving page then de-obfuscates the field names and uses those to get the data that the user typed in. That way the field names that will be accepted only exist one time, for one individual user. That will make it much harder for either a spam robot or a nasty person to infiltrate the form. When it comes to web security there is no such thing as a perfect system; the aim is to make things as difficult as possible for someone who is trying to break/break into your website.
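I can’t show the exact system from that site, but here is one way the idea could be sketched in Python. Everything in it is an assumption: the secret key, the field list and the function names are made up. Rather than reversing an obfuscation, it recomputes each one-off field name from a random per-page token using an HMAC. The token would travel with the form (in the session or a hidden field), and to make the names truly single-use you’d also discard the token once it has been submitted, which isn’t shown here.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; in practice load it from configuration.
SECRET_KEY = b"change-me"
REAL_FIELDS = ("name", "email", "message")


def alias_for(token, real_name):
    """Derive the one-off field name for real_name under this page token."""
    message = f"{token}:{real_name}".encode()
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return "f" + digest[:16]


def new_form_fields():
    """Return a fresh token and a real-name -> alias map for rendering the form."""
    token = secrets.token_hex(8)
    return token, {name: alias_for(token, name) for name in REAL_FIELDS}


def recover_fields(token, submitted):
    """Map submitted alias names back to real names, ignoring anything else."""
    recovered = {}
    for name in REAL_FIELDS:
        alias = alias_for(token, name)
        if alias in submitted:
            recovered[name] = submitted[alias]
    return recovered
```

A robot that posts to the plain names `name`, `email` and `message`, or replays aliases from an earlier page load, ends up with nothing recovered, so the script has nothing to process.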