
How to achieve 100% uptime for your web applications

Mark Hall
April 06, 2018

There is never a good time for a web application to be down, and the pressure on those who develop, host and patch them has never been greater. Let’s be honest: it is very frustrating when you go to use a web application only to get a meaningless error message or a notice informing you that it is down for routine maintenance.

I’ve been maintaining IT systems for 30 years and have been involved in the development and hosting of web applications since 1995. Based on that experience, I thought I would share some tips to help you get as close as possible to 100% uptime.

Let’s start by acknowledging that no matter how well your web application has been written and how much you have invested in hosting and managing it, sooner or later there will be a problem.

My first tip may sound obvious, but in my experience it addresses one of the most common causes of downtime: the deployment of inadequately tested code. When clients are pressing to hit deadlines, it is unfortunately the testing phase that tends to get squeezed, and code that still contains bugs is made live as a result. So avoid deploying code or updates without carrying out adequate testing.

There is now a huge choice of companies providing hosting services, and some organisations still choose to host their web applications themselves. Managing the infrastructure required to provide a reliable data centre is hard work: you need diverse power and network feeds, together with cooling and physical security. It now tends to be only larger organisations that can justify running their own data centres. The rise of cloud hosting in recent years has offered to take away the headache of running your own data centre; however, cloud providers are not immune to downtime either. I will discuss in further detail what you should consider when choosing where to host your web application in a future blog post.

No matter where you choose to host your web application, it is important to have plans in place for when things go wrong. If your application is hosted at a single data centre and that data centre experiences a serious outage, are you able to get the application up and running quickly at another location? At Oxford Web Applications we host our systems at two completely independent UK-based data centres and ensure that our critical applications (and those we host for our clients) are replicated. This means that any web application hosted across the two data centres will only experience a relatively short period of downtime should one of the locations experience a serious problem.

Ensuring that you have adequate backups of your data, and that you have access to those backups in the event of an outage, is critical. If you have contracted with a single hosting provider who also provides your backup services, I would recommend that you keep a copy of the web application elsewhere in case the provider or the backups fail. It is also worth considering some form of offline media, such as tape, to keep a copy of your application where it cannot be compromised. This can be particularly useful if your hosting provider or your application suffers an attack, such as malware that encrypts your data.

Having good, reliable application monitoring in place is essential. If there is a problem with your web application then you and your team should be the first to know. You should also consider monitoring not just your web application’s home page but any transactional elements, which may fail even though the web application appears to be working. I would also strongly recommend that you have a recovery plan or runbook in place which your on-call team can follow when a problem is detected. No one is at their best at 3am, and having a clear plan and instructions to hand to help your team recover the web application outside normal working hours is essential.
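To illustrate the difference between checking the home page and checking a transactional element, here is a minimal sketch in Python using only the standard library. It is not a replacement for a proper monitoring service. The `Check` class, the function names and the idea of looking for a piece of expected text are all my own assumptions for the example, and you would substitute your own URLs.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    name: str          # label used in alert messages
    url: str           # page or endpoint to request (hypothetical in this sketch)
    must_contain: str  # text that should only appear when the step really worked


def http_fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (status code, body text) for a GET request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; report the status code instead.
        return err.code, ""


def run_check(
    check: Check, fetch: Callable[[str], Tuple[int, str]] = http_fetch
) -> Tuple[bool, str]:
    """Run one check and return (healthy, message)."""
    try:
        status, body = fetch(check.url)
    except OSError as err:
        # Covers DNS failures, refused connections and timeouts.
        return False, f"{check.name}: unreachable ({err})"
    if status != 200:
        return False, f"{check.name}: HTTP {status}"
    if check.must_contain not in body:
        # The page loaded, but the transaction behind it did not behave.
        return False, f"{check.name}: expected text missing"
    return True, f"{check.name}: ok"


def failing(
    checks: List[Check], fetch: Callable[[str], Tuple[int, str]] = http_fetch
) -> List[str]:
    """Return the messages for every check that failed."""
    results = (run_check(check, fetch) for check in checks)
    return [message for healthy, message in results if not healthy]
```

Run from a scheduler every minute or so, `failing(checks)` returns an empty list when everything is healthy and a list of messages to send to your on-call team otherwise. The `fetch` parameter exists only so the logic can be exercised without touching the network.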

There are a couple of elements of web applications which, in our experience, often get overlooked but are critical to overall delivery. It is very common for organisations to register and host their domain names with one of the large domain name providers. Something we have experienced many times when hosting and managing web applications for our clients is the third-party domain name provider experiencing downtime. If the name servers for the domain used to deliver the web application to your customers go down, then your application is inaccessible, even though there is nothing technically wrong with it or with the hosting provider. In our experience this is one of the most common causes of an application appearing to go down. To mitigate this risk we would recommend reviewing where your domain name servers are hosted and not relying purely on the default servers offered by the company you used to register the domain name. It may be worth considering a Content Delivery Network (CDN) such as Cloudflare, which also provides resilient domain name hosting.
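When an application appears to be down, it helps to know quickly whether the problem is the domain name or the application itself. The following is a small sketch, assuming Python and its standard `socket` module; the `diagnose` function and its messages are my own invention for illustration. It only tests whether the name resolves from wherever the script runs, so it is a first diagnostic step rather than a full DNS health check.

```python
import socket


def diagnose(host: str, resolve=socket.getaddrinfo, port: int = 443) -> str:
    """Report whether a host name currently resolves to any address."""
    try:
        results = resolve(host, port)
    except socket.gaierror as err:
        # The name could not be resolved: suspect DNS, not the application.
        return f"DNS failure for {host}: {err}"
    # Each result is (family, type, proto, canonname, sockaddr);
    # the first item of sockaddr is the IP address.
    addresses = sorted({result[4][0] for result in results})
    return f"{host} resolves to {', '.join(addresses)}"
```

If this reports a DNS failure while your servers are reachable by IP address, the fault most likely lies with the domain name hosting rather than with your application or hosting provider.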

Another common cause of web application downtime is the expiry of the secure (SSL/TLS) certificate used to encrypt the data between the host server and the user’s browser. Although this doesn’t necessarily cause your application to go down, users will be presented with a security warning, which may mean that they do not continue. Again, it is common for clients to manage secure certificates separately from their hosting and application support functions, but an expired secure certificate can effectively bring your application down.
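Certificate expiry is easy to check automatically. Below is a minimal sketch, assuming Python and its standard `ssl` module; the function names and the 30-day warning threshold are my own assumptions rather than any standard, and you would adjust them to suit your renewal process.

```python
import socket
import ssl
import time


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the certificate's expiry date string as reported by the server.

    Note: if the certificate has already expired, the handshake itself
    raises ssl.SSLCertVerificationError, which is also worth alerting on.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until(not_after: str, now: float) -> float:
    """Days between `now` (seconds since the epoch) and the expiry date."""
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now) / 86400


def needs_renewal(not_after: str, now: float, warn_days: int = 30) -> bool:
    """True when the certificate expires within `warn_days` days."""
    return days_until(not_after, now) <= warn_days
```

A daily scheduled job could call `needs_renewal(fetch_not_after("www.example.com"), time.time())` for each of your domains (the host name here is a placeholder) and alert whoever manages the certificates, so that renewal does not depend on someone remembering the date.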

Keeping a web application running 100% of the time isn’t easy. In my experience the most important element is the people involved, both inside your organisation and at the companies you have chosen to host and support it. With the right team in place you can get very close to 100% uptime and have a robust recovery plan ready for when things do go wrong.

In future posts I plan to provide further details and tips on how to achieve that 100% uptime which users demand.

 
