Introduction


To be honest I wasn't really looking forward to this seminar. It seemed like a good idea when it was booked, but when I re-read the content I began to think it was going to be very dry and dull. Thankfully I was wrong.
This wasn't a technical "how to" seminar, more of a business "why would I want to" but with a definite technical slant. Microsoft run quite a few of these events on different subjects around the country.

So what is High Availability?


A high availability system is one that must have considerable uptime because the business relies upon it.

Targets


A target of 99% availability actually means that in a year you can have about 87.6 hours of downtime, roughly 1.7 hours a week. This should be easy to achieve without much special effort.

A target of 99.9% means that about 8.8 hours of downtime are acceptable within a year, so we need better hardware and some redundancy. This is still achievable with good management skills.
A target of 99.99% means about 52 minutes of downtime per year. This will require the use of failover clustering, log shipping or replication, as well as the hardware and redundancy mentioned above.
A target of 99.999% (or "five nines" as it's referred to) means about 5 minutes of downtime per year. So, on top of the hardware, redundancy, failover clustering, log shipping and replication already mentioned, you'll be looking at higher-end hardware (SANs etc), large infrastructure redundancy (generators etc) and standby copies of databases.
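The downtime figures above are simple arithmetic on the number of hours in a year. As a minimal sketch (assuming a 365-day year, and using a function name that is purely illustrative), the budget for each target can be worked out like this:

```python
# Downtime budget implied by an availability target.
# Assumption: a non-leap year of 365 days (8,760 hours).
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the hours of downtime per year that the target allows."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(target)
        print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

This gives 87.6 hours for 99%, 8.76 hours for 99.9%, about 52.6 minutes for 99.99% and about 5.3 minutes for five nines.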

Terminology used in this article


  • Hot Standby - the data is copied automatically onto a secondary server and is transactionally consistent; the secondary is brought up automatically if the primary fails.


  • Warm Standby - the data is copied onto a secondary server, but that data may not be transactionally consistent, and the switch may not happen automatically.

  • Cold Standby - a server is available where the data could be restored if necessary.

Seminar Structure


The seminar was broken into 3 main "units":

  • Disaster Prevention

  • Disaster Recovery

  • Maintaining a High Availability System

Disaster Prevention


To prevent disaster you need to identify and manage your risks. This involves a five-stage approach:

  1. Identify your risks

  2. Analyse them - how often, how probable, cost

  3. Plan for them - how to avoid them, how to deal with them

  4. Track - analyse the things that do happen, and use this to feed back into your plan and make your plan more encompassing

  5. Control

There is more risk management information available at Microsoft.com
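The "analyse" stage can be made concrete with a small risk register. The sketch below is not from the seminar: it assumes that "how often" and "cost" are combined into a yearly exposure figure (frequency times cost per incident), which is one common way to rank risks. The class, the field names and the sample figures are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    incidents_per_year: float  # assumed estimate of how often it happens
    cost_per_incident: float   # assumed cost, in any currency unit

    @property
    def exposure(self) -> float:
        """Expected yearly cost: frequency multiplied by cost per incident."""
        return self.incidents_per_year * self.cost_per_incident


def rank_risks(risks):
    """Return the risks ordered from highest to lowest yearly exposure."""
    return sorted(risks, key=lambda risk: risk.exposure, reverse=True)


# Hypothetical sample data, purely for illustration.
risks = [
    Risk("disk failure", 0.5, 20_000),
    Risk("power outage", 0.1, 50_000),
    Risk("data corruption", 0.05, 400_000),
]

if __name__ == "__main__":
    for risk in rank_risks(risks):
        print(f"{risk.name}: {risk.exposure:.0f} per year")
```

With these invented numbers the rare but expensive risk (data corruption) comes out on top, which is the kind of result the planning stage then has to address first.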

In a highly available system, redundant hardware and automated failover to a standby system are considerations. SQL Server 2000 failover clustering is built on top of standard Windows clustering (of up to 4 servers) and is a Hot Standby solution. One thing to bear in mind, and this holds true for all availability models, is that the client application must be cluster aware and must not depend on a specific server name. Otherwise, the database will remain available, but the system will not.
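To illustrate the point about client applications, here is a rough sketch of a client that connects through a single logical name and simply retries while a failover is in progress. It is an assumption-laden example, not something shown at the seminar: `connect_with_retry`, the `connect` callable and the name `SQLVIRTUAL01` are all hypothetical, and a real application would pass in its database driver's connect function.

```python
import time


def connect_with_retry(connect, server_name, attempts=5, delay_seconds=2.0,
                       sleep=time.sleep):
    """Call connect(server_name), retrying if the connection is refused.

    `connect` is any callable that returns a connection or raises
    ConnectionError (hypothetical here; supply your driver's function).
    `server_name` should be the cluster's virtual name, never the name
    of one physical node.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect(server_name)
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)  # give the standby time to take over
    raise last_error


if __name__ == "__main__":
    # Fake connect function that fails twice, simulating a failover.
    state = {"calls": 0}

    def fake_connect(name):
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("server not available yet")
        return f"connected to {name}"

    # The no-op sleep keeps the demonstration instant.
    print(connect_with_retry(fake_connect, "SQLVIRTUAL01", sleep=lambda _: None))
```

The design choice worth noting is that the physical server never appears in the client code, so whichever node currently owns the virtual name is the one that answers.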

Disaster Recovery


When a disaster happens, we need to be able to recover from it and get the system up and available again as soon as possible - preferably instantly.
To help us in this we need a disaster recovery plan. This needs to contain:

  • resource information (tools, hardware and software, and key technical staff's contact details)

  • procedural information (timelines, the process to bring the system back up, the process to analyse the results of the recovery operation)

  • backup site information (where the hot backup, warm backup and cold backup sites are)

  • scenario information (environmental impacts - hardware failure or natural disaster; SQL Server impacts - corrupt data; related systems impacts - application or network failure, cluster failure etc)

There is more planning information at Microsoft.com

Techniques for this are:

  • Log shipping - i.e. getting transaction logs copied to other servers. This can be used with network load balancing determining which server the applications should connect to.

  • Replication - not designed for availability, designed for scalability, but does meet some of the requirements. This allows a master database (Publisher) to publish data (whole tables, or just horizontal or vertical slices of data) to Subscribers.

  • Backup and restore - choosing the right backup strategy is important, and key to the length of time it will take to restore that data.

Both log shipping and replication are warm standby solutions.
Backup and restore is a cold standby solution.
And then, of course, keep that disaster recovery plan a living document: if something goes wrong that you hadn't planned for, add it. If someone moves or leaves, change the document. There is no point having a document that contains incorrect information if the dreaded disaster does happen.

Maintaining a High Availability System


In order to stand a chance of keeping a system available there are some people constraints which will need to be considered, for instance making sure that your database administrators are fully aware of your plans. They need to work as a team, so that everyone knows enough about every system to ensure that the responsibility is shared. They need to ensure that change control is carefully monitored to reduce risk, and when a change is implemented they should always have a backup plan. Good monitoring tools should be employed so that the DBAs are informed of any concerns as soon as possible.

Summary


Overall I was impressed. This seminar was worth attending, especially as it was free. It provided a useful overview of the main techniques for achieving a high availability system and I didn't come away brainwashed. Yes, the techniques are specific to SQL Server 2000 but the theory and planning should be sound. I'm definitely going to keep an eye open for future seminars.