Reliability Manager


Manchester (2 Hour Commute) - Fully Remote

My client is the independent regulator for doctors in the UK. Their focus today is to protect patients, and they are adopting a more proactive approach to regulation, reducing fitness to practise investigations and building more supportive programmes.

While so much in medicine and society has changed since 1858, the purpose would still be recognisable to one of their founding members: to protect the public by ensuring good standards of medical education and practice. Or, as it was put back then, so that "Persons requiring medical aid should be enabled to distinguish qualified from unqualified practitioners".


The Reliability Manager holds a senior role within the section, providing leadership to continually improve the stability, reliability, and availability of our supported services. They will lead the Reliability Engineering (RE) team to deliver high-quality infrastructure products, including a critical mix of cloud, virtualised and physical infrastructure assets and services. The team will keep these services highly available and secure, whilst continually improving our people and the way we work to maintain excellence. The post holder will also support the IS Operations leadership with the development and execution of our digital operations strategy, and drive an automation-first culture, bringing about process efficiency and appropriate tooling to maintain the defence and availability of our constantly changing infrastructure.

Day-to-day responsibilities

Your day-to-day responsibilities include:

  • Lead a team of Reliability Engineers in their day-to-day activities to deliver cloud and on-premises infrastructure operational services to both internal and external customers. Support the provision of outstanding customer experience by operating, maintaining, and developing a stable, reliable, secure, and available set of services.
  • Be accountable for meeting and reporting on service levels, including creating and presenting team performance reports.
  • Use approaches and methodologies such as artificial intelligence for IT Operations (AIOps) to enable more efficient and effective team processes. Develop and utilise operational intelligence to output meaningful insights which will lead to service improvements.
  • Create, maintain, and innovate around automation. Automating redundant, manual and repetitive tasks to allow your team to focus on innovation will be central to your role. Understanding why things have gone wrong, and using that to ensure the same problems don't keep happening, is key to identifying improvement opportunities and delivering operational excellence.
  • Work with all IS Operations teams to define modern operational support practices: runbook development, 24x7 on-call support, incident response and root cause analysis processes that make the best use of empowered teams. Improve and implement processes within the RE Team for incident, change, and problem management.
  • Support major incidents as they occur, looking specifically at the technical activities and approaches we can use in our immediate responses. Through involvement in and positive contributions to major incident reviews, ensure that areas for improvement are openly and constructively discussed, leading to lessons and actions as needed.

Essential skills & experience:

IT Skills -

  • Strong working knowledge of the Azure ecosystems and infrastructure as code.
  • Experience in managing hybrid infrastructure environments consisting of on-premises and cloud (PaaS, SaaS and IaaS) services.
  • Good technical knowledge based on significant practical experience, across at least 3 years, including:
    • Automation/Orchestration
    • Installation, configuration and maintenance of Microsoft environments and tools including Microsoft Windows Active Directory, Exchange, DHCP, DNS, RDS, TCP/IP network, Windows O/S and Office 365
    • SharePoint, Citrix, VMware, Poly, MS Teams, SolarWinds, MS MECM, Cisco LAN/WAN, Palo Alto
  • Experience in collecting, analysing, and leveraging operational intelligence, and using it to automate IT operations tasks and accelerate delivery and development activities across a complex infrastructure.
  • Awareness of the latest technological developments, and the ability to support our section with decision-making and strategic recommendations based on them.
  • Experience in working alongside software development teams following a DevOps approach.

Leadership Skills -

  • The post holder must have strong staff management skills to be able to motivate, develop and manage the Reliability Engineering team to ensure objectives are delivered and service level agreements are consistently met.
  • The ability to lead, manage and motivate a team, and to:
    • Demonstrate and be a role model for positive behaviours, and challenge negative or unacceptable behaviours in line with the GMC's values.
    • Give clear direction to staff on what needs to be done.
    • Give staff an appropriate level of autonomy and freedom to make decisions.

Some great benefits you will find from working at the GMC are:

  • 25 days' annual leave a year, increasing by one day for each year of service up to a maximum of 30 days. You are required to use up to two of these days should the GMC decide to close its offices over Christmas.
  • Life assurance cover. Income Protection Cover. Private medical insurance with AXA Health. Employee assistance programme with People Asset Management (PAM). Cycle to work scheme. Childcare voucher scheme. GMC Discounts Scheme. Eyesight tests. Season ticket loans.

If you are reading this and it seriously jumps out at you as something that interests you, and you have a lot of IT and managerial experience, please give me a call or an email on 01172840650.
