Site Reliability Manager
Edinburgh/Glasgow - hybrid x2 days a week
As a Senior Site Reliability Manager, you will drive adoption of SRE best practice across our AWS cloud estate. Using both your soft skills and technical experience, you will work with teams to ensure our standards and governance are met when onboarding our services into the cloud, so that our citizen-facing applications satisfy all the operational and security requirements for running in production. The Site Reliability Manager will have broad experience of software development, infrastructure engineering and operations, and will understand the goals and strategy behind each project from conception through to completion. They will support the development and operation of software through tools, environments and practices, and are responsible for underpinning good development processes. The role requires a dynamic, highly motivated individual with the ability to work under pressure and well-developed problem-solving skills. You must be an excellent team player, quick learner and self-starter dedicated to maintaining high quality standards.
Essential skills and experience:
- Working knowledge of deploying, developing and managing the AWS cloud platform; automation and scripting skills, with proficiency in languages such as Python, Ruby and Groovy and in shell scripting (e.g. Bash); use of Git for source code version control
- Experience of automated platform and application deployment (CI/CD) technologies such as Ansible, Terraform and AWX, and exposure to continuous integration and build tools (e.g. Jenkins, Maven, Gradle, or similar)
- Experience with cloud-native applications: Docker and Kubernetes container management platforms such as EKS and OpenShift
- Experience of Linux/Unix/Windows system administration, with a good understanding of infrastructure, network and application security, including OWASP objectives and requirements; configuration of firewalls, load balancers and other network equipment; and monitoring of systems and development of appropriate alerting
- Experience in defining and measuring Service Level Objectives and developing metric baselines for service SLAs
- Experience in implementing automation and scalability techniques for cloud infrastructure
- Ability to use appropriate tools to identify and correct complex issues
- Ability to undertake comprehensive analysis of performance trends to identify root causes, and to progress opportunities to improve the reliability, security and capability of infrastructure, application and site services
If you feel you have the relevant experience and skills required, please do not hesitate to apply now!