Site Reliability Engineer
6 months initially
Ready to work in a dynamic environment alongside talented people who take pride in delivering great results? Apply today. We're happy to talk flexible working.
Software Engineering is a cross-functional team that uses cutting-edge technology to provide applications that put us ahead of the competition across online content, customer service and sales.
We are responsible for the reliability of personalisation and big data services for our clients and devices across the world. Being a global capability brings big challenges, and everything we do, we do at large scale. To operate at that scale, we put automation at the heart of our work. We deliver a platform and CI/CD solutions that allow developers to build and deploy safely for systems which support millions of concurrent customers.
Your key responsibilities:
- Building our global platforms across multiple clouds and multiple regions to support development teams working on our personalisation, machine learning and big data services
- Enhancing and supporting a build framework for continuous deployment and platform automation
- Mentoring engineers, both SREs and developers, to help them design the right solutions and promote a culture of reliability and automation
- Working with development teams to diagnose performance, reliability and security issues in applications and system design
- Working with our technical suppliers to ensure our systems are following best practices and taking advantage of new technologies
- Working within the larger reliability department to ensure we are following similar design patterns and avoiding duplicated work
Your skills and experience:
- Strong experience using infrastructure as code and working with immutable infrastructure and configuration tools such as Terraform and Ansible to achieve end-to-end automation
- Experience working with CI/CD solutions to enable automation of infrastructure at scale (pipeline design, testing and best practices using Jenkins/Concourse/GoCD/GitLab)
- Programming and scripting including testing (Go/Python and Bash)
- Cloud platforms (designing infrastructure deployment patterns within AWS/GCP and making effective use of managed services such as big data tooling)
- Containerisation (Docker/containerd)
- Container orchestration and mesh networking (e.g., Kubernetes and Istio)
- Good working knowledge of databases, in particular strong knowledge of at least one of Kafka or Cassandra
- Experience working with monitoring and logging systems, particularly Prometheus and Grafana
- Strong Linux system administration
- Strong networking knowledge including troubleshooting and security best practices
- Experience with CDN providers such as Akamai would be nice to have
If you are successful in your application for this role, your appointment will be subject to receiving a positive outcome from your Criminal Record Check.