
Site Reliability Engineer

  • Salary: Negotiable
  • Job type: Contract
  • Location: London
  • Sector: IT
  • Date posted: 09/10/2018
  • Job reference: BBBH89380

Site Reliability Engineer / DevOps Engineer


Site Reliability Engineering is an engineering discipline that combines software, infrastructure and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. Our engineers are responsible for ensuring our services are resilient, responsive and have service uptime appropriate to our customers' needs while keeping an ever-watchful eye on capacity and performance. In addition, we continually strive to improve our services in a fast-paced environment.

We build our own creative engineering solutions to operational problems. We consider the big picture of how our systems relate to each other, using a broad range of tools and approaches to solve a wide spectrum of problems. In our day-to-day work, we use automation to limit time spent on operational work, and we proactively identify potential risk factors and convert them into actionable improvements.

Your main responsibilities:

  • Design, create and deliver infrastructure, code or services to improve the availability, scalability, latency, and efficiency of our internal or customer-facing services
  • Troubleshoot and solve problems
  • Contribute to the stability and security of the environment
  • Be part of a team that contributes and creates new designs, architectures and standards for large-scale distributed systems
  • Engage in service capacity planning and demand forecasting, software performance analysis and system tuning
  • Define processes and best practices for reliable and timely delivery of quality products
  • Communicate work status regularly with a clear articulation of design choices along the way

Desired skill sets:

  • Configuration management (e.g. Ansible / Puppet / Terraform / Chef)
  • CI/CD (Jenkins / Concourse / GoCD / GitLab)
  • Networking (entry level, e.g. load balancing, routing, switching)
  • Virtualisation (VMware / OpenStack / Xen)
  • Containerisation (entry level)
  • Cloud platforms (AWS / GCP / Azure / OPC)
  • Monitoring (Prometheus / Nagios / Icinga)
  • Logging (Splunk / ELK)
  • Scripting / systems programming (Python / Bash / Go / Java)
  • Databases (entry level)
  • System administration (patching / upgrading / RCA / performance optimisation)

Core competencies:

  • Configuration management (testing and automation)
  • CI/CD (understanding of CI/CD pipeline design)
  • Networking (deeper knowledge, e.g. firewalls, network security, VLANs, L3/L4/L7)
  • Storage management (storage provisioning, LVM, RAID levels, NFS, object storage, block storage)
  • Virtualisation (VMware / OpenStack / Xen)
  • Containerisation (deep understanding of Docker Compose)
  • Container orchestration (Kubernetes / Swarm / Mesos)
  • Cloud platforms (architecture design on cloud platforms)
  • Monitoring (implementing monitoring solutions, metric-driven alerting)
  • Logging (Splunk / ELK)
  • Scripting / systems programming (Python / Bash / Go / Java)
  • Databases (HA / clustering, optimisation)
  • System administration (patching / upgrading / RCA / performance optimisation)
  • Security (SSH / SSL / TLS / HMAC / IPS / IDS)
  • IT architecture (low-level design)
  • Microservices / immutable infrastructure / IaC