Site Reliability Engineering Manager

Position Ref: SREM0519BW

Staffordshire

Salary

Competitive

Closing date

June 28, 2019

Description

bet365, one of the world’s leading online gambling companies, is a driving force in the development of enterprise and Internet technology. We have rapidly grown into a global operation, delivering an unrivalled online experience to more than 45 million customers in 19 languages.

The DevOps department is a new function within bet365’s technology sector. As a key part of this function, we are creating a Site Reliability Engineering (SRE) team and require a Site Reliability Engineering Manager to help build this team and its processes. The DevOps function will introduce many valuable changes to our working processes and tooling, and the SRE team will have a pivotal role within this evolution.

The SRE team will work with Development, Platform Delivery (networks, database and infrastructure) and other teams in the DevOps function to determine which aspects of applications should be monitored, which alerts should be raised and what tooling or automation should be put in place to aid issue resolution and capacity planning. It will then produce dashboards, automation and tooling, make these available to the appropriate teams and support those teams in using them.

The primary goal of the team is to utilise system data to help the technology business make more informed decisions regarding capacity requirements and application health in the production estate.

As the Site Reliability Engineering Manager you will build and lead the SRE team, working closely with the Head of DevOps and collaborators across the technology sector to recruit and coach engineers, devise team processes, approaches and technologies, and agree a prioritised backlog of work. You will challenge yourself, your team, peers and collaborators to continuously improve, thereby delivering increasing levels of value to the department.

Requirements

Main Responsibilities:

• Recruiting Site Reliability Engineers into the team.
• Devising the role of the Site Reliability Engineer.
• Evolving team process and approaches.
• Working with Development and Platform Delivery teams to determine how SRE can add the most value.
• Working with IT Operations to provide and support the use of tooling that will enable them to offer increasing levels of value to the business.
• Offering support to other parts of DevOps, Platform Delivery and IT Services regarding software engineering practices.
• Agreeing and managing a backlog of work.
• Governing the quality of the team's output.
• Ensuring best use of technology within the team.
• Delivering SRE software to agreed timescales.
• Devising suitable architecture for SRE software.
• Proposing and overseeing POCs for the SRE toolset, bespoke and 3rd party.
• Ensuring that training, mentoring and coaching are in place for the team.
• Taking appropriate input from outside bet365 regarding best practice.

Essential Skills, Experience and Attributes:

• Strong SRE or software engineering background, as an engineer, not simply as a manager.
• Team management – recruitment, training, 1-2-1s.
• Stakeholder management and ability to collaborate with people from different disciplines.
• Process reengineering and continuous improvement.
• Working knowledge of contemporary monitoring, analytics tooling and best practice.
• Excellent issue investigation and diagnosis abilities.
• Keen interest in industry trends, particularly DevOps.

Apply For This Job

If you believe you possess the skills and experience necessary for this role, then please email your CV and Covering Letter, quoting the Position Reference SREM0519BW. Alternatively, you can send the application by post to Human Resources Department, Hillside (Shared Services 2018) Limited, bet365 House, Media Way, Stoke-on-Trent, England, ST1 5SZ.

By applying to us you are agreeing to share your Personal Data in accordance with our Recruitment Privacy Policy.