Engineering Manager, Site Reliability Engineering (Service Operations)

As Engineering Manager, you will support the engineers who develop our infrastructure platform and the applications and services that depend on it, which are used by hundreds of millions of people around the world.

You are responsible for:

Managing one to two globally distributed teams within Wikimedia’s Site Reliability Engineering organization

Recruiting, hiring, and helping onboard new team members

Working with team members to set individual performance goals, and supporting them in meeting and evolving their goals and career path

Triaging incoming workload, maintaining focus on priorities, and setting realistic expectations for both peers and team members

Coordinating and communicating with other members of the Wikimedia engineering teams on relevant projects, and contributing to the organizational strategy

Continuously developing the roadmap of the team in alignment with other SRE and Technology teams, and helping to draft and execute the team’s annual and quarterly plans

Project managing new and existing initiatives

Leading the definition, refinement, and execution of the processes through which the team manages and performs work

Leading incident response, diagnosis, and follow-up on system alerts and outages across Wikimedia’s production infrastructure

Facilitating the definition and establishment of Service Level Objectives, and tracking Error Budgets with service owners and stakeholders

Skills and Experience:

Prior experience managing teams

Prior hands-on experience with reliability engineering or software engineering (within the last 3 years preferred)

Aptitude for automation and streamlining of tasks

Ability to communicate effectively in both spoken and written English

Ability to work independently and as an effective part of a globally distributed team

Willingness and ability to travel several times a year for in-person meetings

B.S. or M.S. in Computer Science or the equivalent in related work experience

Qualities that are important to us:

Commitment to the mission of the organization and our values

Commitment to our guiding principles

Ability to disagree respectfully and still work towards a solution

Good at asynchronous communication

Solutions-focused. The Wikimedia ecosystem is complex, resources are limited, and our guiding principles are ambitious. We want you to find solutions that embrace these factors.

Self-motivated, with an ability to navigate ambiguity and bring a project to completion with limited direction

Curiosity and commitment to learn

Additionally, we’d love it if you have:

Experience working in a distributed, largely remote environment

Experience contributing to open source projects

Experience working with a Kubernetes-based platform to deploy services