Senior Reliability Engineer, Unannounced Project (Mobile)

4 years ago

Hail, adventurer! Blizzard Entertainment is seeking a seasoned and hearty Senior Reliability Engineer to join us on the quest of a lifetime: a brand-new, unannounced game project! This individual will be responsible for building a highly available, resilient, and performant infrastructure to support a global player base of millions of avid gamers.

Our team values creativity, collaboration, and commitment to quality. We believe that every voice matters and each person is empowered to bring new ideas to the table regardless of their role or status. Do you strive to promote modern DevOps principles? To gain observability into every nook and cranny of your infrastructure? To automate all the things? Here is your chance!

The ideal candidate for this position is a person who understands distributed systems architecture and how applications interact with these systems at scale. They are comfortable investigating software performance, inspecting network traffic, and running live incidents.

If you are excited by the above description, if you are passionate about the intersection of games and technology, if you are curious about the challenges that lie ahead, we invite you to answer the call! Join us, and together we shall write a new chapter in the history of Blizzard!

Responsibilities

Work in cross-discipline teams to ensure service reliability, availability, and performance
Collaborate with our Server Engineering and Site Reliability Engineering teams to architect and maintain live services
Plan and forecast service capacity and demand, analyze software performance, and tune systems and software
Solve mission-critical service issues and build automation to prevent problem recurrence; automate away all toil
Identify root causes of production issues, and recommend permanent solutions for them
Set up and improve monitoring (metrics, logs, alerts, etc.) to identify issues quickly
Develop effective documentation, tooling, and alerts to identify and address risks
Participate in the on-call rotation with other members of the Site Reliability Engineering team

Requirements

The ability to read/write code fluently in C# or Python
Deep understanding of the software life cycle, including Git-based CI and CD pipelines
Five years of experience working with Linux systems and related tooling (kernel, shell, system libraries, file systems, client-server protocols, etc.)
Networking: experience with network theory and protocols, e.g. TCP/IP, UDP, DNS, HTTP, TLS, and load balancing
Understanding of cloud orchestration frameworks (Terraform, Kubernetes, Ansible, Spinnaker, etc.) and their role in IT transformation
Experience with public and private clouds: OpenStack, AWS, Azure, and/or GCP
Strong experience in distributed systems architectures – layered, event-driven, data-centered, service mesh, etc.
Familiarity with distributed message buses such as Kafka and RabbitMQ
Familiarity with service configuration and deployment tools, such as Ansible, Consul, Jenkins, Puppet, Terraform, and Vault
Experience with Linux container technologies (Docker, Kubernetes)
Strong interpersonal and communication skills

Pluses

BS/MS in Computer Science or related field, or equivalent experience
Experience running a product at global scale across multiple regions
Passion for mobile games, and/or game industry experience
