Senior Reliability Engineer, Unannounced Project (Mobile)
4 years ago
Hail, adventurer! Blizzard Entertainment is seeking a seasoned and hearty Senior Reliability Engineer to join us on the quest of a lifetime: a brand-new, unannounced game project! This individual will be responsible for building a highly available, resilient, and performant infrastructure to support a global player base of millions of avid gamers.
Our team values creativity, collaboration, and commitment to quality. We believe that every voice matters and each person is empowered to bring new ideas to the table regardless of their role or status. Do you strive to promote modern DevOps principles? To gain observability into every nook and cranny of your infrastructure? To automate all the things? Here is your chance!
The ideal candidate for this position is a person who understands distributed systems architecture and how applications interact with these systems at scale. They are comfortable investigating software performance, inspecting network traffic, and running live incidents.
If you are excited by the above description, if you are passionate about the intersection of games and technology, if you are curious about the challenges that lie ahead, we invite you to answer the call! Join us, and together we shall write a new chapter in the history of Blizzard!
Responsibilities
- Work in cross-discipline teams to ensure service reliability, availability, and performance
- Collaborate with our Server Engineering and Site Reliability Engineering teams to architect and maintain live services
- Plan and forecast service capacity and demand, analyze software performance, and tune systems and software
- Solve mission-critical service issues and build automation to prevent problem recurrence; automate away all toil
- Identify root causes of production issues, and recommend permanent solutions for them
- Set up and improve monitoring (metrics, logs, alerts, etc.) to identify issues quickly
- Develop effective documentation, tooling, and alerts to identify and address risks
- Participate in the on-call rotation with other members of the Site Reliability Engineering team
Requirements
- The ability to read/write code fluently in C# or Python
- Deep understanding of the software life cycle, including Git-based CI and CD pipelines
- Five years of experience working with Linux systems and related tooling (kernel, shell, system libraries, file systems, client-server protocols, etc.)
- Networking: experience with network theory and protocols, e.g. TCP/IP, UDP, DNS, HTTP, TLS, and load balancing
- Understanding of cloud orchestration frameworks (Terraform, Kubernetes, Ansible, Spinnaker, etc.) and their role in IT transformation
- Experience with public and private clouds: OpenStack, AWS, Azure, and/or GCP
- Strong experience in distributed systems architectures – layered, event-driven, data-centered, service mesh, etc.
- Familiarity with distributed message buses such as Kafka and RabbitMQ
- Familiarity with service configuration and deployment tools, such as Ansible, Consul, Jenkins, Puppet, Terraform, and Vault
- Experience with Linux container technologies (Docker, Kubernetes)
- Strong interpersonal and communication skills
Pluses
- BS/MS in Computer Science or related field, or equivalent experience
- Experience running a product at global scale, including across multiple regions
- Passion for mobile games, and/or game industry experience