Site Reliability Engineer (SRE)

2 years ago

Who We Are:

2K develops and publishes interactive entertainment globally for console systems, handheld gaming devices, and personal computers, including smartphones and tablets. 2K is a leading publisher of today’s most popular gaming genres and is best known for critically acclaimed game franchises like NBA 2K, WWE 2K, BioShock, Borderlands, Evolve, XCOM, and the beloved Sid Meier’s Civilization.

About the Team: Site Reliability Engineering (SRE)

The 2K Site Reliability team is responsible for the operations and infrastructure of all consumer-facing production systems and developer-facing systems at 2K Games, including NBA 2K game services, customer-facing account services, and websites. The team handles systems and services spanning multiple data centers, both terrestrial and cloud-based.

What We Need:

We are looking for an engineer who is passionate about building multi-datacenter infrastructure and services. Strong systems and problem-solving skills are required, as you will develop solutions for game studios and support data centers around the world alongside a group of outstanding engineers.

In this role, you will collaborate with network engineers, systems architects, and development staff to support our gamers and the needs of the business.

What We Do:

Build and automate scaled service infrastructure
Own and operate monitoring and alerting services across multiple regions
Define and implement standards that will impact systems, services, and multiple software environments
Diagnose and resolve technical issues raised by both internal and external customers
Eliminate infrastructure toil through automation
Spread SRE and operational best practices to customers and the greater organization
Participate in Site Reliability Engineering’s on-call rotation

Who We Believe Will Be an Outstanding Fit:

You are eager to work in a fast-paced environment with other highly skilled engineers who are passionate about service availability and health! The idea of building data center infrastructure services from greenfield to implementation moves you!

Required Qualifications:

Expertise in scalable production services (config management, monitoring, infrastructure as code, load balancing, distributed systems)
Experience with Systems Infrastructure, Virtualization, Kubernetes, and many of the following technologies: Helm, Docker, Terraform, Elasticsearch, Prometheus, Puppet, Git, Jenkins
Strong understanding of the SLI, SLO, and SLA concepts
A passion for service health and reliability
Demonstrated ability to decompose sophisticated problems and engage in lateral investigations
Strong coding experience in at least one of Python, Ruby, Java, or Go, and a good understanding of code management
Experience with Unix/Linux operating systems (tuning and system internals) and TCP/IP networking fundamentals

Preferred Qualifications:

Prior hands-on experience working in a highly available environment, scaling to thousands of nodes
Experience mentoring other specialists
Experience working with product owners on service levels
