
Senior Site Reliability Engineer

1 year ago

To further drive our vision of premier stability and rapid feature delivery, we are looking for a Senior Site Reliability Engineer to join our team. As a Senior SRE, you should feel exceptionally comfortable bringing architectural design proposals to the table for consideration among your colleagues on our platform and infrastructure development teams. You will be one of the principal technical designers helping push our cloud-native platform toward the future. You will be responsible for driving the implementation of flexible cloud architectures with an automation-first emphasis; manual user intervention likely makes you uneasy and maybe even a little twitchy. We expect a successful candidate for this position to be a self-starter who can complete tasks independently. Though you will have access to technical leadership and senior engineers, you should be comfortable tackling complex problems without significant oversight.

Observability is paramount. If we can't measure it, we can't prove it works; if we can't prove it works, we must assume it doesn't. This is a philosophy you hopefully love (and preferably obsess over). If we can't observe how a new feature is behaving, our SRE team is excited to dive into the application code and make the necessary improvements.

Typical Day

TL;DR: You will be deeply immersed in Go and Python observability stacks, with plenty of AWS and Terraform sprinkled in as well.

This is a very hands-on Senior Engineering role where your days will be filled with building solutions to technical challenges in the observability and availability of our SHiFT online services. You will evangelize for and be obsessed with user experience as it relates to the services you support. You will help manage and orchestrate each of these by leaning heavily on technologies like Go, Terraform, Docker, and Bash. On any given day, you should expect to spend at least 80% of your time actively engineering and developing solutions; the rest will be a mixture of planning, reviewing code from your colleagues, participating in design meetings, documentation, and self-development.

This position will eventually require you to carry a company-paid mobile device and participate in 24/7 on-call rotations alongside your engineering colleagues. Don't worry though, our on-call experience doesn't suck.

Core Responsibilities:

Design, engineer, and develop solutions for ensuring the observability and reliability of our online platform
Be a trusted voice in the evangelism of reliability engineering throughout the team with an eagerness for mentoring other developers on the team
Help define and oversee short and mid-term project roadmaps for the future of our SRE team
Participate in after-hours on-call support rotations

Must Have (the non-negotiable parts):

At least 4 years of professional experience instrumenting complex observability stacks in object-oriented programming languages, preferably Go
Proficiency in AWS container management, orchestration, and observability features (ECS, Fargate, Aurora, AppConfig, CloudWatch, etc.)
Professional experience managing AWS access and security services (IAM, KMS, Secrets Manager, WAFv2, etc.)
Professional experience with Terraform and/or CloudFormation
Minimum of 2 years of experience with containers in a professional setting, preferably Docker
Strong understanding of observability stack management (OpenTelemetry, tracing, monitoring, alerting, structured logging, APM, etc.)
Comfortable communicator, able to clearly detail designs and implementations on an individual level and in large group settings

Should Have (some wiggle room):

Extensive hands-on experience with OpenTelemetry
Hands-on experience developing and maintaining CI/CD pipelines, preferably in Git/GitLab
Understanding of RESTful and WebSocket-based APIs
Bachelor's degree in computer science, related field, or equivalent training and professional experience

Now you're just showing off:

Familiarity with Datadog
Familiarity with Atlassian products (Opsgenie, Jira, Confluence)
Experience working with developers in an agile environment
Experience in the games industry, preferably launching multiple online-enabled AAA titles
Knowledge about Gearbox-owned IPs

Gearbox Entertainment believes that all team members should be able to enjoy a work environment free from all forms of discrimination and harassment. We are committed to reflecting the diversity of the world we strive to entertain. As an Equal Opportunity Employer, we provide fair and equal treatment to all team members and applicants. We do not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity or expression, national origin, disability, genetic information, pregnancy or maternity, veteran status, or any other status protected by applicable national, federal, state or local law.
