Senior Site Reliability Engineer for Storage Platform
2 years ago
Every day, tens of millions of people from around the world come to Roblox to play, learn, work, and socialize in immersive digital experiences created by the community. Our vision is to build a platform that enables shared experiences among billions of users. This is what’s known as the metaverse: a persistent space where anyone can do just about anything they can imagine, from anywhere in the world and on any device. Join us and you’ll usher in a new category of human interaction while solving exceptional challenges that you won’t find anywhere else.
As a Sr. Site Reliability Engineer for Storage Platform, you'll support Roblox's storage platform by designing and maintaining our large-scale KV store, caching, Kafka, and object storage infrastructure while contributing to our internal Infrastructure-as-a-Service offerings.
You have:
- Experience designing and operating large-scale distributed systems handling billions of real-time requests per second, with deep knowledge of one or more of the following technologies: caching (Redis), Kafka, distributed databases (CockroachDB), OLAP, and object storage systems
- Experience with system configuration management and familiarity with automation tools like Chef and Terraform
- Experience building deployment pipelines on top of container orchestrators like Kubernetes or Nomad and service discovery systems like Consul
- Experience with programming languages, like Python or Go
- Experience with telemetry stacks like Grafana, Prometheus, Alertmanager, and Kibana
- Experience with Linux systems and shells
- BS degree (or equivalent professional experience) in Computer Science, with at least 5 years of hands-on experience
You will:
- Have a leading role in designing and implementing our internal Infra-as-a-Service offerings on top of a container orchestrator platform
- Provide primary operational support and engineering for multiple large distributed software systems
- Build automation and frameworks to manage platform infrastructure and services and to handle various software or hardware faults
- Measure and optimize system availability, reliability and performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and improving
- Improve service-level agreements and end-to-end rollout time of our suite of software solutions
You’ll Love:
- Industry-leading compensation package
- Excellent medical, dental, and vision coverage
- A rewarding 401k program
- Flexible vacation policy
- Roflex - Flexible and supportive work policy
- Roblox Admin badge for your avatar
- At Roblox HQ:
  - Free catered lunches
  - Onsite fitness center and fitness program credit
  - Annual Caltrain Go Pass