Senior Site Reliability Engineer
13 days ago
We are Mythical Games, a venture-backed, next-generation game technology company at the intersection of video games and economics, led by industry veterans. Our goal is to lead the industry with the launch of exceptional video game experiences that leverage distributed ledger technology, while also providing a platform of robust tools that will allow other game developers to do the same.
Our Site Reliability Engineering team is looking for a talented and driven Senior Site Reliability Engineer - Cloud Infrastructure to work with our awesome team based in our Kansas City, MO office. The SRE in this role is cloud infrastructure focused and will work to create a reliable, performant and secure canvas upon which Mythical’s cloud-based applications are built.
The right candidate for this job is:
- An experienced engineer who has been heavily involved in the design and operation of containerized production systems using Kubernetes, OpenShift, or similar container orchestration technology
- Passionate about distributed systems and working with highly scalable applications
- Eager to take on new technological challenges and motivated to solve them
- A smart, highly motivated self-starter who thrives in a bottom-up, fast-paced, highly technical environment
- An effective collaborator, experienced in creating technical partnerships across teams
- Unwaveringly passionate about meeting demands and delivering epic customer service
This role requires solid experience in scalable infrastructure design, cloud computing environments, and hands-on technical skills. This position is expected to:
- Ensure high availability, performance, and security of APIs and backend services
- Build and maintain tooling to make code and configuration deployments self-serve for the development team
- Collaborate with the development and operations teams to design the infrastructure required for deploying scalable and reliable applications
- Regularly review existing infrastructure for opportunities for service improvement, cost reduction, and increased security
- Collaborate with Engineering and Product Management partners to translate customer, business, and technical requirements into architectural designs and feature releases
- Ensure operational visibility into applications by adding instrumentation and creating dashboards for proactive monitoring and failure resolution
- Perform application load testing to expose bottlenecks and other areas of improvement prior to an application going live
- Participate in an on-call rotation to ensure the success of uptime-critical applications
Qualifications:
- 5+ years of experience in an Infrastructure, DevOps, Site Reliability, or other infrastructure-focused engineering role
- Experience running a production application on Kubernetes is strongly desired
- Prior experience designing infrastructure for distributed microservice applications, with an emphasis on gRPC for communication between services
- Demonstrated proficiency in at least one dynamic scripting language such as Python, Ruby, Groovy, or Bash. Experience with Java applications is a plus.
- Prior experience managing and operating Linux VMs on cloud computing platforms such as GCP or AWS
- Deep knowledge of the full network stack and the ability to maintain an organized and secure network across multiple clusters, projects, and offices
- Ability to operate Web Application Firewalls and managed edge services like Cloudflare or Fastly
- Experience with monitoring systems such as Prometheus, InfluxDB, Sensu, or Graphite
- Ability to build NOC-style dashboards using tools like Grafana
- Working knowledge of relational DBMS such as Postgres and MySQL. Experience with distributed implementations of relational databases such as Spanner, CockroachDB or Aurora is a plus
- Experience with CI/CD orchestration pipelines such as Jenkins, GitHub Actions, or CircleCI, as well as familiarity with deployment strategies like blue/green deployments and canary releases
- Experience with load testing and frameworks such as Locust
- Understanding of service meshes like Istio, Linkerd, or Consul
- Experience scaling highly available Elasticsearch clusters