Senior, Cloud Engineer
3 years ago
US Remote or Canada
Magic Leap is looking for a senior engineer to focus on live site operations and incident response management.
Job Description
In this role, you will be responsible for the day-to-day operations of our production live site systems, coordinating the response to outages, and building incident management engineering systems based on industry standards and ITSM principles.
The ideal candidate is highly knowledgeable about ITSM and experienced in IT incident management engineering and process improvement, with a proven track record of resolving critical incidents affecting engineering services built on a microservice architecture.
Responsibilities
- Oversight of 24x7 Major Incident Response
- Continually improve the engineering, efficiency and effectiveness of the Incident Response program
- Develop, measure, and report process performance and functional metrics in order to identify opportunities, measure success, or validate expected outcomes
- Tightly integrate incident management tools & processes with monitoring & observability platforms, production engineering dashboards and other ITSM tools
- Define SLO & SLA metrics with engineering service owners & work with monitoring team to
- Bring continuous improvement to support and operational practices.
- Handle escalations and communicate clearly and effectively to all stakeholders including senior company leaders
Qualifications
- 5+ years of incident management in a fast-paced technology company
- Track record of managing complex incidents
- 5+ years of experience managing production systems for build & release tools and large-scale, public cloud-based microservices with 100K+ concurrent users
- Prior experience working in production engineering with regional NOC & SOC
- Prior experience instrumenting mission-critical services at a globally distributed level, using cloud hosting providers such as AWS and GCP
- Prior experience integrating event management systems such as PagerDuty and other production engineering systems
- Prior experience with CloudWatch, Stackdriver, Prometheus, Datadog, Sumo Logic
Education
- BA/BS in Computer Science or related field and equivalent experience
Additional Information
All your information will be kept confidential according to Equal Employment Opportunity guidelines.