Senior, Site Reliability Engineer

3 years ago

Technology Stack

US Remote or Canada

Magic Leap’s mission is to harmonize people and technology to create a better, more unified world. Magic Leap is a team of creatives and technologists building a personal, spatial computing platform that seamlessly blends the digital and physical worlds. Magic Leap’s Business Technology organization (BT) is seeking a Senior Site Reliability Engineer to design and implement core platform solutions for our SAP ERP and Commerce technology stack.

JOB DESCRIPTION

In a critical role on Magic Leap's Business Technology (BT) solutions teams, as Lead Site Reliability Engineer you will work on CI/CD initiatives critical to business application needs, with opportunities to contribute to projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, to display leadership qualities, and to be enthusiastic about taking on new problems across the full stack for business technologies as we continue to push technology forward.

RESPONSIBILITIES

Maximize system performance by instrumenting and monitoring performance
Monitor application environments, taking a holistic view of system health and responding to issues as needed
Troubleshoot and analyze system-related problems and outages
Schedule and test upgrades to core platform hardware and services
Collaborate with system architects to optimize performance and reliability
Secure compute systems through the enforcement of policies and roles
Report system operational status by gathering and prioritizing information
Automate and orchestrate workloads across multiple public cloud providers
Provide primary operational support and engineering for multiple large distributed business technology software applications, such as SAP ERP, Commerce technology, data, and robotic stacks
Develop tooling and processes to drive and improve customer experience, create playbooks, increase efficiency and reduce incidents
Design, develop, test, deploy, maintain and improve the software with a focus on modern DevOps processes and technologies
Stay up to date on best practices and cutting-edge technology developments
Other responsibilities might include developing automated CI/CD pipelines, supporting the rapid release of small features, comparing design and technology options with technical leads, and managing individual project priorities, deadlines, and deliverables
Improve reliability, quality, and time-to-market of our suite of software solutions
Measure and optimize system performance to push our capabilities forward, get ahead of customer needs, and innovate to continually improve
Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
Partner with development teams to improve services through rigorous testing and release procedures
Participate in system design consulting, platform management, and capacity planning
Create sustainable systems and services through automation and uplifts
Support internal and external customers on multiple platforms, troubleshooting customer environments to increase user satisfaction
Work with engineering and IT teams to oversee and manage code releases
Balance feature development speed and reliability with well-defined service level objectives

REPORTING RELATIONSHIPS

As Lead, Site Reliability, you will report to our Manager, Business Technology, Core Platform.

QUALIFICATIONS

Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C/C++, Ruby, or JavaScript
Experience with distributed storage technologies such as NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
Sound fundamentals in Linux-based systems including proficiency with Linux tools
A solid understanding of networking and core Internet protocols (e.g., TCP/IP, DNS, TLS, SMTP, HTTP)
Strong programming skills in a modern language (Go, Python, Node.js, etc.)
Ability to script in a shell language (Bash or POSIX Shell)
Experience with public cloud providers (AWS, Google Cloud Platform, etc.)
Experience working with container runtimes (Docker, containerd, etc.)
Experience working with container-orchestration systems (Kubernetes, ECS, etc.)
Comfort with frequent, incremental code testing and deployment
Strong grasp of automation tools (Terraform, GitLab CI, Concourse CI, etc.)
Comfort with collaboration, open communication, and reaching across functional borders
Ability to remain calm under pressure and take command of a recovery effort
8+ years of experience working in a software engineering or development role

EDUCATION

Education and/or Experience: Bachelor's degree (B.A.) from a four-year college or university in Engineering, Computer Science, or another highly technical or scientific discipline, or equivalent practical experience

ADDITIONAL INFORMATION

All your information will be kept confidential according to Equal Employment Opportunities guidelines
