Senior, Site Reliability Engineer
3 years ago
Technology Stack
US Remote or Canada
Magic Leap’s mission is to harmonize people and technology to create a better, more unified world. Magic Leap is a team of creatives and technologists building a personal, spatial computing platform that seamlessly blends the digital and physical worlds. Magic Leap’s Business Technology organization (BT) is seeking a Senior Site Reliability Engineer to design and implement core platform solutions for our SAP ERP and Commerce technology stack.
JOB DESCRIPTION
In this critical role on Magic Leap's Business Technology (BT) solutions teams, you will work as a Lead Site Reliability Engineer on CI/CD initiatives critical to business application needs, with opportunities to grow into new projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities, and be enthusiastic about taking on new problems across the full stack of business technologies as we continue to push technology forward.
RESPONSIBILITIES
- Maximize system performance through instrumentation and performance monitoring
- Monitor application environments taking a holistic view of system health and responding to issues as needed
- Troubleshoot and analyze system-related problems and outages
- Schedule and test upgrades to core platform hardware and services
- Collaborate with system architects to optimize performance and reliability
- Secure compute systems through the enforcement of policies and roles
- Report system operational status by gathering and prioritizing information
- Automate and orchestrate workloads across multiple public cloud providers
- Provide primary operational support and engineering for multiple large, distributed business technology software applications, such as SAP ERP, Commerce technology, data, and robotic stacks
- Develop tooling and processes to drive and improve customer experience, create playbooks, increase efficiency and reduce incidents
- Design, develop, test, deploy, maintain and improve the software with a focus on modern DevOps processes and technologies
- Stay up to date on best practices and cutting-edge technology developments
- Other responsibilities might include developing automated CI/CD pipelines, supporting the rapid release of small features, comparing design and technology options with technical leads, and managing individual project priorities, deadlines, and deliverables
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Measure and optimize system performance to push our capabilities forward, get ahead of customer needs, and continually innovate
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplifts
- Support internal and external customers on multiple platforms, troubleshooting customer environments to increase user satisfaction
- Work with engineering and IT teams to oversee and manage code releases
- Balance feature development speed and reliability with well-defined service level objectives
REPORTING RELATIONSHIPS
As Lead, Site Reliability, you will report to our Manager, Business Technology, Core Platform.
QUALIFICATIONS
- Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C/C++, Ruby, or JavaScript
- Experience with distributed storage technologies such as NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- Sound fundamentals in Linux-based systems including proficiency with Linux tools
- A solid understanding of networking and core Internet protocols (e.g., TCP/IP, DNS, TLS, SMTP, HTTP)
- Strong programming skills in a modern language (Go, Python, Node.js, etc.)
- Ability to script in a shell language (Bash or POSIX Shell)
- Experience with public cloud providers (AWS, Google Cloud Platform, etc.)
- Experience working with container runtimes (Docker, containerd, etc.)
- Experience working with container-orchestration systems (Kubernetes, ECS, etc.)
- Comfort with frequent, incremental code testing and deployment.
- Strong grasp of automation tools (Terraform, GitLab CI, Concourse CI, etc.)
- Comfort with collaboration, open communication and reaching across functional borders.
- Ability to remain calm under pressure and take command of a recovery effort
- 8+ years of experience working in a software engineering or development role.
EDUCATION
Education and/or Experience: Bachelor's degree (B.A.) from a four-year college or university in Engineering, Computer Science, or another highly technical or scientific discipline, or equivalent practical experience
ADDITIONAL INFORMATION
All your information will be kept confidential according to Equal Employment Opportunity (EEO) guidelines.
#LI-Remote