Senior Site Reliability Engineer (Contract)
1 year ago
Why PlayStation?
PlayStation isn’t just the Best Place to Play; it’s also the Best Place to Work. Today, we’re recognized as a global leader in entertainment, producing the PlayStation family of products and services, including PlayStation®5, PlayStation®4, PlayStation®VR, PlayStation®Plus, acclaimed PlayStation software titles from PlayStation Studios, and more.
PlayStation also strives to create an inclusive environment that empowers employees and embraces diversity. We welcome and encourage everyone who has a passion and curiosity for innovation, technology, and play to explore our open positions and join our growing global team.
The PlayStation brand falls under Sony Interactive Entertainment, a wholly-owned subsidiary of Sony Corporation.
Senior Site Reliability Engineer
Los Angeles, CA
This position is a contract role with SIE.
As a member of the operations SRE team within the platform technology group, you will carry the responsibility of keeping key user experiences on the platform available, resilient, and impactful, while continually enabling our service teams to deliver new and exciting products and technical features. Our team strives to iteratively learn, improve and automate our processes every single day, which continually sets the standard for operational excellence within our organization. You will be empowered to drive and lead technical initiatives, helping identify and proactively drive improvements in both process and technology supporting millions of users.
Responsibilities:
- Application operations and production support of internal and public-facing services within an AWS cloud environment, ensuring availability, resiliency, scalability, and performance.
- Provision, automate and ensure the production readiness of all new services and features introduced.
- Identify areas for operational process improvement and automation. Drive the hands-on development of scripts and tools to automate these processes within our environment.
- Increase observability on our platform by implementing robust monitoring and alerting patterns across our services. Develop rich, informative dashboards/reports on our services that provide valuable insight and meaningful alerting to drive down the MTTD and MTTR on platform incidents.
- Collaborate and partner with other SRE teams that specialize in areas such as data services, CI/CD, and platform hosting to inspire changes and ensure optimal end-to-end system performance and resiliency across all back-end services within PlayStation.
- Iteratively drive performance and capacity validation analysis for our services. Apply AWS patterns and technologies such as spot instances, dynamic auto-scaling and EKS to optimize resource usage and AWS spend.
- Conduct, document and present root cause analysis documents to share incident insights and findings with our broader engineering organization.
- Provide rotational on-call support, where you’ll detect, respond to, triage, and resolve production incidents.
Required Skills:
- More than 4 years of experience with cloud technologies, preferably AWS
- A security-minded approach, with security applied at various levels of the application and infrastructure lifecycle
- Building and deploying Infrastructure as Code: CloudFormation/Terraform
- Hands-on working experience with containerization technology like Docker
- Expert-level knowledge of, and hands-on implementation experience with, container orchestration platforms such as Kubernetes, specifically EKS
- Solid experience with large-scale microservice architectures and with migrating monoliths to microservices
- Extensive working experience with serverless technology such as AWS Lambda
- AWS systems and network protocols (e.g., ALB, Route 53, API Gateway, TCP/IP, HTTP/HTTPS, DNS)
- Building continuous integration and continuous delivery (CI/CD) pipelines in Jenkins, Spinnaker, GitHub Actions, GitLab, or similar
- Proficiency in at least one scripting language, such as Go, Python, or shell
- Strong interest in using configuration management tools such as Ansible, Puppet, Chef, or Salt
- Strong understanding of Linux, the kernel, and networking protocols
- Understanding of networking basics and cloud networking concepts
- Experience with on-call rotations and supporting production environments
- Operating and supporting large scale and/or critical customer-facing production services or applications
Nice to have:
- Experience with hosting and CDN technologies like Akamai and Cloudflare
- Monitoring and Alerting solutions including Datadog and Prometheus
- Logging and log aggregation solutions like Splunk, Elasticsearch, and AWS CloudWatch Logs
- Tracing & debugging
- Managed databases such as RDS with MySQL and PostgreSQL
- Certifications in Linux, AWS, Docker, Kubernetes
Experience:
- BS or MS degree in Computer Science, Software Engineering, or related technical area
- 7+ years professional experience
- 4+ years AWS Cloud experience
- 4+ years operating and supporting services in production environments at scale
Equal Opportunity Statement:
Sony is an Equal Opportunity Employer. All persons will receive consideration for employment without regard to gender (including gender identity, gender expression and gender reassignment), race (including colour, nationality, ethnic or national origin), religion or belief, marital or civil partnership status, disability, age, sexual orientation, pregnancy or maternity, trade union membership or membership in any other legally protected category.
We strive to create an inclusive environment, empower employees and embrace diversity. We encourage everyone to respond.
PlayStation is a Fair Chance employer and qualified applicants with arrest and conviction records will be considered for employment.