Site Reliability Engineer (SRE)
4 years ago
Monitoring & Incident Management:
- Improve the studio’s reliability through monitoring, rapid response, communication and coordination.
- Develop and manage the deployment architecture for the application, develop the monitoring architecture and implement monitoring agents, dashboards, critical issues and alerts.
- Routinely identifies problems by observing and studying system architecture, functionality and performance results; fixes procedures with the overall studio architect, investigates surfaced issues, and handles incidents.
- Identifies operational priorities by assessing operational objectives; determining project objectives such as efficiency, cost savings, energy conservation, operator convenience, safety and environmental quality; and estimating relevance, time, and costs.
Development & Data Analysis:
- Develops operational solutions by defining, studying, estimating, and screening alternative solutions; calculating economics; determining impact on total system.
- Builds new tools to facilitate automated monitoring of the operational environment.
- Anticipates operational problems by studying operating targets, modes of operation, unit limitations; monitoring unit performance.
- Improves operational quality results by studying, evaluating, and recommending process re-crafting; implementing changes; contributing information and opinion to unit design and modification teams.
- Provides operational management information by collecting, analyzing, and summarizing operating and engineering data and trends.
- Updates job knowledge by participating in educational opportunities; reading professional publications; maintaining personal networks; participating in professional organizations.
- Accomplishes engineering and organization mission by completing related results as needed.
Skills and Qualifications:
- Deep understanding of Linux and network administration
- Solid grasp of systems engineering and troubleshooting skills
- Scripting (Bash & PHP)
- Strong TCP/IP understanding and ability to produce detailed documentation
- Write new technical documentation and maintain existing documentation
- Ability to administer networking firewalls, routers, and switches
- S3 Maintenance, Apache maintenance, Load Balancer Management
- Puppet Management
- Cloud Management
- AWS Expertise (VPC, RDS, Route53 Integration (DNS))
Database fundamentals:
- Administer MySQL and other open-source databases
- Write and perform basic queries to evaluate database stability, integrity and performance
- Large/Big Data Management
- Administer and maintain Aurora infrastructure
Monitoring Systems:
- System Level (Nagios, Munin, Check_MK)
- Writing checks & scripts
- Log/Application Level (Splunk, Elasticsearch, Apache)
- Ability to diagnose infrastructure as a whole!
Extra Credit to have:
- Java
- C++
- ElastiCache
- Vertica
What we offer you:
- Work in a studio that has complete P&L ownership of games
- Competitive salary, discretionary annual bonus scheme and Zynga RSUs
- Full medical, accident as well as life insurance benefits
- Catered breakfast, lunch and evening snacks
- Child care facilities for women employees and discounted facilities for male employees
- Well-stocked pantry
- Generous paid maternity/paternity leave
- Employee Assistance Programs
- Active Employee Resource Groups - Women at Zynga
- Frequent employee events
- Additional leave options for most employees
- Flexible working hours on many teams
- Casual dress every single day
- Work with cool people and impact millions of daily players!