Staff Site Reliability Engineer, Big Data
Palo Alto, CA
9 months ago
Build the future of mobile games with MZ!
As a global leader in mobile gaming, we’re dedicated to developing games the world can’t wait to experience. Games like Final Fantasy XV: A New Empire, Mobile Strike, and Game of War: Fire Age.
We build massive mobile games that break down linguistic and geographic barriers by uniting an unprecedented number of global players in one gaming world. Our team pushes the boundaries of innovation in a player-driven ecosystem.
As a studio, we are masters of our own destiny, untethered by the traditional publisher model. Every update and feature creates amazing experiences for millions of players!
Machine Zone is seeking an experienced Site Reliability Engineer to join our engineering team and own Big Data within the company. You should have experience with large-scale big data environments and a knack for problem solving and optimization. You will lead a team of highly skilled engineers who push the limits of scalability through our specialized single-shard technology and cross-functional collaboration with all engineering groups at Machine Zone. You should be well-versed in and hands-on with different Big Data frameworks, and you will be responsible for architecture and design. You will be viewed as an expert in your field and should have experience leading or mentoring a team. As part of this role you will help deliver high-quality online mobile MMOs, and your work will be seen by tens of millions of global players in the future!
What you'll be doing
- Hadoop / HBase Administration - maintaining, developing and implementing policies and procedures for ensuring the security and integrity of the clusters.
- Administer, manage and scale multiple Storm, Kafka and Druid clusters
- Monitoring and resolving performance and capacity issues.
- Assist Engineering teams with troubleshooting and fine-tuning Spark and MapReduce jobs.
- Automation is in everything we do: creating scripts and programs that automate daily tasks.
- Provisioning of new servers for existing clusters and making sure they are monitored accordingly.
- Investigating new versions of Hadoop and other data stores, as needed
- Work closely with the Engineering teams in ensuring good practices are followed.
- Investigate and benchmark other Big Data solutions
- Implement graphs using Graphite or other monitoring tools
Your background and who you are
- Extensive knowledge of Hadoop and HBase and their internals
- Knowledge of Storm and Kafka cluster administration
- Expertise in large-scale, high-volume operations environments
- Ability to optimize clusters for peak performance under heavy load
- Experience in automation using Bash or Python
- Good understanding and knowledge of Linux (CentOS)
- BS in Computer Science or a related field
- 8+ years of experience in the job offered or a related field
- Knowledge of Druid cluster administration
- Knowledge of or experience with Kubernetes, Docker, and OpenShift administration
MZ is an equal opportunity employer and considers qualified applicants without regard to race, gender, sexual orientation, gender identity or expression, genetic information, national origin, age, disability, medical condition, religion, marital status or veteran status, or any other basis protected by law.