Site Reliability Engineer
Aristocrat
Gdansk 80-241, Poland
21 days ago

Summary

Site Reliability Engineers at Product Madness live and breathe production, looking for opportunities to improve reliability through observability and engagement with the technical groups. We are looking for an SRE to join our Infrastructure Group. The ideal candidate will take part in:
- creating the next generation of cloud infrastructure,
- observability as a service,
- creating CI/CD pipelines,
- maintaining SRE/DevOps practices and culture inside Product Madness,
- keeping all user-facing services and other production systems running smoothly,
- applying sound engineering principles, operational discipline, and mature automation to the platform,
- using cutting-edge automation and modern technologies.

The key to the SRE's success is engagement across teams, contributing to the development and operation of games and services that meet reliability targets.

Product Madness is growing fast, which means that as an SRE you will have to balance speed with production while focusing on crucial reliability metrics and processes.

The SRE will become a primary point of responsibility for the overall health, performance, and capacity of customer-facing services in cloud infrastructure, maintain back-end services in a cloud environment, and assist in the roll-out and deployment of new product features.
The SRE will play a key role in improving our engineering best practices around observability.

What you'll do
- Debug production issues across services and levels of the stack.
- Design, build and maintain multiple environments on GCP using an infrastructure-as-code approach.
- Deploy new product features.
- Monitor the containerised environment and services and manage them with leading orchestration frameworks (Kubernetes).
- Develop tools to improve our ability to rapidly recover and effectively monitor custom applications in a large-scale UNIX environment.
- Design, manage and monitor Product Madness's auto-scaling mechanism to help us serve millions of customers worldwide in a modern and scalable way.
- Support production environments: troubleshooting and root cause analysis.
- Participate in a 24x7 rotation for second- and third-tier escalations.
- Be on an on-call rotation to respond to incidents that impact availability, and provide support for Cloud Operation Engineers.
- Interface and work closely with various R&D groups (Architects, Principal Engineers, Developers, and Product Managers).

What we're looking for
- Proven experience as a DevOps/SRE/Infra Engineer
- Proven experience with troubleshooting in Unix/Linux
- Expertise in Java, Python, Ruby, Bash or experience in another programming language
- Experience and knowledge of CI/CD design and practice
- Public cloud experience, preferably GCP, but AWS and Azure are good too!
- Experience with Cloud Architecture Design principles and Cloud Architect certification
- Experience creating infrastructure-as-code solutions using tools such as Terraform, Azure ARM templates, CloudFormation - a must
- Experience with CI/CD tools and methodologies such as Jenkins, ArgoCD, CircleCI, GitHub Actions, etc. - a must
- Hands-on implementation of Continuous Integration and Continuous Delivery in complex environments
- Proven experience working in a production environment - a must
- Solid experience implementing production-grade Kubernetes clusters with containerised environments and microservices (Docker, Kubernetes)

About you
- Good communication skills in English
- Naturally curious, with a boldness to pursue aspirations as a committed lifelong learner
- You have great interpersonal skills and are able to communicate effectively with your team members and other teams across the business
- You have an eye for detail and can apply logical thinking when managing tasks
- Good organisational and prioritising skills
- You are flexible, team-oriented and willing to work in a very fast-paced environment
- Allergic to manual, repetitive tasks, with a desire to remove toil
- Obsessed with systems performance and values simplicity over complexity