
Site Reliability Engineer I - IT

4 years ago

Monitoring & Incident Management:

Improve the studio’s reliability through monitoring, rapid response, communication and coordination.
Develop and manage the application's deployment architecture; design the monitoring architecture and implement monitoring agents, dashboards, alerts, and escalations.
Routinely identifies operational problems by observing and studying system architecture, functionality, and performance results. Troubleshoots within the context of the overall studio architecture, investigates surfaced issues, and handles incidents.
Identifies operational priorities by assessing operational objectives; determining project objectives, such as efficiency, cost savings, energy conservation, operator convenience, safety, and environmental quality; and estimating relevance, time, and costs.
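To illustrate the alerting and escalation work described above, here is a minimal Python sketch of a threshold alert rule. Everything in it is hypothetical: the `ThresholdAlert` class, the metric name, and the breach counts. In practice, rules like this are usually written in the monitoring system's own rule language rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdAlert:
    """Hypothetical alert rule: fires after `consecutive` breaches in a row,
    then escalates if the metric stays above threshold for `escalate_after`
    more samples. Any sample at or below threshold resets the rule."""
    name: str
    threshold: float
    consecutive: int = 3        # breaches required before alerting
    escalate_after: int = 5     # further breaches before escalating
    _breaches: int = field(default=0, init=False)

    def observe(self, value: float) -> str:
        """Return 'ok', 'pending', 'alert', or 'escalate' for one sample."""
        if value <= self.threshold:
            self._breaches = 0
            return "ok"
        self._breaches += 1
        if self._breaches < self.consecutive:
            return "pending"
        if self._breaches < self.consecutive + self.escalate_after:
            return "alert"
        return "escalate"


# Usage: a hypothetical error-rate metric sampled six times.
rule = ThresholdAlert("api_error_rate", threshold=0.05, consecutive=2, escalate_after=2)
states = [rule.observe(s) for s in [0.01, 0.07, 0.08, 0.09, 0.10, 0.02]]
print(states)
```

Requiring several consecutive breaches before alerting is one common way to avoid paging on a single noisy sample.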

Development & Data Analysis:

Develop operational solutions by defining, studying, estimating, and screening alternative solutions; calculating economics; and determining the impact on the overall system.
Create new tools to facilitate automated monitoring of the studio’s operational environment.
Anticipates operational problems by studying operating targets, modes of operation, and unit limitations, and by monitoring unit performance.
Improves operational quality by studying, evaluating, and recommending process re-architecting; implementing changes; and contributing information and opinions to unit design and modification teams.
Provides operational management information by collecting, analyzing, and summarizing operating and engineering data and trends.
Updates job knowledge by participating in educational opportunities; reading professional publications; maintaining personal networks; participating in professional organizations.
Accomplishes the engineering and organizational mission by completing related tasks as needed.

Mastery of Linux systems and network administration

Cloud Management

Database Fundamentals

Administer and maintain MySQL and other open-source databases
Write and perform basic queries to evaluate database stability, integrity and performance
Large-scale (big data) management
Administer and maintain Aurora infrastructure
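Here is a minimal sketch of the kind of basic stability, integrity, and performance queries listed above. It uses Python's built-in `sqlite3` as a stand-in so it runs anywhere. Against MySQL or Aurora the orphan-row SQL would be similar, but `PRAGMA integrity_check` is SQLite-specific (MySQL's rough analogue is `CHECK TABLE`). The `users` and `orders` tables are hypothetical.

```python
import sqlite3
import time

# In-memory SQLite stands in for MySQL/Aurora; tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users  VALUES (1, 'ana'), (2, 'bo');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);  -- user 99 does not exist
""")

# Integrity: orders whose user_id points at no existing user (orphaned rows).
orphans = conn.execute("""
    SELECT o.id, o.user_id
    FROM orders AS o
    LEFT JOIN users AS u ON u.id = o.user_id
    WHERE u.id IS NULL
""").fetchall()

# Storage-level consistency check (SQLite-specific; returns 'ok' if healthy).
status = conn.execute("PRAGMA integrity_check").fetchone()[0]

# Performance: time a representative query.
start = time.perf_counter()
conn.execute("SELECT COUNT(*) FROM orders").fetchone()
elapsed_ms = (time.perf_counter() - start) * 1000

print(orphans, status, f"{elapsed_ms:.3f} ms")
```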

Monitoring Systems