Senior Engineer - Site Reliability

  • Sector: Monroe Hospitality
  • Contact: Jorelle Escueta
  • Client: Monroe Consulting Group
  • Location: City of Parañaque
  • Salary: Negotiable
  • Expiry Date: 16 September 2025
  • Job Ref: BBBH482501_1752805204
  • Contact Email: jorelle.escueta@monroeconsulting.com.ph

Executive search firm Monroe Consulting Group Philippines is recruiting on behalf of a large integrated resort and entertainment complex located in a major urban area in the Philippines. Our respected client is seeking a Senior Engineer - Site Reliability who will ensure the stability, efficiency, and availability of applications, services, and infrastructure by using monitoring and observability tools, while also creating and maintaining automation scripts to optimize performance across the technology stack. This position is based in Parañaque, Philippines.

Job Summary:

The Senior Engineer - Site Reliability is responsible for maintaining the health and performance of applications, services, and infrastructure through the use of monitoring and observability tools, and by developing automation scripts to support consistent and optimal system operations across all layers of the technology stack.

Main responsibilities:

  • Configure and maintain the enterprise monitoring tool to provide real-time visibility into the state of health across the technology stack
  • Design and create dashboards that provide multi-level views based on functional requirements, such as executive and tactical views
  • Create and maintain key thresholds across all monitoring elements to ensure proactive, early detection of impending incidents or problems
  • Analyze events and correlate them across all observability and monitoring tools to capture trends and behavior patterns that support proactive courses of action
  • Design, develop, and use automation tools and scripts to address repetitive actions and, where possible, create corrective courses of action to prevent or shorten prolonged outages
  • Work closely with the operations team during incident and problem management to respond quickly to issues identified by the monitoring tools
  • Regularly review and optimize infrastructure performance using logs, metrics, and traces as part of continuous improvement, adjusting thresholds and monitoring requirements as the environment changes
  • Develop and maintain a robust alerting strategy, including integration with on-call tools to ensure timely escalation and resolution of critical issues.
  • Implement and manage end-to-end event lifecycle processes to ensure accurate incident detection and efficient response.


Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or a related field; or equivalent work experience.
  • 2-5+ years of extensive experience as a systems and network administrator
  • Hands-on experience managing monitoring tools such as, but not limited to, SolarWinds and Nagios
  • Demonstrated understanding of what observability is and what it does
  • Proficient with major cloud platforms such as AWS, GCP, Azure, and Alibaba Cloud
  • Good grasp of observability platforms such as Splunk and Dynatrace
  • Experience with containerization platforms such as Docker and Kubernetes
  • Extensive experience with virtualization technologies such as VMware
  • Strong knowledge of networking using collapsed architecture or similar enterprise networking technologies
  • Knowledgeable in scripting languages such as Python, Bash, or PowerShell.
  • AWS Certified Solutions Architect, Azure Solutions Architect, or equivalent certification.
  • Certified Kubernetes Administrator (CKA)
  • Solid understanding of disaster recovery and business continuity practices.


All applications will be treated in the strictest of confidence. If you are a suitable match for this position, please send your application