Cloud Deployment Engineer (CDE)

NOC Engineer - Linux & Application Support

Executive recruitment company Monroe Consulting Group's Technology Division is recruiting on behalf of a dynamic technology organization focused on high-performance infrastructure and systems.

Job Summary:

We are seeking a skilled NOC Engineer with a strong focus on Linux system administration and application support. This role involves troubleshooting a range of issues, including database performance, network connectivity, and deployment failures. The ideal candidate will have hands-on experience with compute platforms such as Kubernetes and virtual machines, along with a solid understanding of various storage solutions. We are looking for high-performing engineers who are curious and capable of solving real-world problems.

Job Responsibilities:

  • System Monitoring & Maintenance - Monitor and maintain system performance to ensure the stability and reliability of applications and infrastructure across the environment.

  • Technical Troubleshooting - Troubleshoot and resolve issues related to database performance, network connectivity, and deployment failures, including diagnosing problems at the underlying platform level (e.g., Kubernetes, virtual machines).

  • SLA Management - Ensure that issues are resolved within the stipulated Service Level Agreements (SLAs), maintaining high standards of service delivery and customer satisfaction.

  • Performance Optimization - Identify and address performance bottlenecks in applications and infrastructure to ensure optimal system performance.

  • Root Cause Analysis - Conduct root cause analysis for recurring incidents to develop long-term solutions and prevent future occurrences.

  • Proactive Monitoring - Improve monitoring solutions to proactively identify and mitigate issues before they impact services and end users.

  • Deployment Support - Assist in the deployment and configuration of new applications and services, ensuring adherence to best practices and security standards.

  • Automation Development - Develop and maintain scripts for automation of routine tasks and monitoring processes to improve operational efficiency.

  • Incident Response - Participate in on-call rotations and respond to critical incidents as they arise, providing timely resolution and communication.

  • System Analysis - Analyze system logs and metrics to identify trends and potential areas for improvement in system performance and reliability.

  • Capacity Planning - Assist in capacity planning and performance tuning to ensure optimal resource utilization and scalability.

Key Requirements:

  • Linux Administration - Strong expertise in Linux system administration, with a deep understanding of system operations and troubleshooting.

  • Application Support Experience - Proven experience in application support, troubleshooting issues with a focus on performance and connectivity.

  • Scripting Skills - Experience in Bash/Shell scripting or automation for system administration tasks to streamline operations.

  • Database Knowledge - Solid understanding of database management and performance tuning to optimize application performance.

  • Platform Experience - Hands-on experience with Kubernetes and virtual machines in production environments.

  • Technical Problem-Solving - Ability to diagnose and resolve complex technical issues across compute, storage, network, and database components.

  • Analytical Mindset - Strong analytical skills and intellectual curiosity; able to question existing processes and understand their implications.

  • Self-Motivated Learning - Self-motivated learner who can operate autonomously with minimal guidance and continuously develop technical skills.

  • Problem-Solving Abilities - Excellent problem-solving abilities and a proactive approach to identifying and addressing challenges before they escalate.

  • Shift Flexibility - Open to a rotational shift schedule across different time slots, with reasonable schedules shared in advance.

  • Language Skills - The ability to communicate effectively in Mandarin would be an added advantage for stakeholder engagement.

Preferred Skills:

  • Monitoring Tools - Familiarity with monitoring tools such as Prometheus, Grafana, Nagios, or similar, and with performance optimization techniques.

  • Networking Knowledge - Knowledge of networking concepts and troubleshooting methodologies including TCP/IP, DNS, load balancing, and firewalls.

  • Cloud Platforms - Hands-on knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud) and their services for scalable infrastructure.

  • DevOps Practices - Familiarity with DevOps practices and frameworks, including CI/CD, infrastructure as code, and containerization technologies.

  • Big Data Technologies - Familiarity with the Big Data lifecycle (management, ingestion, processing, and visualization) and the corresponding technologies (e.g., HDFS, YARN, Kafka, Spark, Flink, Hive, ELK stack).