Cloud Deployment Engineer (CDE) | Monroe Consulting Group

NOC Engineer - Linux & Application Support

Executive recruitment company Monroe Consulting Group's Technology Division is recruiting on behalf of a dynamic technology organization focused on high-performance infrastructure and systems.

Job Summary:

We are seeking a skilled NOC Engineer with a strong focus on Linux system administration and application support. This role involves troubleshooting a range of issues, including database performance, network connectivity, and deployment failures. The ideal candidate will have hands-on experience with compute platforms such as Kubernetes and virtual machines, along with a solid understanding of various storage solutions. We are looking for high-performing engineers who are curious and capable of solving real-world problems.

Job Responsibilities:

System Monitoring & Maintenance - Monitor and maintain system performance to ensure the stability and reliability of applications and infrastructure across the environment.
Technical Troubleshooting - Troubleshoot and resolve issues related to database performance, network connectivity, and deployment failures, including diagnosing problems at the underlying platform level (e.g., Kubernetes, virtual machines).
SLA Management - Ensure that issues are resolved within the stipulated Service Level Agreements (SLAs), maintaining high standards of service delivery and customer satisfaction.
Performance Optimization - Identify and address performance bottlenecks in applications and infrastructure to ensure optimal system performance.
Root Cause Analysis - Conduct root cause analysis for recurring incidents to develop long-term solutions and prevent future occurrences.
Proactive Monitoring - Improve monitoring solutions to proactively identify and mitigate issues before they impact services and end users.
Deployment Support - Assist in the deployment and configuration of new applications and services, ensuring adherence to best practices and security standards.
Automation Development - Develop and maintain scripts for automation of routine tasks and monitoring processes to improve operational efficiency.
Incident Response - Participate in on-call rotations and respond to critical incidents as they arise, providing timely resolution and communication.
System Analysis - Analyze system logs and metrics to identify trends and potential areas for improvement in system performance and reliability.
Capacity Planning - Assist in capacity planning and performance tuning to ensure optimal resource utilization and scalability.

Key Requirements:

Linux Administration - Strong expertise in Linux system administration with a deep understanding of system operations and troubleshooting.
Application Support Experience - Proven experience in application support, troubleshooting issues with a focus on performance and connectivity.
Scripting Skills - Experience in Bash/Shell scripting or automation for system administration tasks to streamline operations.
Database Knowledge - Solid understanding of database management and performance tuning to optimize application performance.
Platform Experience - Hands-on experience with Kubernetes and virtual machines in production environments.
Technical Problem-Solving - Ability to diagnose and resolve complex technical issues across compute, storage, network, and database components.
Analytical Mindset - Strong analytical skills and intellectual curiosity; able to question existing processes and understand their implications.
Self-Motivated Learning - Self-motivated learner who can operate autonomously with minimal guidance and continuously develop technical skills.
Problem-Solving Abilities - Excellent problem-solving abilities and a proactive approach to identifying and addressing challenges before they escalate.
Shift Flexibility - Open to a rotational shift schedule across different time slots, with reasonable schedules shared in advance.
Language Skills - The ability to communicate effectively in Mandarin would be an added advantage for stakeholder engagement.

Preferred Skills:

Monitoring Tools - Familiarity with monitoring tools such as Prometheus, Grafana, Nagios, or similar, and with performance optimization techniques.
Networking Knowledge - Knowledge of networking concepts and troubleshooting methodologies including TCP/IP, DNS, load balancing, and firewalls.
Cloud Platforms - Hands-on knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud) and their services for scalable infrastructure.
DevOps Practices - Familiarity with DevOps practices and frameworks, including CI/CD, infrastructure as code, and containerization technologies.
Big Data Technologies - Familiarity with the Big Data lifecycle (management, ingestion, processing, and visualization) and the corresponding technologies (e.g., HDFS, YARN, Kafka, Spark, Flink, Hive, ELK stack).