Responsibilities:
Manage daily operations of Huawei Cloud resources, including system deployment, monitoring, troubleshooting, and performance optimization.
Design and implement efficient cluster resource management and scheduling strategies to meet diverse business needs.
Monitor system status and respond promptly to system failures, resolving them to ensure system stability and availability.
Develop and maintain automated operation scripts to improve operational efficiency.
Collaborate with R&D teams on the deployment and optimization of the product.
Prepare technical documentation, including operation manuals, troubleshooting procedures, and system optimization reports.
Requirements:
Bachelor’s degree or higher in Computer Science, Information Technology, or a related field.
1-3 years of working experience in cloud platform operations or a related field.
Proficiency in Linux operating systems with extensive command-line experience.
Familiarity with at least one scripting or programming language (e.g., Shell, Python, or Go) and the ability to write automated operation scripts.
Experience with mainstream monitoring tools (e.g., Prometheus, Grafana), including configuring alerts.
Strong system performance analysis skills, with the ability to independently troubleshoot and optimize systems.
Excellent teamwork and communication skills, with the ability to collaborate across departments to resolve issues.
Familiarity with containerization technologies such as Docker and Kubernetes.
Knowledge of CI/CD processes and proficiency with GitLab, Jenkins, Docker, Harbor, and related components.
Ability to set up a continuous delivery environment from scratch.
Experience in deploying and configuring web servers such as NGINX and Tomcat to meet diverse business requirements.
Understanding of the HTTP protocol, proficiency in configuring HTTPS certificates, and familiarity with website filing procedures.
Experience using caching technologies to accelerate web access and configuring a CDN (e.g., Tencent Cloud CDN) for global content distribution and acceleration.
Knowledge of web application security protection, including the ability to identify and mitigate common security threats.
Preference will be given to local candidates in Hong Kong with proficiency in Cantonese, Mandarin, and English.
Bonus Points: Experience supporting pre-training, fine-tuning, and evaluation of large language models.
Note: This position may require on-call duty (standby).













