Job Description
Join Doo Group – Explore a Better Future
Doo Group is a global financial services group with FinTech at its core. With 10 major business lines spanning Brokerage, Wealth Management, Property, Payment & Exchange, FinTech, Financial Education, Health Care, Consulting, Cloud, and Digital Marketing, Doo Group provides clients around the world with comprehensive products and services. Through a one-stop approach, Doo Group remains committed to helping clients achieve an ideal financial life while moving towards a better future together!
Looking for a New Challenge? Join Us as We Expand Globally!
As we continue our global business expansion, we’re on the lookout for talented individuals who are motivated to support our strategic goals and initiatives. Join a team that values innovation and growth.
DOO you have what it takes?
We are seeking a highly skilled and experienced Senior Site Reliability Engineer (SRE) to join our team. The successful candidate will be responsible for ensuring the high availability, reliability, and performance of our financial services. This role involves deep collaboration with cross-functional teams to identify and resolve system weaknesses, enhance monitoring systems, and drive automation in operations.
What you’ll be working on:
High Availability Management:
• Deeply understand the business and manage the high availability of financial services, continuously improving the business SLA.
• Identify system weaknesses through data-driven analysis (availability metrics, historical incidents, resource utilization) and implement improvement projects.
• Continuously refine and enhance the monitoring system to improve efficiency and shorten fault localization time.
System Performance and Stability:
• Ensure efficient and stable operation of IaaS and PaaS infrastructure.
• Continuously improve operations and maintenance standards and refine standard operating procedures.
• Monitor and review system architecture, process logic, system performance, and stability to drive issue resolution with project and business teams.
Incident Management:
• Respond promptly to production faults, coordinating with development, operations and maintenance, and product teams to troubleshoot and resolve issues.
• Own fault response time and mean time to recovery (MTTR).
Automation and Efficiency:
• Drive core SRE operations towards automation, platform-based tooling, and intelligent operations to improve overall management efficiency.
• Develop and deploy automated operations and maintenance tools to improve operational efficiency.
Best Practices and Documentation:
• Build up operational best practices and provide guidance on business architecture design and component selection.
• Produce and maintain technical documentation for operations and maintenance.
• Regularly share technical and management achievements with team members.
Additional Responsibilities:
• Perform other related tasks as required.
What we’re looking for:
• Bachelor's degree in Computer Science or a related field.
• 7+ years of experience in systems operations and maintenance or SRE at medium- to large-scale internet or financial companies.
• At least 3 years of experience maintaining production environments for messaging middleware, caching, K8s, and databases.
Job Highlights:
• High availability management of key financial technology business lines.
• Continuous improvement of business SLAs through incident, quality, and risk management.
• Building and refining automated operations and maintenance systems to improve operational efficiency.
Technical Skills:
• Proficient in shell scripting and skilled in one or two of the following languages: Go, Java, Python.
• Strong knowledge of computing, storage, networking, security, and computer architecture.
• Solid knowledge of and experience with IaaS and PaaS technology stacks.
• Familiarity with networking fundamentals, including TCP/UDP, HTTP, sockets, CDNs, and related technologies.
• Skilled in middleware/databases such as Nginx, LVS, Redis, Kafka, MySQL, Elasticsearch.
• Proficient in Docker/K8s container platforms and related underlying technologies and principles.
• Deep understanding of internet technology architecture, network communication protocols, application servers, load balancing, and microservices architecture.
• Extensive hands-on experience operating, maintaining, and troubleshooting services or middleware.
Additional Skills:
• Familiar with CI/CD tools such as Jenkins and GitLab, with practical experience designing and integrating CI/CD processes.
• Able to handle 24/7 fault response, with strong resilience under pressure, a service-oriented mindset, and a collaborative attitude.
• Outgoing personality with excellent cross-team communication skills, a strong sense of responsibility, and strong drive.
• Detail-oriented and thoughtful, with strong data analysis and problem-solving abilities.
Technology Stack:
• Databases: MySQL, PostgreSQL, Elasticsearch, Redis, MongoDB, etcd, OceanBase, ClickHouse
• Middleware: Nacos, Kafka, ZooKeeper, RabbitMQ, RocketMQ, APISIX, Nginx
• Containerization: K8s, Rancher
• Storage: NAS, Ceph
• Network/Load Balancing: CDN, HAProxy, frp, OpenVPN-AS, APISIX
• CI/CD and Collaboration: Confluence, Jira, GitLab, Harbor
• Languages: Go, Java, Python
Preferred Qualifications:
• Experience supporting remote projects across regions.
• Technical experience at securities or futures companies, or in blockchain.
• Experience in developing automated operation and maintenance tools.
What we offer:
• Seeking to expand your regional work experience? Work alongside industry-leading professionals from around the globe in an environment filled with opportunities for continuous learning and growth.
• We reward our best employees with quarterly employee recognition awards in USD.
• Feeling drowsy after lunch? Take advantage of our smart pantry and weekly tea breaks and lucky draws.
Life as DOOers
At Doo Group, we embrace a culture where continuous growth, collaboration, and creativity are at the heart of everything we do. As a DOOer, you'll collaborate with top professionals from around the globe, dive into exciting projects, and play a pivotal role in shaping the future of finance.
Unlock your potential with Doo Group. Apply now, step into a role where your impact is celebrated, and be part of our success story!
#DooBeyondLimit #TogetherWeDooMore #SucceedYourCareerWithDoo