Detailed Requirements
- Experience: 3+ years of hands-on experience in operations/engineering for big data platforms or related fields.
- Hive Expertise: In-depth understanding of Hive architecture and proven experience in performance tuning for complex Hive SQL queries. Familiarity with execution engines like Tez or Spark.
- Databricks Experience: Practical, hands-on experience administering the Databricks platform, including deep knowledge of its architecture, cluster management, workspaces, and job execution.
- Cloud Proficiency: Solid experience with at least one major cloud provider (AWS, Azure, or GCP) and its core services (EC2/VMs, S3/Blob Storage, IAM, VPC, etc.).
- Problem-Solving: Excellent problem-solving skills, a strong sense of ownership, and effective communication abilities.
Scope of Work
- Platform Operations: Operate, monitor, and ensure the high availability of our big data platform (Hadoop/Spark ecosystem) and cloud infrastructure (Azure). Perform troubleshooting and performance tuning.
- Hive Cluster Management: Manage and optimize Hive data warehouses, including metadata management, Hive SQL performance tuning, resource queue configuration, access control, and user support.
- Databricks Administration: Oversee the deployment, configuration, upgrading, and daily operations of Azure Databricks workspaces. Manage cluster policies, job scheduling, cost optimization, and user permissions.
- Monitoring & Alerting: Design, implement, and maintain a comprehensive monitoring and alerting system (using tools like Prometheus/Grafana/Datadog) to ensure platform health and rapid incident response.
- User Support & Documentation: Provide technical support to data developers and analysts. Create and maintain high-quality operational documentation and runbooks.
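To make the monitoring and alerting expectation concrete, the kind of deliverable involved might look like the following Prometheus alerting rule. This is a minimal sketch under assumptions: the job name `hiveserver2` and the 5-minute threshold are illustrative, not prescribed by this posting.

```yaml
groups:
  - name: platform-health
    rules:
      # Fire when a HiveServer2 instance has been unreachable for 5 minutes.
      # `up` is Prometheus's built-in scrape-health metric; the job label
      # "hiveserver2" is a hypothetical scrape-config name.
      - alert: HiveServer2Down
        expr: up{job="hiveserver2"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "HiveServer2 instance {{ $labels.instance }} is unreachable"
```

In practice, rules like this would be paired with Grafana dashboards and an on-call routing layer (e.g. Alertmanager) to meet the rapid-incident-response goal described above.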

