Detailed Requirements
- Experience: 3+ years of hands-on experience in operations/engineering for big data platforms or related fields.
- Hive Expertise: In-depth understanding of Hive architecture and proven experience in performance tuning for complex Hive SQL queries. Familiarity with execution engines like Tez or Spark.
- Databricks Experience: Practical, hands-on experience administering the Databricks platform, including deep knowledge of its architecture, cluster management, workspaces, and job execution.
- Cloud Proficiency: Solid experience with at least one major cloud provider (AWS, Azure, or GCP) and its core services (EC2/VMs, S3/Blob Storage, IAM, VPC, etc.).
- Problem-Solving: Excellent problem-solving skills, a strong sense of ownership, and effective communication abilities.
Scope of Work
- Platform Operations: Operate, monitor, and ensure the high availability of our big data platform (Hadoop/Spark ecosystem) and cloud infrastructure (Azure). Perform troubleshooting and performance tuning.
- Hive Cluster Management: Manage and optimize Hive data warehouses, including metadata management, Hive SQL performance tuning, resource queue configuration, access control, and user support.
- Databricks Administration: Oversee the deployment, configuration, upgrading, and daily operations of Azure Databricks workspaces. Manage cluster policies, job scheduling, cost optimization, and user permissions.
- Monitoring & Alerting: Design, implement, and maintain a comprehensive monitoring and alerting system (using tools like Prometheus/Grafana/Datadog) to ensure platform health and rapid incident response.
- User Support & Documentation: Provide technical support to data developers and analysts. Create and maintain high-quality operational documentation and runbooks.
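To make the monitoring and alerting expectation concrete, the kind of deliverable involved might look like the following Prometheus alerting rule. This is a minimal sketch under assumptions: the job name `hiveserver2` and the 5-minute threshold are illustrative, not prescribed by this posting.

```yaml
groups:
  - name: platform-health
    rules:
      # Fire when a HiveServer2 instance has been unreachable for 5 minutes.
      # `up` is Prometheus's built-in scrape-health metric; the job label
      # "hiveserver2" is a hypothetical scrape-config name.
      - alert: HiveServer2Down
        expr: up{job="hiveserver2"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "HiveServer2 instance {{ $labels.instance }} is unreachable"
```

In practice, rules like this would be paired with Grafana dashboards and an on-call routing layer (e.g. Alertmanager) to meet the rapid-incident-response goal described above.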

