Role Overview:
We are seeking a skilled Data Engineer to join our team and play a crucial role in developing and maintaining our recommendation system. As a Data Engineer, you will design and implement the data infrastructure that powers the system, ensuring a reliable flow of high-quality data to drive personalized content recommendations.
Key Responsibilities:
- Design and implement a scalable, fault-tolerant, and highly available data pipeline to capture, process, and store user engagement data in real time.
- Develop efficient data storage solutions, including a collisionless embedding table, to represent and retrieve user data and content metadata.
- Optimize data processing and transformation workflows to enable the continuous training and adaptation of the recommendation model.
- Ensure the reliability, performance, and scalability of the data infrastructure to handle the growing volume and velocity of user interactions.
- Collaborate with the machine learning engineering team to understand their data requirements and deliver the data products they need to develop and deploy the recommendation system.
- Implement robust data monitoring, alerting, and troubleshooting mechanisms to maintain the overall health and reliability of the data ecosystem.
- Continuously explore and evaluate new data technologies, tools, and techniques to enhance the efficiency and capabilities of the data infrastructure.
- Document data pipelines, processes, and best practices to maintain system transparency and enable cross-team knowledge sharing.
Required Qualifications:
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related technical field.
- 3+ years of experience designing and implementing large-scale data pipelines and data infrastructure.
- Proficiency in Python and familiarity with data processing and streaming technologies such as Apache Spark, Apache Flink, or Apache Kafka.
- Strong understanding of data modeling, data warehousing, and distributed data storage solutions (e.g., Hadoop, Hive, Cassandra).
- Experience with real-time data processing and stream processing architectures.
- Solid understanding of data engineering best practices, including data quality, data security, and data governance.
- Ability to work collaboratively in a cross-functional team environment and communicate technical concepts to stakeholders.
Desired Skills: