- Posted 05 September 2025
- Location Paris
- Job type Permanent
- Reference 222382
Data Engineer
Job description
We are working with an innovative organisation applying cutting-edge AI and data science to healthcare and life sciences. The company develops advanced data-driven platforms that harness multimodal information to accelerate therapeutic discovery and improve translational research.
This is an opportunity to join a collaborative, science-led team working at the intersection of AI, data engineering, and biology.
This role will offer you:
- The opportunity to design and maintain robust, scalable data systems that power AI-driven discoveries.
- A central role in managing diverse datasets including biological, clinical, and real-world evidence data.
- Collaboration with a multidisciplinary team of scientists, engineers, and AI researchers to ensure seamless integration of complex data into machine learning workflows.
- A chance to contribute to impactful healthcare innovations by enabling reproducible, high-quality science through world-class data engineering.
Responsibilities:
- Design, implement, and optimise ETL/ELT pipelines for large-scale scientific datasets.
- Build systems to harmonise and integrate diverse data types while preserving scientific and experimental context.
- Implement automated data quality frameworks, including validation, anomaly detection, and monitoring.
- Maintain data catalogues, metadata management systems, and lineage tracking to ensure reproducibility.
- Develop APIs and data services that provide seamless access to datasets for ML training, research, and applications.
You will bring:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, Bioinformatics, or a related field.
- 5+ years’ experience in data engineering with strong skills in Python, SQL, and modern data orchestration tools.
- Experience with at least one major cloud platform (AWS, GCP, or Azure) including storage and compute services.
- Proficiency with SQL and NoSQL databases, ideally including time-series or graph databases.
How to stand out:
- Experience working with scientific or healthcare datasets, such as omics or other high-dimensional data.
- Familiarity with data workflows and pipelines used in computational science.
- Experience building data pipelines for machine learning, including feature stores and training infrastructure.
- A high degree of adaptability and motivation to work in a fast-moving environment.
- A collaborative, solution-oriented mindset with strong ownership and problem-solving skills.