Job description

We are working with an innovative organisation applying cutting-edge AI and data science to healthcare and life sciences. The company develops advanced data-driven platforms that harness multimodal information to accelerate therapeutic discovery and improve translational research.

This is an opportunity to join a collaborative, science-led team working at the intersection of AI, data engineering, and biology.

This role will offer you:

The opportunity to design and maintain robust, scalable data systems that power AI-driven discoveries.
A central role in managing diverse datasets including biological, clinical, and real-world evidence data.
Collaboration with a multidisciplinary team of scientists, engineers, and AI researchers to ensure seamless integration of complex data into machine learning workflows.
A chance to contribute to impactful healthcare innovations by enabling reproducible, high-quality science through world-class data engineering.

Responsibilities:

Design, implement, and optimise ETL/ELT pipelines for large-scale scientific datasets.
Build systems to harmonise and integrate diverse data types while preserving scientific and experimental context.
Implement automated data quality frameworks, including validation, anomaly detection, and monitoring.
Maintain data catalogues, metadata management systems, and lineage tracking to ensure reproducibility.
Develop APIs and data services that provide seamless access to datasets for ML training, research, and applications.

You will bring:

Bachelor’s or Master’s degree in Computer Science, Data Engineering, Bioinformatics, or a related field.
5+ years’ experience in data engineering with strong skills in Python, SQL, and modern data orchestration tools.
Experience with at least one major cloud platform (AWS, GCP, or Azure) including storage and compute services.
Proficiency with relational (SQL) and NoSQL databases, ideally including time-series or graph databases.

How to stand out:

Experience working with scientific or healthcare datasets, such as omics or other high-dimensional data.
Familiarity with data workflows and pipelines used in computational science.
Experience building data pipelines for machine learning, including feature stores and training infrastructure.
A high degree of adaptability and motivation to work in a fast-moving environment.
A collaborative, solution-oriented mindset with strong ownership and problem-solving skills.

Data Engineer

Consultant: Natasha Cole, Business Manager - KAM