Data Engineer

Job description

We are working with an innovative organisation applying cutting-edge AI and data science to healthcare and life sciences. The company develops advanced data-driven platforms that harness multimodal information to accelerate therapeutic discovery and improve translational research.

This is an opportunity to join a collaborative, science-led team working at the intersection of AI, data engineering, and biology.

This role will offer you:

  • The opportunity to design and maintain robust, scalable data systems that power AI-driven discoveries.
  • A central role in managing diverse datasets including biological, clinical, and real-world evidence data.
  • Collaboration with a multidisciplinary team of scientists, engineers, and AI researchers to ensure seamless integration of complex data into machine learning workflows.
  • A chance to contribute to impactful healthcare innovations by enabling reproducible, high-quality science through world-class data engineering.

Responsibilities:

  • Design, implement, and optimise ETL/ELT pipelines for large-scale scientific datasets.
  • Build systems to harmonise and integrate diverse data types while preserving scientific and experimental context.
  • Implement automated data quality frameworks, including validation, anomaly detection, and monitoring.
  • Maintain data catalogues, metadata management systems, and lineage tracking to ensure reproducibility.
  • Develop APIs and data services that provide seamless access to datasets for ML training, research, and applications.

You will bring:

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, Bioinformatics, or a related field.
  • 5+ years’ experience in data engineering, with strong skills in Python, SQL, and modern data orchestration tools.
  • Experience with at least one major cloud platform (AWS, GCP, or Azure), including its storage and compute services.
  • Proficiency with SQL and NoSQL databases, ideally including time-series or graph databases.

How to stand out:

  • Experience working with scientific or healthcare datasets, such as omics or other high-dimensional data.
  • Familiarity with data workflows and pipelines used in computational science.
  • Experience building data pipelines for machine learning, including feature stores and training infrastructure.
  • A high degree of adaptability and motivation to work in a fast-moving environment.
  • A collaborative, solution-oriented mindset with strong ownership and problem-solving skills.