Senior Data Engineer
Colombo, Sri Lanka
Job Title: Senior Data Engineer
Employment Type: Full-Time
Location: Hybrid (Colombo, Sri Lanka)
About the role:
We are looking for a Senior Data Engineer to take ownership of an existing data pipeline and evolve it into a more robust, scalable, and maintainable system. You will begin by developing a thorough understanding of the current architecture and workflows, then restructure and improve the pipeline in a systematic and considered way.
This is a hands-on engineering role suited to someone who is comfortable navigating existing codebases, improving what's there, and applying engineering best practices to real production systems.
Key Responsibilities:
- Develop a working understanding of the existing data pipeline architecture, data flows, and business logic before driving changes.
- Redesign and extend Apache Airflow DAG structures to properly orchestrate the end-to-end pipeline, replacing manual and ad-hoc processes.
- Consolidate, harden, and productionise existing Python scripts and notebooks used for ETL processing.
- Maintain and improve data outputs across PostgreSQL and MongoDB.
- Contribute to pipeline reliability, observability, and documentation.
- Work within AWS-hosted infrastructure, ensuring pipelines are stable and appropriately monitored.
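In Airflow terms, the orchestration work above amounts to making task dependencies explicit rather than relying on manual, ad-hoc runs. A minimal standard-library sketch of that idea, using hypothetical task names (not this team's actual pipeline):

```python
# Illustrative only: the task names and dependency graph are hypothetical.
# An Airflow DAG expresses the same structure declaratively; here we use
# Python's stdlib graphlib to show the dependency-ordering concept.
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must complete before it runs,
# mirroring an extract -> transform/validate -> load flow.
dependencies = {
    "extract_raw": set(),
    "transform": {"extract_raw"},
    "validate": {"extract_raw"},
    "load_postgres": {"transform", "validate"},
    "load_mongo": {"transform", "validate"},
}

def run_pipeline():
    """Execute tasks in an order that respects the dependency graph."""
    order = list(TopologicalSorter(dependencies).static_order())
    for task in order:
        print(f"running {task}")  # placeholder for the real ETL step
    return order

order = run_pipeline()
```

In a production Airflow DAG, each of these would become a task (e.g. a `PythonOperator`), with the same dependency edges declared between them and the scheduler handling retries and monitoring.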
Required Skills:
- 3+ years of experience in data engineering or a closely related discipline.
- Strong Python development skills, including experience writing production-quality ETL scripts and working with data-focused libraries such as Pandas.
- Hands-on experience with Apache Airflow or a comparable DAG-based orchestration tool (Prefect, Dagster) — including designing and managing DAG workflows in a production environment.
- Demonstrated experience designing, building, and maintaining ETL/ELT pipelines.
- Proficiency with PostgreSQL — query optimisation, schema design, and general database management.
- Working experience with MongoDB for semi-structured data storage and retrieval.
- Familiarity with AWS services relevant to data pipelines — including but not limited to S3, EC2, Lambda, RDS, and CloudWatch.
- Ability to read, understand, and incrementally improve an inherited codebase.
Nice to Have:
- Experience with distributed computing frameworks such as Dask or Apache Spark for parallelised data processing workloads.
- Familiarity with containerisation and basic DevOps practices — Docker, CI/CD pipelines, environment and dependency management.
- Experience with pipeline testing, data quality validation, and observability tooling.
- Exposure to data modelling and warehouse design principles.