Senior Data Engineer
Colombo, Sri Lanka
Job Title: Senior Data Engineer
Employment Type: Full-Time
Location: Hybrid (Colombo, Sri Lanka)
About the role:
We are looking for a Senior Data Engineer to take ownership of an existing data pipeline and evolve it into a more robust, scalable, and maintainable system. You will begin by developing a thorough understanding of the current architecture and workflows, then restructure and improve the pipeline in a systematic and considered way.
This is a hands-on engineering role suited to someone who is comfortable navigating existing codebases, improving what's there, and applying engineering best practices to real production systems.
Key Responsibilities:
- Develop a working understanding of the existing data pipeline architecture, data flows, and business logic before driving changes.
- Redesign and extend Apache Airflow DAG structures to properly orchestrate the end-to-end pipeline, replacing manual and ad-hoc processes.
- Consolidate, harden, and productionise existing Python scripts and notebooks used for ETL processing.
- Maintain and improve data outputs across PostgreSQL and MongoDB.
- Contribute to pipeline reliability, observability, and documentation.
- Work within AWS-hosted infrastructure, ensuring pipelines are stable and appropriately monitored.
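In Airflow terms, the orchestration work above amounts to making task dependencies explicit rather than relying on manual, ad-hoc runs. A minimal standard-library sketch of that idea, using hypothetical task names (not this team's actual pipeline):

```python
# Illustrative only: the task names and dependency graph are hypothetical.
# An Airflow DAG expresses the same structure declaratively; here we use
# Python's stdlib graphlib to show the dependency-ordering concept.
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must complete before it runs,
# mirroring an extract -> transform/validate -> load flow.
dependencies = {
    "extract_raw": set(),
    "transform": {"extract_raw"},
    "validate": {"extract_raw"},
    "load_postgres": {"transform", "validate"},
    "load_mongo": {"transform", "validate"},
}

def run_pipeline():
    """Execute tasks in an order that respects the dependency graph."""
    order = list(TopologicalSorter(dependencies).static_order())
    for task in order:
        print(f"running {task}")  # placeholder for the real ETL step
    return order

order = run_pipeline()
```

In a production Airflow DAG, each of these would become a task (e.g. a `PythonOperator`), with the same dependency edges declared between them and the scheduler handling retries and monitoring.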
Required Skills:
- 3+ years of experience in data engineering or a closely related discipline.
- Strong Python development skills, including experience writing production-quality ETL scripts and working with data-focused libraries such as Pandas.
- Hands-on experience with Apache Airflow or a comparable DAG-based orchestration tool (Prefect, Dagster) — including designing and managing DAG workflows in a production environment.
- Demonstrated experience designing, building, and maintaining ETL/ELT pipelines.
- Proficiency with PostgreSQL — query optimisation, schema design, and general database management.
- Working experience with MongoDB for semi-structured data storage and retrieval.
- Familiarity with AWS services relevant to data pipelines — including but not limited to S3, EC2, Lambda, RDS, and CloudWatch.
- Ability to read, understand, and incrementally improve an inherited codebase.
Nice to Have:
- Experience with distributed computing frameworks such as Dask or Apache Spark for parallelised data processing workloads.
- Familiarity with containerisation and basic DevOps practices — Docker, CI/CD pipelines, environment and dependency management.
- Experience with pipeline testing, data quality validation, and observability tooling.
- Exposure to data modelling and warehouse design principles.