- PrimaryBid | Senior Data Engineer, Digital & IT
  April 2022 - December 2023 (1 year and 8 months)
  Industry: FinTech | Environment: AWS & GCP
  Key Responsibilities:
  • Built an end-to-end ETL pipeline from a MySQL database to a Redshift warehouse using Python.
  • Extracted data from various sources, primarily MongoDB, using Fivetran.
  • Performed data transformations inside Redshift using dbt Cloud.
  • Developed a custom ELT pipeline in Python for MySQL in AWS.
  • Built custom dbt API triggers using Python's asyncio module (see the sketch after this entry).
  • Implemented a GitLab CI/CD pipeline and deployed the dockerized ELT job to a Kubernetes cluster as a cron job.
  • Utilised dbt to build data models based on SQL queries and Jinja macros.
  • Implemented data tests using dbt test and great-expectations-style tests from dbt Hub.
  • Used Looker to create data models and dashboards surfacing key business metrics.
  • Successfully migrated data from AWS to GCP.
  • Implemented GCP data governance tooling such as Dataplex, Cloud DLP, and policy-tag taxonomies.
  Brief description of the project: PrimaryBid is a technology platform that gives individual investors fair access to public companies raising capital. As a Senior Data Engineer, I was responsible for building and maintaining the data pipelines and data models that supported key business decisions.
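  For illustration, a minimal sketch of what an asyncio-based dbt Cloud trigger like the one above might look like, using the dbt Cloud v2 API's job-run and run-status endpoints. The account ID, job ID, aiohttp client, and DBT_CLOUD_TOKEN environment variable are illustrative assumptions, not the production implementation.

  ```python
  # Sketch: trigger a dbt Cloud job run and poll until it reaches a terminal state.
  # ACCOUNT_ID, JOB_ID, and the DBT_CLOUD_TOKEN env var are hypothetical.
  import asyncio
  import os

  import aiohttp

  API = "https://cloud.getdbt.com/api/v2"
  HEADERS = {"Authorization": f"Token {os.environ['DBT_CLOUD_TOKEN']}"}
  ACCOUNT_ID, JOB_ID = 12345, 67890  # hypothetical IDs

  async def trigger_and_wait(session: aiohttp.ClientSession) -> int:
      # Kick off the job run via the dbt Cloud v2 API.
      async with session.post(
          f"{API}/accounts/{ACCOUNT_ID}/jobs/{JOB_ID}/run/",
          headers=HEADERS,
          json={"cause": "Triggered by ELT pipeline"},
      ) as resp:
          resp.raise_for_status()
          run_id = (await resp.json())["data"]["id"]

      # Poll the run status; 10 = success, 20 = error, 30 = cancelled.
      while True:
          async with session.get(
              f"{API}/accounts/{ACCOUNT_ID}/runs/{run_id}/", headers=HEADERS
          ) as resp:
              resp.raise_for_status()
              status = (await resp.json())["data"]["status"]
          if status in (10, 20, 30):
              return status
          await asyncio.sleep(30)

  async def main() -> None:
      async with aiohttp.ClientSession() as session:
          status = await trigger_and_wait(session)
          if status != 10:
              raise RuntimeError(f"dbt Cloud run finished with status {status}")

  if __name__ == "__main__":
      asyncio.run(main())
  ```

  Polling with asyncio.sleep keeps the event loop free, so several dbt jobs could be triggered and tracked concurrently with asyncio.gather.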
- Babylon | Data Engineer
  September 2019 - April 2022 (2 years and 7 months)
  Industry: Healthcare | Environment: GCP
  Key Responsibilities:
  • Built self-serve tools using Python.
  • Designed and created an ETL pipeline using BigQuery and Airflow.
  • Utilised dbt to build data models based on SQL queries and Jinja macros.
  • Created a unit-testing framework using the pytest module.
  • Developed Airflow DAGs in Python to build and orchestrate the ETL pipeline (see the sketch after this entry).
  • Utilised Kubernetes to run containerized applications triggered through Airflow DAGs.
  • Used SQL to write transformation logic in BigQuery.
  • Designed and created a CI/CD pipeline using Jenkins.
  • Containerized self-serve products using Docker.
  • Utilised Looker to create data models and expose them as self-serve products.
  • Wrote APIs to ingest data from various sources.
  • Developed a framework to handle schema evolution for breaking and non-breaking changes.
  • Extracted and loaded data from S3 Parquet files to BigQuery and Redshift using Lambda functions.
  Brief description of the project: Babylon Health is a health service provider that offers remote consultations with doctors and healthcare professionals through its mobile application. As a Data Engineer, I was part of the Babylon DAP team that built an enterprise data warehouse to support data analytics and business intelligence.
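  A minimal sketch of an Airflow DAG of the kind described above, assuming Airflow 2.x with the Google and cncf.kubernetes provider packages installed; the schedule, dataset names, SQL, and container image are hypothetical.

  ```python
  # Sketch: orchestrate a BigQuery transform followed by a containerized step
  # on Kubernetes. Dataset, SQL, and image names are hypothetical.
  from datetime import datetime

  from airflow import DAG
  from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
      KubernetesPodOperator,
  )
  from airflow.providers.google.cloud.operators.bigquery import (
      BigQueryInsertJobOperator,
  )

  with DAG(
      dag_id="daily_warehouse_load",
      start_date=datetime(2021, 1, 1),
      schedule_interval="0 2 * * *",  # nightly at 02:00
      catchup=False,
  ) as dag:
      # SQL transformation executed directly in BigQuery.
      transform = BigQueryInsertJobOperator(
          task_id="transform_events",
          configuration={
              "query": {
                  "query": (
                      "CREATE OR REPLACE TABLE analytics.daily_events AS "
                      "SELECT user_id, DATE(ts) AS day, COUNT(*) AS events "
                      "FROM raw.events GROUP BY user_id, day"
                  ),
                  "useLegacySql": False,
              }
          },
      )

      # Containerized self-serve step run on the Kubernetes cluster.
      publish = KubernetesPodOperator(
          task_id="publish_self_serve_extract",
          name="publish-self-serve-extract",
          namespace="default",
          image="gcr.io/my-project/self-serve-publisher:latest",  # hypothetical
          arguments=["--table", "analytics.daily_events"],
      )

      transform >> publish
  ```

  Keeping the heavy lifting inside BigQuery and pushing custom logic into a pod keeps the Airflow workers lightweight; the scheduler only coordinates.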
- Santander | Hadoop Data Engineer
  November 2016 - August 2019 (2 years and 9 months)
  Industry: Retail Banking | Environment: Cloudera Distribution of Hadoop (CDH)
  Key Responsibilities:
  • Created a data ingestion framework for diverse sources such as files and relational databases, including a DB2-to-HDFS data-unload framework, file ingestion (fixed-width, delimited, JSON, XML), and generic ingestion frameworks for relational databases such as DB2, Oracle, SQL Server, and PostgreSQL.
  • Worked with columnar structured data files such as ORC and Parquet.
  • Wrote new scripts for data ingestion processes in Python and shell script, ensuring efficient and reliable data ingestion.
  • Analysed performance issues and applied performance tuning techniques to make ingestion processes more efficient.
  • Supported workflows in Oozie and Control-M, including bug fixing, analysis, and resolution of job failures, to keep data ingestion running smoothly.
  • Utilised Hive, Impala, and Spark to analyse data in the Hadoop data lake after ingestion, performing transformations with PySpark to prepare data for outbound Machine Learning applications and other analytics engines (see the sketch after this entry).
  • Set up real-time data ingestion jobs using Flume and the Kafka streaming engine to enable real-time data processing and analysis.
  • Collaborated with cross-functional teams to understand data requirements, data quality, and data integration needs, ensuring timely and accurate ingestion for downstream processing and analytics.
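  A minimal sketch of a PySpark transformation job like those described above: read ingested data from the HDFS data lake, aggregate it, and write an outbound extract. Paths, column names, and filter logic are hypothetical.

  ```python
  # Sketch: prepare an outbound ML feature extract from data-lake files.
  # HDFS paths, columns, and table names are hypothetical.
  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = (
      SparkSession.builder
      .appName("outbound_ml_extract")
      .enableHiveSupport()  # allows reading Hive tables populated by ingestion
      .getOrCreate()
  )

  # Read the columnar files landed by the ingestion framework.
  accounts = spark.read.parquet("hdfs:///data/lake/accounts/")  # hypothetical
  txns = spark.read.orc("hdfs:///data/lake/transactions/")      # hypothetical

  # Aggregate transactions per account for a downstream ML feature set.
  features = (
      txns.filter(F.col("status") == "POSTED")
      .groupBy("account_id")
      .agg(
          F.count("*").alias("txn_count"),
          F.sum("amount").alias("total_amount"),
      )
      .join(accounts.select("account_id", "segment"), "account_id")
  )

  # Write the outbound extract back to HDFS for the analytics engines.
  features.write.mode("overwrite").parquet("hdfs:///data/outbound/ml_features/")
  ```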
- Rajiv Gandhi Prodyogiki Vishwavidyalaya
  Bachelor of Engineering, Electronics & Telecommunication (2011)