Muhammad Sohaib

Data Engineer | Data Scientist
Rawalpindi, PK.

About

Driven Data Science graduate with a robust foundation in Data Engineering and Machine Learning, eager to contribute to innovative data engineering teams. Proven ability to design and implement end-to-end ETL/ELT pipelines, data warehousing solutions, and predictive models, leveraging expertise in Python, SQL, Apache Spark, and Databricks. Committed to optimizing data workflows and delivering actionable insights, with a long-term interest in advancing towards data architecture roles.

Education

COMSATS University
Islamabad, Islamabad Capital Territory, Pakistan

Bachelor of Science

Data Science

Courses

Data Science Fundamentals

Platform/Architecture for DS

Big Data Analytics

Statistical Methods in DS

Database Management System

Data Storytelling & Visual.

Data Mining

Data Structures & Algorithms

Work

Fiverr

Freelance Data Engineer/Data Scientist

Remote

Summary

Led end-to-end Data Science and Machine Learning projects for diverse clients, delivering tailored data pipelines and predictive models that solved complex business challenges and drove data-driven decision-making.

Highlights

Designed and developed comprehensive Data Science and ML pipelines for clients, encompassing data preprocessing, feature engineering, model selection, and hyperparameter tuning.

Built predictive models for classification, regression, and forecasting using scikit-learn and advanced statistical techniques to support client decision-making.

Created automated Exploratory Data Analysis (EDA) reports and interactive dashboards, providing clear, actionable insights for non-technical stakeholders.

Performed data storytelling through dynamic visualizations, KPI design, correlation analysis, and hypothesis testing to extract critical business insights for clients.

Tenx

Data Science Intern

Islamabad, Islamabad Capital Territory, Pakistan

Summary

Engaged in end-to-end data science projects, focusing on data processing, machine learning model development, and collaborative data analysis to drive real-world decision-making and generate actionable recommendations.

Highlights

Executed end-to-end data science projects, including data wrangling, feature engineering, exploratory data analysis, and Machine Learning model development.

Designed and built automated data pipelines for ingesting, transforming, and loading data from multiple sources into centralized data warehouses, enhancing data accessibility.

Developed Data Science Web Applications enabling in-depth Statistical Analytics, A/B Testing, Forecasting, and Machine Learning functionalities.

Collaborated closely with the team and line manager to analyze complex datasets and generate data-driven recommendations for critical real-world decision-making.

Systems Limited

Data Engineering Intern

Islamabad, Islamabad Capital Territory, Pakistan

Summary

Designed and implemented robust data warehousing solutions and optimized data processing workflows, gaining practical experience with large-scale data technologies to enhance data retrieval efficiency.

Highlights

Designed and implemented data warehousing solutions using SQL and PostgreSQL, focusing on schema design, indexing, and performance optimization to enhance data retrieval efficiency.

Used Python, SQL, Apache Spark, Databricks, and Hadoop to process large-scale data and orchestrate complex data pipelines, gaining hands-on experience with distributed computing.

Managed structured and semi-structured data, performing transformations, normalization, and data quality checks to ensure data integrity and readiness for analytics.

Projects

BizNexusAI: No-Code Automated BI Tool

Summary

Developed a solution for automated, domain-guided, end-to-end data pipelines for analysis and decision-making, addressing the challenges businesses face with data confidentiality and outsourcing.

DataLumea: Data/AI Analytics Platform

Summary

Created a solution for non-technical users facing barriers in performing full-cycle data analysis and visualization without programming knowledge.

Document Processing Pipeline

Summary

Designed an intelligent document processing pipeline to automatically extract structured attributes from unstructured German insurance documents.

DeepFlowSense: No-Code Distributed Big Data Processing Tool

Summary

Tackled the problem of traditional data analytics tools lacking scalability for large datasets and real-time interactivity for ML workflows.

ETL, Data Warehousing and BI Reporting

Summary

Architected and implemented a data warehouse solution to support comprehensive analytics on customer loyalty and sales data.

Bike Buyers Prediction using Data Mining Techniques

Summary

Analyzed consumer data to identify key demographic and behavioral factors influencing bike purchases and derive actionable marketing strategies.

Certificates

Data Engineering Professional Certificate

Issued By

IBM (Coursera)

Skills

Programming

Python, SQL, R, Java, Scala, C++.

Data Engineering

ETL/ELT Pipelines, Data Modeling (Star/Snowflake Schema), Data Warehousing, Data Quality & Validation, Schema Design.

Big Data & Distributed Systems

Apache Spark (PySpark), Hadoop, Apache Kafka, Databricks.

Databases

MySQL, PostgreSQL, SQL Server, MongoDB, Oracle SQL Developer.

Data Integration & Orchestration

SSIS, Prefect, Apache Airflow.

Data Visualization

Power BI, Tableau, Grafana, seaborn, Matplotlib, Plotly.

Additional Skills

BI Reporting, Data Analytics, Forecasting, Machine Learning.