Projects
⭐Latest
Machine Learning
Classification-Predictive models, Big data
Fraudulent Transaction Detection
Performed Exploratory Data Analysis (EDA) on time-series data to analyze attributes contributing to fraudulent transactions
Applied SMOTE and undersampling to handle imbalance in the target class. Reduced dimensionality using correlation analysis and transformations
Implemented Logistic Regression, Decision Tree, and Naïve Bayes models, achieving over 92% AUC with cross-validation and hyperparameter tuning
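As an illustration of the undersampling step mentioned above, here is a minimal sketch using only the Python standard library. The function name `random_undersample` and the toy transaction amounts are hypothetical and not taken from the project; the original work may have used a dedicated library instead.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class matches the smallest one."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Size of the rarest class sets the target size for all classes.
    target = min(len(members) for members in by_class.values())
    rng = random.Random(seed)

    balanced_rows, balanced_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            balanced_rows.append(row)
            balanced_labels.append(label)
    return balanced_rows, balanced_labels


# Toy data (hypothetical): 2 fraudulent (1) vs 8 legitimate (0) transactions.
amounts = [[10.0], [12.5], [900.0], [11.0], [9.5], [850.0], [13.0], [10.5], [12.0], [11.5]]
is_fraud = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
X_bal, y_bal = random_undersample(amounts, is_fraud)
```

Undersampling discards majority-class rows, so it is usually applied to the training split only, leaving the evaluation split at its natural class ratio.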
Hotel Churn Analysis
Integrated an R Shiny app with in-depth analysis of a reservations database with 110,000+ rows and 56 attributes
Applied Association Rules Mining, SVM, and Linear Regression to detect patterns after conducting EDA to understand trends
Provided a report with KPI visualizations and statistical business solutions to increase revenue by 20% and reduce cancellations
Movie-Reviews Sentiment Classification
Implemented NLP-based machine learning models using SVM, Random Forests, and Naïve Bayes to classify movie reviews
Performed feature extraction and selection using unigrams and sentiment lexicons, achieving over 93% model accuracy
Generated a report comparing all models
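The unigram and sentiment-lexicon features described above could be built along these lines. This is a sketch under assumptions: the tiny `POSITIVE` and `NEGATIVE` word sets, the `extract_features` name, and the feature key names are all hypothetical, and a real lexicon would be far larger.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only.
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"bad", "boring", "awful", "hate"}


def extract_features(review):
    """Return unigram counts plus two lexicon-based counts for one review."""
    tokens = re.findall(r"[a-z']+", review.lower())
    features = Counter(tokens)  # unigram counts
    features["__pos_count__"] = sum(t in POSITIVE for t in tokens)
    features["__neg_count__"] = sum(t in NEGATIVE for t in tokens)
    return dict(features)


feats = extract_features("Great cast, but the plot was boring. Really boring!")
```

The resulting dictionaries can be fed to any classifier that accepts sparse bag-of-words input after vectorization.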
Recommender System with Customer Sentiment
Created an Alternating Least Squares recommendation system for products using 1.8 million user ratings and text reviews
Implemented dimensionality reduction with Principal Component Analysis (PCA) to reduce modelling error by 15% and increase F-score by 0.5
Performed positive/negative sentiment analysis on textual reviews using Logistic Regression with Elastic Net regularization, achieving 85% accuracy
Data Pipelines
Data Engineering and ETL
FileInsights
A cloud-native, event-driven metadata tracking system for file operations on AWS, deployed via CDK
It captures and manages metadata for file uploads and deletions in real-time, providing full traceability, analytics, and operational insight across a file's lifecycle
This system leverages AWS S3, EventBridge, SQS, Lambda, Step Functions, and DynamoDB to process, categorize, and store file metadata with high precision and scalability
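A minimal sketch of how a Lambda function in such a pipeline might turn an S3 event delivered through EventBridge into a metadata record. The output field names (`file_key`, `operation`, and so on) and the function names are hypothetical, not FileInsights' actual schema, and the persistence step is left as a comment.

```python
from datetime import datetime, timezone

# EventBridge detail-type values for S3 mapped to hypothetical operation labels.
OPERATIONS = {"Object Created": "UPLOAD", "Object Deleted": "DELETE"}


def build_metadata_record(event):
    """Map an S3 event delivered through EventBridge to a flat metadata item."""
    detail = event["detail"]
    obj = detail["object"]
    return {
        "file_key": obj["key"],
        "bucket": detail["bucket"]["name"],
        "operation": OPERATIONS[event["detail-type"]],
        "size_bytes": obj.get("size"),  # may be absent, e.g. on delete events
        "event_time": event["time"],
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def handler(event, context):
    record = build_metadata_record(event)
    # A real handler would persist the record here (for example to DynamoDB).
    return record


# Hypothetical sample event, trimmed to the fields used above.
sample_event = {
    "detail-type": "Object Created",
    "source": "aws.s3",
    "time": "2024-01-01T00:00:00Z",
    "detail": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "reports/q1.csv", "size": 2048},
    },
}
record = handler(sample_event, None)
```

Keeping the mapping in a pure function makes it testable without any AWS resources.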
Data Warehouse for Business Intelligence
Tracked yearly sales by performing Extract, Transform, and Load (ETL) operations using an SSIS package
Populated fact and dimension tables using non-volatile and time-variant data from two different OLAP data sources in MS SQL Server
Developed interactive Power BI dashboards and provided data-driven reports to increase overall sales and revenue
Soccer Transfers Pipeline
Created an application using complex SQL queries over a generated transfer database in SQL Server with multiple key dependencies
Programmed conditions with triggers and views to transfer a player between teams when the current team's bid is higher than previous bids
Designed Power Apps for the frontend UI with conditional modeling and a backend connected to MS SQL Server
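The trigger logic described above can be sketched as follows. This example swaps in SQLite (through Python's built-in `sqlite3` module) for SQL Server so that it runs anywhere; the table, column, and trigger names are hypothetical, and the T-SQL syntax used in the project would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT, team TEXT);
CREATE TABLE bids (
    id INTEGER PRIMARY KEY,
    player_id INTEGER REFERENCES players(id),
    team TEXT,
    amount REAL
);

-- Move the player only when the new bid beats every earlier bid for that player.
CREATE TRIGGER transfer_on_higher_bid
AFTER INSERT ON bids
WHEN NEW.amount > (
    SELECT COALESCE(MAX(amount), 0) FROM bids
    WHERE player_id = NEW.player_id AND id <> NEW.id
)
BEGIN
    UPDATE players SET team = NEW.team WHERE id = NEW.player_id;
END;
""")

conn.execute("INSERT INTO players (id, name, team) VALUES (1, 'Player A', 'Team X')")
# First bid beats the empty history, so the player moves to Team Y.
conn.execute("INSERT INTO bids (player_id, team, amount) VALUES (1, 'Team Y', 50.0)")
# Second bid is lower than the previous one, so the trigger does not fire.
conn.execute("INSERT INTO bids (player_id, team, amount) VALUES (1, 'Team Z', 40.0)")
current_team = conn.execute("SELECT team FROM players WHERE id = 1").fetchone()[0]
```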
Statistics
Statistical testing, Hypothesis and Inferential
Statistical Analysis for Vaccination Rates
Performed EDA and feature engineering to remove skewness in the data. Used a chi-square test to validate the reporting rates of 700 schools
Conducted hypothesis testing using the Dickey-Fuller test, Pearson’s correlation, and ANOVA on samples of the data
Used logistic and linear regression model summaries to identify significant features contributing to 70% vaccination rates
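For reference, the chi-square test of independence mentioned above reduces to comparing observed counts with the counts expected under independence. The sketch below computes the Pearson statistic in plain Python; the function name and the 2x2 table of counts are hypothetical and do not come from the vaccination dataset.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column categories were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = school type, columns = reported / not reported.
observed = [[30, 20],
            [20, 30]]
stat, dof = chi_square_statistic(observed)
```

For this toy table the statistic is 4.0 with 1 degree of freedom, which exceeds the 5% critical value of 3.841, so independence would be rejected at that level.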