Projects

⭐Latest

Machine Learning

Classification, Predictive Models, Big Data

Fraudulent Transaction Detection
  • Performed Exploratory Data Analysis (EDA) on time-series data to analyze attributes contributing to fraudulent transactions

  • Applied SMOTE and undersampling to handle class imbalance in the target. Reduced dimensionality using correlation analysis and feature transformations

  • Implemented Logistic Regression, Decision Tree, and Naïve Bayes models, achieving over 92% AUC with cross-validation and hyperparameter tuning
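To illustrate the resampling and evaluation ideas in the bullets above, here is a minimal pure-Python sketch of random undersampling and a pairwise AUC calculation. The function names `undersample` and `roc_auc` and the toy data are hypothetical and not taken from the project. SMOTE is not shown.

```python
import random

def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal counts."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]

def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up data: 2 fraudulent (1) and 6 legitimate (0) transactions.
labels = [0, 0, 1, 0, 0, 0, 1, 0]
rows = [[float(i)] for i in range(len(labels))]
balanced_rows, balanced_labels = undersample(rows, labels)
```

The pairwise AUC is quadratic in the number of samples, which is fine for a toy example. A real pipeline would more likely use a library implementation.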

Hotel Churn Analysis
  • Integrated an R Shiny app with in-depth research analysis of a reservations database with 110,000+ rows and 56 attributes

  • Conducted EDA to understand trends, then applied Association Rule Mining, SVM, and Linear Regression to detect patterns

  • Delivered a report with KPI visualizations and statistically grounded business recommendations aimed at increasing revenue by 20% and reducing cancellations

Movie-Reviews Sentiment Classification
  • Implemented NLP-based machine learning models using SVM, Random Forest, and Naïve Bayes to classify movie reviews

  • Performed feature extraction and selection using unigrams and sentiment lexicons, reaching over 93% model accuracy

  • Generated a report comparing the performance of all models
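As a rough sketch of what unigram and lexicon features can look like, the snippet below turns a review into a feature dictionary. The tiny `POSITIVE` and `NEGATIVE` word sets and the function names are hypothetical stand-ins. The lexicons and tokenizer actually used in the project are not specified here.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon; a real sentiment lexicon would be far larger.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "boring", "awful", "hate"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def extract_features(text):
    """Return unigram counts plus positive/negative lexicon hit counts."""
    tokens = tokenize(text)
    features = Counter(f"uni={t}" for t in tokens)
    features["lex_pos"] = sum(t in POSITIVE for t in tokens)
    features["lex_neg"] = sum(t in NEGATIVE for t in tokens)
    return dict(features)

feats = extract_features("Great cast, great script, but a boring ending.")
```

Feature dictionaries of this shape can be fed to a classifier such as SVM or Naïve Bayes after vectorization.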

Recommender System with Customer Sentiment
  • Created an Alternating Least Squares (ALS) recommendation system for products using 1.8 million user ratings and text reviews

  • Applied Principal Component Analysis (PCA) for dimensionality reduction, reducing modeling error by 15% and increasing F-score by 0.5

  • Performed positive/negative sentiment analysis on text reviews using logistic regression with Elastic Net regularization, achieving 85% accuracy

Data Pipelines

Data Engineering and ETL

FileInsights
  • A cloud-native, event-driven metadata tracking system for file operations on AWS, deployed via CDK

  • It captures and manages metadata for file uploads and deletions in real time, providing full traceability, analytics, and operational insight across a file's lifecycle

  • The system uses AWS S3, EventBridge, SQS, Lambda, Step Functions, and DynamoDB to process, categorize, and store file metadata with high precision and scalability
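To give a feel for the metadata-capture step, here is a minimal sketch of how a Lambda-style function might map an S3 notification delivered through EventBridge to a flat record suitable for DynamoDB. The function name `to_metadata_item`, the key layout (`pk`, `sk`), and the sample event are assumptions for illustration, not the project's actual schema. The event fields follow the S3 EventBridge notification shape as commonly documented.

```python
def to_metadata_item(event):
    """Map an S3 EventBridge notification to a flat metadata record."""
    detail = event["detail"]
    obj = detail["object"]
    key = obj["key"]
    operation = "UPLOAD" if event["detail-type"] == "Object Created" else "DELETE"
    return {
        "pk": f"{detail['bucket']['name']}/{key}",  # hypothetical partition key
        "sk": event["time"],                        # hypothetical sort key: event timestamp
        "operation": operation,
        "size_bytes": obj.get("size"),              # may be absent on deletions
        "extension": key.rsplit(".", 1)[-1].lower() if "." in key else "",
    }

# Made-up sample event for illustration.
sample_event = {
    "detail-type": "Object Created",
    "source": "aws.s3",
    "time": "2024-01-01T12:00:00Z",
    "detail": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "reports/summary.PDF", "size": 2048},
    },
}
item = to_metadata_item(sample_event)
```

Keeping the mapping as a pure function makes it easy to unit test without AWS access. Writing the record to a table would be a separate step.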

Data Warehouse for Business Intelligence
  • Tracked yearly sales by performing Extract, Transform, Load (ETL) operations using an SSIS package

  • Populated fact and dimension tables using non-volatile, time-variant data from two different OLAP data sources in MS SQL Server

  • Developed interactive Power BI dashboards and provided data-driven reports to increase overall sales and revenue

Soccer Transfers Pipeline
  • Created an application using complex SQL queries over a generated transfer database in SQL Server with multiple key dependencies

  • Programmed conditions with triggers and views to transfer a player between teams when the current team's bid is higher than previous bids

  • Designed a Power Apps frontend UI with conditional modeling, with the backend connected to MS SQL Server
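The trigger-based transfer rule can be sketched as follows. This example swaps in SQLite (through Python's built-in `sqlite3` module) for SQL Server so that it runs anywhere. The table names, columns, and the exact rule (move the player only when a new bid beats every earlier bid for that player) are assumptions based on one reading of the bullet above, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT, team TEXT);
CREATE TABLE bid (
    id INTEGER PRIMARY KEY,
    player_id INTEGER REFERENCES player(id),
    team TEXT,
    amount REAL
);
-- Move the player only when the new bid beats every earlier bid for that player.
CREATE TRIGGER transfer_on_higher_bid AFTER INSERT ON bid
WHEN NEW.amount > (SELECT COALESCE(MAX(amount), 0) FROM bid
                   WHERE player_id = NEW.player_id AND id <> NEW.id)
BEGIN
    UPDATE player SET team = NEW.team WHERE id = NEW.player_id;
END;
""")

# Made-up data: the first bid wins the player, the lower second bid is ignored.
conn.execute("INSERT INTO player VALUES (1, 'Player A', 'Team X')")
conn.execute("INSERT INTO bid (player_id, team, amount) VALUES (1, 'Team Y', 10.0)")
conn.execute("INSERT INTO bid (player_id, team, amount) VALUES (1, 'Team Z', 8.0)")
current_team = conn.execute("SELECT team FROM player WHERE id = 1").fetchone()[0]
```

In SQL Server the same idea would be written as an `AFTER INSERT` trigger reading from the `inserted` pseudo-table, with different syntax from the SQLite version shown here.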

Statistics

Statistical Testing, Hypothesis Testing, and Inferential Statistics

Statistical Analysis for Vaccination Rates
  • Performed EDA and feature engineering to remove skewness in the data. Used a chi-square test to validate the reporting rates of 700 schools

  • Conducted hypothesis testing on samples of the data using the Dickey-Fuller test, Pearson's correlation, and ANOVA

  • Used logistic and linear regression model summaries to identify significant features contributing to 70% vaccination rates
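For reference, the chi-square test of independence mentioned above rests on a simple statistic, sketched below in pure Python. The function name `chi_square_statistic` and the 2x2 table are made up for illustration and do not reflect the project's data.

```python
def chi_square_statistic(observed):
    """Return (statistic, degrees of freedom) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof

# Made-up table: rows = school type, columns = reported / not reported.
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
```

The statistic is then compared against a chi-square distribution with `dof` degrees of freedom to obtain a p-value, which a statistics library would normally handle.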