Dixitha Kasturi

Showcasing my journey in Data Science, AI, and Machine Learning.

Data Science Analyst II

🎓 MS Applied Data Science, Syracuse University, NY

🎓 BE Information Technology, CBIT, India

Quench your Curiosity

Who is behind this page?

Hello 👋🏽, I’m Dixitha — welcome to my corner of the internet.
As a data enthusiast, I see the world in numbers, patterns, and probabilities. To me, data isn’t just rows and columns; it’s a living language - one that can narrate hidden stories, forecast the future, and guide meaningful decisions. I thrive on turning complex datasets into clear, actionable insights and finding the signal in the noise. Data is not just a profession - it’s a craft. Whether I’m building models, streamlining data pipelines, or uncovering insights that change the course of decisions, I’m here to ensure every dataset has its story told, and told well.

What do I do?

By day, I’m a Senior Data Analyst (Data Scientist I) at PrivacySolutions by Datavant. My work revolves around healthcare data - vast, varied, and full of potential. I’m a pattern-spotter and risk miner, navigating everything from data cleaning and quality control to statistical risk assessments, tool creation, and model building. Over the past three years, I’ve worked with fast-paced, cross-functional teams (including collaborations with Snowflake) and high-demand clients. Adaptability is my forte - whether it’s learning on the fly, tackling unfamiliar tools, or becoming the “jack of all trades, master of data” my colleagues rely on.

Am I on the Engineering Spectrum?

I’m an engineer at heart. I build tools with and for data. I have a Bachelor’s in Information Technology and a Master’s in Data Science. In 2022, I began my post-graduation career at Amazon as a software developer with data science skills. There I contributed to Alexa AI, enabling smarter, data-driven decisions, and worked on the Amazon’s Choice recommendation system, focusing on extensive data analysis and marketplace expansion. That experience gave me a solid foundation in marrying engineering precision with analytical creativity - a combination I continue to refine every day.

Technical Skills

Programming Languages: Python, R, PySpark, SQL

Databases & Cloud: AWS (EMR, IAM, Lambda, VPC, S3, SageMaker, DynamoDB, Glue, EC2, Athena, QuickSight, Redshift), Snowflake

Machine Learning: Regression, Classification, Clustering, Ensemble Methods (XGBoost, Boosting, Bagging), Neural Networks (CNN, LSTM)

Statistics: Hypothesis Testing, Bayesian Inference, Statistical Modeling

Packages: boto3, s3fs, NLTK, NumPy, Pandas, Scikit-learn, Keras, TensorFlow, Matplotlib, ggplot2, Seaborn

Visualizations & Others: Tableau, Power BI, Linux, Jupyter, Git, Microsoft Excel

📋 Research Paper: “Object Detector for the visually impaired with distance calculation for humans” (IJEAT), April 2020, DOI_link

Professional Experience

Master's in Data Science

Data Analyst

Data Analyst II (Privacy)

Datavant, Seattle WA

Mar 23 - Present

I specialize in transforming complex, high-volume healthcare data into actionable insights, with a focus on data privacy, quality, and optimization. I’ve led cross-company collaborations, architected Snowflake performance optimizations, and developed automated analytics tools that cut risk analysis turnaround by 60%. My work spans structured and unstructured data - from HIPAA-compliant redaction and obfuscation using transformer models to representative sampling for a $2M unstructured data product, leveraging sentence ranking and quantile discretization across modalities. I create statistical dashboards, design sampling strategies, and implement advanced anomaly detection to enhance data integrity. With a strong foundation in Python, PySpark, SQL, AWS, and Snowflake, I bring technical precision and innovative thinking to every stage of the data lifecycle, from curation and preprocessing to risk assessment and client-facing delivery.

Software Development Engineer

Amazon, Seattle WA

Jun 22 - Jan 23

As a Software Development Engineer with data science expertise at Amazon, I designed and implemented high-performance solutions for the Alexa AI and Amazon’s Choice recommendation systems. I engineered cost-optimized AWS architectures, reducing long-running Elastic MapReduce (EMR) cluster expenses by 60% through AWS CDK-driven scheduling and automation. I optimized a REST API Gateway with OAuth and AWS Lambda, reducing Alexa’s NLU model recommendation platform runtime by 90%. Leveraging Java-based metric filtering, I increased Amazon’s Choice product recommendation relevance by 20%, directly influencing purchasing decisions. My A/B testing analysis of multi-million-record traffic datasets projected a 35.33% boost in customer–product engagement. In collaboration with Applied Scientists, I streamlined NLP experimentation by curating large-scale datasets into S3 and DynamoDB, enabling rapid prototyping and innovation in NLU (AGI).

Data Science Research Intern

United Nations, Office of RCO, NY

Dec 21 - May 22

Collaborated with QCRI and the UN Kenya team on a high-impact social media analytics initiative aimed at preventing violence during the 2022 Kenya elections. Designed and deployed a real-time ETL pipeline on AWS to process over 1,000 tweets per hour with a 99% accuracy rate, leveraging automated Python scripts for scalability and reliability. Curated Twitter data into encrypted S3 buckets and integrated AWS Lambda for multi-label sentiment classification ahead of the elections. Optimized SQL operations in AWS Athena by 40% to accelerate exploratory data analysis and data curation workflows. Built an NLP feature extraction pipeline supporting SVM, Random Forest, and Naïve Bayes models, achieving 93% classification accuracy. Applied advanced clustering techniques to identify violence-triggering events and developed Tableau dashboards for actionable intelligence, enabling early detection and mitigation of polarization and potential unrest.

Data Science Research Assistant

Syracuse University, Syracuse, NY

Aug 21 - Dec 21

Mentored over 100 graduate students in R, Python, and data-driven problem-solving, guiding them in applying analytics to real-world business challenges. Designed and integrated an R Shiny application for deep-dive analysis of a hotel and resort reservations dataset containing 110,000+ rows and 56 attributes. Conducted exploratory data analysis (EDA) and applied Association Rule Mining, Support Vector Machines (SVM), and Linear Regression to uncover booking trends and cancellation patterns. Delivered a stakeholder-facing business report with statistical strategies projected to reduce cancellations by 20% and improve customer retention, translating analytical insights into tangible business impact.

Hands on Deck

Explore my work in Data Science and Machine Learning

Data Pipelines and Cloud

Projects focused on data engineering.

Machine Learning

Projects focused on predictive modeling and algorithms.

Publications on ML and AI

AI

Deployments focused on LLMs and AI agents

Statistics

Hypothesis tests, Bayesian and inferential statistics

Research Paper