Work Projects

  • Customer Lifetime Value Models
  • Loyalty Churn Model
  • Comments Ingestion NLP Pipeline
  • Omnichannel Propensity Model
  • Large Language Model prototype
  • EComm Customer Segmentation
  • Retail Store Clustering Analysis
  • Brand Health Customer Survey Analysis
  • Email NLP Analysis
  • Attribution Model
  • Customer Acquisition Model
  • MLDevOps Dashboard
  • Customer Survival Analysis
  • Cox Proportional Hazard Model
  • Snowpark CLTV POC

Covid-19 Dashboard

Using the Johns Hopkins Covid-19 data repository, I designed a dashboard to track cases and deaths in the United States at the national and local levels. Data was sourced and cleaned with the Pandas library in Python, while the R-rate (reproduction number) was calculated with the EpiEstim package in RStudio. The webpage was built with Dash, the graphics were built with Plotly, and the web app was deployed and hosted on Google Cloud Platform. The website is no longer active, but the repository and screenshots of the dashboard can still be viewed.

Regression Inference for Life Expectancy

Using the ordinary least squares (OLS) model in the statistics library, feature selection is performed on the world data set to identify the key features that affect life expectancy. The p-values of all features' coefficients are examined to identify which features are good predictors of life expectancy.

Facial recognition and mask detection

This project takes in an image, identifies individual faces in the image, and feeds a cropped image of just the face into a CNN to identify whether the individual is wearing a mask or not. CV2 is utilized for facial recognition, while Keras and ImageNet are used to build the CNN for mask identification.

UK Used Car Dashboard

This dashboard was developed using Plotly/Dash and deployed on Salesforce's Heroku free cloud services. The data was collected from over 5,000 autotrader.co.uk web results using BeautifulSoup and cleaned using Pandas. The data set has over 44,000 individual used cars that are currently for sale in the UK.

Anomaly Detection using Machine Learning

This project was designed to explore the applications of machine learning for anomaly detection, using unsupervised and supervised techniques. For the unsupervised part of the project, four data sets from the Numenta Anomaly Benchmark are used, applying k-means, Gaussian distribution, isolation forests, SVM, and LSTM neural networks. For the supervised part, a credit card data set is used and multiple models are built and tested. The best-performing model was then tuned using a grid search.
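As a rough illustration of the Gaussian distribution approach mentioned above, the sketch below flags points that sit far from the mean in units of standard deviation. It is a minimal standard-library example, not the project's code: the function name `gaussian_anomalies`, the `threshold` value, and the sample readings are all assumptions made for illustration.

```python
import statistics


def gaussian_anomalies(values, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations from the mean.

    Hypothetical helper for illustration; assumes the data is roughly Gaussian.
    """
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # All values are identical, so nothing can be flagged.
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Made-up sensor-style readings with one obvious spike at index 6.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1, 9.7, 10.0]
anomalies = gaussian_anomalies(readings, threshold=2.5)
print(anomalies)
```

On a sample this small a single outlier inflates the standard deviation, which is why the threshold here is set below the usual value of 3.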

UFC Fight Predictions

Using a data set of over 5,000 fights from 1993 to 2019, multiple machine learning models are created and evaluated. In the final step, the models are used to predict the 5 main fights in UFC 259. The NumPy, Pandas, scikit-learn, XGBoost, and Keras libraries were all used in this project.

Clustering for Business Investments in London

I used the k-means clustering algorithm on demographic and income data to identify similar boroughs within London. Once the clusters had been established, each borough was compared to the cluster it fell within to identify potential business venue opportunities. Demographic data was web scraped using BeautifulSoup and pandas, scikit-learn was used for k-means, venue data was collected using the Foursquare API, and visuals were generated using the matplotlib and folium libraries.
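The project itself relied on scikit-learn for k-means. Purely to show what the algorithm does, here is a small standard-library sketch of the assign-then-update loop. The function name `kmeans`, its parameters, and the two-feature sample points are hypothetical and are not the borough data.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on a list of equal-length tuples (illustrative sketch only)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Made-up, already-scaled feature pairs forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
assignments, centroids = kmeans(points, k=2)
print(assignments)
```

Real demographic and income features sit on very different scales, so they would normally be standardized before clustering.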

Visualizing Wages in the US

Using the Plotly library in Python, interactive visuals are created from data collected from the Federal Reserve Bank of New York. College majors are examined specifically for annual wages, along with unemployment and underemployment. The dot-com boom, the 2008 financial crisis, and the 2020 Covid-19 pandemic are all examined for their effects on the job market as a whole.

San Francisco Housing

Using the Plotly library, visuals are created to understand the rising cost of housing in the Bay Area. Data was collected from the Federal Reserve Bank of St. Louis. The house price index, workers index, rent index, income index, and consumer price index were all used. The final part of the notebook uses Facebook's Prophet to forecast the cost of housing in the coming years.

CNN and Transfer Learning

Implementation of two convolutional neural networks for face mask detection using Keras. A Kaggle image data set was used for training. Transfer learning is used in the second part to demonstrate the usefulness of pre-trained models. The InceptionResNetV2 pre-trained model is used for transfer learning, and Matplotlib is used to visualize training over time.

Visualizing Covid-19 with Choropleth Maps

A Kaggle notebook using Plotly choropleth maps to visualize diagnosed Covid-19 cases and related deaths. GeoJSON maps of the US by state, and of countries by continent and globally, are examined over time since the start of the pandemic.

Multiple models on Titanic Dataset

In this Kaggle notebook, written in Python, I apply multiple machine learning algorithms to model the survival of passengers in the Titanic data set. Gaussian Naive Bayes, Logistic Regression, Decision Tree, KNN, Random Forest, Support Vector Classifier, XGBoost, and Neural Networks are all applied. scikit-learn and Keras are the libraries that I used for this project. K-fold cross-validation is used to assess each model's performance.
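To make the k-fold cross-validation step concrete, the sketch below shows how sample indices can be split into k train/validation folds. It is a simplified standard-library stand-in for what a library routine does, not the notebook's code; the name `k_fold_indices` is hypothetical, and no shuffling is performed.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds.

    Illustrative helper only: folds are contiguous and unshuffled.
    """
    indices = list(range(n_samples))
    # Spread any remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print(len(train), validation)
```

Each model would be fitted on the train indices and scored on the validation indices, with the k scores averaged to compare models.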

Covid-19 Bar Chart Race

A bar chart race of the states with the highest number of diagnosed Covid-19 cases in the United States. The data runs from the end of January to September and uses Pandas for data storage and manipulation, along with Matplotlib for data visualization.

Traveling Salesman

Solving the traveling salesman problem using a random search algorithm to find the shortest path available. Tkinter is used for map visualization, identifying the location and order by which each city should be visited.
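The random search idea can be sketched in a few lines: generate random visiting orders and keep the shortest closed tour found. This is a minimal standard-library illustration under assumed names (`tour_length`, `random_search_tsp`) and made-up city coordinates. It omits the Tkinter visualization and is not the project's code.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of the closed tour that visits cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def random_search_tsp(cities, trials=500, seed=0):
    """Try random visiting orders and keep the shortest tour seen (illustrative)."""
    rng = random.Random(seed)
    best_tour = list(range(len(cities)))
    best_length = tour_length(best_tour, cities)
    for _ in range(trials):
        candidate = list(range(len(cities)))
        rng.shuffle(candidate)
        length = tour_length(candidate, cities)
        if length < best_length:
            best_tour, best_length = candidate, length
    return best_tour, best_length


# Corners of a unit square, listed in an order that is not the shortest tour.
cities = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
best_tour, best_length = random_search_tsp(cities)
print(best_tour, best_length)
```

Random search gives no optimality guarantee, and the number of possible tours grows factorially with the number of cities, so the result is only as good as the number of trials allows.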

Contact Me

I am eager to help with any Data Science or Data Visualization projects I can. Please feel free to contact me by email or phone. I currently live in London, England, but am willing to help no matter where you are based. Please search for me on LinkedIn, GitHub, or even Kaggle, and upvote my notebooks if you find them helpful. Thank you!

Phone

+44 (738) 401-5147

Address

London
United Kingdom