Daniel Simpson

+44-738-401-5147

Hendon
NW4 1DJ
London, UK

Professional Objectives

Trained as a mathematician at university, I discovered my passion for big data and machine learning algorithms after graduating. After a few years of self-learning, I decided to pursue my MSc in Data Science and am now seeking a challenging position that fully utilizes my skills in data science, data visualization and machine learning.

Technology Expertise

Expertise using Databricks, Snowflake, PowerBI, MATLAB and Microsoft Office, leveraging Python (PySpark, scikit-learn, Keras, TensorFlow, Pandas, Matplotlib, Plotly, Dash, Seaborn, GeoPandas, SciPy and NumPy), SQL and R.

Specialization in deep learning and various other supervised and unsupervised machine learning techniques in Python and R (decision trees, artificial neural networks, clustering, SVM, random forests, regression analysis, Bayesian networks, and genetic algorithms).

Experience with Google Cloud Platform, Azure, AWS, Spark, C++, JavaScript, HTML, CSS and Java.

Knowledge of Git version control, Scrum, Unix commands and distributed computing.

Education

MSc Data Science - Distinction

Birkbeck, University of London, September 2020
Dissertation was focused on time-series analysis for stock price predictions using LSTM neural networks for modeling and evolutionary algorithms as an optimization technique.

BSc Mathematics

West Virginia University, May 2013

Work Projects

Customer Lifetime Value Models

Loyalty Churn Model

Comments Ingestion NLP Pipeline

Omnichannel Propensity Model

Large Language Model prototype

Snowflake Cortex LLM Dashboards

EComm Customer Segmentation

Retail Store Clustering Analysis

Brand Health Customer Survey Analysis

Email NLP Analysis

Attribution Modelling

Customer Acquisition Modelling

MLDevOps Dashboard

Customer Survival Analysis/Cox Hazard Modelling

Snowpark POC

Personal Projects

Covid tracking dash app

Working with my former advisor, I designed a web app to track Covid-19 within the United States at the county level. Data from the Johns Hopkins repository was used to estimate the R-rate of the virus. Pandas was used for collecting and cleaning the data, R-rate estimates were generated using the EpiEstim library in R, the dashboard was built using the Dash and Plotly libraries, and the website was hosted on Google Cloud Platform.

Anomaly Detection with Machine Learning

This project was developed to test various unsupervised and supervised machine learning techniques for anomaly detection. The first part of the project focused on using unsupervised techniques on the NAB (Numenta Anomaly Benchmark) data sets. The second part focused on using the unsupervised techniques on a fraud data set to compare various metrics (F1, Precision, etc.). The goal was to ensure the Fraud department could capture all cases of fraud without becoming overwhelmed. The last part of the project focused on refining that technique with supervised techniques as more labeled data was accumulated.

K-means Clustering for Business Investments

I implemented the k-means clustering algorithm on demographic and income data, identifying similar boroughs within London. Once the clusters had been established, each borough was compared to the cluster it fell within, identifying potential business venue opportunities. Demographic data was scraped from the web using BeautifulSoup and pandas, scikit-learn was used for k-means, venue data was collected using the Foursquare API, and visuals were generated using the matplotlib and folium libraries.

Facial recognition and mask detection

This project takes in an image, identifies individual faces in the image, and feeds a cropped image of just the face into a CNN to identify whether the individual is wearing a mask or not. CV2 is utilized for facial recognition, while Keras and ImageNet are used to build the CNN for mask identification.

UFC Fight Predictions

Using a data set of over 5000 fights from 1993-2019, multiple machine learning models are created and evaluated. In the final step the models are used to predict the 5 main fights in UFC 259. The NumPy, Pandas, scikit-learn, XGBoost, and Keras libraries were all used in this project.

CNNs for face mask detection

I designed and trained multiple convolutional neural networks to identify whether individuals in an image were wearing a face mask or not. Transfer learning was applied to the second CNN to demonstrate the usefulness of pre-trained models. Keras was used to build the models and matplotlib was used to visualize the results.

Visualizing wages in the United States

I built interactive visuals using data taken from the Federal Reserve Bank of New York. The dot-com boom, 2008 financial crisis, and Covid-19 pandemic are all visualized and analyzed for their effect on the job market. Pandas and plotly were used for data analysis and visualization.

Multiple models on Titanic Dataset

In a Kaggle notebook using Python, I apply multiple machine learning algorithms to model the survival of passengers in the Titanic dataset. Gaussian Naive Bayes, Logistic Regression, Decision Tree, KNN, Random Forest, Support Vector Classifier, XGBoost, and Neural Networks are all applied. Scikit-learn and Keras are the libraries that I used for this project. K-fold cross-validation is used to assess each model's performance.

Analysis on San Francisco Housing

Using the plotly library, visuals are created to understand the rising cost of housing in the Bay Area. Data was collected from the Federal Reserve Bank of St. Louis. The house price index, workers index, rent index, income index, and consumer price index were all utilized. The final part of the notebook uses Facebook's Prophet to predict the cost of housing in the coming years.

Visualizing Covid-19 with Choropleth Maps

A Kaggle notebook using Plotly choropleth maps to visualize diagnosed Covid-19 cases and related deaths. GeoJSON maps of the US by state, and of countries by continent and globally, are examined over time since the start of the pandemic.

Linear Regression using Automobile dataset in R

Using RStudio to create an R Markdown file, I explore the correlation between various variables in the dataset. Exploring the predictors for MPG, I design plots with linear regression models, along with confidence and prediction intervals.

Traveling Salesman

Solving the traveling salesman problem using a random search algorithm to find the shortest path available. Tkinter is used for map visualization, identifying the location of each city and the order in which the cities should be visited.

H-Clustering with USArrests Dataset in R

Using the USArrests dataset in R, I apply hierarchical clustering and plot the results using a dendrogram visualization. The file is written as an R Markdown file, and utilizes the datasets and cluster libraries.

US Police Killings Data Visualization

In 2020 the killing of an unarmed African American man named George Floyd by Minneapolis police officers led to protests all over the United States and the world. This project looks into the statistics of police killings in the United States in comparison to population demographics. Visualization is done using Matplotlib's bar chart and pie chart functionalities.

ESports Earnings Visualization

Tracking the total number of tournaments, participants, and earnings for all ESports events. Visualizations by genre and individual game are produced using pandas and matplotlib.

Employment History

TJX Europe, London, UK: Jan 2023 to Present

Data Scientist, working on the customer insights team to analyze and build models focused on understanding the customer base. Working within the Marketing department and communicating with a wide range of teams to implement predictive models for better business decision making. Leveraging tools such as Snowflake, Databricks and PowerBI for creating and monitoring large data pipelines and machine learning models. Working with outside vendors to test cutting-edge technologies that might bring business value to the company. Creating numerous proof-of-concept projects for research and development within the company. Mentoring various members of the team to upskill and develop across a wide range of skills and technologies.

Decoded, London, England: May 2021 to Dec 2023

Data Tutor: June 2022 to Dec 2023, working in the Product team to develop modules for the level 3 and level 4 data apprenticeship program, along with working on commercial content for the advanced academy. Delivering high-quality data-centric workshops to various clients from multiple industries. Running a learner help desk to address all technical questions that clients might have while developing internal data projects for business insights.

Senior Data Mentor: Jan 2022 to June 2022, working with the Content and Technology teams on automation projects to improve processes within the business and on the development of material for the product team. Mentoring learners across multiple industries while they complete the level 4 Data Analyst Apprenticeship program over an 18-month period. Guiding learners to develop robust and impactful data science projects for their organisations.

Data Mentor: May 2021 to Jan 2022, overseeing the learning progression of multiple learners from various organisations. Guiding learners through the level 4 Data Analyst Apprenticeship program over 18 months. Critiquing and providing feedback on data-driven projects across multiple industries. Providing help on various data tools such as Anaconda, Jupyter Notebook, Python and SQL. Helping learners understand various data science techniques such as clustering, regression, neural networks, time series analysis, classification, and text analysis.

Bryant High School, Alexandria, VA: Aug 2016 to Aug 2019

Mathematics Teacher, preparing and managing lessons for all levels of secondary mathematics and statistics. Directing an instructional assistant to help manage the students, along with designing interactive and engaging projects for my students to develop applicable skills for their future. Collecting, cleaning, and presenting academic achievement data directly to my principal. Nominated for Outstanding New Teacher Award 2018.

ARP, Alexandria, VA: May 2016 to Aug 2019

Bartender and Server, creating a relaxing and comfortable environment for customers, while allowing relevant information sharing between management and kitchen staff. Offering a detailed analysis of food, drinks, ingredients, specials, and possible pairings. Developing personal and professional relationships with patrons and staff alike.

The Learning Network, Surat Thani, Thailand: April 2015 to April 2016

Math and Science Teacher, developing and executing lesson plans for large class sizes for students who learned English as a second language. Curating yearlong projects with students and designing extracurricular activities for all age groups.

Virtue Feed & Grain, Alexandria, VA: June 2014 to April 2015

Waiter, providing patrons with daily specials and keeping them informed on ingredients and menu items. Keeping a constant line of communication going between bar staff, management, kitchen staff and customers.

Bryant High School, Alexandria, VA: Aug 2013 to Aug 2014

Instructional Assistant, assisting in the mathematics and biology classrooms with lectures, personal tutoring, and generally answering questions for students.

King Street Blues, Alexandria, VA: May 2011 to June 2014

Bartender and Server, mixing and serving alcoholic and nonalcoholic drinks to bar patrons following standard recipes. Serving wine and draught or bottled beer, utilizing interpersonal communication and public relations skills.

West Virginia University, Morgantown, WV: May 2010 to May 2013

Research Assistant, working in a biological modeling lab focused on flow cytometry research. Establishing pipetting skills and developing data analysis skills on large data sets using Excel and MATLAB.

Center for Multi-disciplinary Studies, Durgapur, India: March 2009 to April 2009

Research Assistant, working on an industrial-agricultural research project designed to discover the advantages of certain species of rice in this geographic region. Assisting with data collection and crop yield analyses.

Hilltribe Holidays Thailand, Chiang Mai, Thailand: Feb 2009 to March 2009