
Hi

Welcome to my online portfolio and blog.

I’m Vincent, a results-oriented data scientist and machine learning engineer with a data-driven mindset and attention to detail. I am ready to work, eager to learn and keen to take on new challenges in a diverse, fast-paced environment that values my skills and offers room for growth. I am competent in data mining, data preparation, exploratory data analysis, data storytelling, data visualization, feature engineering, machine learning modeling and model deployment, with extensive experience in supervised and unsupervised machine learning algorithms and concepts. I am proficient in both Windows and Linux operating systems. I am an all-round data analytics practitioner, deeply passionate about artificial intelligence and Python programming, with the aim of becoming a world-class problem solver.

Here’s a little more about me:

Technologies


My tech stack includes but is not limited to:

  • Python programming
  • SQL
  • Git and GitHub
  • Linux

Skills


My knowledge is built around:

  • Python programming:
    • numpy
    • pandas
    • matplotlib
    • seaborn
  • Machine learning
    • sklearn:
      • Unsupervised learning:
        • KMeans
        • Principal Component Analysis (PCA) etc.
      • Supervised learning:
        • K Nearest Neighbors (KNN)
        • Logistic regression
        • Decision trees
        • Random forests
        • Support vector machines
        • Boosting algorithms etc.
  • Data science
  • Natural Language Processing (NLP)
  • Git and GitHub
  • Data analysis
  • Data visualization
  • Data scraping
  • Data cleaning
  • … (everything data, really)

Certifications


Recent Achievements


  • Packaged machine learning code into a Python forecasting package, enabling the machine learning team to run forecasts with a few lines of code through a pipeline similar to scikit-learn’s API.

  • Led a pilot project to predict the estimated time of arrival (ETA) of vessels ferrying shipments from Vietnam to West Africa.

  • Built a stock price prediction and analysis app for Kenyan stocks. (Link)

  • Built my online portfolio website and blog. (Link)

  • Completed an end-to-end sentiment analysis machine learning project, from data acquisition to model deployment: scraping data from Twitter, modeling text with bag-of-words and TF-IDF vector representations, and hosting a web application on Streamlit Cloud.

    • Link to web application.
    • Link to project code on github.
  • Designed datastand, a Python package that helps data scientists, machine learning engineers and data analysts better understand their data. It gives quick insights into a given dataset: general statistics, size and shape, number of unique data types, number of numerical and non-numerical columns, a short overview of the data, missing-data statistics and a missing-data heatmap. It also provides methods to impute missing data.

    • Link to package on PyPI.
    • Link to blog article on how to get started using the package.
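The forecasting package mentioned above follows a fit/predict pattern similar to scikit-learn’s estimator API. Its code isn’t shown here, so the following is only a minimal sketch of that interface style under my own assumptions: the class name `SeasonalNaiveForecaster`, the `season_length` parameter and the `predict(horizon)` signature are hypothetical, and a simple seasonal-naive rule stands in for a real model.

```python
class SeasonalNaiveForecaster:
    """Hypothetical forecaster with a scikit-learn-style fit/predict interface.

    Seasonal-naive rule: the forecast repeats the last observed season,
    so step h (0-based) gets the value from position h % season_length
    of the final full cycle in the training data.
    """

    def __init__(self, season_length: int = 7):
        # Like scikit-learn estimators, __init__ only stores parameters.
        self.season_length = season_length

    def fit(self, y: list[float]) -> "SeasonalNaiveForecaster":
        if len(y) < self.season_length:
            raise ValueError("need at least one full season of history")
        # Learned state gets a trailing underscore, following scikit-learn's convention.
        self.last_season_ = list(y[-self.season_length:])
        return self  # returning self allows chaining: model.fit(y).predict(3)

    def predict(self, horizon: int) -> list[float]:
        # Cycle through the last observed season for as many steps as requested.
        return [self.last_season_[h % self.season_length] for h in range(horizon)]
```

For example, `SeasonalNaiveForecaster(season_length=3).fit([1, 2, 3, 4, 5, 6]).predict(5)` repeats the last cycle `[4, 5, 6]`. Note that scikit-learn’s own `predict` takes a feature matrix `X`; passing a horizon instead is an assumption made here to keep time-series usage short.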

Publications/Deployments 📋


Stocks Watch

A stock price prediction and analysis app for Kenyan stocks.

The code is currently private (I might make it public once I mark everything as “refined”), and the web app is live here.

Re-invest

A collection of investment and trading calculators for compounding.

The code is available on GitHub and the web app is here.
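The calculators’ exact formulas aren’t listed here, so as an assumption, this sketch shows the standard compound-growth formula such calculators usually build on: FV = P(1 + r/n)^(nt) for a lump sum, plus the future value of regular end-of-period contributions (an ordinary annuity). `future_value` and its parameter names are hypothetical, not Re-invest’s code.

```python
def future_value(
    principal: float,
    annual_rate: float,
    years: float,
    periods_per_year: int = 12,
    contribution: float = 0.0,
) -> float:
    """Future value of a lump sum plus optional end-of-period contributions."""
    if periods_per_year <= 0:
        raise ValueError("periods_per_year must be positive")
    n = periods_per_year * years          # total number of compounding periods
    i = annual_rate / periods_per_year    # interest rate per period
    growth = (1 + i) ** n                 # growth factor (1 + r/n)^(nt)
    # Ordinary annuity: C * ((1 + i)^n - 1) / i, which reduces to C * n when i == 0.
    annuity = contribution * n if i == 0 else contribution * (growth - 1) / i
    return principal * growth + annuity
```

For example, `future_value(10_000, 0.10, 5, 12, 500)` estimates the balance after five years of monthly compounding at 10% a year, with 500 added at the end of each month.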

Sentiment Analysis Web App

This project aimed to predict the sentiment of tweets posted during the COVID-19 pandemic, covering the whole project cycle from data acquisition to model deployment. Details of the full project cycle and the code are on my GitHub. The project’s web application is live here.
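The project modeled text with bag-of-words and TF-IDF representations. As a minimal, standard-library sketch of what those representations compute (not the project’s actual code), the hypothetical helpers below turn a few made-up example tweets into raw count vectors and then into TF-IDF weights using the plain formula idf = log(N / df).

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphabetic runs only (a deliberately simple tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and a raw term-count vector per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]


def tf_idf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Weight each raw count by idf = log(N / df), df = number of docs containing the term."""
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[tf * w for tf, w in zip(row, idf)] for row in bow]


# Made-up example tweets, for illustration only.
tweets = ["stay home stay safe", "vaccine news is good", "bad news again"]
vocab, weights = tf_idf(tweets)
```

Terms shared by many tweets (like "news") get lower weights than terms concentrated in one tweet (like "stay"). scikit-learn’s `TfidfVectorizer` implements the same idea but, by default, uses a smoothed IDF, adds 1 to it and L2-normalizes each row, so its numbers differ from this sketch; the resulting vectors are what a classifier is then trained on.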

Portfolio Website and Blog

I built this portfolio website and blog.

Python Package datastand

Datastand is a Python package designed to help data scientists, machine learning engineers and data analysts better understand data. It gives quick insights into a given dataset: general statistics, size and shape, number of unique data types, number of numerical and non-numerical columns, a short overview of the data, missing-data statistics and a missing-data heatmap, and it provides methods to impute missing data. Package link on PyPI. I also wrote a guide that shows how datastand works and helps you get started here.
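datastand’s own API isn’t reproduced here. As a rough, dependency-free illustration of two of the ideas it covers (missing-data statistics and simple imputation), the sketch below uses hypothetical helpers `missing_summary` and `impute` over a dict of column lists. In pandas, the same checks are usually written with `df.isna().sum()` and `df.fillna(...)`.

```python
import math
from statistics import mean, median


def is_missing(value) -> bool:
    # Treat None and float NaN as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_summary(columns: dict[str, list]) -> dict[str, dict[str, float]]:
    """Count missing entries per column and express them as a percentage of rows."""
    summary = {}
    for name, values in columns.items():
        n_missing = sum(is_missing(v) for v in values)
        summary[name] = {
            "missing": n_missing,
            "percent": 100 * n_missing / len(values) if values else 0.0,
        }
    return summary


def impute(values: list, strategy: str = "mean") -> list:
    """Fill missing numeric entries with the mean or median of the observed ones."""
    if strategy not in {"mean", "median"}:
        raise ValueError("strategy must be 'mean' or 'median'")
    observed = [v for v in values if not is_missing(v)]
    if not observed:
        return list(values)  # nothing to compute a fill value from
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if is_missing(v) else v for v in values]


# Small made-up dataset, for illustration only.
data = {"age": [25, None, 35, float("nan")], "city": ["Nairobi", "Mombasa", None, "Kisumu"]}
report = missing_summary(data)
ages_filled = impute(data["age"])
```

Mean and median imputation are only the simplest options; which method fits depends on why the data is missing.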

Technical Articles

Find technical articles written by me on the posts page here.