Technologies Used
The notebooks are written in Python and span analytics and machine learning. The typical stack includes pandas, NumPy, seaborn/Matplotlib, scikit-learn, TensorFlow/Keras (for translation), and BeautifulSoup (for scraping).
Notebook Breakdown
- Student Performance Indicator — EDA through model selection; walks the ML lifecycle from problem framing to training and choosing the best model.
- Effect of Government Social Programs on Poverty in Kenya — Descriptive analytics and correlations to understand program impact.
- Effect of Petroleum Prices on Demand in Kenya — Correlation and descriptive analysis of price changes vs demand.
- Effect of Taxation on SME Performance — Frequency analysis plus descriptive analytics on taxation effects.
- Fine-Tuning English-Swahili Translation Model — Deep-learning fine-tuning using TensorFlow/Keras and CIFAR-10 style preprocessing.
- Lyrics Finder — Scrapes Genius.com to gather song lyrics (URL collection + HTML parsing).
- English-Kiswahili Translation Notebook — GPU-backed fine-tuning for translation tasks.
- PandemAI — Data cleaning and formatting pipeline preparation.
- Supervised Learning with SVM — Implements and evaluates SVM models.
- Supervised Learning with Random Forests — Attribute selection, training, and evaluation with confusion matrices.
- Customer Churn Prediction — Feature engineering plus multiple classifiers (RF, AdaBoost, SVC, XGBoost) compared via accuracy and reports.
- Causal Inference with Bayesian Networks — Causal analysis using Bayesian networks.
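Several of the notebooks above (Random Forests, Customer Churn Prediction) evaluate classifiers with confusion matrices and accuracy. The following is a minimal, dependency-free sketch of what those two metrics compute. The labels and the helper names `confusion_matrix` and `accuracy` are illustrative assumptions, not code taken from the notebooks, which more likely rely on the scikit-learn equivalents.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return matrix[i][j]: samples whose true label is labels[i] and predicted label is labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count each (true, predicted) pair once, then lay the counts out as a grid.
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only; not data from any notebook.
y_true = ["stay", "stay", "churn", "churn", "stay", "churn"]
y_pred = ["stay", "churn", "churn", "stay", "stay", "churn"]
labels = ["stay", "churn"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"true={label:<5} predicted counts={row}")
print(f"accuracy={accuracy(y_true, y_pred):.3f}")
```

Rows correspond to true labels and columns to predicted labels, so the diagonal holds the correct predictions. This is the same row/column convention scikit-learn uses.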
Getting Started
Each notebook can be run in Google Colab through the links provided in the repository. Prerequisites are Python 3.x and the common data and ML libraries, including TensorFlow, Keras, Matplotlib, seaborn, NumPy, scikit-learn, BeautifulSoup, and pandas.
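When running a notebook outside Colab, it can help to check which of these libraries are already installed. The sketch below uses only the standard library. The `REQUIRED` list and the helper `missing_modules` are assumptions for illustration; the entries are import names, which differ from the package names for scikit-learn (`sklearn`) and BeautifulSoup (`bs4`).

```python
import importlib.util
import sys

# Import names (not pip package names) for the libraries listed above.
# This list is an assumption based on the prerequisites; adjust it per notebook.
REQUIRED = [
    "pandas",
    "numpy",
    "matplotlib",
    "seaborn",
    "sklearn",     # scikit-learn
    "tensorflow",
    "keras",
    "bs4",         # BeautifulSoup
]


def missing_modules(names):
    """Return the names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
print("Missing libraries:", ", ".join(missing) if missing else "none")
```

Using `importlib.util.find_spec` avoids actually importing heavy packages such as TensorFlow just to confirm that they are present.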