London Climate Prediction - Machine learning models for predicting London's average temperature based on historical weather data.
This project implements various regression models to predict London's mean temperature using historical weather data. The solution leverages scikit-learn for machine learning and MLflow for experiment tracking, model management, and deployment. The system evaluates multiple algorithms (Linear Regression, Decision Trees, Random Forests) with different hyperparameters to determine the most accurate temperature prediction model.
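As a rough illustration of the comparison described above, the following sketch trains the three model families at a few tree depths and ranks them by RMSE. It runs on synthetic data rather than `london_weather.csv`; the feature names, the depth grid, and the `candidates` and `results` dictionaries are assumptions made for this example, not the notebook's actual code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the weather features: a seasonal signal plus noise.
rng = np.random.default_rng(42)
month = rng.integers(1, 13, size=600)
sunshine = rng.uniform(0, 12, size=600)
X = np.column_stack([month, sunshine])
y = 11 - 7 * np.cos(2 * np.pi * (month - 1) / 12) + 0.4 * sunshine + rng.normal(0, 1, 600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Hypothetical candidate grid: one linear model plus trees/forests at three depths.
candidates = {"linear_regression": LinearRegression()}
for depth in (2, 5, 10):
    candidates[f"decision_tree_depth_{depth}"] = DecisionTreeRegressor(
        max_depth=depth, random_state=42
    )
    candidates[f"random_forest_depth_{depth}"] = RandomForestRegressor(
        max_depth=depth, n_estimators=50, random_state=42
    )

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    results[name] = float(np.sqrt(mse))  # RMSE, in the units of the target

# Lowest RMSE wins.
best = min(results, key=results.get)
for name, rmse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE={rmse:.3f}")
print("best:", best)
```

Taking the square root of the MSE by hand keeps the sketch independent of the installed scikit-learn version.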
- Data Processing Pipeline: Handles missing values, feature scaling, and data type optimization
- Multiple Model Comparison: Evaluates Linear Regression, Decision Tree, and Random Forest models
- Experiment Tracking: Uses MLflow to log parameters, metrics, and models
- Feature Engineering: Extracts temporal features (month, year) from date data
- Performance Metrics: Calculates RMSE for model evaluation
- Reproducible Experiments: Tracks all experiment parameters and data transformations
- Model Signatures: Defines explicit input/output schemas for deployment
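The preprocessing and feature-engineering steps listed above could look roughly like the sketch below. It uses a tiny hand-written frame instead of the real dataset; the column names other than `date`, the mean-imputation strategy, and the `preprocess` pipeline are assumptions for illustration and may differ from what the notebook does.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Tiny hypothetical frame standing in for london_weather.csv.
weather = pd.DataFrame({
    "date": pd.to_datetime(
        ["20200115", "20200216", "20200317", "20200418"], format="%Y%m%d"
    ),
    "sunshine": [1.5, np.nan, 4.0, 6.5],
    "max_temp": [8.0, 9.5, np.nan, 15.0],
    "mean_temp": [5.0, 6.0, 8.0, 11.0],
})

# Temporal features extracted from the date column.
weather["month"] = weather["date"].dt.month
weather["year"] = weather["date"].dt.year

feature_cols = ["month", "year", "sunshine", "max_temp"]
X = weather[feature_cols]
y = weather["mean_temp"]

# Fill missing values with the column mean, then standardize each feature.
preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# In a real experiment, fit on the training split only and reuse the fitted
# pipeline on the test split to avoid leaking test statistics.
X_prepared = preprocess.fit_transform(X)
print(X_prepared.shape)
```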
Before running this project, ensure you have the following installed:
- Python 3.10+ (required by matplotlib >=3.10.0, listed below)
- pip package manager
Core dependencies:
- pandas (>=2.2.3)
- numpy (>=1.26.4)
- scikit-learn (>=1.4.1)
- mlflow (>=2.20.1)
- matplotlib (>=3.10.0)
- seaborn (>=0.13.2)
# Clone the repository
git clone https://github.com/CarlosYazid/London-Climate-Prediction.git
cd London-Climate-Prediction
# Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
# Install requirements
pip install -r requirements.txt
# Create conda environment (alternative to the venv setup above)
conda create -n london-climate python=3.10
conda activate london-climate
# Install core packages
conda install -c conda-forge pandas numpy scikit-learn mlflow matplotlib seaborn
- Ensure you have the data file `london_weather.csv` in the project directory
- Start MLflow tracking server:
mlflow ui
- Run the main notebook/script:
jupyter notebook
Then open and run the notebook cells sequentially.
- Data loading and preprocessing:
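import pandas as pd

# Parse the first column (dates stored as YYYYMMDD), then derive temporal features
weather = pd.read_csv("london_weather.csv", parse_dates=[0], date_format="%Y%m%d")
weather['month'] = weather['date'].dt.month
weather['year'] = weather['date'].dt.year
- Model training and evaluation: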
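import mlflow
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import root_mean_squared_error

with mlflow.start_run(run_name=run_name):
    model = RandomForestRegressor(max_depth=depth).fit(X_train, y_train)
    predictions = model.predict(X_test)
    # mean_squared_error(..., squared=False) was removed in scikit-learn 1.6
    rmse = root_mean_squared_error(y_test, predictions)
    mlflow.log_metric("rmse", rmse)
# After data preparation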
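import mlflow.sklearn
from sklearn.tree import DecisionTreeRegressor

with mlflow.start_run():
    model = DecisionTreeRegressor(max_depth=5)
    model.fit(X_train, y_train)
    mlflow.sklearn.log_model(model, "model")
# Load a logged model by its run ID
loaded_model = mlflow.sklearn.load_model("runs:/<RUN_ID>/model")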
predictions = loaded_model.predict(new_data)
London-Climate-Prediction/
├── .gitignore - Specifies intentionally untracked files
├── london_weather.csv - Primary dataset (not included in repo)
├── requirements.txt - Full list of Python dependencies
├── tower_bridge.jpeg - Sample image for documentation
└── notebook.ipynb - Main Jupyter notebook with all code
change_datatype(X_train, X_test, y_train, y_test)
Converts data types for memory optimization
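The body of `change_datatype` is not reproduced in this README. As a minimal sketch of the general idea, assuming the goal is to shrink numeric columns to narrower dtypes, a helper along these lines would do it. The name `downcast_numeric` and the example frame are hypothetical and not part of the project.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric columns downcast to narrower dtypes."""
    out = df.copy()
    # Integers shrink to the smallest signed type that holds every value.
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    # Floats shrink to float32 at most.
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Hypothetical feature frame for demonstration.
X_train = pd.DataFrame({
    "month": [1, 2, 12],
    "year": [1979, 2000, 2020],
    "sunshine": [1.5, 4.0, 6.5],
})
X_small = downcast_numeric(X_train)

print(X_small.dtypes)
print(X_train.memory_usage(deep=True).sum(), "->", X_small.memory_usage(deep=True).sum())
```

Applying the same helper to each of `X_train`, `X_test`, `y_train`, and `y_test` (converting a Series with `to_frame()` first) keeps the splits consistent with one another.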
- Any scikit-learn regressor that exposes `fit()` and `predict()` can be used in place of the models shown
- MLflow tracking records, through explicit logging calls:
  - Parameters (`mlflow.log_param()`)
  - Metrics (`mlflow.log_metric()`)
  - Models (`mlflow.sklearn.log_model()`)
We welcome contributions! Please follow these steps:
- Fork the repository
- Create a new branch (`git checkout -b feature-branch`)
- Commit your changes (`git commit -am 'Add new feature'`)
- Push to the branch (`git push origin feature-branch`)
- Create a Pull Request
- Follow PEP 8 style guide
- Include tests for new features
- Update documentation accordingly
- Use descriptive commit messages
Issue: Missing data file
Solution: Ensure london_weather.csv is in the project root
Issue: Package version conflicts
Solution: Create a fresh virtual environment and install exact versions from requirements.txt
Issue: MLflow server not starting
Solution: Check if port 5000 is available or specify another port:
mlflow ui --port 5001
- Initial release
- Implemented Linear Regression, Decision Tree, and Random Forest models
- Added MLflow experiment tracking
- Completed data preprocessing pipeline
This project is licensed under the MIT License - see the LICENSE file for details.
For questions or support, please contact:
Project Maintainer: Carlos Yazid
Email: [email protected]
GitHub Issues: Issues