This repository implements a music genre classification system using deep learning (Convolutional Neural Networks) on audio spectrograms. It explores multiple model versions with enhancements in architecture, preprocessing, and evaluation.
├── Colab_version.ipynb # Final/Colab-optimized version
├── improvedmusicgenre.ipynb # Enhanced version with augmentations and callbacks
├── musicgenre.ipynb # Base version with basic CNN pipeline
└── README.md # Project documentation
The goal is to classify audio samples into one of 10 music genres (Blues, Classical, Country, Disco, HipHop, Jazz, Metal, Pop, Reggae, Rock) using mel spectrograms as input features for CNN-based models.
Make sure to install the following libraries:
pip install numpy pandas matplotlib seaborn librosa scikit-learn tensorflowDataset used: GTZAN Genre Collection
- Format:
.wavfiles, 22050 Hz, 30 seconds each. - Location: Place the dataset inside
C:/Datasets/Data/genres_original/or change the path in the notebooks accordingly.
All audio files are converted to Mel Spectrograms using librosa:
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_DB = librosa.power_to_db(S, ref=np.max)- Shape standardized to
(128, 660)for CNN input. - Skips any problematic or corrupted files with error handling.
- Simple 3-layer CNN
- Uses mel spectrograms directly
- Accuracy ~75%
Conv2D → MaxPool → Dropout → Flatten → Dense- 4 Convolutional blocks with BatchNorm and Dropout
- Data augmentation using noise and pitch shifts
- Regularization (L2) on dense layers
- Callbacks: EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
Conv2D → BN → Conv2D → MaxPool → Dropout (×4)
→ Flatten → Dense (L2) → Dropout → Output- Final cleaned and modular version
- Wrapped in functions like
load_dataset_enhanced(),train_model_enhanced(), etc. - Training history and confusion matrix visualizations
- Extract Features and Load Data:
X, y, label_encoder = load_dataset_enhanced("C:/Datasets/Data/genres_original")- Train/Test Split:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)- Train Model:
model, history = train_model_enhanced(X_train, y_train, X_test, y_test, label_encoder)- Evaluate Model:
evaluate_model(model, X_test, y_test, label_encoder, history)- Accuracy: Up to 92% with the enhanced model.
- Precision and Validation Accuracy monitored during training.
- Confusion matrix and learning curves plotted using
matplotlibandseaborn.
- Training vs Validation Accuracy
- Training vs Validation Loss
- Confusion Matrix
- Classification Report
- 🧹 Dataset cleansing and error handling
- 📈 Stratified train-test splitting
- 📊 Visualizations and evaluation metrics
- 📦 Model checkpoints and adaptive learning
- 🔄 Augmented dataset for better generalization
- Add real-time inference with microphone input
- Use pretrained models (e.g., VGGish or YAMNet)
- Implement genre probability ranking
- Deploy using Streamlit or Flask for interactive web app
- GTZAN Genre Dataset
- LibROSA
- TensorFlow, Keras, NumPy, Seaborn, Matplotlib