Song Recommendation App using Streamlit and k-Nearest Neighbors

nganhahv99
Nov 15, 2023

Updated: Nov 16, 2023

Motivation

Welcome to the future of personalized music discovery! I'm thrilled to introduce my latest project: a song recommendation app that combines the k-nearest neighbors algorithm with a large Spotify music dataset.

The app lets you customize the song features you prefer, putting the choice back in your hands. Whether you're in the mood for energetic beats, soulful melodies, or anything in between, the recommendation engine tailors its suggestions to match your mood and taste.

In this article, you will find detailed steps to build the recommendation engine powered by Scikit-learn's K-Nearest Neighbors machine learning model and deployed using Streamlit.

Final app | Github repo

Here is a snapshot of the app!

You can check out the demo below!

Project Overview

Preprocessed Spotify Dataset to extract songs' audio features, genre information, and release year
Optimized K-Nearest Neighbors model to get the top songs based on a set of feature inputs selected by users
Designed app layout and UI integration using Streamlit, allowing users to customize song features

Resources and Tools

Language: Python

Packages: pandas, scikit-learn, streamlit, numpy

Table of Contents:

I. Data Preprocessing

II. App Building

a. KNN Algorithm

b. App Layout Integration

III. Deployment

I. Data Preprocessing

As mentioned in the overview, we will use the Spotify and Genius Track Dataset from Kaggle. This dataset contains information on thousands of albums, artists, and songs collected from the Spotify platform through its API.

The dataset contains three csv files: spotify_artists.csv, spotify_album.csv, and spotify_tracks.csv. We want to create a joint dataset that consists of genre information, release year, and audio features for each song. These features will be inputs for our recommendation system.

Overall, we will take the following steps to preprocess the data:

Join the three original datasets: spotify_artists.csv, spotify_album.csv, and spotify_tracks.csv
Clean genres column in the joint dataset to extract genres for each song
Retain only songs released after 1990
Clean uri column in the joint dataset to later include song covers in the app
Export preprocessed dataset to csv file for further app building
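
The steps above can be sketched on toy data as follows. This is only an illustration, not the notebook's code: the column names (`artist_id`, `album_id`, `release_date`), the tiny in-memory tables, and the idea that "cleaning" the uri means stripping the `spotify:track:` prefix are all assumptions. The genres column is left as a string here because the app parses it at load time.

```python
import pandas as pd

# Toy stand-ins for the three CSV files; every column name here is an assumption.
artists = pd.DataFrame({
    "artist_id": ["a1", "a2"],
    "genres": ["['pop', 'dance pop']", "['rock']"],
})
albums = pd.DataFrame({
    "album_id": ["b1", "b2"],
    "release_date": ["2016-05-20", "1985-01-01"],
})
tracks = pd.DataFrame({
    "uri": ["spotify:track:111", "spotify:track:222"],
    "artist_id": ["a1", "a2"],
    "album_id": ["b1", "b2"],
    "energy": [0.8, 0.6],
})

# 1. Join each track with its album and its artist.
df = tracks.merge(albums, on="album_id").merge(artists, on="artist_id")

# 2. Derive the release year and drop older songs.
#    Keeping 1990 itself is an assumption (the app's year slider starts at 1990).
df["release_year"] = pd.to_datetime(df["release_date"]).dt.year
df = df[df["release_year"] >= 1990].copy()

# 3. Reduce "spotify:track:<id>" to the bare id that the embed URL needs.
df["uri"] = df["uri"].str.split(":").str[-1]

# 4. Export for the app (hypothetical path, commented out in this sketch).
# df.to_csv("data/filtered_track_df.csv", index=False)
```

In this toy run only the 2016 track survives the year filter, and its uri becomes `111`.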

Detailed preprocessing code can be found in this notebook.

II. App Building

With the preprocessed dataset, we will start building our main application. We will use Streamlit, a Python web framework for building web apps quickly and efficiently. Check out its documentation here!

We will create a Python script file (music_rec_app.py) that contains the data loading, the k-NN model, and the frontend layout code.

First, let's install and import the necessary libraries, then load our data.

Install/Import Libraries

# Run once in a terminal, not inside the script:
# pip install streamlit pandas scikit-learn

import streamlit as st
st.set_page_config(page_title="Song Recommendation", layout="wide")

import pandas as pd
from sklearn.neighbors import NearestNeighbors
import streamlit.components.v1 as components

Load Data

We will define a function to load the preprocessed dataset and decorate it with Streamlit's built-in @st.cache_data, which caches the returned DataFrame so the CSV file is not re-read on every rerun.

@st.cache_data()
def load_data():
    df = pd.read_csv("data/filtered_track_df.csv")
    df['genres'] = df.genres.apply(lambda x: [i[1:-1] for i in str(x)[1:-1].split(", ")])
    exploded_track_df = df.explode("genres")
    return exploded_track_df
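
To see what the genres line in load_data does, here is the same expression applied to a single value. It assumes each genres cell is stored as the string form of a Python list, such as `"['pop', 'dance pop']"`; the helper name `parse_genres` is only for this illustration.

```python
def parse_genres(raw):
    # Same expression as in load_data: drop the outer brackets, split on ", ",
    # then drop the quote character on each side of every item.
    return [item[1:-1] for item in str(raw)[1:-1].split(", ")]

parsed = parse_genres("['pop', 'dance pop']")
empty = parse_genres("[]")
```

Here `parsed` is `['pop', 'dance pop']`, and `explode("genres")` then turns such a list into one row per genre. Note that an empty list `"[]"` becomes `['']`, so songs without genres end up with an empty-string genre that never matches a selection.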

We will also define the list of genres users can choose from and the audio features used by the model. The data is then loaded with the function defined above.

genre_names = ['Dance Pop', 'Electronic', 'Electropop', 'Hip Hop', 'Jazz', 'K-pop', 'Latin', 
               'Pop', 'Pop Rap', 'R&B', 'Rock']
audio_feats = ["acousticness", "danceability", "energy", "instrumentalness", "valence", "tempo"]

#Load data
exploded_track_df = load_data()

a. KNN Algorithm

With the data loaded, we can build the machine learning model that recommends songs. We will use a k-Nearest Neighbors model to find the songs closest in distance to the set of feature inputs selected by the user. These inputs are the genre of interest, the release year range (start and end year), and a set of audio features (acousticness, danceability, energy, instrumentalness, valence, and tempo). Valence is labeled "Positiveness" in the app.

We will define a function that uses Scikit-learn's NearestNeighbors to return the Spotify URIs and audio feature values of the neighbors, ordered from nearest to farthest.

def n_neighbors_uri_audio(genre, start_year, end_year, test_feat):
    # Genres are compared in lowercase
    genre = genre.lower()
    # Keep songs of the selected genre released within the selected year range (inclusive)
    in_range = exploded_track_df["release_year"].between(start_year, end_year)
    genre_data = exploded_track_df[(exploded_track_df["genres"] == genre) & in_range]
    # Use only the 500 most popular songs
    genre_data = genre_data.sort_values(by='popularity', ascending=False)[:500]

    # Nothing matches the filters: return empty results instead of failing in fit()
    if genre_data.empty:
        return [], []

    neigh = NearestNeighbors()
    neigh.fit(genre_data[audio_feats].to_numpy())

    # Rank every remaining song from nearest to farthest from the user's feature vector
    n_neighbors = neigh.kneighbors([test_feat], n_neighbors=len(genre_data), return_distance=False)[0]

    uris = genre_data.iloc[n_neighbors]["uri"].tolist()
    audios = genre_data.iloc[n_neighbors][audio_feats].to_numpy()
    return uris, audios
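
To make the ranking step concrete, here is a minimal sketch on made-up data. The three "songs" and their two features are invented for illustration; only the NearestNeighbors calls mirror the function above.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Three toy songs described by two made-up features in [0, 1].
songs = np.array([
    [0.9, 0.8],  # song 0
    [0.1, 0.2],  # song 1
    [0.5, 0.5],  # song 2
])
query = [0.2, 0.2]  # stands in for the user's slider values

neigh = NearestNeighbors()  # default metric is Euclidean distance
neigh.fit(songs)

# Ask for all songs, so the result is a full ranking from nearest to farthest.
order = neigh.kneighbors([query], n_neighbors=len(songs), return_distance=False)[0]
```

Here `order` is `[1, 2, 0]`: song 1 is nearest to the query and song 0 is farthest. One thing to keep in mind is that in the app tempo ranges up to 244 while the other features lie between 0 and 1, so with unscaled Euclidean distance differences in tempo weigh much more heavily. Rescaling the features before fitting is one possible refinement.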

b. App Layout Integration

We now can build the UI layout for our app! For this app, I'm aiming for a dashboard to allow the user to customize the songs they want to listen to. Below is a snapshot of the end product!

Let's jump into the details!

First, we will build the framework of the app, including the page title, multiple sliders and buttons so that users can customize their preferences. I will also add a sidebar with relevant instructions to use the app and links to my other exciting projects :)

st.title("Personalized Song Recommendations")

st.sidebar.title("Music Recommender App")
st.sidebar.header("Welcome!")
st.sidebar.markdown("Discover your soon-to-be favorite songs by selecting genres and audio features.")
st.sidebar.markdown("Tips: Play around with different settings and listen to song previews to test the system!")

# Add buttons to the sidebar
if st.sidebar.button("Check out my other projects"):
    st.sidebar.markdown("[https://hahoangpro.wixsite.com/datascience]")
if st.sidebar.button("Connect with me on LinkedIn"):
    st.sidebar.markdown("[https://www.linkedin.com/in/ha-hoang-86a80814a/]")

with st.container():
    col1, col2, col3, col4 = st.columns((2, 0.5, 1, 0.5))
    with col3:
        st.markdown("***Select genre:***")
        genre = st.radio(
            "",
            genre_names, index=genre_names.index("Pop"))
    with col1:
        st.markdown("***Select features to customize:***")
        start_year, end_year = st.slider('Select year range', 1990, 2019, (2015, 2019))
        acousticness = st.slider('Acousticness', 0.0, 1.0, 0.5)
        danceability = st.slider('Danceability', 0.0, 1.0, 0.5)
        energy = st.slider('Energy', 0.0, 1.0, 0.5)
        valence = st.slider('Positiveness (Valence)', 0.0, 1.0, 0.45)
        instrumentalness = st.slider('Instrumentalness', 0.0, 1.0, 0.0)
        tempo = st.slider('Tempo', 0.0, 244.0, 118.0)

The app shows 6 songs at a time for each set of inputs. We also want to display the covers of the recommended songs. With the Spotify Developer Widget, each track can be embedded as an HTML iframe.

tracks_per_page = 6
test_feat = [acousticness, danceability, energy, instrumentalness, valence, tempo]
uris, audios = n_neighbors_uri_audio(genre, start_year, end_year, test_feat)

# Use Spotify Developer Widget to display iframe with classic HTML
tracks = []
for uri in uris:
    track = """<iframe src="https://open.spotify.com/embed/track/{}" width="260" height="380" frameborder="0" allowtransparency="true" allow="encrypted-media"></iframe>""".format(uri)
    tracks.append(track)

We also want to offer users more than just 6 songs, so we will add a "Recommend More Songs" button backed by Streamlit's Session State. The idea is to check whether the user has changed any of the inputs. If so, the app shows the top 6 songs of the ranked neighbors list. If the user presses "Recommend More Songs" without changing the inputs, the next 6 songs in the list are shown.

current_inputs = [genre, start_year, end_year] + test_feat

try:
    previous_inputs = st.session_state['previous_inputs']
except KeyError:
    previous_inputs = None

if current_inputs != previous_inputs:
    st.session_state['start_track_i'] = 0
    st.session_state['previous_inputs'] = current_inputs
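
The reset-or-advance logic can be tried outside Streamlit with a small sketch. Here a plain dict stands in for st.session_state, and the function name `update_start_index` and its `more_clicked` flag (playing the role of the button's return value) are hypothetical names used only for this illustration.

```python
def update_start_index(state, current_inputs, more_clicked, n_tracks, per_page=6):
    """Return the index of the first track to show; `state` stands in for st.session_state."""
    if state.get("previous_inputs") != current_inputs:
        # Inputs changed (or first run): restart from the top of the ranked list.
        state["start_track_i"] = 0
        state["previous_inputs"] = current_inputs
    if more_clicked and state["start_track_i"] < n_tracks:
        # Button pressed and songs remain: move on to the next page.
        state["start_track_i"] += per_page
    return state["start_track_i"]


state = {}
inputs = ["Pop", 2015, 2019, 0.5]
first = update_start_index(state, inputs, more_clicked=False, n_tracks=20)
second = update_start_index(state, inputs, more_clicked=True, n_tracks=20)
third = update_start_index(state, ["Rock", 2015, 2019, 0.5], more_clicked=False, n_tracks=20)
```

The first call starts at index 0, pressing the button with unchanged inputs moves to index 6, and changing the genre resets the index to 0.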

Now we can display the songs.

if 'start_track_i' not in st.session_state:
    st.session_state['start_track_i'] = 0

# Add "Recommend More Songs" button
if st.button("Recommend More Songs"):
    if st.session_state['start_track_i'] < len(tracks):
        st.session_state['start_track_i'] += tracks_per_page  # Show the next 6 songs

with st.container():
    cols = st.columns(3)  # 3 columns; 6 songs fill a 2x3 grid

current_tracks = tracks[st.session_state['start_track_i']: st.session_state['start_track_i'] + tracks_per_page]
current_audios = audios[st.session_state['start_track_i']: st.session_state['start_track_i'] + tracks_per_page]

# Place songs left to right: song i goes into column i % 3
for i, (track, audio) in enumerate(zip(current_tracks, current_audios)):
    with cols[i % 3]:
        components.html(
            track,
            height=400,
        )

if st.session_state['start_track_i'] >= len(tracks):
    st.write("No more songs to recommend")
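
The slicing and column placement can be checked without Streamlit. In this sketch plain lists stand in for the three columns, and the helper name `page_layout` is hypothetical.

```python
def page_layout(items, start, per_page=6, n_cols=3):
    """Slice one page out of `items` and spread it over `n_cols` columns, left to right."""
    page = items[start:start + per_page]
    columns = [[] for _ in range(n_cols)]
    for i, item in enumerate(page):
        columns[i % n_cols].append(item)  # item i goes into column i % n_cols
    return columns


items = ["t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7"]
first_page = page_layout(items, 0)
second_page = page_layout(items, 6)
past_end = page_layout(items, 12)
```

With 8 items, the first page fills two rows of three, the second page holds the remaining two items, and a start index past the end yields three empty columns, which is the case where the app shows "No more songs to recommend".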

Below is a demo video showing how the app works!

The entire code for the app can be found here!

III. Deployment

Now that our app is up and running, we can deploy it to share with others. There are many options available, but I personally find deploying on Streamlit Cloud very simple. With just a few clicks, the app is deployed and others can access it through the generated link.

The code shown throughout this article is available in this GitHub repo!