Recently, I am working on a personal project related to analysing music taste of people and it requires retrieving data from some music sources like Spotify. And fortunely, it provides some quite useful APIs that I would like to share here.
Disclaimer
This is only a part of my project, with some demo codes without unit testing, error handling and other parts for a completed project.
What I want
Basically, I need a list of Playlists from Spotify including meta data of each playlist like tracks from playlist. So, technically speaking:
My input:
- User authentication token (Spotify APIs doesn’t requires it, but I do want to use because it give me higher rate limits)
And I expect an output like:
- JSON data with Playlist info, including metadata as tracks in each playlist.
Okay, I pictured my goals and it’s time to get things done.
Let’s start
Simply enough, I’ll start with the docs from official page.
We need to create Spotify account first that can be done easily from https://www.spotify.com
Then you need to create developer account to use APIs https://developer.spotify.com/dashboard/login
And last but not least, create an app from the dashboard https://developer.spotify.com/dashboard/applications to get client id and client secret
Install Nodejs because we’ll need a server-side application.
Clone the repo https://github.com/spotify/web-api-auth-examples that contains the authentication example app which provides a interface to register Spotify user account to our Spotify app.
$ git clone https://github.com/spotify/web-api-auth-examples auth-app
Then run npm to install required packages like express,request and querystring.
$ cd auth-app
$ npm install
After install, we will see a new folder node_modules
that contains all the installed modules.
Now go into the code of this auth-app
$ cd authorization_code
There are 2 things inside, the app.js
is the main code of the application, folder public
contains
a html
file
Inside app.js
there are 3 things you should care about:
- client_id: Client Id of your app. (After create a new app as above, in the dashboard)
- client_secret: Client Secret of your app.
- redirect_uri: the url we will redirect after authenticating successfully.
The port of server is set to be 8888
, so I will set redirect_uri base on it
var redirect_uri = 'http://localhost:8888/callback/';
You also need to whitelist your redirect_uri in the settings of your app.
Now go to http://localhost:8888
and follow all steps to login and you should be
redirect to this after successfully login.
Working with playlists
Now it’s the fun part. How can I retrieve data of playlists I need?
After reading the documents, I have a brief understand about what I can do with this api.
There are 3 ways to get info about playlists:
- Using API
/v1/users/{user_id}/playlists
to acquire playlists of a specific user_id - Using API
/v1/search
to search and return playlists with a specific query keyword - Using API
/v1/browse/categories/{category_id}/playlists
to retrieve playlists from a specific category
I want to use the third method, because the playlists I’m interested in should be distinguished by category instead of user, or query keywords.
There are many categories in spotify, and each category contains many playlists, each playlist has many tracks too. But I don’t want to get all playlists from each category, so I don’t have to call too many apis.
The expected output is something like this:
For each category, I retrieve 5 playlists
For each playlist, I want all tracks with its name, singer, and other data
[
{
"category": "A",
"playlists": [
{
"name": "First playlist",
"tracks": [
{
"track_name": "Song 1",
"singer": "B"
}
]
}
]
}
]
To be able to manipulate data conveniently, I would like to save category, playlists, and tracks in database.
I’ll use mongodb because it’s the most familiar database to me. You can use any of these
First, copy the access token in the login page above and assign it to new
variable in app.js
.
var acc_token = '<your access token>'
Then install mongoose package
npm install mongoose
Mongoose is a mongodb object modeling tool designed to work in an asynchronous environment or
in this case, nodejs.
Add mongoose
var mongoose = require('mongoose');
// Connect mongoose to our database named spotify.
mongoose.connect('mongodb://127.0.0.1/spotify', {useNewUrlParser: true});
// useNewUrlParser to fix a DeprecationWarning
// Declare schema
const Schema = mongoose.Schema;
Design database
We obviously care 3 objects here: category, playlist and track, but how to design our database to store and manipulate data effectively?
There are 2 ways to design our model.
The first one is store all 3 objects above into 1 Schema. I can call it Category_Detail model.
The advantage is whenever I want to retrieve data from db, I only have to access one model,
no need to use reference fields.
The disadvantage is, as I considered, bigger because I will need to store my playlists in each Category_Detail as an array which makes thing hard to update, and with each playlist I need another array for tracks, so 2 nested arrays are too complicated for a model like this.
Thus, I decide to use the second way to design my model
3 seperated models as following category, playlist and track. Playlist will contains
1 reference field that refers to category, and track will contains 1 reference field refers to
playlist that it belongs to.
const CategorySchema = new Schema({
cat_name: String,
cat_id: String,
});
const Category = mongoose.model('Category', CategorySchema);
const PlaylistSchema = new Schema({
category: {type: mongoose.Schema.Types.ObjectId, ref: 'Category'},
playlist_name: String,
playlist_id: String,
});
const Playlist = mongoose.model('Playlist', PlaylistSchema);
const TrackSchema = new Schema({
playlist: {type: mongoose.Schema.Types.ObjectId, ref: 'Playlist'},
track_name: String,
track_id: String,
artists: Array,
images: Array,
playlist_name: String, // This field for convenient data analysis later
playlist_id: String, // This field for convenient data analysis later
});
const Track = mongoose.model('Track', TrackSchema);
Get category
Now, we can define a new route to get category
app.get('/get-category', (req, res) => {
var options = {
url: 'https://api.spotify.com/v1/browse/categories',
headers: {
'Authorization': 'Bearer ' + acc_token
},
qs: {
'limit': 50,
'country': 'VN'
},
json: true
};
request.get(options, (error, response, body) => {
res.send({
'data': body
})
});
})
Now go to http://localhost:8888/get-category
, we’ll get the response as
I only need category id to query playlists for each category. So we should change the code a bit.
// Define a function that store category
var save_category_to_db = function(data){
Category.count({cat_id: data.id}, function (err, count){
if (count === 0){ // Only store category that's not existed in database yet
const cat = new Category();
cat.cat_id = data.id;
cat.cat_name = data.name;
cat.save()
}
});
}
// Define a route /get-category that call Api browse categories
app.get('/get-category', (req, res) => {
var options = {
url: 'https://api.spotify.com/v1/browse/categories',
headers: {
'Authorization': 'Bearer ' + acc_token
},
qs: {
'limit': 50,
'country': 'VN'
},
json: true
};
request.get(options, function (error, response, body) {
// Store category to db
body.categories.items.forEach((val) => {
save_category_to_db(val)
})
// Response
res.send({
'data': 'Success'
})
});
})
and in db, we have our categories
[
{
"_id" : "5d52b6f2de0da87f6ec98e7f",
"cat_id" : "toplists",
"cat_name" : "Top Lists"
},
{
"_id" : "5d52b6f2de0da87f6ec98e80",
"cat_id" : "pride",
"cat_name" : "Pride"
},
{
"_id" : "5d52b6f2de0da87f6ec98e82",
"cat_id" : "kpop",
"cat_name" : "K-Pop"
}
]
Get playlist
For each category, I want to call Api once to get list of playlists I want. The problem is nodejs is asynchronous, and after we make a request to query toplists category, nodejs processes will continue make request to query kpop before previous request responses. To solve that problem, we need to make sure all the requests must response completely before we response to client.
The logic flow can be understood as below
Make request GET /https://api.spotify.com/v1/browse/categories/toplists/playlists
to get all playlists of toplists
Make request GET /https://api.spotify.com/v1/browse/categories/kpop/playlists
to get all playlists of kpop
...
After all requests above response successfully, save those playlists to db, then response to client
To do all of these above, we’ll use Promises or in this case bluebird package
npm install bluebird
npm install request-promise
Use bluebird and request-promise
var Bluebird = require('bluebird');
var rp = require('request-promise');
var save_playlist_to_db = function(data){
var cat_id = data.href.split('/')[6]; // Extract cat id from href in response
Category.findOne({cat_id: cat_id}, function(err, doc){
if(doc){
data.items.forEach(val => {
Playlist.count({playlist_id: val.id}, function (err, count){
if (count === 0){
const playlist = new Playlist();
playlist.playlist_name = val.name;
playlist.playlist_id = val.id;
playlist.category = doc;
playlist.save()
}
});
})
}
})
}
app.get('/get-playlist', (req, res) => {
// Get all categories in db
Category.find({}, function(err, playlists){
var playlist_requests = playlists.map((val) => {
// For each category, make promise
return rp({
url: 'https://api.spotify.com/v1/browse/categories/' + val.cat_id + '/playlists',
headers: {
'Authorization': 'Bearer ' + acc_token
},
qs: {
'limit': 50,
'country': 'VN'
},
json: true
})
})
Bluebird.all(playlist_requests).then(response => {
// After response completed, store playlist in db
response.map((value) => {
save_playlist_to_db(value.playlists)
})
res.send({
'data': response
})
})
})
})
And we get result like below, with name, id, category of all playlists
[
{
"_id" : "5d52bf2aae62ab8263811977",
"playlist_name" : "Viral Hits",
"playlist_id" : "37i9dQZF1DX44t7uCdkV1A",
"category" : "5d52b6f2de0da87f6ec98e7f"
},
{
"_id" : "5d52bf2aae62ab8263811978",
"playlist_name" : "Viral 50 Việt Nam",
"playlist_id" : "37i9dQZEVXbL1G1MbPav3j",
"category" : "5d52b6f2de0da87f6ec98e7f"
},
{
"_id" : "5d52bf2aae62ab826381197c",
"playlist_name" : "New Music Friday Malaysia",
"playlist_id" : "37i9dQZF1DWZMWLrh2UzwC",
"category" : "5d52b6f2de0da87f6ec98e7f"
}
]
Get all tracks in each playlist
Now we have playlists
[
{
"id": "37i9dQZF1DXcBWIGoYBM5M",
"name": "Today's Top Hits"
},
{
"id": "37i9dQZF1DX44t7uCdkV1A",
"name": "Viral Hits"
}
]
And if we make request like GET https://api.spotify.com/v1/playlists/37i9dQZF1DX44t7uCdkV1A/tracks
we
will be able to retrieve list of tracks in that playlist.
But once again, nodejs is asynchronous and there are too many playlists we need to get tracks, we’ll get Apis limit rate error.
So we need to make seperated requests to avoid the error above. I make new route
app.get('/get-tracks', (req, res) => {
get_tracks(function(response){
// get_tracks function will receive 1 callback to response to client
res.send({
'data': response
})
});
})
The function get_tracks
will do something for us:
- Query all category from database
- From each category, take 2 or 3 playlists
- From each playlist, call api GET tracks from playlist
var get_tracks = function(cb){
Category.find({}, function(err, cats){
var promise_arr = [];
var query_promises = [];
cats.forEach(cat => {
query_promises.push(
Playlist.find({category: cat.id}, null, {limit: 2}).exec() // Only query 2 playlist
// for each category to avoid limit rate error
)
})
// This promise is query promise to make sure we call api after completing all queries
Promise.all(query_promises).then(results => {
results.forEach(playlists => {
// For each playlist, make promise
playlists.forEach(val => {
promise_arr.push(
rp({
url: 'https://api.spotify.com/v1/playlists/' + val.playlist_id + '/tracks',
headers: {
'Authorization': 'Bearer ' + acc_token
},
qs: {
'limit': 100,
'country': 'VN'
},
json: true
})
)
})
})
Bluebird.all(promise_arr).then(response => {
response.forEach(val => {
// After finishing calling API, save response data to database
save_track_to_db(val)
})
}).catch(err => {
console.log(err)
})
// Because query and saving data took too much time, I response to client this msg
cb('Processing')
})
})
}
After calling API completely, we save the response data to track database
var save_track_to_db = function(data){
var playlist_id = data.href.split('/')[5]
Playlist.findOne({playlist_id: playlist_id}, function(err, pl){
data.items.forEach(val => {
if(val && val.track){
Track.count({track_id: val.track.id, playlist: pl}, function(err, count){
if(count === 0){
const track = new Track();
track.track_id = val.track.id;
track.track_name = val.track.name;
track.playlist = pl;
track.artists = val.track.artists;
track.images = val.track.album.images;
track.playlist_name = pl.playlist_name;
track.playlist_id = pl.playlist_id;
track.save()
console.log('Done save track ' + val.track.name) // Log on terminal
}
})
}
})
})
}
While client receive a response like Processing, in the console terminal, we can see what the node server is doing
As a result now we have track database like this
[
{
"artists" : [
{
"external_urls" : {
"spotify" : "https://open.spotify.com/artist/7plUpXSFcSJUZSiZAoXqr1"
},
"href" : "https://api.spotify.com/v1/artists/7plUpXSFcSJUZSiZAoXqr1",
"id" : "7plUpXSFcSJUZSiZAoXqr1",
"name" : "Ximena Sariñana",
"type" : "artist",
"uri" : "spotify:artist:7plUpXSFcSJUZSiZAoXqr1"
}
],
"images" : [
{
"height" : 640,
"url" : "https://i.scdn.co/image/07f28993ffc11c7382b754464e1f1443c4bc79ce",
"width" : 640
},
{
"height" : 300,
"url" : "https://i.scdn.co/image/9944f5b468191add68ce6b6dc406a96275fea294",
"width" : 300
},
{
"height" : 64,
"url" : "https://i.scdn.co/image/91608d3fd627308407f5fbdc73705cd5f98df438",
"width" : 64
}
],
"track_id" : "2vNDq5XAoF4Gl4hfY7aacS",
"track_name" : "¿Qué Tiene?",
"playlist_name" : "Viral Hits",
"playlist_id" : "37i9dQZF1DX44t7uCdkV1A"
},
{
"artists" : [
{
"external_urls" : {
"spotify" : "https://open.spotify.com/artist/6y8XlgIV8BLlIg1tT1R10i"
},
"href" : "https://api.spotify.com/v1/artists/6y8XlgIV8BLlIg1tT1R10i",
"id" : "6y8XlgIV8BLlIg1tT1R10i",
"name" : "Old Dominion",
"type" : "artist",
"uri" : "spotify:artist:6y8XlgIV8BLlIg1tT1R10i"
}
],
"images" : [
{
"height" : 640,
"url" : "https://i.scdn.co/image/f7bb9602ffa552a79e04ae047fa6d4cd973b6c6e",
"width" : 640
},
{
"height" : 300,
"url" : "https://i.scdn.co/image/cc2a45aee931af6218740d2bebf74c4bcd720b5e",
"width" : 300
},
{
"height" : 64,
"url" : "https://i.scdn.co/image/df2775a7d2ec9f5f319d28833c1b3d06ec6bb770",
"width" : 64
}
],
"track_id" : "1kTugNMVMbaQep1srMua2q",
"track_name" : "Bad At Love - Recorded at Sound Stage Studios Nashville",
"playlist_name" : "Viral Hits",
"playlist_id" : "37i9dQZF1DX44t7uCdkV1A"
}
]
Conclusion
And we’ve done. I’m now able to retrieve thounsand of songs and playlists from Spotify. I could make a small background tasks run periodically to query new songs, new playlists if I want. After that, when I have enough data, I can conduct some analysis over it to learn new insights about music.
Thank for reading, this is one of my first blogs so it should have many mistakes. Hope you guys enjoy it and give me some feedback (when I enable comment from this website, of course).