Analyzing Netease Music - Part I: Playlist

Martin Liu, Jun 11

Figure: Netease Music logo

Netease Music (https://music.163.com/), a Chinese equivalent of Spotify, is a music app that allows users to stream, download, and engage with music from all over the globe.

With a monthly active user base of over 70 million, Netease ranks among the top four Chinese music apps, along with Kugou Music, Kuwo Music, and QQ Music.

It is also the only one of the four that is not Tencent-based.

Netease Music’s audience exhibits some interesting characteristics.

Based on the company’s report as well as data from QuestMobile, the app’s users are younger, more likely to be male, and engage with the community frequently.

Not only do users open and use the app more often than users of competing apps, but they also tend to leave a lot of comments.

In fact, the comments in Netease Music have become one of its major attractions.

People post funny jokes, heartbreak stories, or even confess their feelings to a secret crush and ask other listeners to thumbs-up the comment so that he/she will see it.

In one of Netease Music’s promotional campaigns, it published the comments with the highest number of thumbs-ups in the metro stations and trains in Hangzhou.

This creative campaign was a huge success for the app and further established it as the music app with the strongest “social” attribute.

Figure: The comment promotional campaign

As with its Western counterparts, most listeners use playlists as a way to store their favorite tunes as well as discover new music.

In Netease Music, users can “play” the list, “fav” the list to listen to it in the future, “comment” on the playlist, and “share” the playlist either within the app or to WeChat/Weibo.

In this article, I will be looking at some of the metrics of the playlists themselves.

In the next few articles, I will also explore the metrics on individual songs and the comments of those songs.

How to get the data

All data are publicly available on the website and were gathered using a crawler I built myself.

The crawler basically follows the logic of discovering playlists from the homepage, discovering songs from playlists, and then downloading the comments from the playlists and songs.
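The discovery order described above (homepage, then playlists, then songs, then comments) can be sketched as a simple traversal. This is only an illustrative sketch, not the author's crawler: the three fetcher callables and the in-memory FAKE_SITE data are hypothetical stand-ins for the HTTP requests and page parsing a real crawler would need.

```python
from typing import Callable, Dict, Iterable, List


def crawl(
    list_playlists: Callable[[], Iterable[str]],
    list_songs: Callable[[str], Iterable[str]],
    list_comments: Callable[[str], List[str]],
) -> Dict[str, Dict]:
    """Walk homepage -> playlists -> songs and collect comments for both.

    The three callables are hypothetical fetchers supplied by the caller.
    """
    playlists: Dict[str, Dict] = {}
    song_comments: Dict[str, List[str]] = {}

    for playlist_id in list_playlists():
        if playlist_id in playlists:
            continue  # skip playlists that were already discovered
        song_ids = list(list_songs(playlist_id))
        playlists[playlist_id] = {
            "songs": song_ids,
            "comments": list_comments(playlist_id),
        }
        for song_id in song_ids:
            # A song can appear in many playlists; fetch its comments once.
            if song_id not in song_comments:
                song_comments[song_id] = list_comments(song_id)

    return {"playlists": playlists, "song_comments": song_comments}


# Tiny in-memory stand-in for the website, for demonstration only.
FAKE_SITE = {"p1": ["s1", "s2"], "p2": ["s2", "s3"]}
FAKE_COMMENTS = {"p1": ["nice list"], "s2": ["great song"]}

result = crawl(
    list_playlists=lambda: ["p1", "p2", "p1"],
    list_songs=lambda playlist_id: FAKE_SITE[playlist_id],
    list_comments=lambda item_id: FAKE_COMMENTS.get(item_id, []),
)
print(sorted(result["playlists"]))
print(sorted(result["song_comments"]))
```

Passing the fetchers in as arguments keeps the traversal logic separate from the network code, so the same loop can be exercised against fake data like this before pointing it at a real site.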

The crawler is still being actively developed.

If you are interested in learning how to crawl, please leave a comment below and I will write an article on the crawling part.

Data Size

The analysis conducted below used a sample of 6,000 playlists.

These are either recommended on the homepage or related to those top playlists, so they tend to have a higher number of engagements.

Explore numeric variables

First, we will look at the numeric variables in the dataset.

As explained previously, the engagement metrics are “fav”, “comments”, “share”, and “play”.

The dataset also has a “song_num” variable, which describes the number of songs in a playlist.

Since the default Pandas describe() returns floats and leaves many trailing 0s in the table, we can make it prettier by using the applymap() function to change the format of the output.

# applymap() is named DataFrame.map() in pandas 2.1 and later
df[['comment_num', 'fav_num', 'share_num', 'play_num', 'song_num']].describe().applymap(lambda x: format(x, '.0f'))

Figure: describe() result

As we can see from here, the median number of comments and of shares is 10.

For “fav” the median is 663, and for plays it is about 36k.

Roughly speaking, going by the medians, a playlist needs to be played about 3,600 times (36k plays per 10 shares or comments) to get a single “share” or “comment”.

This shows that the comment and share functions are not that frequently used.

Note that the “comment” here applies to the entire playlist and is different from commenting on an individual song, which is much more popular.
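The plays-per-share estimate can be reproduced directly from the medians. The sketch below uses a few synthetic rows (not the crawled data) whose medians were chosen to match the figures quoted above, and it formats the summary with round() and astype() as one possible alternative to applymap().

```python
import pandas as pd

# Synthetic rows for illustration only; these are not the crawled data.
df = pd.DataFrame(
    {
        "comment_num": [2, 10, 45],
        "fav_num": [120, 663, 9000],
        "share_num": [1, 10, 80],
        "play_num": [5000, 36000, 900000],
        "song_num": [15, 40, 120],
    }
)
cols = ["comment_num", "fav_num", "share_num", "play_num", "song_num"]

# Round the summary and cast to integers so the table has no trailing zeros.
summary = df[cols].describe().round(0).astype("int64")
print(summary)

# Ratio of median plays to median shares and comments.
medians = df[cols].median()
plays_per_share = medians["play_num"] / medians["share_num"]
plays_per_comment = medians["play_num"] / medians["comment_num"]
print(plays_per_share, plays_per_comment)
```

A ratio of medians is only a rough indicator; it says nothing about how plays and shares relate within any single playlist.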

To show these metrics visually, we can draw a boxplot of the columns.

However, since the data in all four columns are massively skewed, plotting them directly does not yield a good result.

The graph below doesn’t really tell us anything, since the value axis spans a very large range and hides the details of the “long-tail” playlists.

import seaborn as sns
sns.boxplot(data=df[['comment_num', 'fav_num', 'share_num', 'play_num']], orient='v')

Figure: Boxplot without normalization

To reduce this skew, we can use a quick-and-dirty trick: apply a logarithm to the values.

Read this article if you want more details: https://becominghuman.ai/how-to-deal-with-skewed-dataset-in-machine-learning-afd2928011cc

import math
sns.boxplot(data=df[['comment_num', 'fav_num', 'share_num', 'play_num']].applymap(lambda x: math.log(x + 1)), orient='v')

Figure: Boxplot with normalization

After the log transformation, we can see that all four metrics follow a roughly normal distribution.
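One way to check the effect of the log transform numerically, rather than by eye, is to compare skewness before and after. The sketch below is an illustration on synthetic heavy-tailed counts (not the crawled data) and uses NumPy's log1p, which computes log(1 + x) in a single vectorized call.

```python
import numpy as np
import pandas as pd

# Synthetic heavy-tailed counts for illustration; not the crawled data.
rng = np.random.default_rng(0)
play_num = pd.Series(np.floor(rng.lognormal(mean=10, sigma=2, size=5000)))

# np.log1p(x) computes log(1 + x), so a count of zero maps to 0 instead of -inf.
log_play = np.log1p(play_num)

# A skewness near 0 indicates a roughly symmetric distribution.
print("skew before:", play_num.skew())
print("skew after: ", log_play.skew())
```

The same call works on a whole DataFrame of counts, so it can replace the element-wise lambda when the goal is simply log(x + 1).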

Correlation between the variables

Do playlists with a high number of plays tend to have a high number of comments as well? A positive correlation seems obvious, but it is still interesting to see how the numbers turn out.

corr = df[['comment_num', 'fav_num', 'share_num', 'play_num']].corr()
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, cmap='Blues', annot=True)

We can use the pandas corr() function to get the correlation figures and seaborn to plot them.

Note that you can use the cmap parameter to change the colors of the graph.

For a correlation heatmap, a single-hue color map is more effective since it makes the magnitudes easier to compare.
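corr() computes the Pearson correlation by default, which on heavily skewed counts can be dominated by a few very large playlists. As a cross-check (this was not part of the original analysis), one could also correlate the log-transformed values or the ranks; the latter is the Spearman rank correlation. The data below is synthetic and only illustrates the calls.

```python
import numpy as np
import pandas as pd

# Synthetic, heavy-tailed engagement counts for illustration only.
rng = np.random.default_rng(1)
n = 2000
play = rng.lognormal(mean=10, sigma=2, size=n)
fav = play * rng.lognormal(mean=-4, sigma=0.5, size=n)
share = play * rng.lognormal(mean=-8, sigma=1.5, size=n)
demo = pd.DataFrame(
    {
        "play_num": np.floor(play),
        "fav_num": np.floor(fav),
        "share_num": np.floor(share),
    }
)

pearson_raw = demo.corr()            # Pearson on raw counts (the default)
pearson_log = np.log1p(demo).corr()  # Pearson on log-transformed counts
spearman = demo.rank().corr()        # Pearson on ranks = Spearman correlation

print(pearson_raw.round(2))
print(pearson_log.round(2))
print(spearman.round(2))
```

Because ranks are unchanged by any monotonic transform, the Spearman matrix is the same whether or not the log is applied first, which makes it a convenient sanity check on skewed data.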

Interestingly enough, play number and fav number have the strongest correlation.

This makes sense, since a large number of people need to listen to a playlist before they can save it, and faving it means that they intend to listen to it again in the future.

Comments have a weaker correlation with the other metrics.

Since comment and share numbers are much smaller than the other two, those metrics likely contain a higher proportion of “noise”, which results in a weaker correlation.

Share number has an even weaker correlation with the other metrics.

Although it is still a strong positive correlation, the relationship is not nearly as deterministic.

Maybe one reason is that sharing a playlist means something different than playing it or faving it.

By playing or faving, a listener is engaging with the music individually.

By commenting on a playlist, the listener is engaging with the author and other people who like the playlist.

By sharing a playlist, however, the listener is engaging with the music socially and committing to recommend the list to their friends.

Maybe a guy would share a classic rock playlist during the day and loop “I Knew You Were Trouble” at night?

Explore other variables

Another interesting column we can look at is “tag_list”.

When a user creates a playlist, the user can choose at most 3 tags for it.

The tags include languages, genre, suitable occasions, emotions, and themes.

For those of you who do not know Chinese, I have created a translation file to map those tags to English.

Figure: Tag selection page in the Netease Music app

After expanding the tag list into a list of labeled tags, we can then plot the counts for each of those labels.
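A minimal sketch of the expansion step, assuming each row stores its tags as a Python list in “tag_list”. The sample rows, the translation map, and the category map below are hypothetical examples; the author's actual translation file and data layout are not shown in the article.

```python
import pandas as pd

# Hypothetical sample rows; each playlist has at most three tags.
df_tags = pd.DataFrame(
    {
        "playlist_id": [1, 2, 3],
        "tag_list": [["华语", "流行"], ["欧美", "流行", "夜晚"], ["日语"]],
    }
)

# Hypothetical translation and category maps, for illustration only.
tag_to_english = {
    "华语": "Chinese", "欧美": "Western", "日语": "Japanese",
    "流行": "Pop", "夜晚": "Night",
}
tag_to_category = {
    "Chinese": "language", "Western": "language", "Japanese": "language",
    "Pop": "genre", "Night": "occasion",
}

# explode() turns each list element into its own row.
tags_long = df_tags.explode("tag_list").rename(columns={"tag_list": "tag"})
tags_long["tag"] = tags_long["tag"].map(tag_to_english)
tags_long["category"] = tags_long["tag"].map(tag_to_category)

# Count how often each tag appears within its category.
counts = tags_long.groupby("category")["tag"].value_counts()
print(counts)
```

The resulting per-category counts are what a count plot for language, genre, or occasion would display.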

Although the sample size is quite small, the data may be able to show some interesting trends in the music preferences of Netease Music’s listeners.

Language

Figure: Language count plot

In terms of language, Chinese music is still the most dominant category, but Western music (mostly English) follows closely behind.

Japanese also takes up a decent portion of playlists.

Genre

Figure: Genre count plot

For genres, besides “World Music”, which is rather hard to define precisely and is probably chosen because people don’t know what else to choose, Pop is the most popular category.

“New Age” and “Light Music”, both relaxing genres, take third and fourth place.

This may come as a surprise, since those genres are not nearly as popular in the Western world.

EDM and Rap, despite their recent popularity among the younger generation, still take up a relatively small percentage of the playlists.

Occasion

Figure: Occasion count plot

For occasion, night and lunch break are the two dominant tags.

This is in line with the “light music” trend we identified above, since people tend to listen to more “soothing” music during those two time periods.

It is also worth noting that creators of “Pop” lists may not select an “occasion” tag at all.

Emotion

Figure: Emotion count plot

Figure: Theme count plot

The top two emotions are “nostalgia” and “happy”.

Compared to the low number of “sad” and “lonely” lists, it seems that people are more inclined to make playlists about positive feelings.

For theme, since the numbers are quite small, it is hard to identify a trend.

It is worth noting that animation and gaming are strongly represented, showing that the “anime” culture in China is still thriving.

Overall, Netease Music listeners have a “lighter” taste than listeners in the Western world.

The audience as a whole has embraced a diverse set of music.

Fun Facts

The most played playlist has 148 million plays; it is a collection of English songs with the best intros.

The first song in the list is “Five Hundred Miles”.

The playlist with the most comments (10k) is a collection of soundtracks from Chinese cartoons of the 90s.

Okay, that’s it for this one.

In the next article, I will explore some machine learning methods on the playlist data, trying to predict the category based on text data and song data.
