Exploratory Data Analysis with Python in B2B Marketing

Exploratory Data Analysis with Python in B2B MarketingMai NguyenBlockedUnblockFollowFollowingMay 26This project focuses on conducting Exploratory Data Analysis (EDA) for B2B Marketing using Python.

Not only providing the method and the code, but I also want to discuss the fundamentals in B2B Marketing.

I also discuss how these insights can help Olist to make a better decision.

All the Python code is provided on my GitHub.

IntroductionOlist is a platform that connects small and medium business with the marketplaces in Brazil (Extra, Shoptime, Walmart, Submarino, .

etc) and helps the merchants to sell at these marketplaces.

Unlike in some countries, the fast-growing e-commerce sector in Brazil is dominated by a number of large-size marketplaces rather than by a few marketplaces (Ex: Amazon and Ebay in the U.



Therefore, a merchant has the incentive to operate in multiple marketplaces to maximize their revenue.

With the limitation in human resources as the main pain point, the merchant would want to use Olist to manage its sales channel in different marketplaces and streamline the stock management and the order fulfillment process.

Olist’s business modelWith this B2B2C model, Olist’s marketing team have 2 main objectives:From the seller side, they want to increase the number of merchants using Olist platform.

From the buyer side, they want to maximize the e-commerce sales of the merchants on these marketplaces.

The scope of this analysis focuses more on the seller side (B2B) of Olist.

With a given dataset, I will conduct the exploratory data analysis (EDA)that can bring some helpful insights into Olist’s B2B marketing team.

My Python code for the EDA is also provided.

You can access the full notebook here.

Marketing & Sales process for B2B:B2B Marketing FunnelData characteristicsThe datasets were obtained from Kaggle.

You can download it here.

The first dataset has 4 variables with 8000 data points.

The variables are MQL_id (marketing qualified leads), its first contact date, the landing page that captured the leads, and the lead origin (the channel that brings the leads to the landing pages).

The second dataset contains 842 observations with 14 variables.

Each observation is a won deal of Olist which consists of the mql_id of the merchant, seller_id (used in Olist platform), sdr_id and sr_id(the sales development representative and sales representative that were in charge of the deal), won_date, business segment, lead_type, lead_behaviour_profile, business_type, declared_monthly_revenue.

Among these, 5 variables have too many Null values that they cannot bring value to the analysis.

Third dataset: Via Kaggle, Olist also donated their dataset about the demand side.

We will use this to inform Olist marketers about the performance of the sellers, thus help them to improve the B2B marketing process.

Exploratory Data AnalysisThe number of MQL: The MQL data is provided from 07/2017 to 06/2018, while the won MQL data is provided from 01/2018 to 12/2018.

In 2017, MQLs that Olist marketing team generated is around 400 MQLs/month.

In 2018, they boost the number to 1000–1400 leads per month.

Regarding the won MQL, its peak was at 05/2018 with 200 won opportunities.

#Figure 1num_mql = mql.


to_datetime(mql[‘first_contact_date’]))num_mql = num_mql.


Grouper(freq = “M”)).


drop(axis =1, columns = [‘first_contact_date’,“landing_page_id”, “origin”])num_won = close.


to_datetime(close[“won_date”]))num_won = num_won.


Grouper(freq = “M”)).


drop(axis = 1, columns = [“seller_id”, “sdr_id”, “sr_id”,“business_segment”, “lead_type”, “lead_behaviour_profile”, “has_company”, “has_gtin”, “average_stock”, “business_type”,“declared_product_catalog_size”, “declared_monthly_revenue”, “won_date”])plt.

figure(figsize = (8,6))plt.


index, num_mql, "-", label = "number of MQL")plt.


index, num_won, "-", label = "number of won MQL")plt.


title("Number of MQL", size = 15)plt.


png")FIGURE 1Channels: The majority of MQLs come from organic_search channel, followed by direct traffic and social.

Other, other_publicities, referral and display are the sources that bring the least MQLs to Olist.

The organic_search MQL is significantly increased in 2018–02 and declined after that.

This can be the result of a big event/PR campaign.

#Figure 2mql_origin = mql.


agg({‘mql_id’:”count”});print(mql_origin)origin = list(mql_origin.


figure(figsize = (10,8))fancy_plot = plt.

subplot()for i in origin: channel = mql[mql[‘origin’] == i] channel = channel.


to_datetime(channel[‘first_contact_date’])) channel_agg = channel.


Grouper(freq = “M”)).


drop(axis = 1, columns =[“first_contact_date”, “landing_page_id”, “origin”]) fancy_plot.


index, channel_agg, “-o”, label = i)fancy_plot.


title(‘Number of MQL by channels overtime’, size = 15)plt.


png”)FIGURE 2The conversion rates for these channels are varied.

Organic_search, paid_search and direct traffic are the sources that enjoy the highest conversion rates (12.

5%, 11.

5%, and 11% respectively).

Email, other_publicities and social have the lowest conversion rates (3%, 5%, 5.

5% respectively).

This means that SEO and Google Adwords are the most effective marketing channels for Olist.

This result seems to be relevant to a couple of surveys illustrating the most effective marketing channel for B2B companies.

Here's An Industry Analysis Of The Most Effective B2B Marketing Channels [Updated w/ 2017 Data]In 2015 we published our inaugural State of Pipeline Marketing report and shared our findings on how pipeline marketers…www.


com#Figure 3origin_lost = data.

groupby([‘origin’, ‘lost’]).


drop(axis = 1, columns =[‘first_contact_date’,’landing_page_id’, “seller_id”, “sdr_id”, “sr_id”, “won_date”, “business_segment”, “lead_type”, “lead_behaviour_profile”, “has_company”, “has_gtin”, “average_stock”, “business_type”, “declared_product_catalog_size”, “declared_monthly_revenue”])percentage = []for i in origin: pct = origin_lost.






loc[False][0]) percentage.


figure(figsize = (6,4))plt.

bar(origin, percentage)plt.

xticks(rotation = 90)plt.

ylabel(‘won rate’)plt.


png”)FIGURE 3: Won rate by channelsLanding PageOlist used 495 landing pages to capture MQLs.

Even though the sales team is responsible for converting an MQL into a won opportunity, the marketing team can affect this likelihood from the top of the funnel by providing relevant messaging and benefits on the landing pages.

From figure 4 above, there are 2 landing pages that have a very high number of MQLs (~800 MQLs) as well as very high won rate (~20%), meaning that 20% of the MQLs captured from these landing pages become Olist’s sellers.

#Figure 4mql_lp = mql.


agg({‘mql_id’:”count”})mql_lp = mql_lp[mql_lp[‘mql_id’] > 30]data_lp = pd.

merge(data, mql_lp, how = “inner”, left_on = “landing_page_id”, right_index = True)lp_lost = data_lp.

groupby([‘landing_page_id’, ‘lost’]).

agg({‘mql_id_x’:”count”})landing_page = list(mql_lp.

index)percentage_lp = []landing_page_2 = []Num_mql = []for i in landing_page: if mql_lp.

loc[i][0] == lp_lost.


loc[True][0]: lp_lost.

drop([i]) else: pct = lp_lost.






loc[False][0]) percentage_lp.

append(pct) landing_page_2.

append(i) Num_mql.


loc[i][0])fig = plt.

figure(figsize = (10,4))ax = fig.

add_subplot(111)ax2 = ax.


bar(landing_page_2, percentage_lp)ax2.

plot(landing_page_2, Num_mql, color = “red”)ax.

set_ylabel(‘won rate’)ax2.

set_ylabel(‘number of MQL’)for tick in ax.

get_xticklabels(): tick.



png”)FIGURE 4: Number of MQLs and won opportunity by landing pagesLearning from these landing pages can help Olist to replicate the success to other landing pages.

Targeting- Business SegmentTargeting is very important in B2B marketing.

There’re segments that will be very interested in Olist and vice versa.

The following graph shows that by using the sales cycle (how long the sales process is) as the indication:#Figure 5data2 = pd.

merge(mql, close, how = “right”, on = “mql_id”)data2[‘first_contact_date’] = pd.

to_datetime(data2[‘first_contact_date’])data2[‘won_date’] = pd.

to_datetime(data2[‘won_date’])data2[‘sales cycle’] = data2[‘won_date’] — data2[‘first_contact_date’]data2[‘sales cycle’] = data2[‘sales cycle’].


dayssegment_time = data2.


agg({“mql_id”:”count”, “sales cycle”:”mean”})fig = plt.

figure(figsize = (10,4))ax = fig.

add_subplot(111)ax2 = ax.



index, segment_time[‘sales cycle’])ax2.


index, segment_time[‘mql_id’], color = “red”)ax.

set_ylabel(‘Sales cycle’)ax2.

set_ylabel(‘number of MQL’)for tick in ax.

get_xticklabels(): tick.

set_rotation(90)FIGURE 5Business segments such as Home_decor, health_beauty, household_utilities, construction tool for house and garden, car accessories and electronics are the majorities of the sellers with the sales cycle typically around 50 days.

Some segments that have a long sales cycle than others are perfume and watches.

However, since these 2 segments only have a small number of MQLs (7 & 10) so Olist should probably wait for more MQLs to have a more accurate observation.

FIGURE 6: Number of MQLs and sales length by lead typeRegarding the Lead_type variable, Online_medium is the most popular type among Olist MQLs.

There is no significant difference between the sales cycle of these business types.

#Figure 7fig = plt.

figure(figsize = (10,4))ax = fig.

add_subplot(111)ax2 = ax.



index, lead_time[‘sales cycle’])ax2.


index, lead_time[‘mql_id’], color = “red”)ax.

set_ylabel(‘Sales cycle’)ax2.

set_ylabel(‘number of MQL’)for tick in ax.

get_xticklabels(): tick.

set_rotation(90)Olist can also base on the sales cycle to evaluate the performance of a Sales Rep.

From Figure 7, It seems that the number of leads an SR has, the shorter the sales cycle.

This can be explained that an experienced sales rep (more MQLs) will perform better (shorter sales length) than a new sales rep.

FIGURE 7: Number of MQLs and sales length by Sales RepsLead BehaviorThe table below is the description of each behavior profile, based on DiSC, a behavior assessment tool developed from DiSC theory of psychologist William Moulton Marston.

segment = pd.

DataFrame(index = segment_time.

index)for i in [‘cat’,’eagle’, ‘wolf’, ‘shark’]: lead_profile = close[close[‘lead_behaviour_profile’] == i] segment = lead_profile.



rename(columns = {‘mql_id’:i}).

merge(segment, how = “right”, left_index = True, right_index = True)segment = segment.


figure(figsize = (6,10))sns.

heatmap(segment, annot = True)FIGURE 848.

3% of lead behavior profiles are cat — steadiness, meaning that this person places emphasis on cooperation, sincerity, and dependability.

Understand the characteristics of the leads can help Olist team to create relevant personas, making it easier for them to craft contents, messaging and stories to the targeted audience.

Business TypeFIGURE 9There are 2 business types: reseller and manufacturer.

Reseller accounts for 69.

7% of total MQLs.

Therefore, there is no surprise when reseller appears the most among the business segments.

However, in the home_decor segment, 56% are manufacturers.

Close deal performanceTime for first ordersFIGURE 10Most sellers have to wait less than 50 days to have the first order from Olist since the day they agree to join Olist platform (figure 10).

Main business segments such as heath_beauty, home_decor, household_utilities, audio_video_electronics,…typically wait for 50–60 days to have the first order since won date.

We also see some other segments such as food_drink, handcrafted and fashion_accessories have significantly longer time before its first order.

FIGURE 11Sales per segments:FIGURE 12Figure 12 is the top 10 product categories that have the highest sales from Olist.

Since health_beauty, computer_accessories, furniture_devor, housewares, auto, and garden_tools are also the main business segments among MQLs, the sales values from these segments also on top.

The only thing that surprises me is Watches.

Though only have a small number of sellers from Watches segment, revenue from watches is really high in comparison with other segments.

Conclusion & RecommendationWe look at Olist seller side marketing from different angles: channels, messages (landing pages), targeting (business segment, lead behavior, business type,.

), sales process (sales cycle, sales rep), closed deal performance to indicate the platform usage.

Each angle can add value to how the team can optimize marketing campaigns.

Here are some detail recommendations:Continue to work on SEO/Events to increase organic trafficLearn from successful landing pages → hypothesis for A/B testing -> replicate the successesSport leisure and watches can be developed more because we have high revenue from themLeads are steadiness type → build persona around these type of people and build messaging/content around themInvestigate in Home_decor segment why it attracts a lot of manufacturers?.Can it be replicated in other segments?About the author: I’m an MBA student at the University of California, Davis.

I have a passion for Marketing Analytics, Modeling, Machine Learning, and Data Science.

If you have any comment, feedback or questions, feel free to reach me at mapnguyen@ucdavis.

edu or connect me on LinkedIn.

Happy modeling!.. More details

Leave a Reply