Data Analysis of 10.000 AI Startups

That’s a huge difference.

Python is one of the most widely used languages in machine learning, and it looks like a clear favorite among AngelList's AI startups.

Please note that we are only comparing AngelList's top techs, as listed by the site itself, so other important programming languages are not included.

We could rearrange this data by date joined and check the growth of each of those techs over the last few years:

Python is growing, indeed.
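For readers who want to try the regrouping by join date themselves, here is a minimal sketch using only the standard library. The records, the year values and the function name `tech_counts_by_year` are made up for illustration; they are not the scraped data or the code behind the plot.

```python
from collections import Counter

# Hypothetical records: (year the startup joined, technologies it lists).
startups = [
    (2015, ["Python", "Javascript"]),
    (2016, ["Python"]),
    (2016, ["Java", "Python"]),
    (2017, ["Javascript"]),
]

def tech_counts_by_year(records):
    """Count how many startups list each technology, per join year."""
    counts = Counter()
    for year, techs in records:
        for tech in set(techs):  # count a tech at most once per startup
            counts[(year, tech)] += 1
    return counts

counts = tech_counts_by_year(startups)
for (year, tech), n in sorted(counts.items()):
    print(year, tech, n)
```

Plotting one line per technology from these counts gives a growth chart of the kind described above.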

It’s an amazing high-level, general-purpose language, with an extensive range of powerful libraries, and probably the most famous one when it comes to data science and machine learning.

Back to our analysis, let’s take a look at the market frequency now.

Which are the most common ones?

Nice.

Although some of them are too general (like b2b and SaaS) and others could fit in the same category (like Big Data Analytics and Big Data), we can get a good comparison of the existing sectors.
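Counting market tags, and optionally folding overlapping labels together, could look like the sketch below. The tag list and the `ALIASES` table are assumptions for illustration; which labels to merge is a judgment call, and no such merging was applied to the chart above.

```python
from collections import Counter

# Hypothetical market tags, one per startup.
markets = ["SaaS", "Big Data", "Big Data Analytics", "b2b", "SaaS", "Big Data"]

# Assumed alias table: labels on the left are folded into the label on the right.
ALIASES = {"Big Data Analytics": "Big Data"}

def market_frequency(tags, aliases=None):
    """Count tags, mapping each one through the alias table first."""
    aliases = aliases or {}
    return Counter(aliases.get(tag, tag) for tag in tags)

raw = market_frequency(markets)
merged = market_frequency(markets, ALIASES)
print(raw.most_common(3))
print(merged.most_common(3))
```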

Let’s try something more interesting.

Group our data by market and sum up the raised values to see how much money, in total, was invested by sector:

Those are the 20 markets with the highest investment.

That doesn’t necessarily mean they have the largest number of funded companies.
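To make that distinction concrete, here is a small sketch that computes both the total raised and the number of funded companies per market. The amounts are invented for illustration and the helper `totals_and_counts` is hypothetical.

```python
from collections import defaultdict

# Hypothetical (market, raised amount in USD) pairs, one per company.
rounds = [
    ("Hotels", 10_000_000_000),
    ("Analytics", 2_000_000),
    ("Analytics", 5_000_000),
    ("Analytics", 1_000_000),
    ("Hotels", 500_000),
]

def totals_and_counts(pairs):
    """Return (total raised per market, number of companies per market)."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for market, raised in pairs:
        totals[market] += raised
        counts[market] += 1
    return dict(totals), dict(counts)

totals, counts = totals_and_counts(rounds)
top_by_total = sorted(totals, key=totals.get, reverse=True)
top_by_count = sorted(counts, key=counts.get, reverse=True)
print("highest total:", top_by_total[0])
print("most companies:", top_by_count[0])
```

In this toy data the market with the highest total is not the one with the most funded companies, which is exactly the caveat above.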

Let’s take a look at the biggest companies:

Airbnb → 10.3 Bi (Hotels)
Netscape → 4.2 Bi (News)
Nest → 3.3 Bi (Internet of Things)
Palantir → 2.1 Bi (Analytics)
Grail → 1.7 Bi (Diagnostics)

That explains the enormous investment in the hotels market.

One or two huge companies can weigh too heavily on the total sum of investments.

Maybe taking the median investment of each market could give us a different outcome:

Those are the 10 markets in which the median investment is highest.

And Hotels is not even there anymore.
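Here is a rough sketch of why the median changes the ranking. The numbers are invented (one very large round among small ones), and `rank_markets` is a hypothetical helper, not the code used for the chart.

```python
from statistics import median

# Hypothetical raised amounts (USD) per market.
raised_by_market = {
    "Hotels": [200_000, 500_000, 10_000_000_000],
    "Analytics": [1_000_000, 2_000_000, 3_000_000],
}

def rank_markets(groups, stat):
    """Rank market names by a statistic of their raised amounts, highest first."""
    return sorted(groups, key=lambda m: stat(groups[m]), reverse=True)

by_total = rank_markets(raised_by_market, sum)
by_median = rank_markets(raised_by_market, median)
print("by total: ", by_total)
print("by median:", by_median)
```

A single outlier dominates the sum but leaves the median almost untouched, so the two rankings disagree.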

Still, there may be other approaches that lead us to more revealing results.

First, let’s count the number of funded companies in each market, instead of summing the amount invested.

Second, it would be nice to make that comparison across investment ranges.

For instance, how many Mobile Advertising companies received an investment that ranges from 1 to 10 million dollars?

For that, I built an interactive chart, in which you can click the buttons to switch between ranges (up to 1 Million, from 1 to 10 Million, and so on).

For each button, you get a bar plot with the number of companies that raised an amount within that range.

That’s a much more complex analysis and can give investors and founders a deeper insight into how those markets behave in relation to an investment scale.
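The bucketing behind a chart like this can be sketched as follows. The range boundaries above 10 million, the labels and the sample data are assumptions for illustration; only the first two ranges are named in the text.

```python
from bisect import bisect_left
from collections import Counter

# Assumed inclusive upper bounds (USD) of each range; anything above the last is "over 1B".
BOUNDS = [1_000_000, 10_000_000, 100_000_000, 1_000_000_000]
LABELS = ["up to 1M", "1M to 10M", "10M to 100M", "100M to 1B", "over 1B"]

def range_label(raised):
    """Map a raised amount to the label of the range it falls into."""
    return LABELS[bisect_left(BOUNDS, raised)]

def companies_per_range(pairs):
    """Count companies per (range label, market)."""
    counts = Counter()
    for market, raised in pairs:
        counts[(range_label(raised), market)] += 1
    return counts

# Hypothetical (market, raised amount) pairs.
sample = [
    ("Mobile Advertising", 750_000),
    ("Mobile Advertising", 4_000_000),
    ("Mobile Advertising", 9_500_000),
    ("Analytics", 2_000_000_000),
]
per_range = companies_per_range(sample)
print(per_range[("1M to 10M", "Mobile Advertising")])
```

Each button in the interactive chart would then correspond to one range label, with the bars drawn from the matching counts.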

In which markets is it easier to raise money if you are in the first stage (seed)?

And which are the ones whose companies became billion-dollar unicorns?

If you’re reading this on a smartphone, you probably won’t be able to use the chart below.

Otherwise, feel free to interact with it and draw your own conclusions.

Using the amount invested per year for each sector, we could even compare how some markets have evolved since 2011.

Then we can check the average investment by stage:

Average investment by stage

For some reason, Series A presents a lower average investment than Seed.
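I can't say what causes this in the AngelList data, but one check worth running is to put the mean next to the median and the sample size for each stage. The sketch below uses invented amounts and a hypothetical `stage_summary` helper to show how a single large seed round can lift the seed average above Series A.

```python
from statistics import mean, median

# Hypothetical raised amounts (USD) grouped by funding stage.
raised_by_stage = {
    "Seed": [500_000, 1_000_000, 30_000_000],
    "Series A": [4_000_000, 6_000_000, 8_000_000],
}

def stage_summary(groups):
    """Return sample size, mean and median of the raised amounts per stage."""
    return {
        stage: {"n": len(values), "mean": mean(values), "median": median(values)}
        for stage, values in groups.items()
    }

summary = stage_summary(raised_by_stage)
for stage, stats in summary.items():
    print(stage, stats)
```

If the medians come out in the expected order while the means do not, outliers or mislabeled rounds are a likely suspect.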

Let’s take a look at the total amount invested during the last years:

We clearly see that 2012 was the year when AngelList exploded, probably together with growth in venture capital financing and an increasing number of startups worldwide.

The next plot shows the number of startups registered on the website per year.

Number of startups per year

Finally, we can use the coordinates extracted from each location with Geopy to build a cluster map with the world distribution of those startups.

The result is an interactive map that looks like this:

That’s a location map for every single one of those 10.000 companies.

Even if it’s a small sample, it is a pretty good representation of technology distribution across the countries.

To make it, I used the Folium library and saved the output as HTML.

If you want to interact with the map, just go to my GitHub repository → click here, download cmap.html and open it on your computer.
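A cluster map of this kind can be sketched with Folium's `MarkerCluster` plugin. This is a minimal sketch, not the notebook's code: the row fields (`name`, `lat`, `lon`, `url`), the sample companies and both function names are assumptions, and the coordinates are taken as already geocoded.

```python
# Hypothetical rows; in the real data set the coordinates come from geocoding.
startups = [
    {"name": "Example AI", "lat": 37.77, "lon": -122.42, "url": "https://example.com"},
    {"name": "Sample Labs", "lat": 51.51, "lon": -0.13, "url": "https://example.org"},
    {"name": "No Location Inc", "lat": None, "lon": None, "url": "https://example.net"},
]

def valid_points(rows):
    """Keep only rows that have both a latitude and a longitude."""
    return [r for r in rows if r.get("lat") is not None and r.get("lon") is not None]

def build_cluster_map(rows, path="cmap.html"):
    """Write an interactive cluster map of the rows to an HTML file."""
    # Imported lazily so valid_points() stays usable without Folium installed.
    import folium
    from folium.plugins import MarkerCluster

    fmap = folium.Map(location=[20, 0], zoom_start=2)
    cluster = MarkerCluster().add_to(fmap)
    for r in valid_points(rows):
        popup = f'<a href="{r["url"]}" target="_blank">{r["name"]}</a>'
        folium.Marker(location=[r["lat"], r["lon"]], popup=popup).add_to(cluster)
    fmap.save(path)
    return fmap
```

Calling `build_cluster_map(startups)` writes the HTML file, which you can then open in a browser.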

Click on the clusters to open up smaller clusters and click on those to see the companies.

If you click on a single company you’ll get the link to their website.

The picture below shows a heat map (hmap_weighted.html) weighted by the investment amount, or: where does the AI money go?

That’s not even half of what we could do with a data set like that.

More insights could be obtained from the number of employees (size of the company) and the companies’ lifetime, and even the pitches could be analyzed using NLP.

For now, let’s just check the most common words used in startup slogans.

Word Cloud

What else could you extract?

Contact information of Founders, Co-Founders, and Investors.
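The word counts that feed a word cloud can be sketched with the standard library alone. The slogans and the tiny stopword list below are placeholders for illustration, not entries from the data set.

```python
import re
from collections import Counter

# Assumed minimal stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "for", "and", "of", "to", "your", "with", "in"}

def top_words(slogans, n=3):
    """Return the n most common non-stopword words across all slogans."""
    words = []
    for slogan in slogans:
        tokens = re.findall(r"[a-z']+", slogan.lower())
        words.extend(t for t in tokens if t not in STOPWORDS)
    return Counter(words).most_common(n)

# Hypothetical slogans.
slogans = [
    "AI for your business",
    "The data platform for AI",
    "Smart data, smart business",
]
frequencies = dict(top_words(slogans, n=5))
print(frequencies)
```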

Web scraping is amazing, and together with data analysis and machine learning, it becomes an incredibly powerful tool.

If you want access to the maps, data or notebooks, just go to my GitHub repository → click here, or leave a comment below.

Feel free to leave any observations, concerns or ideas, and thank you for reading this post.
