The Anatomy of 200 Books

Stephen Ó Mathúna, Feb 24

Sometime in 2011, though I don’t remember when or how it occurred, I realised that I missed reading.

I’d been quite a bookworm in childhood, but the usual distractions — TV, video games, a then-mysterious thing called The Internet — eroded my reading habits.

Words gave way to pixels.

During that year I gradually started reading again, and a watershed moment arrived towards the end of the year: I bought a Kindle.

It was an apt name; my lost appetite for reading was revived.

I found myself carving out more and more time during the day to read, such was the ease with which I could do it.

When January 2012 arrived I decided that I should log what I read.

I hadn’t yet discovered Goodreads, so I started with a plain old Excel spreadsheet.

This contained the author, title, and genre of each book I read.

I went on to discover Goodreads, but I retained the spreadsheet, tracking my books on both.

Over the last year or so I’ve felt the need to do something with the years’ worth of data.

For much of that time, I had no good ideas for an output.

Towards the end of 2018 I began some online courses on data analytics.

As a UX Researcher by trade, I’ve always been most comfortable with qualitative data.

I wanted to add a string to my bow, and improve my faculty for numbers.

I also discovered an article written by Andrea Zanni around that time.

Andrea shared my habit of tracking books read, but also the desire to tell a story about it.

I’m indebted to Andrea for introducing me to some of the tools that helped me put this together (outlined in the next section).

I was inspired by the article and decided that Medium would be a good (er) medium through which to tell my story.

My goal was to apply the principles that I was picking up from the courses I was taking to help tell the story.

What follows is that story, seven years of my life measured by the books that I’ve read.

In that time I’ve lived in three cities, changed careers, emigrated, repatriated, and gotten married.

Big changes in my life that spawned hundreds of smaller changes.

But books have been the constant.

I’m amazed at how a book can conjure such vivid memories at a mere glance at its title.

I’ve jumped back in time to cafés visited, train journeys taken, parks visited at lunchtime, and a dozen other places and times.

Method

Below is an outline of my approach.

If you’re more interested in books than data analysis, you can skip the more detailed description after the bullet points.

- Build a spreadsheet to record the books I’ve read. This combines my own data points with data captured by Goodreads
- Clean the data using OpenRefine
- Carry out analysis of the data using queries and pivot tables, using Google Sheets and Microsoft Excel
- Storyboard the data story I wanted to tell using SketchPad for iPad
- Create data visualisations using Excel for basic charts and RAWGraphs and VizyDrop for more complex visualisations
- Write up the article as per my storyboards

The approach began with the spreadsheet I’ve maintained since 2012.

Initially I kept the spreadsheet basic, recording only the author, title, and genre.

When I started using Goodreads, I began to add more columns to my spreadsheet with all the extra data points I was capturing.

For this exercise, I added a couple of data points manually, since they are not tracked by Goodreads.

For example, Goodreads doesn’t capture how many new authors you read in a given year.

With the spreadsheet populated with all the data I wanted to work with, it was time to clean it.

I used OpenRefine, which has a little bit of a learning curve.

But after a couple of tutorial videos I was able to carry out the basics required by my spreadsheet.

It was pretty clean to begin with, since I had built the sheet from scratch.

But it did flag a few consistency issues, such as the labelling of authors.

It showed me, for example, that I had George R. R. Martin recorded under two slightly different spellings.
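The kind of inconsistency OpenRefine flagged can be illustrated with a small sketch. This is not OpenRefine's own clustering algorithm, just a simplified stand-in that normalises each name and groups the spellings that collapse to the same key. The sample names and the helper names `name_key` and `find_variants` are assumptions made for illustration.

```python
import re
from collections import defaultdict


def name_key(name: str) -> str:
    """Normalise a name: lowercase, punctuation to spaces, single spaces."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def find_variants(names):
    """Group spellings that share a normalised key; keep groups of 2+."""
    groups = defaultdict(set)
    for name in names:
        groups[name_key(name)].add(name.strip())
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


# Hypothetical author column with two inconsistent entries.
authors = [
    "George R.R. Martin",
    "George R. R. Martin",
    "Stephen King",
    "stephen king ",
    "Terry Pratchett",
]
variants = find_variants(authors)
for key, spellings in variants.items():
    print(key, "->", spellings)
```

Each group that comes back is a candidate for merging under a single spelling; the decision about which spelling to keep still has to be made by hand.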



With squeaky clean data I started my analysis.

I began this in Google Sheets, making myself a dashboard to display many of the basic counts I would need at a glance.

It was a good exercise in improving my syntax.

I had a few ideas that I wasn’t sure how to execute, so I turned to Google’s help documentation.

When it came to pivot tables, I migrated the spreadsheet to Excel, because I found the pivot table feature on Google Sheets borderline unusable.

That said, I revisited Google’s pivot tables when I had finished the project, and it has been completely revamped.

Maybe I wasn’t the only one that had difficulty.
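For readers who prefer code to spreadsheets, a pivot table that counts rows is essentially a tally over pairs of labels. The sketch below shows the idea with made-up rows; the column choice (year read and format) and the function names are assumptions, not a reproduction of the actual workbook.

```python
from collections import Counter

# Hypothetical rows standing in for the spreadsheet: (year read, format).
books = [
    (2014, "eBook"), (2014, "Paper"),
    (2015, "eBook"), (2015, "eBook"), (2015, "Paper"),
    (2016, "eBook"),
]


def pivot_counts(rows):
    """Count rows per (row label, column label) pair, like a COUNT pivot."""
    return Counter(rows)


def print_pivot(counts):
    """Print the tally as a small year-by-format grid."""
    years = sorted({year for year, _ in counts})
    formats = sorted({fmt for _, fmt in counts})
    print("Year".ljust(6) + "".join(f.rjust(8) for f in formats))
    for year in years:
        cells = "".join(str(counts[(year, f)]).rjust(8) for f in formats)
        print(str(year).ljust(6) + cells)


table = pivot_counts(books)
print_pivot(table)
```

A `Counter` returns zero for combinations that never occur, which plays the role of the empty cells in a spreadsheet pivot.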

When I had bled the data dry for insights it was time to figure out what story I was telling.

I started by writing down all the data points and looking for themes.

These themes would form the main body of the story.

As the themes emerged, I did some storyboard sketches containing the visualisations I wanted to include:

My low-fidelity sketches ended up being only part of the story.

There were visualisations I didn’t know existed until I started using tools like RAWGraphs and VizyDrop.

These are both great products that provide you with dozens of graphs at the click of a button.

Some of these visualisations are quite out there, and it was hard not to get carried away.

I tried to stick to visualisations that served the story I wanted to tell.

I hope I’ve been successful on that front.

With that done, it was time to write.

I’ve divided the main body of the article into ‘Authors’ and ‘Books’, with some concluding thoughts at the end.

Authors

To start with, here is a visualisation of all the authors I have read to date:

The size of the bubbles represents the number of books read by a given author.

I’ve read a total of 71 authors over seven years of recording my reading habits.

While I don’t have any benchmarks to hand, my suspicion is that this is a narrow range of authors.

I’ve read two or more books by 43.6% of the authors shown above.

It seems that if I enjoy a book then I am quite likely to return to that author.

And if I really enjoy a book, I’ll return again and again: my top five read authors account for 39.5% of all the books I’ve read.

The largest share of those belongs to Stephen King (24).

Trying to read King’s catalogue feels like walking the wrong way up an escalator, such is the man’s capacity for putting out new books.

The master of horror pips Terry Pratchett (23), whose death in 2015 was a gargantuan loss to the literary world.

My ongoing journey through the Discworld feels more like a pilgrimage than a hobby.

It dawned on me recently that I should slow my progress because I never want it to end.
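The author figures quoted above (the share of authors read more than once, and the share of books written by the top five) boil down to a tally of books per author. Here is a minimal sketch, assuming one author name per book; the sample list and the function name `author_stats` are invented for illustration.

```python
from collections import Counter


def author_stats(authors_per_book, top_n=5):
    """Summarise how concentrated a reading list is on a few authors."""
    per_author = Counter(authors_per_book)
    total_books = len(authors_per_book)
    repeat_authors = sum(1 for count in per_author.values() if count >= 2)
    top_books = sum(count for _, count in per_author.most_common(top_n))
    return {
        "authors": len(per_author),
        "repeat_share": repeat_authors / len(per_author),
        "top_share": top_books / total_books,
    }


# Hypothetical list with one entry per book read.
sample = ["King", "King", "King", "Pratchett", "Pratchett",
          "Gaiman", "Clarke", "North"]
stats = author_stats(sample, top_n=2)
print(stats)
```

Applied to the real list, `top_share` with `top_n=5` would correspond to the 39.5% figure, which works out to 79 of the 200 books.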

I broke down my top five authors (King, Pratchett, Wodehouse, Conan Doyle, and Gaiman) by year read.

Looking at the breakdown, it seems as though I get ‘hooked’ on a particular author during a given year.

Four of the five authors each accounted for the majority of books read in a single year (Neil Gaiman the exception).

However, this trend does seem to have levelled off over the last couple of years:

2013 was the year I tackled King’s magnum opus, The Dark Tower, and all its associated works.

All things serve the Beam.

The top five changes composition when you look at pages read rather than books read.

Out of the top five drop P. G. Wodehouse and Arthur Conan Doyle, replaced by George R. R. Martin and Raymond Feist.

I guess it’s not surprising that authors best known for whimsical tales set in Victorian London are ousted by authors of high fantasy tomes.

I mentioned above that my author dependence has begun to level out over recent years.

This is further evident in the number of new and unique authors read over time.

New authors is a self-explanatory metric: it is the introduction of an author that I have not read before.

Unique authors refers to those read only once in any given year.

Examining the data on each of these dimensions suggests that I’ve been broadening my horizons pretty consistently since 2012:

2012 is excluded from this chart, as all authors were ‘new’ as far as the records are concerned.

The number of new authors inducted to my personal library has grown every year apart from 2017.

I don’t think this has been a conscious effort on my part, but it is both interesting and encouraging to see the trend.

Now that I’m aware of the figures, I’m going to attempt to keep the trend going.
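The new-author count can be sketched in a few lines, assuming the log is a list of (year, author) pairs. An author counts as new in the first year they appear and never again. The sample log and the function name are hypothetical.

```python
def new_authors_by_year(reading_log):
    """Return {year: number of authors appearing for the first time}."""
    seen = set()
    result = {}
    for year in sorted({y for y, _ in reading_log}):
        this_year = {author for y, author in reading_log if y == year}
        result[year] = len(this_year - seen)
        seen |= this_year
    return result


# Hypothetical log: a list of (year read, author) pairs.
log = [
    (2012, "King"), (2012, "Pratchett"),
    (2013, "King"), (2013, "Gaiman"),
    (2014, "Gaiman"), (2014, "Clarke"), (2014, "North"),
]
new_counts = new_authors_by_year(log)
print(new_counts)
```

The first year in the log always reports every author as new, which is the reason for leaving 2012 out of the chart.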

Looking at it another way, 2017 was my most diverse reading year.

In terms of unique authors, it was the only year in which I attained a 1:1 ratio of authors to books.

I confess that this is not random — it was a reading challenge I set myself at the outset of that year.

I did increase the number of unique authors in 2018, owing to the higher volume of books I read.

This means that, in absolute values, the number of unique authors has risen year-on-year since 2014:

The 1:1 ratio for 2017, reported above, is not quite accurate; during that year I read Mistress of the Empire, co-written by Raymond Feist and Janny Wurts, meaning the number of unique authors outnumbers books read.

An alternative way of looking at these figures is measuring the average number of books per author read on an annual basis.

The figure falls from 2.1 books per author in 2012 to 1.0 in 2017, before climbing again last year.
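As a rough sketch of these yearly author metrics, the code below assumes a list of (year, author) pairs with one row per book. It reports both readings of ‘unique’: authors read exactly once in a year, and distinct authors in a year. The sample log and all names are invented for illustration.

```python
from collections import Counter, defaultdict


def yearly_author_metrics(reading_log):
    """Per year: books, distinct authors, read-once authors, books per author."""
    by_year = defaultdict(list)
    for year, author in reading_log:
        by_year[year].append(author)

    metrics = {}
    for year, authors in sorted(by_year.items()):
        counts = Counter(authors)
        metrics[year] = {
            "books": len(authors),
            "distinct_authors": len(counts),
            "read_once": sum(1 for c in counts.values() if c == 1),
            "books_per_author": round(len(authors) / len(counts), 1),
        }
    return metrics


# Hypothetical log: one (year, author) row per book.
log = [
    (2012, "King"), (2012, "King"), (2012, "King"),
    (2012, "Pratchett"), (2012, "Pratchett"), (2012, "Gaiman"),
    (2017, "Clarke"), (2017, "North"), (2017, "Mandel"),
]
metrics = yearly_author_metrics(log)
print(metrics)
```

A co-written book would need one row per author to be counted properly, which is how the number of authors can exceed the number of books in a year.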

The final author metric concerns gender.

It doesn’t make for great reading: only 25 out of 200 books were by female authors, with a total of eleven unique authors.

When plotted on an alluvial diagram, below, it’s a bit startling to see that I read my first female author four years into tracking my books:

The gender of an author isn’t something I’d given much thought to until I started examining the data behind my reading habits.

So it’s encouraging, at least, to see that the ratio has grown steadily enough since 2015, with 2018 being the peak so far.

I want to continue this trend, not least because some of my favourites were written by female authors.

Susanna Clarke’s Jonathan Strange and Mr Norrell; Claire North’s The First Fifteen Lives of Harry August; and Emily St John Mandel’s Station Eleven spring to mind.

Books

The data on my books read encompasses areas such as length, genre, and publication dates.

The first thing I’ll touch on, though, is book format.

As I mentioned already, my rediscovery of the joys of reading was in large part down to my purchase of a Kindle.

It made me realise that I could read pretty much anywhere, owing to its form factor.

That said, the Kindle is more than convenience to me.

It is a beautiful reading experience in and of itself, which is a quality that is sometimes lost when people talk about the convenience factor.

The Kindle Voyage is one of the best designed products I’ve ever owned.

I realise I’m at risk of coming across as an Amazon shill.

But all that is to say that I love eBooks every bit as much as their paper counterparts.

I didn’t need to examine the data to know that they account for the lion’s share of books read — to the tune of 79%, it turns out.

In terms of annual distribution, there don’t appear to be any discernible patterns.

The only year in which I read an equal share of eBooks and paper books was 2014.

In every other year eBooks have dominated to a greater or lesser extent:

The vast majority of paper books are hardbacks.

I’ve been collecting hardback editions of Discworld (Terry Pratchett) and Jeeves (P. G. Wodehouse), published by Gollancz and Everyman respectively.

Since these are two of my top five authors, their editions have been propping up the number of paper books in my collection.

In respect of volume, I’ve read 81,302 pages since 2012, an average of 407 pages per book.

I was particularly interested in how this would break down by year.

I wondered how my personal situation in any given year would impact the amount of reading I did, and the extent to which it would be visible in the data.
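A yearly breakdown of this kind is a grouped sum and mean. The sketch below uses invented (year, pages) rows; only the totals of 81,302 pages and 200 books come from the figures above.

```python
from collections import defaultdict


def pages_by_year(rows):
    """Return total pages, book count and mean pages per book for each year."""
    totals = defaultdict(lambda: [0, 0])  # year -> [pages, books]
    for year, pages in rows:
        totals[year][0] += pages
        totals[year][1] += 1
    return {
        year: {"pages": pages, "books": books, "mean": round(pages / books)}
        for year, (pages, books) in sorted(totals.items())
    }


# Hypothetical rows: (year read, page count).
rows = [(2015, 1000), (2015, 600), (2016, 300), (2016, 350), (2016, 250)]
summary = pages_by_year(rows)
print(summary)

# Check of the overall mean quoted above: 81,302 pages over 200 books.
overall_mean = round(81302 / 200)
print(overall_mean)
```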

The impact of my personal circumstances on my reading habits is most evident looking at 2015.

That year was, by quite some distance, my most productive reading year (in pages read).

It was also the year I moved to London, where I spent roughly an hour per day on the Underground.

Granted, that is pretty good going by London standards, but it has been the longest commute in my career so far.

Looking at the data, it looks like I took advantage of my longer commute by reading some of the bigger tomes in my collection.

I read Susanna Clarke’s Jonathan Strange and Mr Norrell, Alexandre Dumas’s The Count of Monte Cristo, as well as the majority of George R. R. Martin’s A Song of Ice and Fire series that year.

That said, my circumstances didn’t change the following year and there was a dip in productivity.

Perhaps I can’t read too much into my personal circumstances and how they affected my reading in a given year.

Speaking of tomes, I’ve averaged a little under one book over a thousand pages per year.

This hasn’t been a very even distribution: as alluded to above, half of these were read in 2015, and in the three years since I’ve read only one.

Indeed, my appetite for longer books in general appears to be on the wane: Patrick Rothfuss’s The Wise Man’s Fear is the only book over 700 pages I’ve read since 2016.

Before 2017, I was reading an average of 3.6 books of that length per year.

Not Pictured: A few months before I started recording my reading habits I read King’s The Stand.

If plotted here it would sit alongside his other masterpiece, IT, at circa 1350 pages.

This raises the question: what does page distribution look like over the seven years as a whole?

As mentioned above, the overall page mean is 407 pages per book.

But that doesn’t make for a very interesting visual, so I plotted the page intervals of my 200 books.

This shows that the modal group is 325 to 368 pages:

The spread looks even up to around the 370 page mark, with a gradual decline from there to the unwieldy tomes.

I should point out that 2018 accounts for a significant portion of books on the lower end of the scale.

Sometime during the autumn I realised that I was within sight of 200 books read.

I admit that the prospect of hitting that milestone influenced my reading choices for the rest of the year.

Twenty out of the 37 books read in 2018 fell within the first four distribution groups — making up a little under a quarter of those groups.
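The page intervals described above are fixed-width bins, and the modal group is simply the bin holding the most books. In this sketch the width of 44 pages matches the quoted 325 to 368 interval, but the starting point of 17 is a guess chosen so that a bin boundary falls at 325, and the page counts are invented.

```python
from collections import Counter


def bin_label(pages, width=44, start=17):
    """Return the (low, high) page interval that a book falls into."""
    index = (pages - start) // width
    low = start + index * width
    return (low, low + width - 1)


def modal_bin(page_counts, width=44, start=17):
    """Return the most populated interval and how many books it holds."""
    bins = Counter(bin_label(p, width, start) for p in page_counts)
    return bins.most_common(1)[0]


# Hypothetical page counts, one per book.
pages = [120, 330, 340, 368, 369, 410, 1350]
interval, count = modal_bin(pages)
print(interval, count)
```

Changing the width or the starting point shifts books between neighbouring bins, so the modal group depends on how the intervals are drawn.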

The final theme of my analysis concerns genre.

In Andrea Zanni’s article, genre delineation was kept to Fiction and Nonfiction.

I understand the merit of keeping it binary.

Breaking down books beyond this distinction is difficult, because it becomes arbitrary.

The boundaries between genres are often blurry.

Is Ridley Scott’s Alien foremost a Horror or a Science Fiction story?

With that said, breaking down my read books into different genres is exactly what I’ve done, scientific or not.

This is not least because there is a dearth of nonfiction in my personal library.

To an extent, this is down to how I’ve been recording.

My list contains books that I’ve read for leisure only.

Over the years I’ve read plenty of books for professional development, or to improve some other aspect of my life, but they are not included here.

That said, I would like to find more Nonfiction books with stories that engage me on the same level as their fictional counterparts.

I wanted to find a way of making genre definition less arbitrary, so the idea came to me to crowd-source it.

Goodreads provides the genre(s) of a book, as determined by its users.

The genres are in descending order of nominations by users.

For example, Richard Matheson’s I Am Legend is placed within the genres of ‘Horror’, ‘Science Fiction’, ‘Fiction’, ‘Classics’, ‘Vampires’, and ‘Post Apocalyptic’.

But I wanted to keep genres mutually exclusive.

Since the majority of users placed it within the Horror genre, that is how I recorded it in my data.

This proved to be an effective method for most books, but in some cases it created inconsistencies.

For example, it would have placed the Sherlock Holmes books into separate genres such as ‘Fiction’, ‘Classic’, and ‘Detective’.

But for the most part it was easy to follow the Goodreads users’ lead.
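The genre rule described above amounts to taking the first entry of the ranked Goodreads list, with a manual correction where that would split a series. Here is a minimal sketch: the I Am Legend list is the one quoted above, while the second title’s list, the override table and the function name are assumptions for illustration.

```python
def primary_genre(title, ranked_genres, overrides=None):
    """Pick one genre per book: a manual override if given, else the top-ranked one."""
    overrides = overrides or {}
    if title in overrides:
        return overrides[title]
    return ranked_genres[0]


# Genre list for I Am Legend, in descending order of user nominations.
legend_genres = ["Horror", "Science Fiction", "Fiction",
                 "Classics", "Vampires", "Post Apocalyptic"]

# Hypothetical override keeping every book of one series in one genre.
series_overrides = {"A Study in Scarlet": "Fiction"}

genre_a = primary_genre("I Am Legend", legend_genres, series_overrides)
genre_b = primary_genre("A Study in Scarlet",
                        ["Classics", "Mystery", "Fiction"],  # assumed ordering
                        series_overrides)
print(genre_a, genre_b)
```

Keeping the overrides in one small table makes it easy to see exactly where the crowd-sourced label was set aside.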

This left me with a tidy classification of five genres: Fantasy, Fiction, Horror, Nonfiction, and Science Fiction:

Tree Maps aren’t terribly exciting, so I couldn’t help adding a flourish.

Icons made by Freepik from www.


com

This was another result that surprised me when plotted on a chart.

If someone pressed me for my favourite genre I’d toss a coin between Horror and Sci-Fi.

As with film, they’re the genres I most identify with.

Indeed, it is almost a coin toss between the two in my book tally.

The surprise is that they are third and fourth respectively.

Science Fiction does appear to be where I’m most adventurous in respect of new authors.

Looking at the chart below, it’s the only genre (leaving aside Nonfiction) that has more new authors than previously read authors:

This makes sense, on reflection.

The fantasy books I’ve read tend to be part of a long series (some of them, seemingly, without end. What I’d do for the next installment of A Song of Ice and Fire, The Kingkiller Chronicle, or Gentleman Bastard).

The same can be said for my Fiction habits: P. G. Wodehouse and Arthur Conan Doyle are best known for characters that span a long series of books.

There is also my affinity for espionage.

I’ve often returned to Ian Fleming and John le Carré, following the adventures of James Bond and George Smiley respectively.

As for Horror, well, one need only look at which author is top of my reading charts.

In the case of Science Fiction, I tend to read fewer series and more standalone stories.

Curiously, there are a number of series on my Sci-Fi bookshelf for which I’ve read only the first book.

These include Frank Herbert’s Dune Chronicles, Orson Scott Card’s Ender’s Saga, and Douglas Adams’ Hitchhiker’s Guide to the Galaxy.

I say curious because I enjoyed the opening salvo for each of those series, but for some reason didn’t continue on with any.

The final data point I explored in respect of genre was publication date.

I wanted to know whether I leaned towards any particular era within a given genre, and how they compared to one another:

There are some pretty clear differences between genres.

Fantasy stands out immediately as the genre most tied to a particular timeframe.

I’ve only read two Fantasy books published before 1980, fewer than in any other genre (again leaving aside Nonfiction).

Of all the genres it has the highest concentration of books published in the 1980s, 1990s, and 2010s.

On the opposite end of the spectrum, the genre with the widest spread of publication dates is Fiction.

Dumas’s glorious revenge epic The Count of Monte Cristo is the oldest book I’ve read.

Horror and Science Fiction are quite evenly distributed.

I expected a bigger cluster of Sci-Fi books from the middle of the 20th century, but only four books represent the ‘Golden Age’ of Science Fiction.
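A comparison of eras by genre can be approximated by rounding each publication year down to its decade and counting books per genre and decade. The handful of rows below is illustrative only: the years are the commonly cited first publication dates, and some of the genre labels are assumptions rather than entries from the real spreadsheet.

```python
from collections import Counter


def decade(year):
    """Round a year down to the start of its decade, e.g. 1954 -> 1950."""
    return year // 10 * 10


def genre_decade_counts(books):
    """Count books per (genre, decade) from (title, genre, year) rows."""
    return Counter((genre, decade(year)) for _, genre, year in books)


# Illustrative rows: (title, genre, publication year).
books = [
    ("The Count of Monte Cristo", "Fiction", 1844),
    ("I Am Legend", "Horror", 1954),
    ("It", "Horror", 1986),
    ("Dune", "Science Fiction", 1965),
    ("Station Eleven", "Science Fiction", 2014),
]
counts = genre_decade_counts(books)
for (genre, dec), n in sorted(counts.items()):
    print(f"{genre:16} {dec}s  {n}")
```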

Conclusions

There were two main objectives of this exercise:

1) Apply some of the principles I picked up in the data analysis and visualisation courses I did
2) Learn something about myself, and my reading habits, through the data I’ve collected over the last seven years.

Here are a few takeaways:

The data underlines something I’d already suspected: that I’m something of a ‘comfort zone’ reader.

I have a tendency to return to the same authors, genres, and series.

In any given year I’m more likely to pick up a book by an author I’ve read before than try a new one.

Moreover, there’s a good chance I’ll read more than one book by that author that year.

One in every three books I read is written by a new author, but I’d like to get that closer to every other book.

There aren’t many noticeable patterns in my general reading productivity.

During the seven years, the number of books read hasn’t risen or fallen for more than two consecutive years before going in the other direction again.

I had expected this, but the peaks and troughs don’t map as easily as I thought they might to particular times in my life.

The only trend I could identify is that my appetite for long books has diminished a little in the last few years.

The mean page count has gradually fallen from 478 pages in 2015 to 325 pages in 2018.

So, what do I do with this information?

Beyond writing this article, it has suggested a few tweaks to my reading habits that I’m going to endeavour to make:

Break out of my comfort zone a bit more: I’d like a higher ratio of new to old authors.

Within that goal I’d like to increase the number of female authors, because one in every seven is a poor showing.

Comfort zone applies to genre, too.

I plan to increase the volume of Nonfiction I read for leisure.

If I do this exercise again in a few years, I’d like the percentage to at least be in sight of double digits.

If the volume of Fantasy books is going to stay consistent, I’d like to branch out to other authors and series.

My reading time is precious to me, but I don’t think any of these changes would impact it for the worse.

I usually don’t go in for reading challenges, but I’m excited about the idea of steering my habits in a lateral direction.

I should say a few words about the exercise itself.

It wasn’t a complicated data set I was working with, but it was good to apply principles like data cleaning using a tool like OpenRefine.

With regard to the data storytelling course, I was able to put some practical tips to good use.

It taught me the effectiveness of storyboarding with my data.

I also picked up some specific story mechanisms demonstrated by the instructor, which came in handy.

I’m not claiming that this work has any particular value outside of personal catharsis.

Hopefully there is some novelty value for folk who love reading or love data.

Or, like me, those who fall in the middle of the Venn diagram.

That said, I don’t want to start seeing books as a series of labels and percentages.

I’m going to forget about the data for a while, and get on with the reading.

