Conditions Are Power-Law Distributed: An Example

First sort the extracted conditions, then pass them through uniq -c to count how many times each one occurs.

> grep -R --include='*.py' 'if ' . | perl -nle 'print $1 if /.*if (.*):/' | sort | uniq -c
   1 " " in v
   2 " at '^' position" in err
   1 "#LogService.ClearLog" in _data[u"Actions"]
   4 "%s" not in validate

Sort these numerically in reverse order and we can see the heavy hitters.

> grep -R --include='*.py' 'if ' . | perl -nle 'print $1 if /.*if (.*):/' | sort | uniq -c | sort -n -r
2332 __name__ == '__main__'
 682 'message' in response
 645 state == 'present'
 644 not module.check_mode

What we want eventually is a histogram showing how many single-use conditions there are, how many conditions are used twice, etc.
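Before building that histogram, here is a rough Python sketch of the extraction, counting, and ranking stages so far. It is not the author's tooling. It assumes the source files are readable as UTF-8 (undecodable bytes are replaced), reuses the same greedy regex as the Perl one-liner, and matches only line contents (grep -R also prefixes each line with its file name, which the leading .* in the regex skips over). The function names extract_conditions, count_conditions, and heavy_hitters are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Same pattern as the Perl one-liner /.*if (.*):/. Python's re module uses the
# same greedy, backtracking semantics, so group 1 is the text between the last
# "if " that is followed by a colon and the last colon on the line.
CONDITION_RE = re.compile(r".*if (.*):")


def extract_conditions(lines):
    """Yield the captured condition from each line that contains 'if '."""
    for line in lines:
        if "if " not in line:  # plays the role of grep 'if '
            continue
        match = CONDITION_RE.search(line)
        if match:  # plays the role of perl's "print $1 if /.../"
            yield match.group(1)


def count_conditions(root):
    """Count condition strings across every *.py file under root (like sort | uniq -c)."""
    counts = Counter()
    for path in Path(root).rglob("*.py"):
        if not path.is_file():  # skip directories whose names happen to end in .py
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        counts.update(extract_conditions(text.splitlines()))
    return counts


def heavy_hitters(root, n=10):
    """Return the n most common (condition, count) pairs, like sort -n -r | head."""
    return count_conditions(root).most_common(n)
```

Calling heavy_hitters(".", 4) from the root of a checkout gives the same kind of view as the sort -n -r output above, without depending on how a particular uniq pads its count column.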

Use “cut” to extract the counts, then apply the same “sort | uniq -c” trick to turn them into a histogram.

> grep -R --include='*.py' 'if ' . | perl -nle 'print $1 if /.*if (.*):/' | sort | uniq -c | sort -n -r | cut -c 1-5 | sort -n | uniq -c
28611 1
4817 2
1335 3
 623 4

Sure enough, there are lots of conditions (about 28K) used once, far fewer used twice, fewer still used three times, and so on down.
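The cut | sort -n | uniq -c stage turns per-condition counts into a histogram of counts. In Python that is one more Counter, and it reads the counts directly instead of slicing a fixed-width column (the padding of uniq -c output can differ between uniq implementations). The function name usage_histogram and the toy counts below are hypothetical, made up for illustration rather than taken from this codebase.

```python
from collections import Counter


def usage_histogram(condition_counts):
    """Map k to the number of distinct conditions used exactly k times."""
    return Counter(condition_counts.values())


# Hypothetical toy counts, standing in for the real per-condition tallies.
toy_counts = Counter({"__name__ == '__main__'": 5, "x": 1, "y": 1, "w": 1, "z": 2})
hist = usage_histogram(toy_counts)
for uses in sorted(hist):
    # Same column order as the shell output: how many conditions, then k.
    print(f"{hist[uses]:5d} {uses}")
```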

Down at the bottom we have one condition, __name__ == '__main__', used 2332 times.

Graphing this data, we get an inkling that we’re not in Normalistan any more.

Switching both axes to a logarithmic scale shows something like a power-law distribution: the points fall roughly along a straight line.
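One quick way to put a number on "something like a power law" is to fit a straight line to the log-log histogram. If the number of conditions used k times behaves like n(k) = C k^(-a), then log n(k) = log C - a log k, so the fitted slope estimates -a. This is a crude least-squares sketch with a hypothetical function name; maximum-likelihood estimators are generally preferred for careful power-law fitting, and a handful of points cannot establish a power law on their own.

```python
import math


def loglog_slope(histogram):
    """Least-squares slope of log(n_k) against log(k) for a {k: n_k} histogram.

    For an exact power law n_k = C * k**(-a), the result is -a.
    """
    points = [(math.log(k), math.log(n)) for k, n in histogram.items() if k > 0 and n > 0]
    if len(points) < 2:
        raise ValueError("need at least two usage counts with nonzero frequency")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# The four histogram rows shown above (conditions used k = 1..4 times).
observed = {1: 28611, 2: 4817, 3: 1335, 4: 623}
print(f"fitted slope: {loglog_slope(observed):.2f}")
```

Because the keys of a histogram are distinct, any two or more points give a nonzero spread in log k, so the division is safe.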

There is a trend in how often a condition “ought” to appear.

And there you have it — preferential attachment at work.

The more often a condition appears in a codebase, the more likely that condition is to be used the next time a conditional appears.
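The preferential-attachment mechanism can be tried out with a toy simulation in the style of Herbert Simon's classic model. Each new conditional either introduces a brand-new condition with a small probability p_new, or copies the condition of a uniformly chosen earlier conditional, so a condition already used k times is k times as likely to be picked as one used once. The function name, parameters, and sizes are assumptions for illustration; the model is not fitted to this codebase and only shows how the mechanism produces a heavy-tailed histogram.

```python
import random
from collections import Counter


def simulate_conditions(n_conditionals, p_new, seed=0):
    """Simon-style preferential attachment over a hypothetical codebase.

    Returns a Counter mapping condition id -> number of uses.
    """
    rng = random.Random(seed)
    history = []  # one entry per conditional written so far
    next_id = 0
    for _ in range(n_conditionals):
        if not history or rng.random() < p_new:
            # Write a condition nobody has used before.
            history.append(next_id)
            next_id += 1
        else:
            # Copy an earlier conditional: picking uniformly from the history
            # picks each condition with probability proportional to its uses.
            history.append(rng.choice(history))
    return Counter(history)


uses = simulate_conditions(20_000, p_new=0.1)
hist = Counter(uses.values())
for k in range(1, 5):
    print(f"{hist[k]:5d} {k}")
```

With p_new = 0.1, roughly half of the simulated conditions end up used exactly once, each larger usage count is rarer than the one before, and a handful of early conditions are reused very many times.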
