Is O(n) the only way to improve your algorithms?

Python ShortsIs O(n) the only way to improve your algorithms?How Understanding the problem statement could help you to optimize your codeRahul AgarwalBlockedUnblockFollowFollowingApr 25Photo by Helloquence on UnsplashWhenever we talk about optimizing code we always discuss the computational complexity of the code.

Is it O(n) or O(n-squared)?But, sometimes we need to look beyond the algorithm and look at how the algorithm is going to be used in the real world.

In this post, I am going to talk about a similar situation and how I solved it.

The Problem:Recently I was working with an NLP problem and had to clean up numbers from the text.

On its surface, it feels like a fairly trivial problem.

And one could do it using many ways.

I did it in the way given below.

The function replaces numbers with #s in a question string.

def clean_numbers(x): x = re.

sub('[0-9]{5,}', '#####', x) x = re.

sub('[0-9]{4}', '####', x) x = re.

sub('[0-9]{3}', '###', x) x = re.

sub('[0-9]{2}', '##', x) return xThis is all clean code but can we optimize its run time?.Think before proceeding further.

The Trick:The trick to optimize this lies in understanding the objective of the code.

While most of us understand the O(n) paradigm and could always go look for a better algorithm, one of the forgotten ways to optimize your code is to answer a question:How this code is going to be used?Do you know, if we just check for a number before subbing strings, we can get a 10x runtime improvement !!!Since most of the strings won’t have a number, we won’t evaluate 4 function calls to re.

sub.

And it takes just one simple if statement.

def clean_numbers(x): if bool(re.

search(r'd', x)): x = re.

sub('[0-9]{5,}', '#####', x) x = re.

sub('[0-9]{4}', '####', x) x = re.

sub('[0-9]{3}', '###', x) x = re.

sub('[0-9]{2}', '##', x) return xWhat was necessary to do this?The understanding of the data we are using and understanding of the problem statement.

These improvements are hidden of course and you might need to think about how many times the piece of code will be run?.or who will trigger the function?.to do such sort of improvements.

But it will be worth it.

Think of every function you write as your child and try to improve it as much as you can.

We can use %%timeit magic function in the Jupyter notebook to see the performance results.

%%timeitclean_numbers(“This is a string with 99 and 100 in it”)clean_numbers(“This is a string without any numbers”)The results for the above simple function call using the first function is:17.

2 µs ± 332 ns per loop (mean ± std.

dev.

of 7 runs, 100000 loops each)And using the second function it is:12.

2 µs ± 324 ns per loop (mean ± std.

dev.

of 7 runs, 100000 loops each)We have improved 2x even when we have 50% of our sentences having numbers.

Since we won’t have that many sentences in the database containing numbers, we could do this and get good improvements.

ConclusionThere are different ways to improve an algorithm and maybe even the above problem can be improved further using some logic I cannot think of.

But the point I am trying to make is that thinking in a closed box to come up with a solution will never help you.

If you understand the problem statement and the goal you are trying to achieve, you will be able to write much better and optimized code.

So next time your boss asks you to reduce Computational complexity of an algorithm, tell him that while you were not able to reduce the complexity, you were able to make the algorithm faster using some custom business logic.

I guess he would be impressed.

Also if you want to learn more about Python 3, I would like to call out an excellent course on Learn Intermediate level Python from the University of Michigan.

Do check it out.

I am going to be writing more beginner friendly posts in the future too.

Let me know what you think about the series.

Follow me up at Medium or Subscribe to my blog to be informed about them.

As always, I welcome feedback and constructive criticism and can be reached on Twitter @mlwhiz.

.

. More details

Leave a Reply