Linear programming and discrete optimization with Python using PuLP

Linear programming and discrete optimization with Python using PuLPLinear and integer programming are key techniques for discrete optimization problems and they pop up pretty much everywhere in modern business and technology sectors.

We will discuss how to tackle such problems using Python library PuLP and get a fast and robust solution.

Tirthajyoti SarkarBlockedUnblockFollowFollowingApr 19IntroductionDiscrete optimization is a branch of optimization methodology which deals with discrete quantities i.

e.

non-continuous functions.

It is quite ubiquitous in as diverse applications such as financial investment, diet planning, manufacturing processes, and player selection for professional sports.

Linear and (mixed) integer programming are techniques to solve problems which can be formulated within the framework of discrete optimization.

Knowledge of such optimization techniques is extremely useful for data scientists and machine learning (ML) practitioners as discrete and continuous optimization lie at the heart of modern ML and AI systems as well as data-driven business analytics processes.

There are many commercial optimizer tools, but having hands-on experience with a programmatic way of doing optimization is invaluable.

There is a long and rich history of the theoretical development of robust and efficient solvers for optimization problems.

However, focusing on practical applications, we will skip that history and move straight to the part of learning how to use programmatic tools to formulate and solve such optimization problems.

There are many excellent optimization packages in Python.

In this article, we will specifically talk about PuLP.

But before going to the Python library, let us get a sense of the kind of problem we can solve with it.

An example problem (or two)Suppose you are in charge of the diet plan for high school lunch.

Your job is to make sure that the students get the right balance of nutrition from the chosen food.

However, there are some restrictions in terms of budget and the variety of food that needs to be in the diet to make it interesting.

The following table shows, in detail, the complete nutritional value for each food item, and their maximum/minimum daily intake.

The discrete optimization problem is simple: Minimize the cost of the lunch given these constraints (on total calories but also on each of the nutritional component e.

g.

cholesterol, vitamin A, calcium, etc.

Essentially, in a casual mathematical language, the problem is,Notice that the inequality relations are all linear in nature i.

e.

the variables f are multiplied by constant coefficients and the resulting terms are bounded by constant limits and that’s what makes this problem solvable by an LP technique.

You can imagine that this kind of problem may pop up in business strategy extremely frequently.

Instead of nutritional values, you will have profits and other types of business yields, and in place of price/serving, you may have project costs in thousands of dollars.

As a manager, your job will be to choose the projects, that give maximum return on investment without exceeding a total budget of funding the project.

Similar optimization problem may crop up in a factory production plan too, where maximum production capacity will be functions of the machines used and individual products will have various profit characteristics.

As a production engineer, your job could be to assign machine and labor resources carefully to maximize the profit while satisfying all the capacity constraints.

Fundamentally, the commonality between these problems from disparate domains is that they involve maximizing or minimizing a linear objective function, subject to a set of linear inequality or equality constraints.

For the diet problem, the objective function is the total cost which we are trying to minimize.

The inequality constraints are given by the minimum and maximum bounds on each of the nutritional components.

PuLP — a Python library for linear optimizationThere are many libraries in the Python ecosystem for this kind of optimization problems.

PuLP is an open-source linear programming (LP) package which largely uses Python syntax and comes packaged with many industry-standard solvers.

It also integrates nicely with a range of open source and commercial LP solvers.

You can install it using pip (and also some additional solvers)$ sudo pip install pulp # PuLP$ sudo apt-get install glpk-utils # GLPK$ sudo apt-get install coinor-cbc # CoinORDetailed instructions about installation and testing are here.

Then, just import everything from the library.

from pulp import *See a nice video on solving linear programming here.

How to formulate the optimization problem?First, we create a LP problem with the method LpProblemin PuLP.

prob = LpProblem("Simple Diet Problem",LpMinimize)Then, we need to create bunches of Python dictionary objects with the information we have from the table.

The code is shown below,For brevity, we did not show the full code here.

You can take all the nutrition components and create separate dictionaries for them.

Then, we create a dictionary of food items variables with lower bound =0 and category continuous i.

e.

the optimization solution can take any real-numbered value greater than zero.

Note the particular importance of the lower bound.

In our mind, we cannot think a portion of food anything other than a non-negative, finite quantity but the mathematics does not know this.

Without an explicit declaration of this bound, the solution may be non-sensical as the solver may try to come up with negative quantities of food choice to reduce the total cost while still meeting the nutrition requirement!food_vars = LpVariable.

dicts("Food",food_items,lowBound=0,cat='Continuous')Next, we start building the LP problem by adding the main objective function.

Note the use of thelpSum method.

prob += lpSum([costs[i]*food_vars[i] for i in food_items])We further build on this by adding calories constraints,prob += lpSum([calories[f] * food_vars[f] for f in food_items]) >= 800.

0prob += lpSum([calories[f] * food_vars[f] for f in food_items]) <= 1300.

0We can pile up all the nutrition constraints.

For simplicity, we are just adding four constraints on fat, carbs, fiber, and protein.

The code is shown below,And we are done with formulating the problem!In any optimization scenario, the hard part is the formulation of the problem in a structured manner which is presentable to a solver.

We have done the hard part.

Now, it is the relatively easier part of running a solver and examining the solution.

Solving the problem and printing the solutionPuLP has quite a few choices of solver algorithms (e.

g.

COIN_MP, Gurobi, CPLEX, etc.

).

For this problem, we do not specify any choice and let the program default to its own choice depending on the problem structure.

prob.

solve()We can print the status of the solution.

Note, although the status is optimal in this case, it does not need to be so.

In case the problem is ill-formulated or there is not sufficient information, the solution may be infeasible or unbounded.

# The status of the solution is printed to the screenprint("Status:", LpStatus[prob.

status])>> Status: OptimalThe full solution contains all the variables including the ones with zero weights.

But to us, only those variables are interesting which have non-zero coefficients i.

e.

which should be included in the optimal diet plan.

So, we can scan through the problem variables and print out only if the variable quantity is positive.

for v in prob.

variables(): if v.

varValue>0: print(v.

name, "=", v.

varValue)>> Food_Frozen_Broccoli = 6.

9242113 Food_Scrambled_Eggs = 6.

060891 Food__Baked_Potatoes = 1.

0806324So, the optimal solution is to eat 6.

923 servings of frozen broccoli, 6.

06 servings of scrambled eggs and 1.

08 servings of a baked potato!You are welcome to download the whole notebook, the data file, and experiment with various constraints to change your diet plan.

The code is here in my Github repository.

Finally, we can print the objective function i.

e.

cost of the diet in this case,obj = value(prob.

objective)print("The total cost of this balanced diet is: ${}".

format(round(obj,2)))>> The total cost of this balanced diet is: $5.

52What if we want a solution with whole numbers?As we can see that the optimal result came back with a set of fractional numbers of servings for the food items.

This may not be practical and we may want the solution to be forced to have only integer quantities as servings.

This brings to us the technique of integer programming.

The algorithm used for the previous optimization is simple linear programming where the variables were allowed to assume any real number value.

Integer programming forces some or all of the variables to assume only integer values.

In fact, integer programming is a harder computational problem than linear programming.

Integer variables make an optimization problem non-convex, and therefore far more difficult to solve.

Memory and solution time may rise exponentially as you add more integer variables.

Fortunately, PuLP can solve an optimization problem with this kind of restrictions too.

The code is almost identical as before, so it is not repeated here.

The only difference is that the variables are defined as belonging to Integer category as opposed to Continuous.

food_integer = LpVariable.

dicts("Food",food_items,0,cat='Integer')For this problem, it changes the optimal solution slightly, adding iceberg lettuce to the diet and increasing the cost by $0.

06.

You will also notice a perceptible increase in the computation time for the solution process.

Therefore, the optimal balanced diet with whole servings consists of——————————————————————–Food_Frozen_Broccoli = 7.

0Food_Raw_Lettuce_Iceberg = 1.

0Food_Scrambled_Eggs = 6.

0Food__Baked_Potatoes = 1.

0A cool application of integer programming is solving a driver-scheduling problem which can be an NP-hard problem.

See this article (also note in the article, how they compute the costs of various actions and use them in the optimization problem),An Intro to Integer Programming for Engineers: Simplified Bus SchedulingThis article is part of Remix’s series on the software engineering problems we face.

In this installment, Remix…blog.

remix.

comHow to incorporate binary decisions in a linear programming problem?Often, we want to include some kind of ‘If-then-else” kind of decision logic in the optimization problem.

What if we don’t want both broccoli and iceberg lettuce to be included in the diet (but only one of them is fine)?How do we represent such decision logic in this framework?Turns out, for this kind of logic, you need to introduce another type of variables called indicator variables.

They are binary in nature and can indicate the presence or absence of a variable in the optimal solution.

Involving indicator function as a constraint in a LP problemThanks for contributing an answer to Mathematics Stack Exchange!.Please be sure to answer the question.

Provide details…math.

stackexchange.

comBut for this particular problem, there is an apparent problem with using indicator variables.

Ideally, you want the cost/nutritional value of a food item to be included in the constraint equation if the indicator variable is 1 and ignore it if is zero.

Mathematically, it is intuitive to write this as a product of the original term (involving the food item) and the indicator variable.

But the moment you do that, you are multiplying two variables and making the problem nonlinear!.It falls under the domain of quadratic programming (QP) in that case (quadratic because the terms are now the product of two linear terms).

The popular machine learning technique Support Vector Machine essentially solves a quadratic programming problem.

What is an intuitive explanation of quadratic programming, and how is it defined in SVM?Answer: Quadratic programming involves minimizing a form that is quadratic in the components of the unknown vector…www.

quora.

comHowever, this general concept of using an indicator variable for expressing binary logic in a linear programming problem is also extremely useful.

We have given a link to a problem of solving Sudoku puzzle by LP in the next section where this trick is used.

It turns out that there is a clever trick to incorporate such binary logic in this LP without making it a QP problem.

We can denote the binary variables as food_chosen and instantiate them as Integer with lower and upper bounds of 0 and 1.

food_chosen = LpVariable.

dicts("Chosen",food_items,0,1,cat='Integer')Then we write a special code to link the usual food_vars and the binary food_chosen and add this constraint to the problem.

for f in food_items: prob += food_vars[f]>= food_chosen[f]*0.

1 prob += food_vars[f]<= food_chosen[f]*1e5If you stare at the code long enough, you will realize this effectively means that we are giving food_vars importance only if the corresponding food_chosenindicator variable is 1.

But this way we avoid the direct multiplication and keep the problem structure linear.

To incorporate the either/or condition of broccoli and iceberg lettuce, we just put a simple code,prob += food_chosen['Frozen Broccoli']+food_chosen['Raw Iceberg Lettuce']<=1This ensures the sum of these two binary variables is at most 1, which means only one of them can be included in the optimal solution but not both.

More applications of linear/integer programmingIn this article, we showed the basic flow of setting up and solving a simple linear programming problem with Python.

However, if you look around, you will find countless examples of engineering and business problems which can be transformed into some form of LP and then solved using efficient solvers.

Following are some of the canonical examples to get you started thinking,Solving Sudoku as an LP problemMaximizing return on the long-term investment as an LP problemLP applied to production planningSolving warehouse location problem using ILPMany machine learning algorithms also use the general class of optimization of which linear programming is a subset — convex optimization.

See the following article for more information about it,What lies beneath?.Optimization at the heart of Machine LearningWe show the core optimization frameworks behind the most popular machine learning/statistical modeling techniques.

towardsdatascience.

comSummary and conclusionIn this article, we illustrated solving a simple diet optimization problem with linear and integer programming techniques using Python package PuLP.

It is noteworthy that even the widely-used SciPy has a linear optimization method built-in.

Readers are encouraged to try various other Python libraries and choose a good method for themselves.

If you have any questions or ideas to share, please contact the author at tirthajyoti[AT]gmail.

com.

Also, you can check the author’s GitHubrepositories for other fun code snippets in Python, R, or MATLAB and machine learning resources.

If you are, like me, passionate about machine learning/data science, please feel free to add me on LinkedIn or follow me on Twitter.

Tirthajyoti Sarkar – Sr.

Principal Engineer – Semiconductor, AI, Machine Learning – ON…Georgia Institute of Technology Master of Science – MS, Analytics This MS program imparts theoretical and practical…www.

linkedin.

com.. More details

Leave a Reply