Process Mining in 10 minutes with R

Dr. Gregor Scheithauer, May 9, 2018

Process Mining makes process analysis relevant again.

Instead of relying solely on workshops, interviews, or outdated process documents, Process Mining makes use of the data that your business systems generate.

It can automatically generate models of the actual process, annotated with frequencies and performance measures.

Moreover, discovered process models let you easily identify any compliance issues at once.

If, and only if, data is available.

In this article I show you how to get started with Process Mining using R.

Process Mining

Process mining techniques allow for extracting information from event logs.

For example, the audit trails of a workflow management system or the transaction logs of an enterprise resource planning system can be used to discover models describing processes, organizations, and products.

Moreover, it is possible to use process mining to monitor deviations. [1]

Figure: Mined process example with bupaR

Tooling: bupaR

Currently, there are a number of tools available to perform Process Mining [2].

One open-source tool is bupaR [3], which brings process mining capabilities to the data science language R [4].

bupaR is made by Gert Janssenswillen and consists of a number of R packages.
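To make the idea of an event log concrete before turning to the real data, here is a minimal sketch in base R. The data frame and its column names (case_id, activity, timestamp) are made up for illustration and are not taken from the BPI Challenge log. In bupaR, a data frame like this would then be converted into an event log object, for example with simple_eventlog(); please check the package documentation for the exact arguments.

```r
# Toy event log: one row per event, with the three columns that any
# process mining analysis needs (case, activity, timestamp).
# All values are invented for illustration.
toy_log <- data.frame(
  case_id   = c("A1", "A1", "A1", "A2", "A2"),
  activity  = c("Create Application", "Submitted", "Accepted",
                "Create Application", "Cancelled"),
  timestamp = as.POSIXct(c("2016-01-01 10:00:00", "2016-01-01 10:05:00",
                           "2016-01-03 09:30:00", "2016-01-02 14:00:00",
                           "2016-01-20 08:15:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Basic counts, comparable in spirit to the summary of an event log
n_events     <- nrow(toy_log)
n_cases      <- length(unique(toy_log$case_id))
n_activities <- length(unique(toy_log$activity))

cat("events:", n_events, " cases:", n_cases,
    " distinct activities:", n_activities, "\n")
```

The point is only that every event must be attributable to a case, an activity, and a point in time; everything else in the analysis builds on these three columns.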

Figure: Package overview, https://www.bupar.net/images/workflow.PNG

Example data

For this post I use real-world (anonymized) data from a banking credit application process, provided by the BPI Challenge 2017 [5].

The BPI Challenge is a contest held by the organizers of the ‘International Workshop on Business Process Intelligence (BPIC)’.

For some years now, they have provided real-world datasets along with business questions for process mining enthusiasts to solve.

In short, it could be described as a Kaggle challenge for Process Mining.

Please note that I shortened the log file because of GitHub's file size limits.

If you are interested in the challenge’s outcome, feel free to read our paper.

Setup

If you want to try this yourself rather than just read along, please do so.

Here is a list of the tools and files you need.

Download R: https://cran.r-project.org/mirrors.html
Download RStudio (Desktop): https://www.rstudio.com/products/rstudio/
Clone the GitHub project: https://github.com/scheithauer/processmining-bupaR

Analysis

Eventlog overview

    events %>% summary

    Number of events:  1123342
    Number of cases:  31509
    Number of traces:  4047
    Number of distinct activities:  26
    Average trace length:  35.65146
    Start eventlog:  2016-01-01 10:51:15
    End eventlog:  2017-02-01 15:00:30

Activity overview

    events %>% activity_frequency(level = "activity")

    # A tibble: 26 x 3
       Activity             absolute relative
       <fct>                   <int>    <dbl>
     1 A_Accepted              31509  0.0561
     2 A_Cancelled             10431  0.0186
     3 A_Complete              31362  0.0558
     4 A_Concept               31509  0.0561
     5 A_Create Application    31509  0.0561
     6 A_Denied                 3753  0.00668
     7 A_Incomplete            23055  0.0410
     8 A_Pending               17228  0.0307
     9 A_Submitted             20423  0.0364
    10 A_Validating            38816  0.0691
    # ... with 16 more rows

Filter cases in which a given activity is present

    events %>%
      filter_activity_presence(activities = c('A_Cancelled')) %>%
      activity_frequency(level = "activity")

    # A tibble: 21 x 3
       Activity             absolute relative
       <fct>                   <int>    <dbl>
     1 A_Accepted              10431  0.0730
     2 A_Cancelled             10431  0.0730
     3 A_Complete              10321  0.0723
     4 A_Concept               10431  0.0730
     5 A_Create Application    10431  0.0730
     6 A_Incomplete             1413  0.00989
     7 A_Submitted              7573  0.0530
     8 A_Validating             1504  0.0105
     9 O_Cancelled             13735  0.0962
    10 O_Create Offer          13735  0.0962
    # ... with 11 more rows

Generate process map

    events %>%
      filter_activity_frequency(percentage = 1.0) %>%
      filter_trace_frequency(percentage = .80) %>%
      process_map(render = FALSE) %>%
      export_graph(file_name = './02-output/01_pm-bupar_process map.png',
                   file_type = 'PNG')

Generate process map with performance measures

    events %>%
      filter_activity_frequency(percentage = 1.0) %>%
      filter_trace_frequency(percentage = .80) %>%
      process_map(performance(mean, "mins"), render = FALSE) %>%
      export_graph(file_name = './02-output/02_pm-bupar_process map performance.png',
                   file_type = 'PNG')

Generate a matrix with activity follower frequency overview

    precedence_matrix <- events %>%
      filter_activity_frequency(percentage = 1.0) %>%
      filter_trace_frequency(percentage = .80) %>%
      precedence_matrix() %>%
      plot()

For example, the activity O_Sent (mail and online) is often followed by the activity W_Call after offers.

Generate variant overview

    trace_explorer <- events %>%
      trace_explorer(coverage = 0.5)

Show throughput time in hours by Application Type

    events %>%
      filter_trace_frequency(percentage = .80) %>%  # show only the most frequent traces
      group_by(`(case)_ApplicationType`) %>%
      throughput_time('log', units = 'hours')

    # A tibble: 2 x 9
      `(case)_ApplicationType`    min    q1 median  mean    q3   max
      <chr>                     <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
    1 New credit               0.0781  278.   479.  524.  758. 4058.
    2 Limit raise              0.0597  219.   327.  400.  529. 2110.
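To illustrate what a grouped throughput time means, here is a minimal sketch in base R. It is not bupaR's implementation. It assumes that throughput time is the span between the first and the last event of a case, and it uses a made-up data frame whose column names (case_id, application_type, timestamp) are hypothetical.

```r
# Made-up events for four cases of two application types
log <- data.frame(
  case_id = c("c1", "c1", "c2", "c2", "c3", "c3", "c4", "c4"),
  application_type = c("New credit", "New credit", "New credit", "New credit",
                       "Limit raise", "Limit raise", "Limit raise", "Limit raise"),
  timestamp = as.POSIXct(c(
    "2016-01-01 08:00:00", "2016-01-11 08:00:00",  # c1 spans 10 days
    "2016-01-02 08:00:00", "2016-01-22 08:00:00",  # c2 spans 20 days
    "2016-01-03 08:00:00", "2016-01-08 08:00:00",  # c3 spans 5 days
    "2016-01-04 08:00:00", "2016-01-14 08:00:00"   # c4 spans 10 days
  ), tz = "UTC"),
  stringsAsFactors = FALSE
)

# One data frame per case
cases <- split(log, log$case_id)

# Throughput time per case: last timestamp minus first timestamp, in hours
per_case <- data.frame(
  case_id          = names(cases),
  application_type = vapply(cases, function(d) d$application_type[1],
                            character(1)),
  hours            = vapply(cases, function(d) {
    as.numeric(difftime(max(d$timestamp), min(d$timestamp), units = "hours"))
  }, numeric(1)),
  stringsAsFactors = FALSE
)

# Mean throughput time per application type
mean_by_type <- aggregate(hours ~ application_type, data = per_case, FUN = mean)
print(mean_by_type)
```

Working per case first and aggregating afterwards keeps the two steps separate, so the mean can be swapped for a median or a quantile without touching the per-case calculation.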

Show throughput time in hours by Loan Goal

    events %>%
      filter_trace_frequency(percentage = .80) %>%  # show only the most frequent traces
      group_by(`(case)_LoanGoal`) %>%
      throughput_time('log', units = 'hours')

    # A tibble: 14 x 9
       `(case)_LoanGoal`          min    q1 median  mean    q3   max
       <chr>                    <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
     1 Existing loan takeover  0.118   307.   499.  543.  758. 2779.
     2 Home improvement        0.174   289.   471.  521.  756. 2442.
     3 Car                     0.139   239.   403.  488.  756. 3269.
     4 Other, see explanation  0.125   268.   461.  515.  757. 4058.
     5 Remaining debt home     0.135   351.   694.  632.  805. 3252.
     6 Not speficied           0.131   307.   621.  563.  777. 1442.
     7 Unknown                 0.0597  214.   349.  429.  733. 2013.
     8 Tax payments           51.4     293.   437.  506.  742. 1220.
     9 Caravan / Camper        0.174   232.   358.  457.  744. 2110.
    10 Motorcycle             22.7     254.   410.  489.  763. 1338.
    11 Boat                   55.0     266.   395.  512.  743. 1535.
    12 Business goal         227.      255.   403.  526.  756. 1167.
    13 Extra spending limit   17.8     258.   406.  485.  743. 1356.
    14 Debt restructuring    732.      740.   748.  748.  757.  765.
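The process map, the precedence matrix, and the variant overview from this analysis all rest on the same idea: counting which activity directly follows which within a case. Here is a minimal sketch of that idea in base R, using a made-up log with hypothetical column names (case_id, activity, order). It is meant to show the principle, not how bupaR computes its results.

```r
# Made-up log: three cases, events numbered in the order they occurred
toy_log <- data.frame(
  case_id  = c(1, 1, 1, 2, 2, 2, 3, 3),
  activity = c("Create", "Submit", "Accept",
               "Create", "Submit", "Cancel",
               "Create", "Cancel"),
  order    = c(1, 2, 3, 1, 2, 3, 1, 2),
  stringsAsFactors = FALSE
)

# Sort events within each case, then build one activity sequence per case
toy_log <- toy_log[order(toy_log$case_id, toy_log$order), ]
traces  <- split(toy_log$activity, toy_log$case_id)

# Pair every activity with its direct successor in the same case
pairs <- do.call(rbind, lapply(traces, function(a) {
  if (length(a) < 2) return(NULL)
  data.frame(antecedent = head(a, -1), consequent = tail(a, -1),
             stringsAsFactors = FALSE)
}))

# Cross-tabulate: rows are antecedents, columns are consequents
follows <- table(pairs$antecedent, pairs$consequent)
print(follows)

# Trace variants: distinct activity sequences and the number of cases per variant
variants <- table(vapply(traces, paste, character(1), collapse = " > "))
print(variants)
```

Each non-zero cell of the cross-table corresponds to an arrow in a process map, and each entry of the variant table to one row of a variant overview.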

Conclusion

Process Mining is much more than using a specific tool.

Mostly, it is an iterative procedure involving asking the relevant business questions, understanding the data, interpreting the data correctly (statistical significance vs. practical relevance), and most importantly deriving measures for improving the process under investigation.
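The difference between statistical significance and practical relevance is easy to show with simulated numbers. The following sketch in base R uses invented throughput times for two hypothetical groups of cases; the group sizes and distribution parameters are assumptions chosen purely for illustration and have nothing to do with the BPI Challenge data.

```r
set.seed(42)

# Simulated throughput times in hours for two hypothetical groups of cases.
# Both groups are drawn from gamma distributions with nearly identical means.
n <- 50000
group_a <- rgamma(n, shape = 6.25, scale = 80.0)  # mean of about 500 hours
group_b <- rgamma(n, shape = 6.25, scale = 81.6)  # mean of about 510 hours

# Welch two-sample t-test on the two groups
test <- t.test(group_a, group_b)

# Observed difference of the group means in hours
diff_hours <- mean(group_b) - mean(group_a)

cat("p-value:", format.pval(test$p.value), "\n")
cat("mean difference in hours:", round(diff_hours, 1), "\n")
```

With this many cases the test reports a very small p-value, yet the difference amounts to a few hours on a process that takes around three weeks. Whether such a difference justifies changing the process is a business question, not a statistical one.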

Commercial tools exist that support this iterative procedure.

bupaR lets you run Process Mining analyses for free (i.e., without licensing costs), although they are not yet as flexible as those in commercial tools.

bupaR also provides interactive dashboards, which I have not tested yet but plan to in the near future.

If you are interested in Process Mining, please feel free to reach out.

All the best,
Gregor

References

[1] http://www.processmining.org/research/start
[2] https://en.wikipedia.org/wiki/Process_mining#Software_for_process_mining
[3] https://www.bupar.net/
[4] https://www.rstudio.com/
[5] http://www.win.tue.nl/bpi/doku.php?id=2017:challenge
