Process Mining: Everything You Need to Know

Written by
inverbis analytics
June 7, 2022 | Max 12 min read

In this guide, we cover in depth what process mining is, what it is for, which types of companies and sectors can apply it, its benefits, examples and use cases, the challenges you may encounter, and much more.

What is Process Mining? 

Process mining is a technique that reconstructs the workflows of organizations from the information that the logs and information system databases collect.

By reconstruction, we mean a visualization of the workflow, in the form of sequences of connected activities that indicate the direction of the process as it progresses. With every execution of a process (for example, every pizza order, from the placing of the call until the delivery person delivers it), this visualization represents all recorded activities. We call this representation of each execution in the process a trace.

Each trace has a series of essential data: what activities, in what order, how often they have happened, and how much time has elapsed between each pair of contiguous activities (which we represent as arcs). However, the power of process mining comes from its ability to aggregate the information of individual traces.

In this way, you can identify each unique trace sequence (called a variant) and measure how often it appears, and calculate statistics on activity durations and arc times, providing a complete view of your workflow.
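As a rough illustration of the aggregation described above, the following Python sketch groups events into traces, counts variants, and averages the time on each arc. The event log, the activity names, and the function names are all hypothetical and only loosely follow the pizza example; this is not how any particular tool implements it.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical event log: (trace id, activity, timestamp).
EVENTS = [
    ("order-1", "Take call", "2022-06-07 19:00"),
    ("order-1", "Assign restaurant", "2022-06-07 19:02"),
    ("order-1", "Deliver", "2022-06-07 19:32"),
    ("order-2", "Take call", "2022-06-07 19:05"),
    ("order-2", "Assign restaurant", "2022-06-07 19:09"),
    ("order-2", "Deliver", "2022-06-07 19:49"),
    ("order-3", "Take call", "2022-06-07 19:10"),
    ("order-3", "Deliver", "2022-06-07 19:40"),
]


def build_traces(events):
    """Group events by trace id and sort each trace by timestamp."""
    traces = defaultdict(list)
    for trace_id, activity, ts in events:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        traces[trace_id].append((when, activity))
    return {tid: sorted(evts) for tid, evts in traces.items()}


def variant_counts(traces):
    """Count how many traces follow each unique activity sequence."""
    return Counter(tuple(a for _, a in evts) for evts in traces.values())


def mean_arc_minutes(traces):
    """Average elapsed minutes on each arc (pair of contiguous activities)."""
    samples = defaultdict(list)
    for evts in traces.values():
        for (t1, a1), (t2, a2) in zip(evts, evts[1:]):
            samples[(a1, a2)].append((t2 - t1).total_seconds() / 60)
    return {arc: mean(vals) for arc, vals in samples.items()}


traces = build_traces(EVENTS)
variants = variant_counts(traces)
arcs = mean_arc_minutes(traces)

for variant, count in variants.most_common():
    print(count, " -> ".join(variant))
```

In this toy log, two orders share the full three-step variant and one order skips the assignment step, which is the kind of difference that only shows up once traces are aggregated.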


Obviously, these data offer the entire statistical range associated with the process: the distribution of variants and execution times of activities and traces.

Process Mining vs. Data Mining vs. Business Process Management

Some confusion exists among these three disciplines. Although they have points in common, they are different.

Process Mining vs. Data Mining

Data mining is a general term that covers a series of heterogeneous techniques for extracting relevant information from raw or preprocessed data. But, as with traditional mining, in which each type of raw material (coal, diamond, gold, etc.) requires a specific extraction method and different tools, mining process data requires a highly specialized analytical discipline.

In fact, as we will see later, process mining works with very specific data models and a series of algorithms that differentiate it from other, more popular data-mining techniques, such as regression, clustering, or classification (which, in any case, can also be applied to process mining in a complementary way, depending on the case).

Process Mining vs. Business Process Management (BPM or BPMN)

How is a representation based on process mining different from one made in BPMN?

It is a completely different conception:

  • BPMN notation supports process design. That is, it is a conceptualization of how a process is interpreted or intended to work. It is an ideal model.
  • Process mining takes data from reality and represents what has happened, not what is expected to happen.
  • Therefore, they are complementary techniques. We can take the results that process mining obtains and check how closely they resemble the process designed in BPMN. Conversely, we can export the mined result to BPMN notation and use it as input for redesigning the process.

By analyzing the digital footprint, in the form of traces of everything that has happened while executing operations, process mining provides an objective view of how businesses behave, reflecting all their complexity. Meanwhile, the classical notations (BPMN and others) offer simplified views that allow a quick understanding of what is intended, rather than reality.

Process mining inevitably provides us with a complex view since, in reality, there are many more alternative paths (variants) than design usually foresees.

Thus, the aggregate representation of a process (the sum of all the executions with all their variants at the same time) usually generates difficult-to-unravel images, known in the nomenclature of this technique as “spaghetti.”

The spaghetti image is impractical for analysis, but not entirely useless. It provides analysts, process owners, and improvement teams with a comprehensive overview of what’s going on. From there, we proceed to observe the number of variants in which the process breaks down, and the aggregate visualization of all the traces that follow the same path, with the ability to filter them down to each individual execution.
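One common way to build such an aggregate view is a directly-follows graph, which counts how often one activity is immediately followed by another across all traces. The sketch below is a minimal version under that assumption; the traces, the `prune` helper, and the frequency threshold are hypothetical, and thresholding is just one simple way to tame a spaghetti picture.

```python
from collections import Counter

# Hypothetical traces, each already reduced to its ordered activity names.
TRACES = [
    ["Take call", "Assign restaurant", "Bake", "Deliver"],
    ["Take call", "Assign restaurant", "Bake", "Deliver"],
    ["Take call", "Assign restaurant", "Bake", "Bake", "Deliver"],
    ["Take call", "Bake", "Deliver"],
]


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    edges = Counter()
    for trace in traces:
        edges.update(zip(trace, trace[1:]))
    return edges


def prune(edges, min_count):
    """Keep only the arcs observed at least min_count times."""
    return {arc: n for arc, n in edges.items() if n >= min_count}


graph = directly_follows(TRACES)
main_flow = prune(graph, min_count=2)

for (a, b), n in sorted(graph.items(), key=lambda item: -item[1]):
    print(f"{a} -> {b}: {n}")
```

The full graph keeps the rare arcs (the repeated "Bake" and the skipped assignment), while the pruned one shows only the dominant path. Both views are useful: the first for spotting exceptions, the second for reading the main flow.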

In summary, process mining is an objective verification of what has been done and how it has been done to carry out the organization’s operations (its processes). The information can be consulted at different levels of aggregation to analyze the reasons that have led to each behavior, expected or (as happens in most cases) unexpected.

How Process Mining Works

Process mining captures the events that the company’s information systems create when executing an activity. In other words, it collects the fingerprint of a process.

Events consist of basic fields and attributes. The basic fields are:

  • A unique trace identifier: Identifies all the events that make up a trace. For example, each time a pizza is ordered from a home-delivery service, the order generates a trace of activities, beginning with the order and continuing through the assignment of the production to a restaurant, to final delivery to the customer.
  • The description of the activity: We must know which event has been executed. For example, “the response to the customer call” is an activity (generally appearing in a less human way in the databases), followed within the same trace by the “assignment of the order” to a restaurant.
  • The timestamp: A record of the start of the activity (“answering the customer’s call”) at a certain time (the more detail the better: minutes, seconds . . . ) and another record of the end of the activity. Having both start and end lets us know the duration of each activity as well as the time spent between activities. When we have only one of those records, we cannot separate the two: we can only measure the time elapsed between consecutive activity starts (or between consecutive activity ends).
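The three basic fields can be sketched as a small data structure. The `Event` class below, its field names, and the sample order are hypothetical; the sketch only shows how having both a start and an end timestamp separates the duration of each activity from the waiting time between activities.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    trace_id: str     # unique trace identifier
    activity: str     # description of the activity
    start: datetime   # timestamp: start of the activity
    end: datetime     # timestamp: end of the activity


def activity_durations(trace):
    """Time spent inside each activity (end minus start)."""
    return [(e.activity, e.end - e.start) for e in trace]


def waiting_times(trace):
    """Idle time between the end of one activity and the start of the next."""
    return [
        ((a.activity, b.activity), b.start - a.end)
        for a, b in zip(trace, trace[1:])
    ]


t0 = datetime(2022, 6, 7, 19, 0)
minutes = lambda n: timedelta(minutes=n)

# One hypothetical pizza order, already sorted by start time.
trace = [
    Event("order-1", "Take call", t0, t0 + minutes(3)),
    Event("order-1", "Assign restaurant", t0 + minutes(5), t0 + minutes(6)),
    Event("order-1", "Deliver", t0 + minutes(26), t0 + minutes(41)),
]

durations = activity_durations(trace)
waits = waiting_times(trace)
```

With only `start` (or only `end`) available, the two lists above could not be told apart; all that could be computed is the gap between consecutive timestamps.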

The attributes are all the additional data that give context to the variation of the process. What types of attributes? There is no specific rule, but they can be grouped around the branches of a traditional cause-and-effect chart (Ishikawa diagram):

  • People: what shift, what hours, what office (or, following the example above, what restaurant), what qualification or experience
  • Machines: all kinds of technical means that intervened to carry out the activity, including their sensors (for example, ovens and temperature, what type of oven, from what manufacturer)
  • Materials: raw materials and supplies—for us, the ingredients of the pizza
  • Methods: what kind of procedure (following our example, what recipe)

Source: InVerbis Analytics

Attributes are the way to fully enrich the process-mining technique. They allow us to filter the data by isolating execution variants related to a behavioral factor or several of them. In turn, this allows us to detect different problems in isolation, apply different solutions, and generate different monitoring.
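As a minimal sketch of attribute-based filtering, the code below isolates traces by attribute values and compares average lead time per attribute value. The trace summaries, the attribute names (`restaurant`, `shift`), and the numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-trace summaries: attributes plus total lead time in minutes.
TRACES = [
    {"id": "order-1", "restaurant": "Centro", "shift": "evening", "minutes": 32},
    {"id": "order-2", "restaurant": "Centro", "shift": "evening", "minutes": 36},
    {"id": "order-3", "restaurant": "Norte", "shift": "evening", "minutes": 50},
    {"id": "order-4", "restaurant": "Norte", "shift": "lunch", "minutes": 40},
]


def filter_traces(traces, **attrs):
    """Keep the traces whose attributes match every given key=value pair."""
    return [t for t in traces if all(t.get(k) == v for k, v in attrs.items())]


def mean_minutes_by(traces, attribute):
    """Average lead time for each value of the chosen attribute."""
    groups = defaultdict(list)
    for t in traces:
        groups[t[attribute]].append(t["minutes"])
    return {value: mean(vals) for value, vals in groups.items()}


by_restaurant = mean_minutes_by(TRACES, "restaurant")
norte_evening = filter_traces(TRACES, restaurant="Norte", shift="evening")
```

Filtering on one factor at a time (or several together) is what lets a problem be isolated to, say, one restaurant on one shift instead of being blamed on the process as a whole.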

For a more detailed look at how InVerbis process mining works, you can go here.

What Companies Can Do Process Mining?

All of them. Processes, formal or not, documented or not, always exist. Value is generated by breaking work down into activities. Obviously, for an artisan (such as a potter who controls every phase of making the product), complexity is low, because the final result is controlled from a single point.

But when complexity increases, coordinating all activities and aligning them with customer expectations and cost efficiency becomes harder. Then the need arises to structure processes, establish performance indicators, and prevent the defect of one phase from becoming the input of the next . . . the old idea of garbage in, garbage out (or, literally, if what you receive is “garbage,” what you deliver still contains “garbage”).

Therefore, service companies, industrial companies, nonprofit organizations, or public administrations can use process-mining techniques and extract their benefits.

What Professionals Can Do Process Mining?

Anyone in an organization can benefit, regardless of their management level or their role in managing the process. If improvement is understood as a cooperative effort, data provides the knowledge needed to know how to improve.

On the side of the practitioners, process mining involves many possible specialists:

  • Data scientists: They know how to exploit the results, combine them with other sources of information, create reports, and generate knowledge for third parties.
  • Process consultants (and the so-called “process owners” in organizations more oriented toward cross-functional management): They propose the improvement cycles and can guide what data must be captured from the process, as well as how to focus the monitoring.
  • The information technology team: Data capture and export take place within its infrastructure. It also shares responsibility with data scientists for storing and transforming the data.

All in all, the information generated is within the reach of any professional working in an organization, and not knowing the scientific foundations of process mining is no impediment. With the right leadership, being able to interpret statistical information is not strictly necessary, but it is highly beneficial. In this respect, process mining is no different from any other form of data exploitation.

What is Process Mining for? 

Process mining is a transversal technique.

It has no limit; as long as there is a sufficient digital footprint of workflows, of whatever type, it is applicable. But there are obvious generic applications:

  • Process optimization and improvement: Take control, reduce variation, and avoid waste and repetition of tasks. The mere fact of discovering reality and being able to model it enables addressing process reengineering.
  • Automation: To automate correctly, you must know what is actually being done. Mining allows for that reality check that also removes perceptions and opinions. Any consultant for this type of tool can explain the complexity of unraveling the real process.
  • Compliance: Process mining works with the entire universe of data, not samples. This allows us to analyze the compliance of each and every one of the traces with rules and procedures, leading us to the correction and prevention of risks as executions occur, not at audit time. In addition, it prevents the sampling from leaving out cases with harmful potential.
  • Prediction and planning: Having captured the variants of the process, we can create predictive models that can generate alarms when a certain variant is expected, sufficiently in advance of its appearance. The variant can be desired or undesired, foreseen by design or not, but the prediction allows the action to proceed in each case, especially resource planning.
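The compliance use case can be illustrated with a very small rule check that runs over every trace rather than a sample. The rule used here ("Confirm payment" must happen before "Deliver") and the traces are hypothetical; real conformance checking uses richer models, so treat this only as a sketch of the idea.

```python
# Hypothetical traces keyed by trace id, each an ordered list of activities.
TRACES = {
    "order-1": ["Take call", "Confirm payment", "Bake", "Deliver"],
    "order-2": ["Take call", "Bake", "Deliver"],
    "order-3": ["Take call", "Bake", "Deliver", "Confirm payment"],
    "order-4": ["Take call", "Confirm payment"],
}


def violates_order(trace, first, then):
    """True if `then` occurs before any occurrence of `first` in the trace."""
    seen_first = False
    for activity in trace:
        if activity == first:
            seen_first = True
        elif activity == then and not seen_first:
            return True
    return False


def non_compliant(traces, first, then):
    """Ids of all traces that break the 'first before then' rule."""
    return sorted(
        tid for tid, trace in traces.items()
        if violates_order(trace, first, then)
    )


offenders = non_compliant(TRACES, first="Confirm payment", then="Deliver")
print(offenders)
```

Because the check is cheap, it can run on every execution as it is recorded, which is what makes it possible to react before audit time.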

Benefits of Process Mining 

Process mining has significant benefits.

First, it allows you to discover the real life of the company. By revealing what actually happens, it lets us understand which alternative paths made it possible to reach (or prevented reaching) the desired KPI. In other words, it lets us move away from averages as indicators and concentrate on the distribution of each way of completing a procedure.

By concentrating on distributions, we can automatically measure the improvement potential of a process. We see when we are better and when we are worse. If the process can be executed well, or better than average, in some cases, why can’t we increase the number of such cases?
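A small numeric sketch shows why a single mean can mislead. The lead times and variants below are invented; the point is only that the overall average describes neither group well, while the per-variant summary shows where the improvement potential lies.

```python
from statistics import mean, median

# Hypothetical lead times in minutes, grouped by variant.
DURATIONS = {
    ("Take call", "Bake", "Deliver"): [28, 30, 32],
    ("Take call", "Bake", "Bake", "Deliver"): [55, 60, 65],
}


def summarize(durations_by_variant):
    """Per-variant case count, median, and range of lead times."""
    return {
        variant: {
            "cases": len(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for variant, values in durations_by_variant.items()
    }


overall_mean = mean(m for values in DURATIONS.values() for m in values)
summary = summarize(DURATIONS)
```

Here the overall mean is 45 minutes, yet no order took anywhere near 45 minutes: half finish in about 30 and half in about 60. The question then becomes how to move cases from the slow variant to the fast one.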

Second, mining immediately detects activity repetitions. In process mining these repetitions are called “loops,” and they can occur in a single activity, as well as in a cycle that returns to very distant phases of the process.

Loop cases allow you to quickly determine margins for improvement without having to re-engineer the process—simply determine the causes of repetition and establish corrective measures. The loop is measured in terms of frequency and impact, with respect to time variations.
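The two kinds of loop mentioned above can be detected with very little code. In this sketch, a "self-loop" is an activity immediately repeated, and any activity that appears more than once in a trace is treated as a sign of a longer cycle. The traces and helper names are hypothetical.

```python
from collections import Counter

# Hypothetical traces keyed by trace id.
TRACES = {
    "order-1": ["Take call", "Bake", "Deliver"],
    "order-2": ["Take call", "Bake", "Bake", "Deliver"],
    "order-3": ["Take call", "Bake", "Deliver", "Take call", "Bake", "Deliver"],
}


def self_loops(trace):
    """Activities that are immediately repeated (a followed by a)."""
    return [a for a, b in zip(trace, trace[1:]) if a == b]


def repeated_activities(trace):
    """Activities that occur more than once anywhere in the trace."""
    return sorted(a for a, n in Counter(trace).items() if n > 1)


looping = {
    tid: repeated_activities(trace)
    for tid, trace in TRACES.items()
    if repeated_activities(trace)
}
loop_rate = len(looping) / len(TRACES)
```

`loop_rate` gives the frequency side of the measurement; the impact side would come from comparing the lead times of looping and non-looping traces.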

Finally, another important advantage of process mining is that it can enable examining performance in real time or very close to real time. This allows you to focus on day-to-day execution and detect unwanted deviations or those that require intervention.

Additionally, constantly monitoring the variants and how frequently they appear lets us detect how the process mutates: new activities emerge, or variants that used to be in the minority become the preferred path.
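One simple way to watch for this kind of mutation is to compare the share of each variant between two periods. The sketch below assumes the variants per period have already been extracted; the periods, variants, and counts are invented.

```python
from collections import Counter

# Hypothetical variants.
PHONE = ("Take call", "Bake", "Deliver")
APP = ("App order", "Bake", "Deliver")
KIOSK = ("Kiosk order", "Bake", "Pick up")

# One entry per trace observed in each period.
LAST_QUARTER = [PHONE] * 8 + [APP] * 2
THIS_QUARTER = [PHONE] * 4 + [APP] * 5 + [KIOSK] * 1


def shares(variants):
    """Fraction of traces that follow each variant."""
    counts = Counter(variants)
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def share_shift(before, after):
    """Change in share per variant (positive means it became more common)."""
    b, a = shares(before), shares(after)
    return {v: a.get(v, 0.0) - b.get(v, 0.0) for v in set(a) | set(b)}


shift = share_shift(LAST_QUARTER, THIS_QUARTER)
new_variants = sorted(set(THIS_QUARTER) - set(LAST_QUARTER))
```

In this example the phone variant loses 40 points of share, the app variant becomes the preferred path, and a variant that did not exist before appears.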

Implementation Challenges in Process Mining

Experience shows us that the first barriers are intellectual and cultural, followed by the technical.


Cultural barriers reside in the fact that the mental images that organizations manage regarding their activity and work are not based on processes. The idea that value generation is a sequence of jobs divided among different people, and the end of some is the beginning of others, is old but not part of the conversations and visions around effectiveness. Yes, there are procedures, but often as a bureaucratic repository of compliance with regulations and internal control, not so much in response to the need to provide value to the customer.

The consequence is that where a process begins and where it ends is not well known, including what activities a process contains, whether it exists (for instance, “bill payment”), and who is responsible for complying with it. Such disciplines as Six Sigma, Lean Management, or the classic TQM have long addressed this aspect, but not all companies operate on their doctrinal foundations.

An additional result is that if there are no processes configured as units of management or monitoring the flow of internal value, there is no tradition of improvement based on scientific thinking (the old PDCA). In other words, there is also no cultural approach to improvement based on data and not opinions—just one of the virtues (and opportunities) of process mining.

Access to Data

Once the cultural barrier has been overcome, the problem is access to data.

Do we have data that informs us of the process? The answer is that given the massive computerization of all work procedures, there is a multitude of data that generally allows conducting basic analyses. And, as usual, they yield obvious and useful conclusions for improvement and calculation of nonquality costs.

Since nobody really thought about processes, process mining, or which concepts to use to define and fix the digital footprint of a process, the data is rarely prepared to be mined easily and quickly. In addition, if the entire process is to be well understood, it is not enough to capture the missing data; the data must also be traceable across more than one information system. Ordering a pizza through a call center generates a record of the call that will not necessarily be traceable once the order arrives at the restaurant, because the restaurant will most likely use a different identifier from the one used in the previous phase.
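The identifier problem can be sketched as follows. The code assumes that a mapping between the restaurant's ticket ids and the call center's call ids is available (the `TICKET_TO_CALL` table is hypothetical, and in practice building it is often the hard part). Times are plain "HH:MM" strings within a single day, which is a simplification so that they sort correctly as text.

```python
# Hypothetical exports from two systems that use different identifiers.
CALL_CENTER = [
    {"call_id": "C-100", "activity": "Take call", "time": "19:00"},
    {"call_id": "C-101", "activity": "Take call", "time": "19:05"},
]
RESTAURANT = [
    {"ticket_id": "T-7", "activity": "Bake", "time": "19:10"},
    {"ticket_id": "T-7", "activity": "Deliver", "time": "19:30"},
    {"ticket_id": "T-9", "activity": "Bake", "time": "19:20"},
]
# Assumed lookup table linking restaurant tickets to call-center calls.
TICKET_TO_CALL = {"T-7": "C-100"}


def unify(call_events, restaurant_events, ticket_to_call):
    """Merge both logs under the call id; report events that cannot be linked."""
    merged, unmatched = [], []
    for e in call_events:
        merged.append((e["call_id"], e["time"], e["activity"]))
    for e in restaurant_events:
        case = ticket_to_call.get(e["ticket_id"])
        if case is None:
            unmatched.append(e)  # no known link back to a call
        else:
            merged.append((case, e["time"], e["activity"]))
    return sorted(merged), unmatched


log, orphans = unify(CALL_CENTER, RESTAURANT, TICKET_TO_CALL)
```

The list of unmatched events is as informative as the merged log: it measures how much of the process is currently untraceable across systems.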

Resolving these circumstances takes hours of work, generally by consultants who must capture the data by hand, transform it into a minable format, and define what the specific process is, including where it starts and where it ends. It is often laborious, yet a professional well trained in the discipline can resolve it quite quickly (at InVerbis, our team helps in this phase).

But the important thing is to know that data already exists and is usable to understand the ROI that a better design of the data model and generalized capture systems can achieve. If you want to be data-driven and take advantage of the data revolution for the control and improvement of processes, systematize the capture for the design of process maps, and have data models prepared for mining, this is the way to go.

How to Capture and Manage Data for Process Mining

In the first implementation phase, the organization that embraces process mining usually works with a single process and with data extracted from a single information system. It can almost be said that more threads than processes are addressed, though that’s a blurry distinction.

An extraction in .csv format is enough to do this, as long as it contains the basic fields. Classic ETL pipelines and other ways of accessing a system (such as a REST API) are equally valid, but a simple .csv export is enough to get started.
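Before mining a .csv export, it helps to confirm that the basic fields are present and filled in. The sketch below uses Python's standard `csv` module; the column names (`trace_id`, `activity`, `timestamp`) and the sample content are assumptions, since each system will name its fields differently.

```python
import csv
import io

# Assumed column names for the three basic fields.
REQUIRED = {"trace_id", "activity", "timestamp"}

# Hypothetical export; the last row is missing its timestamp.
SAMPLE_CSV = """trace_id,activity,timestamp,restaurant
order-1,Take call,2022-06-07T19:00:00,Centro
order-1,Deliver,2022-06-07T19:32:00,Centro
order-2,Take call,,Norte
"""


def load_event_log(handle):
    """Read a CSV event log, rejecting rows that lack any basic field."""
    reader = csv.DictReader(handle)
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    good, rejected = [], []
    for row in reader:
        complete = all(row[column] for column in REQUIRED)
        (good if complete else rejected).append(row)
    return good, rejected


events, rejected = load_event_log(io.StringIO(SAMPLE_CSV))
print(len(events), "usable events,", len(rejected), "rejected")
```

Any extra column, such as `restaurant` here, is kept with each row and can serve as an attribute for later filtering.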

However, if we want to systematize the power of process mining to discover, improve, and control our real activity, we need to update the data and mine it with high frequency (real time or close to real time; each business environment will impose its own requirement). We must also be able to bring together data that spans the full breadth of our processes, no matter how many systems we must draw it from. Finally, we must be able to combine the data so that it reflects the internal customer-supplier chain and lets us analyze the company’s value flow.

Two technological solutions make this possible. Both require some integration and internal development, but they solve the problem and allow InVerbis to work with the data:

Data Virtualization

Data virtualization allows connecting all the sources to the virtualization software and, subsequently, designing “views” of the data. Those “views” are the extracts we need, in the format we need for mining. From the virtualization tool, we import the views into InVerbis and proceed to analyze and monitor the data.


Observability

Applying observability techniques to systems generates, by design, traces of the events in an organization’s infrastructure. Since more work activities (even manual ones) are linked to cloud services every day, generating traces becomes, said with caution, a minor problem.

How to Start Doing Process Mining

Choose a Process

Make sure you know it and can minimally describe it. Check which systems manage it and what digital footprint it leaves at the moment. If it’s not complete, it doesn’t matter.

Make sure that you have a trace (or execution) identifier, a timestamp, and a description of the event (the activity carried out).

If the InVerbis tool does not support the format of your file, contact us and we will transform it.

Mine. Filter Data. Export Results. Analyze

Find the source of your problems. Consider what you would improve and how you want to represent the monitoring on a dashboard. Connect the data to update as often as you need—you can use the APIs directly or . . . call us.

Get Results and Broaden the Spectrum

With a virtualization tool (we work with Denodo) and observability techniques, we will improve the data and broaden the spectrum of analysis of your organization’s value-generation flows.

