#### The Graphical Summary: Minitab’s “Aha” Tool

As any Six Sigma practitioner knows, good data are the key to identifying the vital few root causes and then verifying that the new process has eliminated or sufficiently reduced those root causes to the point where the customer requirement is met. That good data can come from new measurements or from valid, acceptable historical data.

Once good data are available, software like Minitab makes the project team's analytical efforts fast and comprehensive. The range of statistical and graphical tools in Minitab makes even the most complex calculations accurate and immediate. Minitab, in short, is a real productivity enhancer for belts.

Once that good data is in hand, it is very tempting for the team to get on with analyses. Perhaps a hypothesis test is selected to compare the outputs of different shifts, different suppliers, or different temperatures. Or perhaps the team decides to run a regression analysis to assess the impact of a factor (an X) on the output variable (the Y). The stats and graphs may show a strong or weak relationship or, perhaps, leave one scratching one's head, wondering what one is looking at.

While Minitab can assess even the most complex of data sets, one should always start with the basics, and those basics are found in Minitab's Graphical Summary. Located under Stat > Basic Statistics > Graphical Summary, this numeric and graphical output provides an immediate and clearly understood review of the data.

The Graphical Summary contains a variety of graphs, including a histogram of the data with a normal curve superimposed. Other graphs show the distribution of the data around the mean and the median. On the right side of the summary one finds the "numbers." The Anderson-Darling normality test gives immediate feedback as to whether the data are normally distributed. Below that one finds a number of additional statistics, including the number of values in the column of data, the minimum and maximum values, the variance and standard deviation (key measures of data variation), and values for skewness and kurtosis.
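To make these statistics concrete, here is a minimal Python/SciPy sketch that computes the same quantities the Graphical Summary reports. This is an illustration only, not Minitab itself; the sample payment-days data are invented, and SciPy's Anderson-Darling output differs slightly from Minitab's (it reports the statistic with critical values rather than a p-value).

```python
# Illustrative sketch of the Graphical Summary statistics using SciPy.
# The days_to_pay values below are made-up sample data, not project data.
import numpy as np
from scipy import stats

days_to_pay = np.array([12, 18, 25, 31, 40, 44, 52, 60, 75, 90, 110, 131],
                       dtype=float)

summary = {
    "N": len(days_to_pay),
    "Minimum": days_to_pay.min(),
    "Maximum": days_to_pay.max(),
    "Mean": days_to_pay.mean(),
    "StDev": days_to_pay.std(ddof=1),       # sample standard deviation
    "Variance": days_to_pay.var(ddof=1),    # sample variance
    "Skewness": stats.skew(days_to_pay),    # > 0 indicates right skew
    "Kurtosis": stats.kurtosis(days_to_pay),  # excess kurtosis
}

# Anderson-Darling normality test: SciPy returns the A-squared statistic
# plus critical values; Minitab additionally reports a p-value.
ad = stats.anderson(days_to_pay, dist="norm")
summary["AD statistic"] = ad.statistic

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Note that SciPy's default skewness and kurtosis estimators are not bias-adjusted the way Minitab's are, so the values will differ slightly on small samples.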

Why is this simple Minitab output so important to a belt? A real-life example will help.

Cycle time is one of the most common Y variables in LSS projects. With few exceptions, most processes have a time component. Customers expect something within some period of time. That time can be nanoseconds or months, but in most processes it is a Critical to Quality (CTQ) measure.

Recently, a large public-sector organization was suffering a high level of complaints from its employees about delayed reimbursement of business expenses paid out of pocket. Not surprisingly, the employees wanted to be reimbursed quickly. Delays were upsetting and demoralizing, and eventually led some employees to leave the organization.

To address this problem, the CFO established an LSS team, led by a newly trained Black Belt, to find the root causes of late payments and create a new process to meet a CTQ of no more than 45 days to pay.

As the LSS project progressed, the lead belt approached the Master Black Belt (MBB) for help. She had plenty of data and had done some basic calculations with it in Excel. The team found that, on average, it was taking over 50 days for employees to be reimbursed, but they also found some strange anomalies indicating that processing times in some cases were close to, and even below, zero days.

The MBB reminded the lead belt of the importance of the Graphical Summary, and so, together, they ran the numbers through Minitab. The first reaction was stunning and sobering. The histogram showed the expected right skewness of the data: most values on the left (closer to zero), with descending numbers of values extending well to the right, beyond 50 days. The stunning discovery was seeing a number of values below zero. From the stats side of the Graphical Summary they noticed that the minimum value in the Days to Pay column (the Y) was −16 days and the maximum value was 131 days. The longest time someone waited for payment was 131 days, while apparently at least one person was paid 16 days before the claim was even started. Hmmm.

Studying the Minitab worksheet, the lead belt and MBB noted that the values in the Days to Pay column (the Y data) were calculated by subtracting the values in the Date Started column from the values in the Date Completed column. The difference was the cycle time in days. This led to an "aha" moment.

By subsetting the Minitab worksheet to examine only the rows with negative Days to Pay, they discovered that a number of claims had been entered with the start and completion dates in the wrong columns. In those rows the later (larger) date sat in the Date Started column and the earlier (smaller) date in Date Completed, so the subtraction produced a negative Days to Pay.

The whole situation suddenly became clear. Some of the data in the Days to Pay column were wrong. Because the start and completion dates were manually entered into the original Excel spreadsheet, the cause was probably human error. By simply reversing the values in the affected rows, the negative numbers disappeared.
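The subset-and-swap clean-up described above can be sketched in a few lines of pandas. This is an illustration under assumed column names and invented sample dates, not the team's actual worksheet; the middle row deliberately has its dates reversed to reproduce the negative cycle time.

```python
# Minimal pandas sketch of the clean-up step: find rows with negative
# cycle times, swap the reversed dates, and recompute Days to Pay.
# Column names and dates are assumptions for illustration.
import pandas as pd

df = pd.DataFrame({
    "Date Started":   pd.to_datetime(["2023-01-05", "2023-02-20", "2023-03-14"]),
    "Date Completed": pd.to_datetime(["2023-02-10", "2023-02-01", "2023-04-30"]),
})
# Cycle time = Date Completed - Date Started; the second row's dates
# were entered in the wrong columns, so its cycle time goes negative.
df["Days to Pay"] = (df["Date Completed"] - df["Date Started"]).dt.days

# Subset the rows with negative cycle times, as the team did in Minitab.
swapped = df["Days to Pay"] < 0
print(df[swapped])

# Reverse the two dates in the affected rows and recompute.
df.loc[swapped, ["Date Started", "Date Completed"]] = (
    df.loc[swapped, ["Date Completed", "Date Started"]].to_numpy()
)
df["Days to Pay"] = (df["Date Completed"] - df["Date Started"]).dt.days
```

After the swap, every cycle time is non-negative and the column is safe to analyze.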

When the Graphical Summary was rerun with the corrected data, the same skewness appeared (as expected), but no values were negative. In fact, the shortest waiting time was 8 days, while the maximum number of days to pay remained 131. Now the team could work with data that made sense.

The lesson is obvious. Before launching into analyses of new data, a Green or Black Belt should always run a Graphical Summary on the new worksheet. This simple, easy step provides a wealth of useful information and can help you avoid the costly errors that come from working with data that are not correct.

By the way, the project was a success: the maximum waiting time was reduced to 35 days, and over 65% of claims were paid within 21 days.