Fun and effective: Teaching statistics with JMP

JMP has a growing fan club of people who are passionate about the software as a great teaching tool to more easily convey statistical concepts. Colleagues on our global academic team and I pooled some comments from noteworthy educators about why they like teaching with JMP.

 

“In the early years of teaching statistics to non-majors, it was frustrating — so much class time lost to coding frustration. I was overjoyed when JMP came out. JMP is very visual and interactive, freeing time to focus on conceptual understanding of statistics.”

-- Dalene Stangl, PhD, Associate Chair and Director of Undergraduate Studies, and Professor in the Department of Statistical Science at Duke University, has won many awards for outstanding teaching. She has a long list of other accomplishments, one of which is helping to organize the first-ever Women in Statistics Conference.

 

“Even though I am generally reluctant to change, I very enthusiastically started using JMP for teaching basic statistics, probability, regression analysis and design of experiments in the last two years, after having seen impressive demos on the visualization of data, the interactive capabilities and the JMP team’s support tools for teachers. For any regression or ANOVA class, the Prediction Profiler (a dynamic, interactive tool for interpreting regression or ANOVA models) is the best tool out there to explain what an interaction effect is, what a quadratic effect is, and so on. JMP allows students to actually understand and use the results of an analysis, whether the dependent variable is continuous, binary or ordered categorical, and no matter whether the independent variables are quantitative or categorical or both.”

-- Peter Goos, full Professor in Statistics at the Faculty of Bioscience Engineering, University of Leuven, and at the Faculty of Applied Economics, University of Antwerp, and coauthor of Optimal Design of Experiments: A Case Study Approach.

 

“JMP is fundamental for intro stats, which is required for nurses and a number of other majors. I teach the entire intro course with the Distribution and Fit Y by X platforms. Starting with a graph is incredibly important. I have entire lessons based on Fit Y by X — if Y is continuous, if Y and X are categorical, etc. Data type and technique are incredibly important. Fit Y by X provides a clean way to introduce bivariate thinking. Through the interactive, visual Profiler, we are able to get to more advanced statistical topics like multivariate logistic regression with a less statistically savvy audience. A huge drawback with other packages is that you're not always moving forward. With JMP, you're always moving forward. JMP is very progressive. JMP also features in other classes along with R. Data cleaning is often easier in JMP — variable recoding is cleverly implemented, and you can easily collapse categories.”

-- Brant Deppa, PhD, Professor of Statistics and Department Chair of Mathematics & Statistics at Winona State University, has taken a leadership role in developing one of the most successful and nationally recognized undergraduate statistics programs in the US. He has been teaching using JMP for nearly 20 years in courses ranging from introductory statistics for non-majors to upper-level courses for statistics majors, such as multivariate analysis and supervised learning. He has also developed several JMP-based online statistics courses in support of master’s and doctoral programs in nursing.

 

meintrup"One reason I like to teach statistics using JMP is that I don't need to spend a lot of time explaining the software. I show my students the basic principles, and then they are ready to explore JMP on their own. Once they discover the interactivity of graphs, or how to create graphs interactively with the Graph Builder, I can virtually observe how they dive deeper and deeper into JMP. In the end, they still need to learn statistics, but JMP makes it far less painful."

-- David Meintrup works in both academia and industry. David is a Professor of mathematics, statistics and operations research at Ingolstadt University of Applied Sciences in Germany. He also consults with scientists and engineers in the semiconductor, solar, pharmaceutical and biotech industries. In a recent webcast, David also stressed the importance of doing two things in teaching applied statistics: using software and teaching design of experiments (DOE).

 

“Whether teaching a graduate course in design of experiments, data mining or an undergraduate introductory methods course, JMP has become my statistical tool of choice. Many software platforms provide nice toolboxes from which an analyst selects a desired tool, considers the results and then selects another tool for another try. JMP gives you the same tool set but in a fully integrated and dynamic diagnostics system. Highly interactive graphics, lightning-fast and technically sound numeric algorithms, and dynamic tools for data manipulation become a means to an end, not the key focus of the class. Simulation studies, interactive scripts for illustrating key statistical concepts, JSL for moving into production settings – well, there are just too many good things to talk about. JMP, I love it.”

-- Michael Adams, Professor of Applied Statistics at The University of Alabama, has a focus on statistical education, which includes activity-based statistics lessons for K-12 teachers in the Alabama Quantitative Literacy Program, and developing distance learning courses and study-abroad programs in China.

 

bild_hilbert"Using JMP, which recently replaced SPSS completely in all our lectures, we appreciate the ease-of-use and brilliant ways to visualize data. Students are very excited about JMP — they are requesting more and more tips and tricks for their final theses. Since JMP offers so many valuable supporting resources, the students find it easy to help themselves.”

-- Andreas Hilbert, Full Professor and Chair of Business Informatics at the Technical University of Dresden, is Chairman of the board of the Business Intelligence Research Association and a board member for the business intelligence task force within the Gesellschaft für Informatik, the largest society of computer scientists in Germany.

 

“At the Northwestern master’s program for Product Design and Development, I have the privilege to teach business leaders about the use of statistics for product development using JMP… Within minutes of the first class, their eyes are lighting up as they are seeing their data in an interactive visual way that they’ve never experienced before. The fact that JMP successfully helps engage the students is such a critical success factor that it is at the foundation of this class. I've used Excel and Minitab, and these simply can’t replicate that excitement. JMP is a wonderful visualization tool that allows me to teach statistics with almost no equations — it's all about visualization. At work, we leverage these same strengths to help us rapidly obtain meaning from our data, develop compelling visual communications for our leaders and, in the end, effectively drive decision making. JMP is more than a statistics package to us. It is an important tool that helps us manage our business.”

-- Anthony Orzechowski, Director of R&D Quality Engineering at Abbott Diagnostics, is a certified Lean Six Sigma Master Black Belt and serves as an Adjunct Professor at Northwestern University’s Master of Product Design and Development program.

 

“As a global company, one of the hurdles we face is language differences across regions. Statistics is one of the ways to bridge the gap, so we have classes for intro stats and DOE for associates throughout the globe. We made a commitment to use JMP in these classes over 10 years ago and have not looked back since. The data visualization and interactive output in JMP bring the common language of statistics to life for our associates in ways that a static textbook or slide deck cannot accomplish. I know JMP is a successful tool because I see associates who have taken the classes sharing data across regions using JMP output to explain their results and conclusions. If a picture is worth a thousand words, then JMP's visualization tools are thousands of words that need no translation, which is a huge savings for our Enterprise.”

-- Chris Chen is a statistician for W.L. Gore & Associates Inc. He splits his Commitments between supporting the Core Technology group and leading the Global Statistics Team. Chris has been using SAS and JMP for more than 15 years. He has developed and taught in-house courses in basic statistics, DOE and SPC. His mission at Gore is to help Associates tell better stories.

 

“I have had many years’ experience teaching Six Sigma in industry. My students come from various academic backgrounds and have a variety of work experiences. Most have come to understand the power in JMP Pro and all that they can use it for to achieve success. I do remember one new employee who commented, ‘How can Dow justify the expense in providing JMP software to its employees? My previous employer used a less expensive product, and they did just fine.’ After the training, I asked the new employee what he thought of JMP now. He was a bit embarrassed about his earlier comment, having seen all that JMP could do. He remarked, ‘I guess at my old company we got what we paid for.’ ”

-- Chris Haney is a Six Sigma Master Black Belt who mentors Master Black Belts and develops training for Black Belts and Green Belt Project Leaders globally for Dow.

UPDATE 6/16/14, 2:00 p.m. ET: Another professor sent in a comment for this post:


“What I love about JMP is that it provides a flexible platform for users who range from the most basic user to the most sophisticated. I use JMP in delivering my undergraduate, Executive MBA, and traditional MBA statistics courses. JMP’s visual capabilities and intuitive platforms are unmatched. Literally, on the first day of classes, I have my students play with Graph Builder with a provided data set. I frequently hear students exclaim ‘Cool!’ when they realize that they are capable of uncovering a variety of visual stories with the data. These students of limited statistical background immediately feel the power of data. JMP allows instructors to go as deep as they want on topics. For me, I find all the embedded capabilities in JMP’s Fit Model platform to be invaluable, including model selection using AIC, BIC and out-of-sample validation, logistic regression, and the variety of customizable tests. As time passes, I continually uncover more capabilities of JMP which result in improvements in my course delivery and a better learning environment for my students.”

-- Layth C. Alwan is an Associate Professor of Business Statistics and Operations Management at the University of Wisconsin. He has twice won teacher of the year for the school and has also won outstanding EMBA teacher of the year. He is co-author of The Practice of Statistics for Business and Economics.

Education is ongoing, and the nature of our work is continually evolving, requiring us to learn new things — change is the one constant.  If you teach with JMP — in an academic, professional or other setting — and have something you’d like to add, please do!


Alias Optimal versus D-optimal designs

In my previous blog entry, I discussed the purpose of the Alias Matrix in quantifying the potential bias in estimated effects due to the alias terms. In this blog post, we look at an example that creates a D-optimal design and an Alias Optimal design with the same number of factors and runs.

Consider an experiment where we have six two-level categorical factors denoted by X1 to X6, and a budget of 12 runs. Our interest is in the main effects.

We start with the D-optimal design. Using the Custom Designer with six factors and 12 runs for the main effects, JMP will find a D-optimal design by default. If we take a look at the Color Map On Correlations below, we see all the main effects are orthogonal to each other, but correlated with two-factor interactions (absolute correlation of 0.3333). Since our model contains only main effects, looking at the design diagnostics and estimation efficiencies reveals that we are not going to find a better design for getting precise estimates of the effects in this model.

Color Map On Correlations for the D-optimal design

Taking a look at the Alias Matrix, we find the following:

Alias Matrix for the D-optimal design

Although the entries in the Alias Matrix happen to line up with the Color Map On Correlations in this example, the two do not correspond directly to each other in general. If you try this out on your own, you may also notice the positive and negative values in the Alias Matrix landing in different positions.
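If you would like to check these numbers outside JMP, here is a minimal numpy sketch. It uses the 12-run Plackett-Burman design as a stand-in for a D-optimal main-effects design in this scenario (the design JMP's Custom Designer returns may differ in run order or sign pattern, which is one reason the positions of the positive and negative entries can vary), and it reproduces both the 0.3333 absolute correlations and the standard alias matrix calculation.

```python
# A sketch, not JMP's Custom Designer: the 12-run Plackett-Burman design is an
# orthogonal (hence D-optimal) main-effects design for six two-level factors in 12 runs.
import numpy as np
from itertools import combinations

gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])          # standard PB12 generator row
rows = [np.roll(gen, i) for i in range(11)] + [-np.ones(11, dtype=int)]
X_main = np.array(rows)[:, :6]                                   # keep six factors, X1..X6

X = np.column_stack([np.ones(12, dtype=int), X_main])            # intercept + main effects
Z = np.column_stack([X_main[:, i] * X_main[:, j]                 # all two-factor interactions
                     for i, j in combinations(range(6), 2)])

# Correlations between each main effect and each two-factor interaction (0 or 1/3)
corr = np.array([[np.corrcoef(X_main[:, k], Z[:, c])[0, 1]
                  for c in range(Z.shape[1])] for k in range(6)])
print(np.round(np.abs(corr), 4))

# Alias matrix A = (X'X)^-1 X'Z: the bias each omitted interaction contributes
A = np.linalg.solve(X.T @ X, X.T @ Z)
print(np.round(A, 4))                                            # entries are 0 or +/- 0.3333
```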


Q&A with consumer research expert Rob Reul

Anne Milley prepares to interview Rob Reul of Isometric Solutions Inc. about consumer research for an Analytically Speaking webcast.

Rob Reul, founder and Managing Director of Isometric Solutions, has decades of experience helping businesses understand what customers want. In his Analytically Speaking webcast, Rob talked about using data analysis to focus product development in areas that customers consider most critical. Rob shared some great success stories in choice modeling and customer satisfaction surveys. In this interview, we asked Rob some more detailed questions about how these methods can be used to understand customer needs as related to software quality, feature enhancement, and managing a continuing conversation about user experience.

Melinda: Businesses are always looking to grow, which means attracting new customers, but it also means keeping current customers happy. Can you tell us about the differences between consumer research that’s intended to find new markets and consumer research that’s intended to examine performance for existing customers?

Rob: Research that seeks new markets usually coincides with the search for unmet needs. The common phrase “necessity is the mother of invention” rings true. These unknowns can be best isolated by studies of preference. Preference experimentation presents choices that respondents select from based on their interests. These studies often include an economic variable, introducing a financial dimension that expresses choice preferences in terms of a respondent’s willingness to pay. Together, this characterizes a new market venture by coupling economics with the probabilities of preference.

Research that examines performance seeks to increase a company’s competitiveness by evaluating the extent to which existing requirements are met. These are held as expectations. Thus an expectation scale is recommended because it is much more exacting than a satisfaction scale.

Melinda: Quality is an evolving issue for those of us who make software. Software is developed in the equivalent of a laboratory, so there’s a disconnect between how we make the product and how the product is used. What can consumer research teach the software industry about measuring customer happiness?

Rob: This is an interesting slice of the research equation. Looking into “happiness” (although initially overlooked by many) has lately been recast (in software) as the user experience. This new emphasis on the “experience” has been pursued by many with some success, the belief being that the better the user’s experience, the stronger the affinity the user will have toward the software. Stronger affinity then would likely extend to greater levels of overall satisfaction, product loyalty and product referral.

Melinda: Any thoughts on identifying focus areas for product development based on actual customer needs?

Rob: Software product development based on actual needs is known as the study of “use cases.” Here, researchers first seek to understand the very nature of the work task. They study the challenge the software user faces and what he or she seeks to accomplish. With this knowledge, software research focuses ensuing software development on ways to better meet those needs.

Melinda: When developing technical products (like statistical software), what the consumer wants is often only half the picture. What’s desired is often not technically feasible and sometimes does not solve the customer’s true problem. Can you talk about how consumer research can be linked to product research for technical products? How do you build a through-line between what the customer asks for and what their actual needs are? 

Rob: As I touched on earlier, the deconstruction of the “use-case” helps to understand exactly what the software user seeks to accomplish. With that understanding, draw the line between those task needs and specific software functionality. Regarding customers’ lofty desires vs. feasibility, customers will continue to be customers, and those who best meet their true needs (stated or derived) will prevail.

Missed Rob’s Analytically Speaking webcast? View it on demand for an in-depth conversation on consumer and market research.

Update 6/10/14: If you're interested in learning more about consumer research, take a look at the upcoming training on the topic:


What is an Alias Matrix?

When I create a design, the first place I typically look to evaluate the design is the Color Map On Correlations. Hopefully, I see a lot of blue, implying orthogonality between different terms. The Color Map On Correlations contains both the model terms and alias terms. If you have taken a design of experiments/linear regression class, you may be familiar with the idea that correlation among predictors inflates the standard error of the estimates. However, the majority of the familiar design diagnostics relate to the model terms. What if we have missed some terms in our model? Not surprisingly, terms missing from the model can still affect our results. Fortunately, we do have a way to assess this for those terms specified in the list of alias terms.

What effect do missing terms have on the model estimates?

We will dig into the technical details below, but the takeaway message is that active terms not in the model can bias the estimates of terms in the model. If a missing term is specified in the list of alias terms, the Alias Matrix gives us a means of quantifying that bias. The rows of the Alias Matrix correspond to the model effects, and the columns correspond to the alias terms; each entry quantifies how that alias term contributes to the expected value of the estimate for that model effect.
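For readers who want the algebra, here is the standard least squares derivation in generic notation (a textbook sketch, not JMP-specific output). Suppose we fit the model Y = Xβ + ε by least squares, but the true mean response also involves the alias terms Z with coefficients γ:

```latex
% Bias of the least squares estimate when the fitted model omits the alias terms Z:
\hat{\beta} = (X'X)^{-1}X'Y,
\qquad
E[\hat{\beta}] = (X'X)^{-1}X'\,E[Y]
              = (X'X)^{-1}X'(X\beta + Z\gamma)
              = \beta + \underbrace{(X'X)^{-1}X'Z}_{A}\,\gamma .
```

In this notation, A = (X'X)⁻¹X'Z plays the role of the Alias Matrix: each entry is the weight with which an omitted term's coefficient enters the expectation of a model term's estimate.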



Determining chemical concentration with standard addition: An application of linear regression in JMP

One of the most common tasks in chemistry is to determine the concentration of a chemical in an aqueous solution (i.e., the chemical is dissolved in water, with other chemicals possibly in the solution). A common way to accomplish this task is to create a calibration curve by measuring the signals of known quantities of the chemical of interest - often called the analyte - in response to some analytical method (commonly involving absorption spectroscopy, emission spectroscopy or electrochemistry); the calibration curve is then used to interpolate or extrapolate the signal of the solution of interest to obtain the analyte's concentration.

However, what if other components in the solution distort the analyte's signal? This distortion is called a matrix interference or matrix effect, and a solution with a matrix effect would give a different signal compared to a solution containing purely the analyte. Consequently, a calibration curve based on solutions containing only the analyte cannot be used to accurately determine the analyte's concentration.

Overcoming Matrix Interferences with Standard Addition

An effective and commonly used technique to overcome matrix interferences is standard addition. This involves adding known quantities of the analyte (the standard) to the solution of interest and measuring the solution's analytical signals in response to each addition. (Adding the standard to the sample is commonly called "spiking the sample.") Assuming that the analytical signal still changes proportionally to the concentration of the analyte in the presence of matrix effects, a calibration curve can be obtained based on simple linear regression. The analyte's concentration in the solution before any additions of the standard can then be extrapolated from the regression line; I will explain how this extrapolation works later in the post with a plot of the regression line.

Procedurally, here are the steps for preparing the samples for analysis in standard addition:

1) Obtain several equal-volume samples of the solution containing the analyte.
2) Add increasing and known quantities of the analyte (the standard) to all but one of the samples.
3) Dilute each mixture with water so that all solutions have equal final volumes.

These three steps are shown in the diagram below. Notice that no standard was added to the first volumetric flask.

The above image was made by Lins4y via Wikimedia Commons, with some slight modifications.

At this point, the five solutions are now ready for analysis by some analytical method. The signals are quantified and plotted against the concentrations of the standards that were added to the solutions, including one sample that had no standard added to it. A simple linear regression curve can then be fitted to the data and used to extrapolate the chemical concentration.

Determining the Concentration of Silver in Photographic Waste: An Illustrative Example in JMP

The following example is from pages 117-120 in "Statistics for Analytical Chemistry" by J.C. Miller and J.N. Miller (2nd edition, 1988). The light-sensitive chemicals on photographic film are silver halides (i.e., ionic compounds made of silver and one of the halogens: fluorine, bromine, chlorine and iodine). Thus, silver is often extracted from photographic waste for commercial reclamation. The silver content of a sample of photographic waste was determined by standard addition with atomic absorption spectroscopy. Here are the data in JMP:

I used the "Fit Y by X" platform and the "Fit Line" option under the red-triangle menu to implement simple linear regression. (You can also do this with the "Fit Special" option; just click "OK" without adjusting any settings.) After adjusting the axes and adding some captions, I get the following plot:

This plot illustrates the key idea behind using this calibration curve. The magnitude of the x-intercept is the concentration of the silver in the original solution. To understand why this is so, consider the absorbance at the following two values:

  • at x = 0, the value of y is the absorbance of the solution with no added standard (i.e., it corresponds to the concentration of silver that we ultimately want).
  • at the x-intercept, the absorbance is zero.

Thus, the magnitude of the difference between x=0 and the x-intercept is the concentration of silver that is needed to produce the signal for the original solution of interest! Our job now is to determine the x-intercept.

Using a little algebra, we can obtain the x-intercept by hand. However, there is a clever way to find it in JMP using the "Inverse Prediction" function under "Fit Model". (I thank Mark Bailey, another JMP blogger, for his guidance on this trick!)
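For the curious, here is what that by-hand calculation looks like outside JMP. This is a minimal sketch with made-up numbers (not the values from the Miller and Miller example), so its result will not match the output below:

```python
# Minimal sketch of the standard addition calculation (hypothetical data,
# not the photographic-waste values used in this post).
import numpy as np

added = np.array([0, 5, 10, 15, 20, 25, 30], dtype=float)        # standard added, ug/mL
absorbance = np.array([0.31, 0.40, 0.49, 0.59, 0.68, 0.78, 0.87])

slope, intercept = np.polyfit(added, absorbance, 1)               # simple linear regression
x_intercept = -intercept / slope                                  # where the line crosses y = 0

# The analyte concentration in the diluted sample is the magnitude of the x-intercept.
print(f"estimated silver concentration: {abs(x_intercept):.2f} ug/mL")
```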

First, let's run the linear regression again by "Fit Model" under the "Analyze" menu.

fit model - standard addition

Notice how JMP automatically suggests "Standard Least Squares" in the top right of this dialog window.

Here is the output from "Fit Model".

Fit Model Output - standard addition

Now, to get the x-intercept, let's go to the red-triangle menu for "Response Absorbance." Within the "Estimates" sub-menu, choose "Inverse Prediction." This allows us to predict an x-value given a y-value. Since we need the x-intercept, the y-value (absorbance) needs to be zero. I prefer to use a significance level of 1%, so I set my confidence level at 0.99.

inverse prediction (confidence interval) - standard addition

There is an option on the bottom left that says "Confid interval with respect to individual rather than expected response," and you may be wondering what it means. This option allows you to get the prediction interval, which quantifies how certain I am about the x-value (silver concentration) of a new observation at "Absorbance = 0". In contrast, a confidence interval quantifies how certain I am about the mean silver concentration at that particular absorbance. A prediction interval takes into account two sources of variation:

  1. Variation in the estimation of the mean x-value.
  2. Variation in the sampling of a new observation.

A confidence interval takes only the first source of variation into account, so it is narrower than a prediction interval.

Since I am interested in the x-intercept alone and not a new observation at zero absorbance, let's leave that option unchecked and just use a confidence interval.

Here is the output that has been added to the bottom of this results window.

confidence interval of x-intercept - standard addition

The estimated magnitude of the x-intercept (the concentration of silver in the original solution) is 17.2605 µg/mL, and its 99% confidence interval is [14.4811, 20.5585] µg/mL.
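If you would like to see the mechanics behind such an interval, one way to reproduce the idea outside JMP is to find where the confidence band for the expected response crosses zero absorbance. Below is a hedged sketch using the same made-up numbers as the earlier snippet (so it will not reproduce the values above, and JMP's Inverse Prediction may differ in detail):

```python
# Sketch: confidence limits for the x-intercept, taken as the points where the
# confidence band for the expected response crosses absorbance = 0 (hypothetical data).
import numpy as np
from scipy import stats, optimize

added = np.array([0, 5, 10, 15, 20, 25, 30], dtype=float)
absorbance = np.array([0.31, 0.40, 0.49, 0.59, 0.68, 0.78, 0.87])

n = len(added)
slope, intercept = np.polyfit(added, absorbance, 1)
resid = absorbance - (intercept + slope * added)
s = np.sqrt(np.sum(resid**2) / (n - 2))                  # residual standard error
sxx = np.sum((added - added.mean())**2)
t = stats.t.ppf(0.995, df=n - 2)                         # two-sided 99% confidence

def band(x, sign):
    """Upper (sign=+1) or lower (sign=-1) confidence limit for the mean response at x."""
    se_mean = s * np.sqrt(1 / n + (x - added.mean())**2 / sxx)
    return intercept + slope * x + sign * t * se_mean

x0 = -intercept / slope                                      # point estimate of the x-intercept
left = optimize.brentq(lambda x: band(x, +1), x0 - 50, x0)   # upper band hits zero
right = optimize.brentq(lambda x: band(x, -1), x0, 0.0)      # lower band hits zero

print(f"x-intercept: {x0:.2f}")
print(f"99% CI for the concentration: [{abs(right):.2f}, {abs(left):.2f}] ug/mL")
```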

Conclusion

Standard addition is a simple yet effective method for determining the concentration of an analyte in the presence of other chemicals that interfere with its analytical signal. Its use of simple linear regression can be easily implemented and visualized in JMP using the "Fit Model" platform, and its "Inverse Prediction" function provides an easy way not only to estimate the analyte's concentration, but also to generate a confidence interval for it.

References

J.C. Miller and J.N. Miller. Statistics for Analytical Chemistry. 2nd Edition, 1988, Ellis Horwood Limited. Pages 117-120.

G.R. Bruce and P.S. Gill. "Estimates of Precision in a Standard Addition Analysis." Journal of Chemical Education, Volume 76, June 1999.

Eric Cai works as a statistician in the Laboratory Program of the British Columbia Centre for Excellence in HIV/AIDS in Vancouver, British Columbia, Canada. He also shares his passion for statistics, machine learning, chemistry and math via his blog, The Chemical Statistician; his YouTube channel; and his Twitter feed @chemstateric. This is Eric's first post as a guest blogger for JMP.

For more information on how JMP can be used in chemistry and chemical engineering, visit our website.


See optimal settings with JMP Pareto Efficient Frontier

The Pareto Efficient Frontier (PEF) is becoming an increasingly popular tool for measuring and selecting project or design parameters that will yield the highest value at the lowest risk. PEF is being used widely in many industrial areas, such as selecting the best exploration projects in oil and gas, finding optimum design parameters in consumer product research, and even finding the right pricing of products and services in sales and marketing. This tool is especially useful for anyone involved in project, product or service management, as it lets you see clearly, among all the points in a graph, the ones that matter most to you.

We can easily create a PEF in JMP using features in Graph Builder and Row Selection. Let's look at some design team data from 500 tested units. The team wanted to find the tested units that would work at the highest Ambient Celsius (C) temperature with the lowest Battery Voltage (V). We describe these points as the most "dominant": high Ambient (C) at low Battery (V) settings. This is important to the design team because they need tested units that can operate under the highest operating temperatures with the lowest strain on the battery.

Table 1: Design Team Data Partial Snapshot

Looking at a Distribution of the parameters, we can see the spread of the data for Ambient (C) and Battery (V) against their respective statistics. Most of the tested units seem to be within specifications, so let's go on to the next view to help find the PEF.

View 1 - Distributions: Specifications

The next graph was created in Graph Builder with a scatterplot and smoother line view of Ambient (C) and Battery (V) points. Now we can use the Row Selection features in the Rows menu to help find the PEF points. The Row Selection – Select Dominant option opens an input box where we choose which parameters to treat as dominant. Note that we checked the box for Ambient (C) so that high values count as dominant for that parameter, and left the box for Battery (V) unchecked so that low values count as dominant.

View 2 - Graph Builder: Overlay

Row Selection: Select Dominant

Row Selection: Select Dominant Input Box

This allows us to highlight the dominant points with high Ambient (C) temperature at low Battery (V). To make it more visual, we colored the dominant points red. Now we can start to see the PEF, as this is the ridge of red points on our graph.

View 3 – Graph Builder: Overlay w/ PEF Points Selected

While these dominant points are still selected, we can also use Row Selection – Name Selection in Column from the Rows menu to create a new column in our data table that identifies the dominant and non-dominant points with an indicator. In this case, we created a new column called PEF (for Pareto Efficient Frontier), where a "1" indicates a dominant point and a "0" indicates a non-dominant point.
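For readers who want to see the same dominance logic outside JMP, here is a rough sketch in Python with simulated stand-in data (the column names mirror the post; this is not the Select Dominant implementation itself):

```python
# Sketch of Pareto-dominant point selection: keep rows where no other row has
# Ambient (C) at least as high and Battery (V) at least as low, with one strictly better.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)                         # simulated stand-in for the 500 units
df = pd.DataFrame({"Ambient (C)": rng.normal(60, 5, 500),
                   "Battery (V)": rng.normal(3.6, 0.2, 500)})

def pef_flag(frame, maximize, minimize):
    a = frame[maximize].to_numpy()
    b = frame[minimize].to_numpy()
    flags = np.ones(len(frame), dtype=int)
    for i in range(len(frame)):
        dominated = (a >= a[i]) & (b <= b[i]) & ((a > a[i]) | (b < b[i]))
        if dominated.any():
            flags[i] = 0                               # some other unit dominates this one
    return flags

df["PEF"] = pef_flag(df, "Ambient (C)", "Battery (V)")
print(df[df["PEF"] == 1].sort_values("Battery (V)"))   # the Pareto Efficient Frontier
```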

Row Selection: Name Selection in Column

Row Selection: Name Selection in Column Input Box

This lets us use the PEF column as an overlay column in our graph. Combined with a data filter, we can select only the PEF points and clean up our Graph Builder view to show just the dominant-point ridge, where the tested units performed with the lowest Battery (V) at the highest Ambient (C). We can now easily see where our tested products will operate most efficiently in our design.

View 4 – Graph Builder: PEF w/ Filter

Note: This post was co-written with Jeff Perkinson, Customer Care Manager.


Using JMP to visualize a solid state drive reconditioning process

This past week, I noticed that my computer had seriously slowed down. My usual tasks seemed to be taking forever, and even my standard JMP demos were taking quite a bit longer than I was used to. I tried the normal things such as repairing permissions, checking the memory, seeing if there were any corrupt kernel extensions and even going so far as installing a clean version of OS X Mavericks to see if that would fix the behavior I was seeing.

Then it dawned on me that it might be something going on with my solid state drive (SSD). Since I still had the original hard drive (a standard spinning drive) that came with my MacBook Pro, I installed that and tried booting from the original drive.

A program that I use for seeing how my computer is performing is Geekbench. It does a number of processor, graphics and disk-intensive tasks and then reports back a single- and multi-core performance number that you can then compare to a database of computers that are similarly specified to yours. Then you can see if you are achieving comparable performance.

Well, as it turns out, the performance of my computer should be around 2,000 for single-core and 10,000 for multi-core. I was getting 700 for single-core and 3,000 for multi-core with the SSD. When I put the original HDD back into my laptop, performance increased to about 2,000 and 8,500, much closer to what it should be.

So obviously something was going on with my drive. Not giving up, I decided to see if I could recondition it. I also thought this was a great opportunity to collect and visualize some data using JMP. I used a program called DiskTester, from the diglloyd Tools suite. One of the functions in DiskTester is a recondition-SSD function. This writes a large chunk of data to all of the free space on a drive and lets you iterate a number of times. The program reports the chunk offset in MB, the average write speed, the current write speed, and the minimum and maximum write speeds.

The drive, according to the diglloyd Tools developer, “responds to this treatment by cleaning up and defragmenting itself.” If this process works on my drive, I should see some pretty bad performance for the first iteration that drastically improves after a few iterations.

So I erased my drive, booted from my other internal drive and started the reconditioning process, letting it run overnight and collecting eight iterations of raw data.

DiskTester gave me the option to copy the raw data to the clipboard, which I did, and then created a .txt file that I will now import into JMP.

I’ll use File > Open to get the .txt file, and then when given the option, I’ll choose Open As: Data (Using Preview), which gives me an option to inspect the data before getting it into JMP.


I’m happy with the way the columns look, and I see by the 123 icon that all my data will be coming in as continuous, which is what I want. In this window, I have the option to give the columns names, which I will do.


And now I have my raw data in JMP, but there is one more step I will need to do before I can visualize the results. You can see I am missing an important column, which is the iteration number. I'll need this to use as a phase or grouping variable. Fortunately, I can generate this pretty easily in JMP. I'll create a new column and then right-click to get to the column info. When you create a new column, you have the option to initialize data. I'll pick sequence data, and then enter 1 for the From, 8 for the To (as I want iterations 1 through 8) and 1 for the Step.

I know that my last block is 482,176 MB and the program is writing 128 MB chunks, which means each iteration will have 3,767 unique measurements. So I will put 3767 into the Repeat each value N times field.

I can check my work by looking at rows 3,767 and 3,768. Sure enough, row 3,767 is labeled iteration 1, and row 3,768 is labeled iteration 2. Now I’m ready to go.
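As an aside, the same iteration column is a one-liner outside JMP, for example in Python (a sketch, assuming the 128 MB chunk size and 482,176 MB of free space described above):

```python
# Iteration labels 1-8, each repeated 3,767 times (482,176 MB / 128 MB chunks per pass).
import numpy as np

iteration = np.repeat(np.arange(1, 9), 3767)
print(iteration[3766:3768], len(iteration))   # the 1-to-2 boundary at rows 3,767/3,768; 30,136 rows total
```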


I’ll use Graph Builder to see what’s going on in the data. Having the Offset in GBs instead of MBs may be a better way to display the labels on the X-axis for all of my plots, so before I go any further, I'm going to transform MB to GB by right-clicking on the Offset MB, clicking on Formula and then taking Offset MB and dividing by 1000. This creates a custom transform column without having to go back to the data table. I'll want to use this later, so I'll rename it Offset GB, right-click and then select: Add to Data Table. The plot below shows the pattern of performance readings across the drive for each iteration. I’ve turned the transparency of the points down to 0.1 so we can see them better and also added a lower spec limit of 110 MB/sec to the graph.

Write performance (MB/sec) by drive offset for each iteration

As you can see in the graph, the performance on the first iteration is all over the place. While the average write performance is decent (91 MB/sec), there are many 128 MB chunks that are being written much more slowly to the disk. By iteration 2, however, things are starting to improve drastically. The average write performance has increased to 115 MB/sec. By iteration 5, things are starting to settle in, and at that point, I seem to be seeing asymptotic behavior in write performance.

What remains through all the iterations, however, is a band of blocks that are in the high 40s MB/sec for write performance. Even by the end of the test, 130 blocks are in the lower-performing sector. This is vastly improved from the first iteration, where 2,294 blocks are below the spec limit. If I add a Local Data Filter to Graph Builder, I can focus on just the first and last iterations and compare performance. While the write performance is highly variable on iteration 1, by iteration 8, there are three straight bands, indicating consistent performance over all the tested sectors of the drive. The cause of the lower-performing sectors is still a bit of a mystery to me, but I suspect it may be something in the operating system, where the program is being interrupted by some other system task, causing a drop in measured performance. (If anyone has a better hypothesis, leave it in the comments.)

Write performance for iterations 1 and 8 (with Local Data Filter)

So the big question is: “Did it work?” Well, I am happy to report that it did. After running the reconditioning procedure and reinstalling the drive in my laptop, my Geekbench score is back to 2,900/9,500, which is what I should expect given my hardware specifications. And the drastic drop in speed that I noticed on my computer is no longer there.


Michael Schrage on experimentation, innovation & communicating analytic results

We are delighted that Michael Schrage, Research Fellow at MIT Sloan School of Management’s Center for Digital Business, will be a keynote speaker at Discovery Summit 2014. I first encountered Michael when he skillfully moderated an Analytics Exchange Panel at the 2009 conference.

Earlier this year, Michael was the featured expert in an Analytically Speaking webcast. We spent extra time together thanks to the wintry weather in North Carolina! That enabled us to have some interesting conversations, and I'm pleased to share more of his perspective on some important topics: experimentation, innovation and communication of analytic results.

Why don't organizations do more (good) experimentation?

Michael: This is a deceptively difficult question. I see many organizations perform "tests" rather than experiments. The web has inspired a whole new generation of A/B tests and testing. You certainly see technical/engineering folks use Taguchi and Box/Fisher "design of experiments" statistical methodologies. But, no, I really don't run across that many organizations or innovation teams culturally committed to "experimentation" as central to how they learn, iteratively design or manage risk. I fear that DOE has turned itself into a "black box" for technical quants rather than a platform for "new value creation" and exploration by entrepreneurs and innovators.

But why? My empirically anecdotal answer would be that most business people think in terms of "ideas" and "plans" and "programs" rather than in terms of "testable hypotheses" and "experiments." Experimentation is for geeks, nerds and quants — not business management and leaders. Designing business experiments is what we delegate, not celebrate or see as strategic. A second reason that I've seen surface when organizations resist the fast, cheap and simple "experiments" option is that experimentation doesn't "solve the problem." It only gives insight. A lot of people in management are much more interested in paying for "three-quarter" solutions — or even half-baked ones — than genuine insights. In other words, they see the products of experiment as more interesting than compelling. Of course, when one sees the digital successes of the Googles, Amazons and Netflixes in using experiments to innovate, as well as optimize, you have to wonder just how much of this resistance reflects generational dysfunction, not simple ignorance.

You've said there is a tension between incremental and disruptive innovation. What are your observations on organizational cultures that foster both kinds of innovation?

Michael: Well, almost by definition, "incremental" is likelier to be easier and less disruptive than "disruptive" innovation. Remember, I'm not a fan of "innovation" for innovation's sake; I believe innovation is means to an end, and we want to make sure we understand and agree about which aspects of that desired "end" are tactical versus strategic. We need to have the courage and honesty to confront the possibility that "disruptive" innovations will be better for our customers, clients and us in the nearer or longer term. We need to be confident that our disruptive innovations will advantage us with our customers while concurrently disadvantaging our competitors. Culturally speaking, companies that are proud to the point of arrogance about their technical skills and competences tend to be understandably reluctant about truly "disruptive" innovation because it disrupts their sense of themselves and what they think they're good at. Organizations more focused on UX, customer service, client relationships and a broader/bigger "mission" seem more culturally comfortable with disruption because it is a means to a greater end rather than just an opportunistic tactic.

You've said sometimes we need to look for approaches versus solutions. Can you relate that to JMP?

Michael: Yes, I've largely gotten out of the "solutions business" both in my teaching and advisory work. Almost everyone I work with is pretty smart, so my focus now is less on the transmission of my expertise than on the cultivation of their capabilities. I want my students and clients to be able to embrace, understand, and exploit a new power and capability that lets them find and customize the solutions they want. I am not there to "solve their problems." I'm there to facilitate how they choose to bring both existing and new capabilities to bear on solving problems in the ways that are culturally and economically compatible with their needs, not my expertise and "experience." How does that relate to JMP? That's easy — I've been a part of the JMP community long enough to know that your best customers and users come up with novel and compelling ways to get value from your products. You learn as much from them as they from you. I'm comfortable arguing that mutual/collaborative learning is more about an "approach" than a "solution."

Share with us the importance of communicating analysis results to executives.

Michael: That importance can't be overstated, but the heuristic I look forward to offering and discussing is that the purpose of communicating analytical results should not be to make executives feel stupid or ignorant but to make them feel smarter and more engaged. If all your communications do is fairly and accurately convey useful results in an accessible way, you're underperforming.

We hope you will bring your own questions to ask Michael live in September. If you’ve never attended Discovery Summit, perhaps it’s time for you to experiment with a new conference unlike any other.

Note: This is the second blog post in a series on keynote speakers at Discovery Summit 2014. The first post was about David Hand.


Two kinds of dot plots

The name “dot plot” can refer to a variety of completely different graph styles. Well, they have one thing in common: They all contain dots. For analytic use, the two most prominent styles are what we might call the Wilkinson dot plot and the Cleveland dot plot.

The Wilkinson dot plot displays a distribution of continuous data points, like a histogram, but shows individual data points instead of bins.

Wilkinson Dot Plot

Though variations of such plots have been around for more than 100 years, Leland Wilkinson’s seminal paper “Dot Plots” largely standardized the form. Last summer, the support for Wilkinson dot plots in JMP was greatly enhanced by an add-in, which is now built into JMP 11.1 (see the Wilkinson dot plot blog post).

The Cleveland dot plot is featured in William S. Cleveland’s book The Elements of Graphing Data and displays a continuous variable versus a categorical variable.

Cleveland Dot Plot

This kind of dot plot is similar to a bar chart, but instead of using length to encode the data values, it uses position. As a result, the dot plot does not need to start its data axis at zero, can use a log axis and is more flexible for overlaying multiple variables. Cleveland breaks down the estimation aspect of graph perception into three parts: discrimination, ranking and ratioing. In general, dot plots help with the first two at the expense of the third, making relative proportions less accessible. For instance, it’s easier to see when one bar is twice as long as another without consulting the axis.
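To make the position-versus-length distinction concrete, here is a quick matplotlib sketch of a Cleveland-style dot plot with made-up yields (illustrative values, not the barley data shown below):

```python
# A Cleveland-style dot plot: values are encoded by position along the x-axis,
# so the axis does not need to start at zero (made-up data for illustration).
import matplotlib.pyplot as plt

sites = ["Site A", "Site B", "Site C", "Site D"]
yield_1931 = [35.1, 29.7, 43.8, 30.4]
yield_1932 = [26.9, 25.3, 38.2, 22.1]

fig, ax = plt.subplots(figsize=(5, 3))
ax.scatter(yield_1931, sites, label="1931")
ax.scatter(yield_1932, sites, label="1932")
ax.grid(axis="y", color="0.85")          # faint gray guide lines instead of dotted ones
ax.set_xlabel("Yield")
ax.legend(loc="lower right")
plt.tight_layout()
plt.show()
```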

Cleveland’s books, along with Wilkinson’s The Grammar of Graphics, were influential in the creation of Graph Builder, and as a result, the Points element is the default view in Graph Builder for both continuous and categorical data.

Below is a Graph Builder recreation of Cleveland's display of barley yields. A challenge: Can you spot the odd feature of the data?

Graph Builder recreation of Cleveland's barley yields dot plot

The use of dotted lines is presumably a constraint of black and white printing, and it’s more common to see faint gray lines in dot plots. Beyond the usual drag-and-drop of variables into roles, the Graph Builder steps to make the dot plot above are:

  • Add a Value Ordering property for the Variety column (on the Y axis) to match Cleveland's order.
  • Put the Site variable in the Group Wrap role and set the number of columns to be 1.
  • Turn off Show Title for Site.
  • Turn on grid lines for the Y axis.
  • Change the legend position to the bottom.

And now the answer to the challenge: The odd feature of the data is that the 1931 values are generally greater than the 1932 values except for the Morris site, which suggests the values may have been swapped.

For more discussion of Cleveland dot plots, see the article “Dot Plots: A Useful Alternative to Bar Charts” by Naomi Robbins.


New contingency analysis add-in for JMP

A contingency analysis determines whether there is a relationship between two categorical variables. A caterer, for example, might be interested in knowing whether entrée selections at an event were related to gender, given the following data:


The contingency platform in JMP requires the X and Y variables to be contained in two columns, with the cell counts in a third:


With the “Contingency (Table Format)” add-in, developed especially for students and others who are new to JMP, you can launch a contingency analysis with data that is arranged in “crosstab” format, without having to stack the data first:


Simply specify the variables as shown here:


The add-in creates the contingency report and includes buttons that let you easily swap the rows and columns of the report if desired:


If the “Show New Table” option is selected, the add-in also generates a stacked table containing a script that will launch the platform in the standard manner:

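If you are curious what the add-in is doing conceptually, here is a rough sketch of the same idea outside JMP: stack a hypothetical crosstab into the X/Y/count layout and run a chi-square test of independence. (This is the generic calculation, not the add-in's JSL.)

```python
# Sketch: a hypothetical entree-by-gender crosstab, stacked and tested for independence.
import pandas as pd
from scipy.stats import chi2_contingency

crosstab = pd.DataFrame({"Beef": [38, 22], "Chicken": [31, 40], "Fish": [14, 27]},
                        index=pd.Index(["Female", "Male"], name="Gender"))

# Stack to the two-columns-plus-count layout the Contingency platform normally expects
stacked = (crosstab.stack().rename("Count").reset_index()
           .rename(columns={"level_1": "Entree"}))
print(stacked)

chi2, p, dof, expected = chi2_contingency(crosstab)
print(f"chi-square = {chi2:.2f}, df = {dof}, p-value = {p:.4f}")
```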

This add-in, along with many others, is available on the JMP File Exchange. (A free SAS profile is needed for access.)
