John Sall, Author at JMP Blog

May 11, 2015

With multivariate methods, you have to do things differently when your data is wide, with many thousands of columns. The Achilles' heel of the traditional approaches on wide data is that they start with a covariance matrix, whose size grows with the square of the number of columns. Genomics research, for example,

May 4, 2015

Handling outliers at scale

In an earlier blog post, we looked at cleaning up dirty data in categories. This time, we look at cleaning dirty data in the form of outliers for continuous columns. In industry, it’s not unusual to have most of your values in a narrow range (for example, between 0.1 and

April 27, 2015

Accessing data at scale from databases

Many JMP users get their data from databases. A few releases ago, we introduced an interactive wizard import dialog to make it easier to import from text files. In a subsequent release, we created a feature that lets you import Web page tables into JMP data tables. In JMP 11,

April 20, 2015

Cleaning categories at scale with Recode

Data entered manually is usually not clean and consistent. Even when data is entered by multiple-choice fields rather than by text-entry fields, it might need additional work when it is combined with data that may not use the same categories across sources. Sometimes the same categories are spelled differently, abbreviated

March 23, 2015

Flow and Frontier in JMP 12

Long lists of improvements go into each new version of our software, and usually there are one or two themes that characterize the release. JMP 12 launches this week, and the themes of this new version are flow and frontier. By flow, I mean workflow, the way we can smooth

October 4, 2014

Statistical discovery with JMP at the 25-year point

For you, today is Oct. 4. At JMP, we call it Sept. 34. We had been determined to release the first version of JMP by the end of the third quarter of 1989. But, as it turned out, we needed a few extra days to make our own deadline. So

December 3, 2013

“The desktop computer is dead” and other myths

The desktop or laptop is now in decline, squeezed from one side by mobile platforms and from the other side by the cloud. As a developer of desktop software, I believe it is time to address the challenges to our viability. Is software for the desktop PC now the living

November 5, 2013

Big real data is different from big simulated data: Benchmarking

To benchmark computer performance on statistical methods with big data, we can just generate random data and measure performance on that, right? Well, it could be that simulated data may not act the same as real data. Let’s find out.

Logistic Regression

Suppose that we are benchmarking logistic regression. So

October 29, 2013

It’s not just what you say, but what you don’t say: Informative missing values

Sometimes emptiness is meaningful. If a loan applicant leaves his debt and salary fields empty, don’t you think that emptiness is meaningful? If a job applicant leaves his previous job field empty, don’t you think that emptiness is meaningful? If a political candidate fills out a form that has an

October 22, 2013

Big Data always has significant differences but not always practical differences: Practical significance and equivalence

When you have millions of observations of real data and do a simple fit across two variables, if you don’t get a significant test, then it is strong evidence of fraud. The one kind of data that is reliably non-significant for very tall data tables is simulated data. We live