Exploring text and other data with Heath Rushing

Heath Rushing is someone I count myself very fortunate to know, first as a colleague at SAS and now as co-founder of Adsurgo, a successful consultancy.

Over years of JMP use, Heath has enthusiastically taught classes using JMP; written papers and the book Design and Analysis of Experiments by Douglas Montgomery: A Supplement Using JMP; and given us valuable feedback to make JMP better.

JMP 13 will be released in September, and we are grateful to Heath for his significant input on the new Text Explorer platform. A few years ago, some of our customers wanted to do some basic text analysis, and Heath used the JMP-R integration to very good effect. One of these applications is highlighted in his top-rated Discovery Summit presentation from last year, “Harness the Power of JMP: Big Data and Social Media for Competitor Analytics.” Next month, he will present “Mind the Gap: JMP on the Text Explorer Express,” using new features in JMP 13, at JMP Discovery Summit.

We are pleased to feature Heath on August’s Analytically Speaking. We hope you will join us to hear about successful text analytics projects and an easier workflow for basic text analytics, and to see a preview of some of this new capability in JMP 13.

Video: Using Recode in JMP for data preparation

Data preparation before modeling is an unavoidable chore. One of the most time-consuming tasks can be cleaning up categorical data that may have misspellings, inconsistent capitalization and abbreviations, and the like. The Recode tool in JMP makes data prep a lot easier.

Watch this video by my colleague Ryan DeWitt to learn about the Group Similar Values option, which lets you group categories that are almost the same. Group Similar Values lets you ignore such things as case, non-printable characters, whitespace and punctuation.

In the video, he also covers these other Recode options: Convert to Titlecase, Convert to Uppercase, Convert to Lowercase, Trim Whitespace, Collapse Whitespace, First Word and Last Word, All But First Word, and All But Last Word.
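
To make the idea behind Group Similar Values concrete, here is a minimal Python sketch. It is not JMP's implementation, and the function names `normalize` and `group_similar` are made up for this illustration. It builds a comparison key that ignores case, non-printable characters, whitespace differences and punctuation, then groups the raw values that share a key.

```python
import string
import unicodedata
from collections import defaultdict


def normalize(value: str) -> str:
    """Build a comparison key that ignores case, non-printable characters,
    punctuation, and extra whitespace (hypothetical helper, not a JMP API)."""
    # Drop control/format characters, but keep whitespace so words stay apart.
    printable = "".join(
        ch for ch in value
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    # Remove ASCII punctuation.
    no_punct = printable.translate(str.maketrans("", "", string.punctuation))
    # split() with no arguments trims the ends and collapses whitespace runs.
    return " ".join(no_punct.split()).casefold()


def group_similar(values):
    """Map each normalized key to the list of raw values that share it."""
    groups = defaultdict(list)
    for v in values:
        groups[normalize(v)].append(v)
    return dict(groups)


if __name__ == "__main__":
    raw = ["New York", "new york ", "NEW  YORK.", "New\x00 York", "Boston"]
    for key, members in group_similar(raw).items():
        print(key, "<-", members)
```

In this sketch the four spellings of New York land in one group and Boston in another. Catching true misspellings would need an edit-distance comparison on top of this, which is left out here.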

If you have JMP 12 or the trial version, you can follow along using the same sample data set that Ryan uses for the example in the video.

Read more about Recode right here in the JMP Blog.

Subscribe to the JMP channel on YouTube to see the latest videos.

Submit an abstract to present in Prague

Discovery Summit Europe may not be until next March, but we’re already thinking about the conference agenda. The call for papers is now open to those who’d like to present.

If you’ve tackled an interesting problem with JMP, the conference steering committee wants to hear from you. Submit an abstract describing your work – it doesn’t have to be long, about 150-200 words. If your abstract is selected, you’ll present at the event in Prague, March 20-23, 2017.

As always, much of the agenda will be dedicated to user-led breakout sessions. Each year, our breakout presenters are a source of inspiration for the Discovery Summit – posing and challenging analytic theories, benchmarking best practices and conceiving innovative concepts.

Presenters will:

  • Share the Discovery Summit agenda with progressive analytic minds, including our keynote speakers, who are thought leaders in statistics, technology and innovation.
  • Shape the conference conversation about how to apply analytics in forward-looking companies around the globe.
  • Gather feedback from other attendees so you can refine your own analyses.
  • Demonstrate to your colleagues and managers how analytics benefits your organization.

Oh, and you get a discount, too! Paper and poster presenters receive 50 percent off conference admission, and student presenters receive complimentary conference admission.

If you aren’t interested in giving a talk or your application is better suited for a smaller, niche audience, consider presenting a poster. Posters can depict a class assignment, a research project or a business application. Posters will be judged based on their originality, innovative application and/or the use of visualization to express the data.

Not sure what to present? Take a look at past Discovery Summit presentation materials in the JMP User Community.

Banner-fying your images

These days, customization of social media profiles is crucial. Everyone can find images to populate their banners, walls and timelines. But sometimes a single banner image doesn't quite cut it, especially if, like me, you aren't satisfied with only one picture for your LinkedIn profile banner (particularly if you have multiple interests you want to show off). So I used JMP Scripting Language to find a solution.

For example, I love art and computer science, and have these two images:

[Images: Art, Programming]

I wanted something smoother than a vertical line to separate the two images on my banner.

I'd prefer something more like this to place on my profile page:


This is the image I created for my LinkedIn profile, and you can do it, too! I wrote a short segment of JMP code that can blend two images of any size (though images in landscape work best), forming a picture with the correct LinkedIn banner height-to-width ratio. The usage isn't limited to LinkedIn, either. Whenever you need to combine pictures for a smoother image, you can use it.
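
The script itself isn't reproduced in this excerpt, so here is a rough Python sketch of one common way to get that kind of smooth transition: a linear crossfade across a band of columns. It is an assumption for illustration, not the JSL code described above. It assumes both images have already been cropped or resized to the same dimensions (the banner-ratio step is left out), and it represents an image as a plain list of rows of `(r, g, b)` tuples. The function name `crossfade` and its parameters are hypothetical.

```python
def crossfade(left, right, blend_start, blend_end):
    """Blend two equal-sized images column by column.

    Columns before blend_start come entirely from `left`, columns from
    blend_end onward come entirely from `right`, and columns in between
    are mixed with a weight that rises linearly from 0 to 1.
    """
    height, width = len(left), len(left[0])
    if len(right) != height or any(len(row) != width for row in left + right):
        raise ValueError("images must have the same dimensions")
    if not 0 <= blend_start < blend_end <= width:
        raise ValueError("need 0 <= blend_start < blend_end <= width")

    out = []
    for y in range(height):
        row = []
        for x in range(width):
            if x < blend_start:
                t = 0.0
            elif x >= blend_end:
                t = 1.0
            else:
                t = (x - blend_start) / (blend_end - blend_start)
            # Weighted average of each color channel, rounded to an integer.
            row.append(tuple(
                round((1 - t) * a + t * b)
                for a, b in zip(left[y][x], right[y][x])
            ))
        out.append(row)
    return out


if __name__ == "__main__":
    red = [[(200, 0, 0)] * 5]    # 1 x 5 test image
    blue = [[(0, 0, 100)] * 5]
    print(crossfade(red, blue, blend_start=1, blend_end=4)[0])
```

A wider blend band gives a softer transition. A band one column wide is the hard vertical line that the blended banner avoids.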

Video: Subsetting data from a JMP Distribution report

Let's say you are in the Distribution platform in JMP, and you have created a report that you wish to drill down into. Well, the Local Data Filter can help with that.

But perhaps you also want to share a portion of the data with a co-worker, and not just the report itself.

So how do you do that?

You could subset the data in the report. In JMP, you can do that visually and very easily with a few clicks. Check out this short video by my co-worker Ryan DeWitt, who demonstrates subsetting (and mentions the Local Data Filter) using some sample data included in JMP.

Thanks for watching!

Subscribe to the JMP channel on YouTube to see the latest videos.

Video: Joining tables in JMP

Joining tables that are open in JMP is a task we often need to do when collecting and combining data from different sources or observation dates.

Plus, you may want to customize your join further, for example, by matching specific columns or even leaving out a few columns.
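
For readers who think in code, the kind of customized join described above can be sketched in a few lines of Python. This is only an illustration of the concept (an inner join on chosen matching columns, with some columns left out of the result), not how JMP does it. The function `join_tables` and the sample column names are made up.

```python
def join_tables(left, right, on, drop=()):
    """Inner-join two tables (lists of dict rows) on the columns in `on`,
    leaving out any columns named in `drop`."""
    # Index the right-hand table by its matching-column values.
    index = {}
    for row in right:
        key = tuple(row[c] for c in on)
        index.setdefault(key, []).append(row)

    joined = []
    for lrow in left:
        key = tuple(lrow[c] for c in on)
        for rrow in index.get(key, []):
            merged = dict(lrow)
            for col, val in rrow.items():
                merged.setdefault(col, val)  # left values win on name clashes
            joined.append({c: v for c, v in merged.items() if c not in drop})
    return joined


if __name__ == "__main__":
    batches = [
        {"batch": 1, "yield": 91.2},
        {"batch": 2, "yield": 88.7},
        {"batch": 3, "yield": 90.1},
    ]
    suppliers = [
        {"batch": 1, "supplier": "A", "notes": "rush order"},
        {"batch": 2, "supplier": "B", "notes": ""},
    ]
    for row in join_tables(batches, suppliers, on=["batch"], drop={"notes"}):
        print(row)
```

Batch 3 has no match in the second table, so an inner join leaves it out. Keeping unmatched rows would be a different join type.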

Watch the demo below to see how to customize joining data tables in JMP.

And then try it out yourself with sample data sets provided by my colleague, Ryan DeWitt, who created this video tip. The data sets are in the JMP User Community, which is the best place to ask questions about JMP and share your own knowledge. Join the community, if you haven't already!

Subscribe to the JMP channel on YouTube to see the latest videos.

Visual Six Sigma: A practical approach to data analysis and process improvement

Do you want to discover new and useful knowledge in your data using interactive, dynamic graphical displays? Would you like to be able to make sound decisions faster by understanding the patterns of variation in your data and separating it into useful signal and random noise? You can, with the help of Visual Six Sigma: Making Data Analysis Lean, Second Edition!

What is Visual Six Sigma?
It is "a practical and pragmatic approach to data analysis and process improvement.... In the typical business environment of process improvement, people are looking for simple-to-use tools that can be used by everyone at all levels to rapidly explore and interpret data, and then use that understanding to drive improvement. By making these tools highly visual and engaging, we can accelerate the process of analysis and eliminate the need for advanced statistical analysis in all but the most complex of situations," wrote Andrew Ruddick, Andy Liddle and Malcolm Moore in a 2008 white paper.

Using the principles, concepts and detailed road map outlined in the book Visual Six Sigma – along with JMP – you can broaden and deepen data analysis and process improvement in your organization by making the tools intuitive and easy to use. Plus, the results are easy to interpret! You will be able to quickly see the important and useful patterns in your data – enabling you to improve processes, connect with customer needs and expectations, react to emerging market trends, and seize opportunities for growth.

This second edition incorporates ways to take advantage of developments that make the implementation of Visual Six Sigma even easier, further increasing the scope and efficiency of its application. The book was updated using JMP 12.2.0 (with detailed instructions and illustrative screenshots demonstrating the latest functionalities in JMP and JMP Pro). It also includes two new chapters: "Managing Data and Data Quality" and "Beyond 'Point and Click' with JMP."

Visual Six Sigma is a powerful way to help you focus on the relevant and important data you have and to use this data effectively. According to Ruddick, Liddle and Moore, visual approaches facilitate rapid exploration of the data to quickly find the " 'hot Xs,' the process inputs responsible for driving variation in product quality or associated with variation in product quality."

Now for some fun…

If you are among the first seven people to comment on this blog post describing your strategies for making data analysis lean in your organization, you could win a hardcover copy of Visual Six Sigma: Making Data Analysis Lean. (This is the first edition of the book – only seven of them are left!)

Be sure to enter your e-mail address when you write your comment so we can contact you if you are a winner. Only one book per commenter and for U.S. addresses only.

In addition, SAS is turning 40, and we're celebrating you, our users! Every Friday in July, we'll feature a special offer for SAS and JMP users. The discounts include specials on training, certification, books and SAS events.

How do you get these rewards?

  • Follow us on Twitter @SASSoftware or Facebook.
  • Watch these social channels each Friday in July to see the special offers.

Helping clinical trials run better, faster

JMP Clinical enables everyone involved in clinical trials to work better and faster together -- and to help improve safety and data quality.

As you read this post over your afternoon coffee, scientists all over the world are hard at work trying to prevent the spread of deadly viruses, and cure and treat debilitating illnesses like cancer, HIV and Alzheimer’s.

When a breakthrough happens and one of those scientists puts her finger on a potentially helpful drug, her laboratory faces a new obstacle: the clinical trial.

With JMP Clinical, medical monitors, medical writers, clinical operations and reviewers can evaluate clinical trials efficiently and effectively. The latest version, JMP Clinical 6, which was released June 24, makes it possible for all involved to perform their jobs better and faster.

Designed for all organizations involved in clinical trials, including clinical research organizations (CROs), pharmaceutical and biotechnology companies, regulatory agencies, and medical universities, JMP Clinical has been on the market for seven years and is the gold standard in the industry, said Geoffrey Mann, JMP Life Sciences Product Manager. The new release makes an already-popular product much easier to use, and includes new tools that will help save time and money, and ultimately, produce better drugs.

Organizations of all sizes find value in the software. “While large pharmaceutical companies already have more than 100 copies of our software, the little company that has five employees is able to behave like a large pharmaceutical company when it uses JMP Clinical,” Mann said.

What’s new about JMP Clinical 6?

Every new drug that comes to market has to go through three rigorous phases of clinical trials. Such trials produce an enormous amount of data that has to be sorted and analyzed to answer questions such as:

  • What are the most common side effects of the drug?
  • Do side effects occur in certain populations more than others?
  • Does the drug do what it’s supposed to do?
  • What is the best dosage?

The all-new user interface of JMP Clinical was built to make it easier to answer these questions quickly and correctly. “It reduces the work by fivefold to tenfold,” said Mann.

The new user interface allows for both a tabulation and a visualization of the data. “Users can generate any table they want, and it’s interactively filtering according to the reviewer’s specifications,” he said. Users can also print or download static views of tables or visualizations as PDFs or PowerPoint slides.

JMP Clinical 6 facilitates collaboration across various divisions and between multiple users. “These configurations let you share all of your data and reviews with anyone in the world,” said Mann.

The new risk-based monitoring tools in JMP Clinical 6 are especially important, as data quality has always been a major issue in clinical trials. Mann tells a story about a data scientist at a CRO who was found to have been falsifying data for years. If that company had been using JMP Clinical, the problem would have been uncovered immediately, saving time and money, and also would have prevented bad drugs from going to market.

“The software runs all kinds of algorithms to find data quality issues, and it will discover things humans alone could never find,” said Mann.

The software helps to ensure that all parties involved in clinical trials stay honest. “It’s a check on everybody,” said Mann.

JMP Clinical helps regulatory agencies hold pharmaceutical companies and CROs to the highest standard. It helps pharmaceutical companies bring lifesaving drugs to the market faster. And it helps CROs ensure that their studies are free of falsification. This means that when that miracle drug is finally released – whether it’s a cure for cancer or a Zika vaccine – you’ll be able to trust it.

Learn more about JMP Clinical at the JMP website.

Video: Red triangles in JMP

The little red triangles in JMP are ubiquitous, hard-working and powerful!

Here's a quick video by my colleague Ryan DeWitt on these drop-down menus that some users call "hot spots," "inverted triangles" or just plain old "triangles."

Subscribe to the JMP channel on YouTube to see the latest videos.

Graph Makeover: Bars on a log scale

Every once in a while, I run across a bar chart on a log scale, and it always feels wrong. At first glance, I look at the bar lengths and start making comparisons. But eventually, I notice the log scale on the axis and try to convince my brain to forget everything it just saw and just compare the tops of the bars against the axis scale. In that sense, bars on a log scale are a special case of bars without a meaningful baseline.

Here’s a recent example I saw, comparing speeds for reading CSV files (comma-separated value text files).


The source of the comparison is a white paper from the vendor for the coral-colored tool, ParaText from wise.io, showing how fast it is. The company can hardly be accused of deception in the visualization since using a log scale only makes the competitor speeds look closer to its own speeds. It's about 10x faster than R readr but looks only 2x faster. The only advantage ParaText gets from the log scale is that its speed looks very close to the black I/O bandwidth bar (the upper limit) when, in fact, the speeds are about half the I/O bandwidth.
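
Here is a small Python sketch of why a 10x difference can look like 2x. The numbers are hypothetical, chosen only to show the arithmetic, and are not the values from the white paper. On a log axis, the drawn length of a bar is proportional to the log of its value divided by the axis minimum, so the ratio of two bar lengths depends on where the axis happens to start.

```python
import math


def linear_length_ratio(a, b):
    """Ratio of drawn bar lengths on a linear axis that starts at zero."""
    return a / b


def log_length_ratio(a, b, axis_min):
    """Ratio of drawn bar lengths on a log axis that starts at axis_min."""
    return math.log10(a / axis_min) / math.log10(b / axis_min)


if __name__ == "__main__":
    fast, slow = 1000.0, 100.0  # hypothetical speeds, 10x apart
    print("linear axis:", linear_length_ratio(fast, slow))
    print("log axis starting at 10:", log_length_ratio(fast, slow, axis_min=10.0))
    print("log axis starting at 1:", log_length_ratio(fast, slow, axis_min=1.0))
```

With the axis starting at 10, the faster bar spans two decades and the slower bar one, so it is drawn twice as long. Start the axis at 1 instead and the same data gives a ratio of 1.5. That dependence on an arbitrary starting point is what "no meaningful baseline" means in practice.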

Like any other non-trivial endeavor, data visualization often involves conflicting constraints that must be balanced. Yes, using bars on a log scale certainly interferes with gaining insight from the graph, but it’s possible that all the alternatives are worse. That’s why I always look at alternatives when making assessments of data visualizations.

Log scales are most useful when the underlying data is very skewed or varies by many orders of magnitude. This speed data is both skewed and varied, but not terribly so. The maximum variation is about 200:1, which is only two orders of magnitude. Immediately, we can try two variations on this chart:

  1. Keep the bars and change the scale to linear.
  2. Keep the log scale and change the bars.

Here’s a straightforward conversion to a linear scale. Using JMP, I’ve scaled all the values to be relative to the I/O bandwidth, so the black bars are not shown since they would all be at 100%.
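
The rescaling is simple arithmetic. As a sketch in Python (the chart itself was made in JMP, and the tool names and numbers below are placeholders, not the benchmark data), each speed is divided by the I/O bandwidth for its test and expressed as a percentage:

```python
def percent_of_bandwidth(speeds, bandwidth):
    """Express each tool's speed as a percentage of the I/O bandwidth
    measured for the same test."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return {tool: 100.0 * speed / bandwidth for tool, speed in speeds.items()}


if __name__ == "__main__":
    # Placeholder values for a single test, in the same units as bandwidth.
    speeds = {"tool_a": 450.0, "tool_b": 45.0}
    print(percent_of_bandwidth(speeds, bandwidth=900.0))
```

Because the bandwidth bar would always come out at exactly 100 percent under this scaling, it carries no information and can be dropped from the chart.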


I haven’t labeled the bars with values. I don’t think all the bars need labeling with exact values (I'd rather have a supporting table for that). But if I were sharing this in a report, I would try labeling the highest bar or two in each category for some grounding. I find that all the rotated labels in the original detract from the visual representation of the data and take too much effort to read.

The linear scale is not bad, and I already like it better than the original in that it portrays the speed differences among the products directly. One weakness of both charts is that the product labels are separated from their bars. Rotating the bars at least puts the bars and the legend labels into the same arrangement.


A different grouping hierarchy lets us label the product bars directly.

Comparing across tests is now less direct, but I’m thinking that’s a less important comparison.

Now let’s go back to the beginning and try keeping the log scale and changing the data elements. Here’s a view using points and lines instead of bars.


The points themselves are enough to carry the position information, but the lines add connection information, which helps simplify the labeling. In general, line segments carry three connotations:

  1. Interpolation (continuous)
  2. Connection (categorical)
  3. Pattern recognition (continuous or categorical)

Interpolation doesn’t make much sense here since our x-axis is categorical, so that’s a detraction. But connection is very valuable, and pattern recognition is informative, too. For instance, we notice a couple of products have the same up-down-up pattern.

With the lines labeled in place, the color is not as necessary. While the color does help distinguish intersecting lines and help the data lines stand out from the grid lines, there is enough separation that we can try using color for technology group (R, Python or specialty) rather than individual labels.


That makes the chart less busy but keeps the advantages of color.

The chart looks nice, but does it work? We still have a log scale, which still requires more thinking. But at least now the data elements are not in such conflict with the scale, and we have more room to show grid lines that reinforce the non-linearity. The log scale makes it easier to understand the differences between values across the entire range. In particular, we can see how the low values differ from each other better than we can on a linear scale.

It’s interesting to me that the data itself makes such a big difference in the usefulness of each chart option. The linear scale is at its limit of usefulness with differences around 10x. If the differences were more like 1000x, a linear scale would be useless. And if the values were too similar across products, the points would be obscuring each other and less useful.

Having seen a few possibilities, which is most effective for understanding the performance? Or would something else entirely be better?
