I blog about a lot of topics, but the following five categories represent some of my favorite subjects. Judging by the number of readers and comments, these articles have struck a chord with SAS users. If you haven't read them, check them out. (If you HAVE read them, some are worth re-reading!)
- SIMULATION: If you run simulations in SAS, you had better understand the four essential functions for statistical programmers. Simulation depends on random samples, so it is good to know how to generate random numbers in SAS. Lastly, it is important to understand random number streams in SAS and how they work.
- MATRIX COMPUTATIONS: In the article "Solving linear systems: Which technique is fastest?" I show that solving a linear system for a specific right-hand side is about four times faster than computing a general matrix inverse. This article also inspired the popular article, "What is the chance that a random matrix is singular?"
- STATISTICAL PROGRAMMING: You can't program something if you don't understand it. In the article "What is Mahalanobis distance?" I describe the geometry of Mahalanobis distance, which measures distance in a way that accounts for correlations in the data. Wikipedia links to this article as an "intuitive illustrated explanation." I've also written several other articles related to multivariate statistics:
- Detecting outliers in SAS: Multivariate location and scatter, which describes how to use SAS software to find multivariate outliers
- How to compute Mahalanobis distance in SAS
- Testing data for multivariate normality, which uses Mahalanobis distance to assess the distribution of multivariate data.
- The curse of dimensionality: How to define outliers in high-dimensional data?
- SAS PROGRAMMING: When you run SAS 9.3 in the windowing environment, HTML is the default output destination. The article "How to clear the output window in SAS 9.3" describes how to clear the tables and graphics in the HTML destination. If you haven't yet adopted SAS 9.3, read the top five features of SAS 9.3 that every SAS user will love.
- STATISTICAL THINKING: The internet is great for a lot of things, but not for estimating the relative popularity of topics. In the article "Estimating popularity based on Google searches: Why it's a bad idea" I argue that an estimate of popularity that is based on internet searches is a biased estimate. Statisticians at Google don't make this mistake and neither should you.
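The matrix-computation and Mahalanobis-distance topics above fit together in one small example. The following is a minimal sketch in plain Python rather than SAS, assuming a known center and a symmetric positive-definite covariance matrix S; the helper names `cholesky`, `forward_solve`, and `mahalanobis` are hypothetical functions written only for this illustration. It computes sqrt((x - center)ᵀ S⁻¹ (x - center)) by factoring S = LLᵀ and solving the triangular system Lz = x - center, so it never forms the inverse of S.

```python
import math

# Hypothetical helpers for illustration; not part of SAS or any library.

def cholesky(a):
    """Return lower-triangular L such that a = L * L^T.

    Assumes `a` is a symmetric positive-definite matrix (list of lists).
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)  # diagonal entry
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]  # below-diagonal entry
    return L


def forward_solve(L, b):
    """Solve L z = b by forward substitution, where L is lower triangular."""
    z = [0.0] * len(b)
    for i in range(len(b)):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    return z


def mahalanobis(x, center, cov):
    """Distance sqrt((x - center)^T cov^{-1} (x - center)).

    Because cov = L L^T, the quadratic form equals ||z||^2 where
    L z = x - center, so a triangular solve replaces an explicit inverse.
    """
    d = [xi - ci for xi, ci in zip(x, center)]
    z = forward_solve(cholesky(cov), d)
    return math.sqrt(sum(zi * zi for zi in z))


# Two variables with unit variances and correlation 0.9.
cov = [[1.0, 0.9], [0.9, 1.0]]
origin = [0.0, 0.0]
# Both points are sqrt(2) units from the origin in Euclidean distance.
print(f"{mahalanobis([1.0, 1.0], origin, cov):.3f}")   # prints 1.026
print(f"{mahalanobis([1.0, -1.0], origin, cov):.3f}")  # prints 4.472
```

The two test points are equally far from the origin in Euclidean terms, but (1, 1) lies along the direction of the strong positive correlation and is close in the Mahalanobis sense, whereas (1, -1) runs against the correlation and is about 4.47 units away. Using a triangular solve instead of an explicit inverse follows the same reasoning as the linear-systems comparison.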
4 Comments
As a statistician working with SAS, often SAS/IML, I felt that the improvements made to SAS/IML in SAS 9.22 (and 9.3) were a huge step forward. Kudos for that.
The only improvement that immediately springs to mind is to the Enhanced Editor. A few simple features would help, such as indicating which bracket you just closed or highlighting the line on which an error occurred. Yes, I could use SAS Enterprise Guide, but on every computer I have tried so far it has been rather laggy.
Hi, I hope three improvements can be made to SAS in the future. The first concerns large-scale optimization, especially nonlinear optimization problems. The second concerns generalized linear mixed models: PROC NLMIXED is not efficient enough and often fails to converge to valid parameter estimates. The last concerns Markov chain Monte Carlo, an attractive but tricky technique. I know PROC MCMC was improved in SAS 9.3, but I hope it becomes more efficient in future versions.
The NLP solvers available in PROC OPTMODEL are much improved in both 9.3 and the upcoming release 12.1 due later this year. Difficult customer instances help us improve our solvers, so please send them to tech support.
Hi, here are a few things that I think would allow me to work more efficiently. In Enterprise Guide it would be very useful to have a text box. Currently you can add a note, but it blends in with everything else. A text box would allow for higher-level commenting, especially since I use Enterprise Guide a lot for exploratory work: I could place text describing what a certain portion of the diagram is doing. The ability to change background colors would also help; colors could complement the text box by marking a group of nodes that implements some functionality. I also realize that sometimes I have too many resulting data sets from a program node. Sometimes this tells me that I need to split up the code, but other times it is necessary. Could there be a pseudo-library node so that, if an option is checked, all resulting data sets are placed in this node? Currently the data sets take up too much space because they appear as a vertical list. It would be very nice to open the node and see a new nested part of the process flow with the data sets laid out in more of an array structure. The alternative is to turn off layout, which gives much uglier results that are harder to work with. These are all mostly cosmetic.
One feature that would be very helpful comes from something my project did in Matlab Simulink. The diagrams themselves could be viewed as text, so we could put the project under version control. There was a higher-level model with various subsystems that could each be worked on independently, even though there was a logical dependence among them. The same need arises in SAS. It would be amazing if there were some sort of hierarchy, such as a project, then a process flow, and then a node. We mostly use program nodes, but all task nodes are really wrappers around a program. Then there are notes and data sets as well. I think it would really advance the whole platform if it worked like this and version control could be applied to it. This structure would allow better team use on larger projects and code reuse across multiple projects. Thanks, I hope these are in line with what you were asking.