The SAS® Fast-KPCA implementation bypasses the limitations of exact KPCA methods. Internally, SAS® uses k-means to find a small set of points that represents the full data. This row reduction method has the advantage that the c centroids are chosen to minimize the variation of the points nearest to each centroid and to maximize the variation between cluster centroids. In some cases, computing the SVD on the k-means result increases numerical stability and improves downstream clustering, discrimination, and classification.
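The idea of reducing rows with k-means before the decomposition can be sketched in a few lines of NumPy. This is only an illustrative approximation and not the SAS® implementation: the function names (`kmeans`, `rbf_kernel`, `approx_kpca`), the RBF kernel, and the parameters `c`, `gamma`, and `n_components` are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, c, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns c centroids (illustrative helper)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, c)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(c)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def approx_kpca(X, c=10, n_components=2, gamma=1.0, seed=0):
    """Approximate kernel PCA using c k-means centroids instead of all n rows."""
    C = kmeans(X, c, seed=seed)
    K_cc = rbf_kernel(C, C, gamma)                  # small c x c kernel matrix
    one = np.full((c, c), 1.0 / c)
    K_centered = K_cc - one @ K_cc - K_cc @ one + one @ K_cc @ one
    # SVD of the small, symmetric, centered kernel matrix
    U, s, _ = np.linalg.svd(K_centered)
    U, s = U[:, :n_components], s[:n_components]
    # Project every original point through its kernel values to the centroids
    K_xc = rbf_kernel(X, C, gamma)                  # n x c
    K_xc_centered = (K_xc
                     - K_cc.mean(axis=0)[None, :]
                     - K_xc.mean(axis=1, keepdims=True)
                     + K_cc.mean())
    scores = K_xc_centered @ U / np.sqrt(np.maximum(s, 1e-12))
    return scores, C

# Example usage on synthetic data (two well-separated blobs)
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(0.0, 0.3, (100, 2)),
                    rng.normal(3.0, 0.3, (100, 2))])
scores_demo, centroids_demo = approx_kpca(X_demo, c=10, n_components=2, gamma=0.5)
```

In this sketch the decomposition runs on a c x c matrix rather than an n x n one, which is where the cost saving would come from when c is much smaller than n.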
Tag: data mining
Several weeks ago, I wrote about practical advice from a Chief Data Scientist in my blog “From Aristotle to Pi: Practical advice from a chief data scientist.” Now I want to offer my advice as a newbie trying to navigate through machine learning concepts and how to code them. Over
Who says machine learning can't be fun? A crew of us from SAS went to San Francisco for the recent KDD conference, which bills itself as "a premier interdisciplinary conference, [which] brings together researchers and practitioners from data science, data mining, knowledge discovery, large-scale data analytics, and big data." We brought
Optimization for machine learning is essential to ensure that data mining models can learn from training data in order to generalize to future test data. Data mining models can have millions of parameters that depend on the training data and, in general, have no analytic definition. In such cases, effective models
When you go to the grocery store, you see that items of a similar nature are displayed near each other. When you organize the clothes in your closet, you put similar items together (e.g., shirts in one section, pants in another). Every personal organizing tip on the web to
It is said that everything is big in Texas, and that includes big data. During my recent trip to Austin I had the privilege of being a judge in the final round of the Texata Big Data World Championship, a fantastic example of big data competitions. It felt fitting that
SAS is hosting this year’s European Analytics 2015 conference in Rome, November 9 – 11. This inspiring three-day event will give you the chance to boost your company’s analytics culture in an international environment and to make sure your knowledge and expertise meet the demands of the digital era. But what if
Every time I pick up a new article about analytics, I am disappointed that I cannot find any specifics about back-end processing. It is no secret that every vendor wishes they had the latest and greatest parallel processing capabilities, but the truth is that many
At the KDD conference this week I heard a great invited presentation called "How to Create a $1 billion Model in 20 days: Predictive Modeling in the Real World – A Sprint Case Study." It was presented by Tracey de Poalo from Sprint and former Kaggle President and well-known
Looking forward, ten of my SAS colleagues and I are heading to New York City this weekend for KDD 2014: Data Science for the Social Good, which runs August 24-27. This event’s full name is the 20th Association for Computing Machinery Special Interest Group on Knowledge Discovery and Data Mining,