Credit score modeling in SAS Enterprise Miner: Reject inference to solve sample bias problem

Editor's note: The following post is from Xiaoyuan Zhang, presenter at an upcoming Insurance and Finance Users Group (IFSUG) webinar.

Learn more about Xiaoyuan Zhang.


As a business user with limited statistical skills, I don’t think I could build a credit scorecard without the help of SAS Enterprise Miner. As you can see from the flow chart, SAS Enterprise Miner, software for descriptive and predictive modeling, does an amazing job of developing and streamlining models.

The flow chart presents my whole credit score modeling process, which is divided into three parts: creating the preliminary scorecard, performing reject inference, and building the final scorecard. I will cover the whole process in the Insurance and Finance Users Group (IFSUG) virtual session on Feb 3, 2017. In this blog I want to emphasize the second part, which is easy to overlook.

The data for the preliminary scorecard comes only from accepted loan applications. However, the scorecard modeler needs to apply the scorecard to all applicants, both accepted and rejected. Reject inference is performed to correct this sample bias.

Before inferring the behavior (good or bad) of the rejected applicants, the data needs to be examined. I used the StatExplore node to explore the data and found a significant number of missing values. This is problematic because the SAS Enterprise Miner regression model, which is used here for scorecard creation and reject inference, ignores observations that contain missing values, and that reduces the size of the training data set. Less training data can substantially weaken the predictive power of the model.
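Outside the StatExplore node, the same kind of check can be sketched in plain SAS code. The example below is only an illustration: the data set work.accepts and its variables (income, age, purpose) are made up and stand in for the accepted-applicant data.

```sas
/* Hypothetical accepted-applicant data with some missing values */
data work.accepts;
   input income age purpose $;
   datalines;
52000 34 car
. 41 home
61000 . car
48000 29 .
;
run;

/* Count non-missing (N) and missing (NMISS) values for interval variables */
proc means data=work.accepts n nmiss;
   var income age;
run;

/* Include missing values as their own level for class variables */
proc freq data=work.accepts;
   tables purpose / missing;
run;

/* Keep only complete cases: roughly what a regression would be left with */
data work.complete_cases;
   set work.accepts;
   if cmiss(of _all_) = 0;
run;
```

In this toy data only the first row is complete, which shows how quickly a few scattered missing values can shrink the usable training set.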

To help with this problem, the Impute node is used to impute the missing values. The node's Properties panel offers a variety of imputation methods to choose from. In this model, Tree surrogate is selected for class variables and Median is selected for interval variables.
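For readers who want to see what median imputation does to interval variables, here is a minimal sketch in SAS code. It is not the code the Impute node generates; it assumes SAS/STAT is available and uses a made-up data set, work.accepts. Tree surrogate imputation for class variables has no equally short equivalent and is not shown.

```sas
/* Hypothetical interval variables with missing values */
data work.accepts;
   input income age;
   datalines;
52000 34
. 41
61000 .
48000 29
;
run;

/* METHOD=MEDIAN with REPONLY replaces missing values with the
   variable's median and leaves non-missing values unchanged */
proc stdize data=work.accepts out=work.accepts_imputed
            method=median reponly;
   var income age;
run;

proc print data=work.accepts_imputed;
run;
```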

However, the Impute node exports the data with its role set to Train. To use the data in the Reject Inference node, the role needs to be changed to Score. A SAS Code node is placed in between for this purpose, with the following code:

/* Copy the imported data unchanged to the Score export */
data &em_export_score;
   set &em_import_data;
run;

Last but not least, the Reject Inference node is used to infer the performance of the rejected loan applicants. SAS Enterprise Miner offers three standard, industry-accepted methods for inferring the performance of rejected applicants, each using a model that is built on the accepted applicants. We won’t explore the three methods in detail here, as the emphasis of this blog is on the process.

To hear more on this topic, please register for the IFSUG virtual session, Credit Score Modeling in SAS Enterprise Miner on February 3rd from 11am-12pm ET.


About Xiaoyuan Zhang

Xiaoyuan Zhang grew up in Zhaoyuan, China, on the coast of the Bohai Sea. Her town is famous for its ancient gold mine, hot springs and its unusual and tasty seafood. Her undergraduate degree is from China Agricultural University in Beijing, where she majored in Marketing Intelligence and graduated with honors. She graduated, with honors, from Drexel University with a Master's degree in Finance. She has passed two CFA exams and learned Enterprise Miner in one of her courses. She specializes in efficient credit score modeling with the underutilized SAS Enterprise Miner. She is using some of her post-graduation free time to study "regular SAS", to tutor and to volunteer.

 


Comparing SAS/GRAPH® 9.4 capabilities with SAS/GRAPH® Version 6

I remember my grandparents talking about how hard things were for them growing up. They would say, “Things were so bad that we had to walk uphill, both ways, in the freezing snow to get to school.” It was always hard for me to relate to these statements because the school bus picked me up at the end of my driveway. Fast forward to today and people are riding on hoverboards. Through the years, advancement in transportation has made it easier for us to get where we need to be.

SAS/GRAPH® Version 6

The evolution of SAS/GRAPH® is similar. In the earlier days of SAS® software, during Versions 5 and 6 of SAS/GRAPH, I understood how difficult it was to create some of the graphs that customers wanted. A customer recently asked whether I could send him the code that produced the graph below, which he found in the SAS/GRAPH® Software: Reference, Volume 1 Version 6 Edition:

[Figure: butterfly chart from the SAS/GRAPH® Software: Reference, Version 6 Edition]

In Version 6, the only way to create this graph, referred to as a butterfly chart, was by using SAS/GRAPH and the Annotate facility. The annotation statements added over 60 lines of code to the program.

Below is a snippet of the Version 6 program that created the bars on the left side of the graph.

Click this link to see the entire Version 6 program.

    /* female bars on left */
    %bar(39.8, 10.5, 25.0, 20.0, blue, 0, solid);
    %bar(39.8, 20.71, 15.0, 30.7, green, 0, solid);
    %bar(39.8, 31.42, 10.0, 41.42, red, 0, solid);
    %bar(39.8, 42.14, 32.0, 52.14, blue, 0, solid);
    %bar(39.8, 52.85, 33.0, 62.85, green, 0, solid);
    %bar(39.8, 63.57, 36.0, 73.57, red, 0, solid);
    %bar(39.8, 74.28, 35.0, 84.28, blue, 0, solid);
    %bar(39.8, 85.0, 33.0, 95.0, green, 0, solid);

SAS/GRAPH® 9.4

Fast forward to SAS® 9.4. ODS Graphics and SG procedures have been part of Base SAS® since SAS® 9.3, making it much easier to create high-quality graphs without using additional software. The graph that was previously created using the Annotate facility can now be created using the Graph Template Language (GTL) and the SGRENDER procedure. Here is the program to create the entire graph:

Read More »


Take a SAS security journey at SAS Global Forum 2017

Editor's note: Charyn Faenza co-authored this blog. Learn more about Charyn.

As the fun of the festive season ends, the buzz of the new year and the enchantment of SAS Global Forum 2017 begin. SAS Global Forum is a conference designed by SAS users, for SAS users, bringing together SAS professionals from all over the world to learn, collaborate and network in person. Sure, online communication is great, but it’s hard to beat the thrill of meeting fellow SAS users face-to-face for the first time. It feels like magic! To help you prepare for the event, Charyn and I wanted to share a few things, including information on metadata security. Read on for more.

Start your SAS Global Forum journey now!

Want to stay up to date with SAS Global Forum activities and get a head start on your conference networking? Join the SAS Global Forum 2017 online community. Here you can post questions, share ideas, and connect with others before the event. While you are at it, take a look at the SAS User Group for Administrators (SUGA) community, which also feels magical to me. As members of the committee, we regularly get together (virtually!) to discuss and plan exciting events on behalf of SAS administrators around the world. Join the SUGA community and watch for upcoming events, including a live meet-up at SAS Global Forum! That event is scheduled for Monday, April 3, from 6:30-8:00 p.m.

Security auditing

During his workshop at SAS Global Forum 2014, Gregory Nelson pointed out that the SAS administrator role has evolved over the years, and so has one of its key responsibilities: security auditing. Once you’ve set up an initial security plan, how do you ensure that the environment remains secure? Can you just “set it and forget it”? Probably not, especially if you want to ensure regulatory compliance, maintain business confidence and keep your SAS platform in line with its design specifications as your business grows and your SAS environment evolves.

Thinking about your own SAS platform:

  • What would happen in your organization if someone accessed data they shouldn’t?
  • When was your last SAS platform security project?
  • When was it last tested? How extensive was it? How long did it take?
  • Have there been any changes since it was last tested, whether deliberate, accidental, expected or unexpected?
  • How do you know if it’s still secure today?

Read More »


Five great analytics resolutions for 2017

The holiday season is over, and you survived. You’ve made a lot of personal resolutions for 2017: go to the gym, eat less sugar, save more money, visit Grandma more often. These are all great personal resolutions, but what about your analytics resolutions? If you are having trouble with your analytics resolutions, then let us help you out. The 14.2 release of the SAS analytical products will help you make 2017 your best analytics year yet.

Resolution 1: Build more accurate models faster!

Now you will be able to leverage the power of the two most advanced analytics platforms on the market, SAS 9 and SAS Viya, from one interface. Using SAS/CONNECT, users can call powerful SAS Viya analytics from within a process flow in Enterprise Miner. Would you prefer to use the super-fast, autotuned gradient boosting in SAS Viya? No problem! Call SAS Viya analytics directly from Enterprise Miner using the SAS Viya Code node. Then, from the same process flow, you can also call open source models, all from one interface, SAS Enterprise Miner. Do you prefer to use SAS Studio on SAS 9? You will be able to call SAS Viya analytics from SAS Studio as well. With SAS 9.4 M4, SAS gives you the ability to use both of its powerful platforms from one interface.

Resolution 2: Score your unstructured models in Hadoop without moving your data!

Got Hadoop? Got a lot of unstructured data? Now SAS Contextual Analysis allows you to score models in Hadoop using the SAS Code Accelerator add-on. Identify new insights with your unstructured text without ever having to move your data. Score it all in Hadoop. Uncover new trends and topics buried in documents, emails, social media and other unstructured text that is stored in Hadoop. You will be able to do it faster because you won’t have to move that data outside of Hadoop. SAS just keeps getting better in 2017.

Read More »


Junior Professional Program helps new users attend SAS Global Forum 2017

Regardless of how long they’ve used the software, there’s no better event for SAS professionals than SAS Global Forum. The conference will attract thousands of users from across the globe and is an excellent place to network with and learn from users of all skill levels. To help relatively new users of SAS experience the conference for the first time, the conference offers the Junior Professional Award program.

The program is designed exclusively for full-time SAS professionals who have used SAS on the job for three years or less, have never attended SAS Global Forum, and whose circumstances would otherwise keep them from attending. But, don’t let the word “junior” confuse you. All “new” SAS professionals regardless of age are eligible.

The Junior Professional award provides users with a waived conference registration fee, including conference meals, a free pre-conference tutorial, and great opportunities to learn from and network in a large community of SAS users. The program does not cover other costs associated with attending the event (travel and lodging, for example, are not included).

To apply, users need to fill out the online application form. Award applications must be received by January 16, 2017. Questions can be directed to the Junior Professional Program Coordinator, whose contact information can be found on the website.

To learn more about the award and its benefits, I recently sat down with one of the 2015 winners, Shavonne Standifer.



Shavonne Standifer, 2015 SAS Global Forum Junior Professional Award winner

Larry LaRusso: Hello Shavonne. First of all, let me congratulate you on winning a past award. That’s a great accomplishment, for sure. So tell me, how did you first learn about the program?
Shavonne Standifer: Interestingly, I wasn’t looking specifically for the award and didn’t even really know it existed. I was searching for a SAS proceeding paper and somehow stumbled across the application. I just applied, and got it!

LL: That’s awesome. What made you want to attend SAS Global Forum?
SS: I knew a little bit about the event and really wanted to attend so that I could take advantage of the hands-on learning opportunities. I also thought it would be super cool if I could attend the lectures of my favorite SAS authors, and I knew many of them planned to present.

LL: What were your first impressions of the event?
SS: I was amazed by how many people were there. I was also amazed by how nice and helpful everyone was. I met so many new friends.

LL: What was the best part of your Global Forum experience?
SS: The best part of my experience by far was when I met John Amrhein. We met during a networking event in the Quad. After subjecting him to a 2-minute rant about how much I loved SAS software, and all of the reasons why, he finally had a minute to introduce himself and mentioned that he was the SAS Global Forum 2017 conference chair. I was completely shocked! To my complete surprise, he encouraged me to be a part of his team, to which I later applied and was accepted.

Read More »


Word scatter plot with SAS

In my last blog, I showed you how to generate a word cloud from a collection of PDF documents. Word clouds show you which terms appear in your documents and how frequently they occur. However, word clouds cannot lay out words from a semantic or linguistic perspective. In this blog I’d like to show you how we can overcome this constraint with new methods.

Word embedding has been widely used in Natural Language Processing, where words or phrases from the vocabulary are mapped to vectors of real numbers. There are several open source tools that can be used to build word embedding models. Two of the most popular tools are word2vec and GloVe, and in my experiment I used GloVe. GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
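To make the idea of words as vectors concrete, here is a small sketch that compares word vectors with cosine similarity, a common way to measure how close two embeddings are. The three-dimensional vectors below are invented purely for illustration; real GloVe vectors have many more dimensions (100 in the file used later in this post).

```sas
/* Toy word vectors: the values are made up, not taken from GloVe */
data work.toy_vectors;
   input term :$10. v1-v3;
   datalines;
king 0.50 0.70 0.10
queen 0.45 0.75 0.20
apple 0.90 0.05 0.60
;
run;

/* Cosine similarity for every distinct pair of terms:
   dot product divided by the product of the vector lengths */
proc sql;
   create table work.cosine as
   select a.term as term_a,
          b.term as term_b,
          (a.v1*b.v1 + a.v2*b.v2 + a.v3*b.v3) /
          (sqrt(a.v1**2 + a.v2**2 + a.v3**2) *
           sqrt(b.v1**2 + b.v2**2 + b.v3**2)) as cosine
   from work.toy_vectors a, work.toy_vectors b
   where a.term < b.term
   order by cosine desc;
quit;
```

Pairs with a cosine close to 1 point in nearly the same direction, which is the sense in which an embedding places related words near each other.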

Suppose you have obtained the term frequencies from documents with SAS Text Miner and downloaded the word embedding model from http://nlp.stanford.edu/projects/glove/. Next you can extract vectors of terms using PROC SQL.

libname outlib 'D:\temp';

* Rank terms according to frequencies;
proc sort data=outlib.abstract_stem_freq;
   by descending freq;
run;

data ranking;
   set outlib.abstract_stem_freq;
   ranking=_n_;
run;

* Read the tab-delimited GloVe file: a term followed by 100 vector components;
data glove;
   infile "d:\temp\glove_100d_tab.txt" dlm="09"x firstobs=2;
   input term :$100. vector1-vector100;
run;

* Keep only the terms that also appear in the frequency table;
proc sql;
   create table outlib.abstract_stem_vector as
   select glove.*, ranking, freq
   from glove, ranking
   where glove.term = ranking.word;
quit;

Read More »


More SAS Studio Tips for SAS Grid Manager Administrators: Global Settings

We have seen in a previous post of this series how to configure SAS Studio to better manage user preferences in SAS Grid environments. There are additional settings that an administrator can leverage to properly configure a multi-user environment; as you may imagine, these options deserve special considerations when SAS Studio is deployed in SAS Grid environments.

SAS Studio R&D and product management often collect customer feedback and suggestions, especially during events such as SAS Global Forum. We received several requests for SAS Studio to provide administrators with the ability to globally set various options. The goal is to eliminate the need to have all users define them in their user preferences or elsewhere in the application. To support these requests, SAS Studio 3.5 introduced a new configuration option, webdms.globalSettings. This setting specifies the location of a directory containing XML files used to define these global options.

Tip #1

How can I manage this option?

The procedure is the same as we have already seen for the webdms.studioDataParentDirectory property. They are both specified in the config.properties file in the configuration directory for SAS Studio. Refer to the previous blog for additional details, including considerations for environments with clustered mid-tiers.

Tip #2

How do I configure this option?
By default, this option points to the directory path !SASROOT/GlobalStudioSettings. SASROOT translates to the directory where SAS Foundation binaries are installed, such as /opt/sas/sashome/SASFoundation/9.4 on Unix or C:/Program Files/SASHome/SASFoundation/9.4/ on Windows. It is possible to change the webdms.globalSettings property to point to any chosen directory.
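As an illustration only, an override in config.properties might look like the fragment below. The property name comes from the text above; the directory path is a hypothetical shared location, not a documented default.

```properties
# config.properties for SAS Studio
# Hypothetical path: point global settings at a shared directory
webdms.globalSettings=/shared/sas/config/GlobalStudioSettings
```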

SAS Studio 3.6 documentation provides an additional key detail: in a multi-machine environment, the GlobalStudioSettings directory must be on the machine that hosts the workspace servers used by SAS Studio. We know that, in grid environments, this means that this location should be on shared storage accessible by every node.

Read More »


Transform your technical talks with an audience centered approach

Editor's note: The following post is from Melissa Marshall, Principal at Melissa Marshall Consulting LLC. Melissa is a featured speaker at SAS Global Forum 2017, and on a mission to transform how scientists and technical professionals present their work.

Learn more about Melissa.


Think back to the last technical talk you were an audience member for. What did you think about that talk? Was it engaging and interesting? Boring and overwhelming?  Perhaps it was a topic that was important to you, but it was presented in a way that made it difficult to engage with the content. As an expert in scientific presentations, I often observe a significant “disconnect” between the way a speaker crafts a presentation and the needs of the audience. It is my belief that the way to bridge this gap is for you, as a technical presenter, to become an audience centered speaker vs. a speaker centered speaker.


Here I will provide some quick tips on how to transform your content and slides using your new audience centered speaking approach!

Audience Centered vs. Speaker Centered

The default setting for most presenters is that they are speaker centered, meaning that they make choices in their presentation because those choices work primarily for themselves as a speaker. Examples include: spending a lot of time on the area of the topic that gave you the most difficulty or that you spent the most time working on; using terms that are familiar to you but are jargon for the audience; putting most of the words you want to say on your slides, so that your slides are basically your speaker notes; and standing behind a podium, physically disconnected from your audience. These choices are common in presentations, but they do not set you up for success. They are a key reason why many presentations of technical information fail.

A critical insight is to realize that your success as a speaker depends entirely upon your ability to make your audience successful. You don’t get to decide that you gave a great talk (even if no one understood it)! That’s because presentations, by their very nature, are always made for an audience. You need something from your audience; that is why you are giving a talk! So, it is time to get serious about making your audience successful (so you can be too!). I might define “audience success” as: your audience understands and views your subject in the way you wanted them to. Strategically, if you desire to be a successful speaker, then the best thing you can do is go “all in” on making your audience successful!

Audience Centered Content

To make your content more audience centered, you can ask yourself 4 critical questions ahead of time about your audience:

  • Who are they?
  • What do they know?
  • Why are they here?
  • What biases do they have?


Read More »


Easier Space Management for EV Data Mart Tables in 9.4M4

The report-ready SAS Environment Manager Data Mart has been an invaluable addition to SAS 9.4 for SAS administrators. The data mart tables are created and maintained by the SAS Environment Manager Service Architecture Framework and provide a source of data for out-of-the-box reports as well as custom reports that any SAS administrator can easily create. As you can imagine, the tables in the data mart can grow quite large over time, so balancing the desired time span of reporting against the size of the tables on disk requires some thought. The good news: SAS 9.4 M4 has made that job even easier.

The Environment Manager Data Mart (EVDM) has always provided a configuration setting to determine how many days of resource records to keep in the data mart tables. You can see below that in a fresh SAS 9.4 M4 installation, the default setting for “Number of Days of Resource Records in Data Mart” is set to 60 days. This means that EVDM data records older than 60 days are deleted from tables whenever the data mart ETL process executes.

[Figure: the “Number of Days of Resource Records in Data Mart” setting in SAS 9.4 M4]

The space required to house the Environment Manager Data Mart is split across three primary areas.

  • The ACM library tables contain system level information.
  • The APM library tables contain audit and performance data culled from SAS logs.
  • The KITS library tables contain miscellaneous tables created by data mart kits that collect specialty information about HTTP access, SAS data set access, and such.
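To get a rough picture of how much space each area uses, one option is to query the SAS dictionary tables. The sketch below assumes that the three data mart libraries are already assigned in your session under the librefs ACM, APM and KITS; those librefs are an assumption for illustration, so substitute whatever names your deployment uses.

```sas
/* List data mart tables by size on disk, largest first.
   Assumes librefs ACM, APM and KITS are already assigned. */
proc sql;
   select libname,
          memname,
          nobs,
          filesize format=comma20.
   from dictionary.tables
   where libname in ('ACM', 'APM', 'KITS')
     and memtype = 'DATA'
   order by filesize desc;
quit;
```

If the librefs are not assigned, the query simply returns no rows, so it is safe to run as a first check.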

Read More »


Dr. Strangeformat or: How I Learned to Stop Joining Tables and Love the PROC


The title of this post borrows from Stanley Kubrick’s 1964 comedy “Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb.” It stars the great Peter Sellers as the titular character as well as George C. Scott and Slim Pickens. The black and white film is strange and brilliant just like Kubrick was. Similarly, as I was experiencing the issue I outline below and was told of this solution, I thought two things. The first was “what a strange but brilliant solution” and the second one was “I’ll try anything as long as it works.”   Thus, a title was born. But enough about that. Why are we here?

Problem

You want to add a couple of columns of information to your already large dataset, but each time you try to join the tables you run out of memory!  For example, you want to append latitude and longitude values from Table B to an existing list of customer phone numbers in Table A.

You’ve tried this and got nowhere fast:

proc sort data = demo.tablea;
   by npa nxx;
run;

proc sort data = demo.tableb;
   by npa nxx;
run;

data demo.aunionb;
   merge demo.tablea (in=a) demo.tableb (in=b);
   by npa nxx;
   if a;
run;

And then you tried this and also got nowhere (albeit a little slower):

proc sql;
   create table demo.aunionb as
   select *
   from demo.tablea a
   left join demo.tableb b on (a.npa = b.npa) and (a.nxx = b.nxx);
quit;

Solution - Joining tables with PROC FORMAT

Use PROC FORMAT!

Here’s how:

Read More »
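The author's full program is not reproduced in this excerpt. As a rough sketch of the general format-lookup technique, and not necessarily the author's exact approach, the example below builds a character format from a lookup table with CNTLIN= and applies it with PUT. The two small tables, the format name $geo and the column names latitude, longitude and line are all made up for illustration.

```sas
/* Hypothetical lookup table standing in for Table B */
data work.tableb;
   input npa nxx latitude longitude;
   datalines;
919 677 35.82 -78.76
212 555 40.71 -74.01
;
run;

/* Hypothetical customer table standing in for Table A */
data work.tablea;
   input npa nxx line;
   datalines;
919 677 1234
212 555 9876
303 444 1111
;
run;

/* Control data set: key "npa-nxx" maps to "latitude|longitude" */
data work.cntlin;
   set work.tableb end=last;
   retain fmtname '$geo' type 'C';
   length start $7 label $40;
   start = catx('-', npa, nxx);
   label = catx('|', latitude, longitude);
   output;
   if last then do;
      hlo = 'O';            /* OTHER: catch keys with no match */
      label = 'NOMATCH';
      output;
   end;
run;

proc format cntlin=work.cntlin;
run;

/* Look up each row of Table A without sorting or joining */
data work.a_with_geo;
   set work.tablea;
   length geo $40;
   geo = put(catx('-', npa, nxx), $geo.);
   if geo ne 'NOMATCH' then do;
      latitude  = input(scan(geo, 1, '|'), 32.);
      longitude = input(scan(geo, 2, '|'), 32.);
   end;
   drop geo;
run;
```

The large table is read once, in its existing order, and only the lookup table has to be held in memory as a format. Rows with no match (the third customer here) keep missing latitude and longitude, which mirrors the behavior of the left join.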
