Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS server, how to execute CAS actions, and how to summarize columns. Now it's time to focus on how to rename columns in CAS tables.
Load and explore data
In SAS Viya we can use the Samples caslib for demonstration data. I'll load the data into the distributed CAS server using my CAS connection object conn, followed by the loadTable action. The loadTable action loads a server-side file into memory. Then I'll reference the CAS table in the tbl object and view the column information using the columnInfo CAS action. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.
conn.loadTable(path = 'WATER_CLUSTER.sashdat', caslib = 'samples', casOut = dict(caslib = 'casuser')) tbl = conn.CASTable('water_cluster', caslib='casuser') tbl.columnInfo() |
Rename CAS table columns
To rename a column in a CAS table you will need to use the alterTable CAS action. The alterTable action can rename columns, but it can also rename tables, add labels to tables, drop or keep columns, and modify column attributes like column labels. This post will only focus on how use the rename feature of the action.
As of the SWAT package version 1.11.0 there is no Pandas equivalent to rename the column. Techniques like the rename method or by explicitly assigning a list of new column names to the CASTable object are not available.
Rename a single column
For example, what if you want to rename the Year column to Year_Value? To rename columns with the alterTable action you specify the columns parameter with a list of dictionaries. Each dictionary specifies the column to modify. In this example specify the columns parameter and add a list with a single dictionary. The dictionary requires the name key to specify the column name and the rename key specifies the new column name. Lastly, execute the columnInfo action to view the updated columns names.
tbl.alterTable(columns = [ {'name':'Year', 'rename':'Year_Value'} ]) tbl.columnInfo() |
Notice in the results the Year column has been renamed to Year_Value.
Rename multiple columns
To rename multiple columns you simply add additional dictionaries to the list. For example, if you want to rename the Month column to Month_Value, and the Day column to Day_Value, add two dictionaries to the list. Here I'll create the dictionary in the variable renameColumns then use that in the alterTable action.
## Columns to rename renameColumns = [ {'name': 'Month', 'rename':'Month_Value'}, {'name': 'Day', 'rename':'Day_Value'} ] ## Rename the CAS table columns tbl.alterTable(columns = renameColumns) ## View the column information of the CAS table tbl.columnInfo() |
In the results notice that the Month column has been renamed to Month_Value, and the Day column has been renamed to Day_Value.
Dynamically rename columns
What if you want to dynamically rename columns? For example, what if you want to remove all underscores in the column names and lowercase all characters? You could manually create a dictionary for each column to rename, but that's a lot of manual work. Instead, you can use a Python list comprehension to create a list of dictionaries with the columns to rename. I'll store the results of the list comprehension in the variable newColumnNames.
newColumnNames = [{'name':colName, 'rename':colName.replace("_","").lower()} for colName in tbl.columns] display(newColumnNames) # and the results [{'name': 'Year_Value', 'rename': 'yearvalue'}, {'name': 'Month', 'rename': 'month'}, {'name': 'Day', 'rename': 'day'}, {'name': 'Date', 'rename': 'date'}, {'name': 'Serial', 'rename': 'serial'}, {'name': 'Property', 'rename': 'property'}, {'name': 'Address', 'rename': 'address'}, {'name': 'City', 'rename': 'city'}, {'name': 'Zip', 'rename': 'zip'}, {'name': 'Lat', 'rename': 'lat'}, {'name': 'Long', 'rename': 'long'}, {'name': 'Property_type', 'rename': 'propertytype'}, {'name': 'Meter_Location', 'rename': 'meterlocation'}, {'name': 'Clli', 'rename': 'clli'}, {'name': 'DMA', 'rename': 'dma'}, {'name': 'Weekday', 'rename': 'weekday'}, {'name': 'Weekend', 'rename': 'weekend'}, {'name': 'Daily_W_C_M3', 'rename': 'dailywcm3'}, {'name': 'Week', 'rename': 'week'}, {'name': 'US Holiday', 'rename': 'us holiday'}, {'name': 'CLUSTER', 'rename': 'cluster'}] |
Notice it creates a list of dictionaries, one for each column to rename. The list comprehension removed all underscores if they exist in a column name, then lowercases all the characters in the column.
Now that the list of dictionaries is created, add it to the alterTable action's columns parameter.
tbl.alterTable(columns = newColumnNames) tbl.columnInfo() |
Summary
The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas techniques to work with your distributed CAS tables. However, in this example, familiar Pandas techniques to rename columns is currently not available in the SWAT package. Instead, you can use the alterTable action to easily rename columns.