Like most people, I believed that the process of diagnosing and treating cancer begins with a biopsy. If cancer is suspected, a doctor will extract a small tissue sample -- usually a tiny cylindrical "core sample" -- and examine it for cancer cells. No cancer cells found -- that's good news! But if cancer cells are present, then you have decisions to make about treatment.
A young woman named Richa Sehgal taught me that it's not so simple. There aren't just two types of cells (cancerous and non-cancerous). There are actually several types of cancer cells, and these do not all have the same importance when it comes to effective cancer treatment. I learned this from Richa during her presentation at Analytics Experience 2018 -- a remarkable talk for several reasons, not the least of which is this: Richa Sehgal is a high school student, just 17 years old. I'll have to check the record books, but this might make her the youngest-ever presenter at this premier analytics event.
Last year, Richa served as a student intern at the Canary Center at Stanford for Cancer Early Detection. That's where she learned about the biology of cancer. She was allowed (encouraged!) to attend all lab meetings – and the experience opened her eyes to the challenges of cancer detection.
The importance of cell types and how cancer works
Unlike many technical conference talks that I've attended, Richa did not dive directly into the math or the code that support the techniques she was presenting. Instead, Richa dedicated the first 25 minutes of her talk to teaching the audience how cancer works. And that primer was essential to help the (standing-room-only!) audience understand the relevance and value of her analytical solution.
What we call "cancer" is actually a collection of different types of cells. Richa focused on three types: cancer stem cells (CSCs), transient amplifying cells (TACs), and terminally differentiated cells (TDCs). CSCs are the rarest type within a tumor, making up just a few percent of the total mix of cells. But because of their self-renewing qualities and their ability to grow all other types of cancer cells, these are very important to treat. CSCs require targeted therapy -- that is, you can't use the same type of treatment for all cell types. TACs usually require their own treatment, depending on the stage of the disease and the ability of a patient to tolerate the therapy. The presence of TACs can activate CSCs to grow more cancer cells, so if you can't eradicate the CSCs (and that's difficult to manage with 100% certainty, as we'll see) then it is important to treat the TACs. TDCs represent cancer cells that are no longer capable of dividing, and so generally don't require treatment -- they will die off on their own.
(I know that my explanation here represents a simplistic view of cancer -- but it was enough of a framework to help me to understand the rest of Richa's talk.)
The inexact science of biopsies
Now that we understand that cancer is made up of a variety of cell types, it makes sense to hope that when we extract a biopsy, we get a sample that represents this cell type variability. Richa used an example of sampling a chocolate chip cookie. If you were to use a needle to extract a core sample from a chocolate chip cookie...but didn't manage to extract any portions of the (disappointingly rare) chocolate chips, you might conclude that the cookie was a simple sugar cookie. And as a result, you might treat that cookie differently. (If you encountered a raisin instead...well...that might require a different treatment altogether. Blech.)
But, as Richa told us, we don't yet know enough about the distribution and proximity of the different cell types for different types of cancers. This makes it difficult to design better biopsies. Richa is optimistic that it's just a matter of time -- medical science will crack this and we'll one day have good models of cancer makeup. And when that day comes, Richa has a statistical method to make biopsies better.
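To make the cookie analogy concrete, here is a minimal Python sketch of a core sample missing the rare "chips." It is not Richa's method. The function names, the unit-cube "tumor," the 3% CSC fraction, and especially the uniform random placement of cells are all assumptions made for illustration, since the real spatial distribution is exactly what we don't yet know.

```python
import random

def simulate_tumor(n_cells=2000, rare_fraction=0.03, seed=0):
    """Scatter cells uniformly in a unit cube (an assumption, not real biology).

    Each cell is (x, y, z, kind), where kind is "CSC" for the rare type
    and "other" for everything else.
    """
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        kind = "CSC" if rng.random() < rare_fraction else "other"
        cells.append((rng.random(), rng.random(), rng.random(), kind))
    return cells

def core_sample(cells, cx, cy, radius):
    """Return the cells inside a vertical cylinder centered at (cx, cy)."""
    return [c for c in cells
            if (c[0] - cx) ** 2 + (c[1] - cy) ** 2 <= radius ** 2]

def miss_rate(cells, radius=0.05, n_trials=500, seed=1):
    """Fraction of random core samples that contain no CSC at all."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_trials):
        # Keep the whole cylinder inside the cube.
        cx = rng.uniform(radius, 1 - radius)
        cy = rng.uniform(radius, 1 - radius)
        sample = core_sample(cells, cx, cy, radius)
        if not any(cell[3] == "CSC" for cell in sample):
            misses += 1
    return misses / n_trials

tumor = simulate_tumor()
print("thin needle miss rate:", miss_rate(tumor, radius=0.05))
print("wide needle miss rate:", miss_rate(tumor, radius=0.20))
```

In this toy setup a thin needle frequently comes back with no CSCs at all, while a much wider one rarely does. A real tumor is unlikely to be uniform, which is why a model of how the cell types actually cluster matters so much for biopsy design.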
Using SAS and Python to model cancer cell clusters
Most high school students wouldn't think to pick up SAS for use in their science fair projects, but Richa has an edge: her uncle works for SAS as a research statistician. However, you don't need an inside connection to get access to SAS for learning. In Richa's case, she used SAS University Edition hosted on AWS -- nothing to install, easy to access, and free to use for any learner.
Since she didn't have real data that represent the makeup of a tumor, Richa created simulations of the cancer cells, their different types, and their proximity to each other in a 3D model. With this data in hand, she could use cluster analysis (PROC CLUSTER with Ward's method and then PROC TREE) to analyze a distance matrix that she computed. The result shows how closely cancer cells of the same type are positioned to one another. With that information, it's possible to design a biopsy that captures a highly variable collection of cells.
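Richa did this step in SAS, and I don't have her code. As a rough Python analogue of the same workflow (simulate cells in 3D, compute a distance matrix, apply Ward's method, then cut the tree into clusters), here is a sketch using NumPy and SciPy. The cell counts, cluster centers, and spread are invented for illustration, and `simulate_cells` and `ward_clusters` are hypothetical helper names.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def simulate_cells(seed=0):
    """Simulate 3D positions for three cell types (made-up parameters).

    Each type is drawn from a Gaussian blob around its own center;
    CSCs are deliberately the rarest type.
    """
    rng = np.random.default_rng(seed)
    spec = {
        "CSC": ((0.2, 0.2, 0.2), 10),
        "TAC": ((0.7, 0.3, 0.6), 60),
        "TDC": ((0.4, 0.8, 0.5), 130),
    }
    coords, labels = [], []
    for name, (center, count) in spec.items():
        coords.append(rng.normal(loc=center, scale=0.05, size=(count, 3)))
        labels.extend([name] * count)
    return np.vstack(coords), np.array(labels)

def ward_clusters(coords, n_clusters):
    """Cluster points with Ward's method and cut the tree at n_clusters."""
    distances = pdist(coords, metric="euclidean")  # condensed distance matrix
    tree = linkage(distances, method="ward")       # roughly PROC CLUSTER's role
    # Cutting the tree plays roughly the role of PROC TREE here.
    return fcluster(tree, t=n_clusters, criterion="maxclust")

coords, labels = simulate_cells()
cluster_ids = ward_clusters(coords, n_clusters=3)

# Cross-tabulate true cell type against assigned cluster.
for name in ("CSC", "TAC", "TDC"):
    ids, counts = np.unique(cluster_ids[labels == name], return_counts=True)
    print(name, dict(zip(ids.tolist(), counts.tolist())))
```

Because the simulated blobs are well separated, the clusters line up cleanly with the cell types here. Real tumors will be messier, and the interesting question is how much the types intermingle.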
Richa then used the Python package plotly to visualize the 3D model of her cell map. (I didn't have the heart to tell her that she could accomplish this in SAS with PROC SGPLOT -- some things you just have to learn for yourself.)
A bright future -- for all of us
Clearly, Richa is an extremely accomplished young woman. When I asked about her college plans for next year, she told me that she has a long list of "stretch schools" that she's looking at. I'm having a difficult time understanding what constitutes a "stretch" for Richa -- I'm certain that any institution would love to have her.
Richa's accomplishments make me feel optimistic for her, but also for the rest of us. As a father of three daughters, I'm encouraged to see young women enter technical fields and be successful. SAS is among the elite technology companies that work to close the analytics skills gap by providing free software, education, and mentoring. Throughout the Analytics Experience 2018 conference, I've heard from many attendees who also saw Richa's talk -- they were similarly impressed and inspired. Presentations like Richa's deliver on the conference tagline: "Analytics redefines innovation. You redefine the future."