Can pattern recognition software tell us whether the bird we've seen is a Hermit Thrush or a Swainson's Thrush? A few of us have been debating an identification question at work because we agreed to help Fulbright Scholar and Duke University PhD student Natalia Ocampo-Peñuela with her research on bird collisions with windows. A sad little band of us at SAS spent three weeks this fall making daily perambulations around multiple buildings on the SAS campus, searching the perimeters for dead birds, casualties of run-ins with our shiny, pretty glass buildings. We recorded the species where possible (sometimes predators left us scanty evidence), hence the need for identification. I can tell you that a Hermit Thrush and a Swainson's Thrush look very similar.

As an avid birdwatcher myself, I have several field guide apps on my iPhone, and the debate got me wondering what algorithmic magic is behind the search tools most of these apps now have. You input features such as the state, month, size, and color, and the app returns a filtered list of the birds you are likely to see. But a new app, Merlin Bird Photo ID, developed in collaboration with the Cornell Lab of Ornithology and others, takes this flow a step further, using machine learning techniques from computer vision to help identify birds. You upload an image of the bird you've seen, and Merlin compares features in the photo with those of the birds expected to be seen on that day in your location, based on a data set supplied by birders who report their sightings to a site called eBird (it's decently large data: 9.5 million observations were reported in the month of May alone!).
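For the curious, the feature-based search in field guide apps (state, month, size, color) can be pictured as plain filtering over a species table. The sketch below is only my guess at the general idea, not how any particular app works: the `Species` record, the `search` function, and every value in `GUIDE` are made-up placeholders, not real range or plumage data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    name: str
    states: frozenset   # state codes where the bird is listed
    months: frozenset   # months (1-12) in which it is listed
    size: str
    colors: frozenset


# Placeholder records for illustration only; NOT real range or plumage data.
GUIDE = [
    Species("Hermit Thrush", frozenset({"NC", "VA"}),
            frozenset({1, 2, 3, 11, 12}), "small", frozenset({"brown", "white"})),
    Species("Swainson's Thrush", frozenset({"NC", "VA"}),
            frozenset({4, 5, 9, 10}), "small", frozenset({"brown", "white"})),
    Species("Northern Cardinal", frozenset({"NC", "VA"}),
            frozenset(range(1, 13)), "small", frozenset({"red"})),
    Species("Great Blue Heron", frozenset({"NC"}),
            frozenset(range(1, 13)), "large", frozenset({"gray", "blue"})),
]


def search(guide, state=None, month=None, size=None, color=None):
    """Return the names of species matching every criterion that was supplied."""
    matches = []
    for bird in guide:
        if state is not None and state not in bird.states:
            continue
        if month is not None and month not in bird.months:
            continue
        if size is not None and size != bird.size:
            continue
        if color is not None and color not in bird.colors:
            continue
        matches.append(bird.name)
    return matches


# Without a month, both thrushes survive the filter.
print(search(GUIDE, state="NC", size="small", color="brown"))
```

With these placeholder records, leaving out the month keeps both thrushes on the list, which is exactly the kind of tie a simple filter cannot break and a photo-based classifier is meant to.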
A quick search on pattern recognition software turned up many papers on machine learning for bird identification. "Improved Automatic Bird Identification through Decision Tree based Feature Selection and Bagging" uses audio recordings instead of images for identification. Two researchers at Queen Mary University of London argue that "Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning," and they are even raising money on Kickstarter to build Warblr, a birdsong recognition app. It will use machine learning to help you figure out which bird just serenaded you (or a prospective mate, really, but you can consider it a gift anyhow). In their larger study they trained and tested a random forest classifier, which is more than a little ironic given that many of the birdsongs were surely recorded in forests! Of course, birdsong is no help in our task of identifying a limp little bird, but many birds are more commonly heard than seen, so this approach offers great advantages.
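If bagging and random forests are new to you, here is a toy sketch of the core idea: train many small trees, each on a bootstrap resample of the data and a randomly chosen feature, then let them vote. This is not the method from either paper. It uses one-split trees ("stumps") rather than full decision trees, and the two numeric features and the labels `species_a` and `species_b` are invented stand-ins for whatever acoustic features a real system would extract.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """Fit a one-split tree on a single feature; returns (feature, threshold, left, right)."""
    values = sorted({row[feature] for row in rows})
    if len(values) < 2:
        # Nothing to split on: always predict the majority label.
        label = majority(labels)
        return feature, float("inf"), label, label
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap resample and one random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        feature = rng.randrange(n_features)          # random feature choice
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Each stump votes; the majority label wins."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


# Invented toy data: two made-up numeric features per recording.
ROWS = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0), (1.2, 1.1),
        (3.0, 3.1), (3.2, 2.9), (2.9, 3.0), (3.1, 3.2)]
LABELS = ["species_a"] * 4 + ["species_b"] * 4

forest = fit_forest(ROWS, LABELS)
print(predict(forest, (1.0, 1.0)), predict(forest, (3.0, 3.0)))
```

A real random forest grows deeper trees and samples a fresh subset of features at every split, but the resample-and-vote structure is the same.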
The technical challenges of birdsong recognition include noise (you can't exactly get birds into a sound studio) and scalability, given the computational intensity. But some kinds of pattern identification pose even greater challenges. What if you had just a footprint to go on? The bird survey at SAS grew out of a connection with the great folks at Wildtrack, who use JMP software from SAS to analyze data for their Footprint Identification Technique, a non-invasive method used to track elusive endangered animals. Wildtrack's Zoe Jewell and Sky Alibhai have partnered with researchers from NC State University to improve upon footprint identification, and their work includes the paper "A manifold learning approach to curve identification with applications to footprint segmentation." It's a tough nut, but they keep working to crack it.
My own colleagues in SAS Advanced Analytics R&D are doing interesting work on pattern recognition. Patrick Hall, Ilknur Kaynar Kabul, and Jorge Silva used PROC NEURAL in SAS Enterprise Miner to extract representative features from a training set for digit recognition, a classic challenge for pattern recognition software. They trained a stacked denoising autoencoder on the Modified National Institute of Standards and Technology (MNIST) digits data, which they describe in this paper on Machine Learning in SAS Enterprise Miner. The code is in Patrick's GitHub repo. Now if I can just get them interested in bird recognition, maybe we'll be able to settle the debate about Hermit vs. Swainson's Thrush...
Additional resources
- The papers mentioned above are named by title if you want to look them up and read them.
- And check out Brett Wujek's tips for Getting the Most from your Random Forest in SAS if you want to test drive it yourself.
Image credit: photo by Kelly Colgan Azar // attribution by Creative Commons