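SAS Viya has been updated, and deep learning has finally arrived! Deep learning in SAS Viya covers everything from the orthodox Deep Neural Network (DNN), to the Convolutional Neural Network (CNN) used in image recognition, to the Recurrent Neural Network (RNN) used for sequential data and natural language processing.

The advantage of deep learning is that it can recognize images and text, which conventional machine learning and neural networks handle poorly, and classify or infer from them with high accuracy. "High accuracy" here means that, depending on the model, deep learning can classify images more accurately than the human eye. Take the Komondor, for example. This dog breed has a coat that looks like a mop, and people sometimes mistake it for one.

Is this a dog? Or a mop?

Even for images like this that people find hard to tell apart, deep learning can distinguish a dog from a mop more accurately than a human can. So, in this post, I would like to try image classification with deep learning in SAS Viya.

How deep learning works

Deep learning for image classification uses a CNN. A CNN consists of feature extraction layers, which find the features of an image, and decision layers, which classify the image from those features. The feature extraction layers are made up mainly of convolution layers and pooling layers. The convolution layers search the input image for pixel-level features (such as the presence of horizontal or diagonal lines), and the pooling layers keep the important pixels. The decision layers classify the type of image based on the features found by the feature extraction layers. In a dog-versus-cat classification, for example, the feature extraction layers find a long face and a large nose in the input image, and the image is classified as a dog. Or they find a roundish face and upright ears, and the image is classified as a cat.

Working with images in SAS Viya

I would like to try image classification with SAS Viya deep learning, using CIFAR-10 as the material. CIFAR-10 is a freely available image classification dataset consisting of 60,000 color images in 10 classes. Each image is 32×32 pixels in RGB color. The 10 classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck, with 6,000 images each. Of the 60,000 images, 50,000 are for training and 10,000 are for testing. The image data can be obtained from the following page.

https://www.cs.toronto.edu/~kriz/cifar.html

Now, let's classify images with CIFAR-10. The language is Python 3. To classify images in SAS Viya, you first need to upload the data you obtained to CAS. CAS stands for Cloud Analytic Services. It is an in-memory distributed analytics platform and the brain of SAS Viya. All analysis in SAS Viya, deep learning included, is processed in CAS. CAS can handle data of the Image type. As the name suggests, the Image type lets you handle an image as binary data in its original image format.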
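The division of labor between convolution and pooling described in the section on how deep learning works can be illustrated with a small, framework-free sketch. This is not SAS Viya code and does not use any CAS action. The function names `convolve2d` and `max_pool2d`, the toy 5×5 image, and the hand-picked kernel are all assumptions made for illustration. In a real CNN the kernel values are learned during training rather than chosen by hand.

```python
# Minimal sketch of one convolution step and one pooling step.
# Illustrative only: plain Python, no SAS Viya or CAS API involved.

def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return
    the map of weighted sums. As in most CNN libraries, this is strictly
    a cross-correlation: the kernel is not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size block."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy grayscale image: dark rows on top, bright rows below,
# so there is a horizontal edge between row 1 and row 2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]

# Hand-picked kernel (an assumption for this example): it responds
# where brightness increases from the upper row to the lower row.
kernel = [
    [-1, -1],
    [1, 1],
]

features = convolve2d(image, kernel)  # 4x4 map, non-zero only along the edge
pooled = max_pool2d(features)         # 2x2 map that keeps the strongest responses

print(features)
print(pooled)
```

The convolution output is non-zero only on the row where the horizontal edge sits, and pooling shrinks the map while keeping that strong response, which is the "find the feature, then keep the important pixels" split described above.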
There is no doubt that Blockchain technology is set to be one of the great revolutions in the world in the coming years. It has even been said that it will do for transactions what the internet has done for information and communications; it will definitively change the way
The SAS language is large. Even after 20+ years of using SAS, there are many features that I have never used. Recently it became necessary for me to learn about DICTIONARY tables in PROC SQL (and the associated SASHELP views) because I needed to programmatically obtain the text for the
Donald Trump is pretty spry and energetic for an old guy - he's currently 71 years old, and seems to have more energy than I do (and I'm about 20 years his junior). But is he the oldest US president? Well, as with many questions ... it depends. Follow along
Driving to work today I was thinking about the "funk" that has descended upon me over the last few weeks. Shorter days and less light? Dry skin and a stuffy nose? Old and cranky? Maybe all of those are true, but as I reflect, I realize it happens every year
The creation of a Vice Ministry of Digital Economy attached to the Ministry of Information and Communications Technologies places Colombia in a position of leadership in the region, which will lead the country to lay important foundations for the transformation of several industries and economic sectors. In addition, it will be possible to develop
Happy holidays to all my readers! My greeting-card to you is an image of a self-similar Christmas tree. The image (click to enlarge) was created in SAS by using two features that I blog about regularly: matrix computations and ODS statistical graphics. Self-similarity in Kronecker products I have previously shown
It's that time of year, once again, when I take a traditional Christmas song or carol and create a fun technology-related version of it to share with all of you. This is the fourth year and the seventh song, so I hope you enjoy your holiday song for 2017: AI
From national parks and healthcare to taxes and nutrition, federal civilian agencies feature an incredibly large and diverse set of missions. These agencies oversee almost every aspect of American life with an endless sea of projects, programs and general oversight. But, as Deloitte Consulting’s Mark Urbanczyk said during a recent
Joyce Norris-Montanari defines data-driven design and asks if it's more about technology, processes or mindset.
The regulatory landscape should occupy a special place on the agendas of Colombia's entire business sector in 2018. Organizations will have to attend not only to the regulations already in force, but also to the requirements of new legislation in the country. The growth and expansion of new business models
The primary obstacle to becoming a data-driven business is that data is not readily available, leaving valuable insights unused in data silos. To overcome this hurdle, today’s companies are creating a new role: Chief Data Officers (CDO). Responsible for unlocking insights hidden in data silos, the CDO is tasked with
Native American health continues to lag behind other populations in the US. In fact, the American Indians and Alaskan Natives (AI/AN) population lives, on average, 4.4 fewer years than the rest of the US population, and is experiencing significant disparities in a variety of health indicators. The numbers reveal this stark
Some people might think it's an urban legend that SAS gives its employees free M&M's. Well, I'm here to tell you it's true! Every Wednesday at the Cary headquarters, a bucket of M&M's shows up in each of the break rooms. I'm only half-kidding in my suspicion that this is
Are you struggling to kick start your organization’s analytics journey, especially when it comes to leveraging advanced analytics and machine learning techniques? If the answer is yes then you’re definitely not alone. Whilst most organisations today recognise the benefit of analytics and data science, many are still struggling to kick
In a previous article, I showed how to use SAS to perform mean imputation. However, there are three problems with using mean-imputed variables in statistical analyses: Mean imputation reduces the variance of the imputed variables. Mean imputation shrinks standard errors, which invalidates most hypothesis tests and the calculation of confidence
Finding a pattern like a phone number or national ID number embedded in text can be difficult and time consuming.
I recently read an interesting article about petroleum coke (petcoke). A lot of it is produced in the US, and lately a lot of it is consumed (burned) in India ... contributing to air pollution there. The article mentioned some numbers in the text, but the data was really begging to
A steady drumbeat of news coverage makes one thing clear: Opioid abuse is rising and has reached epidemic levels throughout our country. Overdoses from the diversion and abuse of prescription opioids are one cause of the surge in deaths. Overdoses from heroin and other illicit synthetic opioids (such as heroin,
The internet is rich with data, and much of that data seems to exist only on web pages, which -- for some crazy reason -- are designed for humans to read. When students/researchers want to apply data science techniques to collect and analyze that data, they often turn to
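Why data management matters: We now live in the era of the "data economy," in which data drives society and the economy. Market research firm IDC forecasts that the world's volume of data will grow about 30% each year, reaching 163 zettabytes (ZB) by 2025, ten times the current amount. Gartner has even described this explosively growing big data as "the oil of the 21st century." Now, however, big data, from the perspective of "process" and "infrastructure" rather than simply "content"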
Imputing missing data is the act of replacing missing data by nonmissing values. Mean imputation replaces missing data in a numerical variable by the mean value of the nonmissing values. This article shows how to perform mean imputation in SAS. It also presents three statistical drawbacks of mean imputation. How
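The worst hurricane in 100 years strikes Puerto Rico: On September 20, the extremely powerful Hurricane Maria made landfall in Puerto Rico, a US territory located between the North Atlantic and the Caribbean Sea. Maria was a Category 5 hurricane, the highest category, with winds of more than 185 mph (295 km/h), and it left the worst damage in 100 years. Moreover, with the disaster striking just two weeks after Irma, a Category 5 storm nicknamed a "monster hurricane," the 3.4 million residents, in tremendous shock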
Until recently state-of-the-art for trade area analytics still meant analyzing historical store sales by location, together with some Nielsen market data to select merchandise assortments and allocation. Contrast that with the upcoming holiday season where retailers know where and how demand is initiated, and use that new understanding to create
Since Trump became the US president, many people have noticed that he posts a lot of tweets. While some people choose to analyze and critique the content of those tweets, I was more curious about something a little less controversial - the timing and frequency. Follow along as I dig into
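Recently, a Korean medical team drew great attention by developing a new analytical indicator that can predict the likelihood that elderly people with normal cognitive function will develop Alzheimer's dementia. Featured in the world-renowned neuroscience journal "Scientific Reports," it is raising expectations that preventive measures can be taken against the onset of dementia. Many scientists, among the most feared diseases in today's super-aged society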
David Loshin explains what it means to be a data-driven business by describing three different models.
There are so many reasons why SAS programmers love SAS -- as a matter of fact, I wrote a blog on it back in 2012. I now realize that I could've written a whole series, not just a single post. And with the recent publishing of my first book, Big Data
In this education analytics series of blog posts, we have been on a journey to learn how education customers are turning their data into insights to be more data-informed and analytical organizations. In my first five posts in the education analytics blog series, we learned how education customers are using SAS,
How do you define artificial intelligence? Would you define it differently if it was your job to prevent fraud and financial crimes, where the risks are constantly shifting? In a recent meeting with banking executives responsible for fraud and financial crimes risk mitigation, Wayne Thompson, Manager of Data Science Technologies