Where can you compare your skills with those of other experts, particularly data scientists? Or program informally, just because it's fun? At hackathons. I talked to Fabian Buchert, himself a data scientist, about his experiences at the latest Data4Good Hackathon, held on 6 May on the SAS campus in Heidelberg.
How did you hear about this Data4Good Hackathon?
Via the DataViz Meet-up, because the hackathon is organised by the DataViz community around Frankfurt am Main. I find Meet-ups are an excellent way to combine personal and business interests, because they cover a wide range of subjects, from visualization to machine learning, and bring together a wide range of people. As a side benefit, you also find out about events such as hackathons.
What exactly was the topic of the hackathon in Heidelberg, and why did you participate?
The theme was “Open Data in Action”, and it was about access to and use of Open Data. The exciting thing about this mini-hackathon was that everyone could work with their preferred tool, whether Excel, Python, Microsoft Power BI or SAS Viya. That was good, because you could really feel the pulse of the scene and identify current themes and trends, but also work in a very direct and unconstrained way. It is not always easy to be part of the community as an employee of a “traditional” software vendor like SAS. But at this event, the other participants were very open and made it easy. That really inspired me. And there was a wide range of participants too, from students at TU Darmstadt at the beginning of their data science careers, to experienced users working for large companies in the region, as well as consultants, bloggers, and a few who just wanted to have fun.
What was the hackathon’s theme, and how did that fit with your approach?
The hackathon was part of the Data4Good initiative, which has projects all around the world. The hackathon organizers had managed to get the non-profit organization CorrelAid involved. CorrelAid is a network of data scientists doing pro-bono projects for other NGOs. The data for the hackathon came from openpetition.de, the German-language site for online petitions, where anyone can submit or sign petitions. We got a complete extract from the MySQL database and were able to do what we wanted with it. We weren't given a specific task, but everyone was encouraged to develop ideas for how the platform or its service could be improved. We only had three hours to do this, which is why it was called a mini-hackathon.
How did the other participants approach the data?
The beauty of a hackathon like this is that participants approach the data in very different ways, and with different tools. As groups formed, some exciting combinations emerged, which produced a varied set of results: from a concept sketched on the flipchart, through visualizations, to recommender systems and predictive models.
Can you give us more detail about what you personally did?
As I feel particularly at home analysing unstructured data, I looked at the free text of the petitions. Petitions can be rejected for various reasons, for example, because they are insulting or rude. The reasons for rejection were included in the data. I worked on the free text and on how to use text mining and machine learning to train predictive models that estimate the rejection probability from the text alone. This could help the site administrators to identify suspicious content very early and, if necessary, look at it more closely.
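To make the idea of predicting a rejection probability from text alone more concrete, here is a minimal sketch. The interview does not say which model or tool was actually used, so everything here is an assumption for illustration: the choice of a naive Bayes classifier, the class and function names, and the handful of example petition texts, which are invented and not taken from the openpetition.de data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (German umlauts included)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


class NaiveBayesRejectionModel:
    """Hypothetical multinomial naive Bayes model: label 1 = rejected, 0 = accepted."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum of log P(word | label), with Laplace smoothing
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        for token in tokens:
            if token in self.vocab:  # words never seen in training are ignored
                count = self.word_counts[label][token]
                score += math.log((count + self.alpha) / denom)
        return score

    def rejection_probability(self, text):
        tokens = tokenize(text)
        rejected = self._log_score(tokens, 1)
        accepted = self._log_score(tokens, 0)
        # Normalise the two log scores into a probability in a numerically safe way
        top = max(rejected, accepted)
        exp_rejected = math.exp(rejected - top)
        exp_accepted = math.exp(accepted - top)
        return exp_rejected / (exp_rejected + exp_accepted)


# Invented toy examples, only to show the mechanics
train_texts = [
    "You are all stupid idiots and liars",
    "These corrupt idiots should be ashamed",
    "Build a safe cycle path along the main road",
    "Keep the public library open on weekends",
    "Save the old trees in the city park",
]
train_labels = [1, 1, 0, 0, 0]

model = NaiveBayesRejectionModel().fit(train_texts, train_labels)

for petition in ["You are all corrupt idiots", "Keep the cycle path open"]:
    print(petition, "->", round(model.rejection_probability(petition), 3))
```

In a real setting, a model like this would be trained on many thousands of petitions and checked on held-out data before administrators relied on its scores. The probability would only flag petitions for a closer look, not reject them automatically.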
What was the feedback on your question and solution?
I think the idea of connecting free text and structured information aroused quite a lot of interest, despite the short duration of the mini-hackathon. Of course, the quality of the model left something to be desired!
Can you recommend this format?
Hackathons are an excellent way to allow creative minds from very different backgrounds to work together on big problems. In a very short time, you can get a wide range of solutions and proposals, which broadens the horizons of all participants and, of course, helps the organisation involved. Meet-ups are a great way to get involved in your community or even find a new one. This is good for both participants and companies, who have the opportunity to hear from the community, or maybe find new talent. I can wholeheartedly recommend getting involved.