2015 SURF – Rebecca C. Gazda
Designing and Building an IRC Bot That Summarizes Software Developer Conversations
Rebecca C. Gazda (Professor Megan Squire), Department of Computing Sciences
Because of the overwhelming quantity of text data available today, it is increasingly important to develop automated methods for summarizing this information. In this project, we want to summarize the topics that software developers discuss in their conversations with each other. Modern software development often happens electronically, in chat rooms hosted on IRC (Internet Relay Chat). IRC is a synchronous conversation medium, meaning that the chat happens in real time. A participant who misses part of the chat needs to catch up quickly on what happened during the absence. The purpose of this project is to design and build an IRC chat bot (a program that responds to users in chat rooms) that can generate summaries of developer conversations on request. Our approach has two parts: a program that automatically collects all the developer conversations for a given time frame, and a chat bot that listens for a request from users in the chat room and then returns an accurate summary of the conversations based on that request. The challenge lies not in collecting the conversation data or writing the bot, but in accurately summarizing the conversations. Chat on IRC is much more casual than the formal English used in official writing, which makes it harder to apply simple off-the-shelf summarization algorithms based on word frequency. To overcome this hurdle, we are researching alternative summarization algorithms, such as those used for news articles or blogs. With this information, we can either adapt the techniques so that they work with the more casual chat data, or change the format of the chat data as we collect it so that it can be used with current summarization techniques.
With an accurate summarization bot, we can more easily get a recap of what happened in a given chat room over the past hour or day. Ultimately, our goal is to generalize the bot to collect a wide variety of conversation types across a number of electronic communication platforms.
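For readers unfamiliar with the word-frequency summarization mentioned in the abstract, the following is a minimal sketch of that baseline idea in Python. It is not the project's implementation: the function names, the tiny stopword list, and the scoring rule (average word frequency per message) are all assumptions chosen for illustration. It also shows why casual chat is hard for this approach, since short messages such as "lol" carry no topical words to score.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the project).
STOPWORDS = {
    "the", "a", "an", "is", "it", "to", "and", "of", "in", "on", "for",
    "we", "i", "you", "that", "this", "so", "be", "are", "was",
}


def tokenize(text):
    """Lowercase a message and return its non-stopword word tokens."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def summarize(messages, n=2):
    """Return the n highest-scoring messages, kept in their original order.

    A message's score is the average corpus frequency of its words, so
    messages that reuse the conversation's most common terms rank highest.
    """
    freq = Counter(w for m in messages for w in tokenize(m))

    def score(message):
        words = tokenize(message)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(messages)),
                    key=lambda i: score(messages[i]), reverse=True)
    return [messages[i] for i in sorted(ranked[:n])]


if __name__ == "__main__":
    # Hypothetical chat lines, invented for this example.
    chat = [
        "lol",
        "the parser crashes on empty input",
        "parser fix for empty input is in review",
        "brb lunch",
    ]
    for line in summarize(chat, n=2):
        print(line)
```

On the hypothetical chat lines above, the two messages about the parser are selected because they share the most frequent terms. A bot built along the lines described in the abstract would call a function like `summarize` on the messages it collected for the requested time frame.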