Designing and Building an IRC Bot That Summarizes Software Developer Conversations

Rebecca C. Gazda (Professor Megan Squire), Department of Computing Sciences

Because of the overwhelming quantity of text data available today, it is increasingly important to
develop automated methods for summarizing or recapping this information. In this specific
project, what we want to summarize are the topics that software developers are discussing in
their conversations with each other. Modern software development often happens electronically,
in chat/meeting rooms held on a part of the Internet called IRC (Internet Relay Chat). IRC is a
synchronous conversation medium, meaning that the chat happens in real time. If a part of the
chat is missed, it is necessary to get caught up quickly on what happened during the absence.
The purpose of this project is to design and build an IRC chat bot (a bot that responds to users in
chat rooms) that can generate summaries of developer conversations on request. Our approach
involves using a program to automatically collect all the developer conversations for a given
time frame and writing a chat bot that listens for a request from the user(s) in the chat room
and then returns an accurate summary of the conversations based on that request.
The challenge here lies not in collecting the conversation data or writing the bot, but in accurately
summarizing the conversations. Chatting on IRC is much more casual than the proper English
used in official forms of writing, making it harder to use simple off-the-shelf summarization
algorithms that are based on word frequency detection. To overcome this hurdle, we are
conducting research on alternative summarization algorithms, like the ones used for news articles
or blogs. With this information, we can either change the techniques used so that they will work
with the more casual chat data, or we can change the format of the chat data as we collect it so it
can be used with current summarization techniques. With an accurate summarization bot, we can
more easily get a recap of what generally happened in certain chat rooms over the course of the
past hour or day. Ultimately, our goal is to generalize the bot to collect a wide variety of types
of conversations on a number of electronic communication platforms.
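
To make the word-frequency baseline mentioned above concrete, here is a minimal sketch of an extractive summarizer over chat messages. It is an illustration only, not the project's actual implementation: the function names, the tiny stopword list, and the (nick, text) message format are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "is", "it", "to", "of", "and",
             "in", "for", "on", "that", "this", "i", "we", "you"}


def tokenize(text):
    """Lowercase the text and return its non-stopword tokens."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOPWORDS]


def summarize(messages, max_lines=2):
    """Return the highest-scoring messages, kept in chronological order.

    messages: list of (nick, text) tuples (assumed log format).
    A message's score is the average corpus frequency of its tokens.
    """
    freq = Counter(w for _, text in messages for w in tokenize(text))

    def score(text):
        words = tokenize(text)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank message indices by score, keep the top few, restore original order.
    ranked = sorted(range(len(messages)),
                    key=lambda i: score(messages[i][1]), reverse=True)
    keep = sorted(ranked[:max_lines])
    return [messages[i] for i in keep]
```

A scorer like this treats every token as equally meaningful, so slang, abbreviations, and one-word replies in casual chat can distort the ranking, which is the difficulty with off-the-shelf frequency methods that the abstract describes.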