Elon Computing Sciences

Collecting & Analyzing Data about Free, Libre, and Open Source Software

Presentation at Elon Student Undergraduate Research Forum, Spring 2010

Jamie Marie Schatz (Dr. Megan Squire), Department of Computing Sciences

FLOSSmole is a project to collect data about free, libre, and open source software. It was developed in 2004 by Elon professor Megan Squire and colleagues from Syracuse University. The project "freely provides raw data about open source projects in multiple formats for anyone to download." A main goal of this project is to find the most effective methods of collecting and publicizing FLOSSmole data. Once the data is collected, the best way to publicize it is to develop and test different information architectures. Information architecture is the science of organizing and labeling web sites, intranets, online communities, and software to support usability. Specific tasks I have completed include creating wikis and blogs, processing canned reports and queries, and building web-based tools to increase researchers' awareness of the data.

Every decision about FLOSSmole's design can have a significant impact on usability. For instance, once data has been collected, the way it is displayed on the page can drastically change how a user perceives it. Good design engages new users, while bad design may drive them away. One reason open source data is so disorganized and hard to manage is that no standard has been set for describing a project.

This raises two questions: How is FLOSSmole data collected, and what is the most effective information architecture for presenting that data? To answer them, I did two things. First, I wrote a collector using spidering techniques and the PHP scripting language. Second, I constructed data models on the web using different types of information architectures. By the end of the project, I had participated in every stage of distributing FLOSSmole data, from data collection to the release of project statistics on the web.
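The PHP collector itself is not reproduced here. As a rough illustration of the spidering technique, the following Python sketch starts at an index page, follows links breadth-first within one site, and extracts a project name and description from each project page. The page layout, the `project-name` and `project-description` class names, the `forge.example` URLs, and the `crawl`/`fetch_http` helpers are all hypothetical assumptions; a real forge would need its own parsing rules. Pages are supplied through a `fetch` function so the example can run offline against an in-memory dictionary.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.request


class ProjectPageParser(HTMLParser):
    """Collects links plus text from elements marked with hypothetical class names."""

    # Hypothetical markup: a real forge uses its own class names or structure.
    FIELDS = {"project-name": "name", "project-description": "description"}

    def __init__(self):
        super().__init__()
        self.links = []
        self.record = {}
        self._current = None  # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        field = self.FIELDS.get(attrs.get("class", ""))
        if field:
            self._current = field

    def handle_endtag(self, tag):
        self._current = None  # stop capturing at the first closing tag

    def handle_data(self, data):
        if self._current:
            self.record[self._current] = self.record.get(self._current, "") + data.strip()


def fetch_http(url):
    """Fetch a live page (not used in the offline demo below)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first spider: visit pages under start_url and collect project records."""
    seen = {start_url}
    queue = deque([start_url])
    records = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited += 1
        if html is None:
            continue
        parser = ProjectPageParser()
        parser.feed(html)
        if "name" in parser.record:
            records.append({"url": url, **parser.record})
        for href in parser.links:
            link = urljoin(url, href)
            # Stay on the same site and never queue a page twice.
            if link.startswith(start_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return records


# Hypothetical in-memory "forge" so the sketch runs without network access.
PAGES = {
    "http://forge.example/projects/": (
        '<a href="alpha">Alpha</a> <a href="beta">Beta</a> '
        '<a href="http://other.example/">elsewhere</a>'
    ),
    "http://forge.example/projects/alpha": (
        '<h1 class="project-name">Alpha</h1>'
        '<p class="project-description">A parser</p>'
    ),
    "http://forge.example/projects/beta": (
        '<h1 class="project-name">Beta</h1>'
        '<p class="project-description">A compiler</p><a href="alpha">Alpha</a>'
    ),
}

if __name__ == "__main__":
    for record in crawl("http://forge.example/projects/", PAGES.get):
        print(record)
```

Visiting pages breadth-first with a `seen` set keeps the spider from revisiting pages, and restricting links to the starting prefix keeps it on one forge. A production collector would also need to respect each site's crawling rules and pause between requests.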