Computer Science Thesis Presentations 2014

Students will be presenting their research on Monday, May 19 at 9:00 am in TCL 202.

9:00 am
Daniel Seita – Building Sum-Product Networks
Probabilistic inference is the process of determining the probability of an event given some evidence.
While techniques based on probabilistic graphical models have been used in practical settings,
exact inference can still be intractable. Sum-Product Networks (SPNs) were recently invented to
address this deficiency. SPNs are trees formed from sum and product nodes, and they can model a
probability distribution while allowing inference to be done in time linear in the number of edges.
In this thesis, we give an introduction to SPNs and investigate different procedures to construct
them given observed data. Specifically, we modify a current implementation of SPN learning in two
ways. In the first, we incorporate the Independent Variable Grouping Analysis (IVGA) method
for grouping variables based on dependency relationships. In the second, we use Density-Based
Spatial Clustering of Applications with Noise (DBSCAN) to cluster similar instances. We analyze
the performance of the SPN learning algorithm with these modifications and compare it to the
original implementation that uses pairwise comparisons and Expectation-Maximization (EM) for
variable grouping and instance clustering, respectively. We find that for grouping variables, IVGA
appears to be worse than pairwise comparisons, but that for clustering instances, DBSCAN is a
viable alternative to EM, and may even result in SPNs that can better tolerate noise in data.
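To make the structure concrete: an SPN evaluates bottom-up, with leaves returning (marginal) probabilities of individual variables, product nodes multiplying their children, and sum nodes taking weighted mixtures. The sketch below is an illustrative toy, not the thesis's implementation; the class names and the two-component mixture over binary variables A and B are assumptions for the example. Evaluating the network visits each edge once, which is the linear-time inference property mentioned above.

```python
class Leaf:
    """Univariate leaf: a distribution over one discrete variable.
    If the variable is unset in the query, it is marginalized out
    (the leaf returns 1.0, since its probabilities sum to 1)."""
    def __init__(self, var, probs):
        self.var, self.probs = var, probs  # probs[value] = P(var = value)
    def value(self, assignment):
        v = assignment.get(self.var)
        return 1.0 if v is None else self.probs[v]

class Sum:
    """Weighted mixture of children; weights should sum to 1."""
    def __init__(self, children, weights):
        self.children, self.weights = children, weights
    def value(self, assignment):
        return sum(w * c.value(assignment)
                   for w, c in zip(self.weights, self.children))

class Product:
    """Product over children defined on disjoint sets of variables."""
    def __init__(self, children):
        self.children = children
    def value(self, assignment):
        out = 1.0
        for c in self.children:
            out *= c.value(assignment)
        return out

# A tiny SPN: a two-component mixture of product distributions.
spn = Sum(
    [Product([Leaf("A", [0.8, 0.2]), Leaf("B", [0.7, 0.3])]),
     Product([Leaf("A", [0.1, 0.9]), Leaf("B", [0.4, 0.6])])],
    [0.6, 0.4])

# Marginal query P(A=1): B is simply left unset.
p_a1 = spn.value({"A": 1})  # 0.6*0.2 + 0.4*0.9 = 0.48
```

A single bottom-up pass answers both joint and marginal queries, which is exactly what makes exact inference tractable in SPNs, in contrast to general graphical models.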

10:00 am
Parker Finch – Decoupling and Coalescing Race Checks
Much of our computing infrastructure utilizes multicore processors and multiprocessor hardware. Such systems can concurrently execute many software threads of control to improve responsiveness and performance, but the potential for unintentional interference between concurrent threads makes it difficult to ensure the reliability of multithreaded software. Automated tools to identify concurrency errors have the potential to make this task easier.

Perhaps the most fundamental concurrency error is a race condition. A race condition consists of two threads accessing (and at least one modifying) the same piece of data at the same time. While dynamic data race detection algorithms have improved in recent years, the overhead of race detection in array-intensive programs remains prohibitive. One promising insight is that arrays are often accessed via common patterns that enable compression of the information maintained by a dynamic race detector, as well as a reduction in the number of checks performed. However, a purely dynamic implementation of this compression technique failed to realize a corresponding decrease in run-time overhead due to the cost of inferring those access patterns at run time. We explore statically annotating programs with the array access patterns that appear at run time to eliminate the need to infer them. Our results show that this can effectively reduce the run-time overhead of race detection on programs with identifiable array access patterns.
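The notion of two unordered accesses can be illustrated with a toy happens-before detector. This is a hypothetical sketch, not the thesis's implementation: the `RaceDetector` class, its vector-clock bookkeeping, and the per-location metadata are all illustrative assumptions. Note that a detector like this performs one check and one metadata update per access, which is the per-element cost that the access-pattern compression described above aims to reduce.

```python
class RaceDetector:
    """Hypothetical happens-before sketch of dynamic race detection.

    Each thread carries a vector clock; two accesses to the same
    location race if neither is ordered before the other. For
    brevity this tracks only the last write per location, so it
    reports write-read and write-write races but not read-write.
    """
    def __init__(self, n_threads):
        self.n = n_threads
        # clock[t][u] = latest event of thread u that thread t has observed
        self.clock = [[0] * n_threads for _ in range(n_threads)]
        for t in range(n_threads):
            self.clock[t][t] = 1
        self.last_write = {}  # location -> (writer thread, clock snapshot)
        self.races = []       # (location, earlier thread, later thread)

    def _happens_before(self, snapshot, t):
        # The snapshot precedes thread t's current point iff t has
        # already observed at least that much of every thread's history.
        return all(snapshot[u] <= self.clock[t][u] for u in range(self.n))

    def access(self, t, loc, is_write):
        prev = self.last_write.get(loc)
        if prev is not None and prev[0] != t \
                and not self._happens_before(prev[1], t):
            self.races.append((loc, prev[0], t))
        if is_write:
            self.last_write[loc] = (t, list(self.clock[t]))
        self.clock[t][t] += 1

    def sync(self, src, dst):
        # Synchronization edge, e.g. src releases a lock that dst acquires.
        for u in range(self.n):
            self.clock[dst][u] = max(self.clock[dst][u], self.clock[src][u])
```

For example, a write by thread 0 followed by a read by thread 1 with no intervening `sync` is flagged as a race; inserting `sync(0, 1)` between the accesses orders them and silences the report.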

11:00 am
Joshua Geller – Multi-Class Feature Selection
Feature selection involves identifying an “optimal” set of features in order to make a classification task more efficient or effective. There are three major approaches to feature selection in the context of classifier learning. One of these – the wrapper method – is a general class of algorithms that can be applied to any underlying classifier, but the choice of the classifier learner impacts the features selected. An interesting issue arises when the goal is multi-class classification but the underlying classifier learner is fundamentally designed for two-class problems, as is the case for support vector machines (SVMs). These types of classifiers handle multi-class tasks with an ensemble of binary classifiers. Current multi-class feature selection approaches operate on the black box level, using one uniform feature set for all underlying binary classifiers. I propose a framework for multi-class feature selection wrapper methods that includes the black box approach, but also introduces two additional multi-class feature selection approaches: the composite individual and the composite summed. I investigate a number of properties of this three-fold framework as well as multi-class feature selection in general.
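The wrapper method can be sketched as a greedy forward-selection loop around any classifier learner. This is a minimal illustration, not the thesis's framework: the function names and the nearest-centroid scorer standing in for the underlying classifier are assumptions for the example.

```python
import statistics

def centroid_accuracy(X, y, feats):
    """Toy stand-in for the wrapped classifier learner: training
    accuracy of a nearest-centroid classifier restricted to feats."""
    classes = sorted(set(y))
    cent = {c: [statistics.mean(X[i][f] for i in range(len(X)) if y[i] == c)
                for f in feats]
            for c in classes}
    correct = 0
    for xi, yi in zip(X, y):
        pred = min(classes,
                   key=lambda c: sum((xi[f] - m) ** 2
                                     for f, m in zip(feats, cent[c])))
        correct += (pred == yi)
    return correct / len(X)

def wrapper_forward_select(X, y, train_eval, max_features):
    """Greedy wrapper: repeatedly add the feature that most improves
    the wrapped classifier's score; stop when nothing improves."""
    selected, best_score = [], float("-inf")
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        score, f = max((train_eval(X, y, selected + [f]), f)
                       for f in remaining)
        if score <= best_score:
            break
        selected.append(f)
        remaining.remove(f)
        best_score = score
    return selected

# Feature 0 separates the classes; feature 1 is uninformative noise.
X = [[0.0, 5.0], [0.1, 5.0], [1.0, 5.0], [0.9, 5.0]]
y = [0, 0, 1, 1]
chosen = wrapper_forward_select(X, y, centroid_accuracy, 2)  # picks [0]
```

In the spirit of the framework described above, running this wrapper once on the full multi-class task corresponds to the black-box approach, while running it separately on each one-vs-rest binary subproblem yields a per-classifier feature set akin to the composite approaches.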


Daniel Seita ’14
Parker Finch ’14
Joshua Geller ’14