Automatic author profiling of online chat logs

Bibliographic Details
Main Author: Lin, Jane.
Other Authors: Martell, Craig H.
Published: Monterey, California. Naval Postgraduate School 2012
Online Access: http://hdl.handle.net/10945/3559
Description
Summary: Now that the Internet has become easily accessible and more affordable, a larger number of people spend more time in front of a computer. Some spend so much time on the Internet that they develop friendships and relationships - people with whom they have regular contact via a computer screen and the Internet. While most of the dialogue exchanged online is not harmful or illegal, there are those with dishonest intentions lurking online. These people may be breaking the law by seducing a minor virtually or even going as far as meeting a minor in person. Terrorists can also use the Internet to facilitate communication and plan attacks. Since e-mail is one of the original means of communication on the Internet, methods for determining the author of an e-mail have already been studied. So far, however, no significant experimentation with online chat logs exists. The first part of this study consists of generating an unbiased, random, and broad corpus of online chat logs. Having a general corpus with a wide range of topics allows the results of this research to be applied in the most general case. Because developing a complete solution to the authorship attribution problem for chat logs is difficult, we limit our scope to predicting gender and age. The ultimate goal of the work, then, is to facilitate the jobs of law enforcers in tracking down criminals who attempt to use the Internet as a hiding place.