ABSTRACT
Systems that learn to detect anomalous email behavior, such as that caused by worms and viruses, tend to build either
per-user models or a single global model. Global models leverage a larger training corpus but often model individual users
poorly. Per-user models capture fine-grained behaviors but can take a long time to accumulate sufficient training data.
Approaches that combine global and per-user information have the potential to address these limitations. We use the
Latent Dirichlet Allocation model to transition smoothly from the global prior to a particular user’s empirical model
as the amount of user data grows. Preliminary results demonstrate long-term accuracy comparable to per-user models,
while also showing near-ideal performance almost immediately on new users.
Keywords: vulnerability, Bayes classification, Latent Dirichlet Allocation, per-user mixture model, global mixture
model, SMTP engines