
Automatic Classification of System Logs

International Supercomputing Conference, Frankfurt, Germany (June 2018)

Abstract

System logs are a valuable source of information for analyzing the behavior of computing systems. The message part of each system log entry contains detailed information about the corresponding event. RFC 5424 provides general guidelines for generating system logs. However, the message part of a system log entry is unstructured, and every piece of software generates its log messages independently. Automatic analysis methods are required to handle the large number of system logs generated on modern computing systems. The unstructured nature of system log messages is a main challenge for automatic analysis. Automatic text classification is a well-known approach to address this challenge. However, to use this approach, the target classes must be predefined. The common method is to apply machine learning to pre-classified text samples to generate a classifier specific to the respective text format. Our study indicates that when system logs have a high repetition frequency, automatic classification of system log messages without pre-classified samples is possible. Preliminary results from analyzing one month of system logs on a production high-performance cluster indicate very high accuracy. The classification accuracy is directly related to the amount of system logs available. The proposed classification method is automatic, unsupervised, and general.
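
The abstract does not describe the classification procedure itself, so the following is only an illustrative sketch of how repetition frequency might be exploited without pre-classified samples. It assumes that messages from the same event differ only in variable tokens such as numbers and addresses. The masking rules, the `<*>` placeholder, the `min_count` threshold, and the function names `template_of` and `classify` are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter, defaultdict

# Hypothetical masking rules: tokens that typically vary between
# occurrences of the same event are replaced by a placeholder.
_VARIABLE = re.compile(
    r"\b(?:\d{1,3}\.){3}\d{1,3}\b"  # IPv4 addresses
    r"|\b0x[0-9a-fA-F]+\b"          # hexadecimal values
    r"|\b\d+\b"                     # plain integers
)


def template_of(message: str) -> str:
    """Reduce a log message to a template by masking variable tokens."""
    return _VARIABLE.sub("<*>", message)


def classify(messages, min_count=2):
    """Group messages by template; no labeled samples are used.

    Templates seen fewer than `min_count` times are placed in an
    "<unclassified>" bucket, reflecting the assumption that a class
    only becomes reliable once its messages repeat often enough.
    """
    messages = list(messages)
    counts = Counter(template_of(m) for m in messages)
    classes = defaultdict(list)
    for m in messages:
        t = template_of(m)
        label = t if counts[t] >= min_count else "<unclassified>"
        classes[label].append(m)
    return dict(classes)


if __name__ == "__main__":
    sample = [
        "Accepted password for user from 10.0.0.1 port 22",
        "Accepted password for user from 10.0.0.7 port 51234",
        "CPU 3 temperature above threshold",
    ]
    for label, members in classify(sample).items():
        print(label, len(members))
```

In this sketch, a larger log volume lets more templates cross the `min_count` threshold, which is one plausible reading of why accuracy would grow with the amount of available logs.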
