AutoClassify

Overview

A fundamental goal of SparkLogs is to enable you to analyze and understand the behavior of your apps through their log data, without requiring upfront or ongoing configuration work.

AutoExtract makes it easy to produce structured log data from unstructured text.

AutoClassify complements this behavior by automatically classifying your log events into "patterns", so that log messages that have the same "shape" will have the same pattern. The pattern of a log event will exclude any information that is likely to be dynamic, so that the pattern represents the "core meaning" of the log event.

AutoClassify is more effective when your log messages follow the AutoExtract conventions, so that all dynamic data is properly captured into custom fields.

Additionally, AutoClassify detects other dynamic data that is not part of custom fields, such as numbers, timestamps, variables, paths, and IP addresses, and replaces each value with a placeholder (such as <num>, <timestamp>, <var>, <path>, <ipaddr>). The dynamic data is also captured into the custom array fields x.num, x.var (which includes paths), and x.ips, making it easy to filter on these dynamic values or to analyze them for trends.
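To make the placeholder idea concrete, here is a minimal sketch in Python. It is not the SparkLogs implementation: the regular expressions, the sketch_classify function name, and the choice to store captured values as strings are all assumptions made for illustration. It also skips <var> detection, since the rules for what counts as a variable are not described here.

```python
import re

# Illustrative rules only (assumptions, not the real AutoClassify rules).
# Order matters: more specific shapes are listed before generic numbers.
RULES = [
    ("timestamp", r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?"),
    ("ipaddr", r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    ("path", r"(?<!\S)/[\w.\-/]+"),
    ("num", r"\b\d+(?:\.\d+)?\b"),
]

# One combined pattern; each alternative is a named group so the
# replacement callback can tell which rule matched.
COMBINED = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in RULES))

# Which array field receives the value for each placeholder kind.
# Paths go into x.var, as described above; timestamps are not captured here.
FIELD_FOR = {"num": "x.num", "path": "x.var", "ipaddr": "x.ips"}


def sketch_classify(message):
    """Return (pattern, extracted) for a single log message."""
    extracted = {"x.num": [], "x.var": [], "x.ips": []}

    def replace(match):
        kind = match.lastgroup
        field = FIELD_FOR.get(kind)
        if field is not None:
            extracted[field].append(match.group(0))
        return f"<{kind}>"

    pattern = COMBINED.sub(replace, message)
    return pattern, extracted


pattern, extracted = sketch_classify(
    "2024-05-01T12:00:00Z Connection from 10.0.0.5 failed after 3 retries "
    "reading /var/log/app.log"
)
print(pattern)
print(extracted)
```

Two messages that differ only in their timestamp, IP address, retry count, or file path would produce the same pattern under this sketch, which is the property the text above describes.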

A pattern_hash is also computed from the pattern and serves as a short, (likely) unique ID for that pattern value. It consists of the first letter of each of the first six words in the pattern, followed by a 5-character alphanumeric hash of the full pattern. The pattern_hash is a convenient way to filter for messages that match one or more known patterns without having to match on the full, lengthy pattern value.
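The general shape of such an ID can be sketched as follows. This is an illustration only: the actual hash function, character set, letter case, word-splitting rules, and whether placeholders count as words are not specified here, so the sketch assumes whitespace-separated tokens, lowercase initials, and a SHA-256 digest reduced to five base-36 characters. The sketch_pattern_hash name is hypothetical, and its output will not match real pattern_hash values.

```python
import hashlib

# Assumed alphabet for the 5-character suffix (base 36).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def sketch_pattern_hash(pattern):
    """Build a short ID: up to six word initials plus a 5-character hash."""
    initials = []
    for token in pattern.split():
        # Assumption: a word's "first letter" is its first alphanumeric
        # character, so "<num>" contributes "n".
        first = next((ch for ch in token if ch.isalnum()), None)
        if first is not None:
            initials.append(first.lower())
        if len(initials) == 6:
            break

    # Stand-in hash: SHA-256 of the full pattern, reduced to 5 base-36 chars.
    digest = int.from_bytes(hashlib.sha256(pattern.encode("utf-8")).digest(), "big")
    value = digest % (36 ** 5)
    suffix = []
    for _ in range(5):
        value, remainder = divmod(value, 36)
        suffix.append(ALPHABET[remainder])

    return "".join(initials) + "".join(reversed(suffix))


print(sketch_pattern_hash("Connection from <ipaddr> failed after <num> retries"))
```

The initials keep the ID recognizable to a human reader, while the hash suffix distinguishes patterns that happen to share the same first six initials.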

Manual override

If you wish to manually override the pattern that AutoClassify produces, set the special pattern_override field on the log event. The override value is then used instead of the value AutoClassify would have computed. Note that if you attempt to set the pattern field directly, that field will be renamed to something else (e.g., pattern2). This ensures that the AutoClassify value is not overridden by accident.
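As a rough illustration, an event carrying an override might be assembled like this before it is shipped. Only the pattern_override field name comes from the description above; the build_event helper, the message field name, and the use of JSON as the wire format are assumptions for the sake of the example.

```python
import json


def build_event(message, pattern_override=None):
    """Assemble a log event dict, optionally carrying a manual pattern."""
    event = {"message": message}  # "message" is an assumed field name
    if pattern_override is not None:
        # Set pattern_override rather than pattern: a directly supplied
        # pattern field would be renamed (e.g., to pattern2).
        event["pattern_override"] = pattern_override
    return event


event = build_event(
    "payment 8841 declined for user 17",
    pattern_override="payment declined",
)
print(json.dumps(event))
```

Events built without an override simply omit the field, leaving classification to AutoClassify.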