The previous blog reviewed some guidelines that laid the foundation for security through understanding your environment and planning how elements within that environment are configured, used, accessed, and tracked. Although implementing these recommended best practices won’t make you impervious to all attacks, the concept of building a baseline can afford basic protections. This in turn can help you detect and remediate in the event of a security incident.
But what about arming your network with the ability to analyze the data coming into it? Data is dynamic, and threats are dynamic and constantly evolving. Keeping up is a daily challenge, and it has pushed the application of analytical methods beyond the traditionally static protections of legacy firewall rules. Even intrusion prevention systems (IPS) with continuously updated signatures cannot detect everything. A signature is usually a deterministic solution: it identifies only the things we already know about. What is required? Newer methods based on well-established formulas and principles (think statistics, linear algebra, vectors, etc.), which, incidentally, are often used to help create those signatures in the first place.
Here is a synopsis of the various data analytics methods applicable to the threat landscape today. When looking for solutions for your own organization, understanding their capabilities will be useful.
Deterministic Rules-Based Analysis
- Uses known formulas and behaviors applied as rule-sets to evaluate inputs such as packet headers and data.
- There is no randomness introduced in the evaluation criteria. Analysis is reactive in nature, allowing for the detection of known attack patterns.
- The output is fully determined by the parameter values and the initial conditions.
- Protocols behave in an RFC-defined fashion; they use well-known ports and follow a known set of flows
- When establishing a TCP connection, look for the three-way handshake
- Stateful firewalling expects certain parameters: TCP flags and sequence numbers
- Behaviors that map to known signatures
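To make the deterministic idea concrete, here is a minimal Python sketch of a handshake rule. It assumes packets have already been parsed into a hypothetical `Segment` record (the name and fields are illustrative, not taken from any particular tool) and checks only control flags, direction, and sequence/acknowledgment numbers; a real stateful firewall tracks much more, such as window sizes, timeouts, and retransmissions. Given the same segments, the rule always returns the same verdict, which is exactly what makes it deterministic.

```python
from dataclasses import dataclass

# TCP control-flag bits as they appear in the TCP header
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10
CONTROL_MASK = FIN | SYN | RST | ACK  # ignore PSH/URG/ECN bits for this rule
SEQ_SPACE = 2 ** 32                  # sequence numbers wrap modulo 2^32


@dataclass
class Segment:
    """Hypothetical, already-parsed TCP segment."""
    src: str
    dst: str
    flags: int
    seq: int
    ack: int


def is_valid_handshake(segments):
    """Deterministic rule: the first three segments must be SYN, SYN-ACK, ACK,
    flow in alternating directions, and each acknowledge the other side's
    initial sequence number (ISN) + 1."""
    if len(segments) < 3:
        return False
    syn, syn_ack, final_ack = segments[:3]
    return (
        (syn.flags & CONTROL_MASK) == SYN
        and (syn_ack.flags & CONTROL_MASK) == (SYN | ACK)
        and (final_ack.flags & CONTROL_MASK) == ACK
        # Direction: client -> server, server -> client, client -> server
        and (syn_ack.src, syn_ack.dst) == (syn.dst, syn.src)
        and (final_ack.src, final_ack.dst) == (syn.src, syn.dst)
        # Sequence/acknowledgment numbers must line up
        and syn_ack.ack == (syn.seq + 1) % SEQ_SPACE
        and final_ack.seq == (syn.seq + 1) % SEQ_SPACE
        and final_ack.ack == (syn_ack.seq + 1) % SEQ_SPACE
    )


client, server = "10.0.0.5", "192.0.2.10"
ok = [
    Segment(client, server, SYN, seq=1000, ack=0),
    Segment(server, client, SYN | ACK, seq=5000, ack=1001),
    Segment(client, server, ACK, seq=1001, ack=5001),
]
print(is_valid_handshake(ok))  # True
```

Any segment that deviates from the expected flags or numbers (for example, a final ACK acknowledging the wrong sequence number) makes the rule return `False`; the rule cannot say anything about traffic that is well formed but malicious.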
- Heuristic analysis is often implemented as a rule-based engine that evaluates data in real time to detect events.
- A heuristic engine goes beyond looking for known patterns by sandboxing a file and executing embedded instructions. It can also examine constructs such as processes and structures in memory, metadata, and the payload of packets.
- The advantage of heuristic analysis is the ability to detect variants of existing threats as well as potential zero-day attacks.
- Heuristics are often combined with other techniques, such as signature detection and reputation analysis, to increase the fidelity of results.
- Heuristics can perform contextual grouping.
- Example: Grouping activities performed at certain times of the day or year, or detecting behavioral shifts that may indicate new viruses.
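A rule-based heuristic engine often combines several weak indicators into a single score. The sketch below is hypothetical: the three rules, their point values, and the threshold of 6 are illustrative assumptions rather than values from any product, and it does not attempt sandboxing or memory inspection. The off-hours rule shows the kind of contextual grouping by time described above.

```python
from datetime import datetime


# Hypothetical heuristic rules. Each returns True when its indicator is present.
def outside_business_hours(event):
    """Contextual rule: activity between 20:00 and 06:00, or on a weekend."""
    ts = event["time"]
    return ts.hour >= 20 or ts.hour < 6 or ts.weekday() >= 5


def writes_to_startup_folder(event):
    """Behavioral rule: a file dropped where it will run at every logon."""
    return "startup" in event.get("file_path", "").lower()


def spawns_shell(event):
    """Behavioral rule: the process launched a command interpreter."""
    shells = {"cmd.exe", "powershell.exe", "sh", "bash"}
    return event.get("child_process", "").lower() in shells


# (name, rule, points); the point values are illustrative assumptions
RULES = [
    ("off_hours", outside_business_hours, 2),
    ("persistence", writes_to_startup_folder, 4),
    ("shell_spawn", spawns_shell, 4),
]


def heuristic_score(event, threshold=6):
    """Add up the points of every rule that fires and flag the event when the
    total reaches the threshold. Returns (score, fired_rules, flagged)."""
    score, fired = 0, []
    for name, rule, points in RULES:
        if rule(event):
            score += points
            fired.append(name)
    return score, fired, score >= threshold


event = {
    "time": datetime(2024, 3, 9, 2, 30),  # Saturday, 02:30
    "file_path": "C:/Users/bob/AppData/Roaming/Microsoft/Windows/Start Menu/Programs/Startup/upd.exe",
    "child_process": "powershell.exe",
}
print(heuristic_score(event))  # (10, ['off_hours', 'persistence', 'shell_spawn'], True)
```

No single indicator is enough on its own: off-hours activity by itself scores 2 and is not flagged. It is the combination of context and behavior that pushes the score past the threshold, which mirrors how heuristics are combined with other techniques to raise the fidelity of results.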
- Statistical analysis is often used in anomaly detection.
- The goal is to identify traffic parameters that vary significantly from normal behavior, or the “baseline.”
- There are two main classes of statistical procedures for data analysis and anomaly detection:
- The first class is based on applying volumetrics to individual data points. Some level of variation between telemetry and the baseline is expected; any deviation beyond certain thresholds is defined as anomalous.
- Example: Volumetric outlier detection that matches known threat-vector behavior, using measures such as distinct count, maximum, total, and entropy.
- The second class measures the changes in distribution by windowing the data and counting the number of events or data points to determine anomalies.
- Example: Characterizing malware by a series of small-packet exchanges between a host and its command-and-control (CnC) server, seen “over time” and measured with skewness.
- Methods must be chosen to reduce false positives and to produce a confidence interval or a probability score (for example, a score that falls above or below 0.5).
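Both classes of statistical procedure can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: a z-score test with k = 3 stands in for "certain thresholds," Shannon entropy is one of the volumetric measures named above, and skewness is computed over consecutive, non-overlapping windows of packet sizes. The function names and sample numbers are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_volumetric_outlier(current, baseline, k=3.0):
    """Class 1: flag a single data point more than k standard deviations away
    from the baseline mean (a simple z-score threshold)."""
    mu, sigma = mean(baseline), pstdev(baseline)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > k


def skewness(values):
    """Population (Fisher-Pearson) skewness: > 0 means a long right tail."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return 0.0
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def windowed_skewness(values, window):
    """Class 2: split the series into consecutive, non-overlapping windows and
    measure how the shape of the distribution changes from one to the next."""
    return [skewness(values[i:i + window])
            for i in range(0, len(values) - window + 1, window)]


# Distinct destination ports contacted per minute by one host (illustrative)
baseline = [12, 15, 11, 14, 13, 12, 16, 14]
print(is_volumetric_outlier(480, baseline))  # True
# Entropy is low when traffic concentrates on a few destination ports
print(round(shannon_entropy([443] * 9 + [22]), 3))  # 0.469

# Packet sizes in bytes: ordinary traffic, then a run of small beacon-like packets
sizes = [1500, 1200, 400, 900, 1500, 700, 1100, 300] + [60] * 7 + [1500]
print([round(s, 2) for s in windowed_skewness(sizes, 8)])  # [-0.16, 2.27]
```

The jump in skewness between the two windows is the kind of distributional change the second class looks for; in practice the window size and thresholds would need tuning against real baselines to keep false positives down.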
Data Science Analysis – Machine Learning
- The process of collecting, correlating, organizing, and analyzing large sets of data to discover patterns and other useful information like relationships between variables.
- The sheer volume of data and the different formats of that data (structured and unstructured) collected across multiple telemetry sources are what characterize “Big Data.”
- Structured Data - resides in a fixed field within a record, transaction, or file, and lends itself to classification.
- Unstructured Data - webpages, PDF files, presentations, and email attachments – data that is not easily classified.
- The large volume of data is analyzed using specialized systems, algorithms, and methods for predictive analytics, data mining, forecasting, and optimization:
- Supervised Machine Learning - analysis takes place on a labeled set of data inputs and tries to infer a function (by way of a learner) that maps those inputs to predefined outputs. One common technique is walking a decision tree to classify data.
- Example: Spam filtering that examines header information, misspellings, and keywords.
- Unsupervised Machine Learning - looks for patterns and groupings in unlabeled input data and attempts to infer relationships between objects. It lends itself to predictive capabilities and includes methods such as clustering.
- Example: Discovery of a new variation of a family of malware.
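Both learning styles can be illustrated with a few lines of plain Python. This is a hedged, minimal sketch rather than a production classifier: the supervised part learns a one-level decision tree (a "decision stump") from labeled emails described by two hypothetical numeric features (misspelling count and spam-keyword count), and the unsupervised part uses k-means, one common clustering method, to group unlabeled samples. All feature names and data are illustrative assumptions.

```python
import math


# --- Supervised learning: a one-level decision tree ("decision stump") ---
def train_stump(samples, labels):
    """samples: feature tuples; labels: 1 = spam, 0 = legitimate.
    Try every (feature, threshold) split and keep the one that misclassifies
    the fewest training examples when "value > threshold" means spam."""
    best = None
    for f in range(len(samples[0])):
        for threshold in sorted({s[f] for s in samples}):
            errors = sum((s[f] > threshold) != bool(y) for s, y in zip(samples, labels))
            if best is None or errors < best[2]:
                best = (f, threshold, errors)
    feature, threshold, _ = best
    return lambda x: int(x[feature] > threshold)


# --- Unsupervised learning: k-means clustering ---
def kmeans(points, k, iterations=20):
    """Group unlabeled points into k clusters. Initial centroids are chosen by
    deterministic farthest-point selection (a simplification of k-means++)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Labeled emails as (misspellings, spam keywords); data is illustrative
emails = [(0, 1), (1, 0), (0, 0), (6, 4), (8, 7), (5, 6)]
labels = [0, 0, 0, 1, 1, 1]
is_spam = train_stump(emails, labels)
print(is_spam((7, 5)), is_spam((1, 1)))  # 1 0

# Unlabeled samples described by two hypothetical numeric features
samples = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (8.0, 8.2), (7.9, 8.1), (8.3, 7.8)]
centroids, clusters = kmeans(samples, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

The stump only works because labels were supplied up front; k-means receives no labels and still separates the two groups, which is why clustering can surface a sample that resembles a known family without matching any existing signature.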
We’ve covered a lot of concepts in this series. In the final blog, we will look at security monitoring best practices and see how our knowledge of the theoretical can help us be practical in the real world.