
Normalization

Normalization is a critical stage between ingestion from sources and forwarding to targets. It coalesces log data from diverse sources into consistent formats, enabling unified handling across different logging systems.

Log Formats

The processor supports several widely used log formats:

Generic

| Format | Notation | Key Identifier | Layout Characteristics | Example Fields |
| --- | --- | --- | --- | --- |
| Elastic Common Schema (ECS) | Dot notation with lowercase | @timestamp | Hierarchical structure | source.ip, network.direction |
| Splunk Common Information Model (CIM) | Underscore with lowercase | _time | Flat structure | src_ip, network_direction |
| Advanced Security Information Model (ASIM) | PascalCase | TimeGenerated | Explicit names | SourceIp, NetworkDirection |
| Google SecOps Unified Data Model (UDM) | Nested structure | metadata.event_timestamp | Entity-based hierarchy | principal.ip, target.ip |
| Open Cybersecurity Schema Framework (OCSF) | Nested structure | time | Class-based hierarchy | src_endpoint.ip, dst_endpoint.ip |

Security-specific

| Format | Description | Key Identifier | Example Fields |
| --- | --- | --- | --- |
| Common Event Format (CEF) | ArcSight's standard format | rt (receiptTime) | networkUser, sourceAddress |
| Log Event Extended Format (LEEF) | IBM QRadar's format | devTime | networkUser, srcAddr |
| Common Security Log (CSL) | Microsoft Sentinel's format | TimeGenerated | NetworkUser, SourceAddress |

Format Detection

Source formats can be detected automatically from characteristic fields, for example:

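| Context | Field | Format |
| --- | --- | --- |
| Timestamp | @timestamp | ECS |
| Timestamp | _time | CIM |
| Timestamp | TimeGenerated | ASIM/CSL |
| Timestamp | metadata.event_timestamp | UDM |
| Timestamp | time | OCSF |
| Security | rt | CEF |
| Security | devTime | LEEF |
| CSL detection | TimeGenerated + LogSeverity | CSL |
| CSL detection | TimeGenerated only | ASIM |
| UDM detection | metadata.event_type | UDM |
| OCSF detection | class_uid | OCSF |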
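The detection rules in the table above could be sketched roughly as follows. This is an illustrative sketch, not the processor's actual implementation: the function names `has_field` and `detect_format` are hypothetical, and the order in which the checks run is an assumption, since the table does not state a precedence.

```python
from typing import Any, Optional


def has_field(record: dict[str, Any], path: str) -> bool:
    """Return True if `path` exists as a flat key or as a dotted nested path."""
    if path in record:
        return True
    node: Any = record
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True


def detect_format(record: dict[str, Any]) -> Optional[str]:
    """Guess the source format from characteristic fields.

    The check order is an assumption made for this sketch.
    """
    # Schema-specific markers are checked before generic timestamp fields.
    if has_field(record, "class_uid"):
        return "ocsf"
    if has_field(record, "metadata.event_type") or has_field(
        record, "metadata.event_timestamp"
    ):
        return "udm"
    if has_field(record, "TimeGenerated"):
        # TimeGenerated with LogSeverity suggests CSL; alone it suggests ASIM.
        return "csl" if has_field(record, "LogSeverity") else "asim"
    if has_field(record, "@timestamp"):
        return "ecs"
    if has_field(record, "_time"):
        return "cim"
    if has_field(record, "rt"):
        return "cef"
    if has_field(record, "devTime"):
        return "leef"
    if has_field(record, "time"):
        return "ocsf"
    return None  # No characteristic field was found.
```

Checking the most specific markers first keeps a generic field such as `time` from shadowing a more distinctive one such as `class_uid`.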

Conversion

Casing and Delimiters

Each format follows specific naming conventions:

| Format | Example Fields |
| --- | --- |
| ECS | source.ip, event.severity |
| CIM | src_ip, event_severity |
| ASIM | SourceIp, EventSeverity |
| CEF | sourceAddress, eventSeverity |
| LEEF | srcAddr, evtSev |
| CSL | SourceIP, EventSeverity |
| UDM | principal.ip, security_result.severity |
| OCSF | src_endpoint.ip, severity_id |
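As a rough illustration of the casing and delimiter side of conversion, the sketch below splits a field name into words and re-joins them in another convention. The helper names are hypothetical and this is not the processor's own code. It handles only mechanical renaming such as event.severity to EventSeverity; renames such as source.ip to src_ip also need a field mapping.

```python
import re


def split_words(name: str) -> list[str]:
    """Split a field name into lowercase words on dots, underscores, and case changes."""
    words: list[str] = []
    for chunk in re.split(r"[._]", name):
        # Break camelCase and PascalCase chunks at each lower-to-upper boundary.
        words.extend(re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", chunk).split())
    return [word.lower() for word in words]


def to_dot(name: str) -> str:
    """Dot notation with lowercase, as in the ECS examples."""
    return ".".join(split_words(name))


def to_underscore(name: str) -> str:
    """Underscore notation with lowercase, as in the CIM examples."""
    return "_".join(split_words(name))


def to_pascal(name: str) -> str:
    """PascalCase, as in the ASIM examples."""
    return "".join(word.capitalize() for word in split_words(name))


def to_camel(name: str) -> str:
    """camelCase, as in the CEF examples."""
    words = split_words(name)
    if not words:
        return ""
    return words[0] + "".join(word.capitalize() for word in words[1:])
```

For instance, `to_pascal("event.severity")` yields `EventSeverity` and `to_camel("event.severity")` yields `eventSeverity`, matching the ASIM and CEF examples above.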
Caution: Complex format conversions may impact performance.

Field Mapping

Common network fields can be identified by context across the supported formats:

| Format | Source IP | Destination IP | Direction |
| --- | --- | --- | --- |
| ecs | source.ip | destination.ip | network.direction |
| cim | src | dest | direction |
| asim | SrcIp | DstIp | NetworkDirection |
| cef | src | dst | networkDirection |
| leef | srcAddr | dstAddr | netDir |
| csl | SourceIp | DestinationIp | NetworkDirection |
| udm | principal.ip | target.ip | network.direction |
| ocsf | src_endpoint.ip | dst_endpoint.ip | direction_id |
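A minimal sketch of how such a mapping could drive a rename is shown below. It is not the processor's implementation: `FIELD_MAP` and `convert_fields` are hypothetical names, the table covers only the three contexts listed above, and records are assumed to be flat dictionaries whose keys may contain dots.

```python
from typing import Any

# Context-to-field-name lookup, copied from the field mapping table above.
FIELD_MAP: dict[str, dict[str, str]] = {
    "ecs": {"source_ip": "source.ip", "destination_ip": "destination.ip", "direction": "network.direction"},
    "cim": {"source_ip": "src", "destination_ip": "dest", "direction": "direction"},
    "asim": {"source_ip": "SrcIp", "destination_ip": "DstIp", "direction": "NetworkDirection"},
    "cef": {"source_ip": "src", "destination_ip": "dst", "direction": "networkDirection"},
    "leef": {"source_ip": "srcAddr", "destination_ip": "dstAddr", "direction": "netDir"},
    "csl": {"source_ip": "SourceIp", "destination_ip": "DestinationIp", "direction": "NetworkDirection"},
    "udm": {"source_ip": "principal.ip", "destination_ip": "target.ip", "direction": "network.direction"},
    "ocsf": {"source_ip": "src_endpoint.ip", "destination_ip": "dst_endpoint.ip", "direction": "direction_id"},
}


def convert_fields(record: dict[str, Any], source_format: str, target_format: str) -> dict[str, Any]:
    """Rename known fields from one format to another; other keys pass through."""
    source = FIELD_MAP[source_format]
    target = FIELD_MAP[target_format]
    # Build a source-name to target-name lookup through the shared context.
    rename = {source[context]: target[context] for context in source}
    return {rename.get(key, key): value for key, value in record.items()}
```

Only field names are rewritten here. Values are left as they are, so a format that encodes direction differently (for example the numeric `direction_id` in OCSF) would need a separate value translation step.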

Configuration

Basic

Convert from ECS to ASIM format:

normalize:
  source_format: ecs
  target_format: asim

Field-specific

Convert a specific network field:

normalize:
  field: network_data
  source_format: cef
  target_format: ecs

Auto-detection

Let the processor detect the source format:

normalize:
  target_format: cim

UDM Conversion

Convert ECS to Google SecOps UDM format:

normalize:
  source_format: ecs
  target_format: udm

OCSF Conversion

Convert ECS to the OCSF format used by Amazon Security Lake:

normalize:
  source_format: ecs
  target_format: ocsf

Preprocessing

Fields are standardized with normalize for conversion between the ECS, CIM, ASIM, CEF, LEEF, CSL, OCSF, and UDM formats (see the Log Formats and Conversion sections above). Values are formatted for uniform casing with uppercase and lowercase processors when required by the target format's naming conventions.

Postprocessing

Fields are optimized for storage and queries using format conversion with the normalize processor (see the Conversion and Field Mapping sections above). For Microsoft Sentinel integration, data is prepared by converting to the ASIM format with normalize. For Google SecOps, convert to UDM format. For Amazon Security Lake, convert to OCSF format (see Log Formats table).

Schema Enforcement

When converting to UDM or OCSF, schema enforcement rules are applied by default. These rules normalize timestamps, validate event types, and ensure field values conform to the target schema specification.
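To make the timestamp part of this concrete, the sketch below shows one way a timestamp string might be normalized for each target. It is an assumption-laden illustration rather than the processor's actual rules: the function names are hypothetical, the input is assumed to be an ISO 8601 string, and the output shapes (epoch milliseconds for OCSF, an RFC 3339 UTC string for UDM) are assumptions about what each target schema expects.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp and convert it to UTC.

    Values without an offset are assumed to already be in UTC.
    """
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def to_epoch_millis(value: str) -> int:
    """Milliseconds since the Unix epoch (assumed shape for an OCSF time value)."""
    # Integer division on timedeltas avoids floating-point rounding.
    return (parse_timestamp(value) - EPOCH) // timedelta(milliseconds=1)


def to_rfc3339_utc(value: str) -> str:
    """RFC 3339 string in UTC (assumed shape for a UDM event timestamp)."""
    return parse_timestamp(value).isoformat().replace("+00:00", "Z")
```

Event type validation and field value checks would follow the same pattern: parse the incoming value, compare it against what the target schema allows, and reject or rewrite anything that does not conform.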

Warning: Complex format conversions may impact processing performance and delivery latency.