Devices: Overview
Devices are the first stage in the DataStream processing flow. They receive telemetry from external sources and convert that data to a standardized format for pipeline processing.
Provider → Device → Preprocessing → Pipeline → Postprocessing → Target → Consumer
As such, they are defined using a standardized YAML configuration format that specifies their behavior, connection parameters, and processing options. DataStream uses devices as an abstraction layer that decouples data sources from pipelines.
Each device type provides specific configuration options detailed in its respective section.
Definitions
Devices operate on the following principles:
- Unified Configuration Structure: All devices share a common configuration framework with device-specific properties.
- Data Collection: Devices receive data through network connections, APIs, or direct system access.
- Pipeline Integration: Devices can link to preprocessing pipelines for data transformation.
- Stateful Operation: Devices maintain their operational state and can be enabled or disabled.
Devices enable:
- Authentication: Basic authentication, API keys, HMAC signing, and client certificates.
- Encryption: TLS/SSL, SNMPv3 privacy, and custom encryption.
- Access control and audit logging.
Device Collection Types
Devices operate in two fundamental modes that affect how data flows into DataStream:
Push-based devices listen for incoming connections and receive data sent by external sources:
- Syslog (UDP/TCP), HTTP/HTTPS, TCP, UDP, SMTP
- SNMP Traps, eStreamer, Proofpoint
- Event Hubs, RabbitMQ, Redis
Pull-based devices actively fetch data from external sources on a schedule or trigger:
- Kafka (consumer), Azure Monitor, Microsoft Sentinel
- Azure Blob Storage
- Windows/Linux Agents (collect local logs and forward to Director)
This distinction affects configuration requirements: push devices require network listener settings (address, port), while pull devices require connection credentials and polling parameters.
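To illustrate the difference, the sketch below contrasts a push-based listener with a pull-based consumer. Property names beyond the base configuration fields (address, port, brokers, topic, group_id) are assumptions for illustration, not the documented schema of each device type:

```yaml
devices:
  # Push-based: binds a network socket and waits for incoming syslog
  - id: 10
    name: edge_syslog
    type: syslog
    properties:
      address: "0.0.0.0"        # listener bind address (assumed property name)
      port: 514                 # listener port

  # Pull-based: connects out and consumes from a Kafka topic
  - id: 11
    name: audit_kafka
    type: kafka
    properties:
      brokers: ["kafka01:9092"] # connection endpoint (assumed property name)
      topic: "audit-logs"       # topic to poll
      group_id: "datastream"    # consumer group (assumed property name)
```

The push device needs only listener settings, while the pull device carries connection details for the source it polls.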
Configuration
All devices share the following base configuration fields:
| Field | Required | Description |
|---|---|---|
| id | Y | Unique numeric identifier |
| name | Y | Device name |
| description | N | Optional description of the device's purpose |
| type | Y | Device type identifier (e.g., http, syslog, tcp) |
| tags | N | Array of labels for categorization |
| pipelines | N | Array of preprocessing pipeline references (processed sequentially). On Agents, enables local processing before data reaches Director. |
| status | N | Boolean flag to enable/disable the device (default: true) |
Each device type provides specific options detailed in its respective section.
Use the id of the device to refer to it in your configurations.
Example:
devices:
  - id: 1
    name: http_logs
    type: http
    properties:
      port: 8080
      content_type: "application/json"
This defines an HTTP device listening on port 8080 that expects incoming data in JSON format.
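Other parts of a configuration then refer to this device by its id. The routes block below is a hypothetical sketch to show the referencing pattern; its field names and the target id are assumptions, not the documented routing schema:

```yaml
routes:
  - name: http_to_siem
    devices: [1]      # the HTTP device above, referenced by id
    targets: [100]    # hypothetical target id
```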
Device-to-Pipeline Handoff
When a device receives data, it performs initial format conversion before passing to pipelines:
- Raw Input: Device receives data in its native protocol format (syslog message, HTTP POST body, Kafka record, etc.)
- Parsing: Device parses protocol-specific headers and metadata
- Normalization: Device creates a standardized event structure with common fields (message, host, timestamp)
- Pipeline Input: Normalized event is passed to any attached preprocessing pipelines (via the pipelines field)
Preprocessing pipelines attached to devices execute sequentially in the order specified. This enables filtering, enrichment, and transformation before data enters the routing stage.
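For example, a device can attach several preprocessing pipelines, which execute in the listed order. The pipeline names here are illustrative placeholders:

```yaml
devices:
  - id: 2
    name: fw_syslog
    type: syslog
    pipelines:
      - drop_debug_events   # runs first: filter out noise
      - add_geoip           # runs second: enrich with location data
    properties:
      port: 514
```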
Device Types
The system supports the following device types:
- Network Protocol - These devices listen for incoming network connections:
- HTTP: Accepts JSON data via HTTP/HTTPS POST requests with authentication options
- TCP: Receives messages over TCP connections with framing and TLS support
- UDP: Collects datagram-based messages with high throughput capabilities
- Syslog: Specialized for syslog format messages with RFC compliance
- SMTP: Receives email messages for log processing
- Flow Monitoring - These devices collect network flow data:
- NetFlow: Cisco NetFlow v5/v9 network traffic analysis
- sFlow: sFlow sampling-based network monitoring
- IPFIX: IP Flow Information Export (IETF standard)
- Cloud Integration - These devices connect to cloud services:
- Amazon S3: Processes files from Amazon S3 buckets using SQS event notifications
- Amazon Security Lake: Consumes OCSF Parquet files from Amazon Security Lake via SQS notifications
- Azure Blob Storage: Pulls data from Azure Blob containers
- Azure Monitor: Collects logs from Azure Log Analytics workspaces
- Event Hubs: Consumes events from Azure Event Hubs
- Microsoft Sentinel: Pulls security data from Microsoft Sentinel
- Message Queue - These devices consume from messaging platforms:
- Kafka: Consumes from Apache Kafka topics
- NATS: Subscribes to NATS messaging subjects
- RabbitMQ: Consumes from RabbitMQ queues
- Redis: Subscribes to Redis pub/sub channels
- Security Integration - These devices integrate with security products:
- eStreamer: Connects to Cisco eStreamer servers
- Proofpoint: Consumes Proofpoint TAP log stream via WebSocket
- SNMP Trap: Receives SNMP trap notifications
- System Integration - These devices interact with operating systems:
- Windows: Collects Windows events via Agent
- Linux: Collects Linux logs and metrics via Agent
- File Transfer - These devices receive files:
- TFTP: Receives files via Trivial File Transfer Protocol
Use Cases
Devices can be used in the following scenarios:
- Infrastructure monitoring: Provides system performance metrics, event logs, resource utilization, and service availability information.
- Security operations: Enables security event monitoring, threat detection, compliance monitoring, and provides audit trails.
- Application telemetry: Provides application logs and performance metrics, and enables error tracking and user activity monitoring.
- Network monitoring: Provides network device logs and SNMP data, and enables traffic analysis and connection tracking.
Implementation Strategies
The following strategies optimize device deployment and data collection.
Monitoring
For monitoring operating systems, Director uses a unified agent-based approach with two types of deployment:
Managed (Traditional): The agent is installed and managed by system administrators, providing a persistent installation on the target system. Local data is buffered in the event of network issues. Director supports Windows, Linux, macOS, Solaris, and AIX.
Auto-managed (Agentless): The agent is automatically deployed and managed; no manual installation is required. Auto-managed agents provide local data buffering, network resilience, and performance optimization. This deployment type is self-healing, since the agent is automatically redeployed if the process terminates. It also supports remote credential management. Deployment is done using WinRM for Windows, and SSH for Linux, macOS, Solaris, and AIX.
Both approaches provide local data processing, store-and-forward capability against connectivity issues, real-time metrics and events, and native OS monitoring. The key difference is deployment and lifecycle management, not functionality.
Layered Collectors
Configure multiple devices to handle different aspects of data collection:
- External-facing HTTP endpoints for application logs
- Internal TCP/UDP listeners for network device logs
- Specialized connectors for cloud and security products
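A layered setup along these lines might look as follows. The ids, names, ports, and the sentinel-specific property are illustrative assumptions, not a documented configuration:

```yaml
devices:
  - id: 20
    name: app_http            # external-facing endpoint for application logs
    type: http
    properties:
      port: 8443
  - id: 21
    name: netdev_syslog       # internal listener for network device logs
    type: syslog
    properties:
      port: 514
  - id: 22
    name: sentinel_pull       # specialized cloud/security connector
    type: sentinel            # assumed type identifier
    properties:
      workspace_id: "<workspace>"   # assumed property name
```

Splitting collection across devices this way keeps each listener's configuration, credentials, and attached pipelines scoped to one data source.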