Devices: Overview
Devices are the first stage in the DataStream processing flow. They receive telemetry from external sources and convert that data to a standardized format for pipeline processing.
Provider → Device → Preprocessing → Pipeline → Postprocessing → Target → Consumer
As such, they are defined using a standardized YAML configuration format that specifies their behavior, connection parameters, and processing options. DataStream uses devices as an abstraction layer that decouples data sources from pipelines.
Each device type provides specific configuration options detailed in its respective section.
Definitions
Devices operate on the following principles:
- Unified Configuration Structure: All devices share a common configuration framework with device-specific properties.
- Data Collection: Devices receive data through network connections, APIs, or direct system access.
- Pipeline Integration: Devices can link to preprocessing pipelines for data transformation.
- Stateful Operation: Devices maintain their operational state and can be enabled or disabled.
Devices enable:
- Authentication: Basic authentication, API keys, HMAC signing, and client certificates.
- Encryption: TLS/SSL, SNMPv3 privacy, and custom encryption.
- Access control and audit logging.
Device Collection Types
Devices operate in two fundamental modes that affect how data flows into DataStream:
Push-based devices listen for incoming connections and receive data sent by external sources:
- Syslog (UDP/TCP), HTTP/HTTPS, TCP, UDP, SMTP
- SNMP Traps, eStreamer, Proofpoint
- Event Hubs, RabbitMQ, Redis
Pull-based devices actively fetch data from external sources on a schedule or trigger:
- Kafka (consumer), Azure Monitor, Microsoft Sentinel
- Azure Blob Storage
- Windows/Linux Agents (collect local logs and forward to Director)
This distinction affects configuration requirements: push devices require network listener settings (address, port), while pull devices require connection credentials and polling parameters.
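To illustrate the difference, the sketch below contrasts a push-based listener with a pull-based consumer. Property names beyond the base configuration fields (address, port, brokers, topic, group_id) are assumptions for illustration, not the documented schema of each device type:

```yaml
devices:
  # Push-based: binds a network socket and waits for incoming syslog
  - id: 10
    name: edge_syslog
    type: syslog
    properties:
      address: "0.0.0.0"        # listener bind address (assumed property name)
      port: 514                 # listener port

  # Pull-based: connects out and consumes from a Kafka topic
  - id: 11
    name: audit_kafka
    type: kafka
    properties:
      brokers: ["kafka01:9092"] # connection endpoint (assumed property name)
      topic: "audit-logs"       # topic to poll
      group_id: "datastream"    # consumer group (assumed property name)
```

The push device needs only listener settings, while the pull device carries connection details for the source it polls.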
Configuration
All devices share the following base configuration fields:
| Field | Required | Description |
|---|---|---|
| id | Y | Unique numeric identifier |
| name | Y | Device name |
| description | N | Optional description of the device's purpose |
| type | Y | Device type identifier (e.g., http, syslog, tcp) |
| tags | N | Array of labels for categorization |
| pipelines | N | Array of preprocessing pipeline references (processed sequentially). On Agents, enables local processing before data reaches Director. |
| status | N | Boolean flag to enable/disable the device (default: true) |
Each device type provides specific options detailed in its respective section.
Use the id of the device to refer to it in your configurations.
Example:
devices:
  - id: 1
    name: http_logs
    type: http
    properties:
      port: 8080
      content_type: "application/json"
This defines an HTTP device listening on port 8080 that expects incoming data in JSON format.
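Other parts of a configuration then refer to this device by its id. The routes block below is a hypothetical sketch to show the referencing pattern; its field names and the target id are assumptions, not the documented routing schema:

```yaml
routes:
  - name: http_to_siem
    devices: [1]      # the HTTP device above, referenced by id
    targets: [100]    # hypothetical target id
```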
Device-to-Pipeline Handoff
When a device receives data, it performs initial format conversion before passing to pipelines:
- Raw Input: Device receives data in its native protocol format (syslog message, HTTP POST body, Kafka record, etc.)
- Parsing: Device parses protocol-specific headers and metadata
- Normalization: Device creates a standardized event structure with common fields (message, host, timestamp)
- Pipeline Input: Normalized event is passed to any attached preprocessing pipelines (via the pipelines field)
Preprocessing pipelines attached to devices execute sequentially in the order specified. This enables filtering, enrichment, and transformation before data enters the routing stage.
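For example, a device can attach several preprocessing pipelines, which execute in the listed order. The pipeline names here are illustrative placeholders:

```yaml
devices:
  - id: 2
    name: fw_syslog
    type: syslog
    pipelines:
      - drop_debug_events   # runs first: filter out noise
      - add_geoip           # runs second: enrich with location data
    properties:
      port: 514
```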
Device Types
The system supports the following device types:
- Network Protocol - These devices listen for incoming network connections:
- HTTP: Accepts JSON data via HTTP/HTTPS POST requests with authentication options
- TCP: Receives messages over TCP connections with framing and TLS support
- UDP: Collects datagram-based messages with high throughput capabilities
- Syslog: Specialized for syslog format messages with RFC compliance
- SMTP: Receives email messages for log processing
- Flow Monitoring - These devices collect network flow data:
- NetFlow: Cisco NetFlow v5/v9 network traffic analysis
- sFlow: sFlow sampling-based network monitoring
- IPFIX: IP Flow Information Export (IETF standard)
- Cloud Integration - These devices connect to cloud services:
- Amazon S3: Processes files from Amazon S3 buckets using SQS event notifications
- Amazon Security Lake: Consumes OCSF Parquet files from Amazon Security Lake via SQS notifications
- Azure Blob Storage: Pulls data from Azure Blob containers
- Azure Monitor: Collects logs from Azure Log Analytics workspaces
- Event Hubs: Consumes events from Azure Event Hubs
- Microsoft Sentinel: Pulls security data from Microsoft Sentinel
- Message Queue - These devices consume from messaging platforms:
- Kafka: Consumes from Apache Kafka topics
- NATS: Subscribes to NATS messaging subjects
- RabbitMQ: Consumes from RabbitMQ queues
- Redis: Subscribes to Redis pub/sub channels
- Security Integration - These devices integrate with security products:
- eStreamer: Connects to Cisco eStreamer servers
- Proofpoint: Consumes Proofpoint TAP log stream via WebSocket
- SNMP Trap: Receives SNMP trap notifications
- System Integration - These devices interact with operating systems:
- Windows: Collects Windows events via Agent
- Linux: Collects Linux logs and metrics via Agent
- File Transfer - These devices receive files:
- TFTP: Receives files via Trivial File Transfer Protocol
Use Cases
Devices can be used in the following scenarios:
- Infrastructure monitoring: Provides system performance metrics, event logs, resource utilization, and service availability information.
- Security operations: Enables security event monitoring, threat detection, compliance monitoring, and provides audit trails.
- Application telemetry: Provides application logs and performance metrics, and enables error tracking and user activity monitoring.
- Network monitoring: Provides network device logs and SNMP data, and enables traffic analysis and connection tracking.
Implementation Strategies
The following strategies optimize device deployment and data collection.
Monitoring
For monitoring operating systems, Director uses a unified agent-based approach with two types of deployment:
Managed (Traditional): The agent is installed and managed by system administrators, providing a persistent installation on the target system. Local data is buffered in the event of network issues. Director supports Windows, Linux, macOS, Solaris, and AIX.
Auto-managed (Agentless): The agent is automatically deployed and managed; no manual installation is required. Auto-managed agents provide local data buffering, network resilience, and performance optimization. This deployment type is self-healing, since the agent is automatically redeployed if the process terminates. It also supports remote credential management. Deployment is done using WinRM for Windows, and SSH for Linux, macOS, Solaris, and AIX.
Both approaches provide local data processing, store-and-forward capability against connectivity issues, real-time metrics and events, and native OS monitoring. The key difference is deployment and lifecycle management, not functionality.
Layered Collectors
Configure multiple devices to handle different aspects of data collection:
- External-facing HTTP endpoints for application logs
- Internal TCP/UDP listeners for network device logs
- Specialized connectors for cloud and security products
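A layered setup along these lines might look as follows. The ids, names, ports, and the sentinel-specific property are illustrative assumptions, not a documented configuration:

```yaml
devices:
  - id: 20
    name: app_http            # external-facing endpoint for application logs
    type: http
    properties:
      port: 8443
  - id: 21
    name: netdev_syslog       # internal listener for network device logs
    type: syslog
    properties:
      port: 514
  - id: 22
    name: sentinel_pull       # specialized cloud/security connector
    type: sentinel            # assumed type identifier
    properties:
      workspace_id: "<workspace>"   # assumed property name
```

Splitting collection across devices this way keeps each listener's configuration, credentials, and attached pipelines scoped to one data source.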