Google Cloud Storage
Synopsis
Creates a target that writes log messages to Google Cloud Storage buckets with support for various file formats, authentication methods, and multipart uploads. The target handles large file uploads efficiently with configurable rotation based on size or event count.
Schema
- name: <string>
  description: <string>
  type: gcpstorage
  pipelines: <pipeline[]>
  status: <boolean>
  properties:
    credentials: <string>
    project_id: <string>
    bucket: <string>
    buckets:
      - bucket: <string>
        name: <string>
        format: <string>
        compression: <string>
        extension: <string>
        schema: <string>
    name: <string>
    format: <string>
    compression: <string>
    extension: <string>
    schema: <string>
    max_size: <numeric>
    batch_size: <numeric>
    timeout: <numeric>
    field_format: <string>
    interval: <string|numeric>
    cron: <string>
    debug:
      status: <boolean>
      dont_send_logs: <boolean>
Configuration
The following fields are used to define the target:
| Field | Required | Default | Description |
|---|---|---|---|
| name | Y | - | Target name |
| description | N | - | Optional description |
| type | Y | - | Must be gcpstorage |
| pipelines | N | - | Optional post-processor pipelines |
| status | N | true | Enable/disable the target |
Google Cloud Storage Credentials
| Field | Required | Default | Description |
|---|---|---|---|
| credentials | N | - | Service account credentials JSON. Uses Application Default Credentials if not provided |
| project_id | Y | - | Google Cloud project ID |
Connection
| Field | Required | Default | Description |
|---|---|---|---|
| timeout | N | 30 | Connection timeout in seconds |
| field_format | N | - | Data normalization format. See applicable Normalization section |
Files
| Field | Required | Default | Description |
|---|---|---|---|
| bucket | N* | - | Default GCS bucket name (acts as catch-all when buckets is also specified) |
| buckets | N* | - | Array of bucket configurations for file distribution |
| buckets.bucket | Y | - | GCS bucket name |
| buckets.name | Y | - | File name template |
| buckets.format | N | "json" | Output format: json, multijson, avro, parquet |
| buckets.compression | N | - | Compression algorithm. See Compression below |
| buckets.extension | N | Matches format | File extension override |
| buckets.schema | N** | - | Schema definition file path (required for Avro and Parquet formats) |
| name | N | "vmetric.{{.Timestamp}}.{{.Extension}}" | Default file name template (used with bucket for catch-all) |
| format | N | "json" | Default output format (used with bucket for catch-all) |
| compression | N | - | Default compression (used with bucket for catch-all) |
| extension | N | Matches format | Default file extension (used with bucket for catch-all) |
| schema | N | - | Default schema path (used with bucket for catch-all) |
| max_size | N | 0 | Maximum file size in bytes before rotation |
| batch_size | N | 100000 | Maximum number of messages per file |
* = Either bucket or buckets must be specified.
** = Conditionally required for Avro and Parquet formats when using buckets.
When max_size is reached, the current file is uploaded to GCS and a new file is created. For unlimited file size, set the field to 0.
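For illustration, the following sketch combines both limits; the bucket name and thresholds are illustrative, and rotation is expected to occur when either limit is reached:

targets:
  - name: rotated_gcs
    type: gcpstorage
    properties:
      project_id: "my-project-123456"
      bucket: "rotated-logs"             # illustrative bucket name
      name: "logs-{{.Timestamp}}.json"
      format: "json"
      max_size: 268435456                # rotate after roughly 256 MB
      batch_size: 50000                  # or after 50,000 messages per file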
Scheduler
| Field | Required | Default | Description |
|---|---|---|---|
| interval | N | realtime | Execution frequency. See Interval for details |
| cron | N | - | Cron expression for scheduled execution. See Cron for details |
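As a sketch, a target can be switched from realtime to scheduled delivery with either field; the values below are illustrative:

targets:
  - name: scheduled_gcs
    type: gcpstorage
    properties:
      project_id: "my-project-123456"
      bucket: "scheduled-logs"           # illustrative bucket name
      name: "logs-{{.Timestamp}}.json"
      format: "json"
      interval: 300                      # numeric interval (see the Interval section for units)
      # cron: "0 * * * *"                # alternatively, a cron expression (see the Cron section)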
Debug Options
| Field | Required | Default | Description |
|---|---|---|---|
| debug.status | N | false | Enable debug logging |
| debug.dont_send_logs | N | false | Process logs but don't send to target (testing) |
Details
The Google Cloud Storage target writes log data to GCS buckets in JSON, multi-JSON, Avro, or Parquet format, with optional compression. GCS itself offers very high durability (99.999999999%), strong read-after-write consistency, and integration with Google Cloud's security and analytics ecosystem.
Authentication Methods
The target accepts service account credentials as JSON via the credentials field. When it runs on Google Cloud infrastructure, the credentials field can be omitted and Application Default Credentials are used instead.
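For example, on a Compute Engine instance or GKE workload with an attached service account, a minimal configuration can rely entirely on Application Default Credentials (project and bucket names below are illustrative):

targets:
  - name: adc_gcs
    type: gcpstorage
    properties:
      project_id: "my-project-123456"    # illustrative project ID
      bucket: "datastream-logs"
      # no credentials field: Application Default Credentials are used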
IAM Permissions
The service account requires the following IAM role:
| IAM Role | Role ID | Purpose |
|---|---|---|
| Storage Object Creator | roles/storage.objectCreator | Upload (create) objects in GCS buckets |
Minimum permissions: storage.objects.create
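As a sketch, the binding can be expressed as a standard IAM policy in YAML (for example, a file passed to gcloud's set-iam-policy command); the service account email shown is hypothetical:

# Bucket-level IAM binding granting only object creation.
bindings:
  - role: roles/storage.objectCreator
    members:
      - serviceAccount:datastream-writer@my-project-123456.iam.gserviceaccount.com   # hypothetical account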
Storage Classes
Google Cloud Storage supports multiple storage classes for cost optimization:
| Storage Class | Use Case |
|---|---|
| Standard | Frequently accessed data |
| Nearline | Data accessed less than once per month |
| Coldline | Data accessed less than once per quarter |
| Archive | Data accessed less than once per year |
Available Regions
Google Cloud Storage is available in multiple regions worldwide:
| Region Code | Location |
|---|---|
| us-central1 | Iowa, USA |
| us-east1 | South Carolina, USA |
| us-west1 | Oregon, USA |
| europe-west1 | Belgium |
| europe-west2 | London, UK |
| europe-west3 | Frankfurt, Germany |
| asia-east1 | Taiwan |
| asia-northeast1 | Tokyo, Japan |
| asia-southeast1 | Singapore |
| australia-southeast1 | Sydney, Australia |
Templates
The following template variables can be used in file names:
| Variable | Description | Example |
|---|---|---|
| {{.Year}} | Current year | 2024 |
| {{.Month}} | Current month | 01 |
| {{.Day}} | Current day | 15 |
| {{.Timestamp}} | Current timestamp in nanoseconds | 1703688533123456789 |
| {{.Format}} | File format | json |
| {{.Extension}} | File extension | json |
| {{.Compression}} | Compression type | zstd |
| {{.TargetName}} | Target name | my_logs |
| {{.TargetType}} | Target type | gcpstorage |
| {{.Table}} | Bucket name | logs |
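For instance, using the example values from the table above, a date-partitioned template such as the one below (bucket name illustrative) would produce object names like logs/2024/01/15/events-1703688533123456789.json:

targets:
  - name: partitioned_gcs
    type: gcpstorage
    properties:
      project_id: "my-project-123456"
      bucket: "partitioned-logs"         # illustrative bucket name
      name: "logs/{{.Year}}/{{.Month}}/{{.Day}}/events-{{.Timestamp}}.{{.Extension}}"
      format: "json"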
Multiple Buckets
A single target can write to multiple GCS buckets with different configurations, enabling data distribution strategies (e.g. raw data to one bucket, processed data to another); see the multi-bucket examples below.
Schema Requirements
Avro and Parquet formats require schema definition files. Schema files must be accessible at the path specified in the schema parameter during target initialization.
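As a sketch, a Parquet bucket entry can reference its schema by file path; the path below is hypothetical and must exist when the target starts:

targets:
  - name: parquet_schema_example
    type: gcpstorage
    properties:
      project_id: "my-project-123456"
      buckets:
        - bucket: "analytics-data"
          name: "analytics-{{.Timestamp}}.parquet"
          format: "parquet"
          schema: "/etc/datastream/schemas/events.schema.json"   # hypothetical schema file path
          compression: "snappy"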
Integration with Google Cloud
GCS integrates seamlessly with other Google Cloud services including BigQuery for analytics, Cloud Functions for serverless processing, and Cloud Logging for centralized logging.
Examples
Basic Configuration
The minimum configuration for a JSON GCS target:
targets:
  - name: basic_gcs
    type: gcpstorage
    properties:
      project_id: "my-project-123456"
      bucket: "datastream-logs"
Service Account Authentication
Configuration with explicit service account credentials:
targets:
  - name: gcs_service_account
    type: gcpstorage
    properties:
      project_id: "my-project-123456"
      credentials: |
        {
          "type": "service_account",
          "project_id": "my-project-123456",
          "private_key_id": "key-id",
          "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
          "client_email": "datastream@my-project-123456.iam.gserviceaccount.com",
          "client_id": "123456789",
          "auth_uri": "https://accounts.google.com/o/oauth2/auth",
          "token_uri": "https://oauth2.googleapis.com/token"
        }
      bucket: "datastream-logs"
Pipeline-Based Routing
Dynamic bucket routing using pipeline processors to analyze log content and route to appropriate buckets:
targets:
  - name: smart_routing_gcs
    type: gcpstorage
    pipelines:
      - dynamic_routing
    properties:
      project_id: "my-project-123456"
      credentials: "${GCP_CREDENTIALS_JSON}"
      buckets:
        - bucket: "security-events"
          name: "security-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
        - bucket: "application-events"
          name: "app-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
        - bucket: "system-events"
          name: "system-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
      bucket: "other-events"
      name: "other-{{.Timestamp}}.json"
      format: "json"

pipelines:
  - name: dynamic_routing
    processors:
      - set:
          field: "_vmetric.bucket"
          value: "security-events"
          if: "ctx.event_type == 'security'"
      - set:
          field: "_vmetric.bucket"
          value: "application-events"
          if: "ctx.event_type == 'application'"
      - set:
          field: "_vmetric.bucket"
          value: "system-events"
          if: "ctx.event_type == 'system'"
Multiple Buckets with Catch-All
Configuration for routing different log types to specific buckets with a catch-all for unmatched logs:
targets:
  - name: multi_bucket_routing
    type: gcpstorage
    properties:
      project_id: "my-project-123456"
      credentials: "${GCP_CREDENTIALS_JSON}"
      buckets:
        - bucket: "security-logs"
          name: "security-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
        - bucket: "application-logs"
          name: "app-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "json"
      bucket: "general-logs"
      name: "general-{{.Timestamp}}.json"
      format: "json"
Multiple Buckets with Different Formats
Configuration for distributing data across multiple GCS buckets with different formats:
targets:
  - name: multi_bucket_export
    type: gcpstorage
    properties:
      project_id: "my-project-123456"
      credentials: "${GCP_CREDENTIALS_JSON}"
      buckets:
        - bucket: "raw-data-archive"
          name: "raw-{{.Year}}-{{.Month}}-{{.Day}}.json"
          format: "multijson"
          compression: "gzip"
        - bucket: "analytics-data"
          name: "analytics-{{.Year}}/{{.Month}}/{{.Day}}/data_{{.Timestamp}}.parquet"
          format: "parquet"
          schema: "<schema definition>"
          compression: "snappy"
Parquet Format
Configuration for daily partitioned Parquet files:
targets:
  - name: parquet_analytics
    type: gcpstorage
    properties:
      project_id: "my-project-123456"
      credentials: "${GCP_CREDENTIALS_JSON}"
      bucket: "analytics-lake"
      name: "events/year={{.Year}}/month={{.Month}}/day={{.Day}}/part-{{.Timestamp}}.parquet"
      format: "parquet"
      schema: "<schema definition>"
      compression: "snappy"
      max_size: 536870912
High Reliability
Configuration with a checkpoint pipeline and an extended connection timeout for dependable delivery:
targets:
  - name: reliable_gcs
    type: gcpstorage
    pipelines:
      - checkpoint
    properties:
      project_id: "my-project-123456"
      credentials: "${GCP_CREDENTIALS_JSON}"
      bucket: "critical-logs"
      name: "logs-{{.Timestamp}}.json"
      format: "json"
      timeout: 60
With Field Normalization
Using field normalization to write logs in the CIM standard format:
targets:
  - name: normalized_gcs
    type: gcpstorage
    properties:
      project_id: "my-project-123456"
      credentials: "${GCP_CREDENTIALS_JSON}"
      bucket: "normalized-logs"
      name: "logs-{{.Timestamp}}.json"
      format: "json"
      field_format: "cim"
BigQuery Integration
Configuration optimized for staging data to be loaded into BigQuery:
targets:
  - name: bigquery_ready
    type: gcpstorage
    properties:
      project_id: "my-project-123456"
      credentials: "${GCP_CREDENTIALS_JSON}"
      bucket: "bigquery-staging"
      name: "bq-import/{{.Year}}/{{.Month}}/{{.Day}}/data-{{.Timestamp}}.json"
      format: "json"
      compression: "gzip"
      max_size: 1073741824
Debug Configuration
Configuration with debugging enabled:
targets:
  - name: debug_gcs
    type: gcpstorage
    properties:
      project_id: "my-project-123456"
      credentials: "${GCP_CREDENTIALS_JSON}"
      bucket: "test-logs"
      name: "test-{{.Timestamp}}.json"
      format: "json"
      debug:
        status: true
        dont_send_logs: true