
Metrics & Monitoring

Enable Metrics

javascript
import { createApp } from 'rnode-server';

const app = createApp({ 
  logLevel: "info", 
  metrics: true  // Enable Prometheus metrics
});

Access Metrics

  • Metrics Endpoint: GET /metrics (Prometheus format)
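The endpoint returns plain text in the Prometheus exposition format, one sample per line. As a rough sketch of what a consumer sees, the snippet below parses such a payload into objects. The `parseMetrics` helper and the sample payload are illustrative assumptions, not part of rnode-server, and the parser assumes label values contain no escaped quotes.

```javascript
// Hypothetical helper: parse Prometheus text exposition format into samples.
// Skips blank lines and "# HELP" / "# TYPE" comment lines.
function parseMetrics(text) {
  const samples = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;

    // Metric name, optional {labels}, then the sample value.
    const match = line.match(/^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/);
    if (!match) continue;

    const [, name, labelText, valueText] = match;
    const labels = {};
    if (labelText) {
      // Simplification: label values are assumed to contain no escaped quotes.
      for (const pair of labelText.matchAll(/([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g)) {
        labels[pair[1]] = pair[2];
      }
    }

    // The format writes infinities as +Inf / -Inf, which Number() cannot parse.
    let value;
    if (valueText === '+Inf') value = Infinity;
    else if (valueText === '-Inf') value = -Infinity;
    else value = Number(valueText);

    samples.push({ name, labels, value });
  }
  return samples;
}

// Illustrative payload, not captured from a real server.
const sampleText = [
  '# HELP http_requests_total Total HTTP requests',
  '# TYPE http_requests_total counter',
  'http_requests_total{method="GET",path="/api/orders",status="200"} 42',
  'rnode_server_uptime_seconds 3600',
].join('\n');

const parsed = parseMetrics(sampleText);
console.log(parsed);
```

Against a running server, the same function could be fed the body of a `GET /metrics` response; the host and port depend on how the app is started.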

Available Metrics

| Metric | Type | Description | Labels |
| --- | --- | --- | --- |
| http_requests_total | Counter | Total HTTP requests | method, path, status |
| http_requests_duration_seconds | Histogram | Request duration | method, path, status |
| rnode_server_process_cpu_usage_percent | Gauge | Process CPU usage | - |
| rnode_server_process_memory_kb | Gauge | Process memory usage | - |
| rnode_server_uptime_seconds | Gauge | Server uptime | - |
| rnode_server_pending_requests | Gauge | Pending requests count | - |
| rnode_server_slow_requests_total | Counter | Slow requests (>1s) | method, path, duration_range |
| rnode_server_cache_hits_total | Counter | Cache hits | - |
| rnode_server_cache_misses_total | Counter | Cache misses | - |
| rnode_server_data_cache_operations_total | Counter | Total data cache operations | operation, cache_type, status |
| rnode_server_data_cache_hits_total | Counter | Total data cache hits | cache_type, key_pattern |
| rnode_server_data_cache_misses_total | Counter | Total data cache misses | cache_type, key_pattern |
| rnode_server_data_cache_errors_total | Counter | Total data cache errors | error_type, cache_type, operation |
| rnode_server_data_cache_operation_duration_seconds | Histogram | Data cache operation duration | operation, cache_type |
| rnode_server_data_cache_tag_operations_total | Counter | Total data cache tag operations | operation, cache_type |
| rnode_server_total_connections | Counter | Total connections | - |
| rnode_server_websocket_connections_total | Counter | Total WebSocket connections | - |
| rnode_server_websocket_disconnections_total | Counter | Total WebSocket disconnections | - |
| rnode_server_websocket_connections_active | Gauge | Active WebSocket connections | - |
| rnode_server_websocket_rooms_total | Gauge | Total WebSocket rooms | - |
| rnode_server_websocket_room_connections | Gauge | Connections per room | room_id, room_name |
| rnode_server_websocket_messages_sent_total | Counter | Total messages sent | type, room_id, path |
| rnode_server_websocket_messages_received_total | Counter | Total messages received | type, room_id, path |
| rnode_server_websocket_connection_duration_seconds | Histogram | Connection duration | path, room_id |
| rnode_server_websocket_message_size_bytes | Histogram | Message size | type, direction |
| rnode_server_websocket_errors_total | Counter | Total WebSocket errors | error_type, path, room_id |

PromQL Queries

Request Rate

promql
# Requests per second (one result per label combination)
rate(http_requests_total[5m])

# Requests by method
sum by (method) (rate(http_requests_total[5m]))

# Requests by status code
sum by (status) (rate(http_requests_total[5m]))

Response Time

promql
# 95th percentile response time
histogram_quantile(0.95, rate(http_requests_duration_seconds_bucket[5m]))

# Average response time
rate(http_requests_duration_seconds_sum[5m]) / rate(http_requests_duration_seconds_count[5m])
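The percentile query relies on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below reproduces that arithmetic on made-up bucket data to show why the result is an estimate bounded by the bucket edges. The `histogramQuantile` function and the bucket boundaries are illustrative assumptions; the actual bucket layout used by rnode-server is not stated here.

```javascript
// Estimate a quantile from cumulative histogram buckets.
// `buckets` is [{ le: upperBound, count: cumulativeCount }, ...] and is
// assumed to contain at least one finite bucket plus an { le: Infinity } bucket.
// Assumes 0 < q <= 1.
function histogramQuantile(q, buckets) {
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN;

  const rank = q * total;
  const i = sorted.findIndex((b) => b.count >= rank);

  // Rank falls in the +Inf bucket: report the highest finite upper bound.
  if (sorted[i].le === Infinity) return sorted[sorted.length - 2].le;

  // Lower edge and cumulative count just below the selected bucket.
  const lower = i === 0 ? 0 : sorted[i - 1].le;
  const below = i === 0 ? 0 : sorted[i - 1].count;
  const inBucket = sorted[i].count - below;

  // Linear interpolation between the bucket's lower and upper bounds.
  return lower + (sorted[i].le - lower) * ((rank - below) / inBucket);
}

// Made-up cumulative counts for request durations in seconds.
const exampleBuckets = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1, count: 98 },
  { le: Infinity, count: 100 },
];

const p95 = histogramQuantile(0.95, exampleBuckets);
console.log(p95); // ≈ 0.8125, interpolated inside the (0.5, 1] bucket
```

Because the estimate can never exceed the highest finite bucket bound, a threshold such as `> 1` is only meaningful if the histogram has buckets above that value.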

Error Rate

promql
# Error rate (4xx, 5xx)
rate(http_requests_total{status=~"4..|5.."}[5m])

# Error percentage (aggregate both sides; a per-series ratio always yields 100)
sum(rate(http_requests_total{status=~"4..|5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100
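The error percentage is the share of the overall request rate that comes from 4xx and 5xx responses. As a worked example of that arithmetic, the sketch below computes per-second rates from two scrapes of each series and then divides the summed error rate by the summed total rate. The counter values and helper names are made up for illustration, and counter resets (which Prometheus's `rate()` compensates for) are ignored.

```javascript
// Per-second increase of a counter between two scrapes.
// Simplification: does not handle counter resets.
function perSecondRate(earlier, later, intervalSeconds) {
  return (later - earlier) / intervalSeconds;
}

// Sum the rates of all series, and separately of the 4xx/5xx series,
// then express the error share as a percentage.
function errorPercentage(seriesList, intervalSeconds) {
  let errorRate = 0;
  let totalRate = 0;
  for (const s of seriesList) {
    const r = perSecondRate(s.earlier, s.later, intervalSeconds);
    totalRate += r;
    // Same match as the PromQL selector status=~"4..|5.."
    if (/^[45]..$/.test(s.status)) errorRate += r;
  }
  return totalRate === 0 ? 0 : (errorRate / totalRate) * 100;
}

// Each entry is one http_requests_total series seen at two scrape times.
// Values are invented for this example.
const series = [
  { status: '200', earlier: 1000, later: 1900 },
  { status: '404', earlier: 40, later: 100 },
  { status: '500', earlier: 10, later: 50 },
];
const interval = 300; // seconds, matching a [5m] window

const pct = errorPercentage(series, interval);
console.log(pct.toFixed(2) + '%');
```

With these numbers, 100 of the 1000 requests in the window are errors, so the result is about 10%.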

System Metrics

promql
# CPU usage
rnode_server_process_cpu_usage_percent

# Memory usage
rnode_server_process_memory_kb

# Server uptime
rnode_server_uptime_seconds

WebSocket Metrics

promql
# WebSocket connection rate
rate(rnode_server_websocket_connections_total[5m])

# Active WebSocket connections
rnode_server_websocket_connections_active

# WebSocket message rate
rate(rnode_server_websocket_messages_sent_total[5m])
rate(rnode_server_websocket_messages_received_total[5m])

# WebSocket error rate
rate(rnode_server_websocket_errors_total[5m])

# 95th percentile connection duration
histogram_quantile(0.95, rate(rnode_server_websocket_connection_duration_seconds_bucket[5m]))

# Median message size
histogram_quantile(0.50, rate(rnode_server_websocket_message_size_bytes_bucket[5m]))

# Messages by type
sum by (type) (rate(rnode_server_websocket_messages_sent_total[5m]))
sum by (type) (rate(rnode_server_websocket_messages_received_total[5m]))

# Errors by type
sum by (error_type) (rate(rnode_server_websocket_errors_total[5m]))

# Room connections
rnode_server_websocket_room_connections

# Total rooms
rnode_server_websocket_rooms_total

Cache Metrics

promql
# Cache hit rate
rate(rnode_server_data_cache_hits_total[5m]) / (rate(rnode_server_data_cache_hits_total[5m]) + rate(rnode_server_data_cache_misses_total[5m])) * 100

# Cache operations rate
sum by (operation, cache_type) (rate(rnode_server_data_cache_operations_total[5m]))

# Cache error rate
sum by (error_type, cache_type) (rate(rnode_server_data_cache_errors_total[5m]))

# 95th percentile cache operation duration
histogram_quantile(0.95, sum by (le, operation, cache_type) (rate(rnode_server_data_cache_operation_duration_seconds_bucket[5m])))

# Average cache operation duration
sum by (operation, cache_type) (rate(rnode_server_data_cache_operation_duration_seconds_sum[5m])) / sum by (operation, cache_type) (rate(rnode_server_data_cache_operation_duration_seconds_count[5m]))



# Tag operations rate by operation and cache type
sum by (operation, cache_type) (rate(rnode_server_data_cache_tag_operations_total[5m]))

# Cache hits by key pattern
sum by (key_pattern) (rate(rnode_server_data_cache_hits_total[5m]))

# Cache misses by key pattern
sum by (key_pattern) (rate(rnode_server_data_cache_misses_total[5m]))

# Operations with tags vs without tags
sum by (operation, cache_type, status) (rate(rnode_server_data_cache_operations_total[5m]))

# Most used tags (by tag operations)
sum by (operation) (rate(rnode_server_data_cache_tag_operations_total[5m]))

Grafana Dashboard

For a complete monitoring setup, see Grafana Dashboard Configuration.

Custom Metrics

You can extend metrics by creating custom Prometheus metrics in your application:

javascript
// Example: Custom business metrics
app.get('/api/orders', (req, res) => {
  // Increment custom metric
  // orders_total.inc();
  
  res.json({ orders: [] });
});
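To make the commented-out `orders_total.inc()` call concrete, here is a minimal sketch of what such a counter could look like. The `Counter` class is hypothetical and written from scratch for illustration; it is not an rnode-server API, and how custom metrics would be exposed alongside the built-in `/metrics` output is not specified here. In practice an established client library such as prom-client would be the more likely choice.

```javascript
// Hypothetical minimal counter; not part of rnode-server.
class Counter {
  constructor(name, help) {
    this.name = name;
    this.help = help;
    this.values = new Map(); // serialized label set -> running count
  }

  // Increase the counter for one label combination.
  inc(labels = {}, amount = 1) {
    // Sort label names so the same label set always maps to the same key.
    // Simplification: label values are not escaped.
    const key = Object.keys(labels)
      .sort()
      .map((k) => `${k}="${labels[k]}"`)
      .join(',');
    this.values.set(key, (this.values.get(key) || 0) + amount);
  }

  // Render the counter in the Prometheus text exposition format.
  render() {
    const lines = [
      `# HELP ${this.name} ${this.help}`,
      `# TYPE ${this.name} counter`,
    ];
    for (const [key, value] of this.values) {
      lines.push(key ? `${this.name}{${key}} ${value}` : `${this.name} ${value}`);
    }
    return lines.join('\n') + '\n';
  }
}

// Example: count order requests by outcome.
const ordersTotal = new Counter('orders_total', 'Total orders requested');
ordersTotal.inc({ status: 'ok' });
ordersTotal.inc({ status: 'ok' });
ordersTotal.inc({ status: 'failed' });

const rendered = ordersTotal.render();
console.log(rendered);
```

Inside a route handler such as the `/api/orders` example, the `inc()` call would sit where the commented-out line is.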

Alerting Rules

High Error Rate

yaml
groups:
  - name: rnode-server
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"4..|5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100 > 5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }}%"

High Response Time

yaml
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_requests_duration_seconds_bucket[5m])) > 1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"
          description: "95th percentile response time is {{ $value }}s"

High CPU Usage

yaml
      - alert: HighCPUUsage
        expr: rnode_server_process_cpu_usage_percent > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is {{ $value }}%"

Low Cache Hit Rate

yaml
      - alert: LowCacheHitRate
        expr: (rate(rnode_server_data_cache_hits_total[5m]) / (rate(rnode_server_data_cache_hits_total[5m]) + rate(rnode_server_data_cache_misses_total[5m]))) * 100 < 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low cache hit rate detected"
          description: "Cache hit rate is {{ $value }}%"

High Cache Error Rate

yaml
      - alert: HighCacheErrorRate
        expr: rate(rnode_server_data_cache_errors_total[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High cache error rate detected"
          description: "Cache error rate is {{ $value }} errors/second"

High Cache Operation Duration

yaml
      - alert: HighCacheOperationDuration
        expr: histogram_quantile(0.95, rate(rnode_server_data_cache_operation_duration_seconds_bucket[5m])) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High cache operation duration detected"
          description: "95th percentile cache operation duration is {{ $value }}s"
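Each rule above combines a threshold expression with a `for:` duration: the alert is pending while the expression is true and fires only once it has stayed true for the whole duration. The sketch below simulates that behaviour for a single series. The `alertStates` function, the 60-second evaluation spacing, and the sample values are assumptions made for illustration.

```javascript
// Classify each rule evaluation as 'inactive', 'pending', or 'firing'.
// `evaluations` is [{ t: secondsSinceStart, value }] in time order.
function alertStates(evaluations, threshold, forSeconds) {
  let activeSince = null; // when the condition first became true in the current run
  return evaluations.map(({ t, value }) => {
    if (!(value > threshold)) {
      activeSince = null; // condition cleared, so the timer resets
      return 'inactive';
    }
    if (activeSince === null) activeSince = t;
    return t - activeSince >= forSeconds ? 'firing' : 'pending';
  });
}

// Invented error percentages, evaluated every 60 seconds.
const evaluations = [
  { t: 0, value: 3 },
  { t: 60, value: 7 },
  { t: 120, value: 8 },
  { t: 180, value: 9 },
  { t: 240, value: 4 },
];

// Threshold 5 and 120 seconds correspond to `> 5` with `for: 2m`.
const states = alertStates(evaluations, 5, 120);
console.log(states);
// [ 'inactive', 'pending', 'pending', 'firing', 'inactive' ]
```

A short spike that clears before the `for:` window elapses never leaves the pending state, which is why the duration acts as a filter against brief fluctuations.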

Released under the MIT License.