Docs/specs/ZBXNEXT-473

Force check of a passive item

ZBXNEXT-473

Status: Initial draft, do not comment

Owner: Alexei

Summary

Currently it is not possible to force a check of a passive item or LLD rule. The proposed functionality adds the ability to force checking of a selected group of items from the Zabbix frontend.

Specification

A new operation, "Force check", will be available for items and LLD rules. When a user presses the "Force check" button, the frontend sends a special request to the server. The trapper will create a new task, which will be executed by a new task manager process.

Frontend changes

Item and LLD rule forms

  • A new 'Force check' button will be placed next to 'Clone' for existing passive items and LLD rules
  • It sends a JSON request and then redirects to the list of items/LLD rules

Mass operations

  • A new 'Force check' operation will be placed next to 'Disable selected' for items and LLD rules
  • It sends a JSON request and then redirects to the list of items/LLD rules

Messages

 Successful operation: 'Force check is queued for execution'
 Error message: 'Cannot perform force check: *error message*'

Zabbix server

Trapper

Zabbix trapper should accept a new request from the Zabbix frontend:

{
    "request":"task.create",
    "sid":"session ID",
    "data":[
        {
            "type":0,
            "object":0,
            "objectid":<itemid1>
        },
        {
            "type":0,
            "object":0,
            "objectid":<itemid2>
        }, ...
    ]
}
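As an illustration only, the request above could be assembled as follows. This is a minimal sketch: the function name build_task_create_request and its parameters are hypothetical, and the numeric constants are taken from the 'task' table description in this document.

```python
import json

# Values taken from the 'task' table description in this specification.
TASK_TYPE_FORCECHECK = 0
TASK_OBJECT_ITEM = 0
TASK_OBJECT_LLDRULE = 1


def build_task_create_request(sid, objectids, obj=TASK_OBJECT_ITEM):
    """Return the JSON text of a task.create request (hypothetical helper).

    One entry is added to "data" for every object ID.
    """
    data = [
        {"type": TASK_TYPE_FORCECHECK, "object": obj, "objectid": objectid}
        for objectid in objectids
    ]
    return json.dumps({"request": "task.create", "sid": sid, "data": data})
```

Calling build_task_create_request("session ID", [itemid1, itemid2]) yields a request with two entries in "data", one per item.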

Trapper will validate:

  • general JSON formatting problems
 Invalid JSON request: *JSON error*
  • session ID
 Permission denied.
  • JSON format: mandatory fields
 Invalid task execution request: *JSON error* 
  • 'type' and 'object' values
 Invalid task type.
 Invalid object for this task type.

The trapper will return 'response' set to 'success' if the operation was successful, or 'response' set to 'failed' with the error message in the 'info' field if any validation fails.
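The validation order and the response format can be sketched as below. This is not the trapper implementation (the trapper is part of the server); it is a Python illustration under assumptions: the function names, the valid_sessions set standing in for the session lookup, and the detail text after "Invalid task execution request:" are all hypothetical. Only the fixed error messages come from the list above.

```python
import json

TASK_TYPE_FORCECHECK = 0
# Objects accepted for each task type: item (0) and LLD rule (1).
VALID_OBJECTS = {TASK_TYPE_FORCECHECK: {0, 1}}
MANDATORY_FIELDS = ("type", "object", "objectid")


def failed(info):
    """Build a 'failed' response carrying the error message in 'info'."""
    return {"response": "failed", "info": info}


def validate_task_request(raw, valid_sessions):
    """Validate a raw task.create request and return a response dictionary.

    valid_sessions is an assumed stand-in for the real session lookup.
    """
    try:
        request = json.loads(raw)
    except json.JSONDecodeError as exc:
        return failed(f"Invalid JSON request: {exc}")
    if not isinstance(request, dict):
        return failed("Invalid task execution request: object expected")

    sid = request.get("sid")
    if not isinstance(sid, str) or sid not in valid_sessions:
        return failed("Permission denied.")

    data = request.get("data")
    if request.get("request") != "task.create" or not isinstance(data, list):
        return failed("Invalid task execution request: missing 'request' or 'data'")

    for entry in data:
        if not isinstance(entry, dict):
            return failed("Invalid task execution request: object expected in 'data'")
        missing = [name for name in MANDATORY_FIELDS if name not in entry]
        if missing:
            return failed(f"Invalid task execution request: missing '{missing[0]}'")
        task_type, obj = entry["type"], entry["object"]
        if not isinstance(task_type, int) or task_type not in VALID_OBJECTS:
            return failed("Invalid task type.")
        if not isinstance(obj, int) or obj not in VALID_OBJECTS[task_type]:
            return failed("Invalid object for this task type.")
    return {"response": "success"}
```

The checks run in the order listed above, so a request with both a bad session ID and a bad 'type' is rejected with 'Permission denied.'.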

A new record in table 'task' will be created for each object ID.

Task manager

A new server-side process, the task manager, will be introduced. Only one such process will be started, and it will support self-monitoring.

The process will periodically (every 5 seconds, hardcoded) select all new tasks (TASK_STATE_NEW) with a taskid greater than the last selected taskid, ordered by 'ts_created'. The tasks will then be processed in bulk, 1000 per round. For each object ID it will:

  • validate user permissions
 No permissions to perform this task.
  • validate 'type' and 'object'
 Invalid task type.
 Invalid object for this task type.
  • validate existence of itemid
 Item does not exist.
  • if item or LLD rule is not passive it will be ignored
 Item must be passive for this task.
  • update configuration cache with fresh data from the database (for this specific item/LLD rule only, no full update!) and set 'nextcheck' to 0
  • task state should be updated to TASK_STATE_COMPLETED or TASK_STATE_FAILED with 'error' set
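One processing pass over the steps above can be sketched in Python with in-memory data. Everything structural here is an assumption made for illustration: tasks are dictionaries, the 'items' dictionary stands in for the configuration cache, and 'permitted' is a set of (userid, objectid) pairs replacing the real permission check. The error messages and state constants are the ones named in this specification.

```python
TASK_STATE_NEW, TASK_STATE_COMPLETED, TASK_STATE_FAILED = 0, 1, 2
TASK_TYPE_FORCECHECK = 0
TASK_OBJECTS = {0, 1}  # item, LLD rule
BATCH_SIZE = 1000      # tasks processed per round


def check_task(task, items, permitted):
    """Return an error message for a task that must fail, else None."""
    if (task["userid"], task["objectid"]) not in permitted:
        return "No permissions to perform this task."
    if task["type"] != TASK_TYPE_FORCECHECK:
        return "Invalid task type."
    if task["object"] not in TASK_OBJECTS:
        return "Invalid object for this task type."
    item = items.get(task["objectid"])
    if item is None:
        return "Item does not exist."
    if not item["passive"]:
        return "Item must be passive for this task."
    return None


def process_new_tasks(tasks, items, permitted, last_taskid):
    """Process all new tasks above last_taskid; return the new last_taskid."""
    selected = sorted(
        (t for t in tasks
         if t["state"] == TASK_STATE_NEW and t["taskid"] > last_taskid),
        key=lambda t: t["ts_created"],
    )
    for start in range(0, len(selected), BATCH_SIZE):
        for task in selected[start:start + BATCH_SIZE]:
            error = check_task(task, items, permitted)
            if error is None:
                # Stand-in for the cache refresh: force an immediate check.
                items[task["objectid"]]["nextcheck"] = 0
                task["state"] = TASK_STATE_COMPLETED
            else:
                task["state"] = TASK_STATE_FAILED
                task["error"] = error
            last_taskid = max(last_taskid, task["taskid"])
    return last_taskid
```

In this sketch a task for a non-passive item is marked as failed with the 'Item must be passive for this task.' error, and the item's 'nextcheck' is left untouched.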

The Zabbix housekeeper will be extended to remove completed or failed tasks (TASK_STATE_COMPLETED or TASK_STATE_FAILED) older than 24 hours (task.ts_created < now-24h) from the table 'task'.
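The cleanup condition can be illustrated with a small Python sketch against an in-memory SQLite database. The cut-down table below is an assumption made for the example: it keeps only three columns and stores 'state' as an integer. The function names are hypothetical.

```python
import sqlite3
import time

TASK_STATE_COMPLETED, TASK_STATE_FAILED = 1, 2
MAX_TASK_AGE = 24 * 60 * 60  # 24 hours, in seconds


def create_task_table(conn):
    """Create a cut-down 'task' table with only the columns used here."""
    conn.execute(
        "CREATE TABLE task ("
        "taskid INTEGER PRIMARY KEY, "
        "state INTEGER NOT NULL, "
        "ts_created INTEGER NOT NULL)"
    )


def remove_old_tasks(conn, now=None):
    """Delete completed or failed tasks older than 24 hours.

    Returns the number of removed rows. New tasks are never removed.
    """
    now = int(time.time()) if now is None else now
    cursor = conn.execute(
        "DELETE FROM task WHERE state IN (?, ?) AND ts_created < ?",
        (TASK_STATE_COMPLETED, TASK_STATE_FAILED, now - MAX_TASK_AGE),
    )
    conn.commit()
    return cursor.rowcount
```

The 'now' parameter exists only so that the cutoff can be fixed when trying the sketch out.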

An additional value for the second parameter of the self-monitoring item will be introduced:

 zabbix[process,task manager,mode,state]

The following proctitle strings will be used:

 task manager #1 [processed %d tasks in %f sec, processing tasks]
 task manager #1 [processed %d tasks in %f sec, idle 1 sec]

Task processing

Jobs

A job is a group of related tasks that can be either processed or sent together.

A job has the following properties:

  • hostid – the target host, 0 if the job must be processed locally
  • active – flag specifying if the job is active or passive
  • proxyid – the target proxy, 0 if the job must be processed on server
  • time_updated – the timestamp when the job was created or updated (a new task was added)
  • tasks[] – a list of the tasks to process

The following tables describe job properties depending on the tasks.

On server:

 Task description                                  | hostid   | active | proxyid
 Force passive checks monitored directly by server | 0        | no     | 0
 Force active checks monitored directly by server  | <hostid> | yes    | 0
 Force passive checks monitored by passive proxy   | <hostid> | no     | <proxyid>
 Force active checks monitored by passive proxy    | <hostid> | no     | <proxyid>
 Force passive checks monitored by active proxy    | <hostid> | yes    | <proxyid>
 Force active checks monitored by active proxy     | <hostid> | yes    | <proxyid>

On proxy:

 Task description     | hostid   | active | proxyid
 Force passive checks | 0        | no     | <proxyid>
 Force active checks  | <hostid> | yes    | <proxyid>
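The two tables can be condensed into a pair of small functions. This is only a sketch: the names JobKey, server_job_key and proxy_job_key and their parameters are hypothetical, and 'active' is modelled as a boolean (yes = True, no = False).

```python
from typing import NamedTuple


class JobKey(NamedTuple):
    """The three job properties that depend on the kind of task."""
    hostid: int
    active: bool
    proxyid: int


def server_job_key(hostid, check_is_active, proxyid=0, proxy_is_active=False):
    """Job properties on the server (first table).

    proxyid is 0 when the host is monitored directly by the server.
    """
    if proxyid == 0:
        # Passive checks are processed locally, so hostid is 0.
        return JobKey(hostid if check_is_active else 0, check_is_active, 0)
    # For proxied hosts the 'active' flag follows the proxy mode.
    return JobKey(hostid, proxy_is_active, proxyid)


def proxy_job_key(hostid, check_is_active, proxyid):
    """Job properties on a proxy (second table)."""
    return JobKey(hostid if check_is_active else 0, check_is_active, proxyid)
```

For example, server_job_key(7, check_is_active=True, proxyid=3, proxy_is_active=False) corresponds to the row "Force active checks monitored by passive proxy" and gives hostid 7, active no, proxyid 3.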

Task Manager

Task manager is a server and proxy process that handles job queue synchronization and local task processing. Every 5 seconds (hardcoded period) task manager does the following:

  1. read new unfinished tasks from database, validate, group by jobs and update pending job queue
  2. update/remove tasks from finished job queue
  3. process local jobs
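Step 1, grouping validated tasks into jobs, might look like the sketch below. It assumes that the pending job queue is a dictionary keyed by (hostid, active, proxyid); that layout, the Job class and the add_task function are illustrative choices, not part of this specification.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """A group of related tasks, with the properties listed under 'Jobs'."""
    hostid: int
    active: bool
    proxyid: int
    time_updated: int
    tasks: list = field(default_factory=list)


def add_task(pending, key, task, now):
    """Add a task to the pending job identified by key.

    key is a (hostid, active, proxyid) tuple. The job is created on first
    use, and time_updated is refreshed whenever a task is added.
    """
    job = pending.get(key)
    if job is None:
        job = pending[key] = Job(*key, time_updated=now)
    job.tasks.append(task)
    job.time_updated = now
    return job
```

Tasks with the same key end up in the same job, so they can later be processed or sent together.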

Job Processing

Depending on its tasks, a job can be processed by the server, a proxy or an agent. Before a job is processed it is removed from the pending jobs queue, and after processing it is added to the finished jobs queue.

Server
Force passive checks monitored by server
Passive checks that are monitored directly by the server are processed by the task manager itself.
Force active checks monitored by server
Active agents request tasks from the server at a configurable frequency. Upon receiving a job request, the trapper process checks the pending jobs queue for jobs of that host. If any are found, their tasks are returned to the active agent, which resets the next check values of the requested items.
Force active/passive checks monitored by passive proxy
When processing a proxy, the proxypoller process checks the pending jobs queue for jobs issued to hosts monitored by this proxy. If any are found, their tasks are sent to the proxy, which stores them in its database for the proxy task manager to process.
Force active/passive checks monitored by active proxy
An active proxy requests tasks from the server at a configurable frequency. Upon receiving tasks, the active proxy stores them in its database for the proxy task manager to process.
Proxy
Force passive checks monitored by proxy
Passive checks are processed by the task manager itself.
Force active checks monitored by proxy
Active agents request tasks from the proxy at a configurable frequency. Upon receiving a job request, the trapper process checks the pending jobs queue for jobs of that host. If any are found, their tasks are returned to the active agent, which resets the next check values of the requested items.
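The queue lookups described above can be sketched as follows. The data layout is assumed for illustration: the pending queue is a dictionary keyed by (hostid, active, proxyid), each job is a dictionary with a 'tasks' list, and the finished queue is a plain list. Handing the tasks over is treated as "processing" the job here, which is a simplification.

```python
def take_host_jobs(pending, finished, hostid, proxyid=0):
    """Serve a job request from an active agent (hypothetical helper).

    Moves the pending active job of the host to the finished queue and
    returns its tasks; returns an empty list when nothing is pending.
    """
    job = pending.pop((hostid, True, proxyid), None)
    if job is None:
        return []
    finished.append(job)
    return list(job["tasks"])


def take_proxy_jobs(pending, finished, proxyid):
    """Collect the tasks of every pending job addressed to one proxy.

    Used on the server both when polling a passive proxy and when an
    active proxy asks for tasks.
    """
    tasks = []
    for key in [k for k in pending if k[2] == proxyid]:
        job = pending.pop(key)
        finished.append(job)
        tasks.extend(job["tasks"])
    return tasks
```

A second request from the same host returns nothing until new tasks are queued, because the job has already left the pending queue.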

Configuration update

Ideally, before force checks are processed, the configuration cache of the server (and, if necessary, of the agent or proxy) must be updated from the database. However, that would be quite a complicated process, so maybe only a 'lite' update will be performed (if any). The 'lite' update would only update the related items/host and abort in case of any conflicts (items.key_, host.host).

Translation strings

  • Force check

Database changes

New table 'task':

 TABLE|task|taskid|0
 FIELD           |taskid         |t_id           |       |NOT NULL       |0                   # unique task ID
 FIELD           |userid         |t_id           |       |NOT NULL       |0 |1 |users         # user ID
 FIELD           |type           |t_integer      |'0'    |NOT NULL       |0 # task type: TASK_TYPE_FORCECHECK (0) - force item or LLD rule check
 FIELD           |state          |t_varchar(64)  |''     |NOT NULL       |0  # TASK_STATE_NEW (0) - new, TASK_STATE_COMPLETED (1) - completed, TASK_STATE_FAILED (2) - failed
 FIELD           |ts_created     |t_time         |'0'    |NOT NULL       |0  # time stamp when task is created
 FIELD           |object         |t_integer      |'0'    |NOT NULL       |0  # TASK_OBJECT_ITEM (0) - item, TASK_OBJECT_LLDRULE (1) - LLD rule
 FIELD           |objectid       |t_id           |'0'    |NOT NULL       |0  # item ID for TASK_OBJECT_ITEM, LLD rule ID for TASK_OBJECT_LLDRULE
 FIELD           |error          |t_varchar(2048)|''     |NOT NULL       |0  # error message
 INDEX           |1              |state, ts_created

Existing template 'Template App Zabbix Server' (including items, triggers, graphs and screens) must be enhanced to monitor 'task manager'.

Documentation

To be discussed

  • Support of this functionality for passive items monitored by proxies
  • Ability to "Force check" an item from "Latest data" page

Also discussed

  • Support of web scenarios is out of scope
  • Decided not to split table 'task' into several tables
  • Decided not to introduce number of retries
  • Decided not to validate duplicated and extra fields in incoming JSON; API validation will be reused in the future
  • There will be no new task.type for LLD rules
  • Decided to select data every 5 seconds, no performance optimization at this moment

ChangeLog

  • N/A