
Some principles and practices I consider useful (not only) when using Zabbix at larger scale


Recently I was asked:

“Can you show us the best way to start monitoring our applications with Zabbix?”

Well, unfortunately I don’t know the best way to start monitoring an application with Zabbix, as this always depends on the environment and the use case. But what I can do is provide some principles and practices I consider useful and valuable, and not only to newcomers, by the way.

In short

  1. Start from scratch and do it yourself.
  2. Read the complete online documentation.
  3. Be part of the community.
  4. Expect to go through several iterations of rethinking and redesign.
  5. Make use of Zabbix templates.
  6. Use Zabbix user macros instead of constants.
  7. Choose reasonable Zabbix item update intervals.
  8. Develop commonly agreed naming conventions and workflows for Zabbix.
  9. Get along with as few static Zabbix actions as possible.
  10. Define the use case of each Zabbix trigger severity.
  11. Measure everything that might be useful.
  12. Measure no single metric you don’t completely understand.
  13. Get familiar with the way Zabbix works and communicates.
  14. Have fun! ;-)

The long story

First I’d like to emphasize one aspect of Zabbix that significantly separates it from many other network monitoring systems (NMS). There are no plug-ins tailored to devices or applications that you just drop in to get that thing monitored. At first glance this seems to be a shortcoming compared to the countless plug-ins available for other NMS.

Zabbix follows a different concept by automating and above all generalizing monitoring as much as possible. Data presentation, evaluation, correlation, escalation, alerting,… it’s all the same for anything. In fact the only custom part is how to connect to a monitored entity and how to obtain data from it.

Even though there are plenty of Zabbix templates out there, start from scratch and do it yourself! By doing this you make sure that you really understand what you do and why you do it. The art of monitoring is about identifying the right data to obtain and making meaningful evaluations based on it. Ideally the NMS is just a toolbox to enable this. Ready-to-use extensions, possibly not even customizable, suggest a quick win. I don’t want to say these are bad in general. But the safe feeling they provide is often just a welcome illusion.

You already have a good idea of what you can do with Zabbix? Great! Still, I strongly encourage you to read the complete online documentation at least once. The aim is neither to fully understand nor to remember every detail. If you come to a topic that is absolutely not of interest to you, then skip it. Either way, by walking through the documentation you may learn about options and possibilities you would never have asked about or searched for. In the end you’ll get a comprehensive impression of Zabbix, and whenever you’re faced with something new you’ll hopefully recall: “Hey, wasn’t there something related in the docs?” Believe me, it pays off in the end!

Be part of the community and start by reading "Getting help and being part of the community". This point is often underestimated, especially by those who have not had any serious contact with strong FOSS communities yet. Sign up at least for the Zabbix forum and the Zabbix Support System. Consider signing up for the Zabbix wiki as well. Join the #zabbix IRC channel on freenode. There you’ll meet upstream developers as well as experienced and ambitious community members. Don’t hesitate to ask about anything you didn’t find understandably answered in the documentation or the forum. Use IRC for instant and quick dialog. Consider the forum for comprehensive topics or when you expect complex discussions.

As soon as you feel more confident with Zabbix, contribute to the community. You don’t need to spend much time. Answer a question now and then, or just add your two cents in a comment. Finding out the answer to a question you can’t answer yet is a great way to learn. By the way, the source code is a reliable source for answers.

Try to publish anything that might be beneficial to someone else too. Create bug reports for everything that appears wrong or incomplete. Create a feature request for anything that Zabbix lacks and you have a use case for. Before creating a bug report or feature request in the Zabbix Support System, make sure that no such report exists already, of course.

There are usually several different ways to implement the same requirement in Zabbix, and often not all of them are obvious at the beginning. This may start with identifying which data to obtain and how it becomes available to Zabbix, may continue with proper correlation or dependencies, and may end with concerns of scalability and automation. Expect to go through several iterations of rethinking and redesign. Start quickly with the most promising implementation, as soon as you have a good enough idea of the service or application to monitor. Don’t hesitate to change or redo anything that promises improvement, irrespective of the effort already invested. Never expect to achieve a final state. Anything can and should be subject to change through continual quality checks and improvement, although it’s also likely that you reach a stable state after a certain extent of improvement.

Make use of Zabbix templates. There is actually no reason to create anything on Zabbix host level, except Zabbix user macros. By nesting templates you can separate common elements from specific ones and merge them later almost arbitrarily. Consider using Zabbix user macros instead of constants wherever it appears appropriate. Zabbix user macros can, for instance, be very useful in connection with (nested) Zabbix templates, as they allow you to modify elements (Zabbix items, Zabbix triggers, …) on Zabbix template or Zabbix host level without touching the element itself. But avoid nesting Zabbix templates with different Read-Write permissions due to ZBX-8018.
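To illustrate why user macros beat constants, here is a minimal Python sketch of the idea, not of Zabbix's actual implementation. It assumes a simplified lookup order (host, then template, then global) and collapses nested templates into a single level. The macro name `{$CPU_LOAD_MAX}` and all values are made up for the example.

```python
import re

# User macros look like {$NAME}; names use upper-case letters, digits, "_" and ".".
MACRO_PATTERN = re.compile(r"\{\$[A-Z0-9_.]+\}")


def resolve_macro(name, host_macros, template_macros, global_macros):
    """Return the macro value from the most specific level that defines it."""
    for level in (host_macros, template_macros, global_macros):
        if name in level:
            return level[name]
    return None


def expand(text, host_macros, template_macros, global_macros):
    """Replace every {$MACRO} in text; unknown macros are left untouched."""
    def substitute(match):
        value = resolve_macro(match.group(0), host_macros, template_macros, global_macros)
        return match.group(0) if value is None else value
    return MACRO_PATTERN.sub(substitute, text)


# Hypothetical macro name and values, for illustration only.
template_macros = {"{$CPU_LOAD_MAX}": "5"}   # default shipped with the template
host_macros = {"{$CPU_LOAD_MAX}": "12"}      # override for one busy host

threshold_text = "load above {$CPU_LOAD_MAX}"
print(expand(threshold_text, {}, template_macros, {}))           # uses the template default
print(expand(threshold_text, host_macros, template_macros, {}))  # uses the host override
```

The trigger defined on the template never changes. Only the macro value differs per host, which is what keeps the host level free of anything but user macros.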

As long as an RDBMS is used for Zabbix item history storage, one should choose reasonable Zabbix item update intervals. For instance, it is very unlikely that a Zabbix item needs to be checked every 30 seconds when the human reaction to a Zabbix trigger event will take 20 minutes.
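A bit of back-of-the-envelope arithmetic shows what the update interval means for the history storage. The sketch below assumes that every check stores exactly one history row; the item count of 10,000 is purely illustrative.

```python
SECONDS_PER_DAY = 86400


def new_values_per_second(item_count, interval_seconds):
    """Average number of values arriving per second for the given items."""
    return item_count / interval_seconds


def history_rows_per_day(item_count, interval_seconds):
    """History rows written per day, assuming one row per check."""
    return item_count * SECONDS_PER_DAY / interval_seconds


# Hypothetical installation with 10,000 items, compared at two intervals.
for interval in (30, 300):
    nvps = new_values_per_second(10_000, interval)
    rows = history_rows_per_day(10_000, interval)
    print(f"interval {interval:>3}s: {nvps:7.1f} values/s, {rows:,.0f} rows/day")
```

Going from 30 seconds to 5 minutes cuts the write load by a factor of ten, while a reaction that takes 20 minutes anyway is hardly delayed.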

Possibly the most important, but also the most challenging, task is to develop commonly agreed naming conventions and workflows for Zabbix. By design, Zabbix enforces the separation of monitoring scenarios into common entities like hosts, items, triggers, events, actions, etc. These abstraction layers make it possible to combine, correlate or automate on the respective levels without considering the technology behind any monitored entity.

However, having great flexibility and many ways to use it can be a blessing or a curse. For instance, it’s best practice to get along with as few static Zabbix actions as possible, designed so that they pick up new Zabbix triggers without any change to the actions themselves.

To achieve this, one could think of having Zabbix actions consider events from Zabbix triggers...:

  • ...based on Zabbix items belonging to a certain Zabbix item application.
  • ...based on Zabbix items belonging to a Zabbix host member of a certain Zabbix host group.
  • ...with a certain key word or code in the Zabbix trigger name.
  • ...with a certain Trigger severity.
  • ...matching one of the other possibilities or a combination of these.

The chosen concept of Zabbix actions will likely affect the way Zabbix items, Zabbix triggers or Zabbix host groups have to be defined and used.
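The options listed above can be sketched as a small matching model. This is not Zabbix code or its API, just a hedged illustration of one static rule catching triggers that are created later. The class names, the host group "Database servers" and the "[DB]" name prefix are invented for the example; only the six severity names are Zabbix's own.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Zabbix trigger severities, lowest to highest.
SEVERITIES = ["Not classified", "Information", "Warning", "Average", "High", "Disaster"]


@dataclass
class TriggerEvent:
    trigger_name: str
    severity: str
    host_groups: Set[str] = field(default_factory=set)
    applications: Set[str] = field(default_factory=set)


@dataclass
class ActionRule:
    """A static rule; every condition that is set must hold (logical AND)."""
    name: str
    host_group: Optional[str] = None
    application: Optional[str] = None
    name_keyword: Optional[str] = None
    min_severity: Optional[str] = None

    def matches(self, event: TriggerEvent) -> bool:
        if self.host_group is not None and self.host_group not in event.host_groups:
            return False
        if self.application is not None and self.application not in event.applications:
            return False
        if self.name_keyword is not None and self.name_keyword not in event.trigger_name:
            return False
        if self.min_severity is not None:
            if SEVERITIES.index(event.severity) < SEVERITIES.index(self.min_severity):
                return False
        return True


# Hypothetical rule: notify the DBA team about serious database problems.
dba_rule = ActionRule("notify-dba", host_group="Database servers",
                      name_keyword="[DB]", min_severity="Average")

# A trigger added long after the rule was written is still picked up,
# as long as it follows the agreed naming and grouping conventions.
new_event = TriggerEvent("[DB] Replication lag too high", "High", {"Database servers"})
minor_event = TriggerEvent("[DB] Slow query count rising", "Warning", {"Database servers"})

print(dba_rule.matches(new_event))    # True
print(dba_rule.matches(minor_event))  # False, severity is below "Average"
```

The rule only keeps working if the conventions behind it are respected, which is why the action concept and the naming conventions have to be designed together.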

It should actually be obvious, but since it’s unfortunately quite often ignored, I suggest defining the use case of each Zabbix trigger severity. These definitions then have to be respected by all Zabbix (Super) Admins when creating Zabbix triggers.
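Such a definition can be as simple as a written table. As a sketch, here it is as a Python mapping together with a check that no severity was left undefined. The severity names are those used by Zabbix; the use cases are a made-up example policy, not a recommendation.

```python
# Zabbix trigger severities, lowest to highest.
SEVERITIES = ["Not classified", "Information", "Warning", "Average", "High", "Disaster"]

# Hypothetical policy; every team has to agree on its own wording.
SEVERITY_POLICY = {
    "Not classified": "Do not use; every trigger must get an explicit severity.",
    "Information": "Worth recording, no reaction expected.",
    "Warning": "Look at it during working hours.",
    "Average": "Ticket for the owning team, handle within one working day.",
    "High": "Service is degraded, notify on-call.",
    "Disaster": "Service is down, page on-call immediately.",
}


def undefined_severities(policy):
    """Return the severities that have no (non-empty) use case in the policy."""
    return [s for s in SEVERITIES if not policy.get(s, "").strip()]


missing = undefined_severities(SEVERITY_POLICY)
print("Policy complete" if not missing else f"Missing definitions: {missing}")
```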

Measure everything that might be useful, regardless of whether it is used in Zabbix triggers or not. Consider performance and resource counters, health indicators, all kinds of functional and operational status, actually anything that appears informative. It often turns out only once an issue occurs that a particular measurement is valuable information. Measure no single metric you don’t completely understand. Monitoring anything one does not understand is likely a waste of resources and, by the way, often belongs to the ‘illusion’ category of the previously mentioned ‘just plug in and get that thing monitored’ practice.

Finally, prepare to fail. Even though there are a lot of validity checks, when it comes to the configuration of entities in Zabbix there are still many variables that can cause a failure. Get familiar with the way Zabbix works and communicates. Learn how to identify configuration issues in the frontend (Status of Host/Item/Trigger/.., Internal events, …), be aware of Zabbix get (zabbix_get), and learn how to save time by testing a single item key with the '-t' argument of the Zabbix agent. Last but not least, remember there are log files for each Zabbix component running as a daemon, which often contain valuable information when something does not work as expected.
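As a small sketch of these two tools, the Python helpers below only build the command lines. They assume the standard `zabbix_agentd` and `zabbix_get` binaries are on the PATH and use the default agent port 10050. The helper names are made up; the item keys `system.cpu.load[all,avg1]` and `agent.ping` are ordinary agent keys chosen for illustration.

```python
import shlex


def agent_test_command(item_key, config_file=None):
    """Command to let the local agent evaluate one item key and print the result."""
    cmd = ["zabbix_agentd"]
    if config_file:
        cmd += ["-c", config_file]  # use a specific agent configuration file
    cmd += ["-t", item_key]
    return cmd


def zabbix_get_command(host, item_key, port=10050):
    """Command to request one item key from a (possibly remote) agent."""
    return ["zabbix_get", "-s", host, "-p", str(port), "-k", item_key]


# Print the commands in a form that can be pasted into a shell.
print(shlex.join(agent_test_command("system.cpu.load[all,avg1]")))
print(shlex.join(zabbix_get_command("127.0.0.1", "agent.ping")))
```

The first command checks a key directly on the monitored machine, without waiting for the server to poll it. The second asks the agent over the network, the same way a passive check would, so it also reveals connectivity and access problems. To actually run one, pass the list to `subprocess.run(cmd, capture_output=True, text=True)`.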

In summary, there is certainly a lot to do and even more to learn when starting with Zabbix. Most of these points are not actually specific to Zabbix and can be considered obvious objectives or common best practices.

In the end, this is of course just my personal view on these things after working with Zabbix for quite some time. Make sure to have fun and stay curious. The rest will then happen automatically, sooner or later ;-)