Monitoring Using SNMP
If you are using a third-party tool to monitor NetGovern functions, we have developed counters that may be useful for you.
While Management Information Bases (MIBs) are available and rich in information, a practical suggestion of what to monitor, and what each value means, is sometimes needed. Below is a list of the most relevant items, their meanings and interpretations, and example graphs that can be generated from them.
Note that baselines vary from system to system, and should be adjusted to fit each particular environment.
The Mail Store Connectivity counters reflect the system's ability to connect to the email store and run jobs that work in your email system, such as archiving. These are binary values, with zero being "bad" and one being "good".
220.127.116.11.4.1.27918.104.22.168.6; Exchange Powershell Access; 0|1
22.214.171.124.4.1.279126.96.36.199.4; Exchange User Access; 0|1
The following counters reflect overall system health:
License State is a binary value with zero being "good" and one being "bad".
Node Type indicates if the server is a master or worker.
Node State is a binary value with zero being "bad" and one being "good". A zero value usually means the NetGovern service is not running.
Database Connectivity is a binary value with zero being "bad" and one being "good". A "bad" value will prevent log and job report access, and means the software cannot reach the database using the defined Data Source Name (DSN).
Load Ratio reflects how busy the system is, from a software perspective. On a master server that has workers, this should be very low. On a worker, the desired value is high, to get the most out of the computing resources on that server.
188.8.131.52.4.1.279184.108.40.206.2; License - State; 0|1
220.127.116.11.4.1.27918.104.22.168.3; Netmail Archive Node Type; String
22.214.171.124.4.1.279126.96.36.199.4; Node - State; 0|1
188.8.131.52.4.1.279184.108.40.206.14; Node - DB Access - State; 0|1
220.127.116.11.4.1.27918.104.22.168.17; Node - Load Ratio - Percentage
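As a minimal sketch of how a monitoring script might interpret the binary health counters above, the following Python snippet lists any counter that reports "bad". It assumes the values have already been polled by your SNMP tool and collected into a dictionary keyed by counter name. The function name and the dictionary layout are illustrative assumptions, not part of NetGovern. Note that License State uses the opposite polarity from the other counters.

```python
# Value that means "good" for each binary counter, as described above.
# License - State is inverted: zero is "good" and one is "bad".
GOOD_VALUE = {
    "License - State": 0,
    "Node - State": 1,
    "Node - DB Access - State": 1,
}


def failing_counters(polled):
    """Return the sorted names of binary health counters that report "bad".

    `polled` is a hypothetical dict of counter name -> polled value.
    Counters missing from `polled` are skipped rather than treated as bad.
    """
    return sorted(
        name
        for name, good in GOOD_VALUE.items()
        if name in polled and int(polled[name]) != good
    )


if __name__ == "__main__":
    sample = {"License - State": 0, "Node - State": 1, "Node - DB Access - State": 0}
    print(failing_counters(sample))
```

Keeping the "good" value per counter in one table avoids scattering the polarity rule through the alerting logic.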
These counters show the count of jobs either running or queued, as well as a total, per server.
The values are less important than the trends. Job counts spike as jobs start, and gradually go down to zero. This cycle should repeat predictably. An erratic, or flat, pattern is worth investigating.
22.214.171.124.4.1.279126.96.36.199.6; Jobs - Queued
188.8.131.52.4.1.279184.108.40.206.7; Jobs - Running
220.127.116.11.4.1.27918.104.22.168.8; Jobs - Total
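Because the trend matters more than any single value, a script could flag a series of job counts that looks suspicious. The sketch below is one simple approach, assuming you already have a list of recent samples of a job counter (for example Jobs - Total) from your monitoring tool. The function name and the two flags are illustrative assumptions. It detects only a flat series or one that never drains to zero, and does not attempt to detect erratic patterns.

```python
def job_pattern_flags(samples):
    """Flag a series of job counts that is flat or never returns to zero.

    `samples` is a hypothetical list of integer job counts, oldest first.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return {
        # Every sample identical: no spike-and-drain cycle is visible.
        "flat": max(samples) == min(samples),
        # Counts never dropped to zero within the sampled window.
        "never_reached_zero": min(samples) > 0,
    }


if __name__ == "__main__":
    print(job_pattern_flags([0, 12, 9, 4, 0, 11, 6, 0]))
    print(job_pattern_flags([5, 5, 5, 5]))
```

The sampling window should span at least one full job cycle, otherwise a healthy system can look as if it never reached zero.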
Most jobs write to disk. These counters represent the system's access to defined locations and devices. They are trinary values, with "0" meaning all are down, "1" meaning some are down, and "2" meaning all are up. If all are down, the problem might be systemic. If only some are down, the problem might be localized.
The system relies on temporary space for most data moves. As such, there needs to be enough temporary space to accommodate the total requirements per move job.
22.214.171.124.4.1.279126.96.36.199.24; Node - Device - Access - State; 0|1|2
188.8.131.52.4.1.279184.108.40.206.23; Node - Location - Access - State; 0|1|2
220.127.116.11.4.1.27918.104.22.168.11; Node - Disk - Free Space - Temp; Used space in GB
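The trinary access states above can be turned into readable alert text with a small lookup. This is a sketch only: the function name is an assumption, the "healthy" hint is an illustrative label, and the value is assumed to have been polled already by your SNMP tool.

```python
# Meaning of each trinary state, with the interpretation given above.
ACCESS_STATES = {
    0: ("all down", "problem might be systemic"),
    1: ("some down", "problem might be localized"),
    2: ("all up", "healthy"),
}


def describe_access_state(value):
    """Return (state, hint) for a polled device or location access value."""
    try:
        return ACCESS_STATES[int(value)]
    except KeyError:
        # Anything other than 0, 1, or 2 is not a documented state.
        raise ValueError(f"unexpected state value: {value!r}") from None


if __name__ == "__main__":
    print(describe_access_state(1))
```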
Users interact with the software through the NetGovern Search interface. Their initial perception is greatly affected by responsiveness, followed by the successful ability to see the requested data.
Search counters reflect both connectivity and timing to the indexing services responsible for handling these queries.
Index Search states are trinary values, with "0" meaning all are down, "1" meaning some are down, and "2" meaning all are up.
Index Admin represents the interface used by jobs, while the Index Search counter represents the user-facing interface.
Index Search timings of under 300ms feel responsive to users.
Search Connectivity represents the current number of user connections across the NetGovern Search interface.
22.214.171.124.4.1.279126.96.36.199.20; Node - Index Search - State; 0|1|2
188.8.131.52.4.1.279184.108.40.206.19; Node - Index Admin - State; 0|1|2
220.127.116.11.4.1.27918.104.22.168.21; Node - Index Search - Time; Time in milliseconds
22.214.171.124.4.1.279126.96.36.199.16; Search Connectivity - State; 0|1|2
188.8.131.52.4.1.279184.108.40.206.220.127.116.11; DP - Connections; Count
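To track how often search feels responsive, a script could compare recent Index Search timings against the 300 ms guideline mentioned above. The sketch below assumes a list of timings in milliseconds already collected by your monitoring tool. The function name and the idea of reporting a share of samples are illustrative assumptions, and the threshold should be adjusted to your own baseline.

```python
RESPONSIVE_MS = 300  # guideline from the text: under 300 ms feels responsive


def share_responsive(timings_ms, threshold_ms=RESPONSIVE_MS):
    """Return the fraction of samples strictly under the threshold.

    `timings_ms` is a hypothetical list of Index Search timings in ms.
    """
    if not timings_ms:
        raise ValueError("need at least one timing sample")
    fast = sum(1 for t in timings_ms if t < threshold_ms)
    return fast / len(timings_ms)


if __name__ == "__main__":
    print(share_responsive([120, 250, 300, 480]))
```

Reporting a share rather than a single reading smooths out one-off slow queries, which fits the earlier note that baselines differ between environments.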