Utilizing the WSO2 Runtime Diagnostic Tool¶
The WSO2 Runtime Diagnostic Tool is a lightweight, easy-to-use tool for generating diagnostic details. It simplifies data collection and minimizes user involvement. The tool can collect data preemptively for certain types of issues, such as out-of-memory (OOM) errors. It also captures significant changes in Passthrough metrics for better insight into specific issues.
The tool can be configured using the Configuration Guide provided below. The output of the tool can be analyzed using the Analysis Guide provided below.
Components¶
There are four main components in the tool:
- Memory Watcher: Monitors the memory usage of the server and executes the configured Action Executors when the memory usage exceeds the threshold.
- CPU Watcher: Monitors the CPU usage of the server and executes the configured Action Executors when the CPU usage exceeds the threshold.
- Traffic Analyzer: Monitors the traffic of the server and generates logs when the traffic pattern changes suddenly and significantly.
- Log Watcher: Monitors the error logs of the server and executes the configured Action Executors when the log pattern matches the configured pattern.
Configuration Guide¶
The tool is packaged inside the product distribution with default configurations. The configurations can be customized based on user requirements.
Server Configurations¶
The table given below describes the server configurations.
Configuration | Description |
---|---|
deployment_toml_path | Path to the deployment.toml file in the WSO2 server. |
logs_directory | Path to the logs directory. |
updates_config_path | Path to the updates config file. |
diagnostic_log_file_path | Path to write the diagnostic log file. |
carbon_log_file_path | Path to the carbon error log (wso2error.log) file. |
process_id_path | Path to the process id file (wso2carbon.pid). |
server_name | Name of the WSO2 server. |
server_version | Version of the WSO2 server. |
Given below is a sample configuration for the WSO2 Micro Integrator.
[server_configuration]
deployment_toml_path = "../conf/deployment.toml"
logs_directory = "../repository/logs"
updates_config_path = "../updates/config.json"
diagnostic_log_file_path = "logs/diagnostics.log"
carbon_log_file_path = "../repository/logs/wso2error.log"
process_id_path = "../wso2carbon.pid"
server_name = "WSO2 Micro Integrator"
server_version = "#.#.#"
Action Executor Configurations¶
Currently, the tool supports the following action executors.
Action Executor | Description |
---|---|
ThreadDumper | Runs the jstack tool to take a thread dump and writes the output to a file. |
MemoryDumper | Takes a heap dump. |
OpenFileFinder | Finds the files opened by the server process and writes the output to a file. |
Netstat | Dumps the network statistics of the server to a file. |
ServerInfo | Dumps the server information such as name, version, etc. |
MetricsSnapshot | Takes a current snapshot of the Passthrough transport metrics in Synapse. |
ThreadDumper¶
Configuration | Description |
---|---|
count | Number of thread dumps to be taken. |
delay | Delay between each thread dump in milliseconds. |
Given below is a sample configuration for the ThreadDumper action executor.
[[action_executor_configuration]]
executor = "ThreadDumper"
count = "5"
delay = "2000"
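To illustrate how `count` and `delay` interact, the following sketch takes a series of thread dumps with a pause between them. It is an approximation of the behavior described above, not the tool's implementation. The function names, the output file naming (`thread-dump-1.txt`, and so on), and the injectable `dump` and `sleep` parameters are hypothetical; only the use of `jstack <pid>` comes from the description of the executor.

```python
import subprocess
import tempfile
import time
from pathlib import Path
from typing import Callable


def run_jstack(pid: int) -> str:
    """Run the JDK's jstack tool against a process and return its output."""
    result = subprocess.run(
        ["jstack", str(pid)], capture_output=True, text=True, check=True
    )
    return result.stdout


def take_thread_dumps(
    pid: int,
    out_dir: Path,
    count: int = 5,
    delay_ms: int = 2000,
    dump: Callable[[int], str] = run_jstack,
    sleep: Callable[[float], None] = time.sleep,
) -> list[Path]:
    """Write `count` thread dumps to out_dir, pausing delay_ms between dumps."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index in range(count):
        if index > 0:
            sleep(delay_ms / 1000)  # the delay is configured in milliseconds
        path = out_dir / f"thread-dump-{index + 1}.txt"  # hypothetical file name
        path.write_text(dump(pid))
        written.append(path)
    return written


# Demo with stand-ins for jstack and sleep, so no JVM is needed to try the loop.
pauses: list[float] = []
with tempfile.TemporaryDirectory() as tmp:
    paths = take_thread_dumps(
        pid=12345,
        out_dir=Path(tmp),
        count=3,
        delay_ms=500,
        dump=lambda pid: f"fake dump for {pid}\n",
        sleep=pauses.append,
    )
    names = [p.name for p in paths]
print(names, pauses)
```

With the sample configuration (`count = "5"`, `delay = "2000"`), a loop like this would produce five dumps spread over roughly eight seconds, which helps show whether a thread stays stuck across dumps.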
MemoryDumper¶
Given below is a sample configuration for the MemoryDumper action executor.
[[action_executor_configuration]]
executor = "MemoryDumper"
OpenFileFinder¶
Given below is a sample configuration for the OpenFileFinder action executor.
[[action_executor_configuration]]
executor = "OpenFileFinder"
Netstat¶
Given below is a sample configuration for the Netstat action executor.
[[action_executor_configuration]]
executor = "Netstat"
command = "netstat -lt"
ServerInfo¶
Given below is a sample configuration for the ServerInfo action executor.
[[action_executor_configuration]]
executor = "ServerInfo"
MetricsSnapshot¶
Given below is a sample configuration for the MetricsSnapshot action executor.
[[action_executor_configuration]]
executor = "MetricsSnapshot"
Watcher Configurations¶
Currently, the tool supports the following watchers.
Watcher | Description |
---|---|
cpu_watcher | Watches the CPU usage of the server. |
memory_watcher | Watches the memory usage of the server. |
log_watcher | Watches the logs for specific error patterns and triggers actions. |
traffic_analyzer | Analyzes the Passthrough server traffic and records it in the diagnostic log file. |
cpu_watcher¶
Configuration | Description |
---|---|
enabled | Whether the watcher is enabled or not. |
threshold | The threshold value for the CPU usage. |
attempts | The number of attempts before triggering the action executors. (This resets every hour.) |
interval | The interval between each check in seconds. |
action_executors | The action executors to be triggered when the threshold is reached. (Comma separated) |
Given below is a sample configuration for cpu_watcher.
[cpu_watcher]
enabled = "true"
threshold = "20"
attempts = "2"
interval = "5"
action_executors = "ThreadDumper,MetricsSnapshot,ServerInfo"
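The interplay of `threshold`, `attempts`, and the hourly reset can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual logic: it assumes that the usage is a percentage, that any breach within the current hour counts toward `attempts` (the documentation does not say whether breaches must be consecutive), and that the executors fire on the breach that reaches the `attempts` count. The class name `ThresholdWatcher` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThresholdWatcher:
    threshold: float            # usage value (assumed percentage) that counts as a breach
    attempts: int               # breaches needed before the executors are triggered
    reset_after: float = 3600   # seconds; the attempt counter resets every hour
    breaches: int = 0
    window_start: float = 0.0

    def observe(self, usage: float, now: float) -> bool:
        """Record one sample; return True when the action executors should run."""
        if now - self.window_start >= self.reset_after:
            # A new hour has started: forget the earlier breaches.
            self.breaches = 0
            self.window_start = now
        if usage <= self.threshold:
            return False
        self.breaches += 1
        if self.breaches >= self.attempts:
            self.breaches = 0
            return True
        return False


# Samples taken every 5 seconds (the configured interval), as (time, usage) pairs.
watcher = ThresholdWatcher(threshold=20, attempts=2)
samples = [(0, 10), (5, 35), (10, 50), (15, 60)]
fired = [watcher.observe(usage, now=t) for t, usage in samples]
print(fired)
```

Requiring more than one breach before acting avoids collecting diagnostics for a single short-lived spike.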
memory_watcher¶
Configuration | Description |
---|---|
enabled | Whether the watcher is enabled or not. |
threshold | The threshold value for the memory usage. |
attempts | The number of attempts before triggering the action executors. (This resets every hour.) |
interval | The interval between each check in seconds. |
action_executors | The action executors to be triggered when the threshold is reached. (Comma separated) |
Given below is a sample configuration for memory_watcher.
[memory_watcher]
enabled = "true"
threshold = "30"
attempts = "2"
interval = "5"
action_executors = "ThreadDumper,MetricsSnapshot,ServerInfo"
log_watcher¶
Configuration | Description |
---|---|
enabled | Whether the watcher is enabled or not. |
interval | The interval between each check in seconds. |
Given below is a sample configuration for log_watcher.
[log_watcher]
enabled = "true"
interval = "0.1"
Log error patterns¶
Configuration | Description |
---|---|
regex | The regex pattern to match against the error log lines. |
executors | The action executors to be triggered when a log line matches the regex pattern. (Comma separated) |
reload_time | The time that must elapse before another error log matching the same regex pattern is processed. Consecutive error logs that match the pattern are not processed again until the reload_time has elapsed since the last processed match. |
Given below is a sample configuration.
[[log_pattern]]
regex = "(.*)org.apache.synapse.transport.passthru(.*)"
executors = "MetricsSnapshot,Netstat,OpenFileFinder,ThreadDumper,ServerInfo"
reload_time = "30"
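The effect of `reload_time` can be pictured with the following sketch, which suppresses repeated matches of the same pattern. It is an illustration only: the class name `LogPattern` is hypothetical, `reload_time` is assumed to be in seconds (the unit is not stated above), and the log line in the demo is made up.

```python
import re


class LogPattern:
    """One [[log_pattern]] entry: a regex, its executors, and a reload time."""

    def __init__(self, regex: str, executors: str, reload_time: float):
        self.pattern = re.compile(regex)
        self.executors = [name.strip() for name in executors.split(",")]
        self.reload_time = reload_time  # assumed to be seconds
        self.last_processed = None      # time of the last processed match

    def executors_for(self, line: str, now: float) -> list[str]:
        """Return the executors to run for this line, or [] if none apply."""
        if not self.pattern.search(line):
            return []
        if (
            self.last_processed is not None
            and now - self.last_processed < self.reload_time
        ):
            return []  # a matching line was processed recently; skip this one
        self.last_processed = now
        return self.executors


pattern = LogPattern(
    regex="(.*)org.apache.synapse.transport.passthru(.*)",
    executors="MetricsSnapshot,Netstat,OpenFileFinder,ThreadDumper,ServerInfo",
    reload_time=30,
)
# Made-up log line, used only to exercise the pattern.
line = "ERROR {org.apache.synapse.transport.passthru.SourceHandler} - sample error"
results = [len(pattern.executors_for(line, now=t)) for t in (0, 10, 29, 30)]
print(results)
```

In this sketch, only the first match and the match that arrives once the reload time has elapsed trigger the five executors. The matches in between are ignored, which prevents a burst of identical errors from producing a diagnostic dump per log line.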
Traffic Analyzer Configurations¶
Configuration | Description |
---|---|
last_second_requests_enabled | Whether the last second requests watcher is enabled or not. |
last_second_requests_windows_size | The window size for the last second requests watcher. |
last_second_requests_delay | The delay for the last second requests watcher. |
last_second_requests_interval | The interval for the last second requests watcher. |
last_fifteen_seconds_requests_enabled | Whether the last fifteen seconds requests watcher is enabled or not. |
last_fifteen_seconds_requests_window_size | The window size for the last fifteen seconds requests watcher. |
last_fifteen_seconds_requests_delay | The delay for the last fifteen seconds requests watcher. |
last_fifteen_seconds_requests_interval | The interval for the last fifteen seconds requests watcher. |
last_minutes_requests_enabled | Whether the last minutes requests watcher is enabled or not. |
last_minutes_requests_window_size | The window size for the last minutes requests watcher. |
last_minutes_requests_delay | The delay for the last minutes requests watcher. |
last_minutes_requests_interval | The interval for the last minutes requests watcher. |
notify_interval | The interval for the traffic analyzer to notify the user. |
Given below is a sample configuration for the traffic analyzer.
[traffic_analyzer]
last_second_requests_enabled = "false"
last_second_requests_windows_size = "300"
last_second_requests_delay = "60"
last_second_requests_interval = "1"
last_fifteen_seconds_requests_enabled = "true"
last_fifteen_seconds_requests_window_size = "100"
last_fifteen_seconds_requests_delay = "4"
last_fifteen_seconds_requests_interval = "15"
last_minutes_requests_enabled = "true"
last_minutes_requests_window_size = "100"
last_minutes_requests_delay = "1"
last_minutes_requests_interval = "60"
notify_interval = "300"
Post Action Executors¶
Zip File Configurations¶
Configuration | Description |
---|---|
output_directory | Directory to store the zip files. |
max_count | Maximum number of zip files to maintain. When the count exceeds this value, the oldest files are deleted. |
Given below is a sample configuration.
[zip_file_configuration]
output_directory = "data"
max_count = "50"
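The retention rule for `max_count` can be sketched as follows. This is a minimal illustration, not the tool's code: the function name `prune_zip_files` is hypothetical, and it assumes that the age of a file is determined by its modification time.

```python
import os
import tempfile
from pathlib import Path


def prune_zip_files(output_directory: Path, max_count: int) -> list[Path]:
    """Delete the oldest .zip files so that at most max_count remain.

    Returns the paths that were deleted, oldest first.
    """
    # Sort by modification time, oldest first (assumed definition of "older").
    zips = sorted(output_directory.glob("*.zip"), key=lambda p: p.stat().st_mtime)
    excess = zips[: max(0, len(zips) - max_count)]
    for path in excess:
        path.unlink()
    return excess


# Demo: create five empty zip files with distinct modification times, keep three.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    for i in range(5):
        f = data / f"dump-{i}.zip"
        f.write_bytes(b"")
        os.utime(f, (1_000 + i, 1_000 + i))  # set access and modification times
    deleted = [p.name for p in prune_zip_files(data, max_count=3)]
    remaining = sorted(p.name for p in data.glob("*.zip"))
print(deleted, remaining)
```

Because heap dumps can make individual zip files large, a lower `max_count` may be appropriate on hosts with limited disk space.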
FTP Configurations¶
Configuration | Description |
---|---|
enabled | Whether the FTP is enabled or not. |
host | The FTP host. |
port | The FTP port. |
username | The FTP username. |
password | The FTP password. |
directory | The FTP directory. |
Given below is a sample configuration.
[ftp_configuration]
enabled = "true"
host = "ftp.example.com"
port = "21"
username = "user"
password = "password"
directory = "diagnostics"
SFTP Configurations¶
Configuration | Description |
---|---|
enabled | Whether the SFTP is enabled or not. |
host | The SFTP host. |
port | The SFTP port. |
username | The SFTP username. |
password | The SFTP password. |
directory | The SFTP directory. |
Given below is a sample configuration.
[sftp_configuration]
enabled = "true"
host = "sftp.example.com"
port = "22"
username = "user"
password = "password"
directory = "diagnostics"
Log4j2 Configurations¶
The log4j2.properties file can be used to configure the logging level of the tool. The default log level is set to INFO. The log4j2.properties file can be found in the conf directory.
Analysing Runtime Issues with the WSO2 Runtime Diagnostic Tool¶
Data¶
The zip files are generated in the <WSO2_HOME>/diagnostics-tool/data directory. Each zip file is named <processId>-<timestamp>.zip. The zip file contains the following files:
- deployment.toml: The deployment.toml file of the server, which contains the configurations.
- diagnostics.log: The log file of the diagnostics tool, which contains logs related to the traffic pattern of the server. The logs are explained in the Diagnostics Log section.
- log.txt: The log line that triggered the Action Executors.
- logs.zip: The log directory of the server in zip format. This may contain the heap dump file if it is generated.
- lsof-output.txt [Optional]: The output of the lsof command which contains the open files by the server process during the error time.
- netstat-output.txt [Optional]: The output of the netstat command which contains the network statistics of the server during the error time.
- server-info.txt: The server information such as name, version etc.
- thread-dump-*.txt: The thread dumps of the server, taken at different time intervals during the error time.
- metrics-snapshot.txt: The current snapshot of the Passthrough transport metrics in the Synapse runtime.
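When triaging a dump, it can help to check which of the files listed above are actually present, because the contents depend on the configured Action Executors. The following sketch does this with the standard `zipfile` module. The function name `summarize_dump` is hypothetical, and the sketch assumes that entries may sit in a subfolder of the archive, so it compares base names only. Thread dump files are matched by their `thread-dump-` prefix; the demo file name `thread-dump-1.txt` is made up.

```python
import io
import zipfile

# File names taken from the list above.
KNOWN_FILES = [
    "deployment.toml",
    "diagnostics.log",
    "log.txt",
    "logs.zip",
    "lsof-output.txt",
    "netstat-output.txt",
    "server-info.txt",
    "metrics-snapshot.txt",
]


def summarize_dump(archive) -> dict:
    """Report which known files and thread dumps a diagnostic zip contains.

    `archive` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        # Keep only the base name in case entries are stored in a subfolder.
        names = {name.rsplit("/", 1)[-1] for name in zf.namelist()}
    return {
        "present": [n for n in KNOWN_FILES if n in names],
        "missing": [n for n in KNOWN_FILES if n not in names],
        "thread_dumps": sorted(n for n in names if n.startswith("thread-dump-")),
    }


# Demo: build a small in-memory archive instead of reading a real dump.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("log.txt", "sample log line\n")
    zf.writestr("server-info.txt", "sample server info\n")
    zf.writestr("thread-dump-1.txt", "sample thread dump\n")
buffer.seek(0)
summary = summarize_dump(buffer)
print(summary)
```

A file reported as missing is not necessarily an error. For example, lsof-output.txt and netstat-output.txt are optional.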
Diagnostics Log¶
The diagnostics log contains the following logs:
Memory Watcher¶
The memory watcher logs are prefixed with [MemoryWatcher].
For example: MemoryWatcher Heap usage is above threshold. Heap usage: 87, Retry count: <count>, where <count> is the current retry count.
Here, the log indicates that the heap usage is above the threshold and the heap usage is 87%. The MemoryWatcher retries up to the configured number of attempts before executing the Action Executors.
CPU Watcher¶
The CPU watcher logs are prefixed with [CPUWatcher].
For example: CPUWatcher CPU usage is above threshold. CPU usage: 91, Retry count: <count>, where <count> is the current retry count.
Here, the log indicates that the CPU usage is above the threshold and the CPU usage is 91%. The CPUWatcher retries up to the configured number of attempts before executing the Action Executors.
Traffic Analyzer¶
The traffic analyzer logs are prefixed with [TrafficAnalyzer].
For example: TrafficAnalyzer Attribute Last15SecondRequests of type http-listener increased more than the threshold, old value: 2, new value: 227, threshold: 115.22752880979914.
Here, the log indicates that the Last15SecondRequests attribute of type http-listener increased more than the threshold. The old value is 2, the new value is 227 and the threshold is 115.22752880979914. The threshold is calculated based on the standard deviation of a Simple Moving Average window.
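The exact formula behind the threshold is not given here. As a rough intuition only, the following sketch flags a value when it exceeds the moving average of a window plus a multiple of the window's standard deviation. The class name `SpikeDetector`, the `multiplier` parameter, and the mean-plus-k-sigma rule are all assumptions of this sketch, so it will not reproduce the threshold value in the log line above.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Optional


class SpikeDetector:
    """Flag values that rise far above a simple moving average window."""

    def __init__(self, window_size: int, multiplier: float = 3.0):
        self.window = deque(maxlen=window_size)  # the oldest values drop out
        self.multiplier = multiplier             # hypothetical tuning factor

    def check(self, value: float) -> tuple[bool, Optional[float]]:
        """Return (spiked, threshold) for value, then add it to the window."""
        if len(self.window) < 2:
            # Not enough history yet to estimate a spread.
            self.window.append(value)
            return False, None
        threshold = mean(self.window) + self.multiplier * pstdev(self.window)
        spiked = value > threshold
        self.window.append(value)
        return spiked, threshold


# Steady traffic followed by a sudden jump in the request count.
detector = SpikeDetector(window_size=100)
checks = [detector.check(v) for v in [2, 3, 2, 3, 227]]
flags = [spiked for spiked, _ in checks]
last_threshold = checks[-1][1]
print(flags, last_threshold)
```

In this sketch, the window size plays the role of the `*_window_size` settings: a larger window makes the average and the spread react more slowly to recent traffic.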
Log Watcher¶
Following are examples of log watcher logs:
[Interpreter] Executing the action executors for the log line matching the regex pattern (.*)org.apache.synapse.transport.passthru(.*)
ServerInfo [INFO] ServerInfo executed successfully.
OpenFileFinder [INFO] OpenFileFinder executed successfully.
Netstat [INFO] Netstat executed successfully
ZipFileExecutor [INFO] Zipping the folder at /Users/user/wso2mi-4.3.0/diagnostics-tool/temp/2024-03-01_14:21:06.743
ZipFileExecutor [INFO] Diagnosis Dumped in :/Users/user/wso2mi-4.3.0/diagnostics-tool/data/96970_2024-03-01_14:21:06.743.zip
ThreadDumper [INFO] Thread dump execution is completed for 96970, thread dump count: 5, delay: 2000ms
MetricsSnapshot [INFO] MetricsSnapshot executed successfully.
ZipFileExecutor [INFO] Zipping the folder at /Users/user/wso2mi-4.3.0/diagnostics-tool/temp/2024-03-01_14:21:06.838
ZipFileExecutor [INFO] Diagnosis Dumped in :/Users/user/wso2mi-4.3.0/diagnostics-tool/data/96970_2024-03-01_14:21:06.838.zip
Here, the log indicates that a log line matching the regex pattern (.*)org.apache.synapse.transport.passthru(.*) is found. The ServerInfo, OpenFileFinder, Netstat, MetricsSnapshot, and ThreadDumper Action Executors are executed. The executors are executed in parallel and the zip file is generated with the output at the end.
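The "run in parallel, then zip" flow can be sketched with a thread pool, as shown below. This is an illustration of the pattern rather than the tool's implementation: the function names `run_and_zip` and `make_writer` are hypothetical, and the stand-in executors simply write sample text files.

```python
import tempfile
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def run_and_zip(
    actions: list[Callable[[Path], None]], work_dir: Path, zip_path: Path
) -> list[str]:
    """Run every action in parallel, then zip the files written to work_dir."""
    work_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(action, work_dir) for action in actions]
        for future in futures:
            future.result()  # wait for completion and surface any exception
    # Zipping starts only after all actions have finished.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(work_dir.iterdir()):
            zf.write(path, arcname=path.name)
        return zf.namelist()


def make_writer(file_name: str, text: str) -> Callable[[Path], None]:
    """Build a stand-in executor that writes one output file."""
    def action(target_dir: Path) -> None:
        (target_dir / file_name).write_text(text)
    return action


stand_ins = [
    make_writer("server-info.txt", "sample server info\n"),
    make_writer("netstat-output.txt", "sample netstat output\n"),
]
with tempfile.TemporaryDirectory() as tmp:
    archived = run_and_zip(stand_ins, Path(tmp) / "temp", Path(tmp) / "data.zip")
print(archived)
```

Waiting for every future before zipping ensures that slower executors, such as a thread dumper that pauses between dumps, finish writing before the archive is created.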