In this quick walkthrough, we're going to show you how to review the guided troubleshooting and root cause analysis features of ControlUp Advanced Monitoring for Horizon.

Overview

Section 1: Connect to all monitored resources.
Section 2: Focus on the TestDrive-vmwtd.com folder.
Section 3: Select the Hosts object in the main dashboard grid view.
Section 4: Invoke the Virtual Expert for guided troubleshooting and root cause analysis.
Section 5: Review the Actions available at the Session and Process Level.

Before you Begin

To complete this product walkthrough, please make sure you have the following:

A valid account in the VMware TestDrive environment. Sign up here if you do not have one.
Open TCP and UDP ports 80, 443, and 8443 and, if using PCoIP, TCP and UDP port 4172.
The latest Horizon Client installed, available via direct download here.
A ControlUp user account on TestDrive. See this article for info on how to access ControlUp on TestDrive and create an account.
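
Before launching the Horizon Client, it can help to confirm that the required TCP ports are reachable from your machine. The following is a minimal Python sketch, not part of ControlUp or TestDrive tooling. The function names are illustrative, and you would need to supply the hostname of your own connection endpoint. It checks TCP only, because UDP reachability cannot be confirmed with a simple connect.

```python
import socket

# Ports listed in the prerequisites above; 4172 is only needed for PCoIP.
REQUIRED_TCP_PORTS = [80, 443, 8443]
PCOIP_TCP_PORT = 4172


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host: str, use_pcoip: bool = False) -> dict[int, bool]:
    """Check each required TCP port and return {port: reachable}."""
    ports = list(REQUIRED_TCP_PORTS)
    if use_pcoip:
        ports.append(PCOIP_TCP_PORT)
    return {port: tcp_port_open(host, port) for port in ports}
```

Calling check_ports with your endpoint's hostname and use_pcoip=True returns a dictionary that maps each port to True or False, so any blocked port is easy to spot.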

Here is a short video demo of the steps in this walkthrough.

ControlUp Advanced Monitoring for Horizon - Guided Troubleshooting and Root Cause Analysis Demo

Section 1: Connect to all monitored resources.

Right-click the TestDrive-vmwtd.com folder in the left pane, then select "Connect" in the context menu to connect to all the monitored resources.

Section 2: Focus on the TestDrive-vmwtd.com folder.

In the left pane, right-click the uppermost folder, titled "TestDrive-vmwtd.com", and select "Focus" in the context menu.

Section 3: Select the Hosts object in the main dashboard grid view.

In the main dashboard grid view, select the Hosts object to view the monitored hosts. Click and drag the scroll bar at the bottom until the "Stress Level" metric is right next to the Hosts Name column. The default view should be sorted by the "Stress Level" column. If not, click the "Stress Level" column header until the hosts with the highest "Stress Level" are at the top.

"Stress Level" is an aggregated metric that is available at every object level. The metric includes several other metrics, each with a pre-defined threshold and load (that can be customized). Each "Stress Level" is optimized to fit the object. "Stress Level" for Hosts looks at different metrics with different thresholds than "Stress Level" for Machines, or Sessions. Read this Knowledgebase article to learn more about Stress Level and how to modify and customize it. See this video for information on how to customize the "Stress Level".

Section 4: Invoke the Virtual Expert for guided troubleshooting and root cause analysis.

Five columns over from the "Stress Level" column, you should see the Datastore R/W IOPS metric column. The value for the Host with the highest "Stress Level" should be red. Datastore R/W IOPS is one of the metrics included in the Hosts "Stress Level".

To troubleshoot this issue, invoke the ControlUp Virtual Expert by clicking on the three blue bars to the right of the Datastore R/W IOPS metric for the Host with the highest "Stress Level".

After clicking on the three blue bars, the Virtual Expert wizard will pop up. This wizard will include the value of the metric from which the Virtual Expert is invoked as well as a definition.

The Virtual Expert will suggest the next step for troubleshooting this issue, which is to click "Machines - (Detailed I\O)". This view shows all the virtual machines running on this particular host, sorted by the metric that most exceeds its threshold. In this case, only I/O metrics are shown, to avoid cluttering the view.

On the next screen, you will see the Host at the top, with the metric on which you invoked the Virtual Expert circled in blue. On the lower part of the screen, you will see all the virtual machines running on that host, sorted by the metric that most exceeds its threshold. That metric will also be circled in blue for the virtual machines.
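
The sort order described above can be sketched in a few lines of Python. This is an assumption about how such a ranking might work, not ControlUp's implementation: the machine names, metric names, and thresholds are hypothetical, and "most exceeding threshold" is modeled here as the ratio of a value to its threshold.

```python
def exceedance(value: float, threshold: float) -> float:
    """Ratio of value to threshold; above 1.0 means the threshold is exceeded."""
    return value / threshold


def worst_metric(metrics: dict[str, float], thresholds: dict[str, float]) -> tuple[str, float]:
    """Return (metric name, ratio) for the metric furthest over its threshold."""
    name = max(
        thresholds,
        key=lambda m: exceedance(metrics.get(m, 0.0), thresholds[m]),
    )
    return name, exceedance(metrics.get(name, 0.0), thresholds[name])


def rank_machines(machines: dict[str, dict[str, float]], thresholds: dict[str, float]):
    """Sort machines so the one with the largest exceedance comes first."""
    return sorted(
        machines.items(),
        key=lambda item: worst_metric(item[1], thresholds)[1],
        reverse=True,
    )


# Hypothetical I/O thresholds and per-machine readings.
IO_THRESHOLDS = {"disk_read_iops": 1000.0, "disk_write_iops": 1000.0}
machines = {
    "vm-a": {"disk_read_iops": 300.0, "disk_write_iops": 200.0},
    "vm-b": {"disk_read_iops": 4800.0, "disk_write_iops": 150.0},
    "vm-c": {"disk_read_iops": 900.0, "disk_write_iops": 1200.0},
}

print([name for name, _ in rank_machines(machines, IO_THRESHOLDS)])
# ['vm-b', 'vm-c', 'vm-a']
```

In this sample data, vm-b lands at the top because its read IOPS are 4.8 times the threshold, which corresponds to the single machine highlighted in the grid.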

In this specific case, you will see one virtual machine that is in red for Virtual Disk Read IOPS. Click the three blue bars next to this metric to invoke the Virtual Expert again.

The next recommended step is to click on "Sessions (Default)". This step will take you to the active sessions on this virtual machine.

The virtual machine is now in the top part of the screen with the metric you invoked the Virtual Expert on again circled in blue. The active sessions on this virtual machine are displayed in the bottom part of the screen, sorted by the metric most exceeding threshold, which is I/O Read Operations.

Click the three blue bars next to this metric to invoke the Virtual Expert again.

Once the Virtual Expert is visible, you see the next choice is to drill down into the Processes on this specific virtual machine. You may recall that this menu option was available when you invoked the Virtual Expert on the Host, which means you could have done the complete drill down to the root cause in two clicks. Click on Processes for the next step.

Now you see the session in the top part of the screen, and the processes running inside the session in the bottom part of the screen, sorted by the metric that most exceeds its threshold. You can see that the problem process is Dynamo.exe, and the metric that is exceeding its threshold is I/O Read Operations. You are now at the root of the problem. Now what do you do?
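
The path you just followed, from host to machine to session to process, amounts to picking the worst offender at each level of a hierarchy. The Python sketch below models that idea with made-up data. Only the process name Dynamo.exe comes from this walkthrough; the other names, the numbers, and the Node structure are hypothetical and do not represent a ControlUp API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of the hierarchy: host, machine, session, or process."""
    kind: str
    name: str
    io_reads: float  # illustrative I/O read figure for this object
    children: list["Node"] = field(default_factory=list)


def drill_down(node: Node) -> list[Node]:
    """Follow the busiest child at each level and return the path taken."""
    path = [node]
    while node.children:
        node = max(node.children, key=lambda child: child.io_reads)
        path.append(node)
    return path


# Hypothetical environment with a single noisy process.
host = Node("host", "host-01", 9000.0, [
    Node("machine", "vm-1", 8200.0, [
        Node("session", "user1", 8000.0, [
            Node("process", "Dynamo.exe", 7900.0),
            Node("process", "explorer.exe", 100.0),
        ]),
        Node("session", "user2", 200.0),
    ]),
    Node("machine", "vm-2", 800.0),
])

print(" / ".join(f"{n.kind}:{n.name}" for n in drill_down(host)))
# host:host-01 / machine:vm-1 / session:user1 / process:Dynamo.exe
```

Each step of the loop plays the role of one Virtual Expert click: it narrows the view to the child object that contributes most to the problem metric.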

Section 5: Review the Actions available at the Session and Process Level.

With most EUC monitoring tools, you are on your own to figure out the next step. ControlUp provides a number of additional steps you can take to put the issue in context and decide on the best course of action. Right-click the session in the top part of the screen to see some of the actions available at the session level.

The VMware TestDrive environment is heavily locked down to prevent any one user from creating an issue that impacts the uptime of this shared environment. This requires the ControlUp console to be heavily locked down as well. You can still, however, see what actions would be possible if you had full admin rights within TestDrive or in your own ControlUp environment.

When you right-click the session, you see a number of options.

You can "Chat" with a user right from the ControlUp console to ask what they are doing right now that is impacting other users and sessions, and whether it could be done after hours. The "Manage Programs and Updates" and "Manage Registry" options allow you to start the Controllers pane, which is a powerful difference tool for Windows machines. You could invoke Microsoft's Remote Assistance from the console as well.

Other options include "RDP to Machine". If you know that the machine is tightly locked down by group policy, you have the option to "Kill Group Policy" and "Reapply Group Policy". You can also log off or disconnect an existing RDP session under the Remote Desktop Services menu.

You also have the option to take action at the Processes level.

When you right-click on Processes and go down to the Processes menu option, you see that your options include three different ways to kill a process. You also have the option to "Set Process Affinity", "Set Process Priority" (you could lower the priority from Normal to Below Normal, for instance), or "Start CPU Throttling".

This demo scenario highlights how you can use ControlUp for guided troubleshooting and root cause analysis, as well as to take action at multiple levels within your Horizon environment.