Virtual SAN Health Alerts and Warnings

VMware Virtual SAN (vSAN) is storage virtualization software. It has become a popular choice for a large number of companies.

Together with the vSphere virtualization platform, it lets you manage both compute and data storage from a single platform. vSAN pools all the storage devices in a cluster into a common data pool. This allows companies to scale up or down quickly as their needs change.

Health validation services such as vSphere Health and vSAN Health have proven important for companies because they help check environments for configuration consistency, give insight into best practices, and quickly surface pain points and vulnerabilities.

Combining two services under one roof

These two services have since been combined into a single service called Skyline Health, so both sets of features now sit under one roof. It is available to all customers regardless of support level: Basic, Production, or Premier.

The vendor combined them to simplify many routine tasks and to give customers advanced features, more ways to check that the environment is working, and advice when vulnerabilities are found. Administrators can now find the source of an issue much faster and resolve it promptly.

Health history check

Among other things, administrators can view the health history. A serious issue often triggers a number of smaller ones. If you know which issue originally caused the minor problems, you can eliminate that source, and the minor problems that stem from it should disappear as well.

Administrators can select a date on the timeline to see a table of the checks that were run at that time. Health check results are kept for the last 30 days. Saving results is enabled automatically when a check is performed. You can turn this feature off if you don’t need it, but it is useful for understanding and fixing vulnerabilities.
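To make the timeline behavior concrete, here is a minimal Python sketch of looking up saved results for a selected date within a 30-day window. The `HealthRecord` type, the `records_for` helper, and the sample data are hypothetical and exist only for illustration; they are not part of any vSAN or Skyline Health API.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RETENTION_DAYS = 30  # retention window mentioned in the text


@dataclass(frozen=True)
class HealthRecord:
    """One saved health check result (hypothetical model, not a vSAN type)."""
    day: date
    check: str
    status: str  # "green", "yellow", or "red"


def records_for(history, selected, today):
    """Return results saved on `selected`, or [] if it is outside the window."""
    oldest = today - timedelta(days=RETENTION_DAYS)
    if not (oldest <= selected <= today):
        return []
    return [r for r in history if r.day == selected]


# Made-up sample data for the sketch.
history = [
    HealthRecord(date(2022, 3, 1), "vSAN object health", "red"),
    HealthRecord(date(2022, 3, 1), "vSAN cluster partition", "red"),
    HealthRecord(date(2022, 3, 2), "vSAN object health", "green"),
]
today = date(2022, 3, 10)

for record in records_for(history, date(2022, 3, 1), today):
    print(record.check, record.status)
```

Picking a date older than 30 days returns an empty list in this sketch, which mirrors the idea that older results are no longer kept.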

The connection between emerging issues

System health plays a critical role in the operation of a software-defined data center. Health checks help monitor the entire cluster and notify you when failures occur. Administrators should watch for these alerts so they can act on them in time.

However, things are not always so simple. A single major failure can cause many other health check categories to raise warnings, and some of them may turn red.

When first looking at the health check tab, administrators may find it hard to determine which issue to fix first. Several warnings can be the result of a single serious issue, which makes finding the root cause much harder.

If you are running vSAN 7 Update 3, finding the source is much simpler. This update introduced Health Check Correlation, which automatically establishes relationships and dependencies between issues. The feature helps administrators identify and fix the underlying issue as quickly as possible. After fixing it and retesting, administrators may see the other issues that were symptoms of the main one disappear.

How to run a health check

To start a health check, go to the “Monitor” tab and select the “Skyline Health” section. For all features to work correctly, the CEIP (Customer Experience Improvement Program) setting must be enabled. If you did not enable it during installation, you can enable it in the “Administration” section of your virtualization platform.

When you run a health check, you receive not only warnings and failure notifications but also information about the issues and how to resolve them. A ready-made knowledge base helps with this.

When a warning sign appears next to a health check, you will see two tabs: “Details” and “Info”.

Details

Here you will see the numerical values as well as the various objects in your virtual infrastructure that are relevant to the alert.

Info

This tab explains the issue and recommends how to resolve it. In the upper right corner is the “Ask VMware” button. Clicking it opens a knowledge base article that describes the issue in detail and explains how to fix it.

After fixing the issue, click “Retest” to run the health check again and confirm that the issue is resolved.
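One way to reason about a retest is to compare the results before and after the fix. The Python sketch below assumes each run is available as a plain mapping from check name to status; that mapping, the `cleared_after_retest` helper, and the sample values are hypothetical and are not something Skyline Health exports in this form.

```python
def cleared_after_retest(before, after):
    """Return checks that were not green before and are green after the retest."""
    return sorted(
        name
        for name, status in before.items()
        if status != "green" and after.get(name) == "green"
    )


# Made-up results for two runs of the same checks.
before = {
    "vSAN object health": "red",
    "vSAN cluster partition": "red",
    "Physical disk health": "green",
}
after = {
    "vSAN object health": "green",
    "vSAN cluster partition": "yellow",
    "Physical disk health": "green",
}

for name in cleared_after_retest(before, after):
    print("cleared:", name)
```

Checks that are still yellow or red after the retest are left out of the result, so they remain on the list of things to investigate.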

How to view correlated health issues

There are two ways to view correlated issues, where fixing one makes the others disappear. The first is to click the error itself, and the second is to click “Overview”.

Clicking “Overview” shows the total number of health checks and the total number of errors. Below that is a list of all related issues, with the “Primary issue” at the top and its “Likely impacts” below it.

For example, “All hosts have a vSAN vmknic configured” may be the primary issue, with “vSAN cluster partition” and “vSAN object health” as likely impacts. Fixing the primary issue should resolve the others. To fix it, open the “Info” tab for an explanation.
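The grouping of a primary issue with its likely impacts can be illustrated with a short Python sketch. It assumes a hand-written, one-level dependency map built from the example above; the `LIKELY_IMPACTS` table and the `correlate` function are hypothetical and do not reflect how Health Check Correlation is implemented internally.

```python
# Hypothetical one-level map: primary check -> checks it is likely to affect.
LIKELY_IMPACTS = {
    "All hosts have a vSAN vmknic configured": [
        "vSAN cluster partition",
        "vSAN object health",
    ],
}


def correlate(failing):
    """Group failing checks into primary issues and their likely impacts."""
    failing = set(failing)
    # Failing checks that a failing primary check already explains.
    explained = {
        impact
        for primary, impacts in LIKELY_IMPACTS.items()
        if primary in failing
        for impact in impacts
        if impact in failing
    }
    # Everything left over is treated as a primary issue in its own right.
    return {
        primary: [c for c in LIKELY_IMPACTS.get(primary, []) if c in failing]
        for primary in sorted(failing - explained)
    }


failing_checks = [
    "vSAN object health",
    "All hosts have a vSAN vmknic configured",
    "vSAN cluster partition",
]
for primary, impacts in correlate(failing_checks).items():
    print("Primary issue:", primary)
    for impact in impacts:
        print("  Likely impact:", impact)
```

In this sketch a failing check with no failing primary behind it is reported as a primary issue with no impacts, so nothing is silently dropped from the list.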

Conclusion

Skyline Health includes a long list of detailed health checks that help identify the most vulnerable areas. One serious issue can give rise to many others. Previously it was difficult for administrators to decide which issue was the root cause, but in vSphere 7 Update 3 they can see both primary issues and likely impacts. Administrators can then troubleshoot quickly with help from the Info tab and the linked knowledge base article.