Tanzu Quick Tips: Default Storage Class

Recently I tinkered around with Helm charts on Tanzu Kubernetes clusters and ran into some issues. This is a quick tip in case you run into similar ones.

When deploying a Helm chart I ran into the issue that my pods would go into a crash loop. After investigating I figured out that the persistent volumes were not being created. The chart did create some PVCs, but the associated PVs were never provisioned. Describing the PVC showed the following error:

no persistent volumes available for this claim and no storage class is set

So while TKG clusters come with the vSphere CSI driver as a PV provisioner, a PVC still needs an associated storage class so the provisioner knows where to create the volume. And unlike with your own deployment manifests, not every Helm chart exposes a value to specify which storage class to use when installing it.
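If you want to check this on your own cluster, describing the PVC shows the error in the Events section (the names here are placeholders):

kubectl describe pvc <pvc-name> -n <namespace>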

Set default storage class

Setting a default storage class means that all PVCs created without an explicit storage class will use it. And in theory it is easy to mark a storage class as default in Kubernetes:

Figure out which storage classes you have by running kubectl get sc:

kubectl get sc
NAME                          PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
sp-tanzu-global               csi.vsphere.vmware.com   Delete          Immediate              true                   9d
sp-tanzu-global-latebinding   csi.vsphere.vmware.com   Delete          WaitForFirstConsumer   true                   9d

As you can see there are two storage classes, but the second one is the same as the first, just with a different volume binding mode. Neither class is set as default, which would be indicated by (default) after the name.

Now you can set it as default by patching the class:

kubectl patch storageclass sp-tanzu-global -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

Replace sp-tanzu-global with the name of your storage class. If you now run kubectl get sc again you should see this:

kubectl get sc
NAME                          PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
sp-tanzu-global (default)     csi.vsphere.vmware.com   Delete          Immediate              true                   9d
sp-tanzu-global-latebinding   csi.vsphere.vmware.com   Delete          WaitForFirstConsumer   true                   9d
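As a side note: in my case no class was marked as default before, but if a different class had already been the default you would also want to remove that annotation from it, for example like this (replace the class name with the old default):

kubectl patch storageclass <old-default-class> -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'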

So far so good: if you install your Helm chart right away it should now work. But I ran into a different issue:
After about a minute the storage class reverts to being non-default, which means that any Helm chart installed afterwards throws the same error again because no storage class is set for its PVCs.

This is specific to Tanzu clusters. Because the cluster is provisioned from a YAML spec, that spec does not change when you modify the default storage class inside the cluster itself. And from time to time Tanzu reconciles the actual state of the cluster against the YAML cluster spec and reverts any differences (the YAML takes priority). This means we have to set the default storage class there instead:

So switch your context to the vSphere Namespace (kubectl config use-context <namespace-name>) and list your clusters. How you do this depends on which API version was used to deploy the cluster:

If you get:

kubectl get tkc
No resources found in cl-lab-core-ns-mma-tkg2 namespace.

That means your cluster was deployed with the v1beta1 API. If you do see your cluster here, it was deployed with v1alpha3 or older. Depending on which it is, see the respective section below:
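For the v1beta1 case you can double-check by listing the Cluster API resources directly from the same vSphere Namespace context; your cluster should show up there:

kubectl get cluster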

Default storage class in v1beta1

Run kubectl edit cluster <cluster name>
The config is probably quite long. What you're looking for is the spec.topology.variables block. With a lot omitted here, it should look something like this:

spec:
  clusterNetwork:
    services:
      cidrBlocks: ["198.51.100.0/12"]
    pods:
      cidrBlocks: ["192.0.2.0/16"]
    serviceDomain: "cluster.local"
  topology:
    class: tanzukubernetescluster
    version: v1.26.5+vmware.2-fips.1
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
        - class: node-pool
          name: node-pool-1
          replicas: 3
    variables:
      - name: TKR_DATA    
        ...
      - name: vmClass
        value: best-effort-xsmall
      - name: storageClass
        value: tkg-storage-policy
      - name: defaultStorageClass
        value: tkg-storage-policy

Look for storageClass and defaultStorageClass. The defaultStorageClass variable is most likely missing, so just add it anywhere on that level. After saving you should have a default storage class, so check inside your cluster with kubectl get sc.
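If you want to verify the setting from the vSphere Namespace context without opening the editor again, a jsonpath query along these lines should work (the cluster name is a placeholder):

kubectl get cluster <cluster name> -o jsonpath='{.spec.topology.variables[?(@.name=="defaultStorageClass")].value}'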

Default storage class in v1alpha3

If your cluster is of the TKC type (as figured out above), you cannot edit it with the kubectl edit cluster command and need to use kubectl edit tkc <cluster name> instead.

What you're looking for here is the spec.settings.storage block, and in the end it should look something like this (shortened):

spec:
  topology:
    controlPlane:
      replicas: 1
      vmClass: best-effort-xsmall
      storageClass: tkg-storage-policy
      tkr:
        reference:
          name: v1.26.5+vmware.2-fips.1
    nodePools:
    - replicas: 3
      name: node-pool
      vmClass: best-effort-xsmall
      storageClass: tkg-storage-policy
      tkr:
        reference:
          name: v1.26.5+vmware.2-fips.1
  settings:
    storage:
      defaultClass: tkg-storage-policy

You are probably missing the defaultClass value, so add it with your storage class. After that you should see your storage class marked as default if you switch back to the cluster context and run kubectl get sc again.
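Here too you can verify the value from the vSphere Namespace context with a quick jsonpath query (cluster name is a placeholder):

kubectl get tkc <cluster name> -o jsonpath='{.spec.settings.storage.defaultClass}'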

Volumes are still not created

Even after doing that, some of my PVCs had volumes associated with them but some didn't. If after all of this you still run into the problem that no PVs are created for your deployment, look at the age of your PVCs with kubectl get pvc -A. If the problematic PVCs are still there even after you deleted and redeployed the Helm chart, they might not be created by a PersistentVolumeClaim manifest in the chart but by a job. Because of that they don't get deleted automatically when you uninstall the Helm chart. So delete them manually and try again. This was enough to solve my issue.
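A short sketch of that cleanup, with placeholder names:

kubectl get pvc -n <namespace>
kubectl delete pvc <pvc-name> -n <namespace>
helm install <release-name> <chart> -n <namespace>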