Troubleshooting Common Issues in an OpenShift Cluster

This is a general guide to troubleshooting and verifying the health of an OpenShift cluster.

Verifying Health of OpenShift Nodes

To display the status of each node

$ oc get nodes

Every node must report a Ready status.

To display the current CPU and memory usage of each node

$ oc adm top nodes

To display the resources available and used on a node

$ oc describe node <node_name>

The important headings are Capacity, Allocatable and Allocated resources. The Conditions section indicates whether the node is under memory pressure, disk pressure or some other condition that would prevent the node from starting new containers.
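The Ready check above can be scripted. The following is a minimal sketch, assuming the default table layout of oc get nodes (node name in the first column, status in the second). The function name not_ready_nodes is hypothetical, not part of oc.

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads the table printed by `oc get nodes` on standard input.
# Assumption: default column order NAME, STATUS, ROLES, AGE, VERSION.
# Cordoned nodes (for example "Ready,SchedulingDisabled") are listed too.
not_ready_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage (requires a logged-in oc client):
#   oc get nodes | not_ready_nodes
```

An empty result means every node reported Ready at the time of the call.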

Reviewing Cluster Version Resource

Export the KUBECONFIG variable (INSTALL_DIR is the installation directory), then log in as kubeadmin

$ export KUBECONFIG=INSTALL_DIR/auth/kubeconfig

$ oc login -u kubeadmin -p <password>

Retrieve the cluster version.

$ oc get clusterversion

Obtain more details about the cluster status.

$ oc describe clusterversion
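The same status can be read in a script from the conditions of the ClusterVersion object. This is a sketch rather than a definitive check: it assumes the object is named version and uses the Available and Progressing condition types. The helper names clusterversion_condition and cluster_settled are hypothetical.

```shell
# Print the status ("True" or "False") of one ClusterVersion condition.
# $1 is the condition type, for example Available, Progressing or Failing.
# Assumption: the ClusterVersion object is named "version".
clusterversion_condition() {
    oc get clusterversion version \
        -o jsonpath="{.status.conditions[?(@.type==\"$1\")].status}"
}

# Succeed only when the cluster is available and no update is in progress.
cluster_settled() {
    [ "$(clusterversion_condition Available)" = "True" ] &&
    [ "$(clusterversion_condition Progressing)" = "False" ]
}

# Usage (requires a logged-in oc client):
#   cluster_settled && echo "cluster version is settled"
```

Querying conditions with jsonpath avoids depending on the column layout of the table output.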

Reviewing Cluster Operators

To retrieve the list of cluster operators

$ oc get clusteroperators
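On a cluster with many operators it can help to list only those that need attention. The sketch below assumes the default column order NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE and that the VERSION column is populated; an operator with an empty VERSION would shift the columns and be reported as unhealthy. The function name unhealthy_operators is hypothetical.

```shell
# Print the names of cluster operators that are not Available, are still
# Progressing, or are Degraded. Reads the table printed by
# `oc get clusteroperators` on standard input.
# Assumption: columns are NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE.
unhealthy_operators() {
    awk 'NR > 1 && ($3 != "True" || $4 != "False" || $5 != "False") { print $1 }'
}

# Usage (requires a logged-in oc client):
#   oc get clusteroperators | unhealthy_operators
```

Each name it prints can then be inspected with oc describe clusteroperator followed by the operator name.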

Displaying OpenShift Node(s) Logs

To retrieve CRI-O container engine logs

$ oc adm node-logs -u crio <node_name>

To retrieve kubelet logs

$ oc adm node-logs -u kubelet <node_name>

To display all journal logs of a node

$ oc adm node-logs <node_name>
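Node logs are usually long, so a small filter can help narrow them down. This is a minimal sketch that relies only on grep and tail. The match pattern (any line containing "error" or "fail") is an assumption about what is worth reading, and recent_errors is a hypothetical name.

```shell
# Keep only log lines that mention an error or failure (case-insensitive)
# and print the last N of them (default 20). Reads log text on stdin.
recent_errors() {
    grep -iE 'error|fail' | tail -n "${1:-20}"
}

# Usage (requires a logged-in oc client):
#   oc adm node-logs -u kubelet <node_name> | recent_errors 50
```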

Using Shell Prompt on OpenShift Nodes

To open a shell on a node and run commands locally

$ oc debug node/<node_name>
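For a single command, the debug session can also be run non-interactively by passing the command after a double dash. The wrapper below is a sketch: node_exec is a hypothetical helper name, and it assumes the command should run against the host's root filesystem through chroot /host.

```shell
# Run one command on a node through a debug pod, using the host's root
# filesystem. $1 is the node name; the remaining arguments are the command.
# node_exec is a hypothetical helper name, not an oc subcommand.
node_exec() {
    node=$1
    shift
    oc debug "node/$node" -- chroot /host "$@"
}

# Usage (requires a logged-in oc client with permission to debug nodes):
#   node_exec <node_name> systemctl is-active kubelet
```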

Troubleshooting Container Engine

To get low-level information about the containers running on a node, open a debug shell, switch to the host's root filesystem and list the containers with crictl

$ oc debug node/<node_name>
$ chroot /host
$ crictl ps
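When a container keeps stopping, listing all containers with crictl ps -a and picking out the exited ones is a common next step. The sketch below assumes the default table output of crictl ps -a, where the first column is the container ID and the STATE column shows Exited for stopped containers. The function name exited_containers is hypothetical.

```shell
# Print the IDs of containers whose row contains the state "Exited".
# Reads the table printed by `crictl ps -a` on standard input.
# Assumption: container ID is the first column and "Exited" appears as a
# separate, whitespace-delimited word in the row.
exited_containers() {
    awk 'NR > 1 && /[[:space:]]Exited[[:space:]]/ { print $1 }'
}

# Usage (inside the debug shell, after `chroot /host`):
#   crictl ps -a | exited_containers
```

The logs of each reported container can then be read with crictl logs followed by the container ID.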