☕ Blog of Stephan Höhn

Push-based Uptime Monitoring of Self-Hosted K8s Cluster Over Tailscale

  ·   4 min read

Introduction

On my self-hosted homelab cluster, I run several services like Home Assistant and Tandoor Recipes. Home Assistant in particular is critical: when it is unavailable, parts of my home stop working. So I want to be notified as soon as my homelab is no longer reachable.

Before I swapped the drive in my master node from an HDD to an SSD, I had some downtime because the failing hard drive caused the master node to stop working. But beyond that, anything could happen: the disk could fill up, or some unknown event could cause the cluster to stop working properly and require manual investigation.

Current Plan

I have one rented KVM server at 1blu, which should have enough uptime to run a monitoring solution. I’ve decided to use Uptime Kuma because it seems to be the most recommended solution on r/selfhosted. It supports many monitor types (Ping, Push, HTTP requests, and more) and offers dozens of notification options. My plan is to use the Push method, where the self-hosted cluster calls an HTTP endpoint at regular intervals.

My rented KVM server and my self-hosted homelab are connected in the same Tailscale network through the tailscale-operator. The 1blu server should provide Uptime Kuma through the tailscale-operator (see Exposing a Service using Ingress), so it’s only accessible in the Tailscale network. This also means my self-hosted cluster needs to push its availability to this Tailscale ingress endpoint.

Setup

In general, I use ArgoCD in combination with Kustomize to deploy my services.

Deploy Uptime Kuma on 1blu Server

Unfortunately, the author of Uptime Kuma does not provide an official Helm chart, but I found this Helm chart by dirsigler, which serves my purpose. I’ve used the default configuration.

The following Kustomization file was deployed:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

helmCharts:
  - name: uptime-kuma
    releaseName: uptime-kuma
    namespace: uptime-kuma-system
    repo: https://helm.irsigler.cloud
    valuesFile: values.yaml
    version: 2.19.3

resources:
  - ingress.yaml

namespace: uptime-kuma-system
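One gotcha in case you want to reproduce this: Kustomize only inflates the helmCharts field when Helm support is enabled explicitly. Rendering the manifests locally looks roughly like this (assuming kustomize and a local helm binary are installed):

```shell
# The helmCharts field is ignored unless Helm support
# is enabled with the --enable-helm flag.
KUSTOMIZE_FLAGS="--enable-helm"
if command -v kustomize >/dev/null 2>&1; then
  kustomize build $KUSTOMIZE_FLAGS . || true
fi
```

If ArgoCD renders the Kustomization, it needs the same flag; as far as I know it can be set globally via kustomize.buildOptions in the argocd-cm ConfigMap.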

In addition to that, I’ve added my own Ingress, which uses tailscale as the ingressClassName:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: uptime-kuma-ingress
  namespace: uptime-kuma-system
spec:
  defaultBackend:
    service:
      name: uptime-kuma
      port:
        number: 3001
  ingressClassName: tailscale
  tls:
  - hosts:
    - uptime-kuma

After both were deployed on my KVM server, the service was reachable at https://uptime-kuma.taild178a.ts.net/ . After opening this URL, I was able to add a new monitor. I chose the monitor type “Push” and increased the heartbeat interval to 300 seconds (5 minutes):

Add new monitor screenshot

Uptime Kuma will then automatically generate a Push URL. When this URL is called, Uptime Kuma should register the service as up.
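The Push URL has a fixed shape: the monitor’s token in the path, plus optional status, msg and ping query parameters. A heartbeat from inside the tailnet is then just a plain GET (the token below is a made-up placeholder):

```shell
# Hypothetical push token; Uptime Kuma generates a unique one per monitor.
PUSH_TOKEN="AbCd1234"
PUSH_URL="https://uptime-kuma.taild178a.ts.net/api/push/${PUSH_TOKEN}?status=up&msg=OK"
echo "$PUSH_URL"
# From a machine inside the tailnet:
# curl -fsS "$PUSH_URL"
```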

CronJob in Homelab

For any pod or CronJob to access a Tailscale service, we need to implement a cluster egress (see https://tailscale.com/kb/1438/kubernetes-operator-cluster-egress).

As outlined in the Tailscale documentation, I’ve added the following service:

apiVersion: v1
kind: Service
metadata:
  annotations:
    tailscale.com/tailnet-fqdn: "uptime-kuma.taild178a.ts.net"
  name: uptime-kuma-service
spec:
  externalName: placeholder
  type: ExternalName

Annoyingly, when you try to access this uptime-kuma-service from within a pod using wget, you will encounter a TLS handshake error. The Tailscale documentation mentions this as well; I suspect the TLS certificate is only valid for the actual tailnet DNS name, so the hostname used in the request has to match it. So, as explained in the documentation, I’ve also added the DNSConfig custom resource:

apiVersion: tailscale.com/v1alpha1
kind: DNSConfig
metadata:
  name: ts-dns
spec:
  nameserver:
    image:
      repo: tailscale/k8s-nameserver
      tag: unstable

This deploys a nameserver, and the live DNSConfig object should now contain the nameserver’s IP. The documentation suggests adjusting the CoreDNS / kube-dns config, but to be honest, I don’t want to do that just for this small monitoring task.
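To wire anything up, you first need the nameserver’s ClusterIP. If I read the CRD correctly, it ends up in the status of the DNSConfig object, so something like this should print it (the exact field path is an assumption on my side and may differ between operator versions):

```shell
# The operator deploys the nameserver and records its IP in the
# status of the DNSConfig object.
DNSCONFIG="ts-dns"
if command -v kubectl >/dev/null 2>&1; then
  kubectl get dnsconfig "$DNSCONFIG" \
    -o jsonpath='{.status.nameserver.ip}' || true
fi
```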

Interestingly, Kubernetes offers options in the Pod spec to define a custom DNS configuration. So instead of adjusting my CoreDNS config, I can use the dnsPolicy and dnsConfig fields to set this per pod.

To test this, I created the following pod:

apiVersion: v1
kind: Pod
metadata:
  name: custom-dns-pod
spec:
  dnsPolicy: "None"  # Will only use the nameservers mentioned in dnsConfig
  dnsConfig:
    nameservers:
      - 10.43.91.129 # Tailscale Nameserver IP
    searches:
      - ts.net       # Domain ending of the Tailscale Uptime Kuma service
  containers:
  - name: uptime-kuma-check
    image: alpine
    command: ["sleep", "infinity"]

When I entered the pod and accessed the Push URL with wget, I got a successful response from Uptime Kuma.

To regularly access the Push URL from my K8s homelab, I added the following CronJob to my homelab:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: uptime-kuma-check
  namespace: uptime-kuma-system
spec:
  successfulJobsHistoryLimit: 1
  concurrencyPolicy: "Replace"
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          dnsPolicy: "None"
          dnsConfig:
            nameservers:
              - 10.43.91.129
            searches:
              - ts.net
          containers:
          - name: uptime-kuma-check
            image: alpine:3.20
            command:
            - /bin/sh
            - -c
            - wget "{Uptime Kuma Push URL}" -O /dev/null
          restartPolicy: OnFailure

Conclusion

Dealing with the DNS configuration was the hardest part. I also struggled with another TLS error, which probably had something to do with the BusyBox image I was using before; with the Alpine image I had no issues. So far, the monitoring works quite well.