Opinionated Kubernetes 5: Certificate Manager and NFS Provisioner with Helm

If you’ve not already read them, check out Part 1, where we deploy our Kubernetes master node using Kubeadm; Part 2, where we set up cluster networking with kube-router; Part 3, where we set up load balancers with MetalLB; and Part 4, where we deploy the Helm package manager and use it to deploy the nginx ingress service.

Introduction

In this post, we’ll cover deploying two new services to our cluster using the Helm package manager. We’ll deploy Cert Manager, a service to handle TLS certificate issuance with Let’s Encrypt, and the NFS Provisioner, a service to provide dynamic volumes to other services within your cluster.

Installing NFS Provisioner

Let’s start with the NFS Provisioner. The NFS Provisioner creates an NFS server running within your cluster and integrates into Kubernetes as a StorageClass. This allows applications to request a volume and have it provided automatically. For my use case, this is perfect. However, if you’re serious about running a production Kubernetes cluster, you may want to consider something like an external Ceph RBD cluster.

At the time of writing, I’ve written - but not yet published - my nfs-provisioner Helm chart. I’ll be submitting this to the Kubernetes charts repository in the next few days; in the meantime, I’ve released the chart to my own charts repo.

Let’s start by adding my repository:

$ helm repo add kiall https://charts.macinnes.ie/
"kiall" has been added to your repositories

$ helm search nfs
NAME                                  CHART VERSION APP VERSION DESCRIPTION                                       
kiall/nfs-provisioner                 0.1.0         1.0.8       nfs-provisioner is an out-of-tree dynamic provi...

Great - the repository has been added, and we can see the nfs-provisioner chart is available. Let’s create our nfs-provisioner-values.yaml file for Helm:

persistence:
  enabled: true
  storageClass: "-"
  size: 1000Gi

storageClass:
  defaultClass: true

nodeSelector:
  kubernetes.io/hostname: dl165g7

tolerations:
  - key: "node-role.kubernetes.io/master"
    operator: "Exists"
    effect: "NoSchedule"

We need to be careful about how we configure the NFS Provisioner’s own storage when deploying it. In this case, we’re using persistence.storageClass = "-". This is a common convention in Helm to tell a chart NOT to dynamically provision its own storage. This matters because, if we chose to deploy multiple replicas of this chart and to make it our default class, we would end up using the first replica of nfs-provisioner to provide storage for the second, third, etc. replica of nfs-provisioner. Additionally, we want this storage to be as safe as possible, since it’s going to be backing all our other volumes.
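
For reference, charts typically implement this convention in their PersistentVolumeClaim template with something along these lines - a sketch of the common pattern, not necessarily this chart’s exact template:

  # sketch of the usual storageClass handling inside a chart's PVC template
  {{- if .Values.persistence.storageClass }}
  {{- if (eq "-" .Values.persistence.storageClass) }}
  storageClassName: ""
  {{- else }}
  storageClassName: "{{ .Values.persistence.storageClass }}"
  {{- end }}
  {{- end }}

With storageClassName rendered as "", the claim is never dynamically provisioned and will only bind to a PersistentVolume created by hand - which is exactly what we’ll do in a moment.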

Also important to note is the nodeSelector I’ve used. That’s because, to provide storage to the nfs-provisioner itself, I’m going to use a hostPath volume. This is a volume type that cannot be moved between nodes, so we must ensure the pod always runs on the same machine. This is totally unsuitable for a real production deployment, but it’s fine for my home setup!

Okay - we’re ready to install the chart, so let’s do it:

$ helm install kiall/nfs-provisioner --name nfs-provisioner --namespace kube-system -f nfs-provisioner-values.yaml
NAME:   nfs-provisioner
LAST DEPLOYED: Sun Feb 25 09:44:46 2018
NAMESPACE: kube-system
STATUS: DEPLOYED

RESOURCES:
==> v1/Service
NAME             TYPE       CLUSTER-IP    EXTERNAL-IP  PORT(S)                                 AGE
nfs-provisioner  ClusterIP  172.31.14.78  <none>       2049/TCP,20048/TCP,51413/TCP,51413/UDP  0s

==> v1beta2/StatefulSet
NAME             DESIRED  CURRENT  AGE
nfs-provisioner  1        1        0s

==> v1/Pod(related)
NAME               READY  STATUS             RESTARTS  AGE
nfs-provisioner-0  0/1    ContainerCreating  0         0s

==> v1/StorageClass
NAME           PROVISIONER                    AGE
nfs (default)  cluster.local/nfs-provisioner  0s

==> v1/ServiceAccount
NAME             SECRETS  AGE
nfs-provisioner  1        0s

==> v1/ClusterRole
NAME             AGE
nfs-provisioner  0s

==> v1/ClusterRoleBinding
NAME             AGE
nfs-provisioner  0s

If we inspect the pods in the kube-system namespace, we’ll find our nfs-provisioner-0 pod stuck in ContainerCreating status:

$ kubectl -n kube-system get pods
NAME                                             READY     STATUS              RESTARTS   AGE
etcd-dl165g7                                     1/1       Running             1          10d
kube-apiserver-dl165g7                           1/1       Running             1          10d
kube-controller-manager-dl165g7                  1/1       Running             1          10d
kube-dns-6f4fd4bdf-tzgsb                         3/3       Running             3          20h
kube-proxy-cg5g2                                 1/1       Running             1          10d
kube-router-zns2d                                1/1       Running             1          9d
kube-scheduler-dl165g7                           1/1       Running             1          10d
nfs-provisioner-0                                0/1       ContainerCreating   0          51s
nginx-ingress-controller-d7fb49479-d99f7         1/1       Running             1          3d
nginx-ingress-default-backend-7544489c4b-kd2dk   1/1       Running             1          3d
tiller-deploy-6446fbc7f6-jzb9w                   1/1       Running             1          3d

This is OK - we actually expect this. The Pod is waiting for its storage to become available, which we can see by looking at the PersistentVolumeClaims in the kube-system namespace:

$ kubectl -n kube-system get pvc
NAME                     STATUS    VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-nfs-provisioner-0   Pending                                                      2m

We’ll need to hand-provision a volume for this to use. In my case, I’m going to use a hostPath volume; however, this could be any kind of volume you want - Ceph, an OpenStack Cinder volume, or anything else Kubernetes supports. hostPath volumes are “toys” within Kubernetes - they can’t move between machines to provide for HA, etc. - so don’t do this in a production environment! Let’s create a nfs-provisioner-pv.yaml file to hold the definition of this volume:

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-nfs-provisioner-0
spec:
  capacity:
    storage: 1000Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /srv/volumes/data-nfs-provisioner-0
  claimRef:
    namespace: kube-system
    name: data-nfs-provisioner-0

We’ve named this volume data-nfs-provisioner-0 and, importantly, cross-referenced the volume claim called data-nfs-provisioner-0 in the kube-system namespace. This ensures the PersistentVolume and PersistentVolumeClaim are correctly linked together. Let’s create the volume now:

$ kubectl apply -f nfs-provisioner-pv.yaml
persistentvolume "data-nfs-provisioner-0" created
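
As an aside: depending on your container runtime and node setup, you may also want to make sure the hostPath directory actually exists on the node before the pod tries to mount it - for example, assuming SSH access to the dl165g7 node:

$ ssh dl165g7 'sudo mkdir -p /srv/volumes/data-nfs-provisioner-0'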

Then, check that everything was linked up correctly and that the pod has started - this might take a few minutes:

$ kubectl -n kube-system get pvc
NAME                     STATUS    VOLUME                   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-nfs-provisioner-0   Bound     data-nfs-provisioner-0   1000Gi     RWO                           5m

$ kubectl -n kube-system get pods
NAME                                             READY     STATUS    RESTARTS   AGE
etcd-dl165g7                                     1/1       Running   1          10d
kube-apiserver-dl165g7                           1/1       Running   1          10d
kube-controller-manager-dl165g7                  1/1       Running   1          10d
kube-dns-6f4fd4bdf-tzgsb                         3/3       Running   3          20h
kube-proxy-cg5g2                                 1/1       Running   1          10d
kube-router-zns2d                                1/1       Running   1          9d
kube-scheduler-dl165g7                           1/1       Running   1          10d
nfs-provisioner-0                                1/1       Running   0          7m
nginx-ingress-controller-d7fb49479-d99f7         1/1       Running   1          3d
nginx-ingress-default-backend-7544489c4b-kd2dk   1/1       Running   1          3d
tiller-deploy-6446fbc7f6-jzb9w                   1/1       Running   1          3d

Perfect - the pod has started!

Testing the NFS Provisioner Deployment

Okay, it’s time to check it’s all working as expected! Dynamic volumes in Kubernetes are created automatically when a PersistentVolumeClaim with the right options is defined - let’s create a test-nfs-provisioner.yaml file to do this:

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs-provisioner-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
  storageClassName: "nfs"

We’ll go ahead and apply this PersistentVolumeClaim, wait a few seconds, and see if the nfs-provisioner automatically created a matching PersistentVolume for us:

$ kubectl apply -f test-nfs-provisioner.yaml 
persistentvolumeclaim "nfs-provisioner-test" created

$ kubectl -n kube-system get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                                STORAGECLASS   REASON    AGE
data-nfs-provisioner-0                     1000Gi     RWO            Retain           Bound     kube-system/data-nfs-provisioner-0                            9m
pvc-ce48022d-1a12-11e8-a782-d4856451b830   100Mi      RWO            Delete           Bound     default/nfs-provisioner-test         nfs                      15s

Yep - there it is. Let’s clean up:

$ kubectl delete -f test-nfs-provisioner.yaml 
persistentvolumeclaim "nfs-provisioner-test" deleted

$ kubectl -n kube-system get pvc
NAME                     STATUS    VOLUME                   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-nfs-provisioner-0   Bound     data-nfs-provisioner-0   1000Gi     RWO                           3m

$ kubectl -n kube-system get pv
NAME                     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                                STORAGECLASS   REASON    AGE
data-nfs-provisioner-0   1000Gi     RWO            Retain           Bound     kube-system/data-nfs-provisioner-0                            8m
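
For reference, here’s roughly what it looks like for an application to consume this storage: a PersistentVolumeClaim like the test one above (no storageClassName needed, since nfs is now the default class) plus a Pod that mounts it. The names and image below are hypothetical:

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-app-data          # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app               # hypothetical pod name
spec:
  containers:
    - name: app
      image: nginx           # stand-in image for illustration
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-app-data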

Installing Cert Manager

Let’s move on and get Cert Manager installed. This service will integrate with Let’s Encrypt, or your own DIY Certificate Authority, to issue TLS certificates to services running within your cluster. Again, we’re going to use Helm to do the deployment. As usual, let’s start with a cert-manager-values.yaml file:

fullnameOverride: cert-manager

ingressShim:
  extraArgs:
    - --default-issuer-name=letsencrypt-dns
    - --default-issuer-kind=ClusterIssuer
    - --default-acme-issuer-challenge-type=dns01
    - --default-acme-issuer-dns01-provider-name=cloudflare-dns

tolerations:
  - key: "node-role.kubernetes.io/master"
    operator: "Exists"
    effect: "NoSchedule"

Well - that was easy! Kinda! As it turns out, the upstream cert-manager chart has no support for specifying tolerations. I’ve a Pull Request open to fix this; for now, let’s use my chart repository again, where I’ve published the chart with this PR included:

$ helm install kiall/cert-manager --name cert-manager --namespace kube-system -f cert-manager-values.yaml
LAST DEPLOYED: Sun Feb 25 10:40:55 2018
NAMESPACE: kube-system
STATUS: DEPLOYED

RESOURCES:
==> v1beta1/ClusterRoleBinding
NAME          AGE
cert-manager  20h

==> v1beta1/Deployment
NAME          DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
cert-manager  1        2        1           1          20h

==> v1/Pod(related)
NAME                           READY  STATUS             RESTARTS  AGE
cert-manager-74df9dc4c5-hbmrr  2/2    Running            0         16h
cert-manager-fc9cc748b-dftq2   0/2    ContainerCreating  0         0s

==> v1/ServiceAccount
NAME          SECRETS  AGE
cert-manager  1        20h

==> v1beta1/CustomResourceDefinition
NAME                               AGE
certificates.certmanager.k8s.io    20h
clusterissuers.certmanager.k8s.io  20h
issuers.certmanager.k8s.io         20h

==> v1beta1/ClusterRole
cert-manager  20h


NOTES:
cert-manager has been deployed successfully!

In order to begin issuing certificates, you will need to set up a ClusterIssuer
or Issuer resource (for example, by creating a 'letsencrypt-staging' issuer).

More information on the different types of issuers and how to configure them
can be found in our documentation:

https://github.com/jetstack/cert-manager/tree/v0.2.3/docs/api-types/issuer

For information on how to configure cert-manager to automatically provision
Certificates for Ingress resources, take a look at the `ingress-shim`
documentation:

https://github.com/jetstack/cert-manager/blob/v0.2.3/docs/user-guides/ingress-shim.md

Astute readers might notice that some of the ages listed above are about a day old. I actually deployed this one yesterday, and am fudging the output here a little :)

Let’s wait for the cert-manager pod to become ready:

$ kubectl -n kube-system get pods
NAME                                             READY     STATUS    RESTARTS   AGE
cert-manager-fc9cc748b-dftq2                     2/2       Running   0          1m
etcd-dl165g7                                     1/1       Running   1          10d
kube-apiserver-dl165g7                           1/1       Running   1          10d
kube-controller-manager-dl165g7                  1/1       Running   1          10d
kube-dns-6f4fd4bdf-tzgsb                         3/3       Running   3          21h
kube-proxy-cg5g2                                 1/1       Running   1          10d
kube-router-zns2d                                1/1       Running   1          9d
kube-scheduler-dl165g7                           1/1       Running   1          10d
nfs-provisioner-0                                1/1       Running   0          30m
nginx-ingress-controller-d7fb49479-d99f7         1/1       Running   1          3d
nginx-ingress-default-backend-7544489c4b-kd2dk   1/1       Running   1          3d
tiller-deploy-6446fbc7f6-jzb9w                   1/1       Running   1          3d

Next, we’ll need to configure the service. To do this, we create either a ClusterIssuer or an Issuer resource. The former is cluster-wide; the latter is specific to a given namespace. Both are otherwise identical. In my case, 99% of what I deploy is going to use *.macinnes.ie hostnames, provided by Let’s Encrypt, so I’m going to create a cluster-wide ClusterIssuer that’s used by all my deployments. I’m also using CloudFlare for my DNS, so I can enable Let’s Encrypt’s DNS-01 based validation. This is, in my opinion, a better validation method than the more common HTTP-01, as it allows wildcard TLS certificates to be obtained and does not require that Let’s Encrypt be able to reach the services it is issuing certs for. Perfect for services running in my cluster that aren’t public!
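
For comparison, a namespace-scoped Issuer looks almost identical - only the kind changes, and the resource lives in (and only applies to) a specific namespace. A minimal sketch, with a hypothetical name, namespace, and email:

---
apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt-staging   # hypothetical name
  namespace: my-namespace     # hypothetical namespace - the Issuer only applies here
spec:
  acme:
    server: https://acme-staging.api.letsencrypt.org/directory
    email: you@example.com    # placeholder email
    privateKeySecretRef:
      name: letsencrypt-staging
    http01: {}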

cert-manager supports HTTP-01 (generic, but it requires that Let’s Encrypt be able to make an HTTP request to your service) and three DNS providers: Google CloudDNS, Amazon Route53, and Cloudflare.

So - let’s create a cert-manager-issuer.yaml file with two ClusterIssuers - both using the Let’s Encrypt staging API, one for HTTP validation and one for DNS validation - and a Secret containing our CloudFlare API key:

---
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-http
spec:
  acme:
    server: https://acme-staging.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-http
    http01: {}
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-staging.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-dns
    dns01:
      providers:
      - name: cloudflare-dns
        cloudflare:
          email: [email protected]
          apiKeySecretRef:
            name: cloudflare-dns-api-key
            key: api-key
---
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-dns-api-key
  namespace: kube-system
type: Opaque
data:
  api-key: {BASE64-ENCODED-API-KEY-HERE}

Now we apply this to our cluster:

$ kubectl apply -f cert-manager-issuer.yaml 
clusterissuer "letsencrypt-http" created
clusterissuer "letsencrypt-dns" created
secret "cloudflare-dns-api-key" created

Testing the Cert Manager Deployment

We’ll need to create a Certificate definition, in test-certificate.yaml:

---
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: testing.macinnes.ie
spec:
  secretName: testing.macinnes.ie-tls
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-dns
  commonName: testing.macinnes.ie
  acme:
    config:
      - dns01:
          provider: cloudflare-dns
        domains:
          - testing.macinnes.ie

Next, apply to the cluster:

$ kubectl apply -f test-certificate.yaml 
certificate "testing.macinnes.ie" created

And come back a minute later to check on it:

$ kubectl describe certificate testing.macinnes.ie
Name:         testing.macinnes.ie
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"certmanager.k8s.io/v1alpha1","kind":"Certificate","metadata":{"annotations":{},"name":"testing.macinnes.ie","namespace":"default"},"spec...
API Version:  certmanager.k8s.io/v1alpha1
Kind:         Certificate
Metadata:
  Cluster Name:        
  Creation Timestamp:  2018-02-25T11:00:14Z
  Generation:          0
  Resource Version:    1157627
  Self Link:           /apis/certmanager.k8s.io/v1alpha1/namespaces/default/certificates/testing.macinnes.ie
  UID:                 0ea37c25-1a1b-11e8-a782-d4856451b830
Spec:
  Acme:
    Config:
      Dns 01:
        Provider:  cloudflare-dns
      Domains:
        testing.macinnes.ie
  Common Name:  testing.macinnes.ie
  Dns Names:    <nil>
  Issuer Ref:
    Kind:       ClusterIssuer
    Name:       letsencrypt-dns
  Secret Name:  testing.macinnes.ie-tls
Status:
  Acme:
    Authorizations:
      Account:  https://acme-staging.api.letsencrypt.org/acme/reg/5646433
      Domain:   testing.macinnes.ie
      Uri:      https://acme-staging.api.letsencrypt.org/acme/challenge/LRb_bQJB3vWRa9OAQs54Unh76eWn4EJv4VIMe8G142s/104445269
  Conditions:
    Last Transition Time:  2018-02-25T11:01:23Z
    Message:               Certificate issued successfully
    Reason:                CertIssueSuccess
    Status:                True
    Type:                  Ready
Events:
  Type     Reason                 Age                From                     Message
  ----     ------                 ----               ----                     -------
  Warning  ErrorCheckCertificate  1m                 cert-manager-controller  Error checking existing TLS certificate: secret "testing.macinnes.ie-tls" not found
  Normal   PrepareCertificate     1m                 cert-manager-controller  Preparing certificate with issuer
  Normal   PresentChallenge       1m                 cert-manager-controller  Presenting dns-01 challenge for domain testing.macinnes.ie
  Normal   SelfCheck              1m                 cert-manager-controller  Performing self-check for domain testing.macinnes.ie
  Normal   ObtainAuthorization    33s                cert-manager-controller  Obtained authorization for domain testing.macinnes.ie
  Normal   IssueCertificate       33s                cert-manager-controller  Issuing certificate...
  Normal   CeritifcateIssued      31s                cert-manager-controller  Certificated issued successfully
  Normal   RenewalScheduled       31s (x3 over 31s)  cert-manager-controller  Certificate scheduled for renewal in 1438 hours

Here we can see the “Certificated issued successfully” event - perfect! The Warning with message “Error checking existing TLS certificate: secret 'testing.macinnes.ie-tls' not found” is normal - it just means cert-manager looked for a pre-existing certificate and couldn’t find one. Ignore it when you are issuing a certificate for the very first time.
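
If you want to double-check the result, the issued certificate and private key are stored in the Secret named by spec.secretName. For example - keeping in mind that, since we’re still using the staging API, the issuer will be Let’s Encrypt’s staging (“fake”) CA:

$ kubectl get secret testing.macinnes.ie-tls
$ kubectl get secret testing.macinnes.ie-tls -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -subject -issuer -dates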

Let’s clean up:

$ kubectl delete -f test-certificate.yaml 
certificate "testing.macinnes.ie" deleted

Finally, now that everything is working, we can switch from the Let’s Encrypt staging API to the production API. Let’s update our cert-manager-issuer.yaml to use the production URLs:

---
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-http
spec:
  acme:
    server: https://acme-v01.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-http
    http01: {}
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v01.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-dns
    dns01:
      providers:
      - name: cloudflare-dns
        cloudflare:
          email: [email protected]
          apiKeySecretRef:
            name: cloudflare-dns-api-key
            key: api-key
---
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-dns-api-key
  namespace: kube-system
type: Opaque
data:
  api-key: {BASE64-ENCODED-API-KEY-HERE}

And apply again:

$ kubectl apply -f cert-manager-issuer.yaml 
clusterissuer "letsencrypt-http" configured
clusterissuer "letsencrypt-dns" configured
secret "cloudflare-dns-api-key" unchanged
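
As a final usage note: with the ingress-shim defaults we configured earlier, an Ingress should only need an annotation and a tls section for cert-manager to request a certificate on its behalf. A rough sketch - the hostname, Service, and Secret names are hypothetical, and the exact trigger annotation may differ between cert-manager versions:

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-app                          # hypothetical
  annotations:
    kubernetes.io/tls-acme: "true"      # asks ingress-shim to manage a certificate for this Ingress
spec:
  tls:
    - hosts:
        - my-app.macinnes.ie            # hypothetical hostname
      secretName: my-app.macinnes.ie-tls
  rules:
    - host: my-app.macinnes.ie
      http:
        paths:
          - path: /
            backend:
              serviceName: my-app       # hypothetical Service
              servicePort: 80

If everything is wired up, ingress-shim should create and maintain a Certificate resource for this Ingress using the letsencrypt-dns ClusterIssuer we set as the default.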

And we’re done!


In part 6, we’re going to deploy CoreOS’s Dex to provide multi-user authentication for the cluster.