CrashLoopBackOff in image-builder on a self-hosted Kubernetes

Hello there!

I am trying to run Gitpod on my self-hosted vanilla Kubernetes using the Helm chart provided.

We run a fairly large instance of GitLab (with a proper SSL/TLS cert), so we would like to plug Gitpod into our GitLab Docker registry. This is what the registry.yaml looks like:

gitpod:
  components:
    imageBuilder:
      registryCerts: []
      registry:
        # name must not end with a "/"
        name: foo.bar.int:4567/moo/gitpod
        secretName: image-builder-registry-secret
        path: secrets/registry-auth.json

    workspace:
      pullSecret:
        secretName: image-builder-registry-secret

  docker-registry:
    enabled: false

gitpod_selfhosted:
  variants:
    customRegistry: true

The secrets/registry-auth.json is as you would expect:

{
  "auths": {
    "foo.bar.int:4567": {
      "auth": "DELETED"
    }
  },
  "HttpHeaders": {
    "User-Agent": "Docker-Client/19.03.6 (linux)"
  }
}

Despite all our efforts, the image-builder keeps CrashLoopBackOff’ing.

In the logs of the service container of the image-builder pod we can see this:

{"@type":"type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent","error":"Get https://registry-1.docker.io/v2/: x509: certificate is valid for foo.bar.int, www.foo.bar.int, not registry-1.docker.io","message":"self-build failed","serviceContext":{"service":"image-builder","version":""},"severity":"error","time":"2020-09-29T22:15:35Z"}
{"message":"Get https://registry-1.docker.io/v2/: x509: certificate is valid for foo.bar.int, www.foo.bar.int, not registry-1.docker.io","serviceContext":{"service":"image-builder","version":""},"severity":"fatal","time":"2020-09-29T22:15:35Z"}

As you can see, registry-1.docker.io is somehow resolving to the IP address of our registry, whose certificate of course does not cover that hostname.

Am I missing something?

Just to make sure: could you please show what the image-builder-config configmap looks like?

$ kubectl get configmap image-builder-config -o jsonpath="{.data}"

Hi there!

map[image-builder.json:{
    "builder": {"dockerCfgFile": "/config/pull-secret.json","gitpodLayerLoc": "/app/workspace-image-layer.tar.gz",
        "baseImageRepository": "foo.bar.int:4567/moo/gitpod/base-images",
        "workspaceImageRepository": "foo.bar.int:4567/moo/gitpod/workspace-images",
        "imageBuildSalt": ""
    },
    "refCache": {
        "interval": "6h",
        "refs": ["gitpod/workspace-full:latest"]
    },
    "pprof": {
        "address": ":6060"
    },
    "prometheus": {
        "address": ":9500"
    },
    "service": {
        "address": ":8080"
    }
}]

Thanks!

Thanks. That looks good.

There seems to be a problem during the pull of the images from the official Docker registry (docker.io). We pull an alpine image as well as the workspace image from there.

Could you please check if

$ kubectl exec -it -c dind image-builder-<PODNAME> -- sh -c "DOCKER_HOST=tcp://localhost:2375 docker pull alpine:3.9"

works (replace <PODNAME>)?

Error response from daemon: Get https://registry-1.docker.io/v2/: x509: certificate is valid for foo.bar.int, www.foo.bar.int, not registry-1.docker.io
command terminated with exit code 1

I am really lost at this point.

OMG. Damn Alpine. Damn musl.

So.

  • We have our GitLab server at foo.bar.int
  • We host our GitLab Pages under *.io.bar.int
  • *.io.bar.int is a wildcard A record that points to the same IP address as the GitLab server.
  • In addition to that, we have a search bar.int entry in our DNS configuration.

Literally all *.io names are broken in Alpine for us.

/ # nslookup literally.anything.io
nslookup: can't resolve '(null)': Name does not resolve

Name: literally.anything.io
Address 1: 10.x.y.z foo.bar.int

Any idea?
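
In case it helps anyone debugging something similar: a small sketch for summarising the search domains and the ndots value found in a resolv.conf. The function name resolv_summary is made up for this example, and the fallback of 1 is the usual resolver default when no ndots option is present. It only reads a file, so it is safe to run against a copy taken from a pod.

```shell
#!/bin/sh
# Sketch: print the search list and ndots value from a resolv.conf file.
# resolv_summary is a hypothetical helper, not part of any tool.
# usage: resolv_summary FILE
resolv_summary() {
    awk '
        # "search a b c" -> remember everything after the keyword
        $1 == "search" { $1 = ""; sub(/^ +/, ""); search = $0 }
        # "options ndots:5 ..." -> remember the number after ndots:
        $1 == "options" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ndots:/) { split($i, kv, ":"); ndots = kv[2] }
        }
        # ndots defaults to 1 when the option is absent
        END { printf "search=%s\nndots=%s\n", search, (ndots == "" ? "1" : ndots) }
    ' "$1"
}
```

One way to use it would be to save the output of kubectl exec -c dind image-builder-<PODNAME> -- cat /etc/resolv.conf to a file and pass that file to resolv_summary; a search entry like bar.int showing up there is worth a second look.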

I don’t think we have DNS problems here.

The problem is that the image builder container tries to pull an image from the official Docker registry, but the cert does not match. When I google

Get https://registry-1.docker.io/v2/: x509: certificate is valid for

I get some results (like this) that say that the reason could be a proxy server. Is there a proxy/firewall in your network?

I have removed search bar.int from the resolv.conf of my Kubernetes nodes and suddenly image-builder started working…

So random…
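
For anyone landing here later, a rough sketch of why the search entry could matter. It models, in a simplified way, how a resolver expands a name with fewer dots than ndots by appending each search domain first. The helper names candidates and matches_wildcard are invented for this example, ndots=5 is an assumption (it is what Kubernetes normally sets for pods), and a real pod's search list would also contain the cluster-internal domains. Actual resolvers (glibc, musl) differ in details.

```shell
#!/bin/sh
# Sketch: list the names a resolver would try, in order, for a query.
# Simplified model; candidates and matches_wildcard are hypothetical helpers.
# usage: candidates NAME NDOTS SEARCH_DOMAIN...
candidates() {
    name=$1; ndots=$2; shift 2
    # A trailing dot marks the name as fully qualified: no search list applies.
    case $name in
        *.) printf '%s\n' "$name"; return 0 ;;
    esac
    # Count the dots in the name.
    only_dots=$(printf '%s' "$name" | tr -cd '.')
    dots=${#only_dots}
    # Enough dots: try the name as-is before the search list.
    if [ "$dots" -ge "$ndots" ]; then
        printf '%s.\n' "$name"
    fi
    for domain in "$@"; do
        printf '%s.%s.\n' "$name" "$domain"
    done
    # Too few dots: the name as-is is only tried last.
    if [ "$dots" -lt "$ndots" ]; then
        printf '%s.\n' "$name"
    fi
}

# Succeeds if fully qualified NAME falls under the wildcard record *.ZONE.
# usage: matches_wildcard NAME. ZONE
matches_wildcard() {
    case $1 in
        *".$2.") return 0 ;;
        *) return 1 ;;
    esac
}
```

With these assumptions, candidates registry-1.docker.io 5 bar.int lists registry-1.docker.io.bar.int. first, and matches_wildcard registry-1.docker.io.bar.int. io.bar.int succeeds, so the *.io.bar.int record would answer before the real docker.io name is ever tried. Writing the name with a trailing dot skips the search list entirely.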
