Node didn't have free ports and ws-manager has unattached volumes

I’m trying to install Gitpod 0.6.0beta and the installation with Helm has worked so far, only now some pods are stuck in Pending status. When I run kubectl describe pod -n gitpod <pod-name>,
it says for every pending pod:

1 node(s) didn't have free ports for the requested pod ports, 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
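
To see which host ports the pods actually request, something like this works (just a diagnostic sketch against my namespace):

  # print each pod and the hostPorts its containers ask for
  kubectl get pods -n gitpod -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].ports[*].hostPort}{"\n"}{end}'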

In the meantime I can reach the page, but I only get a black background, and after a short time this message:

[screenshot: the error message]

Pods and Services:

[screenshot: pods and services, everything pending because of the port conflict]


Additionally, ws-sync doesn’t get past ContainerCreating.

It gets errors like these all the time:

  Normal   Scheduled    37m                    default-scheduler  Successfully assigned gitpod/ws-sync-r8sh4 to worker-node-1
  Warning  FailedMount  31m                    kubelet            Unable to attach or mount volumes: unmounted volumes=[containerd-socket], unattached volumes=[tls-certs ws-sync-token-cdn2g working-area config containerd-socket node-fs0 node-mounts]: timed out waiting for the condition
  Warning  FailedMount  26m                    kubelet            Unable to attach or mount volumes: unmounted volumes=[containerd-socket], unattached volumes=[config containerd-socket node-fs0 node-mounts tls-certs ws-sync-token-cdn2g working-area]: timed out waiting for the condition
  Warning  FailedMount  24m                    kubelet            Unable to attach or mount volumes: unmounted volumes=[containerd-socket], unattached volumes=[node-mounts tls-certs ws-sync-token-cdn2g working-area config containerd-socket node-fs0]: timed out waiting for the condition
  Warning  FailedMount  22m (x3 over 33m)      kubelet            Unable to attach or mount volumes: unmounted volumes=[containerd-socket], unattached volumes=[ws-sync-token-cdn2g working-area config containerd-socket node-fs0 node-mounts tls-certs]: timed out waiting for the condition
  Warning  FailedMount  20m                    kubelet            Unable to attach or mount volumes: unmounted volumes=[containerd-socket], unattached volumes=[containerd-socket node-fs0 node-mounts tls-certs ws-sync-token-cdn2g working-area config]: timed out waiting for the condition
  Warning  FailedMount  17m (x2 over 35m)      kubelet            Unable to attach or mount volumes: unmounted volumes=[containerd-socket], unattached volumes=[working-area config containerd-socket node-fs0 node-mounts tls-certs ws-sync-token-cdn2g]: timed out waiting for the condition
  Warning  FailedMount  17m (x18 over 37m)     kubelet            MountVolume.SetUp failed for volume "containerd-socket" : hostPath type check failed: /run/containerd/containerd.sock is not a socket file
  Warning  FailedMount  16m                    kubelet            MountVolume.SetUp failed for volume "config" : failed to sync configmap cache: timed out waiting for the condition
  Warning  FailedMount  14m                    kubelet            Unable to attach or mount volumes: unmounted volumes=[containerd-socket], unattached volumes=[config containerd-socket node-fs0 node-mounts tls-certs ws-sync-token-cdn2g working-area]: timed out waiting for the condition
  Warning  FailedMount  9m29s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[containerd-socket], unattached volumes=[node-fs0 node-mounts tls-certs ws-sync-token-cdn2g working-area config containerd-socket]: timed out waiting for the condition
  Warning  FailedMount  4m57s (x2 over 7m14s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[containerd-socket], unattached volumes=[working-area config containerd-socket node-fs0 node-mounts tls-certs ws-sync-token-cdn2g]: timed out waiting for the condition
  Warning  FailedMount  2m41s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[containerd-socket], unattached volumes=[containerd-socket node-fs0 node-mounts tls-certs ws-sync-token-cdn2g working-area config]: timed out waiting for the condition
  Warning  FailedMount  108s (x15 over 16m)    kubelet            MountVolume.SetUp failed for volume "containerd-socket" : hostPath type check failed: /run/containerd/containerd.sock is not a socket file
  Warning  FailedMount  25s (x2 over 11m)      kubelet            Unable to attach or mount volumes: unmounted volumes=[containerd-socket], unattached volumes=[node-mounts tls-certs ws-sync-token-cdn2g working-area config containerd-socket node-fs0]: timed out waiting for the condition
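
From the “is not a socket file” errors, ws-sync apparently mounts the containerd socket as a hostPath volume with type Socket, so the kubelet refuses the mount unless a real socket exists at that path on the node. A sketch of what that volume presumably looks like (reconstructed from the error message, not copied from the chart):

  volumes:
    - name: containerd-socket
      hostPath:
        # must point at the actual containerd socket on the node;
        # type: Socket is what triggers the failing check above
        path: /run/containerd/containerd.sock
        type: Socket

If the nodes run Docker instead of plain containerd, the socket may live somewhere else entirely.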

I got the port problem fixed by going back to version 0.5.0.

However, the FailedMount problem remains, and now I don’t even get a sign of life from Gitpod when I try to access the page.


Hey supersebe, sorry for the silence. The team has been working on getting a new version of self-hosted out, but we didn’t manage to do that before the holidays so it will slip into January.


No problem. :slight_smile:

I got the error fixed by using

dnsPolicy: ClusterFirstWithHostNet
hostNetwork: true

and

kubectl taint nodes --all node-role.kubernetes.io/master-
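
For context, the two fields above go into the pod template of the affected component, roughly like this (the component name and image here are just examples):

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: ws-manager                        # example component, not necessarily the right one
  spec:
    selector:
      matchLabels:
        app: ws-manager
    template:
      metadata:
        labels:
          app: ws-manager
      spec:
        hostNetwork: true                   # binds container ports directly on the node
        dnsPolicy: ClusterFirstWithHostNet  # keeps cluster DNS usable despite hostNetwork
        containers:
          - name: ws-manager
            image: example/ws-manager       # placeholder image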

but now I have no free ports again. I think some pods are trying to use the same port (I think it was port 9500).
And if I look at the used ports on the worker node, I see that the port is in use by a Gitpod pod.
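
I checked it on the node roughly like this (standard tooling, nothing Gitpod-specific):

  # on the worker node: show which process is listening on port 9500
  sudo ss -tlnp | grep 9500
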
I reported it here: https://github.com/gitpod-io/gitpod/issues/2622

I have this issue with server and ws-manager-bridge too:

When I look at which process is using the port, it says ws-manager.

The pods:

[screenshot: the pod list]

With Gitpod 0.6.0beta1 I can’t even pull all the images.

OK, it’s definitely port 9500 and I can’t do anything about it. I tried to change it everywhere I could find this port, but ws-manager-node, ws-sync and some others are all trying to use port 9500.


Ok, it was my mistake. hostNetwork does not need to be set and dnsPolicy can remain unchanged.

The ingress was configured wrong.

Now I can see the page but I have this error message when I try to create a workspace:

build failed: cannot push workspace image: dial tcp: lookup *.sub.domain.com: no such host.
(domain name changed)
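
A quick way to check what name resolution looks like from inside the cluster is a throwaway pod (image and hostname are placeholders):

  # run a one-off pod and try to resolve the registry hostname in-cluster
  kubectl run -n gitpod dns-test --rm -it --image=busybox --restart=Never -- nslookup registry.sub.domain.com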

There is a problem with the DNS resolution. It’s pretty hard to debug. Can you please provide as much information as you can about your setup (e.g. how you set up Kubernetes, how you set up the registry, how many nodes you have, on which nodes the pods run, etc.)?

The problem is that I don’t have access to nginx (I just ask someone to configure it), so I can’t see the full configuration.

I have a vanilla Kubernetes cluster in a private network with 3 worker nodes (+1 master, so 4).

The DNS is configured for the subdomain + sub-subdomains (is that the right word for those? :smiley: )

The reason I didn’t see the site was that the domains were port-forwarded to 80, but the service was, and still is, on 30100 (or something like that). Changed that and now I see the site.
Would an ingress controller have helped?
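
For what it’s worth, an ingress controller terminates 80/443 itself and routes by hostname to cluster services, so the NodePort never needs to appear in the forwarding. A sketch of such a rule (hostname, service name and port are assumptions about this setup):

  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: gitpod
    namespace: gitpod
  spec:
    rules:
      - host: sub.domain.com        # placeholder hostname
        http:
          paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: proxy       # assumed name of the entry-point service
                  port:
                    number: 80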

But the “cannot push workspace image” error remains until I create an internal hostname that points to the IP of the machine running the registry and add this hostname to /etc/hosts inside the image-builder while the Gitpod containers are running.
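
Instead of editing /etc/hosts inside the running container, the same effect can be had declaratively with hostAliases in the pod spec (IP and hostname are placeholders for my internal registry):

  spec:
    hostAliases:
      - ip: "10.0.0.50"            # placeholder: IP of the machine running the registry
        hostnames:
          - "internregistry.com"   # placeholder: internal registry hostname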

But then I get problems with an HTTP server and an HTTPS client, which I tried to fix with self-signed certificates; that didn’t work either, so now I have tried it with my own publicly accessible registry, but with that I get the following errors:

ws-sync:

{"error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.105.65.24:22999: connect: connection refused\"","instanceId":"94139256-27ff-46b0-a57b-e9705ee6c444","message":"backup canary unavailable - maybe because of workspace shutdown","serviceContext":{"service":"ws-sync","version":""},"severity":"debug","time":"2021-01-12T11:13:58Z","userId":"c933ee03-8988-4e05-839c-922a5a504f2b","workspaceId":"c173927a-a562-4ace-8de6-3e5d4c7f3277"}

ws-scheduler:

reflector.go:98: Failed to list *v1.Pod: Get "https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: connect: connection refused

image-builder dind:

time="2021-01-10T18:24:41.976708509Z" level=warning msg="Running modprobe nf_nat failed with message: `ip: can't find device 'nf_nat'\nnf_nat                 40960  4 ip6table_nat,xt_nat,xt_MASQUERADE,iptable_nat\nnf_conntrack          139264  6 xt_nat,ip_vs,xt_conntrack,xt_MASQUERADE,nf_conntrack_netlink,nf_nat\nlibcrc32c              16384  6 xfs,ip_vs,nf_nat,nf_conntrack,btrfs,raid456\nmodprobe: can't change directory to '/lib/modules': No such file or directory`, error: exit status 1"
time="2021-01-10T18:24:41.980651125Z" level=warning msg="Running modprobe xt_conntrack failed with message: `ip: can't find device 'xt_conntrack'\nxt_conntrack           16384  9 \nnf_conntrack          139264  6 xt_nat,ip_vs,xt_conntrack,xt_MASQUERADE,nf_conntrack_netlink,nf_nat\nx_tables               40960 14 xt_multiport,ip6_tables,ipt_REJECT,xt_statistic,xt_nat,xt_tcpudp,iptable_mangle,xt_comment,xt_mark,xt_conntrack,xt_MASQUERADE,xt_addrtype,iptable_filter,ip_tables\nmodprobe: can't change directory to '/lib/modules': No such file or directory`, error: exit status 1"
time="2021-01-10T18:24:43.736199643Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
time="2021-01-10T18:24:44.572483946Z" level=info msg="Loading containers: done."
time="2021-01-10T18:24:46.518066213Z" level=info msg="Docker daemon" commit=d7080c1 graphdriver(s)=overlay2 version=18.06.3-ce
time="2021-01-10T18:24:46.537501896Z" level=info msg="Daemon has completed initialization"
time="2021-01-10T18:24:47.031181972Z" level=warning msg="Could not register builder git source: failed to find git binary: exec: \"git\": executable file not found in $PATH"
time="2021-01-10T18:24:47.038510765Z" level=info msg="API listen on 127.0.0.1:2375"
time="2021-01-10T18:24:55Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/0e53b1c70df72ea486aa49882643ee8479c2d8fd4f1a699c53704fd096db3643/shim.sock" debug=false pid=216
time="2021-01-10T18:24:58Z" level=info msg="shim reaped" id=0e53b1c70df72ea486aa49882643ee8479c2d8fd4f1a699c53704fd096db3643

And I think this is the real problem:

ws-manager service:

{"a":{"All":false,"Explicit":["<changedFromMeToHideIt>"]},"message":"registry not allowed","ref":{},"reg":"docker.io","serviceContext":{"service":"image-builder","version":""},"severity":"debug","time":"2021-01-12T11:13:49Z"}

<changedFromMeToHideIt> = my registry URL
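
The All/Explicit structure in that log line looks like a registry allowlist inside the image-builder configuration. One way to find and inspect it (the configmap name is a guess based on the component name):

  # list the configmaps, then dump the image-builder one to see the allowlist
  kubectl get configmaps -n gitpod
  kubectl get configmap -n gitpod image-builder-config -o yaml   # name is a guess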

I only have one error message from ws-sync or the daemon:

cannot pull image: rpc error: code = Unknown desc = Error response from daemon: Get https://internregistry.com/v2/: dial tcp: lookup internregistry.com on 8.8.8.8:53: server misbehaving.

I changed the /etc/hosts inside image-builder because I’m using an internal registry.

Do I need to add the hostname to any other deployment or something?

There was a closed bug for this in 0.5.0 but I don’t think it was ever resolved. I just ran into this error on 0.6.0 as well.
cannot initialize workspace: cannot connect to ws-sync · Issue #2029 · gitpod-io/gitpod (github.com)

If I do that, I can’t even open the website because a lot of pods want port 9500.

Okay that makes sense. What we have learned is Gitpod needs its own dedicated K8s cluster for each deployment. There are just too many hooks into the K8s nodes to run this alongside any other application, including multiple Gitpod deployments.

I don’t know why it uses 8.8.8.8. I have configured my own hostname for this URL.

I can add it in image-builder and it works, but I don’t know where else I need to add this hostname.

I don’t have a ws-sync deployment where I could add the hostname.

It can build the image and push it to the registry, but it can’t pull the built image back from the registry.
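
As far as I understand it, the pull is done by the container runtime on the node (that is where the 8.8.8.8 lookup in the error above comes from), so the hostname presumably has to resolve on every node, not just inside image-builder. The blunt fix would be an /etc/hosts entry per node (IP is a placeholder):

  # on every node: make the internal registry hostname resolvable
  echo "10.0.0.50 internregistry.com" | sudo tee -a /etc/hosts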

I just have Gitpod. But a lot of pods from Gitpod want to use port 9500.

Hi @supersebe,

I just have Gitpod. But a lot of pods from Gitpod want to use port 9500.

Those ports are used to export metrics. You can change them in the Helm templates.
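
The kind of declaration to look for in the chart’s templates would be a metrics port on the container, something like this (a sketch, not copied from Gitpod’s actual templates):

  ports:
    - name: metrics
      containerPort: 9500   # with hostNetwork this binds directly on the node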

I did that. I changed every 9500 I could find, but it still wants to use port 9500.