Clean Install fails to start workspace

I recently did a clean install onto self-hosted Fedora CoreOS VMs.
I am using an external Docker registry, which seems to be working just fine along with the image builder.

When I try to start a workspace, it builds the image and uploads it to the registry just fine. However, after a node is successfully assigned, I get an error stating "unable to mount Theia". I have checked the logs of all the containers, and there do not seem to be any errors before the "unable to mount Theia" error shows up.

The only error that shows up consistently is one that repeats every few seconds:
Unable to fetch sampling strategy, connection refused 0.0.0.0:5778

I would really appreciate any help!

Hi methazubin,

It would seem that Theia doesn’t get copied to the node properly. Doing that is the task of a DaemonSet called node-daemon which copies Theia to the node using an init container. Do you see at least one node-daemon pod in your installation?
Also, could you please post the output of kubectl describe daemonset node-daemon? That should help narrow things down.
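
In case it helps, here is a minimal sketch of how the node-daemon pods and their theia init container logs could be inspected. The function name show_theia_init_logs and the default namespace are assumptions made for illustration; the label component=node-daemon and the init container name theia are the ones used by the DaemonSet.

```shell
# Hypothetical helper: list node-daemon pods and print the logs of their
# "theia" init container. The namespace argument is an assumption; pass
# the namespace Gitpod was actually installed into.
show_theia_init_logs() {
    ns="${1:-default}"

    # Show the node-daemon pods and the nodes they are scheduled on.
    kubectl get pods -n "$ns" -l component=node-daemon -o wide

    # Print the init container logs for each pod (names look like pod/<name>).
    for pod in $(kubectl get pods -n "$ns" -l component=node-daemon -o name); do
        echo "== $pod =="
        kubectl logs -n "$ns" "$pod" -c theia
    done
}
```

Calling show_theia_init_logs with the installation's namespace should show whether the copy to /mnt/theia-storage reported any problem on either node.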

I have two node-daemon pods that come up. Here's the describe output:

Name:           node-daemon
Selector:       app=gitpod-selfhosted,component=node-daemon,kind=daemonset,stage=production,subcomponent=node-daemon
Node-Selector:  <none>
Labels:         app=gitpod-selfhosted
                component=node-daemon
                io.cattle.field/appId=gitpod-selfhosted
                kind=daemonset
                stage=production
                subcomponent=node-daemon
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 2
Current Number of Nodes Scheduled: 2
Number of Nodes Scheduled with Up-to-date Pods: 2
Number of Nodes Scheduled with Available Pods: 2
Number of Nodes Misscheduled: 0
Pods Status:  2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=gitpod-selfhosted
                    component=node-daemon
                    kind=daemonset
                    stage=production
                    subcomponent=node-daemon
  Service Account:  node-daemon
  Init Containers:
   theia:
    Image:      gcr.io/gitpod-io/theia-server:v0.4.0
    Port:       <none>
    Host Port:  <none>
    Args:
      --copy-to
      /mnt/theia-storage/theia/theia-v0.4.0
      -d
      /theia
    Limits:
      memory:  250Mi
    Requests:
      cpu:        5m
      memory:     250Mi
    Environment:  <none>
    Mounts:
      /mnt/theia-storage from theia-storage (rw)
   node-init:
    Image:      alpine:3.7
    Port:       <none>
    Host Port:  <none>
    Command:
      sh
      -c
      apk add findutils
      trap end 15; end() {
        echo "[node] Received SIGTERM, exiting with 0";
        exit 0;
      }; echo "[node] Start";
      (
        echo "[node] Patching node..." &&
        sysctl -w net.core.somaxconn=4096 &&
        sysctl -w "net.ipv4.ip_local_port_range=5000 65000" &&
        sysctl -w "net.ipv4.tcp_tw_reuse=1" &&
        sysctl -w fs.inotify.max_user_watches=1000000 &&
        sysctl -w "kernel.dmesg_restrict=1"
      ) && echo "[node] done!" || echo "[node] failed!" &&
      echo "[node] Initialized."
      
    Limits:
      memory:  50Mi
    Requests:
      cpu:        5m
      memory:     50Mi
    Environment:  <none>
    Mounts:
      /mnt/theia-storage from theia-storage (rw)
  Containers:
   node:
    Image:      gcr.io/gitpod-io/node-daemon:v0.4.0
    Port:       <none>
    Host Port:  <none>
    Limits:
      memory:  50Mi
    Requests:
      cpu:     5m
      memory:  50Mi
    Environment:
      KUBE_STAGE:                     production
      KUBE_NAMESPACE:                  (v1:metadata.namespace)
      VERSION:                        v0.4.0
      HOST_URL:                       <>
      GITPOD_REGION:                  local
      GITPOD_INSTALLATION_LONGNAME:   production.gitpod.local.00
      GITPOD_INSTALLATION_SHORTNAME:  local-00
      EXECUTING_NODE_NAME:             (v1:spec.nodeName)
    Mounts:                           <none>
  Volumes:
   theia-storage:
    Type:          HostPath (bare host directory volume)
    Path:          /var/gitpod
    HostPathType:  DirectoryOrCreate
   node-exporter:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/node_exporter/textfile_collector
    HostPathType:  DirectoryOrCreate
Events:
  Type    Reason            Age        From                  Message
  ----    ------            ----       ----                  -------
  Normal  SuccessfulCreate  <invalid>  daemonset-controller  Created pod: node-daemon-gqr4g
  Normal  SuccessfulCreate  <invalid>  daemonset-controller  Created pod: node-daemon-288rk

So the problem is happening with the pod that's created for the container. For some odd reason, the pod with these volume mounts:

  volumes:
  - hostPath:
      path: /var/gitpod/theia/theia-v0.4.0
      type: Directory
    name: vol-this-theia
  - hostPath:
      path: /var/gitpod/workspaces/d31b4897-5b6c-4395-a397-fb0ff1ccab83
      type: DirectoryOrCreate
    name: vol-this-workspace

fails with a message saying that the volume mount failed because /var/gitpod/theia/theia-v0.4.0 is not a directory.

I double-checked by logging into the host and running stat on the path; it is indeed reported as a directory.
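
For reference, the same check can be sketched as a small shell helper that can be run from more than one place. The name describe_path is hypothetical. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that whatever performs the hostPath type check sees a different view of the filesystem than an interactive login on the host does, so comparing the result from both places may be informative.

```shell
# Hypothetical helper: report how a path looks from wherever this runs.
# Prints one of: symlink, directory, other, missing.
describe_path() {
    path="$1"
    if [ -L "$path" ]; then
        # Checked first, because -d follows symlinks.
        echo "symlink"
    elif [ -d "$path" ]; then
        echo "directory"
    elif [ -e "$path" ]; then
        echo "other"
    else
        echo "missing"
    fi
}
```

Running describe_path /var/gitpod/theia/theia-v0.4.0 on the host and again from the environment where the kubelet runs would show whether both agree that the path is a directory.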

Now what's even weirder is that if I change the type to DirectoryOrCreate, the pod boots up as normal with all the Theia contents in it. With that change I was able to get to the "initializing workspace" step. I'm not entirely sure why the workspace pod doesn't detect the path as a directory even though the host identifies it as one. Any ideas?

Thanks!!