Cluster shell

Periscope ships an in-browser cluster-wide kubectl REPL — the Shell button in the cluster page header opens a WebSocket-attached terminal backed by a per-session ephemeral pod that runs on the target cluster. Every action the operator takes inside the shell impersonates the human's identity and lands as their tier in apiserver RBAC.

This page is the operator guide: when cluster shell is available, how to enable it, the RBAC scaffolding the chart installs, how the per-session pod is constructed, the audit shape, and what to do when things misbehave. The design implements the GitHub epic tracked in issue #104.


What you see

placeholder: Shell button in the cluster header next to "+ apply yaml"

Click shell on any cluster page. Periscope provisions an ephemeral pod in periscope-system on the target cluster, mounts a per-session Secret carrying a kubeconfig wired with your impersonation identity, and attaches the same drawer the pod-exec terminal uses. The tab is labeled shell (vs <pod>/<container> for pod-exec); the info expander shows kind: cluster shell, mode: bash, and (cluster-scoped) instead of namespace/pod/container.

The shell streams stdin / stdout / stderr through the same hello/stdin/stdout/closed/error WebSocket frame protocol that pod-exec uses, so reconnect / idle / heartbeat / cap behavior all match.

placeholder: live cluster-shell session inside the drawer

1. Default behavior

Cluster shell is opt-in and off by default, the opposite stance from pod-exec. The feature requires auth.authorization.mode=tier (it doesn't make sense without tier-narrow impersonation), and chart install fails fast with a clear recipe message if you enable cluster shell while authz is left in raw or shared mode.
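The fail-fast rule can be pictured as a small pre-install check. This is an illustrative Python sketch, not the chart's actual template logic; the function name and the error text are hypothetical, and only the two Helm keys come from this page.

```python
def validate_cluster_shell(values: dict) -> None:
    """Raise ValueError when cluster shell is enabled without tier-mode authz."""
    shell_enabled = values.get("clusterShell", {}).get("enabled", False)
    mode = values.get("auth", {}).get("authorization", {}).get("mode")
    if shell_enabled and mode != "tier":
        # Hypothetical message; the real chart prints its own recipe text.
        raise ValueError(
            "clusterShell.enabled=true requires auth.authorization.mode=tier "
            f"(got mode={mode!r})"
        )
```

A values file with clusterShell.enabled=false passes regardless of the authorization mode, which matches the off-by-default stance.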

Backend support matrix

| Backend | Cluster shell works? | Notes |
| --- | --- | --- |
| eks (Pod Identity / IRSA) | Yes | Direct apiserver dial; pod created on the EKS cluster itself |
| kubeconfig | Yes | Same as eks |
| in-cluster | Yes | Pod lands in Periscope's own cluster, on the periscope-system namespace the chart provisioned |
| agent | Yes | Pod-create / Secret-create / exec stream route through the agent tunnel to the target cluster; transparent to the operator. See Operator notes for agent-backed clusters below. |

Operator notes for agent-backed clusters

On the agent-managed cluster, the agent's ServiceAccount becomes the apiserver caller for the pod-create / secret-create / sa-token mint requests Periscope main issues through the tunnel. The periscope-agent chart's clusterShell.enabled=true block installs:

  • A namespace-scoped Role granting pods / secrets / serviceaccounts/token write verbs on periscope-system, bound to the agent SA via a RoleBinding.
  • The same per-tier ServiceAccount + impersonator ClusterRole primitives the main chart installs (so the kubeconfig delivered into the shell pod looks the same regardless of which side provisioned the pod).

Server-side clusterShell.tiers and agent-side clusterShell.tiers must match. A tier listed on the server but missing on the agent causes a 403 when a session tries to mint its SA token.
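A quick way to catch tier drift before it surfaces as a 403 is to diff the two lists. The helper below is a hypothetical Python sketch (it is not shipped with either chart) and assumes you have already read both clusterShell.tiers lists from your values files.

```python
def tier_drift(server_tiers, agent_tiers) -> dict:
    """Compare server-side and agent-side clusterShell.tiers lists."""
    server, agent = set(server_tiers), set(agent_tiers)
    return {
        # These tiers would fail at SA-token mint time on the managed cluster.
        "missing_on_agent": sorted(server - agent),
        # Harmless for sessions, but still a mismatch worth cleaning up.
        "extra_on_agent": sorted(agent - server),
    }
```

Both lists empty in the result means the two charts agree.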

CA discovery for kubeconfig generation

The kubeconfig the shell pod uses needs the target apiserver's CA bundle. Periscope main reads this differently per backend:

| Backend | Source |
| --- | --- |
| in-cluster | /var/run/secrets/kubernetes.io/serviceaccount/ca.crt on the Periscope main pod |
| eks / kubeconfig | The cluster's existing kubeconfig entry |
| agent | kube-public/cluster-info ConfigMap on the target cluster, fetched through the tunnel and cached per cluster |

The agent-backend path mirrors the kubeadm-style discovery contract: the cluster-info ConfigMap is readable unauthenticated and carries the apiserver CA. No additional config is required on the agent chart for CA delivery.
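For orientation, here is a minimal Python sketch of pulling the CA out of a cluster-info ConfigMap. It assumes the kubeadm layout, where the ConfigMap's data.kubeconfig key holds a kubeconfig with a certificate-authority-data field. The function name is hypothetical, the regex stands in for a real YAML parser, and Periscope's own implementation may differ.

```python
import base64
import re


def ca_from_cluster_info(configmap: dict) -> bytes:
    """Return the PEM CA bundle embedded in a kube-public/cluster-info ConfigMap."""
    kubeconfig_text = configmap["data"]["kubeconfig"]
    # Shortcut for the sketch: grab the first certificate-authority-data value.
    match = re.search(r"certificate-authority-data:\s*(\S+)", kubeconfig_text)
    if match is None:
        raise ValueError("cluster-info kubeconfig has no certificate-authority-data")
    return base64.b64decode(match.group(1))
```

The returned bytes are what would end up, re-encoded, in the shell pod's kubeconfig cluster entry.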

Per-cluster opt-out

Cluster shell is currently a server-wide toggle: when clusterShell.enabled=true on the central server, it's available on every registered cluster whose target chart also has clusterShell.enabled=true. A future release will expose a per-cluster override in clusters[i].clusterShell; until then, do not set clusterShell.enabled=true on the agent chart of any managed cluster you want to keep out of scope.

When clusterShell.enabled: false on the server:

  • The SPA hides the shell action everywhere (the per-cluster meta field clusterShellEnabled returns false).
  • A direct WebSocket request to /api/clusters/{c}/shell returns HTTP 403 with body {"error": "E_CLUSTER_SHELL_DISABLED"}.

Required RBAC

Two layers:

  1. The operator must belong to a tier listed in clusterShell.tiers. The handler rejects non-listed tiers with E_FORBIDDEN. Default tiers: [admin].
  2. Periscope main's SA (or the agent SA on tunnel-managed clusters) needs pod / secret / serviceaccounts-token write verbs on clusterShell.namespace. Both charts' clusterShell blocks install this Role + RoleBinding automatically when clusterShell.enabled=true.

The per-tier impersonator ClusterRoles are tier-narrow — the group impersonation rule uses resourceNames: ["periscope-tier:<tier>"] so a stolen SA token cannot impersonate across tiers. See Security posture below for the threat model.
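To see why resourceNames narrows the token, the Python sketch below models how a single RBAC rule answers an "impersonate group" request. It is a simplified illustration of standard RBAC matching, not Periscope code; both function names are hypothetical, and the viewer tier in the usage note is made up.

```python
def impersonator_group_rule(tier: str) -> dict:
    """Shape of the groups rule described above, as a plain dict."""
    return {
        "apiGroups": [""],
        "resources": ["groups"],
        "verbs": ["impersonate"],
        "resourceNames": [f"periscope-tier:{tier}"],
    }


def group_impersonation_allowed(rule: dict, group: str) -> bool:
    """Evaluate one rule against a request to impersonate `group`."""
    if "impersonate" not in rule.get("verbs", []):
        return False
    if "groups" not in rule.get("resources", []):
        return False
    names = rule.get("resourceNames", [])
    # RBAC semantics: an empty resourceNames list matches every name.
    return not names or group in names
```

With the admin rule, a request for periscope-tier:admin passes while periscope-tier:viewer or system:masters is refused; dropping resourceNames would allow all three.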


2. Enable it on the central server

# values.yaml for the periscope chart on the central server
auth:
  authorization:
    mode: tier             # REQUIRED — chart install fails if mode != tier
    groupTiers:
      platform-admins: admin

clusterRBAC:
  adminTier:
    enabled: true          # Binds the admin tier to cluster-admin via a ClusterRoleBinding

clusterShell:
  enabled: true
  mode: bash               # bash (default). kubectl-only ships in a follow-up release.
  tiers: [admin]           # Only these tiers may open a session
  namespace: periscope-system
  idleSeconds: 1200        # 20 min — longer than pod-exec's 10 min default
  podStartTimeoutSeconds: 30
  transcriptMaxBytes: 1048576
  maxSessionsPerUser: 2
  maxSessionsTotal: 10
  image:
    repository: ghcr.io/gnana997/periscope-shell
    tag: ""                # defaults to Chart.AppVersion
    pullPolicy: IfNotPresent

Each value renders to a PERISCOPE_CLUSTER_SHELL_* environment variable on the Periscope pod. Cross-reference:

| Helm value | Env var | Code default |
| --- | --- | --- |
| clusterShell.enabled | PERISCOPE_CLUSTER_SHELL_ENABLED | false |
| clusterShell.mode | PERISCOPE_CLUSTER_SHELL_MODE | bash |
| clusterShell.tiers | PERISCOPE_CLUSTER_SHELL_TIERS | admin |
| clusterShell.namespace | PERISCOPE_CLUSTER_SHELL_NAMESPACE | periscope-system |
| clusterShell.idleSeconds | PERISCOPE_CLUSTER_SHELL_IDLE_SECONDS | 1200 (20 min) |
| clusterShell.podStartTimeoutSeconds | PERISCOPE_CLUSTER_SHELL_POD_START_TIMEOUT_SECONDS | 30 |
| clusterShell.transcriptMaxBytes | PERISCOPE_CLUSTER_SHELL_TRANSCRIPT_MAX_BYTES | 1048576 (1 MiB) |
| clusterShell.maxSessionsPerUser | PERISCOPE_CLUSTER_SHELL_MAX_SESSIONS_PER_USER | 2 |
| clusterShell.maxSessionsTotal | PERISCOPE_CLUSTER_SHELL_MAX_SESSIONS_TOTAL | 10 |
| clusterShell.image.repository + tag | PERISCOPE_CLUSTER_SHELL_IMAGE | ghcr.io/gnana997/periscope-shell:<AppVersion> |
| clusterShell.image.pullPolicy | PERISCOPE_CLUSTER_SHELL_IMAGE_PULL_POLICY | IfNotPresent |
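The naming pattern in the table is regular enough to derive mechanically. The Python helper below is a hypothetical convenience for scripting against the table, not part of the chart; it covers every single-key row, while the combined image.repository + tag row (which renders to PERISCOPE_CLUSTER_SHELL_IMAGE) is a special case it does not handle.

```python
import re

PREFIX = "PERISCOPE_CLUSTER_SHELL_"


def env_var_for(helm_key: str) -> str:
    """Map a clusterShell.* Helm key to the env var name used in the table."""
    root = "clusterShell."
    if not helm_key.startswith(root):
        raise ValueError(f"not a clusterShell key: {helm_key}")
    path = helm_key[len(root):]
    # Insert "_" at each lower-to-upper camelCase boundary, then flatten dots.
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", path).replace(".", "_")
    return PREFIX + snake.upper()
```

For example, env_var_for("clusterShell.idleSeconds") yields PERISCOPE_CLUSTER_SHELL_IDLE_SECONDS.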

3. Enable it on each managed cluster

For every agent-backed cluster that should accept cluster-shell sessions, set the matching values in the periscope-agent chart:

# values.yaml for the periscope-agent chart on each managed cluster
clusterShell:
  enabled: true
  namespace: periscope-system  # MUST match the server's clusterShell.namespace
  tiers: [admin]               # MUST match the server's clusterShell.tiers

For the in-cluster backend (Periscope main targeting the cluster it runs in), no separate agent chart install is needed. Setting clusterShell.enabled=true on the main chart already renders the namespace, the per-tier RBAC, and the provisioner Role.

For eks / kubeconfig backends, you must install the RBAC manifests on the target cluster manually. The shipped templates in deploy/helm/periscope-agent/templates/cluster-shell-* are a reference; apply equivalent manifests via your existing GitOps flow.


4. The per-session pod

Each session creates one ephemeral pod in clusterShell.namespace (default periscope-system) with these properties:

  • Image: clusterShell.image.repository:tag, a debian:bookworm-slim runtime carrying bash, kubectl, helm, nano, jq, coreutils, curl, and ca-certificates. The binaries are SHA-pinned in the Dockerfile.
  • ServiceAccount: periscope-shell-<tier> (e.g. periscope-shell-admin) — owns ONLY the per-tier impersonate rules. Token-stealing nets nothing beyond what the operator could already do.
  • Mounted Secret: periscope-shell-<session-id> carries a kubeconfig whose users[0] entry has the SA bearer token AND the as: <operator-sub>, as-groups: [periscope-tier:<tier>], and as-user-extra: { audit.periscope.io/session-id: [<uuid>], audit.periscope.io/actor: [<sub>] } impersonation fields baked in.
  • Entrypoint: cmd/periscope-shell/main.go reads PERISCOPE_SHELL_SESSION_ID / PERISCOPE_SHELL_MODE / PERISCOPE_SHELL_AUDIT_FILE, then syscall.Execs into /bin/bash --login with KUBECONFIG=/etc/periscope/kubeconfig.
  • Audit wrapper: /usr/local/bin/kubectl AND /usr/local/bin/helm are both symlinks to periscope-audit-exec, a tiny Go wrapper that keys off its own argv[0] to figure out which real binary to invoke (kubectl-real / helm-real under /opt/periscope/bin/). For every call it appends a {ts, pid, argv} JSON line to the in-pod audit file before syscall.Exec-ing the real binary. Best-effort — audit write failure does NOT block the command. Adding a new wrapped tool is a one-line allow-list entry in the wrapper plus a matching symlink in Dockerfile.shell.
  • KUBE_EDITOR=nano pinned in the image so kubectl edit (and other editor-using subcommands) work without operators having to set the variable themselves. The image only ships nano; vi/vim are not installed.
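The mounted kubeconfig described in the list can be sketched as a plain data structure. The Python below is an illustration under assumptions: the function name, the cluster/user/context names ("target", "shell"), and the argument values are hypothetical, while the as, as-groups, and as-user-extra keys are the standard kubeconfig impersonation fields the list refers to.

```python
import json


def build_shell_kubeconfig(server, ca_data_b64, sa_token,
                           operator_sub, tier, session_id) -> dict:
    """Assemble a kubeconfig whose single user impersonates the operator."""
    user = {
        "token": sa_token,  # bearer token of periscope-shell-<tier>
        "as": operator_sub,
        "as-groups": [f"periscope-tier:{tier}"],
        "as-user-extra": {
            "audit.periscope.io/session-id": [session_id],
            "audit.periscope.io/actor": [operator_sub],
        },
    }
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{"name": "target", "cluster": {
            "server": server,
            "certificate-authority-data": ca_data_b64,
        }}],
        "users": [{"name": "shell", "user": user}],
        "contexts": [{"name": "shell",
                      "context": {"cluster": "target", "user": "shell"}}],
        "current-context": "shell",
    }


def render(kubeconfig: dict) -> str:
    """Serialize as JSON, which kubectl accepts because JSON is valid YAML."""
    return json.dumps(kubeconfig, indent=2)
```

Because the impersonation fields live in the kubeconfig itself, every kubectl or helm call in the pod carries them without the operator passing --as flags.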

The pod is deleted on session close (clean exit / Ctrl-D / WS close / idle-timeout). Pod + Secret cleanup is idempotent and runs even on error paths.


5. Concurrency caps

Two caps gate session creation; both return HTTP 429 with {"error": "E_CAP_USER"} / {"error": "E_CAP_CLUSTER"} and an activeSessions body field:

| Cap | Default | Helm value |
| --- | --- | --- |
| Per OIDC subject, all clusters | 2 | clusterShell.maxSessionsPerUser |
| Per cluster, all subjects | 10 | clusterShell.maxSessionsTotal |

Caps are deliberately tighter than pod-exec's (5 / 50) — each cluster-shell session burns a full pod + Secret on the target cluster, vs pod-exec's zero-side-effect attach.
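The two caps can be modeled in a few lines. This Python sketch is an assumption-laden illustration rather than the handler's code: the function name, the (subject, cluster) session representation, the check order, and the meaning of activeSessions (the count that hit the cap) are all guesses made for the example.

```python
def check_caps(active_sessions, subject, cluster,
               max_per_user=2, max_total=10):
    """Return None if a new session may start, else (status, body).

    active_sessions is an iterable of (oidc_subject, cluster) pairs.
    """
    sessions = list(active_sessions)
    # Per-subject cap counts the subject's sessions across all clusters.
    user_count = sum(1 for s, _ in sessions if s == subject)
    if user_count >= max_per_user:
        return 429, {"error": "E_CAP_USER", "activeSessions": user_count}
    # Per-cluster cap counts sessions from all subjects on this cluster.
    cluster_count = sum(1 for _, c in sessions if c == cluster)
    if cluster_count >= max_total:
        return 429, {"error": "E_CAP_CLUSTER", "activeSessions": cluster_count}
    return None
```

Note that a third session for the same subject is refused even when it targets a different cluster, because the per-user cap spans clusters.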


6. Lifecycle: idle, heartbeat, warn, close

The handler reuses internal/exec.Run for WebSocket lifecycle plumbing — so heartbeat (20s default) and idle-warn (30s lead) behavior is identical to pod-exec. The only difference is the idle-cut timeout itself:

| Knob | pod-exec default | cluster-shell default |
| --- | --- | --- |
| Idle before cut | 10 min (exec.serverIdleSeconds) | 20 min (clusterShell.idleSeconds) |
| Idle-warn lead | 30s | 30s |
| Heartbeat | 20s | 20s |

Activity = any stdin or stdout byte. The longer cluster-shell default reflects the typical session — kubectl get / helm list loops, reading describe output — vs the tighter pod-exec pattern.
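The timing rules reduce to a small state function. The Python below is a simplified model built from the defaults in the table, not the code in internal/exec; the function name and the three state labels are hypothetical.

```python
def idle_state(seconds_since_activity: float,
               idle_seconds: int = 1200, warn_lead: int = 30) -> str:
    """Classify a session by time elapsed since the last stdin/stdout byte."""
    if seconds_since_activity >= idle_seconds:
        return "cut"      # idle timeout reached, session is closed
    if seconds_since_activity >= idle_seconds - warn_lead:
        return "warn"     # inside the idle-warn lead window
    return "active"
```

With the cluster-shell defaults, the warning window opens at 1170 seconds of silence and the cut happens at 1200; any byte in either direction resets the clock to zero.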


7. Audit

Three verbs land in the audit pipeline; the cross-reference key that joins the SPA-side audit row to the apiserver's own audit log is audit.periscope.io/session-id:

| Verb | When emitted | Body fields |
| --- | --- | --- |
| cluster_shell_open | After cap checks pass, before WS upgrade | cluster, mode, tier, session_id |
| cluster_shell_command | (reserved; currently bulk-on-close in body of close) | |
| cluster_shell_close | After session ends, regardless of cause | cluster, mode, duration_ms, exit_code, bytes_in, bytes_out, close_reason, commands: [{timestamp, argv, pid}] |

The commands slice on cluster_shell_close is read from the in-pod audit file (PERISCOPE_SHELL_AUDIT_FILE) via a final exec stream during teardown. It captures every kubectl and helm invocation made through the session (both wrapped by the periscope-audit-exec binary). Other commands (cat, jq, grep, bash builtins) don't write per-invocation rows here — they still contribute to the bytes_in / bytes_out counters, and any K8s API calls they trigger show up in the apiserver audit log keyed by the session UUID.

Best-effort: a pod that died before the readback completes loses its command log, but the open / close envelopes are durable.
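The write side and the readback side of the in-pod audit file can be sketched as two pure functions. This is illustrative Python, not the Go wrapper: the function names are hypothetical, the numeric ts value is an assumption about the timestamp format, and the ts-to-timestamp rename mirrors the field names used on this page rather than confirmed code.

```python
import json


def audit_line(argv, pid, ts) -> str:
    """Format one {ts, pid, argv} record as a JSON line for the audit file."""
    return json.dumps({"ts": ts, "pid": pid, "argv": list(argv)}) + "\n"


def parse_commands(audit_text: str) -> list:
    """Turn audit-file contents into the commands slice, skipping bad lines."""
    commands = []
    for line in audit_text.splitlines():
        try:
            rec = json.loads(line)
            commands.append({"timestamp": rec["ts"],
                             "argv": rec["argv"],
                             "pid": rec["pid"]})
        except (ValueError, KeyError, TypeError):
            # Best-effort: a truncated or garbled line is dropped, not fatal.
            continue
    return commands
```

Skipping malformed lines keeps the readback useful when the pod dies mid-write and leaves a partial final record.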


8. Security posture

Three properties limit blast radius:

  1. Tier-narrow impersonator ClusterRoles. The per-tier periscope-shell-impersonator-<tier> rule uses resourceNames: ["periscope-tier:<tier>"] on the groups impersonate rule. An admin-tier SA token cannot escalate to a different tier's group; users impersonation stays wildcard (the user identity is the operator's OIDC sub anyway).
  2. Audit-extras impersonation. Every kubectl call from the shell carries audit.periscope.io/session-id + actor as user-extras. The apiserver audit log records these, and Periscope's own cluster_shell_close envelope carries the same session id — joining the two logs is one grep of the UUID.
  3. No shared kubeconfig. The Secret is per-session and deleted on close. There is no persistent SA token left on the cluster after teardown; the per-tier SA carries only the impersonate rules and no cluster read.
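Joining the two logs by session id can also be done programmatically. The Python sketch below assumes the apiserver audit log is written as JSON lines in the audit.k8s.io/v1 Event format, where impersonation details appear under impersonatedUser.extra; the function name is hypothetical and your audit backend or policy may shape events differently.

```python
import json

SESSION_KEY = "audit.periscope.io/session-id"


def events_for_session(audit_log_lines, session_id) -> list:
    """Return audit events whose impersonated user carries the session id."""
    matches = []
    for line in audit_log_lines:
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip non-JSON lines
        if not isinstance(event, dict):
            continue
        extra = (event.get("impersonatedUser") or {}).get("extra") or {}
        if session_id in extra.get(SESSION_KEY, []):
            matches.append(event)
    return matches
```

Feeding it the session_id from a cluster_shell_close envelope returns the API calls made during that shell session.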

The full RBAC posture writeup is in docs/security/rbac-posture.md.


9. Troubleshooting

Cross-cutting issues (chart-versions OOM, scanner false-positives, local-dev TLS, image-pull behind a corporate proxy) live in troubleshooting.md.

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Shell button missing on the cluster header | Backend says clusterShellEnabled=false on this cluster | Check /api/clusters response; confirm clusterShell.enabled=true on the server helm release |
| 403 E_FORBIDDEN on click | Operator's tier not in clusterShell.tiers | Either add the operator to a listed tier or add their tier to the list |
| 429 E_CAP_USER / E_CAP_CLUSTER | Cap reached | Close an old session, or bump maxSessionsPerUser / maxSessionsTotal |
| Pod stays Pending for >30s | Image not pulled / scheduling issue | kubectl -n periscope-system describe pod periscope-shell-* for events. Common: image pull from ghcr.io/gnana997/periscope-shell rejected by an air-gapped cluster; mirror the image to your registry and set clusterShell.image.repository accordingly |
| Pod runs but kubectl auth whoami shows nothing | Operator's group claim missing | Verify auth.authorization.groupsClaim matches your IdP, then log in again to pick up the new claim |
| kubectl get returns 403 | Tier's ClusterRoleBinding not installed on target cluster | Set clusterRBAC.enabled=true (and adminTier.enabled=true if your tier maps to admin) on the periscope-agent chart |
| Agent-backed cluster: Forbidden on pod-create | Provisioner Role/RoleBinding missing on managed cluster | Set clusterShell.enabled=true on the periscope-agent chart so the chart installs the namespace-scoped Role and binds it to the agent SA |
| Session disconnects after ~20 min without input or output | Hit idleSeconds cut | Expected. Re-open the shell, or bump clusterShell.idleSeconds for long-running incidents |

For deeper agent-tunnel diagnostics — request IDs, pod-create failures observed on the agent side — see docs/architecture/agent-tunnel.md.