Troubleshooting

Troubleshooting guides and steps.

Something went wrong with my GameServer

If there is something going wrong with your GameServer, there are a few approaches to determining the cause:

Run with the local SDK server

A good first step for seeing what may be going wrong is replicating the issue locally. To do this you can take advantage of the Agones local SDK server , with the following troubleshooting steps:

  1. Run your game server as a local binary against the local SDK server
  2. Run your game server container against the local SDK server. It’s worth noting that running with docker run --network=host ... can be an easy way to allow your game server container(s) access to the local SDK server)

At each stage, keep an eye on the logs of your game server binary, and the local SDK server, and ensure there are no system errors.

Run as a GameServer rather than a Fleet

A Fleet will automatically replace any unhealthy GameServer under its control - which can make it hard to catch all the details to determine the cause.

To work around this, instantiate a single instance of your game server as a
GameServer within your Agones cluster.

This GameServer will not be replaced if it moves to an Unhealthy state, giving you time to introspect what is going wrong.

Introspect with Kubernetes tooling

There are many Kubernetes tools that will help with determining where things have potentially gone wrong for your game server. Here are a few you may want to try.

kubectl describe

Depending on what is happening, you may want to run kubectl describe <gameserver name> to view the events that are associated with that particular GameServer resource. This can give you insight into the lifecycle of the GameServer and if anything has gone wrong.

For example, here we can see where the simple-game-server example has been moved to the Unhealthy state due to a crash in the backing GameServer Pod container’s binary.

kubectl describe gs simple-game-server-zqppv
$$Name: simple-game-server-zqppv Namespace: default Labels: Annotations: agones.dev/sdk-version: 1.0.0-dce1546 API Version: agones.dev/v1 Kind: GameServer Metadata: Creation Timestamp: 2019-08-16T21:25:44Z Finalizers: agones.dev Generate Name: simple-game-server- Generation: 1 Resource Version: 1378575 Self Link: /apis/agones.dev/v1/namespaces/default/gameservers/simple-game-server-zqppv UID: 6818adc7-c06c-11e9-8dbd-42010a8a0109 Spec: Container: simple-game-server Health: Failure Threshold: 3 Initial Delay Seconds: 5 Period Seconds: 5 Ports: Container Port: 7654 Host Port: 7058 Name: default Port Policy: Dynamic Protocol: UDP Scheduling: Packed Template: Metadata: Creation Timestamp: Spec: Containers: Image: us-docker.pkg.dev/agones-images/examples/simple-game-server:0.38 Name: simple-game-server Resources: Limits: Cpu: 20m Memory: 32Mi Requests: Cpu: 20m Memory: 32Mi Status: Address: 35.230.59.117 Node Name: gke-test-cluster-default-590db5e4-4s6r Ports: Name: default Port: 7058 Reserved Until: State: Unhealthy Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal PortAllocation 72s gameserver-controller Port allocated Normal Creating 72s gameserver-controller Pod simple-game-server-zqppv created Normal Scheduled 72s gameserver-controller Address and port populated Normal RequestReady 67s gameserver-sidecar SDK state change Normal Ready 66s gameserver-controller SDK.Ready() complete Warning Unhealthy 34s health-controller Issue with Gameserver pod$$

The backing Pod has the same name as the GameServer - so it’s also worth looking at the details and events for the Pod to see if there are any issues there, such as restarts due to binary crashes etc.

For example, you can see the restart count on the us-docker.pkg.dev/agones-images/examples/simple-game-server:0.38 container is set to 1, due to the game server binary crash

kubectl describe pod simple-game-server-zqppv
$$Name: simple-game-server-zqppv Namespace: default Priority: 0 PriorityClassName: Node: gke-test-cluster-default-590db5e4-4s6r/10.138.0.23 Start Time: Fri, 16 Aug 2019 21:25:44 +0000 Labels: agones.dev/gameserver=simple-game-server-zqppv agones.dev/role=gameserver Annotations: agones.dev/container: simple-game-server agones.dev/sdk-version: 1.0.0-dce1546 cluster-autoscaler.kubernetes.io/safe-to-evict: false Status: Running IP: 10.48.1.80 Controlled By: GameServer/simple-game-server-zqppv Containers: simple-game-server: Container ID: docker://69eacd03cc89b0636b78abe47926b02183ba84d18fa20649ca443f5232511661 Image: us-docker.pkg.dev/agones-images/examples/simple-game-server:0.38 Image ID: docker-pullable://gcr.io/agones-images/simple-game-server@sha256:6a60eff5e68b88b5ce75ae98082d79cff36cda411a090f3495760e5c3b6c3575 Port: 7654/UDP Host Port: 7058/UDP State: Running Started: Fri, 16 Aug 2019 21:26:22 +0000 Last State: Terminated Reason: Completed Exit Code: 0 Started: Fri, 16 Aug 2019 21:25:45 +0000 Finished: Fri, 16 Aug 2019 21:26:22 +0000 Ready: True Restart Count: 1 Limits: cpu: 20m memory: 32Mi Requests: cpu: 20m memory: 32Mi Liveness: http-get http://:8080/gshealthz delay=5s timeout=1s period=5s #success=1 #failure=3 Environment: Mounts: /var/run/secrets/kubernetes.io/serviceaccount from empty (ro) agones-gameserver-sidecar: Container ID: docker://f3c475c34d26232e19b60be65b03bc6ce41931f4c37e00770d3ab4a36281d31c Image: gcr.io/agones-mark/agones-sdk:1.0.0-dce1546 Image ID: docker-pullable://gcr.io/agones-mark/agones-sdk@sha256:4b5693e95ee3023a2b2e2099d102bb6bac58d4ce0ac472e58a09cee6d160cd19 Port: Host Port: State: Running Started: Fri, 16 Aug 2019 21:25:48 +0000 Ready: True Restart Count: 0 Requests: cpu: 30m Liveness: http-get http://:8080/healthz delay=3s timeout=1s period=3s #success=1 #failure=3 Environment: GAMESERVER_NAME: simple-game-server-zqppv POD_NAMESPACE: default (v1:metadata.namespace) Mounts: /var/run/secrets/kubernetes.io/serviceaccount from agones-sdk-token-vr6qq (ro) Conditions: Type Status Initialized True Ready True ContainersReady True PodScheduled True Volumes: empty: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: agones-sdk-token-vr6qq: Type: Secret (a volume populated by a Secret) SecretName: agones-sdk-token-vr6qq Optional: false QoS Class: Burstable Node-Selectors: Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s node.kubernetes.io/unreachable:NoExecute for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 2m32s default-scheduler Successfully assigned default/simple-game-server-zqppv to gke-test-cluster-default-590db5e4-4s6r Normal Pulling 2m31s kubelet, gke-test-cluster-default-590db5e4-4s6r pulling image "gcr.io/agones-mark/agones-sdk:1.0.0-dce1546" Normal Started 2m28s kubelet, gke-test-cluster-default-590db5e4-4s6r Started container Normal Pulled 2m28s kubelet, gke-test-cluster-default-590db5e4-4s6r Successfully pulled image "gcr.io/agones-mark/agones-sdk:1.0.0-dce1546" Normal Created 2m28s kubelet, gke-test-cluster-default-590db5e4-4s6r Created container Normal Created 114s (x2 over 2m31s) kubelet, gke-test-cluster-default-590db5e4-4s6r Created container Normal Started 114s (x2 over 2m31s) kubelet, gke-test-cluster-default-590db5e4-4s6r Started container Normal Pulled 114s (x2 over 2m31s) kubelet, gke-test-cluster-default-590db5e4-4s6r Container image "us-docker.pkg.dev/agones-images/examples/simple-game-server:0.38" already present on machine$$

Finally, you can also get the logs of your GameServer Pod as well via kubectl logs <pod name> -c <game server container name>, for example:

kubectl logs simple-game-server-zqppv -c simple-game-server
$$2019/08/16 21:26:23 Creating SDK instance 2019/08/16 21:26:24 Starting Health Ping 2019/08/16 21:26:24 Starting UDP server, listening on port 7654 2019/08/16 21:26:24 Marking this server as ready$$

The above commands will only give the most recent container’s logs (so we won’t get the previous crash), but you can use kubectl logs --previous=true simple-game-server-zqppv -c simple-game-server to get the previous instance of the containers logs, or use your Kubernetes platform of choice’s logging aggregation tools to view the crash details.

kubectl events

The “Events” section that is seen at the bottom of a kubectl describe is backed an actual Event record in Kubernetes, which can be queried - and is general persistent for an hour after it is created.

Therefore, even a GameServer or Pod resource is no longer available in the system, its Events may well be.

kubectl get events can be used to see all these events. This can also be grepped with the GameServer name to see all events across both the GameServer and its backing Pod, like so:

kubectl get events | grep simple-game-server-v992s-jwpx2
$$2m47s Normal PortAllocation gameserver/simple-game-server-v992s-jwpx2 Port allocated 2m47s Normal Creating gameserver/simple-game-server-v992s-jwpx2 Pod simple-game-server-v992s-jwpx2 created 2m47s Normal Scheduled pod/simple-game-server-v992s-jwpx2 Successfully assigned default/simple-game-server-v992s-jwpx2 to gke-test-cluster-default-77e7f57d-j1mp 2m47s Normal Scheduled gameserver/simple-game-server-v992s-jwpx2 Address and port populated 2m46s Normal Pulled pod/simple-game-server-v992s-jwpx2 Container image "us-docker.pkg.dev/agones-images/examples/simple-game-server:0.38" already present on machine 2m46s Normal Created pod/simple-game-server-v992s-jwpx2 Created container simple-game-server 2m45s Normal Started pod/simple-game-server-v992s-jwpx2 Started container simple-game-server 2m45s Normal Pulled pod/simple-game-server-v992s-jwpx2 Container image "gcr.io/agones-images/agones-sdk:1.7.0" already present on machine 2m45s Normal Created pod/simple-game-server-v992s-jwpx2 Created container agones-gameserver-sidecar 2m45s Normal Started pod/simple-game-server-v992s-jwpx2 Started container agones-gameserver-sidecar 2m45s Normal RequestReady gameserver/simple-game-server-v992s-jwpx2 SDK state change 2m45s Normal Ready gameserver/simple-game-server-v992s-jwpx2 SDK.Ready() complete 2m47s Normal SuccessfulCreate gameserverset/simple-game-server-v992s Created gameserver: simple-game-server-v992s-jwpx2$$

Other techniques

For more tips and tricks, the Kubernetes Cheatsheet: Interactive with Pods also provides more troubleshooting techniques.

How do I see the logs for Agones?

If something is going wrong, and you want to see the logs for Agones, there are potentially two places you will want to check:

  1. The controller: assuming you installed Agones in the agones-system namespace, you will find that there is a single pod called agones-controller-<hash> (where hash is the unique code that Kubernetes generates) that exists there, that you can get the logs from. This is the main controller for Agones, and should be the first place to check when things go wrong.
    1. To get the logs from this controller run:
      kubectl logs --namespace=agones-system agones-controller-<hash>
  2. The SDK server sidecar: Agones runs a small gRPC + http server for the SDK in a container in the same network namespace as the game server container to connect to via the SDK.
    The logs from this SDK server are also useful for tracking down issues, especially if you are having trouble with a particular GameServer.
    1. To find the Pod for the GameServer look for the pod with a name that is prefixed with the name of the owning GameServer. For example if you have a GameServer named simple-game-server, it’s pod could potentially be named simple-game-server-dnbwj.
    2. To get the logs from that Pod, we need to specify that we want the logs from the agones-gameserver-sidecar container. To do that, run the following:
      kubectl logs simple-game-server-dnbwj -c agones-gameserver-sidecar

Agones uses JSON structured logging, therefore errors will be visible through the "severity":"info" key and value.

Enable Debug Level Logging for the SDK Server

By default, the SDK Server binary is set to an Info level of logging.

You can use the sdkServer.logLevel to increase this to Debug levels, and see extra information about what is happening with the SDK Server that runs alonside your game server container(s).

See the GameServer reference for configuration details.

Enable Debug Level Logging for the Agones Controller

By default, the log level for the Agones controller is “info”. To get a more verbose log output, switch this to “debug” via the agones.controller.logLevel Helm Configuration parameters at installation.

The Feature Flag I enabled/disabled isn’t working as expected

It’s entirely possible that Alpha features may still have bugs in them (They are alpha after all 😃), but the first thing to check is what the actual Feature Flags states were passed to Agones are, and that they were set correctly.

The easiest way is to check the top info level log lines from the Agones controller.

For example:

$ kubectl logs -n agones-system agones-controller-7575dc59-7p2rg  | head
{"filename":"/home/agones/logs/agones-controller-20220615_211540.log","message":"logging to file","numbackups":99,"severity":"info","source":"main","time":"2022-06-15T21:15:40.309349789Z"}
{"logLevel":"info","message":"Setting LogLevel configuration","severity":"info","source":"main","time":"2022-06-15T21:15:40.309403296Z"}
{"ctlConf":{"MinPort":7000,"MaxPort":8000,"SidecarImage":"gcr.io/agones-images/agones-sdk:1.23.0","SidecarCPURequest":"30m","SidecarCPULimit":"0","SidecarMemoryRequest":"0","SidecarMemoryLimit":"0","SdkServiceAccount":"agones-sdk","AlwaysPullSidecar":false,"PrometheusMetrics":true,"Stackdriver":false,"StackdriverLabels":"","KeyFile":"/home/agones/certs/server.key","CertFile":"/home/agones/certs/server.crt","KubeConfig":"","GCPProjectID":"","NumWorkers":100,"APIServerSustainedQPS":400,"APIServerBurstQPS":500,"LogDir":"/home/agones/logs","LogLevel":"info","LogSizeLimitMB":10000},"featureGates":"Example=true\u0026NodeExternalDNS=true\u0026PlayerAllocationFilter=false\u0026PlayerTracking=false","message":"starting gameServer operator...","severity":"info","source":"main","time":"2022-06-15T21:15:40.309528802Z","version":"1.23.0"}
...

The ctlConf section has the full configuration for Agones as it was passed to the controller. Within that log line there is a featureGates key, that has the full Feature Gate configuration as a URL Query String (\u0026 is JSON for &), so you can see if the Feature Gates are set as expected.

I uninstalled Agones before deleted all my GameServers and now they won’t delete

Agones GameServers use Finalizers to manage garbage collection of the GameServers. This means that if the Agones controller doesn’t remove the finalizer for you (i.e. if it has been uninstalled), it can be tricky to remove them all.

Thankfully, if we create a patch to remove the finalizers from all GameServers, we can delete them with impunity.

A quick one liner to do this:

kubectl get gameserver -o name | xargs -n1 -P1 -I{} kubectl patch {} --type=merge -p '{"metadata": {"finalizers": []}}'

Once this is done, you can kubectl delete gs --all and clean everything up (if it’s not gone already).

I’m getting Forbidden errors when trying to install Agones

Ensure that you are running Kubernetes 1.12 or later, which does not require any special clusterrolebindings to install Agones.

If you want to install Agones on an older version of Kubernetes, you need to create a clusterrolebinding to add your identity as a cluster admin, e.g.

# Kubernetes Engine
kubectl create clusterrolebinding cluster-admin-binding \
  --clusterrole cluster-admin --user `gcloud config get-value account`
# Minikube
kubectl create clusterrolebinding cluster-admin-binding \
  --clusterrole=cluster-admin --serviceaccount=kube-system:default

On GKE, gcloud config get-value accounts will return a lowercase email address, so if you are using a CamelCase email, you may need to type it in manually.

I’m getting stuck in “Terminating” when I uninstall Agones

If you try to uninstall the agones-system namespace before you have removed all of the components in the namespace you may end up in a Terminating state.

kubectl get ns
$$NAME STATUS AGE agones-system Terminating 4d$$

Fixing this up requires us to bypass the finalizer in Kubernetes (article link), by manually changing the namespace details:

First get the current state of the namespace:

kubectl get namespace agones-system -o json >tmp.json

Edit the response tmp.json to remove the finalizer data, for example remove the following:

"spec": {
    "finalizers": [
        "kubernetes"
    ]
},

Open a new terminal to proxy traffic:

 kubectl proxy
$$Starting to serve on 127.0.0.1:8001$$

Now make an API call to send the altered namespace data:

 curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json http://127.0.0.1:8001/api/v1/namespaces/agones-system/finalize

You may need to clean up any other Agones related resources you have in your cluster at this point.


Last modified June 13, 2025: Promote FeatureRollingUpdateFix to Beta (#4205) (30237be)