Docker Exit Code 137 (OOMKilled): Causes and Real Fixes

Your container was running fine, and then it wasn’t. The logs cut off mid-sentence. docker ps -a shows it exited with code 137. No stack trace, no graceful shutdown, nothing in the application logs explaining why.

That abruptness is the whole clue. Exit code 137 means your process didn’t decide to quit — something killed it from the outside with SIGKILL, the one signal a process can’t catch or ignore. In nine cases out of ten, that “something” is the Linux OOM killer reclaiming memory. But not always, and bumping the memory limit without checking which case you’re in is how people end up paying for 8GB containers that still die.

Let me walk through what 137 actually means, how to tell which flavor of it you’ve hit, and the fix that matches each one.

Where the number 137 comes from

When a Linux process is terminated by a signal, its exit code is 128 + signal_number. SIGKILL is signal 9. So 128 + 9 = 137. That’s it — there’s no Docker-specific magic here.
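As a quick illustration of that arithmetic, here is a minimal Python sketch that maps an exit code back to a signal name. The function name decode_exit_code is made up for this example, and it assumes a Linux (or other POSIX) host where Python’s signal module knows the signal numbers.

```python
import signal


def decode_exit_code(code: int) -> str:
    """Describe an exit code, assuming the 128 + signal_number convention."""
    if code <= 128:
        return f"exit status {code} (not a signal death)"
    signum = code - 128
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # The number does not map to a signal known on this platform.
        name = f"unknown signal {signum}"
    return f"killed by {name} (signal {signum})"


print(decode_exit_code(137))  # killed by SIGKILL (signal 9)
```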

This decoding works for the whole family. You’ll occasionally see exit code 143, which is 128 + 15 (SIGTERM) — a graceful shutdown request, often from docker stop or a Kubernetes rollout. And 139 is 128 + 11 (SIGSEGV), a segfault. So when you see 137, read it as “a SIGKILL landed on this process.” The next question is who sent it.

There are really only three senders worth considering:

The kernel’s OOM killer, because the container hit its memory limit.
The kernel’s OOM killer again, but because the whole host ran out of memory.
A human or orchestrator that ran docker kill, hit a stop timeout, or had a liveness probe give up.

The first one is by far the most common, so start there.

Step one: confirm it was actually an OOM kill

Don’t guess. Docker records whether the OOM killer fired, and it takes one command to check:

docker inspect <container> --format '{{.State.OOMKilled}}'

If that prints true, you have your answer — the container exceeded its memory limit and the kernel stepped in. If it prints false but you still got a 137, that points away from a container-limit OOM and toward a host-level kill or an external docker kill. Hold that thought, because the fix is different.

While you’re at it, docker inspect also shows the exit code and the OOM flag together:

docker inspect <container> --format '{{.State.ExitCode}} {{.State.OOMKilled}}'
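If you want to script that check, something like the following Python sketch could work. It assumes the docker CLI is on the PATH; the function names and the wording of the verdicts are my own, not anything Docker defines.

```python
import json
import subprocess


def container_state(container: str) -> dict:
    """Return the .State object from `docker inspect` as a dict."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State}}", container],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def classify(state: dict) -> str:
    """Rough triage of a stopped container from its exit code and OOM flag."""
    if state.get("ExitCode") != 137:
        return "not a SIGKILL exit"
    if state.get("OOMKilled", False):
        return "container-limit OOM kill"
    return "SIGKILL without OOM flag: check host OOM or an external kill"
```

Calling classify(container_state("my-api")) would then print the verdict for a container named my-api (a hypothetical name).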

For a live container that’s creeping toward the ceiling, docker stats shows real-time memory against the limit:

docker stats --no-stream

Watch the MEM USAGE / LIMIT column. If usage is brushing up against the limit right before the kill, you’ve confirmed it. And if you want the kernel’s own account of what happened, dmesg keeps the receipt (on many hosts you’ll need to run it with sudo):

dmesg -T | grep -i -E 'oom|killed process'

A line mentioning “Memory cgroup out of memory” tells you it was a cgroup-level (container) kill. A plain “Out of memory: Killed process” with no cgroup reference points to the host running dry. That distinction is the fork in the road for everything below.
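That fork can be sketched in a few lines of Python. This is only an illustration that matches the two phrases quoted above; the function name is made up, and real kernel log lines carry more detail (PIDs, memory totals) that this ignores.

```python
import re

# Phrases quoted in the text above. The cgroup message also contains the
# host phrase, so the cgroup pattern has to be checked first.
CGROUP_KILL = re.compile(r"memory cgroup out of memory", re.IGNORECASE)
HOST_KILL = re.compile(r"out of memory: killed process", re.IGNORECASE)


def classify_oom_line(line: str):
    """Return 'cgroup', 'host', or None for a single dmesg line."""
    if CGROUP_KILL.search(line):
        return "cgroup"
    if HOST_KILL.search(line):
        return "host"
    return None
```

Feeding it each line of the dmesg output and collecting the non-None results gives a quick tally of which kind of kill the host has been seeing.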

Cause #1: the memory limit is too low for honest work

This is the boring, common case. Your app genuinely needs more memory than you gave it, and it hits the ceiling under normal load — not a leak, just a mismatch between the limit and reality.

If you set the limit too tight (or your platform set a low default), the fix is to right-size it. With docker run:

docker run --memory=1g --memory-swap=1g myimage

Setting --memory-swap equal to --memory disables swap for the container, which is usually what you want — swapping a containerized app to disk turns an OOM into a slow, mysterious latency problem instead, which is arguably worse. In Compose:

services:
  api:
    image: myimage
    deploy:
      resources:
        limits:
          memory: 1g
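The relationship between --memory and --memory-swap is easy to get backwards, so here is a small Python sketch of the arithmetic as I understand Docker’s documented rules: --memory-swap is the total of memory plus swap, leaving it unset allows swap equal to --memory, and -1 means unlimited. The function name is hypothetical, and any swap only exists if the host has swap configured.

```python
def swap_allowance(memory: int, memory_swap=None):
    """Swap (in bytes) a container may use, or None when swap is unlimited.

    memory      -- the --memory value in bytes (must be positive)
    memory_swap -- the --memory-swap value in bytes; None or 0 if unset,
                   -1 for unlimited swap
    """
    if memory <= 0:
        raise ValueError("memory must be a positive number of bytes")
    if memory_swap in (None, 0):
        return memory  # unset: swap allowance equals --memory
    if memory_swap == -1:
        return None  # explicitly unlimited
    if memory_swap < memory:
        raise ValueError("--memory-swap cannot be smaller than --memory")
    return memory_swap - memory  # equal values leave zero swap


GIB = 1024 ** 3
print(swap_allowance(GIB, GIB))  # 0
```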

How do you pick the number? Run the container under a realistic load, watch docker stats for the steady-state and peak usage, then add headroom — I usually go 25–50% above the observed peak. Don’t eyeball it from a 10-second idle reading; memory use during startup or a heavy request can be double the resting value.
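The headroom rule is simple enough to write down. A minimal sketch, with a made-up function name and an example peak of 800 MiB that is purely illustrative:

```python
import math


def suggest_limit_mib(peak_mib: float, headroom: float = 0.25) -> int:
    """Observed peak plus fractional headroom, rounded up to a whole MiB."""
    if peak_mib <= 0:
        raise ValueError("peak_mib must be positive")
    if headroom < 0:
        raise ValueError("headroom must not be negative")
    return math.ceil(peak_mib * (1 + headroom))


# The 25-50% range from the text, applied to a hypothetical 800 MiB peak.
print(suggest_limit_mib(800, 0.25))  # 1000
print(suggest_limit_mib(800, 0.50))  # 1200
```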

One trap worth calling out: some runtimes don’t see your container limit unless you tell them. The JVM has handled cgroup limits automatically for years now, but Node is less predictable. Depending on the Node version and on whether the host uses cgroup v1 or v2, it may size its default old-space heap from the host’s memory, not the container’s. When that happens, a Node app in a 512MB container can happily try to grow its heap to a couple of gigabytes and get killed. Rather than rely on detection, pin it:

node --max-old-space-size=400 server.js

Keep that number comfortably under the container limit — the heap isn’t the only thing using memory in the process.
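If you generate the start command from the container limit, the calculation might look like this sketch. The 75% heap fraction is my assumption, not a Node recommendation (the example above uses roughly 78%), and both function names are hypothetical.

```python
def max_old_space_mb(limit_mb: int, heap_fraction: float = 0.75) -> int:
    """Heap cap in MB, leaving the remainder for non-heap memory."""
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    return int(limit_mb * heap_fraction)


def node_command(limit_mb: int, script: str = "server.js") -> list:
    """Build the node argv with the heap pinned under the container limit."""
    return ["node", f"--max-old-space-size={max_old_space_mb(limit_mb)}", script]


print(" ".join(node_command(512)))  # node --max-old-space-size=384 server.js
```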

Cause #2: a genuine memory leak

Here’s how you tell this apart from Cause #1: a too-small limit kills the container fast and consistently, often within seconds or minutes of the same workload. A leak kills it slowly — the container runs for hours or days, memory climbs in a sawtooth that never fully comes back down, and eventually it crosses the line. If you find yourself raising the limit, getting a few more hours of uptime, then raising it again, you don’t have a sizing problem. You have a leak, and a bigger limit just buys a longer fuse.
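The “sawtooth that never fully comes back down” pattern can be checked roughly from a series of memory samples. This is a heuristic sketch, not a leak detector: the window size and the 10% growth threshold are arbitrary assumptions, and the sample numbers are invented.

```python
def window_floors(samples, window):
    """Lowest reading in each consecutive, non-overlapping window."""
    last_start = len(samples) - window
    return [min(samples[i:i + window]) for i in range(0, last_start + 1, window)]


def looks_like_leak(samples, window=4, min_growth=0.10):
    """True if the memory floor rises in every window and grows overall."""
    floors = window_floors(samples, window)
    if len(floors) < 3:
        return False  # not enough data to call it a trend
    rising = all(later > earlier for earlier, later in zip(floors, floors[1:]))
    return rising and floors[-1] >= floors[0] * (1 + min_growth)


# Invented readings in MiB: same sawtooth shape, different floors.
leaky = [100, 140, 180, 120, 130, 170, 210, 150, 160, 200, 240, 180]
steady = [100, 140, 180, 105, 100, 140, 180, 102, 101, 140, 180, 100]
print(looks_like_leak(leaky), looks_like_leak(steady))  # True False
```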

Raising the ceiling forever is the wrong move here. Profile the app instead. For Node, take heap snapshots a few minutes apart and diff them:

node --inspect server.js
# then connect Chrome DevTools, Memory tab, take two snapshots under load and compare

For Python, tracemalloc or memray will point at the allocation sites that keep growing. For the JVM, a heap dump into Eclipse MAT shows you the retained set. The specific tool matters less than the discipline: find what’s holding references it shouldn’t, and fix the code. The usual suspects are unbounded caches, event listeners that never get removed, and connection pools that grow without a cap.

Keeping a memory limit in place is still worth it as a safety net, so a leak degrades gracefully instead of taking the host down with it. But treat that as a seatbelt, not a fix.

Cause #3: the host itself is out of memory

Now back to the case where OOMKilled was false or dmesg showed a non-cgroup kill. Here the container didn’t exceed its limit. The whole machine ran out of memory, and the kernel picked a victim so the rest of the system could survive. Your container may just have been the unlucky one with the highest OOM score.

This shows up a lot on CI runners and small VMs where several containers share a host with no per-container limits set. The fix isn’t on the dying container; it’s on the host. Set memory limits on every container so one greedy process can’t starve the rest, and give the box enough RAM (or fewer concurrent containers) for the real workload. Check overall pressure with:

free -h
docker stats --no-stream
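The “available” figure that free reports comes from MemAvailable in /proc/meminfo, so a host-pressure check can be sketched in Python like this. The function names and the 10% threshold are assumptions for illustration, not a recommended alert level.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_fraction(meminfo: dict) -> float:
    """Share of total memory the kernel estimates is still available."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]


def host_under_pressure(text: str, threshold: float = 0.10) -> bool:
    """True when available memory drops below the given fraction of total."""
    return available_fraction(parse_meminfo(text)) < threshold
```

On a Linux host you would call it as host_under_pressure(open("/proc/meminfo").read()).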

There’s a build-time variant of this that trips people up too. If docker build dies with 137, often during a webpack, tsc, or npm step, it’s usually the build process blowing past Docker Desktop’s VM memory allocation, not a runtime limit at all. Bump the memory in Docker Desktop’s resource settings. If the same step fails in CI, give the runner more RAM or cap the build tool’s own heap (NODE_OPTIONS=--max-old-space-size=...).

Kubernetes: the same kill, with more bookkeeping

In Kubernetes the kill is identical at the kernel level, but the orchestration around it adds nuance worth understanding, because two genuinely different events both surface as “OOMKilled” or a 137.

First, confirm it the k8s way:

kubectl describe pod <pod> | grep -A5 "Last State"
kubectl get events --field-selector reason=OOMKilling

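The Last State block will show Reason: OOMKilled with Exit Code: 137 if a container hit its own memory limit. (The events query only returns results on clusters where a component such as node-problem-detector publishes events with that reason, so an empty result there proves nothing.) That’s the container-level kill. It’s governed by resources.limits.memory, and it fires regardless of anything else going on in the cluster: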

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"
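The Last State check described above can also be done against the pod’s JSON, which is handier in scripts than grepping describe output. A sketch, assuming the JSON comes from kubectl get pod <pod> -o json; the function name is my own.

```python
def oom_killed_containers(pod: dict) -> list:
    """Names of containers whose last termination reason was OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated and terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names
```

Piping the kubectl output into a script that calls json.load(sys.stdin) and passes the result to this function would list the affected containers.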

The single biggest source of confusion here is the difference between requests and limits. The request is what the scheduler uses to place the pod — it’s a reservation, the guaranteed floor. The limit is the hard ceiling that triggers the OOM kill. Set the limit too low relative to what the app actually uses and you get container-level OOMKills even on a host with plenty of free RAM. People stare at a half-empty node wondering why their pod keeps dying; the answer is the pod hit its own limit, and the node’s spare memory is irrelevant to that.

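The second, separate event is node-level pressure. When the whole node runs low on memory, the kubelet tries not to wait for the kernel: it proactively evicts pods to keep the node alive. Evicted pods are reported with the reason Evicted rather than OOMKilled. The kubelet ranks candidates by whether their usage exceeds their requests, then by pod priority, then by how far over their requests they are, and in practice that ordering tracks QoS class closely. If the kernel’s OOM killer gets there before the kubelet does, it leans the same way, because the kubelet assigns each container an oom_score_adj based on its QoS class. The rough pecking order: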

BestEffort pods (no requests or limits at all) get evicted first.
Burstable pods (requests lower than limits) go next, worst offenders above their request first.
Guaranteed pods (requests equal to limits for both CPU and memory, in every container) are evicted last.

So setting requests == limits to land in the Guaranteed class isn’t just tidy — it materially lowers the odds your pod is the one chosen when the node is under pressure. The flip side: a BestEffort pod with no limits set is both first in line for eviction and able to grow until it triggers the mess in the first place. Setting limits on everything is the cheapest reliability win in most clusters.
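To make the three classes concrete, here is a simplified Python sketch of how a pod’s QoS class falls out of its containers’ requests and limits. It is an approximation under stated assumptions: it compares quantity strings literally (so "512Mi" and "0.5Gi" would not match) and takes plain dicts rather than real pod specs. The function name is hypothetical.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers: list) -> str:
    """Approximate pod QoS class from per-container requests/limits dicts."""
    any_set = False
    guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        # A missing request defaults to the limit, so mirror that here.
        requests = {**limits, **container.get("requests", {})}
        if limits or requests:
            any_set = True
        for resource in RESOURCES:
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{"requests": {"memory": "256Mi"}, "limits": {"memory": "512Mi"}}]))  # Burstable
```

Note that equal memory request and limit alone are not enough for Guaranteed in this sketch; CPU has to be set and matched as well.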

If you’re on a recent cluster (1.28+ with cgroup v2), there’s a behavior change worth knowing: memory.oom.group is set, so when the OOM killer fires inside a container it kills all the processes in that container’s cgroup together, not just the single fattest one. That’s a good thing — it stops you from ending up with a half-dead container where the main process survived but a worker got reaped, leaving the thing limping in an undefined state. It does mean the kill is more all-or-nothing than it used to be.

A checklist to stop fighting 137

You don’t want to keep rediscovering this during incidents. A few habits make 137 a rare, quickly-diagnosed event instead of a recurring mystery:

Set an explicit memory limit on every container and every pod. Unlimited containers are how one process takes down a host.
In Kubernetes, set requests and limits, and for anything important make them equal to land in the Guaranteed QoS class.
Size limits from observed peak usage plus headroom, not from a guess or a copied YAML snippet.
Tell your runtime about the limit — --max-old-space-size for Node, sane -Xmx for the JVM if you’re not relying on cgroup awareness.
Monitor memory and alert before the kill. A pod sitting at 95% of its limit for an hour is a warning you can act on; a 3am OOMKill is not.
When 137 does hit, check OOMKilled and dmesg first to separate a container-limit kill from a host-level one. The fix lives in different places.
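For the “alert before the kill” habit, a rough Python sketch of the threshold check is below. It assumes you feed it the MemUsage string from docker stats --no-stream --format '{{.MemUsage}}', that the string uses binary units such as MiB and GiB, and that 90% is a sensible threshold for your workload. All names here are made up for illustration.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "TiB": 1024 ** 4}
SIZE = re.compile(r"^\s*([\d.]+)\s*([KMGT]iB|B)\s*$")


def to_bytes(text: str) -> float:
    """Convert a size such as '512MiB' into bytes."""
    match = SIZE.match(text)
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def usage_ratio(mem_usage: str) -> float:
    """Turn a 'used / limit' string into a fraction of the limit."""
    used, _, limit = mem_usage.partition("/")
    return to_bytes(used) / to_bytes(limit)


def should_alert(mem_usage: str, threshold: float = 0.90) -> bool:
    """True when usage is at or above the threshold share of the limit."""
    return usage_ratio(mem_usage) >= threshold


print(usage_ratio("512MiB / 1GiB"))  # 0.5
```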

The thing to internalize is that 137 is a symptom, and the most common mistake is treating “raise the limit” as the cure for all three causes. It’s the right fix for exactly one of them. Next time a container dies with 137, run the docker inspect OOM check before you touch the memory setting. That one command tells you whether the container’s own limit was involved, which is the first cut in deciding whether you’re sizing, hunting a leak, or rescuing a starved host.
