Why Your Node.js Containers Don't Shut Down Gracefully (And How npm start Is Silently Breaking Everything) 🐳

I still remember my first week at the company. Fresh start, new codebase, eager to learn. Then I noticed something odd. Every deployment — without fail — our error monitoring would light up. 502 errors. Timeouts. Failed requests. For about 30-60 seconds, then everything would stabilize. The team treated it as normal. "Yeah, that's just how deployments are," they said. "It passes quickly."

But something didn't sit right with me. Why would deployments cause errors? We were using ECS with blue-green deployments. Health checks were passing. The infrastructure looked solid. This shouldn't be happening.

So I started digging. I checked the deployment logs. Watched containers stop. Monitored the error patterns. And then I found it. Our Dockerfile looked completely normal:

FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install --production
COPY . .
CMD ["npm", "start"]

There it was. That innocent-looking CMD ["npm", "start"] was the culprit. When ECS sent SIGTERM to gracefully shut down our containers during deployment, the signal went to npm, not to our Node.js process. npm is not a process supervisor and does not reliably forward signals to the script it launched. It ran as PID 1, our server was only its child, and once PID 1 exited the kernel killed everything else in the container. No cleanup. No graceful shutdown. Just instant death.

I pitched the fix to the team: change one line to CMD ["node", "server.js"] and add proper graceful shutdown handling. They were skeptical at first — "if it was that simple, why wouldn't everyone do it?" — but agreed to try it in staging.

Next deployment? Flawless. Zero errors. The monitoring dashboard stayed green throughout the entire rollout. Containers finished their work and shut down cleanly within seconds. The team was stunned. We'd been living with deployment errors for months, thinking it was just "the cost of doing deployments."

That investigation taught me: The way you start your application in Docker determines whether it can shut down gracefully. And npm start is a trap that most teams don't even realize they've fallen into.


🚨 The Problem: npm Doesn't Forward Signals

Why This Breaks Everything

In container orchestration (ECS, Kubernetes), graceful shutdown is critical. A typical rolling deployment goes through these steps:

  1. New containers spin up
  2. Traffic shifts to new containers
  3. SIGTERM sent → Old containers told to shut down
  4. Grace period → 30 seconds by default to finish work
  5. SIGKILL → Force kill if not stopped

When you use npm start, step 3 fails silently.
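Steps 3 to 5 can be sketched from the orchestrator's side with plain Node.js. This is only an illustration, not how ECS is implemented: `stopWithGrace`, `spawnReady`, and the two child scripts are hypothetical names I made up for the demo.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper: mimics an orchestrator stopping a task.
// Sends SIGTERM, then SIGKILL if the child is still alive after graceMs.
function stopWithGrace(child, graceMs) {
  return new Promise((resolve) => {
    const killTimer = setTimeout(() => child.kill("SIGKILL"), graceMs);
    child.once("exit", (code, signal) => {
      clearTimeout(killTimer);
      resolve({ code, signal });
    });
    child.kill("SIGTERM");
  });
}

// Spawns a small Node.js child and resolves once it has printed something,
// which the scripts below only do after installing their signal handler.
function spawnReady(script) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", script], {
      stdio: ["ignore", "pipe", "inherit"],
    });
    child.once("error", reject);
    child.stdout.once("data", () => resolve(child));
  });
}

// A child that exits cleanly on SIGTERM.
const GRACEFUL = `
  process.on("SIGTERM", () => process.exit(0));
  setInterval(() => {}, 1000);
  console.log("ready");
`;

// A child that swallows SIGTERM and keeps running.
const STUBBORN = `
  process.on("SIGTERM", () => {});
  setInterval(() => {}, 1000);
  console.log("ready");
`;
```

Stopping the GRACEFUL child resolves with exit code 0, while the STUBBORN child only goes away once the grace period runs out and SIGKILL arrives. That second case is what your users feel during a deployment.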

The Signal Flow

With npm start:

ECS sends SIGTERM → Container
  └── PID 1: npm (receives SIGTERM)
      └── PID 23: node server.js (NEVER receives signal)
          └── npm exits immediately
          └── Node.js gets SIGKILL
          └── No graceful shutdown

Result:

  • In-flight requests aborted mid-processing
  • Database transactions incomplete
  • WebSocket connections severed
  • File writes corrupted
  • Users see errors during every deployment

Why npm Doesn't Work

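npm was designed as a package manager, not a process supervisor. It launches your start script as a child process, usually through a shell, and what happens to a SIGTERM sent to npm has varied between npm versions, so you cannot rely on it reaching your application. In the failure case:

  • npm exits without waiting for its child
  • The signal never reaches your Node.js process
  • Because npm was PID 1, the container stops as soon as npm exits
  • Your Node.js process is force-killed mid-request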

These all have the same problem:

❌ CMD ["npm", "start"]
❌ CMD ["npm", "run", "start"]
❌ CMD ["yarn", "start"]
❌ CMD ["pnpm", "start"]
❌ CMD ["sh", "-c", "npm start"]
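One way to catch this early is a small self-check at startup. The sketch below relies on the `npm_lifecycle_event` environment variable, which npm sets when it runs a package script; `describeLaunch` is a hypothetical helper name. The PID check is only meaningful inside a container, since outside one (or behind an init wrapper) your app will never be PID 1.

```javascript
// Hypothetical startup check: reports hints that the process was started
// through a package-manager script instead of directly as PID 1.
function describeLaunch(pid, env) {
  const warnings = [];
  if (pid !== 1) {
    warnings.push(`running as PID ${pid}, not PID 1`);
  }
  if (env.npm_lifecycle_event) {
    warnings.push(`started via package script "${env.npm_lifecycle_event}"`);
  }
  return warnings;
}

// Usage near the top of server.js:
for (const warning of describeLaunch(process.pid, process.env)) {
  console.warn(`[startup] ${warning}; SIGTERM may not reach this process`);
}
```

It will not fix anything by itself, but a warning in the container logs is much easier to spot than a missing signal.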

✅ The Solution: Direct Process Execution

The Right Way

# ✅ GOOD - Node.js is PID 1
CMD ["node", "server.js"]

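Now SIGTERM goes directly to your application. One caveat: Linux does not apply the default "terminate" action to signals sent to PID 1, so without a SIGTERM handler in your code the signal is simply ignored. With a handler in place, the flow looks like this: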

ECS sends SIGTERM → Container
  └── PID 1: node server.js (receives SIGTERM)
      └── Your app handles graceful shutdown
      └── Finishes in-flight requests
      └── Closes connections cleanly
      └── Exits with code 0

🎯 Implementing Graceful Shutdown

Express/Vanilla Node.js

const express = require("express");
const app = express();

// Your routes
app.get("/api/data", async (req, res) => {
  // Handle request
  res.json({ data: "response" });
});

const server = app.listen(3000, () => {
  console.log("Server started on port 3000");
});

// `database` and `redis` below stand in for your own client instances.
let shuttingDown = false;

// Graceful shutdown handler, shared by SIGTERM and SIGINT
async function shutdown(signal) {
  if (shuttingDown) return; // ignore repeated signals
  shuttingDown = true;
  console.log(`${signal} received, shutting down gracefully...`);

  // Force exit after 25s (before ECS 30s timeout)
  const forceExitTimer = setTimeout(() => {
    console.error("Forced shutdown after timeout");
    process.exit(1);
  }, 25000);

  try {
    // Stop accepting new connections and wait for in-flight requests
    await new Promise((resolve, reject) => {
      server.close((err) => (err ? reject(err) : resolve()));
    });
    console.log("HTTP server closed");

    // Close database only after the requests that may use it have finished
    await database.close();
    console.log("Database closed");

    // Close Redis
    await redis.quit();
    console.log("Redis closed");

    // Cleanup complete
    clearTimeout(forceExitTimer);
    console.log("Graceful shutdown completed");
    process.exit(0);
  } catch (error) {
    console.error("Error during shutdown:", error);
    process.exit(1);
  }
}

process.on("SIGTERM", () => shutdown("SIGTERM"));

// Also handle SIGINT (Ctrl+C)
process.on("SIGINT", () => shutdown("SIGINT"));
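If you want to know how many requests are still running when the signal arrives, a small counter is enough. This is a minimal sketch, assuming an Express-style `(req, res, next)` middleware signature; `createInFlightTracker` is a hypothetical helper, not part of Express.

```javascript
// Hypothetical helper: counts requests that have started but not yet finished.
function createInFlightTracker() {
  let count = 0;
  return {
    // Express-style middleware: (req, res, next)
    middleware(req, res, next) {
      count += 1;
      let done = false;
      const finish = () => {
        if (done) return; // "finish" and "close" can both fire for one response
        done = true;
        count -= 1;
      };
      res.once("finish", finish); // response fully sent
      res.once("close", finish); // connection closed, possibly before finishing
      next();
    },
    // Current number of in-flight requests
    get count() {
      return count;
    },
  };
}
```

Register it before your routes with `app.use(tracker.middleware)`, then log `tracker.count` inside the shutdown handler to see how much work the process is still holding.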

NestJS: Built-in Lifecycle Hooks

NestJS provides lifecycle hooks that make graceful shutdown even easier:

// main.ts
import { NestFactory } from "@nestjs/core";
import { AppModule } from "./app.module";

async function bootstrap() {
  const app = await NestFactory.create(AppModule);

  // Enable graceful shutdown hooks
  app.enableShutdownHooks();

  await app.listen(3000);
  console.log("Application is running on port 3000");
}
bootstrap();

What enableShutdownHooks() does:

  • Listens for termination signals such as SIGTERM and SIGINT
  • Calls onModuleDestroy() on all modules
  • Calls beforeApplicationShutdown() hooks
  • Closes the HTTP server and other connections
  • Calls onApplicationShutdown() hooks
  • Lets the process exit

Implementing Cleanup in NestJS Services

import { Injectable, OnModuleDestroy } from "@nestjs/common";

@Injectable()
export class DatabaseService implements OnModuleDestroy {
  private connection: DatabaseConnection;

  async onModuleDestroy() {
    console.log("Closing database connections...");
    await this.connection.close();
    console.log("Database connections closed");
  }
}

@Injectable()
export class CacheService implements OnModuleDestroy {
  private redisClient: Redis;

  async onModuleDestroy() {
    console.log("Closing Redis connection...");
    await this.redisClient.quit();
    console.log("Redis connection closed");
  }
}

@Injectable()
export class QueueService implements OnModuleDestroy {
  private queue: Queue;

  async onModuleDestroy() {
    console.log("Closing queue connections...");
    await this.queue.close();
    console.log("Queue connections closed");
  }
}

Advanced NestJS: Custom Shutdown Logic

import {
  Injectable,
  OnApplicationShutdown,
  BeforeApplicationShutdown,
} from "@nestjs/common";

@Injectable()
export class WorkerService
  implements BeforeApplicationShutdown, OnApplicationShutdown
{
  private activeJobs = new Set<Promise<void>>();

  async processJob(job: Job) {
    const jobPromise = this.handleJob(job);
    this.activeJobs.add(jobPromise);

    try {
      await jobPromise;
    } finally {
      this.activeJobs.delete(jobPromise);
    }
  }

  async beforeApplicationShutdown(signal?: string) {
    console.log(`Received ${signal}, waiting for active jobs...`);

    // Wait for all active jobs to settle; allSettled keeps one failed job
    // from cutting the wait short
    if (this.activeJobs.size > 0) {
      console.log(`Waiting for ${this.activeJobs.size} active jobs...`);
      await Promise.allSettled(Array.from(this.activeJobs));
      console.log("All jobs completed");
    }
  }

  async onApplicationShutdown(signal?: string) {
    console.log("Application shutdown complete");
  }
}
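Waiting for active jobs is only safe if the wait itself cannot hang forever. Here is a framework-free sketch of the same idea with a deadline added. `createJobTracker` and its `track`/`drain` methods are hypothetical names, and the deadline you pass should stay below your container's stop timeout.

```javascript
// Hypothetical helper: tracks running jobs and waits for them at shutdown,
// but gives up after a deadline so shutdown cannot hang forever.
function createJobTracker() {
  const active = new Set();
  return {
    // Register a job promise; it is forgotten once it settles.
    track(promise) {
      active.add(promise);
      const forget = () => active.delete(promise);
      promise.then(forget, forget);
      return promise;
    },
    // Resolves with "drained" when all jobs settled, or "timeout" otherwise.
    async drain(deadlineMs) {
      let timer;
      const deadline = new Promise((resolve) => {
        timer = setTimeout(() => resolve("timeout"), deadlineMs);
      });
      // allSettled never rejects, so one failed job cannot abort the wait.
      const settled = Promise.allSettled([...active]).then(() => "drained");
      const outcome = await Promise.race([settled, deadline]);
      clearTimeout(timer);
      return outcome;
    },
    get size() {
      return active.size;
    },
  };
}
```

On "timeout" you at least get to log which jobs were abandoned before the orchestrator's SIGKILL arrives.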

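NestJS Shutdown Hooks Execution Order:

  1. onModuleDestroy() - Called first, once the termination signal is received
  2. beforeApplicationShutdown() - Called after all onModuleDestroy() handlers have finished
  3. HTTP server closes (stops accepting new requests)
  4. In-flight requests complete
  5. onApplicationShutdown() - Called after connections have closed

The services above close their connections in onModuleDestroy(), which runs before the HTTP server closes. If in-flight requests still need those connections, move that cleanup to onApplicationShutdown().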

🏗️ Complete Dockerfile Examples

❌ Wrong Approach

FROM node:22-alpine
WORKDIR /app

COPY package*.json ./
RUN npm install --production

COPY . .

# BAD: npm is PID 1, won't forward SIGTERM
CMD ["npm", "start"]

Problems:

  • npm doesn't forward signals
  • Node.js never receives SIGTERM
  • No graceful shutdown possible
  • Deployments cause errors

✅ Right Approach: Express

FROM node:22-alpine
WORKDIR /app

# Install production dependencies only (better caching)
# --omit=dev replaces the deprecated --production flag
COPY package*.json ./
RUN npm install --omit=dev

# Copy application
COPY . .

# Direct execution - Node.js is PID 1
CMD ["node", "server.js"]

✅ Right Approach: NestJS

FROM node:22-alpine
WORKDIR /app

# Install all dependencies (the build needs devDependencies such as the Nest CLI)
COPY package*.json ./
RUN npm install

# Copy application (this already includes tsconfig*.json)
COPY . .

# Build NestJS app, then drop devDependencies from the image
RUN npm run build && npm prune --omit=dev

# Direct execution of compiled output
CMD ["node", "dist/main.js"]

🎯 Production-Ready with Health Checks

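FROM node:22-alpine
WORKDIR /app

# Install all dependencies (the build needs devDependencies)
COPY package*.json ./
RUN npm install

# Copy and build, then drop devDependencies from the image
COPY . .
RUN npm run build && npm prune --omit=dev

# Health check (honored by plain Docker; ECS and Kubernetes ignore this
# instruction and need their own health check configuration)
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s \
  CMD node healthcheck.js || exit 1

# Direct execution for proper signal handling
CMD ["node", "dist/main.js"]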
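The Dockerfile above calls healthcheck.js without showing it. A minimal sketch could look like the following; the `/health` path, port 3000, and the `checkHealth` name are my assumptions, so adjust them to whatever your app actually exposes.

```javascript
const http = require("node:http");

// Hypothetical healthcheck.js core: resolves true when the app answers 2xx.
// Never rejects, so the caller can map the result straight to an exit code.
function checkHealth(url, timeoutMs) {
  return new Promise((resolve) => {
    const req = http.get(url, { timeout: timeoutMs }, (res) => {
      res.resume(); // discard the body so the socket is released
      resolve(res.statusCode >= 200 && res.statusCode < 300);
    });
    // The "timeout" event does not abort the request by itself.
    req.on("timeout", () => req.destroy(new Error("timeout")));
    req.on("error", () => resolve(false));
  });
}
```

To use it as the health check command, end the file with `checkHealth("http://127.0.0.1:3000/health", 2000).then((ok) => process.exit(ok ? 0 : 1));`. I use 127.0.0.1 rather than localhost here so the check does not depend on how the name resolves inside the container.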

🔄 How It Works in ECS Deployments

Without Graceful Shutdown (npm start)

T+0s:   New tasks healthy
T+30s:  Load balancer stops routing to old tasks
T+60s:  ECS sends SIGTERM to npm (PID 1)
T+60s:  npm exits immediately
T+60s:  Node.js killed (SIGKILL)
T+60s:  In-flight requests ABORTED

Error rate: 10-15%
User impact: 502 errors, failed requests

With Graceful Shutdown (direct execution)

T+0s:   New tasks healthy
T+30s:  Load balancer stops routing to old tasks
T+60s:  ECS sends SIGTERM to Node.js (PID 1)
T+60s:  App stops accepting new requests
T+62s:  In-flight requests complete
T+62s:  Database connections close
T+63s:  App exits cleanly (code 0)

Error rate: 0%
User impact: None - zero-downtime deployment

📊 Real Impact

Before Fix (npm start)

  • Deployment error rate: 12%
  • Failed requests per deployment: 10-15
  • Incomplete database transactions: 3-5
  • User complaints: Frequent

After Fix (node + graceful shutdown)

  • Deployment error rate: 0%
  • Failed requests per deployment: 0
  • Incomplete transactions: 0
  • User complaints: None

What changed: One Dockerfile line + graceful shutdown implementation


🎯 Best Practices

1. Always Use Direct Execution

# ✅ Node.js
CMD ["node", "server.js"]

# ✅ NestJS
CMD ["node", "dist/main.js"]

# ❌ NEVER
CMD ["npm", "start"]

2. Implement Graceful Shutdown

  • Express: Add SIGTERM handler manually
  • NestJS: Use app.enableShutdownHooks() + lifecycle hooks

3. Set Proper Timeouts

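// App: 25 seconds (leaves 5s buffer)
const SHUTDOWN_TIMEOUT = 25000;

// ECS container definition: 30 seconds
{
  "stopTimeout": 30
}

// ALB target group: 30 seconds
// (Terraform's deregistration_delay; the underlying target group
// attribute is deregistration_delay.timeout_seconds)
{
  "deregistration_delay": 30
}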
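The one relationship that must hold is that the app's own force-exit timer fires before the orchestrator's SIGKILL. A tiny guard can make that explicit. `checkTimeoutBudget` is a hypothetical helper, and you would pass in the same values you configured above.

```javascript
// Hypothetical sanity check: the app's force-exit timer must fire
// before the container's stop timeout runs out.
function checkTimeoutBudget(appShutdownMs, stopTimeoutSec) {
  const bufferMs = stopTimeoutSec * 1000 - appShutdownMs;
  return { ok: bufferMs > 0, bufferMs };
}

// With the values from this section: 30s stop timeout, 25s app timeout.
const budget = checkTimeoutBudget(25000, 30);
if (!budget.ok) {
  console.warn("Shutdown timeout leaves no buffer before SIGKILL");
}
```

With these numbers the buffer comes out at 5000 ms, matching the 5s mentioned in the comment above.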

4. Test Your Shutdown

# Build and run (detached, so the same terminal can send the signal)
docker build -t my-app .
docker run -d --name my-app -p 3000:3000 my-app

# Send SIGTERM
docker kill --signal=SIGTERM my-app

# Check logs
docker logs my-app

# Should see:
# "SIGTERM received..."
# "HTTP server closed"
# "Database closed"
# "Graceful shutdown completed"

5. Monitor Shutdown Process

Log these events:

  • SIGTERM received
  • Connections closing
  • In-flight requests count
  • Resources cleaned up
  • Time taken to shut down
  • Exit code
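A structured log makes these events easy to search for after a rollout. Here is a minimal sketch, assuming JSON lines on stdout are fine for your log pipeline; `createShutdownLog` and the event names are hypothetical.

```javascript
// Hypothetical helper: records shutdown milestones with elapsed time.
// `now` is injectable so the timing can be tested without waiting.
function createShutdownLog(now = Date.now) {
  const startedAt = now();
  const entries = [];
  return {
    // Record one milestone, e.g. event("http_closed", { inFlight: 0 })
    event(name, details = {}) {
      const entry = { event: name, elapsedMs: now() - startedAt, ...details };
      entries.push(entry);
      console.log(JSON.stringify(entry));
      return entry;
    },
    entries,
  };
}
```

Create the log at the top of your SIGTERM handler and call `event(...)` after each step, so the last entry tells you how long the whole shutdown took.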

🔍 Quick Debugging

Check Process Tree

# Inside container
docker exec <container-id> ps aux

# Should see:
PID 1: node server.js

# NOT:
PID 1: npm
PID 23: node server.js

Test Signal Handling

docker kill --signal=SIGTERM <container-id>
docker logs -f <container-id>

# Should see shutdown messages
# Should exit within seconds
# Should exit with code 0
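When the exit code is not 0, the number itself tells you what happened. You can read it with `docker inspect --format '{{.State.ExitCode}}' <container-id>`. The helper below is a hypothetical sketch that decodes the common cases, based on the convention that codes above 128 mean "128 + signal number".

```javascript
// Hypothetical helper: interprets a container exit code after a stop.
// Codes above 128 follow the convention 128 + signal number.
function explainExitCode(code) {
  if (code === 0) {
    return "clean exit: graceful shutdown completed";
  }
  if (code === 137) {
    return "killed by SIGKILL (128 + 9): grace period exceeded or out of memory";
  }
  if (code === 143) {
    return "terminated by SIGTERM (128 + 15) instead of exiting on its own";
  }
  if (code > 128) {
    return `terminated by signal ${code - 128}`;
  }
  return `application error (exit code ${code})`;
}

console.log(explainExitCode(137));
```

In the Express handler shown earlier, exit code 1 means either the 25-second force-exit timer fired or a cleanup step threw, so check the logs to tell the two apart.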

🚀 Migration Checklist

  • Update Dockerfile: CMD ["node", "server.js"]
  • Add SIGTERM handler (Express) or enable shutdown hooks (NestJS)
  • Implement resource cleanup
  • Set shutdown timeout (25 seconds)
  • Test locally with SIGTERM
  • Deploy to staging
  • Monitor deployment error rates
  • Verify zero errors during deployment
  • Deploy to production

✨ Final Thoughts

One line in your Dockerfile determines whether your deployments cause user-facing errors or are completely invisible to users.

Key Takeaways:

  1. npm start breaks signal handling - Use direct execution instead
  2. NestJS makes it easier - Built-in shutdown hooks handle most cases
  3. Graceful shutdown is mandatory - Not optional for production
  4. Test before deploying - Don't discover issues in production
  5. Monitor your deployments - Track error rates during rollouts

Remember: I discovered this by questioning why "normal" deployment errors were happening at my new company. Don't accept deployment errors as normal. Don't wait for a crisis to investigate.

Change CMD ["npm", "start"] to CMD ["node", "server.js"], implement graceful shutdown, and never worry about deployment errors again.

Your containers will be stopped. Make sure they know how to exit gracefully.