Systemd: The Complete Guide from Zero to Hero — Architecture, Units, cgroups, Logging, and Real-World Examples

Introduction: Why SysVinit Died and What Systemd Fixed

Imagine a chef cooking dinner for 10 guests but making each dish completely from scratch, one at a time — starting the salad only after the soup is fully served. That's essentially how SysVinit worked: it started services one by one, in a fixed order, regardless of whether they were actually dependent on each other.

As Linux systems grew more complex, this became a serious bottleneck:

Slow boot times. Service A waits for Service B to finish, even if there's zero dependency between them.
No process tracking. Init launched a script and moved on. A child process crashed? SysVinit had no idea.
Log chaos. Every service wrote logs wherever it wanted — /var/log/nginx/, syslog, /tmp/ — no unified interface.
Brittle shell scripts. The /etc/init.d/ scripts were fragile, hard to maintain, and inconsistent across distros.

In 2010, Lennart Poettering and Kay Sievers introduced systemd to address all of these problems at once: parallel startup, dependency graphs, control groups, and centralized logging. The response from the community was divided (to put it mildly), but today systemd is the de facto standard on Fedora, Debian, Ubuntu, Arch, RHEL, and most other major distributions.

Let's break it down piece by piece.

Part 1. Systemd Architecture — What's Under the Hood

1.1 PID 1 — The Ruler of All Processes

When the Linux kernel boots, it launches the very first user-space process with PID 1. On systemd systems, that process is the systemd daemon itself. Every other user-space process descends from it, directly or indirectly.

This matters for two reasons:

If PID 1 crashes, the system panics. Hence systemd is written to be extremely robust.
All orphaned processes (whose parent died) are automatically reparented to PID 1.

Linux Kernel
    └── systemd (PID 1)
            ├── journald (logging)
            ├── udevd (device management)
            ├── networkd (networking)
            ├── nginx.service (your web server)
            ├── postgresql.service (database)
            └── ... all other services
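
A script can check whether systemd is really the init system before relying on it. The sketch below follows the convention documented for sd_booted(3): the directory /run/systemd/system exists only when the system was booted with systemd. The `root` argument is a hypothetical prefix added here so the check can be tried against a scratch directory; it is not part of any systemd interface.

```shell
# Succeeds if the system under ROOT (default: the real root) was booted with systemd.
# ROOT is an illustrative parameter for testing against a scratch tree.
booted_with_systemd() {
    [ -d "${1:-}/run/systemd/system" ]
}

# Name of the program currently running as PID 1, as reported by the kernel.
init_name() {
    cat /proc/1/comm
}
```

On a systemd host, `booted_with_systemd && init_name` would typically print `systemd`.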

1.2 Key Components

systemd (PID 1) The conductor of the whole orchestra. It reads unit files, builds a dependency graph, launches processes in the right order, and tracks them via cgroups.

systemctl Your control panel. When you type systemctl start nginx, this tool does NOT start nginx directly. It sends a D-Bus message to the systemd daemon, which does the actual work. This is a fundamental difference from running a script.

journald Centralized logging daemon. It captures stdout and stderr from all services, enriches each entry with structured metadata (PID, UID, unit name, hostname), and stores everything in a binary format that supports complex queries — think SQL for logs.

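udevd Device manager. When you plug in a USB drive, the kernel announces the new device (for example /dev/sdb); udevd reacts by applying permissions, creating stable symlinks, loading the appropriate kernel modules, and optionally triggering specific services.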

networkd, timedated, logind Specialized daemons for network management, system time, and user sessions. They all communicate with PID 1 via D-Bus.

1.3 D-Bus — The Communication Backbone

D-Bus is an inter-process communication (IPC) mechanism; think of its system bus as an internal messaging platform between processes. Rather than each daemon exposing its own ad hoc control protocol, processes exchange structured messages (method calls, replies, and signals) through the bus, where access policy can be enforced.

Example flow for systemctl start nginx:

systemctl forms a D-Bus message: "Call the StartUnit method with argument nginx.service"
The message goes onto the system bus
The systemd daemon receives and processes it
Returns the result through the same bus

This provides security (permissions checked at the D-Bus level), flexibility (any program can manage services), and extensibility.
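
The same request can be made by hand with busctl, the D-Bus client that ships with systemd. This is a minimal sketch of the StartUnit call described above, not how systemctl is implemented internally. The function name and the `DRY_RUN` switch are assumptions introduced here so the command can be inspected without touching the running system.

```shell
# Ask PID 1 to start a unit via the org.freedesktop.systemd1.Manager.StartUnit method.
# DRY_RUN is a hypothetical switch: when set, the command is printed instead of executed.
start_unit_via_dbus() {
    unit=$1
    # Signature "ss" = two string arguments: the unit name and the job mode.
    set -- busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 \
        org.freedesktop.systemd1.Manager StartUnit ss "$unit" replace
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Running `DRY_RUN=1 start_unit_via_dbus nginx.service` prints the full busctl command line; without `DRY_RUN`, the call needs the same privileges as `systemctl start`.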

Part 2. Units — The Building Blocks of Systemd

A unit is a description of any system resource as a declarative configuration file. Think of it as the "passport" for a service, socket, timer, or mount point.

2.1 Where Units Live

Path	Purpose	Priority
`/usr/lib/systemd/system/`	Units installed by package manager	Lowest
`/etc/systemd/system/`	Your custom units and overrides	Highest
`/run/systemd/system/`	Temporary runtime units (gone after reboot)	Middle (overrides /usr/lib, overridden by /etc)

Important: Never edit files in /usr/lib/systemd/system/ directly — they'll be overwritten on package updates. To modify a stock unit, use systemctl edit <name>, which creates a drop-in override at /etc/systemd/system/<name>.d/override.conf.
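
To make the lookup order concrete, here is a simplified sketch that reports which copy of a unit file would win. It assumes only the three directories discussed in this section, searched in the order /etc, /run, /usr/lib; the real search path contains more entries, which `systemd-analyze unit-paths` lists. The function name and the `root` prefix argument are hypothetical and exist only so the logic can be tried on a scratch tree.

```shell
# Print the path of the unit file that takes precedence for UNIT.
# ROOT is an illustrative prefix; leave it empty to inspect the real filesystem.
resolve_unit() (
    unit=$1
    root=${2:-}
    # Highest precedence first: admin config, runtime units, packaged units.
    for dir in /etc/systemd/system /run/systemd/system /usr/lib/systemd/system; do
        if [ -e "$root$dir/$unit" ]; then
            printf '%s\n' "$dir/$unit"
            exit 0
        fi
    done
    echo "no unit file found for $unit" >&2
    exit 1
)
```

For example, `resolve_unit nginx.service` prints the /etc path if an administrator has placed a full replacement unit there, and the packaged path otherwise.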

2.2 Unit Types

`.service` — Service Units (Most Common)

Describes a daemon or process. This is what you'll use in 90% of cases.

ini

# /etc/systemd/system/myapp.service
[Unit]
Description=My Awesome Application
After=network.target postgresql.service
Requires=postgresql.service

[Service]
Type=simple
ExecStart=/usr/bin/myapp --config /etc/myapp/config.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5s
User=myapp
Group=myapp

[Install]
WantedBy=multi-user.target

The Type= parameter — get this right:

Type	Behavior	When to Use
`simple`	Service is considered started immediately after ExecStart launches	Most modern applications
`forking`	Program calls fork() and the parent exits. Systemd waits for this.	Classic Unix daemons (nginx, apache)
`notify`	Program signals systemd via `sd_notify()` when ready	Programs with native systemd API support
`oneshot`	Program runs and exits. Systemd waits for completion.	Scripts, one-off tasks
`dbus`	Service is considered started when it claims a D-Bus name	Daemons using D-Bus
`idle`	Start delayed until all other jobs complete	Low-priority background tasks
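
For Type=notify, a service written as a shell script can report readiness with the systemd-notify tool. The sketch below assumes the script runs as the unit's main process; the function name is hypothetical. systemd sets the NOTIFY_SOCKET environment variable for such services, so the function does nothing harmful when the script is run by hand.

```shell
# Tell systemd the service is ready, but only when started under a notify-type unit.
# NOTIFY_SOCKET is set by systemd for such services and is unset elsewhere.
notify_ready() {
    if [ -n "${NOTIFY_SOCKET:-}" ]; then
        systemd-notify --ready --status="$1"
    else
        echo "NOTIFY_SOCKET not set, skipping readiness notification" >&2
    fi
}
```

A start script would finish its setup and then call `notify_ready "warm-up complete"`. Because systemd-notify runs as a child process rather than the main PID, the unit may need NotifyAccess=all for the message to be accepted, depending on the systemd version.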

Restart policy:

ini

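[Unit]
# Limit restart attempts (these two settings belong in [Unit], not [Service]):
# at most 5 starts within 30 seconds, then the unit enters the failed state
StartLimitIntervalSec=30s
StartLimitBurst=5

[Service]
# Restart= options:
# no          - never restart
# on-success  - only after a clean exit (exit code 0 or a clean signal such as SIGTERM)
# on-failure  - on non-zero exit, fatal signal, timeout, or watchdog expiry (most common choice)
# on-abnormal - on fatal signal, timeout, or watchdog expiry (not on any exit code)
# always      - restart after any exit; an explicit systemctl stop still stops the unit
Restart=on-failure
RestartSec=5s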

`.socket` — Socket-Based Activation (Lazy Launch)

This is one of the most powerful and underappreciated features of systemd. The idea: why keep 20 services running when most of them get called once an hour?

Socket-based activation works like this:

systemd opens and listens on a socket (network port, Unix socket, or FIFO)
The actual service is not running
The first connection arrives
systemd launches the service and hands it the listening socket (with Accept=no); the pending connection waits in the kernel's queue
The client never notices: the connection is only delayed until the service accepts it, not refused

ini

# /etc/systemd/system/echo.socket
[Unit]
Description=Echo Server Socket

[Socket]
ListenStream=12345
Accept=no

[Install]
WantedBy=sockets.target

ini

# /etc/systemd/system/echo.service
[Unit]
Description=Echo Server

[Service]
Type=simple
ExecStart=/usr/local/bin/echo-server
# With Accept=no, systemd passes the listening socket as file descriptor 3;
# the server must accept() connections on it (see sd_listen_fds(3))

Enable: sudo systemctl enable --now echo.socket — and the service starts on the first connection.
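
On the service side, the handover follows the protocol documented in sd_listen_fds(3): systemd sets LISTEN_PID to the PID of the process the sockets are meant for and LISTEN_FDS to the number of sockets, which start at file descriptor 3. The function below is a minimal sketch of that check for a shell-based service; its name is an assumption made for this example.

```shell
# Print the number of sockets systemd handed to this process (0 if none).
# The first socket is always file descriptor 3, the next one 4, and so on.
listen_fds() {
    # LISTEN_PID guards against variables inherited from an unrelated parent process.
    if [ "${LISTEN_PID:-}" = "$$" ] && [ "${LISTEN_FDS:-0}" -gt 0 ] 2>/dev/null; then
        echo "$LISTEN_FDS"
    else
        echo 0
    fi
}
```

To try a socket-activated program without installing any units, `systemd-socket-activate -l 12345 /usr/local/bin/echo-server` listens on the port and starts the program on the first connection.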

`.timer` — Cron Replacement with Superpowers

Systemd timers beat cron on several fronts:

Support dependencies (run only if some service is running)
Logged in journald like any other unit
Can "catch up" on missed runs after reboot (Persistent=true)
Support random delays to spread load across the hour

ini

# /etc/systemd/system/backup.timer
[Unit]
Description=Daily Backup Timer

[Timer]
# Run every day at 02:30
OnCalendar=*-*-* 02:30:00
# Random delay up to 10 minutes (don't hammer the server at exactly 02:30!)
RandomizedDelaySec=10m
# Run the task if it was missed (e.g. system was off)
Persistent=true

[Install]
WantedBy=timers.target

ini

# /etc/systemd/system/backup.service
[Unit]
Description=Daily Backup Job

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh
User=backup

Enable: sudo systemctl enable --now backup.timer

Check all active timers: systemctl list-timers --all

OnCalendar syntax cheatsheet:

Expression	Meaning
`daily`	Every day at 00:00
`weekly`	Every Monday at 00:00
`monthly`	1st of every month
`*-*-* 09:00:00`	Every day at 09:00
`Mon-Fri *-*-* 08:30:00`	Weekdays at 08:30
`*-*-1,15 00:00:00`	1st and 15th of every month

Validate an expression: systemd-analyze calendar "Mon-Fri *-*-* 08:30:00"

`.target` — Unit Groups (Replacing Runlevels)

A target is not a service — it's a synchronization point. Think of it as a "system state" to be reached.

Target	SysV Runlevel	Meaning
`poweroff.target`	0	Shutdown
`rescue.target`	1	Single-user mode
`multi-user.target`	3	Multi-user, no GUI
`graphical.target`	5	With graphical interface
`reboot.target`	6	Reboot

bash

# Show the default target (what the system boots into)
systemctl get-default

# Switch target (like init 3)
sudo systemctl isolate multi-user.target

# Set default target
sudo systemctl set-default multi-user.target

2.3 Unit Dependencies — A Graph, Not a Queue

This is one of the key differentiators from SysVinit. Instead of a fixed sequence, systemd builds a directed dependency graph.

Dependency directives:

Directive	Type	Behavior
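`Requires=`	Hard	Starting this unit also starts the dependency; if the dependency is stopped, this unit is stopped too. Combined with After=, a dependency that fails to start keeps this unit from starting
`Wants=`	Soft	Tries to start the dependency, but this unit starts and keeps running even if the dependency fails
`BindsTo=`	Very hard	Like Requires, but this unit also stops when the dependency goes inactive for any reason, including on its own
`PartOf=`	One-way	Stop and restart of the dependency propagate to this unit; starting the dependency does not start it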
`Conflicts=`	Conflict	Cannot run simultaneously with the specified unit

Ordering directives:

Directive	Behavior
`After=`	This unit starts AFTER the specified one
`Before=`	This unit starts BEFORE the specified one

Critical nuance: After= and Before= only define ordering, NOT dependency! If you write only After=postgresql.service without Requires= or Wants=postgresql.service, systemd will not start PostgreSQL for you, and your service will start even if PostgreSQL failed or is absent. The reverse also holds: Requires= without After= starts both units in parallel. You almost always need both.
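
The ordering half of the graph can be illustrated with tsort(1), which performs a topological sort. This is only a toy model of After=, not how systemd computes its transactions. The helper names are hypothetical, and the edge from network.target to postgresql.service is an assumption added for the example.

```shell
# Print one "DEP UNIT" pair per After= entry: DEP is ordered before UNIT.
after() (
    unit=$1
    shift
    for dep in "$@"; do
        printf '%s %s\n' "$dep" "$unit"
    done
)

# Feed the ordering edges to tsort(1) to obtain a valid start order.
start_order() {
    {
        after myapp.service network.target postgresql.service
        # Assumption for illustration: postgresql.service is itself ordered after network.target.
        after postgresql.service network.target
    } | tsort
}
```

Calling `start_order` prints network.target, postgresql.service, and myapp.service, one per line. Note what the model leaves out: nothing here says whether a unit gets started at all, which is exactly the part Requires= and Wants= contribute.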

Part 3. cgroups — Why Systemd Always Knows Your Processes

3.1 The Problem cgroups Solve

Consider: nginx is running. It forks 4 workers. One worker forks a CGI process. That forks something else. Now there are 10 processes, all "belonging" to nginx, but in SysVinit there was no way to track this.

Control Groups (cgroups) are a Linux kernel mechanism that lets you group processes hierarchically and manage them collectively.

Systemd automatically creates a cgroup for every service. All child processes live inside that group. Always.

/sys/fs/cgroup/
├── system.slice/
│   ├── nginx.service/        ← all nginx processes here
│   │   ├── pid: 1234 (master)
│   │   ├── pid: 1235 (worker 1)
│   │   ├── pid: 1236 (worker 2)
│   │   └── pid: 1237 (cache loader)
│   ├── postgresql.service/
│   └── redis.service/
└── user.slice/
    └── user-1000.slice/      ← user processes

3.2 What cgroups Give You in Practice

Clean process termination, no stray processes: When you run systemctl stop nginx, systemd signals every process in the service's cgroup (the default KillMode=control-group), so all 10 processes are stopped, including ones you didn't know existed. No more orphaned workers lingering after a stop.

Monitoring:

bash

# Show process tree for a service's cgroup
systemd-cgls /system.slice/nginx.service

# Real-time resource monitoring (like top, but for cgroups)
systemd-cgtop
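
The kernel exposes the same membership information under /proc, so the owning unit of any PID can be looked up without systemd tools. The sketch below assumes the unified cgroup v2 hierarchy, where /proc/<pid>/cgroup contains a single line of the form "0::<path>"; both function names are hypothetical.

```shell
# Extract the cgroup path from the contents of /proc/<pid>/cgroup (read on stdin).
# On cgroup v2 the file has a single line: "0::<path>".
cgroup_path() {
    sed -n 's/^0:://p'
}

# Print the cgroup of a given PID (default: the current process).
cgroup_of_pid() {
    cgroup_path < "/proc/${1:-self}/cgroup"
}
```

For a worker started by nginx.service, `cgroup_of_pid <pid>` would print /system.slice/nginx.service, and `cat /sys/fs/cgroup/system.slice/nginx.service/cgroup.procs` lists every PID in that group.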

3.3 Resource Limits via Unit Files

Instead of manually configuring cgroups, just add lines to your [Service] section:

ini

[Service]
# === MEMORY ===
# Soft limit: above this the kernel throttles the service and reclaims its memory aggressively
MemoryHigh=400M
# Hard limit: if usage cannot be kept under this, the OOM killer is invoked in the cgroup
MemoryMax=512M
# Protected memory: the kernel will not reclaim this amount from the service
MemoryMin=100M

# === CPU ===
# 50% of a single core
CPUQuota=50%
# Or: CPU weight (1-10000, default=100)
CPUWeight=200

# === DISK I/O ===
IOReadBandwidthMax=/dev/sda 50M
IOWriteBandwidthMax=/dev/sda 20M

# === NETWORK ===
IPAccounting=yes
IPAddressAllow=192.168.0.0/24
IPAddressDeny=any

Verify current limits:

bash

# Check cgroup filesystem directly
cat /sys/fs/cgroup/system.slice/nginx.service/memory.max
# 536870912 (512 MiB in bytes; prints "max" if no limit is set)

# Or via systemctl
systemctl show nginx.service | grep -E 'Memory|CPU|IO'
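
The raw values in the cgroup filesystem are byte counts, or the literal string "max" when no limit applies. A small helper can make them readable; this is a sketch with a hypothetical function name, and it assumes the value has already been read from a file such as memory.max.

```shell
# Convert a memory.max style value into a human-readable form.
# The file holds either a byte count or the literal string "max" (no limit).
format_limit() {
    case $1 in
        max) echo "unlimited" ;;
        *[!0-9]*|'') echo "unrecognized value: $1" >&2; return 1 ;;
        *) echo "$(( $1 / 1024 / 1024 )) MiB" ;;
    esac
}
```

With MemoryMax=512M in effect, `format_limit "$(cat /sys/fs/cgroup/system.slice/nginx.service/memory.max)"` prints `512 MiB`.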

Part 4. journald — Logs as a Database

4.1 Why journald Beats Plain Text Logs

A plain syslog is a text file. Want to find all nginx errors from the last hour? You write grep "error" /var/log/nginx/error.log | grep "$(date +%b\ %d)" and hope for the best.

journald is a structured store with indexes. Every entry is not a text string but an object with fields:

_SYSTEMD_UNIT=nginx.service    ← which service
_PID=1234                       ← which process
_UID=33                         ← which user (numeric UID; 33 is www-data on Debian)
_HOSTNAME=web-01                ← which host
PRIORITY=3                      ← severity level (err)
MESSAGE=connection refused...   ← the message itself
_SOURCE_REALTIME_TIMESTAMP=...  ← precise timestamp
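
A service does not need a logging library to populate the PRIORITY field. For output captured from stdout or stderr, journald by default interprets a leading "<N>" prefix as the syslog level, as described in sd-daemon(3). The helper below is a minimal sketch of that convention; the function name `jlog` is an assumption, and unknown level names fall back to info.

```shell
# Write a message to stderr with the "<N>" prefix journald maps to PRIORITY=N.
jlog() (
    level=$1
    shift
    case $level in
        emerg) n=0 ;; alert) n=1 ;; crit) n=2 ;; err) n=3 ;;
        warning) n=4 ;; notice) n=5 ;; info) n=6 ;; debug) n=7 ;;
        *) n=6 ;;  # unknown level names are logged as info
    esac
    printf '<%d>%s\n' "$n" "$*" >&2
)
```

Inside a service, `jlog err "connection refused"` emits `<3>connection refused`; journald strips the prefix and stores the entry with PRIORITY=3, so it shows up under `journalctl -p err`.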

4.2 Complete journalctl Reference

bash

# === BASIC QUERIES ===

# All logs for a service
sudo journalctl -u nginx.service

# Last 50 lines
sudo journalctl -u nginx.service -n 50

# Follow in real time (like tail -f)
sudo journalctl -u nginx.service -f

# From a specific time
sudo journalctl -u nginx.service --since "2024-01-15 10:00:00"
sudo journalctl -u nginx.service --since "1 hour ago"
sudo journalctl -u nginx.service --since today
sudo journalctl -u nginx.service --since yesterday --until today

# === FILTERING BY SEVERITY ===
# 0=emerg, 1=alert, 2=crit, 3=err, 4=warning, 5=notice, 6=info, 7=debug
sudo journalctl -p err                      # err and more severe (levels 0-3)
sudo journalctl -p err..warning             # only levels err through warning (3-4)
sudo journalctl -u nginx -p warning         # nginx: warning and more severe

# === FILTERING BY BOOT ===
sudo journalctl -b                          # current boot
sudo journalctl -b -1                       # previous boot
sudo journalctl -b -2                       # two boots ago
sudo journalctl --list-boots                # list all boots

# === OUTPUT FORMATS ===
sudo journalctl -u nginx -o json            # JSON (for parsing)
sudo journalctl -u nginx -o json-pretty     # Formatted JSON
sudo journalctl -u nginx -o verbose         # All metadata fields
sudo journalctl -u nginx -o cat             # Message text only

# === ADVANCED QUERIES ===

# Logs for a specific process
sudo journalctl _PID=1234

# Logs from a specific user
sudo journalctl _UID=1000

# Combine conditions (OR)
sudo journalctl _SYSTEMD_UNIT=nginx.service + _SYSTEMD_UNIT=php-fpm.service

# Export to file
sudo journalctl -u nginx --since today -o json > nginx-today.json

# === JOURNAL MANAGEMENT ===

# Disk usage of the journal
sudo journalctl --disk-usage

# Clean up logs older than 2 weeks
sudo journalctl --vacuum-time=2weeks

# Clean up to a specific size
sudo journalctl --vacuum-size=500M

Part 5. Real-World Scenarios

5.1 Creating a Production-Ready Service from Scratch

ini

# /etc/systemd/system/api-server.service
[Unit]
Description=API Server
Documentation=https://github.com/company/api-server
After=network-online.target
Wants=network-online.target
Requires=postgresql.service
After=postgresql.service
# Rate limiting belongs in [Unit]: give up after 3 failed starts within 60 seconds
StartLimitIntervalSec=60s
StartLimitBurst=3

[Service]
Type=notify
ExecStart=/usr/local/bin/api-server
EnvironmentFile=/etc/api-server/env
Environment="PORT=8080"
Environment="LOG_LEVEL=info"

Restart=on-failure
RestartSec=5s

User=api
Group=api
WorkingDirectory=/opt/api-server

# === SECURITY HARDENING ===
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
SystemCallFilter=@system-service
ReadWritePaths=/var/lib/api-server /var/log/api-server

# === RESOURCE LIMITS ===
MemoryMax=512M
CPUQuota=200%
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

5.2 Drop-in Files — Override Without Touching Originals

bash

# systemctl edit creates the override file automatically
sudo systemctl edit nginx.service
# Creates: /etc/systemd/system/nginx.service.d/override.conf

ini

[Service]
MemoryMax=256M
Restart=always
Environment="NGINX_ENVSUBST_OUTPUT_DIR=/etc/nginx"

bash

sudo systemctl daemon-reload
sudo systemctl restart nginx.service

# View the full effective config (original + drop-ins)
sudo systemctl cat nginx.service
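
In automation, where an interactive editor is not an option, the same drop-in can be written directly. The sketch below creates the override file by hand; the function name and the `root` prefix argument are hypothetical, the latter added so the function can be tried on a scratch directory.

```shell
# Write stdin to the drop-in override for UNIT and print the resulting path.
# ROOT is an illustrative prefix; leave it empty to write under the real /etc.
write_dropin() (
    unit=$1
    root=${2:-}
    dir="$root/etc/systemd/system/$unit.d"
    mkdir -p "$dir" || exit 1
    cat > "$dir/override.conf" || exit 1
    printf '%s\n' "$dir/override.conf"
)
```

Run as root, `printf '[Service]\nMemoryMax=256M\n' | write_dropin nginx.service` writes the file; unlike systemctl edit, a manual write must be followed by `systemctl daemon-reload` before the change takes effect.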

5.3 Boot Time Analysis and Optimization

bash

# Total boot time
systemd-analyze
# Startup finished in 2.134s (kernel) + 8.643s (userspace) = 10.777s

# Top boot-time offenders
systemd-analyze blame

# Critical path to a specific target
systemd-analyze critical-chain graphical.target

# Generate visual timeline (open in browser!)
systemd-analyze plot > boot-plot.svg

# Validate a unit file for errors
systemd-analyze verify /etc/systemd/system/myapp.service

5.4 Diagnosing a Failing Service — Step by Step

bash

# Step 1: Service status
sudo systemctl status myapp.service

# Step 2: Recent logs with full detail
sudo journalctl -u myapp.service -n 100 --no-pager

# Step 3: Logs since last boot (for startup issues)
sudo journalctl -u myapp.service -b

# Step 4: All errors in the system at the time of failure
sudo journalctl -p err --since "10 min ago" --no-pager

# Step 5: Check dependencies
systemctl list-dependencies myapp.service

# Step 6: Run manually as the service user (use the path from ExecStart= in the unit)
sudo -u myapp /usr/bin/myapp --config /etc/myapp/config.yml

# Step 7: Check environment variables
sudo systemctl show myapp.service -p Environment

# Step 8: Check file permissions
sudo systemctl cat myapp.service | grep -E 'ExecStart|WorkingDirectory|User'
sudo ls -la /usr/bin/myapp

Quick Reference Cheatsheet

Service Control

Task	Command
Start	`sudo systemctl start <name>`
Stop	`sudo systemctl stop <name>`
Restart	`sudo systemctl restart <name>`
Reload config (no stop)	`sudo systemctl reload <name>`
Status	`systemctl status <name>`
Enable autostart	`sudo systemctl enable <name>`
Disable autostart	`sudo systemctl disable <name>`
Enable AND start	`sudo systemctl enable --now <name>`
Block permanently	`sudo systemctl mask <name>`

Viewing State

Task	Command
All running services	`systemctl list-units --type=service --state=running`
All failed	`systemctl --failed`
Check autostart	`systemctl is-enabled <name>`
Check active	`systemctl is-active <name>`
Dependency tree	`systemctl list-dependencies <name>`
Who depends on this	`systemctl list-dependencies --reverse <name>`
All timers	`systemctl list-timers`

Logs (journalctl)

Task	Command
Service logs	`sudo journalctl -u <name>`
Last N lines	`sudo journalctl -u <name> -n 50`
Real-time	`sudo journalctl -u <name> -f`
Current boot	`sudo journalctl -u <name> -b`
Errors only	`sudo journalctl -u <name> -p err`
Since time	`sudo journalctl -u <name> --since "1h ago"`
Journal size	`sudo journalctl --disk-usage`
Cleanup	`sudo journalctl --vacuum-time=2weeks`

Performance Diagnostics

Task	Command
Boot time	`systemd-analyze`
Boot bottlenecks	`systemd-analyze blame`
Critical chain	`systemd-analyze critical-chain <target>`
Visual timeline	`systemd-analyze plot > boot.svg`
Validate unit file	`systemd-analyze verify /path/to/unit`
cgroup tree	`systemd-cgls`
cgroup resource usage	`systemd-cgtop`

Conclusion

Systemd is not a monster to be feared — it's a powerful tool that makes you dramatically more effective as a sysadmin or developer. Key takeaways from this guide:

Units are declarative resource descriptions. Write them properly and use available security directives.
cgroups mean systemd always knows where your processes are. Use this for monitoring and resource constraints.
journald is a database, not a text file. Learn to query it properly.
Drop-in files — never edit original package-installed unit files.
systemd-analyze — your first tool when diagnosing boot problems.
