Most Terraform module libraries start with the best intentions. Someone writes a clean vpc module. Someone else writes rds. Six months later the vpc module has 47 input variables, three of them are deprecated-but-still-required-don't-ask, and nobody bumps the version because every upgrade is a risk. This is module rot — and it's almost entirely preventable.
These are the patterns we've landed on after running shared module libraries across multiple client engagements: the ones that have held up under dozens of consumers and years of drift.
The wrong-sized module problem
The two most common failure modes are too big and too small.
Too big is "the platform module" that provisions a VPC, an EKS cluster, an RDS instance, a Redis cluster, and an ALB in one apply. Every consumer needs only some of it but takes all of it. Every change is a high-blast-radius change. Plans take 4 minutes.
Too small is the aws-security-group module that's just a wrapper around the resource with the same inputs renamed. It adds no value, just an extra layer of indirection — and a place for bugs to hide between the wrapper and the underlying resource.
A module is the right size if (a) replacing it with raw resources would meaningfully increase boilerplate at the call site, AND (b) you can describe what it does in one sentence without using the word "and".
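To make test (a) concrete, here is a minimal sketch of a wrapper that fails it. The file path, variable names, and output name are hypothetical and chosen only for illustration; the point is that every input maps one-to-one onto an argument of the underlying resource.

```hcl
# modules/aws-security-group/main.tf (hypothetical "too small" module)
# Every variable below is just a renamed argument of aws_security_group.

variable "sg_name" {
  type        = string
  description = "Passed straight through to aws_security_group.name."
}

variable "sg_description" {
  type        = string
  description = "Passed straight through to aws_security_group.description."
}

variable "network_id" {
  type        = string
  description = "Passed straight through to aws_security_group.vpc_id."
}

resource "aws_security_group" "this" {
  name        = var.sg_name
  description = var.sg_description
  vpc_id      = var.network_id
}

output "id" {
  description = "ID of the wrapped security group."
  value       = aws_security_group.this.id
}
```

A call site for this module is the same length as the raw resource block it replaces, so swapping it out adds no boilerplate. That is the signal that the module is too small.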
Inputs: the "sensible default, escape hatch" pattern
The single most useful module pattern we've adopted is what we call "sensible default, escape hatch". Every input variable ships with a sane default. Complex configuration gets an override input (a map or a list of objects) for the cases the abstraction can't anticipate.
variable "tags" {
  type        = map(string)
  default     = {}
  description = "Tags applied to all resources. Module adds standard tags automatically."
}

variable "extra_security_group_rules" {
  type = list(object({
    type        = string
    from_port   = number
    to_port     = number
    protocol    = string
    cidr_blocks = list(string)
  }))
  default     = []
  description = "Escape hatch for rules the module doesn't model natively."
}
The escape hatch is the difference between "consumers fork the module" and "consumers stay on the shared version." Forks are how libraries die. Every time you make a consumer choose between forking and waiting for a feature, you're slowly killing your own library.
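One way the module body might consume those two inputs is sketched below. It assumes the `tags` and `extra_security_group_rules` variables declared above. The `vpc_id` variable, the standard tag keys, and the resource names are hypothetical, and letting caller-supplied tags win on conflict is a choice made for this sketch, not something the pattern requires.

```hcl
# Hypothetical input, needed only to make the sketch self-contained.
variable "vpc_id" {
  type        = string
  description = "VPC in which the security group is created."
}

locals {
  # Assumed standard tags; a real module would pick its own keys.
  standard_tags = {
    ManagedBy = "terraform"
    Module    = "example"
  }

  # Later arguments to merge() win, so caller tags override the standard ones here.
  tags = merge(local.standard_tags, var.tags)
}

resource "aws_security_group" "this" {
  name_prefix = "example-"
  vpc_id      = var.vpc_id
  tags        = local.tags
}

# Escape hatch: one rule per entry in var.extra_security_group_rules.
# Keys are list indexes, so reordering the list re-creates rules;
# a real module might derive a more stable key.
resource "aws_security_group_rule" "extra" {
  for_each = { for idx, rule in var.extra_security_group_rules : tostring(idx) => rule }

  security_group_id = aws_security_group.this.id
  type              = each.value.type
  from_port         = each.value.from_port
  to_port           = each.value.to_port
  protocol          = each.value.protocol
  cidr_blocks       = each.value.cidr_blocks
}
```

With both defaults left empty, the module creates only what it models natively; a consumer with an unusual requirement adds a rule to the list instead of forking.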
Versioning: semver, enforced
Pin every module reference to a tag, never a branch. Use semantic versioning religiously, and treat any change to module inputs or outputs as a major version bump — even if it "shouldn't" break anyone. Consumers will surprise you.
- Conventional commits in the module repo (feat:, fix:, feat!:)
- release-please to auto-generate version bumps and changelogs from commit history
- A monthly "stale module" CI job that opens PRs against every consumer repo on minor/patch bumps
module "vpc" {
  source = "git::ssh://git@github.com/org/tf-modules.git//vpc?ref=v3.4.1"
  #                                                            ^ pinned tag, not main, not v3
  cidr_block = "10.42.0.0/16"
}
The auto-PR job is the unsung hero of this setup. Without it, consumers will sit on v1.2.0 for two years. With it, the cost of a minor bump is one approval click, and your library stays alive.
Validation in CI is non-negotiable
The fastest way to kill a module library's credibility is to ship a broken version. Every module repo in our library runs four checks on every PR:
- terraform fmt -check + terraform validate: table stakes
- tflint with the AWS ruleset: catches deprecated arguments before they hit a plan
- tfsec or checkov: catches "you forgot to enable encryption" before it ships
- terratest: actually provisions the module in a sandbox account and asserts on real resources
The terratest piece is the one most teams skip. Don't. The 12-minute round trip of "apply in a sandbox, assert, destroy" has caught more real bugs for us than every other check combined.
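The Terraform side of that round trip can be as small as a fixture root module that the test applies and then destroys. The sketch below is an assumption about layout, not our exact setup: the fixture path, the region, the CIDR, and the `vpc_id` output on the module are all hypothetical.

```hcl
# test/fixtures/basic/main.tf (hypothetical path inside the module repo)

provider "aws" {
  region = "eu-west-1" # assumption: whichever region the sandbox account uses
}

module "vpc" {
  # Relative source so the test exercises the PR's working copy,
  # not a previously released tag.
  source = "../../../vpc"

  cidr_block = "10.99.0.0/16"
}

# Exposed so the test can read the value and assert against the real resource.
# Assumes the vpc module declares a vpc_id output.
output "vpc_id" {
  value = module.vpc.vpc_id
}
```

The terratest test then points at this directory, applies it, reads `vpc_id` from the outputs, checks the resource in the sandbox account, and destroys everything at the end.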
"Modules that aren't continuously applied in CI are documentation, not infrastructure code."
Docs as a contract
Every module repo has a README.md generated by terraform-docs in a pre-commit hook. The README is the public API. If an input isn't in the README's Inputs table, consumers can't rely on it. If you change an input description, you've changed the contract, and the changelog should say so.
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/terraform-docs/terraform-docs
    rev: v0.17.0
    hooks:
      - id: terraform-docs-go
        args: ["markdown", "table", "--output-file", "README.md", "./"]
Bonus: tools like Backstage's TechDocs can ingest these READMEs directly to give you a searchable module catalogue with zero extra work.
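Because the README is generated from the module's own declarations, the contract is effectively written in the variable and output blocks. A minimal sketch, with hypothetical names, of declarations that produce useful rows in the generated tables:

```hcl
variable "cidr_block" {
  type        = string
  default     = "10.0.0.0/16" # assumed default, for illustration only
  description = "IPv4 CIDR block for the VPC."
}

variable "enable_dns_hostnames" {
  type        = bool
  default     = true
  description = "Whether instances in the VPC get DNS hostnames."
}

resource "aws_vpc" "this" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = var.enable_dns_hostnames
}

output "vpc_id" {
  description = "ID of the VPC created by this module."
  value       = aws_vpc.this.id
}
```

terraform-docs reads the name, description, type, and default of each variable and the description of each output, so a missing or vague `description` shows up directly as a hole in the published contract.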
The smell test
After three years, these are the signals that tell us a module is starting to rot:
- The README has a "deprecated but still required" note
- Consumers have started writing wrapper modules around it
- The last 3 PRs have been "increment patch version, nothing breaking" cleanup commits
- Nobody on the platform team can confidently explain what enable_legacy_mode_v2 actually does
- You find yourself saying "yeah but don't pass that input, just leave it default"
When you see two or more of those, it's time to design a v2 — not patch around the v1. Ship the v2 in parallel, give consumers a deprecation window (we use 90 days), then delete the v1.
The production checklist
- Module does one thing, describable in one sentence
- Every input has a default; complex inputs have an escape-hatch override
- Versioned with semver; every reference pinned to a tag
- fmt, validate, tflint, tfsec, terratest run on every PR
- README auto-generated via terraform-docs pre-commit
- Conventional commits + release-please wired up
- Monthly auto-PR job opens version-bump PRs in consumer repos
- A 90-day deprecation policy you actually enforce
Right-sized scope, sensible defaults with escape hatches, strict semver, terratest in CI, and a README treated as a public contract. Do those five things and your module library will scale past 30 modules and 40 engineers without becoming the next thing you have to rewrite.