A few times now in my "devops" career, I have arrived at a new codebase and run into somewhat complex terraform modules that an author (or a small team) didn't have much trouble initially writing and standing up (once), but about which there is now some apprehension over ... venturing back into the forest.
In trying to tame intimidating terraform complexity, I have become something of an unorthodox (ar)ranger, a skeptic of the clean-signposted trail, and a believer in blazing trails over following the most common organizational patterns.
Join me on the road less traveled by, and perhaps that will make all the difference. :-)
The Danger of an Omniscient Tool
Terraform employs a paradigm of "declarative infrastructure as code". [1]
It is far purer in its declarative semantics than, say, most Ansible roles or Chef playbooks. Terraform HCL (or JSON) is a manifest language in which you declare resources with respect to (at best) only their immediate dependencies. The software is responsible for constructing (then walking, possibly pruning, and finally building provisioning execution plans from) a dependency graph.

By design, `terraform`, the golang program, does not care where—within a single folder's collection of `.tf` files—you declare any particular `resource`, external input (`variable` or `data`), or `output`. It does not care how many files there are or in what order they sort. It views the folder ("module") as essentially a single unordered data blob.
`terraform` has, within the domain of a TF module, a third-person omniscient perspective, with as much memory as the OS will provide. Sadly, we deficient human users are stuck merely in a first-person POV, with a limited 7±2-item working memory.

Thus, it is critically important to humans, even as it is irrelevant to `terraform`, that terraform HCL be written so as to provide locally comprehensible context, and I will argue that the best way to deliver consistent context is to provide a ~comprehensible narrative.
The Orthodox-Nominal (ON) Approach
While the terraform-writing community is definitely not blind to this concern, there is barely any consensus as to how to address it: little advice is given, and what is suggested is, in my view, misguided.
Imagine you are a newly hired Junior DevOps Engineer, Ev G. Strawmann.
You are new to this whole terraform thing, but coming from the world of programming you assume there must be best practices to follow. Given little advice on organization by any docs, you turn to community examples. You find:
- https://github.com/terraform-providers/terraform-provider-aws/tree/master/examples
- https://github.com/hashicorp/terraform/tree/master/examples/cross-provider
(Provided you catch that it exists,) you head over to https://registry.terraform.io/. You click on the top few modules and through to their GitHub repos. These all look rather the same. You can discern something of a pattern.
Looking at the patterns, you determine that for any given module, you should:
1. Put all the `variable` declarations in a `variables.tf` or `inputs.tf`.
2. Put all the `output` declarations in an `outputs.tf`.
3. Generally, throw everything else in a `main.tf`. If there's obviously too much for one file, split things into files named either by the common prefix/type of the resources they create (`rds.tf`, `iam.tf`, etc.[2]) or perhaps (and I'd argue a little more usefully) for the primary target in the file (along with its similarly/identically named dependencies): `eks-cluster.tf`, `main_vpc.tf`.
4. If the amount of stuff in one folder gets real out of control, or you see obvious opportunities for multiple instantiation (but... not with `count`, mind you), create submodules in folders or other repos. In those folders, follow 1-3.
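Concretely, the resulting layout tends to look something like this (a sketch assembled from the file names above; any given module picks some subset):

```
variables.tf   # all variable declarations (or inputs.tf)
outputs.tf     # all output declarations
main.tf        # everything else... or split by type/target:
vpc.tf
rds.tf
iam.tf
```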
If, instead or afterwards, you google "how to organize terraform", you will get basically this advice from the first few results, amongst a few other tips.
Why It Seems Good
I, of course, don't know why various people ultimately choose to organize things this way, but I have some key guesses.
This approach optimizes for a few useful things:
- Discoverability of interface. Especially if you are planning on embedding a module in another module, it lets you trivially answer some key questions; to wit:
  - What variables does the module take? Look in `variables.tf`.
  - What are the module's outputs? Look in `outputs.tf`.
- More generally, it holds to the proposition "if I know what I'm looking for, I should know what file I will find it in". If everything is in `main.tf`, well, it's in there. If it's a network thing, it's probably in `network.tf` (or perhaps more likely `vpc.tf` for an AWS module).
Why It Isn't Good Enough (Sufficient)
The above are good and important properties, but I think they are insufficient.
I think that indeed they are fine and dandy and capture much, perhaps all, of what you need: for a steady state, for a one-and-done, for a published small example, for the kingdom perhaps of a single author.

But alas, code is rarely ever done, and infrastructure as code is no exception. What the ON approach leaves unanswered are several key questions for a growing codebase into which developers come and go. Some examples:
- As a developer new to the module, where should I start trying to understand?
- If I am adding something, where should I put it?
- If I remove something, what might I break?
- Where (is/should I check if) this variable (/is) used? Where does this resource come from (if not this file)?
It also provides no obvious approach for partial application (beyond terraform's `-target`), nor hints at what can be easily refactored into another module.
Why It Isn't Even Really Good (Necessary)
In a word: search. In two letters: `ag`.[3]

It is true that it's super easy to track down all the variables by keeping them in `variables.tf`; it is not much harder to track them down by running `ag '^\s*variable'`. Ditto if you know what you are looking for and want to figure out which file to look in: search is a more efficient tool than reading all the file names and guessing.
Additionally, there is the element of documentation. If you look at https://registry.terraform.io/ or GitHub READMEs, you'll likely find inputs and outputs in tables. If you run `terraform-docs`, you'll get the same.
Looking for Other Options
Once I realized the shortcomings of the common patterns, I decided to try something else.
In trying to determine how I should organize things instead, I drew inspiration first from how people organize individual files (often `main.tf`), and second from some guesses at a couple of the psychological factors that I think led to the present approach:
- I haven't done a formal survey and I'm sure there are other approaches out there (e.g. alphabetical, grouped by owner, chronological by date of addition), but in general it seems like most people, influenced by a background in imperative programming and a natural human impulse, order any given file roughly in order of causality or dependencies (or, less often, the exact opposite). Though terraform doesn't care about even in-file ordering, people tend to go e.g. virtual network, then subnets, then route table, then security groups; or `data "aws_ami"`, then launch configuration, then ASG; template, then file; private key, then self-signed CA, then CSR, then cert.
- The commonality of huge `main.tf`s over many small files betrays a desire to have a single thing you can "read through", and an acknowledgment of the relative shortcoming of a pile of unordered (but not necessarily independent) files.
- Though I think the use of `variables.tf` and `outputs.tf` is primarily about trying to make the interface discoverable, I think it also speaks to a desire to limit scope for immediate relevance.
Ordered File-Module Flow (OFF)
After some experimentation, I have arrived at a pattern I now follow (for terraform modules, as well as for helm charts and occasionally other similar things). It differs from the above (ON approach) substantially. I like to think of it as "writing the story/narrative of the infrastructure". It works as follows:
1. I reject the "top-level" `variables.tf` and `outputs.tf` files. (You may find files with similar names in some of my modules, for reasons I'll explain below, but in general `variable`s and `output`s are found alongside the resources they inform or derive from.)
2. I treat each file like a module. Each file contains a (logical group of) goal resource(s), preceded by its dependencies—notably including any `variable`s, `locals`, and possibly even `provider`s solely relevant to it, and succeeded by any `output`s it produces. The rule of thumb for the scope of a file is "things that make sense to delete at once" (or perhaps move to a different module at once, or instantiate multiple times as an atomic unit).[4] A sketch of such a file-module follows this list.
3. I want to make it so you can trivially get back to (reading) what would be a good (if overly long) causality-ordered `main.tf` by "`cat`ing the directory" (`ls | xargs cat`[5]). So, I lexically order file names in dependency order. I'll admit I have tried to get away with just using alphabetical order to do this, but in practice I end up using:
   - `_init.tf` - this file contains widely used `provider` declarations, module-level constant declarations (as `locals`), and possibly a widely used `data` source, or even a select "top level" `resource` (such as an `azurerm_resource_group` or `google_project`).[6]
   - Perhaps some other `_`-prefixed files that should logically come first, as prelude or metadata.
   - (Likely most controversially) numbered stages, e.g. `0-vpc.tf`, `1-rds_subnets.tf`, `1-rds_security_groups.tf`, `2-rds_cluster.tf`, `3-database.tf`. Note that "file-modules" which are logically parallel (i.e. have the same dependencies and possibly shared offspring, but don't depend on one another) can share a number. The heuristic is that at any point you should be able to delete all files from some number to the bottom, and the earlier-ordered files should all still execute cleanly.
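To make this concrete, here is a hypothetical directory listing for the RDS example above (file names beyond those already mentioned are invented for illustration):

```
_init.tf
_inputs_dnsname.tf
0-vpc.tf
1-rds_subnets.tf
1-rds_security_groups.tf
2-rds_cluster.tf
3-database.tf
```

And a minimal sketch of what a single file-module like `0-vpc.tf` might contain, assuming an AWS module (all names here are illustrative, not prescriptive): inputs first, then the goal resource, then outputs.

```hcl
# 0-vpc.tf - establishes the setting: the network everything else lives in.

# Input used only by this file-module.
variable "vpc_cidr" {
  description = "CIDR block for the main VPC"
  type        = string
  default     = "10.0.0.0/16"
}

# The goal resource of this stage.
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
}

# Output derived from this file's resources.
output "vpc_id" {
  value = aws_vpc.main.id
}
```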
The Value
The value of this approach is in answering the questions I listed above, the ones the ON approach fails at.
- As a developer new to the module, where should I start trying to understand?
  At the beginning. Treat the directory listing as a table of contents and then read any "chapter" relevant to what you are working on.
- If I am adding something, where should I put it?
  Where it falls in the logical order. Figure out what parts of the module it needs and then add it to the next logical stage. If it has no dependencies, put it in a new `0-foo.tf`. If it needs everything that came before, find the largest n in `n-foo.tf` and make a `(n+1)-bar.tf`.
- If I remove something, what might I break?
  Anything in the files with a greater n, most probably something in an n+1 file (but that failure likely propagates). But notably (if you've followed the patterns) nothing with an n less than or equal to the n of the file you are editing.
- Where (is/should I check if) this variable (/is) used?
  Rule of thumb: if it's declared in this file, it's used only in this file. For how to make that rule stick, see the next section.
- Where does this resource come from (if not this file)?
  An earlier (lower n) file, generally one of the ones at n-1.
As a practical matter, it also becomes possible, when executing a module for the first time or setting up a duplicate for development or debugging, to build out only a first fraction of the module.
Say you have a module which sets up a managed kubernetes cluster and then (in higher-n files) adds some cluster-level services (e.g. cert-manager, cluster-autoscaler; using `helm_release`, `kubernetes_stateful_set`, or perhaps `k8sraw_yaml` resources). If you want to work on terraform which adds a new one, you could `rm` all the others and just have it build out to the cluster level (then e.g. `git restore` the other files before you commit).
`terraform apply -target` is great when you only want to build one or two things, but having the ability to "move the goalpost" can also be powerful.
Dealing with Reused Variables and Multi-resource Outputs
To restate what I said above:
Rule of thumb: if [a variable is] declared in [a] file, it [should be] used only in [that] file
Terraform won't enforce this (though it would if you went the extra step, per footnote 4, of actually making every file a directory with a file(s) in it). But it really is what makes the file-module concept more manageable long term vs. the `main.tf` or type-separated files.
But alas, it is also not always realistic. If you want to keep your files reasonably small in scope, you may ultimately need to pass some variable(s) to more than one file. Examples of such things: a DNS suffix, a subnet CIDR or VPC, a common prefix for resources.
For these variables, but again not all variables, I recommend the use of an `_inputs_foo.tf` file, where `foo` is some description of what they are (`_inputs_subnets.tf`, `_inputs_dnsname.tf`).
These files may also be an appropriate place to put `data` sources like virtual image ids, dns zones, and, especially if you are using it, `terraform_remote_state`. There should not generally be resources in these files. If there are data sources (or, against my advice, resources) in these files, I recommend declaring a `locals` block at the bottom of the file with "module local outputs" of the specific fields you expect resources to use. This makes them more like variables and slightly easier to trace the usage of than the dotted `data.foo.bar` syntax.
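For instance, a minimal sketch of an `_inputs_dnsname.tf` (the zone variable and data source names here are assumptions for illustration):

```hcl
# _inputs_dnsname.tf - a shared input used by more than one file-module.

variable "dns_zone_name" {
  description = "Route 53 zone under which this module creates records"
  type        = string
}

data "aws_route53_zone" "main" {
  name = var.dns_zone_name
}

# "Module local outputs": the specific fields later files are expected to
# consume, rather than dotted data.foo.bar references scattered around.
locals {
  zone_id    = data.aws_route53_zone.main.zone_id
  dns_suffix = data.aws_route53_zone.main.name
}
```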
There is a similar issue with `output`s (likely maps) which wrap up data from more than one file's worth of resources. If an output comes from resources in a single file, it should be in that file. If not, it should be in an `_output_foo.tf` (or perhaps a `z_output_foo.tf` if you want them ordered at the end; you could also use a numbered file that succeeds the last needed resource).
To flag exceptions to the generally true expectation that if, by the end of a file, nothing from a resource declared therein has been `output`ed, it won't be elsewhere, I recommend adding a bottom-of-file `locals` block, like I suggested for data sources above, to signal a "module local output" (in this case one to be consumed by a module-exiting `output`). You may also find it useful to use such blocks for relevant details passing from one stage to the next.
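Putting those two pieces together, a hedged sketch (the resource and local names are reused from the earlier hypothetical layout, not prescribed):

```hcl
# Bottom of 2-rds_cluster.tf - a "module local output" signalling that a
# later file or an output file consumes this value.
locals {
  rds_endpoint = aws_rds_cluster.main.endpoint
}
```

```hcl
# z_output_endpoints.tf - an output map assembled from more than one
# file-module's resources (both references are illustrative).
output "endpoints" {
  value = {
    database = local.rds_endpoint # from 2-rds_cluster.tf
    dns_zone = local.dns_suffix   # from _inputs_dnsname.tf
  }
}
```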
(Mitigating) the Downside of Numbering
It is true that if you insert a new stage anywhere but the end of a module, you will have to renumber your files. It may behoove you to commit this rename separately from further modifications, to hint git and GitHub about the file's history (though with enough changes it may be truncated anyway).
In practice, with practice, the mid-story insertion of a scene or stage is less frequent than you might worry. Indeed the stages can take on something of a semantic connotation themselves:

- `0` is for preparation and "establishing the setting" (setting up networks, namespaces, etc),
- `1` is the main character section, the thing we are really here for,
- `2` and further either do some type of configuration or initialization of the things in `1` or set up connections to the outside world.
(Of course this is not always true. I have, for example, a module with: `0-users.tf`, `1-iam-policies.tf`[7], `2-groups_and_memberships.tf`, though you could argue that the policy is the meat of that sandwich.)
Following this soft convention, you can conceivably foresee reasons to not always start at `0` when creating a new module.
File-module support files
If a file-module uses external files with the `file()` or `templatefile()` helper, or indeed if a file declares a `module` statement for a logical submodule, I generally recommend biting the bullet and prefixing those files or that directory with (something that sorts the same as) the name of the file, e.g. `3-vault-user-policies.tf` and `3-vault-user-policy-admin.hcl.tpl`.
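A minimal sketch of how that pairing might look in use (the `vault_policy` resource is from the Vault provider; the template variable is an assumption for illustration):

```hcl
# 3-vault-user-policies.tf - consumes the .tpl file that sorts next to it.
resource "vault_policy" "admin" {
  name = "admin"
  policy = templatefile("${path.module}/3-vault-user-policy-admin.hcl.tpl", {
    mount = "secret" # hypothetical template variable
  })
}
```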
Other `_no_resources.tf` files
I have also occasionally used some other non-numbered files I thought I'd quickly share:
- `_doc.tf` - a long-form piece of developer documentation written as a block comment, perhaps with (still commented-out) example code. There isn't an obvious reason that this is preferable to a README, but I have done it.
- `_constants.tf` - a set of `locals` blocks (each one a logical grouping) declaring some stuff that you would make variables but never expect to be overridden from their defaults. These must have wide-ranging relevance to the module or they should just be declared at the top of the file that uses them.
- `__globals.tf` - it's something of a dirty secret of the Terraform world that `terraform` has no problem with following symlinks. Overuse of this facility will make you sad, but it can be helpful to have a defined set of `variable`s or `locals` (constants or variable-derived expressions), and maybe even `data` sources, which are present and usable in all modules in a group, without explicit import and export (though if you use such variables in submodules, you will need to do `a = var.a`, `b = var.b`, etc. on the `module "foo" {}` invocation). I have adopted a double-underscore (`__`) convention to signify "this file is symlinked and shared with other modules", though I also put a comment at the top of the file warning people of that fact.
- `__workspaces.tf` - like the above, but with maps of values keyed by terraform workspace, interrelated into a set of `locals` definitions indexing into those maps with `"${terraform.workspace}"` (sketched after this list).
- `__common_inputs.tf` - like globals, but variable definitions with values that actually vary from one module to another, but for which the descriptions and even perhaps defaults remain useful in common and can be symlinked through.
Happy Trails!
Thus I have presented my heresy. It is a pattern that has served me well and about which I have ultimately heard few complaints from coworkers.
It has made complex terraform modules (and helm charts, but obviously with `.yaml` instead of `.tf`) more approachable and invited more people to jump in and make changes in them.
It's not a perfect solution and I do sometimes depart from it (especially where you truly do find unordered sets of independent resources: a set of IAM policies, a submodule per application instance, etc.), but it is my go-to approach, and I suspect it will be for the foreseeable future.
I hope you find it useful as well. :-)
1. Indeed a terraform module is, to those who would draw such a distinction, not really infrastructure as (executed) code at all, but rather infrastructure as (parameterized) data. (cf. Pulumi, kinda Ballerina, Ecstasy, etc.) ↩︎
2. This focus on grouping by type and names is my excuse for why I call this approach "nominal"... beyond merely being dismissive and getting my acronyms to work 🧐 ↩︎
3. Or if you prefer, `rg`, `pt`, `gg`; but some directory-global search. Perhaps "Find In Files/Project" in your IDE of choice. ↩︎
4. If you were especially diligent or committed to the `{variables,main,outputs}.tf` pattern, you could follow OFF but substitute each file I'd use with a directory for a corresponding module, and then either a file for each, invoking it, or a single top-level file that invokes them all. ↩︎
5. Note that `_` has a lower ASCII value than `a` and so is often sorted before any files without a leading `_`. This is the case in, for example, the sidebar of Visual Studio Code. It may, however, not sort that way in an actual `ls`, possibly depending on your terminal's locale settings. In addition to the sorting hint, the `_foo` convention is borrowing from e.g. the use of underscore names in python. ↩︎
6. `init` is preferred over `setup` or `providers` for the largely aesthetic, but I'd argue a little useful, fact that it alphabetizes before `_inputs` or `_outputs` (which we'll come back to). ↩︎
7. This file is an example, however, of where I do still depart from my approach (or at least drop the numbering when each file-module is independent of every other). That file's contents are just `module "iam_policies" { source = "./1-iam-policies" }`, where the `1-iam-policies` submodule/directory contains one file per policy with a `data "aws_iam_policy_document"`, perhaps some other `data`s to fetch ARNs, and then a `resource "aws_iam_policy"`. ↩︎
Header photo by Caleb Jones on Unsplash