Terraform
Dependency Graph
This topic explains the dependency graph Terraform builds from Terraform configurations. This is an advanced topic and not required to understand how to use Terraform.
Introduction
Terraform builds a dependency graph and uses it to perform operations, such as generate plans and refresh state. For background on graph theory and a summary of how Terraform applies it, refer the HashiCorp 2016 presentation Applying Graph Theory to Infrastructure as Code.
Graph Nodes
The following node types can exist within the graph:
Resource Node - Represents a single resource. If you have the
count
metaparameter set, then there will be one resource node for each count. The configuration, diff, state, etc. of the resource under change is attached to this node.Provider Configuration Node - Represents the time to fully configure a provider. This is when the provider configuration block is given to a provider, such as AWS security credentials.
Resource Meta-Node - Represents a group of resources, but does not represent any action on its own. This is done for convenience on dependencies and making a prettier graph. This node is only present for resources that have a
count
parameter greater than 1.
When visualizing a configuration with terraform graph
, you can
see all of these nodes present.
Building the Graph
Building the graph is done in a series of sequential steps:
Resources nodes are added based on the configuration. If a diff (plan) or state is present, that meta-data is attached to each resource node.
Resources are mapped to provisioners if they have any defined. This must be done after all resource nodes are created so resources with the same provisioner type can share the provisioner implementation.
Explicit dependencies from the
depends_on
meta-parameter are used to create edges between resources.If a state is present, any "orphan" resources are added to the graph. Orphan resources are any resources that are no longer present in the configuration but are present in the state file. Orphans never have any configuration associated with them, since the state file does not store configuration.
Resources are mapped to providers. Provider configuration nodes are created for these providers, and edges are created such that the resources depend on their respective provider being configured.
Interpolations are parsed in resource and provider configurations to determine dependencies. References to resource attributes are turned into dependencies from the resource with the interpolation to the resource being referenced.
Create a root node. The root node points to all resources and is created so there is a single root to the dependency graph. When traversing the graph, the root node is ignored.
If a diff is present, traverse all resource nodes and find resources that are being destroyed. These resource nodes are split into two: one node that destroys the resource and another that creates the resource (if it is being recreated). The reason the nodes must be split is because the destroy order is often different from the create order, and so they can't be represented by a single graph node.
Validate the graph has no cycles and has a single root.
Walking the Graph
To walk the graph, a standard depth-first traversal is done. Graph walking is done in parallel: a node is walked as soon as all of its dependencies are walked.
The amount of parallelism is limited using a semaphore to prevent too many
concurrent operations from overwhelming the resources of the machine running
Terraform. By default, up to 10 nodes in the graph will be processed
concurrently. This number can be set using the -parallelism
flag on the
plan, apply, and
destroy commands.
Setting -parallelism
is considered an advanced operation and should not be
necessary for normal usage of Terraform. It may be helpful in certain special
use cases or to help debug Terraform issues.
Note that some providers (AWS, for example), handle API rate limiting issues at
a lower level by implementing graceful backoff/retry in their respective API
clients. For this reason, Terraform does not use this parallelism
feature to
address API rate limits directly.