The Platform Engineer's Guide to

Structuring Terraform and OpenTofu

This guide provides practical strategies and patterns for structuring Terraform and OpenTofu code, covering repository choices, folder layouts, environment and multi-region management, state splitting, advanced techniques, and success metrics to help teams build scalable and maintainable infrastructure as code.

Last Reviewed
May 14, 2025

1. Introduction: Why IaC Structure is Mission Critical

Infrastructure as Code (IaC) tools like Terraform and its open-source fork, OpenTofu, promise automation, consistency, and reliability in managing cloud and on-premises resources. (Note: For the purposes of structure and core functionality discussed in this guide, Terraform and OpenTofu can be considered largely interchangeable, stemming from a common heritage before Terraform's license change 1). However, realizing these benefits hinges critically on how the code itself is structured. Many teams begin their IaC journey by simply grouping .tf configuration files together until the infrastructure deploys successfully. While expedient initially, this often ad-hoc approach inevitably hits a scaling wall.

Poorly structured Terraform/OpenTofu code isn't just an aesthetic concern; it actively creates friction, introduces significant risk, and becomes a bottleneck to velocity. Teams find themselves grappling with deployment failures caused by subtle inconsistencies between environments, spending hours debugging trivial configuration differences, struggling to onboard new members due to opaque codebases, and facing performance degradation during plan and apply cycles as state files grow unwieldy.3 The initial focus on getting resources provisioned quickly often leads to technical debt, manifesting as operational pain points down the line. This can involve excessive copy-pasting of code blocks between environment configurations, a lack of reusable components, or monolithic state files managing disparate parts of the infrastructure.3

The core challenge lies in balancing the desire for reusable, Don't Repeat Yourself (DRY) code against the practical needs for clarity, isolation between environments or components, and managing inherent complexity.4 Over-abstraction can be as detrimental as no abstraction at all. Therefore, proactively choosing and evolving an appropriate structure is not premature optimization; it's fundamental architectural planning necessary to avoid predictable future problems and build a foundation for sustainable, scalable infrastructure management.

This guide provides a framework and practical patterns for structuring Terraform/OpenTofu code. It explores repository strategies, folder layouts, environment and multi-region management, scaling considerations, advanced techniques, and methods for measuring success. The goal is to equip developers, DevOps engineers, SREs, and technical leads with the knowledge to design, implement, and maintain reliable, scalable, and maintainable IaC across different organizational sizes and complexities.

2. The Building Blocks: Foundational Files, Naming, and Essential Modularity

Before tackling repository-level strategies or complex environment management, establishing foundational conventions is paramount. Consistency in file layout, naming, and the basic unit of reuse—the module—prevents downstream chaos and forms the bedrock of any scalable IaC structure. Skipping these basics undermines even the most sophisticated patterns.

Standard File Layout

A typical Terraform/OpenTofu root module or configuration directory benefits from a standard file organization. While not strictly enforced by the tool, adhering to conventions improves readability and maintainability.8 Common files include:

  • main.tf: Contains the primary set of resource definitions. For larger configurations, resources might be split into logical files like network.tf or instances.tf.8
  • variables.tf: Defines input variables for the configuration, including types, descriptions, and default values.10
  • outputs.tf: Declares output values from the resources created, making key information accessible to other configurations or users.10
  • versions.tf: Specifies required versions for Terraform/OpenTofu itself and necessary providers, ensuring consistent execution environments.6
  • providers.tf: Configures the providers (e.g., AWS, Azure, GCP) being used, including settings like region or credentials.9
  • terraform.tfvars: (Optional) Assigns values to input variables. Often used for environment-specific settings, though sensitive values should be managed securely, not committed here.9
  • .terraform/: Directory where Terraform/OpenTofu downloads provider plugins and modules during init. Managed by the tool.
  • .terraform.lock.hcl: Records the exact provider versions selected, ensuring consistent dependency resolution across runs and team members.10
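As a small illustration, a minimal versions.tf might pin the tool and provider versions like this (the version numbers are placeholders, not recommendations):

```hcl
terraform {
  # Require a compatible Terraform/OpenTofu release
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # allow minor/patch updates within 5.x
    }
  }
}
```

Pinning with a pessimistic constraint (`~>`) keeps upgrades deliberate while still permitting compatible updates, and the generated .terraform.lock.hcl records the exact version actually selected.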

Naming Conventions

Clear and consistent naming is crucial for reducing cognitive load, improving readability, and simplifying searching and refactoring.5 Key conventions include:

  • Resources and Data Sources: Use underscores (_) to separate words (e.g., google_compute_instance, web_server_firewall). Make resource names singular (e.g., google_compute_instance.web_server). If a resource is the primary one of its type in a module, consider naming it main for simplicity (e.g., aws_vpc.main).8
  • Variables: Use descriptive names reflecting usage or purpose. Include units for numeric values (e.g., ram_size_gb, disk_size_gb) as cloud provider APIs may not have standard units.8 Use binary prefixes (kibi, mebi, gibi) for storage and decimal prefixes (kilo, mega, giga) for other measurements, consistent with Google Cloud usage.8 Use positive names for boolean flags (e.g., enable_monitoring instead of disable_monitoring).8
  • Outputs: Name outputs descriptively based on the value they provide.
  • Files: Group related resources into logically named files (e.g., network.tf, database.tf) but avoid creating a separate file for every single resource.8
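A brief sketch tying these conventions together (resource and variable names here are illustrative, not prescribed):

```hcl
# Primary VPC of the module: named "main" per convention
resource "aws_vpc" "main" {
  cidr_block = var.vpc_cidr
}

# Numeric variable with its unit encoded in the name
variable "disk_size_gb" {
  type        = number
  description = "Boot disk size in GB."
  default     = 50
}

# Positive boolean name: enable_*, not disable_*
variable "enable_monitoring" {
  type        = bool
  description = "Whether to attach the monitoring agent."
  default     = false
}
```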

Introduction to Modules

Modules are the fundamental mechanism for code reuse and abstraction in Terraform/OpenTofu.6 A module is simply a collection of .tf files within a directory that defines a set of related resources intended to be used together.6 Using modules offers significant advantages:

  • Reusability (DRY): Define common infrastructure patterns once and reuse them across multiple environments or projects, reducing code duplication.5 Copy-pasting configuration blocks is a common anti-pattern that leads to inconsistencies and maintenance nightmares.4
  • Abstraction: Hide the complexity of underlying resources behind a simpler interface (the module's input variables and outputs).9
  • Standardization: Enforce organizational standards and best practices within modules, ensuring consistency in deployed infrastructure.6
  • Team Collaboration: Enable teams to share and consume infrastructure components easily, often via module registries or shared repositories.6

Basic Module Structure

Reusable modules follow the same standard file structure as root modules 6:

modules/
└── my_module/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    ├── versions.tf
    └── README.md

The README.md is particularly important for documenting the module's purpose, inputs, outputs, and usage examples.
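Calling such a module from a root module might look like the following sketch (the source path and input names are illustrative, and it assumes the module declares an `id` output):

```hcl
module "my_module" {
  source = "./modules/my_module"

  # Inputs declared in the module's variables.tf
  environment = "dev"
  name_prefix = "team-a"
}

# Re-expose a value declared in the module's outputs.tf
output "my_module_id" {
  value = module.my_module.id
}
```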

Module Design Principles (Initial)

Effective modules adhere to certain principles:

  • Focus: Modules should have a clear, defined purpose. Avoid creating monolithic modules that try to do too much.5
  • Avoid Thin Wrappers: Don't create modules that simply wrap a single resource type without adding significant value or abstraction. Use the resource type directly instead.9
  • Logical Grouping: Encapsulate resources that work together to provide a specific capability (e.g., a VPC with subnets and route tables, a database cluster).6
  • Parameterize Sparingly: Only expose variables for values that genuinely need to vary between instances or environments. Hardcode sensible defaults or organizational standards where possible.8 Adding variables later is easier than removing them.8
  • Expose Necessary Outputs: Define outputs for values that downstream resources or modules will need to consume.8

Mastering these foundational elements—consistent file structures, clear naming, and effective basic modularity—is the essential first step before tackling more complex repository organization, environment management, or scaling strategies.

3. Repository Showdown: Monorepo vs. Polyrepo for IaC

Once the foundational practices are in place, a critical architectural decision is how to organize IaC projects within version control repositories. The two dominant strategies are the monorepo and the polyrepo approach, each with significant implications for workflow, tooling, collaboration, and scalability.13 The choice between them often dictates subsequent investments needed to mitigate their respective weaknesses.

Definitions

  • Monorepo: A single version control repository containing the IaC for multiple projects, components, services, or environments. Most or all infrastructure code resides in one place.13
  • Polyrepo: Multiple version control repositories, typically with each repository containing the IaC for a single project, component, service, or perhaps a specific team's domain.13

Monorepo Deep Dive

In a monorepo setup, different infrastructure components or environment configurations live as distinct directories within the same repository.14

Pros:

  • Simplified Dependencies: Managing dependencies between different infrastructure components defined within the same repo is straightforward. Changes across components can often be coordinated in a single commit/PR.
  • Code Discovery & Refactoring: Easier to find code, understand relationships, and perform large-scale refactoring across the entire infrastructure codebase.13
  • Unified Tooling/CI: Potential to implement consistent linting, testing, and deployment pipelines across all infrastructure code.13
  • Collaboration: Encourages shared ownership and visibility, potentially breaking down team silos.13 Real-world examples like Google and Uber demonstrate monorepos scaling to large engineering organizations.13

Cons:

  • CI/CD Bottlenecks: Cloning large repositories and running plan/apply across many components can become very slow, significantly impacting pipeline times.13
  • Tooling Requirements: Requires sophisticated tooling for selective builds/tests (e.g., only run pipelines for changed components), smart caching, and potentially sparse checkouts to manage performance.13 Maintaining this tooling often requires dedicated effort.13
  • Blast Radius: A breaking change on the main branch can potentially impact a larger surface area, although state splitting mitigates infrastructure impact.
  • Access Control: Managing granular permissions (e.g., restricting who can modify production network code) can be more challenging than with separate repositories.
  • Hygiene: Requires strict branching strategies (like trunk-based development) and diligent dependency management to avoid chaos.13

Conceptual Structure Example:

terraform-monorepo/
├── modules/              # Shared, reusable modules
│   ├── vpc/
│   └── rds/
├── environments/         # Root modules per environment
│   ├── dev/
│   │   ├── networking/   # Component within dev
│   │   │   └── main.tf
│   │   └── app-db/
│   │       └── main.tf
│   ├── staging/
│   │   └──...
│   └── prod/
│       └──...
├── components/           # Alternative: Group by component first
│   ├── networking/
│   │   ├── dev/
│   │   │   └── main.tf
│   │   └── prod/
│   │       └── main.tf
│   └── database/
│       └──...
└── README.md

Polyrepo Deep Dive

In a polyrepo setup, different infrastructure components or services each reside in their own dedicated repository. Shared modules might live in another separate repository.14

Pros:

  • Clear Ownership: Boundaries are explicit; a specific team typically owns each repository.13
  • Independent Pipelines: Each repository can have its own tailored CI/CD pipeline, potentially leading to faster builds for individual components.13
  • Granular Access Control: Permissions can be easily managed on a per-repository basis.13
  • Team Autonomy: Teams have more freedom to choose tooling and evolve their infrastructure code independently.13

Cons:

  • Complex Dependency Management: Managing dependencies between repositories is challenging. Requires mechanisms like referencing shared modules via Git tags/branches or a module registry.14 Coordinating changes across multiple repositories can be difficult.
  • Code Duplication Risk: High potential for boilerplate code (e.g., CI/CD pipeline definitions, provider configurations) to be duplicated across repositories unless actively managed with templates or shared libraries.13
  • Discoverability Issues: Harder to get a holistic view of the entire infrastructure and understand inter-service dependencies.13
  • Tooling Overhead: Requires investment in cross-repository tooling for tasks like coordinated deployments, dependency updates, and observability. Often necessitates building a developer platform.13
  • Inconsistent Standards: Risk of diverging practices, module versions, and quality standards across different repositories and teams.

Conceptual Structure Example:

# Repository 1: Shared Modules
terraform-shared-modules/
└── modules/
    ├── vpc/
    └── rds/

# Repository 2: Networking Infrastructure
infra-networking/
├── dev/
│   └── main.tf  # Uses source = "git::https://example.com/org/terraform-shared-modules.git//modules/vpc?ref=v1.0"
└── prod/
    └── main.tf

# Repository 3: Application Database
infra-app-database/
├── dev/
│   └── main.tf  # Uses source = "git::https://example.com/org/terraform-shared-modules.git//modules/rds?ref=v1.2"
└── prod/
    └── main.tf

Key Tradeoffs and Decision Factors

The choice isn't about which is universally "better," but which set of tradeoffs aligns best with an organization's context.13 Consider:

  • Team Size & Structure: Smaller, highly collaborative teams might favor a monorepo's ease of internal dependencies. Larger organizations with distinct team boundaries might lean towards polyrepos for autonomy, provided they invest in managing cross-repo complexity.
  • Infrastructure Complexity: Highly interconnected systems might benefit from a monorepo's unified view, while loosely coupled microservices could map well to polyrepos.
  • Tooling Maturity: Organizations with strong CI/CD platforms and experience managing large codebases might handle monorepo scaling challenges. Those without may find the overhead of managing many polyrepo pipelines easier initially, but will eventually need cross-repo tooling.
  • Organizational Pain Points: Is the biggest current problem slow CI builds (favors polyrepo investigation) or managing dependencies and consistency (favors monorepo or better polyrepo tooling)? 13

Monorepo vs. Polyrepo Tradeoff Summary

Comparison of Monorepo and Polyrepo Structures
| Criteria | Monorepo | Polyrepo |
| --- | --- | --- |
| Dependency Management | Simpler for internal dependencies; complex external dependency management. | Complex for inter-repo dependencies; requires explicit sharing (Git, registry). |
| CI/CD Performance | Can become slow; requires optimization (selective builds, caching).13 | Faster individual builds; overhead managing many pipelines.13 |
| Code Discoverability | High visibility within the repo. | Lower visibility; requires cross-repo search/tooling.13 |
| Team Autonomy | Lower; encourages shared standards. | Higher; teams control their repo's evolution.13 |
| Access Control | More complex for granular control. | Simpler; per-repository permissions.13 |
| Tooling Overhead | High for build optimization & hygiene at scale.13 | High for cross-repo coordination, observability, dependency mgmt.13 |
| Consistency Enforcement | Easier to enforce globally via shared tooling/pipelines. | Harder; requires deliberate effort across repos (templates, platforms).13 |

Ultimately, the decision forces subsequent technical and organizational adaptations. Choosing a monorepo necessitates tackling CI/CD scaling and maintaining hygiene. Choosing polyrepos demands investment in robust dependency management, discoverability tools, and consistency mechanisms.

4. Organizing Your Code: Practical Folder Structures

Regardless of whether a monorepo or polyrepo strategy is chosen, the internal arrangement of directories and files within a repository significantly impacts clarity, maintainability, and scalability. The goal is to create a logical structure that reflects the infrastructure's architecture and the team's workflow.

Common Patterns

Several common patterns exist for organizing Terraform/OpenTofu code within a repository:

Grouping by Environment: This simple approach uses top-level directories for each deployment environment, such as dev, staging, and prod.14

infra-live/
├── dev/
│   ├── main.tf
│   ├── terraform.tfvars
│   └── backend.tf  # Environment-specific backend config
├── staging/
│   └──...
└── prod/
    └──...
modules/
└──... # Shared modules referenced by environments

Pros: Provides clear isolation, easy to understand which code applies to which environment, allows for distinct backend configurations per environment.
Cons: Highly prone to code duplication if not strictly managed using shared modules and environment-specific .tfvars files. Directly copying code between environment folders is a significant anti-pattern leading to inconsistencies.4

Grouping by Component/Service: This pattern organizes code around logical parts of the application or infrastructure, such as networking, database, monitoring, app-frontend.7 Environment variations are typically handled within each component's configuration (e.g., via tfvars or workspaces).

infra-components/
├── networking/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf # Outputs consumed by other components
├── database/
│   ├── main.tf
│   ├── variables.tf # Might take network IDs as input
│   └── outputs.tf
├── application/
│   └──...
└── environments/ # Configuration files mapping components to envs
    ├── dev.tfvars
    └── prod.tfvars

Pros: Promotes modularity, aligns well with microservice architectures and component ownership, reflects the application's structure. Tends to scale better conceptually as it avoids tight coupling based purely on resource type.7
Cons: Requires careful management of dependencies between components (e.g., the application needs outputs from the database and networking components). State splitting often aligns with this structure, necessitating mechanisms like terraform_remote_state or orchestration tools.
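For example, the networking component might publish subnet IDs as outputs, which the database component then accepts as an input variable (names are illustrative; the actual wiring between the two would typically use terraform_remote_state or an orchestration tool):

```hcl
# networking/outputs.tf
output "private_subnet_ids" {
  description = "Subnets for internal workloads."
  value       = aws_subnet.private[*].id
}

# database/variables.tf
variable "private_subnet_ids" {
  description = "Subnet IDs supplied by the networking component."
  type        = list(string)
}
```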

Grouping by Resource Type: Organizes directories by the type of cloud resource, like ec2, s3, rds, iam.15

infra-by-resource-type/
├── ec2/
│   └── main.tf
├── s3/
│   └── main.tf
├── rds/
│   └── main.tf
└── iam/
    └── main.tf

Pros: Simple to understand initially.
Cons: Often discouraged for larger setups as it doesn't reflect the application architecture, obscures dependencies, and can lead to overly large, tightly coupled state files managing unrelated application parts.7 Makes it hard to reason about the infrastructure supporting a specific application.

Hybrid Approaches: Combining patterns is common and often practical. A frequent example is organizing by environment first, then by component within each environment.14

infra-live/
├── dev/
│   ├── networking/
│   │   └── main.tf
│   ├── database/
│   │   └── main.tf
│   └── application/
│       └── main.tf
├── prod/
│   └──...
modules/
└──...

This provides environmental isolation while still structuring code logically within each environment.

Root vs. Reusable Modules Distinction

It's crucial to distinguish between root modules (the configurations directly applied by terraform apply, often representing an environment or component) and reusable modules (parameterized building blocks stored in a modules/ directory or separate repository and called by root modules).9 Reusable modules should be designed for general use, while root modules compose these building blocks for a specific deployment.

Supporting Files

Standard locations help organize auxiliary files 8:

  • scripts/: For any helper or custom scripts used during provisioning (use sparingly, as resources created by scripts are not tracked in Terraform's state 8).
  • files/: For static files that need to be uploaded or used by resources (e.g., configuration files for instances).
  • templates/: For template files processed by Terraform's templatefile function or provider-specific templating.

Terragrunt Structures

Tools like Terragrunt often encourage a layered structure using include blocks and functions like find_in_parent_folders(). A common pattern involves defining common settings (like backend and provider configurations) in root-level terragrunt.hcl files and inheriting/overriding them in environment- or component-specific terragrunt.hcl files deeper in the directory tree.17 This promotes DRY configuration across many Terraform modules managed by Terragrunt. (More details in Section 7).
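A minimal sketch of this layering, assuming a parent terragrunt.hcl that defines the shared backend and provider settings (all paths and inputs here are hypothetical):

```hcl
# environments/dev/networking/terragrunt.hcl
include {
  # Inherit backend/provider configuration from the nearest parent terragrunt.hcl
  path = find_in_parent_folders()
}

terraform {
  # Module source shared by every environment
  source = "../../../modules//vpc"
}

# Only the environment-specific inputs live here; everything else is inherited
inputs = {
  cidr_block = "10.10.0.0/16"
}
```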

Ultimately, the most effective folder structure often mirrors the team's deployment model and ownership boundaries. While multiple patterns exist, structuring by application or component generally scales better conceptually than grouping purely by resource type, as it aligns infrastructure with the services it supports and facilitates more logical state management.7 However, this approach necessitates robust dependency management between components.

5. Environment Parity and Multi-Region Deployments

Managing infrastructure across distinct deployment environments (e.g., development, staging, production) and potentially multiple geographic regions presents significant challenges. The primary goals are to ensure consistency, minimize configuration drift, achieve isolation, and avoid duplicating code unnecessarily. Effective strategies focus on isolating state and using configuration mechanisms rather than code duplication to handle variations.

Environment Management Techniques

Two primary techniques are used for managing environment differences within Terraform/OpenTofu:

Terraform Workspaces:

  • Mechanism: Built-in feature allowing multiple state files for a single configuration directory. Commands like terraform workspace new <name>, terraform workspace select <name>, and terraform workspace list manage these isolated states.19
  • Use Case: Ideal when environments are structurally identical or very similar, with differences primarily handled by input variables.19 The terraform.workspace interpolation sequence can be used within the code to introduce minor variations based on the selected workspace (e.g., different instance counts or CIDR blocks).19
  • Limitations: All workspaces within a configuration directory must share the same backend configuration (e.g., the same S3 bucket for state). This makes them unsuitable if environments require fundamentally different backend setups (e.g., different AWS accounts or storage locations).3 They are not designed for managing significant structural differences between environments.

Example (Conceptual Variable):

variable "instance_count" {
  type = map(number)
  default = {
    default = 1 # Corresponds to the 'default' workspace
    dev     = 2
    prod    = 10
  }
}

resource "aws_instance" "app" {
  # Use lookup with terraform.workspace to get the count
  count = lookup(var.instance_count, terraform.workspace, var.instance_count.default)
  #... other configuration
}

Directory-Based Separation:

  • Mechanism: Using separate directories for each environment's root module configuration, as discussed in Section 4.14 Each directory has its own main.tf, backend.tf, etc. Shared logic is encapsulated in reusable modules called by each environment's configuration. Environment-specific variable values are typically passed via -var-file=env/<env_name>.tfvars during plan/apply.4
  • Use Case: Provides maximum isolation, allowing completely different backend configurations, provider settings, and even different module compositions per environment. Suitable for complex scenarios with significant divergence between environments.
  • Limitations: Requires discipline to keep code DRY by maximizing the use of shared modules. There's a risk of duplicating logic outside of modules if not carefully managed.
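For instance, an environment's variable file might contain only the values that differ between environments (names and values below are illustrative), supplied at run time with `terraform plan -var-file=env/dev.tfvars`:

```hcl
# env/dev.tfvars -- only the values that vary per environment
instance_type              = "t3.micro"
instance_count             = 2
enable_deletion_protection = false
```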

Choosing the Right Approach: Workspaces offer simplicity for near-identical environments managed by the same configuration code. Directory-based separation provides greater flexibility and isolation, essential when backend configurations or core infrastructure components differ significantly. Often, teams use a hybrid approach, perhaps using directories for major environment boundaries (dev/prod) and potentially workspaces within those for variations like feature branches.

State Management for Isolation

Regardless of the technique used, robust state management is critical:

  • Separate State Files: Each distinct environment and region should have its own isolated state file. This minimizes the "blast radius" – an error affecting one environment's state won't impact others.20
  • Remote Backends: Store state files remotely (e.g., AWS S3, Azure Blob Storage, Google Cloud Storage) rather than locally. This enables collaboration and is essential for CI/CD automation.5
  • State Locking: Use backend features that support state locking (e.g., DynamoDB for S3, Azure Blob leases) to prevent concurrent operations from corrupting the state file.20
  • Logical Backend Keys: Structure the path (key) within the remote backend logically to reflect the environment, region, and component, making state files easy to locate and manage (e.g., <environment>/<region>/<component>/terraform.tfstate).20
  • Versioning: Enable versioning on the remote backend storage (e.g., S3 bucket versioning) to allow rollback in case of state corruption.20
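Putting these points together, a backend block for one environment/region/component might look like the following (the bucket and table names are placeholders):

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"                   # versioning enabled on the bucket
    key            = "prod/us-east-1/network/terraform.tfstate"  # env/region/component path
    region         = "us-east-1"
    dynamodb_table = "example-terraform-locks"                   # state locking
    encrypt        = true
  }
}
```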

Multi-Region Strategies

Deploying infrastructure across multiple geographic regions adds another layer of complexity:

Provider Aliases: This is the core Terraform/OpenTofu mechanism for targeting multiple regions within a single configuration. Define multiple provider blocks for the same provider type (e.g., aws), each with a unique alias attribute and the specific region configured.22

provider "aws" {
  region = "us-east-1" # Default provider
}

provider "aws" {
  alias  = "eu-west-1"
  region = "eu-west-1"
}

resource "aws_instance" "server_us" {
  # provider implicitly uses the default (us-east-1)
  ami           = "ami-..."
  instance_type = "t3.micro"
}

resource "aws_instance" "server_eu" {
  provider      = aws.eu-west-1 # Explicitly use the aliased provider
  ami           = "ami-..."
  instance_type = "t3.micro"
}

Structuring for Regional Differences:

  • Pass region-specific parameters (like AMIs, instance types, availability zones) into modules via input variables.
  • Use conditional logic (count or for_each meta-arguments) based on region variables to selectively create resources.
  • If regional differences are substantial, consider creating separate region-specific modules or even separate root module configurations per region, managed via directory separation.
  • Leverage data sources to look up region-specific information (e.g., finding the latest AMI in each region).

State Isolation: Reinforce the need for separate state files per region (and often per environment within each region) to maintain isolation.21 The backend key structure should reflect this (e.g., prod/us-east-1/network/terraform.tfstate).

Global and Replicated Services: Account for services that are inherently global (like AWS IAM, Route 53) or offer cross-region capabilities (like S3 Cross-Region Replication, DynamoDB Global Tables).21 These often require special handling in the IaC structure, potentially managed in a dedicated "global" configuration or region.

The common thread in managing both environments and regions effectively is the principle of isolating state and parameterizing configurations. Variations should primarily be handled through input variables and conditional logic within well-structured modules, minimizing the need to duplicate the core infrastructure code itself.

Environment/Region Management Techniques Comparison

Comparison of Terraform Workspaces and Directory + tfvars
| Criteria | Terraform Workspaces | Directory + tfvars |
| --- | --- | --- |
| State Isolation | High (separate state files per workspace).19 | High (separate state files per directory/backend config). |
| Backend Config Flexibility | Low (shared backend config per directory).3 | High (each directory can have unique backend config). |
| Code Duplication Risk | Low (single codebase). | Medium (requires discipline to use shared modules).4 |
| Ease of Setup | High (built-in commands).19 | Medium (requires setting up directory structure). |
| Handling Major Differences | Poor (not designed for structural divergence).16 | Good (allows different modules/resources per env). |
| Use Case | Near-identical envs, variable-driven changes.19 | Divergent envs, different backends/structures needed.6 |

6. Scaling Your Structure: From Startup Roots to Enterprise Branches

Infrastructure as Code structures are not static; they must evolve alongside the organization, team size, and infrastructure complexity. What works for a small startup deploying a handful of resources will inevitably buckle under the weight of an enterprise managing thousands of resources across multiple teams, accounts, and regions. Understanding this evolution, particularly concerning state management, is key to maintaining velocity and stability.

Typical Evolutionary Stages

Startup Phase: Often begins with a simple structure, potentially a single monorepo, minimal modularity, and perhaps environment folders or workspaces for basic separation.10 A single state file might manage everything initially. The focus is on rapid iteration, and the inherent limitations are usually manageable due to the small scale.

Growth Phase: As the infrastructure expands and the team grows, pain points emerge. The single state file (if used) becomes a bottleneck. terraform plan and apply times increase significantly. State locking contention becomes frequent as more developers or CI/CD pipelines attempt simultaneous operations. Coordinating changes becomes difficult, and the risk of unintended consequences (blast radius) from a single apply increases.3 Inconsistencies between environments may creep in due to inadequate structure or discipline.

Enterprise Phase: Characterized by large, complex, often multi-account and multi-region infrastructure. Multiple teams contribute to IaC, necessitating clear ownership boundaries, robust modularity (often via internal module registries), strong governance, and extensive automation. Advanced tooling like Terragrunt or custom orchestration platforms often becomes necessary to manage complexity, enforce standards, and handle dependencies across numerous state files.9

The State File Bottleneck

The Terraform/OpenTofu state file is central to its operation, mapping declared resources to real-world infrastructure.11 However, as infrastructure grows, large, monolithic state files become a primary scaling bottleneck 3:

Performance Degradation: Before plan or apply, Terraform typically refreshes the state by querying the cloud provider APIs for the current status of managed resources. With thousands of resources, this refresh operation can take many minutes, significantly slowing down local development cycles and CI/CD pipelines.3 Cloud provider API rate limits can also become a factor.23

Increased Blast Radius: A single large state file means a single apply operation potentially touches a vast number of resources. An error during the apply, a misconfiguration, or state file corruption can have widespread consequences.3

Locking Contention: Remote backends use locking mechanisms to prevent simultaneous writes to the state file.20 With a single state file and many concurrent users or pipelines, contention for this lock increases, leading to delays.

Review Complexity: The output of terraform plan against a large state file can be enormous, making it difficult and time-consuming for reviewers to validate the intended changes accurately.3

State Splitting Strategies

The necessary solution to the state file bottleneck is to break down large state files into smaller, more manageable units.3 This aligns with the principle of reducing the blast radius. Common splitting strategies include:

By Environment: Maintaining separate state files for dev, staging, prod, etc. (as discussed in Section 5). This is a fundamental level of isolation.

By Region: Using distinct state files for each geographic region being managed (also discussed in Section 5).

By Component/Stack: Creating separate state files for logically distinct parts of the infrastructure, even within the same environment and region. Examples include:

  • A state file for core networking (VPC, subnets, gateways).
  • A state file for shared security infrastructure (IAM roles, security groups).
  • A state file for a specific application's database cluster.
  • A state file for the application's compute resources.

This approach, often referred to as a Multi-Unit or Stack topology 3, offers the most granularity and aligns well with component-based folder structures. It significantly reduces the blast radius and plan/apply times for individual components.
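In practice, a component split like this is expressed through distinct backend keys, one per state file. A minimal sketch, assuming an S3 backend and hypothetical bucket and key names:

```hcl
# prod/us-east-1/network/backend.tf — core networking gets its own state file
terraform {
  backend "s3" {
    bucket = "my-terraform-state"   # hypothetical bucket name
    key    = "prod/us-east-1/network/terraform.tfstate"
    region = "us-east-1"
  }
}

# prod/us-east-1/app/backend.tf — application compute is isolated in a second state file
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "prod/us-east-1/app/terraform.tfstate"
    region = "us-east-1"
  }
}
```

Each directory is initialized and applied independently, so a change to the app component never locks or refreshes the networking state.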

Managing Dependencies Between Split States

Splitting state introduces a new challenge: managing dependencies between these independent units.3 For instance, an application component's configuration may need information (like subnet IDs or database endpoints) from the networking or database state files.

terraform_remote_state Data Source: Terraform provides the terraform_remote_state data source to read output values from other state files stored in a remote backend.20 This allows one configuration to consume outputs from another.

data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "prod/us-east-1/network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  # Use an output from the network state; private_subnet_ids is a list,
  # so select a single subnet for this instance
  subnet_id = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
  #...
}

While functional, relying heavily on terraform_remote_state creates explicit coupling between configurations and can sometimes make dependency chains harder to visualize and manage, especially at scale.
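One way to loosen this coupling is to look up shared resources directly with provider data sources, relying on a tagging convention rather than another configuration's outputs. A sketch, assuming AWS and hypothetical tag names and values:

```hcl
# Find the production VPC and its private subnets by tags
# instead of reading another configuration's state file.
data "aws_vpc" "main" {
  tags = {
    Environment = "prod"   # hypothetical tagging convention
  }
}

data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }
  tags = {
    Tier = "private"
  }
}

resource "aws_instance" "app" {
  subnet_id = data.aws_subnets.private.ids[0]
  #...
}
```

The tradeoff is that the dependency is now implicit in the tagging scheme, which must be enforced consistently across teams.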

The evolution from monolithic to strategically split state is a natural and necessary progression for scaling Terraform/OpenTofu. This shift in state management strategy directly drives changes in repository structure (favoring component-based layouts) and often leads teams towards adopting more advanced orchestration tools designed to handle the resulting inter-state dependencies more elegantly.

7. Advanced Patterns: Composition, Dependencies, and Orchestration (Terragrunt)

As infrastructure complexity grows and state files are split, basic module usage may not be sufficient. Advanced patterns emerge to manage dependencies more effectively, compose larger systems from smaller blocks, and keep configurations DRY across numerous deployments. Terragrunt is a popular tool that facilitates many of these patterns.

Module Composition and Dependency Inversion

Instead of creating large modules that internally provision all their dependencies (e.g., a module that creates its own VPC, subnets, and the application instances within them), a more flexible approach is module composition.24 This involves:

  • Creating smaller, focused modules (e.g., a VPC module, a database module, an application module).
  • Having the calling configuration (the root module) provision instances of these modules and explicitly pass dependencies between them via input variables and outputs. For example, the root module creates the VPC using the VPC module, then passes the resulting subnet IDs as input variables to the application module.

This pattern, related to the software engineering principle of Dependency Inversion, makes modules more reusable and testable, as their dependencies are injected rather than internally managed.24 It allows assembling the same building blocks in different ways to create different systems.
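As a sketch of this composition (module paths, variable names, and outputs here are hypothetical), the root module wires the building blocks together explicitly:

```hcl
# Root module: compose small, focused modules and inject dependencies
module "vpc" {
  source     = "./modules/vpc"   # hypothetical module paths
  cidr_block = "10.0.0.0/16"
}

module "database" {
  source     = "./modules/database"
  subnet_ids = module.vpc.private_subnet_ids   # dependency injected via outputs
}

module "app" {
  source            = "./modules/app"
  subnet_ids        = module.vpc.private_subnet_ids
  database_endpoint = module.database.endpoint
}
```

Because the app module receives subnet IDs and the database endpoint as inputs, it can be reused against any VPC or database, and tested with stub values.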

Data-Only Modules

A specific type of module composition involves creating "data-only" modules.24 These modules contain no resource blocks but consist primarily of data sources designed to fetch information, often performing complex lookups or calculations. This abstracts away the logic of retrieving specific data (e.g., finding the latest approved AMI based on certain tags) into a reusable component.
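A minimal sketch of a data-only module encapsulating an AMI lookup (the module path, tag keys, and variable names are illustrative assumptions):

```hcl
# modules/ami-lookup/main.tf — no resource blocks, only data sources and outputs
variable "environment" {
  type = string
}

data "aws_ami" "approved" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "tag:Approved"   # hypothetical tagging scheme
    values = ["true"]
  }
  filter {
    name   = "tag:Environment"
    values = [var.environment]
  }
}

output "ami_id" {
  value = data.aws_ami.approved.id
}
```

Callers then consume the result as `module.ami_lookup.ami_id`, and the lookup logic can evolve in one place without touching consumers.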

Introduction to Terragrunt

Terragrunt is a thin wrapper around Terraform/OpenTofu designed to address common pain points encountered at scale, particularly with managing multiple modules and split state files.25 It doesn't replace Terraform/OpenTofu's core provisioning logic but acts as an orchestrator. Key reasons for using Terragrunt include:

DRY Configuration: Keeping backend, provider, and other common configurations DRY across many Terraform modules, avoiding repetition in numerous backend.tf or providers.tf files.17

Dependency Management: Providing a structured way to define and manage dependencies between different Terraform modules/state files, automatically fetching outputs from dependencies.26

Orchestration: Enabling commands like terragrunt run-all plan or terragrunt run-all apply to execute Terraform commands across multiple modules in the correct dependency order.

Promoting Reusable Patterns: Facilitating the definition and deployment of complex, multi-module infrastructure patterns consistently across environments or regions.25

Key Terragrunt Concepts & Patterns

Terragrunt uses HCL (like Terraform/OpenTofu) for its configuration files, typically named terragrunt.hcl.

include Block: A core feature for DRY configuration. Allows one terragrunt.hcl file to inherit configurations from another, often using find_in_parent_folders() to locate common configuration files higher up the directory tree.17 This is commonly used to define backend and provider settings once in a root file and include it in all child modules.

# /live/terragrunt.hcl (Root common config)
remote_state {
  backend = "s3"
  config = {
    bucket = "my-company-tfstate"
    key    = "${path_relative_to_include()}/terraform.tfstate"
    region = "us-east-1"
    #... other common backend settings
  }
}

# /live/prod/app/terragrunt.hcl (Leaf module config)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://github.com/my-company/modules.git//app?ref=v1.0.0"
}

inputs = {
  instance_count = 5
  #... other app-specific inputs
}

dependency Block: Defines dependencies on other Terragrunt modules.26 Terragrunt ensures dependencies are applied first and makes their outputs available via the dependency object.

# /live/prod/app/terragrunt.hcl (App depends on VPC)
include "root" {
  path = find_in_parent_folders()
}

dependency "vpc" {
  config_path = "../vpc" # Path to the VPC's terragrunt.hcl directory
  # Terragrunt will run 'output' on the VPC module
  # Mock outputs can be provided for testing/planning [26]
}

terraform {
  source = "git::https://github.com/my-company/modules.git//app?ref=v1.0.0"
}

inputs = {
  vpc_id          = dependency.vpc.outputs.vpc_id
  private_subnets = dependency.vpc.outputs.private_subnet_ids
  #... other inputs
}

inputs Block: Used to pass input variables to the underlying Terraform module. Can merge inputs from multiple sources, including dependency outputs.
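A brief sketch of merging common and module-specific inputs in a terragrunt.hcl (the tag keys and values are hypothetical):

```hcl
# terragrunt.hcl — combine shared settings with module-specific inputs
locals {
  common_tags = {
    Team        = "platform"   # hypothetical values
    Environment = "prod"
  }
}

inputs = merge(
  { tags = local.common_tags },
  {
    instance_count = 5
  }
)
```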

Terragrunt "Stacks": A higher-level concept that defines a collection of related Terragrunt modules (units) to be deployed together as a single entity, often implemented using Terragrunt's configuration features or dedicated files (terragrunt.stack.hcl in newer experimental releases).3 This simplifies replicating entire environments (e.g., stamping out dev, staging, and prod stacks) or onboarding new customers in multi-tenant scenarios.

Tradeoffs of Terragrunt

While powerful, Terragrunt introduces an additional layer of abstraction. Teams need to learn both Terraform/OpenTofu and Terragrunt concepts. Debugging can sometimes be slightly more complex, requiring understanding of how Terragrunt processes its configuration and calls the underlying Terraform/OpenTofu binary. However, for managing IaC at significant scale with split state and complex dependencies, the benefits in terms of DRY configuration and dependency orchestration often outweigh the added learning curve. It directly addresses the challenges that arise when scaling beyond simple Terraform/OpenTofu setups.17

8. In the Trenches: Addressing Developer Pain Points

As Terraform/OpenTofu structures become more complex with modules, multiple environments, split state, and potentially orchestration tools like Terragrunt, developers inevitably encounter challenging issues. Understanding common pain points and effective debugging strategies is crucial for maintaining productivity and stability. Many common errors actually stem from neglecting the foundational best practices discussed earlier.

Debugging Strategies

Navigating complex IaC requires systematic debugging:

Error Message Analysis: Carefully read and analyze the error messages provided by Terraform/OpenTofu or the cloud provider. They often contain specific clues about the resource, attribute, or configuration issue.5 Identify the core problem – syntax error, provider authentication failure, invalid input, dependency cycle, state mismatch, etc.

Leverage Logging: Terraform/OpenTofu provides detailed logging via the TF_LOG environment variable. Setting TF_LOG=DEBUG or TF_LOG=TRACE produces verbose output detailing API calls, state operations, and internal decision-making, which is invaluable for understanding execution flow and pinpointing failures.28 Logs should be captured in CI/CD pipelines for post-mortem analysis.

State Inspection: Directly examine the state file to understand what Terraform/OpenTofu currently manages and compare it to the configuration and reality.

  • terraform state list: Shows all resources tracked in the current state file.20
  • terraform state show <resource_address>: Displays detailed attributes of a specific resource instance in the state.20
  • terraform state pull: Downloads the remote state file locally for offline inspection (use cautiously, don't modify manually unless absolutely necessary).28

terraform plan Analysis: The plan command is a primary debugging tool. Scrutinize the proposed changes to ensure they match intentions.28 Unexpected additions, deletions, or modifications often indicate configuration errors or state drift.

  • Use terraform plan -refresh=false cautiously during debugging to skip the potentially slow state refresh step, but be aware this might mask issues related to drift.23
  • Use terraform plan -target=<resource_address> sparingly to focus the plan on specific resources when isolating a problem. Over-reliance on -target in regular workflows is an anti-pattern, as it ignores dependencies.15

Isolating Issues: Systematically narrow down the problem scope. Comment out recently added resources or modules. Use -target temporarily to test specific parts of the configuration. Apply changes incrementally.

Provider-Specific Debugging: For issues potentially within a provider itself, advanced techniques involve using Terraform CLI development overrides to substitute the official provider binary with a local build, allowing step-through debugging with tools like Delve.28

Common Pitfalls and Avoidance

Many recurring problems can be avoided by adhering to best practices:

Ignoring Modules/Excessive Duplication: Leads to inconsistencies, maintenance burdens, and errors when changes aren't replicated correctly.4 Solution: Embrace modularity early.6

Not Pinning Provider Versions: Unpinned versions (version = ">= 3.0.0") can lead to unexpected behavior or breaking changes when providers update automatically. Solution: Use pessimistic constraints (version = "~> 3.74.0") or exact versions (version = "= 3.74.0") in versions.tf or module requirements to control updates deliberately.5
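A typical versions.tf applying these constraints might look like the following (the specific version numbers are example values):

```hcl
# versions.tf — pin tool and provider versions deliberately
terraform {
  required_version = ">= 1.5.0"   # example minimum tool version

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.74.0"   # pessimistic constraint: allows 3.74.x patch releases only
    }
  }
}
```

Combined with a committed .terraform.lock.hcl file, this makes provider upgrades an explicit, reviewable change rather than a side effect of re-running init.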

Poor Resource Dependencies: Relying solely on implicit dependencies can sometimes lead to race conditions or incorrect ordering. Solution: Use depends_on explicitly when Terraform/OpenTofu cannot automatically infer the correct order, but use it judiciously as it can obscure the dependency graph.12
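An illustrative case where depends_on is warranted because the ordering cannot be inferred from attribute references (resource names and values here are hypothetical):

```hcl
# The instance reads from the bucket at boot via user_data, but nothing in
# its arguments references the bucket, so the dependency must be explicit.
resource "aws_s3_bucket" "config" {
  bucket = "my-app-config-bucket"   # hypothetical bucket name
}

resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0"   # placeholder AMI ID
  instance_type = "t3.micro"
  user_data     = "#!/bin/bash\naws s3 cp s3://my-app-config-bucket/app.conf /etc/app.conf"

  depends_on = [aws_s3_bucket.config]
}
```

If the instance instead referenced `aws_s3_bucket.config.bucket` directly in its arguments, the implicit dependency would suffice and depends_on could be dropped.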

Lack of Environment Isolation: Allowing configurations or state from one environment (e.g., dev) to affect another (e.g., prod) is dangerous. Solution: Use separate state files, distinct backend configurations (often via directory separation), and clear variable scoping.5

Monolithic Files/Inconsistent Structure: Large, disorganized .tf files are hard to read, maintain, and debug.5 Solution: Follow standard file layouts and group resources logically.5

Manual State Manipulation: Commands like terraform state rm should be used with extreme caution, typically only as a last resort during complex refactoring or disaster recovery, as they can easily lead to orphaned resources or incorrect state.20

Improving Performance

Slow plan and apply times are a major developer pain point, especially at scale:

Split State Files: As discussed extensively, this is the most effective way to reduce the scope of refresh and apply operations.3

Conditional Refresh: Use terraform plan/apply -refresh=false in CI/CD pipelines if infrastructure drift is acceptable or managed through other means (e.g., periodic full refreshes or dedicated drift detection tools). This bypasses the potentially lengthy refresh step.23

Targeting (Use Cautiously): While -target speeds up iteration during development or debugging by limiting the scope, it should generally be avoided in automated production deployments as it ignores dependencies and can lead to inconsistent states.15

Optimize Complex Logic: Highly complex for_each loops, locals involving many function calls, or deeply nested module calls can sometimes contribute to plan generation time. Simplify logic where possible.8

Debugging complexity increases proportionally with structural complexity. Mastering Terraform/OpenTofu's internals (state, dependency graph), leveraging logging and state inspection tools, and employing systematic isolation techniques are essential skills. Crucially, adhering to foundational best practices acts as a powerful form of preventative maintenance, avoiding many common errors before they occur.

9. Knowing You're Winning: Measuring IaC Success

How can teams objectively determine if their chosen Terraform/OpenTofu structure and associated practices are effective? Moving beyond subjective feelings ("it feels cleaner") requires quantitative measurement. Applying metrics, particularly those from the DORA (DevOps Research and Assessment) framework, provides valuable insights into the efficiency, stability, and overall performance of IaC workflows, helping to validate structural choices and guide continuous improvement.29

Why Measure IaC?

Measuring IaC practices helps teams:

  • Understand Impact: Quantify whether structural changes (e.g., adopting Terragrunt, splitting state) actually improve delivery speed or stability.
  • Identify Bottlenecks: Pinpoint areas for improvement, such as slow CI/CD stages (plan/apply times) or high failure rates in specific components.
  • Justify Investments: Provide data to support investments in tooling, refactoring efforts, or platform engineering resources.
  • Align with Business Goals: Connect infrastructure delivery performance to broader objectives like time-to-market or system reliability.29

DORA Metrics for Infrastructure as Code

The four key DORA metrics, traditionally applied to application software delivery, are highly relevant to IaC 29:

Deployment Frequency (DF):

  • IaC Context: How often are infrastructure changes successfully deployed to production (or other key target environments)?
  • Interpretation: Higher frequency often indicates smaller, incremental changes, which are generally less risky. It reflects agility and is enabled by efficient pipelines, good modularity, and confidence in the process. Elite teams deploy multiple times per day; lower performers might deploy monthly.29
  • Influence of Structure: Good modularity, state splitting, and automation facilitate smaller, more frequent deployments. Monolithic structures often lead to larger, less frequent, and riskier changes.

Lead Time for Changes (LTTC):

  • IaC Context: How long does it take from committing an IaC change (e.g., to main branch) to that change being successfully deployed in production?
  • Interpretation: Measures the end-to-end efficiency of the delivery pipeline, including code review, testing, and plan/apply execution times. Shorter lead times indicate streamlined processes. Elite teams measure lead time in hours; lower performers may take weeks.29
  • Influence of Structure: Slow plan/apply times due to large state files directly increase lead time.3 Complex review processes caused by unclear structures also contribute. Efficient CI/CD pipelines enabled by good structure reduce lead time.

Change Failure Rate (CFR):

  • IaC Context: What percentage of infrastructure deployments result in a failure requiring remediation (e.g., rollback, hotfix, incident)?
  • Interpretation: Measures the stability and reliability of the deployment process. Lower rates indicate higher quality and less disruption.
  • Influence of Structure: Poor isolation between environments 5, inconsistencies from code duplication 4, or large blast radius from monolithic state 3 can increase CFR. Good testing, modularity, clear state separation, and automated validation help lower CFR.30

Mean Time to Recovery (MTTR):

  • IaC Context: How long does it typically take to restore service after a deployment-induced failure?
  • Interpretation: Measures the resilience of the system and the effectiveness of recovery processes. Faster recovery minimizes user impact. Elite teams recover in minutes or hours; lower performers may take days.29
  • Influence of Structure: IaC inherently supports faster recovery via rollbacks (re-applying a previous configuration version) or roll-forwards (quickly applying a fix). Clear structure, state isolation, and good monitoring facilitate rapid diagnosis and recovery.30

How to Measure

Implementing DORA metrics for IaC involves:

  • Defining Terms: Clearly define what constitutes a "deployment" (e.g., a successful apply to production) and a "failure" (e.g., an incident requiring rollback).
  • Tooling Integration: Leverage data from CI/CD platforms (pipeline logs, execution times), version control systems (commit history, PR merge times), and potentially monitoring/incident management systems.29
  • Automation: Automate data collection and visualization wherever possible to ensure consistency and reduce manual effort.29
  • Baselines: Establish baseline measurements before making significant structural changes to track improvement over time.

Other Indicators of Success

Beyond DORA, other indicators provide valuable context:

  • Plan/Apply Time Trends: Monitor the execution time of key Terraform/OpenTofu jobs. Consistently increasing times may signal state bloat or inefficient code requiring refactoring.3
  • Ease of Onboarding: Subjective but important: how quickly can a new team member understand the IaC structure and confidently make a safe, simple change?4
  • Environment Consistency / Drift: Use drift detection tools or regular checks to measure how often the actual infrastructure state deviates from the code-defined state. Lower drift indicates effective automation and structure.30
  • Code Review Velocity: Are IaC pull requests reviewed and merged efficiently, or do they get stuck due to complexity or lack of clarity?

Applying quantitative metrics like DORA elevates the conversation about IaC structure from subjective preference to objective performance analysis. It provides data-driven validation for architectural choices and highlights areas needing improvement, ultimately connecting infrastructure practices to tangible delivery outcomes.

10. Choosing Your Path: A Decision Framework

Selecting the "right" Terraform/OpenTofu structure is not about finding a single perfect solution, but rather about making informed tradeoffs based on context. The optimal structure balances simplicity, scalability, isolation, and maintainability against the specific needs, constraints, and maturity of the organization, team, and infrastructure.

Recap Key Decision Factors

Several factors heavily influence the most appropriate structural choices:

Team Size & Structure: A small, co-located team can often manage simpler structures or monorepos effectively due to high communication bandwidth. Large, distributed organizations or multiple teams contributing to the same infrastructure often require stricter boundaries, clear ownership, and potentially polyrepos or well-defined components within a monorepo.4 Skill levels also matter; simpler structures might be better initially for less experienced teams.4

Infrastructure Complexity & Scale: Managing a few dozen resources is vastly different from managing thousands across multiple cloud accounts and regions. Scale drives the need for modularity, state splitting, and potentially advanced orchestration.3

Rate of Change: Infrastructure that changes infrequently might tolerate less optimized structures, whereas rapidly evolving systems demand efficiency and safety nets provided by robust modularity and automation.

Existing Tooling & Expertise: Mature CI/CD pipelines, existing monitoring solutions, and team familiarity with Terraform/OpenTofu, Git patterns, and potentially Terragrunt influence feasibility.13 Adopting a structure requiring tooling the organization doesn't have or expertise it lacks is risky.

Organizational Culture: Highly collaborative environments might thrive with monorepos, while siloed organizations might map more naturally (though perhaps not optimally) to polyrepos.13

Compliance & Security Requirements: The need for strict isolation between environments or granular access control over specific infrastructure components (e.g., security infrastructure) might favor directory-based separation or polyrepos.13

Guiding Questions for Self-Assessment

To guide the decision-making process, teams should ask:

  • What are the most significant pain points being experienced today with the current structure (or lack thereof)? Focus on solving real problems, not just theoretical ideals.13
  • Where are the biggest bottlenecks in the IaC workflow (e.g., CI/CD pipeline speed, code review time, deployment failures)?
  • What is the relative importance of strict environment/component isolation versus maximizing code reuse (DRY)?
  • Is the team/organization prepared to invest the necessary effort and resources in tooling and process changes required to support the chosen structure (e.g., optimizing monorepo builds, building polyrepo observability, learning Terragrunt)?
  • What does the ideal, frictionless workflow for proposing, reviewing, and deploying an infrastructure change look like in this context?

Scenario-Based Recommendations (Illustrative)

While every situation is unique, some general starting points can be considered:

Small Startup (Greenfield Project):

  • Recommendation: Start simple. A single monorepo is likely sufficient. Focus on foundational best practices: clear naming, basic file structure, remote state backend from day one. Introduce reusable modules early for repeated patterns (e.g., web server setup). Use environment folders with .tfvars for separation, or workspaces if environments are truly identical. Avoid premature complexity like Terragrunt initially.
  • Rationale: Prioritize speed and simplicity while establishing good habits. The scale doesn't yet warrant complex orchestration or state splitting.

Mid-Size Company (Experiencing Growing Pains):

  • Recommendation: Critically evaluate monorepo vs. polyrepo based on CI/CD performance bottlenecks and team collaboration patterns.13 Invest heavily in creating a library of well-documented, reusable internal modules. Aggressively split monolithic state files by logical component and environment/region.3 If managing DRY configuration (backends, providers) and dependencies across split states becomes a major headache, seriously evaluate adopting Terragrunt.25 Implement automated testing and linting in CI.
  • Rationale: Scale is likely causing friction with simpler structures. Addressing state management and modularity is key. The repo structure choice depends on which scaling challenge (CI vs. dependency management) is more pressing or easier to solve with available resources.

Large Enterprise (Complex, Multi-Team Environment):

  • Recommendation: A hybrid approach is probable. Polyrepos might be used for distinct business units, security domains, or highly autonomous teams requiring strict boundaries.13 Within those boundaries, monorepos might be used for closely related components. Expect a strong need for internal Terraform/OpenTofu module registries, dedicated platform/infrastructure teams providing shared modules and tooling, and robust automation. Terragrunt or custom orchestration solutions are likely necessary for managing complexity, dependencies, and enforcing standards across hundreds or thousands of modules/state files.9 Governance, policy-as-code (e.g., OPA), and drift detection are critical.30
  • Rationale: Extreme scale and organizational complexity demand sophisticated solutions focusing on standardization, governance, clear ownership, and managing dependencies across highly fragmented state and code.

There is no single "correct" answer. The process involves understanding the tradeoffs inherent in each structural pattern, assessing the organization's specific context and constraints, and choosing the approach that best aligns with current needs and future scalability goals. The structure should serve the team and the infrastructure lifecycle, not the other way around.

11. Conclusion: Key Principles for Sustainable IaC Structure

Structuring Terraform and OpenTofu code effectively is not a one-time task but an ongoing architectural discipline crucial for building and maintaining reliable, scalable infrastructure. While specific patterns like monorepos, polyrepos, environment folders, or Terragrunt offer different tradeoffs suited to various contexts, several overarching principles consistently emerge as foundations for success:

  1. Embrace Modularity: Design and utilize focused, reusable modules as the primary building blocks. This promotes DRY principles, enhances maintainability, enables standardization, and facilitates collaboration.6 Avoid monolithic configurations and thin wrappers around single resources.9
  2. Prioritize State Management: Treat the state file with care. Use remote backends with locking from the outset. Isolate state strategically—by environment, region, and component—as complexity grows to minimize blast radius and improve performance.3
  3. Configuration over Code Duplication: Handle variations between environments, regions, or deployments primarily through input variables, .tfvars files, workspace interpolation, or Terragrunt inputs, rather than copying and pasting code blocks.4
  4. Structure Reflects Architecture & Teams: Organize directories and repositories in a way that logically mirrors the application architecture (often component-based) and aligns with team ownership boundaries. This improves clarity and maintainability.7
  5. Automate Everything: Implement robust CI/CD pipelines for linting, validation, testing, planning, and applying infrastructure changes. Automation reduces manual errors, enforces consistency, and improves delivery speed.4
  6. Iterate and Refactor: IaC structure is not immutable. Regularly review and refactor the codebase as the infrastructure evolves, teams change, or new patterns emerge. Use metrics like DORA to measure the impact of changes and guide improvements.29
  7. Invest in Tooling: Leverage the capabilities of Terraform/OpenTofu itself, cloud provider services, CI/CD platforms, and potentially orchestration tools like Terragrunt or platforms like Terraform Cloud/Enterprise, Spacelift, or env0 to manage complexity effectively.2
  8. Consistency is Key: Adhere to consistent naming conventions, formatting standards (use terraform fmt), and structural patterns across the codebase. Consistency reduces cognitive load and makes the infrastructure easier to understand and maintain.8

By adhering to these principles, teams can move beyond ad-hoc infrastructure scripting towards building robust, scalable, and maintainable Infrastructure as Code systems that truly deliver on the promise of automation and reliability, regardless of the specific tools or patterns chosen. The journey requires thoughtful planning, continuous refinement, and a commitment to treating infrastructure code with the same rigor as application code.

Works cited

  1. spacelift.io, accessed May 12, 2025, https://spacelift.io/blog/opentofu-vs-terraform
  2. OpenTofu vs. Terraform | Pulumi Docs, accessed May 12, 2025, https://www.pulumi.com/docs/iac/concepts/vs/terraform/opentofu/
  3. How to Avoid Large OpenTofu/Terraform State Files - Gruntwork, accessed May 12, 2025, https://www.gruntwork.io/blog/how-to-manage-large-opentofu-terraform-state-files
  4. Working with a client who created the TF repo like this for our project ..., accessed May 12, 2025, https://www.reddit.com/r/Terraform/comments/1kilvx1/working_with_a_client_who_created_the_tf_repo/
  5. 10 Common Terraform Errors & Best Practices to Avoid Them - ControlMonkey, accessed May 12, 2025, https://controlmonkey.io/resource/terraform-errors-guide/
  6. Terraform Modules Guide: Best Practices & Examples - Env0, accessed May 12, 2025, https://www.env0.com/blog/terraform-modules
  7. How do you structure Terraform / OpenTofu Codebases? - Inuits, accessed May 12, 2025, https://inuits.eu/blog/structuring-terraform-codebases/
  8. Best practices for general style and structure | Terraform - Google Cloud, accessed May 12, 2025, https://cloud.google.com/docs/terraform/best-practices/general-style-structure
  9. Best practices for code base structure and organization - AWS ..., accessed May 12, 2025, https://docs.aws.amazon.com/prescriptive-guidance/latest/terraform-aws-provider-best-practices/structure.html
  10. Terraform Files and Folder Structure | Organizing Infrastructure-as-Code - Env0, accessed May 12, 2025, https://www.env0.com/blog/terraform-files-and-folder-structure-organizing-infrastructure-as-code
  11. Terraform Architecture Overview – Structure and Workflow - Spacelift, accessed May 12, 2025, https://spacelift.io/blog/terraform-architecture
  12. OpenTofu Tutorial – Getting Started, How to Install & Examples - Spacelift, accessed May 12, 2025, https://spacelift.io/blog/opentofu-tutorial
  13. Monorepo vs. Polyrepo: How to Choose Between Them | Buildkite, accessed May 12, 2025, https://buildkite.com/resources/blog/monorepo-polyrepo-choosing/
  14. Are you using mono or poly repos for your infra? : r/Terraform - Reddit, accessed May 12, 2025, https://www.reddit.com/r/Terraform/comments/1bmmnk6/are_you_using_mono_or_poly_repos_for_your_infra/
  15. 0xDones/terraform-monorepo-example - GitHub, accessed May 12, 2025, https://github.com/0xDones/terraform-monorepo-example
  16. How to structure Terraform with multi-env + multi-regions for TBD in monorepo - Reddit, accessed May 12, 2025, https://www.reddit.com/r/Terraform/comments/112avtz/how_to_structure_terraform_with_multienv/
  17. Includes - Terragrunt - Gruntwork, accessed May 12, 2025, https://terragrunt.gruntwork.io/docs/features/includes/
  18. Terragrunt Module Support · Issue #2913 - GitHub, accessed May 12, 2025, https://github.com/gruntwork-io/terragrunt/issues/2913
  19. How to Manage Multiple Terraform Environments Efficiently - Spacelift, accessed May 12, 2025, https://spacelift.io/blog/terraform-environments
  20. Managing Terraform State - Best Practices & Examples - Spacelift, accessed May 12, 2025, https://spacelift.io/blog/terraform-state
  21. Best Practices for Multi-Region Terraform Deployments with AWS CodePipeline, accessed May 12, 2025, https://www.bdccglobal.com/blog/terraform-multi-region-deployment-aws-codepipeline/
  22. Using Terraform to manage multiple AWS regions - Stack Overflow, accessed May 12, 2025, https://stackoverflow.com/questions/48632797/using-terraform-to-manage-multiple-aws-regions
  23. Performance Optimization in OpenTofu: Best Practices - Improwised Technologies, accessed May 12, 2025, https://www.improwised.com/blog/open-tofu-best-practices/
  24. Module Composition | Terraform - HashiCorp Developer, accessed May 12, 2025, https://developer.hashicorp.com/terraform/language/modules/develop/composition
  25. Terminology - Terragrunt - Gruntwork, accessed May 12, 2025, https://terragrunt.gruntwork.io/docs/getting-started/terminology/
  26. Configuration Blocks and Attributes - Terragrunt, accessed May 12, 2025, https://terragrunt.gruntwork.io/docs/reference/config-blocks-and-attributes/
  27. The Road to 1.0: Terragrunt Stacks - Gruntwork, accessed May 12, 2025, https://www.gruntwork.io/blog/the-road-to-terragrunt-1-0-stacks
  28. Complete Terraform Debug Guide for 2024 - Zeet.co, accessed May 12, 2025, https://zeet.co/blog/terraform-debug
  29. DevOps Metrics 2025: The Complete Guide to Successfully Measuring Dev Operations, accessed May 12, 2025, https://checkmarx.com/learn/appsec/devops-metrics-2025-the-complete-guide-to-successfully-measuring-dev-operations/
  30. DORA Metrics: An Infrastructure as Code Perspective | env0, accessed May 12, 2025, https://www.env0.com/blog/dora-metrics-an-infrastructure-as-code-perspective