As organizations increasingly adopt Infrastructure as Code (IaC) tools such as Terraform and OpenTofu, the need for policy enforcement mechanisms has grown in tandem. This is where "Policy as Code" comes into play, with Open Policy Agent (OPA) emerging as a powerful tool in this domain, especially when integrated with Terraform. In this blog post, we'll explore the synergy between OPA and Terraform: what Policy as Code means, how OPA works, and how it is applied in Terraform workflows such as Scalr’s pipeline.
Policy as Code is an approach to defining and managing policies using the same practices and tools used for writing and maintaining software code. This method allows organizations to automate policy enforcement, making it an integral part of the development and deployment process.
Policy as code, a best practice in the DevOps world, reduces the human error that can leave organizations vulnerable: automating policy enforcement minimizes the risk of manual mistakes in applying security rules. It also results in faster development and a better developer experience. Developers prefer to catch issues as early as possible, and with policies integrated into the development workflow, security checks become part of the process rather than a source of deployment delays. Lastly, from a maintainer's perspective, policies can be quickly updated and rolled out to adapt to new compliance requirements or security threats using standard GitOps practices. Because a policy is code, it should be stored in a VCS repository where it is versioned, tested, and audited.
Open Policy Agent has become the de facto standard for policy as code across the industry. HashiCorp created a proprietary product named Sentinel for use with Terraform Cloud, while Styra created OPA as an open-source alternative, and OPA quickly became the leading solution. OPA was accepted by the CNCF in March 2018. It has seen enough adoption that even HashiCorp now supports it in Terraform Cloud pipelines.
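At its core, OPA operates on three main components: policies, written in Rego, OPA's declarative policy language; data, the JSON documents that policies are evaluated against; and queries, the questions asked of the policy engine.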
When a query is made to OPA, it evaluates the relevant policies against the provided data and returns a decision. This decision can be a simple yes/no or more complex structured data.
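To make the two kinds of decision concrete, here is a minimal sketch of a policy that produces both. The package name, the `approved_regions` set, and the `input.region` field are hypothetical and only serve the illustration. It uses the same pre-1.0 Rego rule syntax as the other policies in this post, so on OPA 1.0 or later it would need the `--v0-compatible` flag or the newer `if`/`contains` keywords.

```rego
package example

# Hypothetical data the policy reasons about
approved_regions = {"us-east-1", "eu-west-1"}

# Simple yes/no decision: false unless the rule body below succeeds
default allow = false

allow {
    approved_regions[input.region]
}

# Structured decision: a set of messages explaining what is wrong
violations[msg] {
    not approved_regions[input.region]
    msg := sprintf("region %q is not approved", [input.region])
}
```

Querying `data.example.allow` returns a boolean, while querying `data.example.violations` returns the set of messages, which is the shape the `deny` rules later in this post rely on.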
Terraform allows developers to define and provision infrastructure resources across various cloud providers. While Terraform itself focuses on resource management, it doesn't inherently provide robust policy enforcement capabilities. This is where OPA comes in, complementing Terraform by adding a layer of policy control to the infrastructure provisioning process.
The main use case for OPA with Terraform is to evaluate the Terraform plan and check for anything that might be non-compliant. When a Terraform plan runs, it generates a plan file, and OPA evaluates the JSON representation of that plan. An OPA policy can be written for any data that is present, or absent, in the plan. By evaluating OPA policy against the plan, developers catch issues before executing a Terraform apply, which ensures non-compliant resources are never created. The standard workflow is to run a Terraform plan, evaluate the OPA policies against the resulting plan data, and run the Terraform apply only if the checks pass.
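The plan-then-check workflow can also be tried by hand, outside any pipeline. The sketch below assumes plain Terraform plan JSON as the input, so `resource_changes` sits at the top level of `input` rather than under a `tfplan` key. The file names in the comments and the deletion limit are hypothetical.

```rego
package terraform

# Producing and checking the plan by hand:
#   terraform plan -out=tfplan.binary
#   terraform show -json tfplan.binary > tfplan.json
#   opa eval --input tfplan.json --data policy.rego "data.terraform.deny"

# Hypothetical limit on how many resources one plan may delete
max_deletions = 3

# Addresses of all resources whose planned actions include a delete
# (a replacement, planned as delete plus create, counts as well)
deleted[address] {
    rc := input.resource_changes[_]
    rc.change.actions[_] == "delete"
    address := rc.address
}

deny[reason] {
    count(deleted) > max_deletions
    reason := sprintf(
        "plan deletes %d resources, more than the %d allowed",
        [count(deleted), max_deletions]
    )
}
```

An empty `deny` set means the plan passed and the apply can proceed. Any entry in the set is a reason to stop before applying.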
When Scalr generates the data for the OPA policy to ingest, it breaks it out into two sections: tfrun and tfplan.
The tfrun section contains all of the information about the actual run, such as where the run came from, the VCS pull request behind it, and its cost estimate.
By adding all of the extra tfrun context to the input file, users are able to create more advanced logic to determine what to check during the run. Users also have the option to include the policy as a pre-plan check as the tfrun data is available before a Terraform plan is executed.
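The tfplan section contains all of the data that was generated from the Terraform code by running a Terraform plan, for example the planned resource changes (their types, actions, providers, and resulting attribute values) and the module calls in the configuration.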
The tfrun section is more focused on how the run is happening, whereas the tfplan is more focused on what is being done in the Terraform code.
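A single policy can draw on both sections at once. The following sketch lets how the run happens decide which resource changes are acceptable. It assumes that `tfrun.vcs` is null when a run was not triggered from VCS, and the rule itself (databases may only be created by VCS-driven runs) is a hypothetical example, not a Scalr default.

```rego
package terraform

import input.tfrun as tfrun
import input.tfplan as tfplan

deny[reason] {
    # tfrun: how the run is happening (assumed null for non-VCS runs)
    is_null(tfrun.vcs)

    # tfplan: what the Terraform code is about to do
    resource := tfplan.resource_changes[_]
    resource.type == "aws_db_instance"
    resource.change.actions[_] == "create"

    reason := sprintf(
        "%s may only be created by a VCS-driven run",
        [resource.address]
    )
}
```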
OPA policies can be executed in Scalr's pre-plan stage. At this stage, only the tfrun data is available for an OPA policy to check. For example, an organization might decide that runs in a specific environment cannot be CLI-based. Because they want to catch this as early as possible to improve development speed, they would implement it as a pre-plan check.
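A pre-plan check for the CLI example might look roughly like the sketch below. The field names `tfrun.source` and `tfrun.environment.name`, the value `"cli"`, and the environment name are assumptions made for illustration, so check them against the actual input Scalr generates before relying on them.

```rego
package terraform

import input.tfrun as tfrun

# Hypothetical set of environments where CLI-driven runs are banned
restricted_environments = {"production"}

deny[reason] {
    # ASSUMPTION: the environment name is exposed as tfrun.environment.name
    restricted_environments[tfrun.environment.name]

    # ASSUMPTION: the run source is exposed as tfrun.source, "cli" for CLI runs
    tfrun.source == "cli"

    reason := sprintf(
        "CLI-driven runs are not allowed in environment %q",
        [tfrun.environment.name]
    )
}
```

Because the rule touches nothing under tfplan, it can be decided before any plan is computed.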
Post-plan checks are used to evaluate the data generated from a Terraform plan. The plan will generate data about the resources being created, changed, and deleted as part of the proposed plan. Post-plan checks help with security standards regarding the creation and deployment of provider resources. For example, they can be used to prevent users from deploying public S3 buckets or restrict which Terraform providers can be used.
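As a sketch of the public S3 bucket case, the policy below rejects bucket resources whose planned `acl` is one of the public canned ACLs. It assumes the ACL is set through the `acl` attribute of `aws_s3_bucket` or `aws_s3_bucket_acl`. Buckets made public by other means, such as a bucket policy, are outside what this simple rule can see.

```rego
package terraform

import input.tfplan as tfplan

# Canned ACLs that expose a bucket publicly
public_acls = {"public-read", "public-read-write"}

# Resource types that can carry an acl attribute (assumption, see above)
bucket_types = {"aws_s3_bucket", "aws_s3_bucket_acl"}

deny[reason] {
    resource := tfplan.resource_changes[_]
    bucket_types[resource.type]

    # change.after is null for deletions, so destroys never match
    acl := resource.change.after.acl
    public_acls[acl]

    reason := sprintf(
        "%s: public ACL %q is not allowed",
        [resource.address, acl]
    )
}
```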
Scalr is the only product in the market that offers an impact analysis for OPA policies. An impact analysis is the equivalent of running a Terraform plan, but for OPA. When a pull request is opened against an OPA policy that is currently active in Scalr, Scalr checks the OPA code in the pull request against all existing workspaces and returns the results to the OPA maintainer. This lets the maintainer know what would happen if they were to merge the new policy code: whether a typo breaks the code, whether some workspaces would fail because of the policy changes, or whether everything works correctly and there is nothing to worry about. This helps greatly with operational excellence when working with OPA policy at scale.
In Scalr, the OPA maintainers have the option to set three different enforcement levels: advisory, soft-mandatory, and hard-mandatory.
The enforcement level is set outside of the Rego policy, in a file named scalr-policy.hcl. See the Scalr documentation for details.
Below are a few examples of OPA policies:
Pull Request Evaluation: Deny a run if the pull request was merged by the same person who authored it. This shows the power of having the tfrun section available to evaluate the source of the run. Only the tfrun section needs to be imported in this case:
Rego file:
package terraform

import input.tfrun as tfrun

deny["Merged by and PR author are the same person"] {
    not is_null(tfrun.vcs)
    pr := tfrun.vcs.pull_request
    not is_null(pr)
    pr.merged_by == pr.author
}
Limit Module Usage: If a specific resource is being created, this policy will enforce that it must be created based on specific Terraform modules. This is helpful if a private module registry is being used and you want to ensure developers are using modules from the registry. Only the tfplan section needs to be imported in this case:
Rego file:
package terraform

import input.tfplan as tfplan

# Map of resource types which must be created only using module
# with corresponding module source
resource_modules = {
    "aws_db_instance": "terraform-aws-modules/rds/aws"
}

array_contains(arr, elem) {
    arr[_] = elem
}

deny[reason] {
    resource := tfplan.resource_changes[_]
    action := resource.change.actions[count(resource.change.actions) - 1]
    array_contains(["create", "update"], action)
    module_source = resource_modules[resource.type]
    not resource.module_address
    reason := sprintf(
        "%s cannot be created directly. Module '%s' must be used instead",
        [resource.address, module_source]
    )
}

deny[reason] {
    resource := tfplan.resource_changes[_]
    action := resource.change.actions[count(resource.change.actions) - 1]
    array_contains(["create", "update"], action)
    module_source = resource_modules[resource.type]
    parts = split(resource.module_address, ".")
    module_name := parts[1]
    actual_source := tfplan.configuration.root_module.module_calls[module_name].source
    not actual_source == module_source
    reason := sprintf(
        "%s must be created with '%s' module, but '%s' is used",
        [resource.address, module_source, actual_source]
    )
}
Limit Provider Usage: Some providers might be banned in your organization, and you want to prevent developers from using them. This policy checks the providers in the Terraform plan file against a list of blacklisted providers:
Rego file:
package terraform

import input.tfplan as tfplan

# Blacklisted Terraform providers
not_allowed_provider = [
    "null"
]

array_contains(arr, elem) {
    arr[_] = elem
}

get_basename(path) = basename {
    arr := split(path, "/")
    basename := arr[count(arr) - 1]
}

deny[reason] {
    resource := tfplan.resource_changes[_]
    action := resource.change.actions[count(resource.change.actions) - 1]
    array_contains(["create", "update"], action) # allow destroy action
    # registry.terraform.io/hashicorp/aws -> aws
    provider_name := get_basename(resource.provider_name)
    array_contains(not_allowed_provider, provider_name)
    reason := sprintf(
        "%s: provider type %q is not allowed",
        [resource.address, provider_name]
    )
}
Limit Cost: Scalr integrates with Infracost to generate an estimated cost when deploying resources. The cost is generated based on information in the Terraform plan file and then injected into the tfrun section of the input file:
Rego file:
package terraform

import input.tfrun as tfrun

deny[reason] {
    cost = tfrun.cost_estimate.proposed_monthly_cost
    cost > 5
    reason := sprintf("Plan is too expensive: $%.2f, while up to $5 is allowed", [cost])
}
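Since policies are code, they can be unit tested like code. The sketch below repeats the cost rule so that the file runs on its own, then adds two tests that feed it mock input through Rego's `with` keyword. The cost figures are made up for illustration.

```rego
package terraform

import input.tfrun as tfrun

# The cost rule from above, repeated so this file is self-contained
deny[reason] {
    cost = tfrun.cost_estimate.proposed_monthly_cost
    cost > 5
    reason := sprintf("Plan is too expensive: $%.2f, while up to $5 is allowed", [cost])
}

# Rules whose names start with test_ are picked up by `opa test`
test_cheap_plan_is_allowed {
    count(deny) == 0 with input as {
        "tfrun": {"cost_estimate": {"proposed_monthly_cost": 4.5}}
    }
}

test_expensive_plan_is_denied {
    count(deny) == 1 with input as {
        "tfrun": {"cost_estimate": {"proposed_monthly_cost": 12.5}}
    }
}
```

Running `opa test .` in the directory that holds this file reports each test rule as passing or failing, which makes it straightforward to run policy tests in CI before a change is merged.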
Limit Instance Type: This policy is a little more advanced as it shows how you can make decisions across multiple providers. In this case, OPA will check for instance types across AWS, Azure, and GCP:
Rego file:
package terraform

import input.tfplan as tfplan

# Allowed sizes by provider
allowed_types = {
    "aws": ["t2.nano", "t2.micro"],
    "azurerm": ["Standard_A0", "Standard_A1"],
    "google": ["n1-standard-1", "n1-standard-2"]
}

# Attribute name for instance type/size by provider
instance_type_key = {
    "aws": "instance_type",
    "azurerm": "vm_size",
    "google": "machine_type"
}

array_contains(arr, elem) {
    arr[_] = elem
}

get_basename(path) = basename {
    arr := split(path, "/")
    basename := arr[count(arr) - 1]
}

# Extracts the instance type/size
get_instance_type(resource) = instance_type {
    # registry.terraform.io/hashicorp/aws -> aws
    provider_name := get_basename(resource.provider_name)
    instance_type := resource.change.after[instance_type_key[provider_name]]
}

deny[reason] {
    resource := tfplan.resource_changes[_]
    instance_type := get_instance_type(resource)
    # registry.terraform.io/hashicorp/aws -> aws
    provider_name := get_basename(resource.provider_name)
    not array_contains(allowed_types[provider_name], instance_type)
    reason := sprintf(
        "%s: instance type %q is not allowed",
        [resource.address, instance_type]
    )
}
Scalr maintains a repository of example OPA policies that can be used by the OpenTofu or Terraform community.
The integration of Open Policy Agent with Terraform represents a powerful approach to implementing Policy as Code in infrastructure management. By leveraging OPA's flexible policy engine alongside Terraform, organizations can achieve a higher level of security, compliance, and governance when executing their Terraform configuration files.
By adopting OPA policy, teams can shift left on security and compliance, catching potential issues early in the development process. This not only reduces the risk of non-compliant infrastructure being deployed but also speeds up the development cycle by providing immediate feedback to developers to update their Terraform code.