TF continually wants to update nomad_job resource between plan/apply #290

Closed
kian opened this issue Sep 23, 2022 · 7 comments · Fixed by #356

Comments

kian commented Sep 23, 2022

Terraform Version

Terraform v1.0.0
Nomad provider version 1.4.17

Nomad Version

1.1.0

Provider Configuration

nomad = {
  source = "hashicorp/nomad"
  version = "1.4.17"
}

provider "nomad" {
  address = "https://nomad.service./${var.datacenter}.internal:4646"
  region  = "global"
}

Environment Variables

None.

Affected Resource(s)

  • nomad_job

Terraform Configuration Files

Terraform:

variable "datacenter" {
  type        = string
  description = "Datacenter ID to deploy the job to"
}

resource "nomad_job" "cassandra_stage" {
  jobspec = templatefile("${path.module}/cassandra-stage.hcl",
            { datacenter = var.datacenter })
}

Job spec template:

job "cassandra_stage" {
  datacenters = ["${datacenter}"]
  type = "service"
  update {
    max_parallel = 1
    stagger      = "1m"
  }
  group "cassandra-1.50" {
    constraint {
      attribute = "$${attr.unique.network.ip-address}"
      value     = "10.10.1.50"
    }
    restart {
      attempts = 10
      delay    = "30s"
      interval = "30m"
      mode     = "delay"
    }
    task "cassandra-1-50" {
      driver = "docker"
      config {
        image = "ecr-proxy.service.xxx.internal/internal/cassandra:master-20220105-39f4ce4a0"
        port_map = {
          rpc    = 9160
          gossip = 7000
        }
        network_mode = "host"
        logging {
          type = "gelf"
          config {
            gelf-address = "udp://$${node.unique.name}:12201"
            tag = "cassandra"
          }
        }
        volumes = [
          "/mnt/ebs/cassandra/:/srv/var/"
        ]
      }
      service {
        name = "cassandra"
        port = "rpc"
      }
      env {
        CLUSTER_NAME                       = "${datacenter}"
        SEEDS                              = "10.10.2.50,10.10.2.51,10.10.3.51,10.10.3.52"
        LISTEN_ADDRESS                     = "$${NOMAD_IP_rpc}"
        HEAP                               = "8G"
        KEY_CACHE_MB                       = "1024"
        COMPACTION_THROUGHPUT_MB           = "8"
        STREAM_THROUGHPUT_MEGABITS_PER_SEC = "400"
        ENABLE_HINTED_HANDOFF              = "true"
      }
      kill_timeout = "300s"
      resources {
        cpu    = 10000
        memory = 18432
        network {
          mbits = 100
          port "rpc" {
            static = 9160
          }
          port "gossip" {
            static = 7000
          }
        }
      }
    }
  }
  group "cassandra-2.50" {
    constraint {
      attribute = "$${attr.unique.network.ip-address}"
      value     = "10.10.2.50"
    }
    restart {
      attempts = 10
      delay    = "30s"
      interval = "30m"
      mode     = "delay"
    }
    task "cassandra-2-50" {
      driver = "docker"
      config {
        image = "ecr-proxy.service.xxx.internal/internal/cassandra:master-20220105-39f4ce4a0"
        port_map = {
          rpc    = 9160
          gossip = 7000
        }
        network_mode = "host"
        logging {
          type = "gelf"
          config {
            gelf-address = "udp://$${node.unique.name}:12201"
            tag = "cassandra"
          }
        }
        volumes = [
          "/mnt/ebs/cassandra/:/srv/var/"
        ]
      }
      service {
        name = "cassandra"
        port = "rpc"
      }
      env {
        CLUSTER_NAME                       = "${datacenter}"
        SEEDS                              = "10.10.1.50,10.10.2.51,10.10.3.51,10.10.3.52"
        LISTEN_ADDRESS                     = "$${NOMAD_IP_rpc}"
        HEAP                               = "8G"
        KEY_CACHE_MB                       = "1024"
        COMPACTION_THROUGHPUT_MB           = "8"
        STREAM_THROUGHPUT_MEGABITS_PER_SEC = "400"
        ENABLE_HINTED_HANDOFF              = "true"
      }
      kill_timeout = "300s"
      resources {
        cpu    = 10000
        memory = 18432
        network {
          mbits = 100
          port "rpc" {
            static = 9160
          }
          port "gossip" {
            static = 7000
          }
        }
      }
    }
  }
  group "cassandra-1.51" {
    constraint {
      attribute = "$${attr.unique.network.ip-address}"
      value     = "10.10.1.51"
    }
    restart {
      attempts = 10
      delay    = "30s"
      interval = "30m"
      mode     = "delay"
    }
    task "cassandra-1-51" {
      driver = "docker"
      config {
        image = "ecr-proxy.service.xxx.internal/internal/cassandra:master-20220105-39f4ce4a0"
        port_map = {
          rpc    = 9160
          gossip = 7000
        }
        network_mode = "host"
        logging {
          type = "gelf"
          config {
            gelf-address = "udp://$${node.unique.name}:12201"
            tag = "cassandra"
          }
        }
        volumes = [
          "/mnt/ebs/cassandra/:/srv/var/"
        ]
      }
      service {
        name = "cassandra"
        port = "rpc"
      }
      env {
        CLUSTER_NAME                       = "${datacenter}"
        SEEDS                              = "10.10.1.50,10.10.2.50,10.10.2.51"
        LISTEN_ADDRESS                     = "$${NOMAD_IP_rpc}"
        HEAP                               = "8G"
        KEY_CACHE_MB                       = "1024"
        COMPACTION_THROUGHPUT_MB           = "8"
        STREAM_THROUGHPUT_MEGABITS_PER_SEC = "400"
        ENABLE_HINTED_HANDOFF              = "true"
      }
      kill_timeout = "300s"
      resources {
        cpu    = 10000
        memory = 18432
        network {
          mbits = 100
          port "rpc" {
            static = 9160
          }
          port "gossip" {
            static = 7000
          }
        }
      }
    }
  }
  group "cassandra-2.51" {
    constraint {
      attribute = "$${attr.unique.network.ip-address}"
      value     = "10.10.2.51"
    }
    restart {
      attempts = 10
      delay    = "30s"
      interval = "30m"
      mode     = "delay"
    }
    task "cassandra-2-51" {
      driver = "docker"
      config {
        image = "ecr-proxy.service.xxx.internal/internal/cassandra:master-20220105-39f4ce4a0"
        port_map = {
          rpc    = 9160
          gossip = 7000
        }
        network_mode = "host"
        logging {
          type = "gelf"
          config {
            gelf-address = "udp://$${node.unique.name}:12201"
            tag = "cassandra"
          }
        }
        volumes = [
          "/mnt/ebs/cassandra/:/srv/var/"
        ]
      }
      service {
        name = "cassandra"
        port = "rpc"
      }
      env {
        CLUSTER_NAME                       = "${datacenter}"
        SEEDS                              = "10.10.1.50,10.10.2.50,10.10.3.51,10.10.3.52"
        LISTEN_ADDRESS                     = "$${NOMAD_IP_rpc}"
        HEAP                               = "8G"
        KEY_CACHE_MB                       = "1024"
        COMPACTION_THROUGHPUT_MB           = "8"
        STREAM_THROUGHPUT_MEGABITS_PER_SEC = "400"
        ENABLE_HINTED_HANDOFF              = "true"
      }
      kill_timeout = "300s"
      resources {
        cpu    = 10000
        memory = 18432
        network {
          mbits = 100
          port "rpc" {
            static = 9160
          }
          port "gossip" {
            static = 7000
          }
        }
      }
    }
  }
  group "cassandra-3.51" {
    constraint {
      attribute = "$${attr.unique.network.ip-address}"
      value     = "10.10.3.51"
    }
    restart {
      attempts = 10
      delay    = "30s"
      interval = "30m"
      mode     = "delay"
    }
    task "cassandra-3-51" {
      driver = "docker"
      config {
        image = "ecr-proxy.service.xxx.internal/internal/cassandra:master-20220105-39f4ce4a0"
        port_map = {
          rpc    = 9160
          gossip = 7000
        }
        network_mode = "host"
        logging {
          type = "gelf"
          config {
            gelf-address = "udp://$${node.unique.name}:12201"
            tag = "cassandra"
          }
        }
        volumes = [
          "/mnt/ebs/cassandra/:/srv/var/"
        ]
      }
      service {
        name = "cassandra"
        port = "rpc"
      }
      env {
        CLUSTER_NAME                       = "${datacenter}"
        SEEDS                              = "10.10.1.50,10.10.2.50,10.10.2.51,10.10.3.52"
        LISTEN_ADDRESS                     = "$${NOMAD_IP_rpc}"
        HEAP                               = "8G"
        KEY_CACHE_MB                       = "1024"
        COMPACTION_THROUGHPUT_MB           = "8"
        STREAM_THROUGHPUT_MEGABITS_PER_SEC = "400"
        ENABLE_HINTED_HANDOFF              = "true"
      }
      kill_timeout = "300s"
      resources {
        cpu    = 10000
        memory = 18432
        network {
          mbits = 100
          port "rpc" {
            static = 9160
          }
          port "gossip" {
            static = 7000
          }
        }
      }
    }
  }
  group "cassandra-3.52" {
    constraint {
      attribute = "$${attr.unique.network.ip-address}"
      value     = "10.10.3.52"
    }
    restart {
      attempts = 10
      delay    = "30s"
      interval = "30m"
      mode     = "delay"
    }
    task "cassandra-3-52" {
      driver = "docker"
      config {
        image = "ecr-proxy.service.xxx.internal/internal/cassandra:master-20220105-39f4ce4a0"
        port_map = {
          rpc    = 9160
          gossip = 7000
        }
        network_mode = "host"
        logging {
          type = "gelf"
          config {
            gelf-address = "udp://$${node.unique.name}:12201"
            tag = "cassandra"
          }
        }
        volumes = [
          "/mnt/ebs/cassandra/:/srv/var/"
        ]
      }
      service {
        name = "cassandra"
        port = "rpc"
      }
      env {
        CLUSTER_NAME                       = "${datacenter}"
        SEEDS                              = "10.10.1.50,10.10.2.50,10.10.2.51,10.10.3.51"
        LISTEN_ADDRESS                     = "$${NOMAD_IP_rpc}"
        HEAP                               = "8G"
        KEY_CACHE_MB                       = "1024"
        COMPACTION_THROUGHPUT_MB           = "8"
        STREAM_THROUGHPUT_MEGABITS_PER_SEC = "400"
        ENABLE_HINTED_HANDOFF              = "true"
      }
      kill_timeout = "300s"
      resources {
        cpu    = 10000
        memory = 18432
        network {
          mbits = 100
          port "rpc" {
            static = 9160
          }
          port "gossip" {
            static = 7000
          }
        }
      }
    }
  }
}

Expected Behavior

terraform plan and terraform apply do not continually report a difference in nomad_job attributes such as allocation_ids and region.

Actual Behavior

Running terraform plan followed by terraform apply constantly shows a change in the nomad_job resource.

First plan, followed by apply:

# module.cassandra_stage.nomad_job.cassandra_stage will be updated in-place
  ~ resource "nomad_job" "cassandra_stage" {
      ~ allocation_ids          = [
          - "d7ead8f3-6668-316b-2593-d8e3c02a725d",
          - "01820573-0c0d-0321-9a32-bd8966f6e366",
          - "874ec969-80c1-a344-1f3c-0d174ed9ec02",
          - "0c7e412d-cc7d-832b-e1d9-d4f345104d37",
          - "02d16058-65d5-3a68-502d-051a58c4b6e9",
          - "ad8a8744-f776-3636-bd4c-31ee82db23b4",
        ] -> (known after apply)
        id                      = "cassandra_stage"
      ~ modify_index            = "5289351" -> (known after apply)
        name                    = "cassandra_stage"
      ~ region                  = "global" -> (known after apply)
        # (8 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

After applying, second plan:

# module.cassandra_stage.nomad_job.cassandra_stage will be updated in-place
  ~ resource "nomad_job" "cassandra_stage" {
      ~ allocation_ids          = [
          - "d7ead8f3-6668-316b-2593-d8e3c02a725d",
          - "01820573-0c0d-0321-9a32-bd8966f6e366",
          - "874ec969-80c1-a344-1f3c-0d174ed9ec02",
          - "0c7e412d-cc7d-832b-e1d9-d4f345104d37",
          - "02d16058-65d5-3a68-502d-051a58c4b6e9",
          - "ad8a8744-f776-3636-bd4c-31ee82db23b4",
        ] -> (known after apply)
        id                      = "cassandra_stage"
      ~ modify_index            = "5289351" -> (known after apply)
        name                    = "cassandra_stage"
      ~ region                  = "global" -> (known after apply)
        # (8 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Steps to Reproduce

  1. terraform apply && terraform apply

lgfa29 commented Dec 1, 2022

Hi @kian 👋

I have not been able to reproduce this issue. Could you test with a smaller job and see if the problem still happens?

goatmale commented Dec 22, 2022

I'm having a similar issue, but with periodic jobs. Some thoughts I had on the possible cause:

  1. I'm not sure if it's because we restart allocations quite frequently, or because the state keeps track of allocation IDs, but maybe this is part of the issue.
  2. We are also using HCL2; maybe that is related to the issue?

Here is an example job:

job "uat.util.some.job.name" {
  region      = "someregion"
  datacenters = ["SOMEDC"]
  type        = "batch"
  meta {
    description = "Some job."
  }


  constraint {
    attribute = "${attr.unique.hostname}"
    value     = "somehost"
  }


  periodic {
    cron             = " 0 15 7 1 JAN ? 2099"
    prohibit_overlap = true
    time_zone        = "America/Chicago"
  }

  group "uat.util.some.job.name.group" {
    network {
    }
    restart {
      attempts = "10"
      interval = "15m"
      delay    = "30s"
      mode     = "delay"
    }

    task "uat.util.some.job.name.task" {


      artifact {
        source      = "git:somerepohere"
        destination = "local/repo"
        options {
          ref    = "someref"
          sshkey = "somekey"
        }
      }

      driver = "raw_exec"


      config {
        command = "/bin/bash"
        args    = ["-c", "bash local/somescript.sh"]
      }
    }
  }
}

Here is the resulting plan:

  # nomad_job.uat_util_some_job_name will be updated in-place
  ~ resource "nomad_job" "uat_util_some_job_name" {
      ~ allocation_ids          = [] -> (known after apply)
        id                      = "uat.util.some.job.name"
      ~ modify_index            = "181804" -> (known after apply)
        name                    = "uat.util.some.job.name"

We are using Nomad 1.4.1
Terraform v1.3.5
terraform-provider-nomad_v1.4.19_x5

arcenik commented Jan 5, 2023

Hi,

I have the same issue. I think allocation_ids should be ignored, as it is managed by Nomad.

~ resource "nomad_job" "this" {
      ~ allocation_ids          = [
          - "4992e7e2-9b6f-adce-a2e4-fd007d042eaf",
          - "09c0473b-9adf-ae2b-85e4-724dfdc324cb",
          - "ed1a45ed-f3a3-e722-1fe5-20ac966047f3",
          - "31bbd497-401f-9cb9-986d-1e31860810b3",
          - "96a4f90c-b675-265a-ee23-208b9d476a14",
          - "2b63d096-ea89-bddc-50d6-151d02b187aa",
          - "65864468-b42c-8bb4-f802-7a0bf7edc221",
          - "880f23ed-8000-92e0-beee-11127f54e8e0",
          - "d138bed9-1cfa-1030-8a51-34162ccda3e9",
        ] -> (known after apply)
        id                      = "cassandra-main"
      ~ modify_index            = "1572279" -> (known after apply)
        name                    = "cassandra-main"
        # (9 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

the-nando commented Apr 22, 2023

I've run into the same issue on a running system and I've tracked it down to an inconsistency between the job spec in the state (left) and the one on the filesystem (right):
[Screenshot, 2023-04-22: job spec in the Terraform state (left) vs. on the filesystem (right)]
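The gist of the difference (reconstructed from the repro steps below, since the screenshot itself isn't reproduced here):

# job spec as recorded in the Terraform state
variable "image" {
  type = string
}

# job spec on the filesystem
variable "image" {
  type    = string
  default = "amazon/aws-efs-csi-driver:v1.4.7"
}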
Default values were added to the job spec after the initial apply and, because they are overridden by the variables passed in by the provider, they aren't persisted in the state on update. Subsequent plan/apply runs then trigger the behaviour described by the OP.

The reference to allocation_ids and modify_index is a red herring, as both are computed fields:

// We know that applying changes here will change the modify index
// _somehow_, but we won't know how much it will increment until
// after we complete registration.
d.SetNewComputed("modify_index")
// similarly, we won't know the allocation ids until after the job registration eval
d.SetNewComputed("allocation_ids")

To reproduce the issue, with:

Terraform v1.4.5
on darwin_arm64
+ provider registry.terraform.io/hashicorp/nomad v1.4.20

  • Create a Nomad job file with a variable and a matching resource definition that sets that variable:
variable "image" {
  type    = string
}
[...]
resource "nomad_job" "efs-nodes" {
  hcl2 {
    enabled = true
    vars = {
      datacenters   = join(",", var.datacenters)
      region        = var.region
      namespace     = var.namespace
      image         = var.driver_image
    }
  }
  jobspec = file("${path.module}/efs-nodes.hcl")
}
  • Run an apply
  • Run another apply (no changes as expected)
  • Update the job spec with a default value for the variable:
variable "image" {
  type    = string
  default = "amazon/aws-efs-csi-driver:v1.4.7"
}
  • Run an apply
Terraform will perform the following actions:

  # module.nomad-efs.nomad_job.efs-nodes will be updated in-place
  ~ resource "nomad_job" "efs-nodes" {
      ~ allocation_ids          = [] -> (known after apply)
        id                      = "csi-efs-nodes"
      ~ modify_index            = "461678" -> (known after apply)
        name                    = "csi-efs-nodes"
        # (9 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
  • Run another apply
Terraform will perform the following actions:

  # module.nomad-efs.nomad_job.efs-nodes will be updated in-place
  ~ resource "nomad_job" "efs-nodes" {
      ~ allocation_ids          = [] -> (known after apply)
        id                      = "csi-efs-nodes"
      ~ modify_index            = "461678" -> (known after apply)
        name                    = "csi-efs-nodes"
        # (9 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Interestingly enough, if I remove the setting of the variable from the hcl2.vars block of the resource, the job spec and state are updated as expected (a sketch of the trimmed resource block follows the plan output below). It looks like something is off with the way the computed state is persisted after the HCL2 templating has run.

Terraform will perform the following actions:

  # module.nomad-efs.nomad_job.efs-nodes will be updated in-place
  ~ resource "nomad_job" "efs-nodes" {
      ~ allocation_ids          = [] -> (known after apply)
        id                      = "csi-efs-nodes"
      ~ jobspec                 = <<-EOT
          - variable "image" { type = string }
          + variable "image" {
          +   type    = string
          +   default = "amazon/aws-efs-csi-driver:v1.4.7"
          + }
            variable "region" { type = string }
            variable "datacenters" { type = string }
[...]
        EOT
      ~ modify_index            = "461678" -> (known after apply)
        name                    = "csi-efs-nodes"
        # (8 unchanged attributes hidden)

      ~ hcl2 {
          ~ vars     = {
              - "image"         = "amazon/aws-efs-csi-driver:v1.5.4" -> null
                # (5 unchanged elements hidden)
            }
            # (2 unchanged attributes hidden)
        }
    }
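For reference, this is roughly the trimmed resource block being described above, i.e. the earlier snippet with only the image entry dropped from hcl2.vars (a sketch, not the exact file):

resource "nomad_job" "efs-nodes" {
  hcl2 {
    enabled = true
    vars = {
      datacenters = join(",", var.datacenters)
      region      = var.region
      namespace   = var.namespace
      # "image" is intentionally no longer set here, so the default declared
      # in the job spec ("amazon/aws-efs-csi-driver:v1.4.7") takes effect.
    }
  }
  jobspec = file("${path.module}/efs-nodes.hcl")
}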

stevecn commented Apr 22, 2023

Hi @lgfa29,
For me, this problem can be reproduced easily by removing (or adding) a blank line in the Nomad job file.

ttaghavi commented Jun 21, 2023

Hi @lgfa29: For me, this problem can be reproduced easily by removing (or adding) a blank line in the Nomad job file.

I see similar behaviour, but when adding comments to the jobspec file.

  • apply a new jobspec
  • add a comment to the jobspec
  • apply again
  • Terraform shows it will do an in-place update of the job
  • I am guessing nothing actually changes (as Nomad ignores comments)
  • the Terraform state is not updated with the changed jobspec content (the added comments)
  • thus apply keeps reporting changes it needs to make.

It looks like the comments are ignored by Nomad, so Nomad reports "nothing changed", but Terraform would still need to update the state with the comments/lines that Nomad ignores.
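As a minimal illustration (the job name and values are invented), a comment-only change alters the raw jobspec text that Terraform stores, but not the job Nomad actually registers:

job "example" {
  datacenters = ["dc1"]
  type        = "batch"

  # NOTE: comment added after the first apply. Nomad drops comments when it
  # parses the jobspec, so the cluster reports no change, while the raw text
  # in the Terraform state still differs from the file on disk.

  group "example" {
    task "example" {
      driver = "raw_exec"
      config {
        command = "/bin/true"
      }
    }
  }
}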

lgfa29 commented Jul 26, 2023

Thanks for all the extra info, everyone.

The analysis from @the-nando, @stevecn, and @ttaghavi about the whitespace or variable default changes seems like the root cause. #356 uses a semantic jobspec diff to prevent problems like these.

I'm also planning on deprecating the allocation_ids field. While not the root cause, it is a weird attribute that smells more like a data source.
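If you consume allocation_ids today, a data source lookup is the more natural shape. A sketch, assuming the nomad_allocations data source planned for provider 2.x (its name and schema here are assumptions, not confirmed API):

# Hypothetical: look up a job's allocations via a data source instead of the
# computed allocation_ids attribute on nomad_job.
data "nomad_allocations" "cassandra" {
  filter = "JobID == \"cassandra_stage\""
}

output "cassandra_allocation_ids" {
  value = data.nomad_allocations.cassandra.allocations[*].id
}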

lgfa29 added this to the 2.0.0 milestone on Aug 15, 2023