Skip to content

Conversation

@chucklever
Copy link
Contributor

No description provided.

Drop the -b option. We're stuck with RSA because some cloud providers
require it, but ssh-keygen's default bit length for RSA keys is 3072,
which is better than the value specified here.

Leaving off -b means when ssh-keygen increases its default, kdevops
will get that change without human intervention.

I'm also adding a -C option here because I got really confused when
I looked at the terraform state imported back from the provider. The
public ssh key comment was "cel@ ... ". I thought that this was my
personal public key.

It's is not my public key, thankfully. Rather the comment chosen by
ssh-keygen happens to be the same as the one in my personal public
key. So let's pick a more distinct eye-catcher to avoid future
myocardial infarction.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
The Kconfig choice menu allows the selection of a single disk, but
there was no default setting for that, causing the gen_tfvars
playbook to fail with:

ansible.errors.AnsibleUndefinedVariable:
  'terraform_azure_managed_disks_per_instance' is undefined.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Clean up: as far as I can tell these are unused anywhere.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
This option appears to be on by default when creating VM machines
via the Azure console.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Resource groups appear to be global. To run more than one instance
of kdevops in Azure at the same time, each instance needs a unique
resource group. Hoist the resource group name into Kconfig so that
it can be set to a unique value.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
The most recent version 3 release of hashicorp/azurerm was 3.117.1,
released on February 28, 2025. Version 3 is no longer getting fixes.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Make the most recent RHEL 9 update (update 6, announced two months
ago) the default selection.

Since update 10 has been available for more than a year, add a RHEL
8.10 menu choice.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
First, I was getting

module.volumes[0].oci_core_volume.kdevops_volume_extra[0]: Creating...
╷
│ Error: 400-CannotParseRequest, Incorrectly formatted request. Please refer to our documentation for help.

No idea why, though maybe this is the first time I've tried using
us-east1 when my home region is us-central. Nothing I tried changed
this result, and there's no obvious problem with the way that the
oci_core_volume resource is specified, which hasn't changed in many
months.

All documentation I consulted stated that because the root module
explicitly specifies exactly one provider, child modules are
supposed to inherit that provider. But that isn't happening here:

$ terraform providers

Providers required by configuration:
.
├── provider[registry.terraform.io/oracle/oci] ~> 6.0
└── module.volumes
    └── provider[registry.terraform.io/hashicorp/oci]

And looks like it's a bug that was never fixed:
oracle/terraform-provider-oci#1608

The only workaround I understood was to switch the root module
back to using the hashicorp/oci provider so that the root and
"volumes" module use the same provider.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Add the vpus_per_gb argument when provisioning disk drives to
control how much performance is available. This is similar to
the performance settings available for other providers.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Enabling instance preemption allows the provider to destroy the
instance if it needs the resources. The benefit is a 50%
reduction in price.

Typically, there's nothing on these instances that can't be
re-cloned or reproduced.

This should really be a Kconfig setting.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
The us_west1 region was missing its zone selection menu.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
The Oracle Linux AMIs are only for-pay marketplace options, not
created by Amazon itself.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Clean up.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Unlike the other Linux distributions, Fedora requires some
cloud-init magic to set up the log-in user. I don't feel like
digging into that right now, and no one is demanding that kdevops
support Fedora on GCE at the moment.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Add a spectrum of cost and performance choices, plus a new GPU
platform.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
While setting up some new test runners, I encountered this failure:

TASK [terraform : Ensure the Include directive is present on the controller] ***
task path: /usr/local/home/queue-5-4/worker/queue-5-4-kernel/build/playbooks/roles/terraform/tasks/main.yml:31
fatal: [localhost]: FAILED! => {
    "changed": false,
    "rc": 257
}
MSG:
Path /usr/local/home/queue-5-4/.ssh/config does not exist !

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
When running the pynfs workflow on RHEL 8:

> :stderr:   Could not find a version that satisfies the requirement
> xdrlib3 (from versions: )
> No matching distribution found for xdrlib3

RHEL 8 sports a version of Python that neither requires or supports
the use of "pip install xdrlib3".

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Match the description for the terraform playbook

Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Daniel Gomez <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Commit 58a0745 ("Add a guest/instance for building the test
kernel") was applied just after commit 1cf0800 ("gen_hosts:
templates: include localhost in the all group"). The former should
have been updated to include localhost in its [all] group.

Without 'localhost', trying to provision the kernel builder node
fails.

Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Daniel Gomez <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
"make destroy" was skipping some important file removals, and
removing other files that do not exist.

Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Daniel Gomez <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
This is handled later by the devconfig playbook, and only when a
distribution needs it.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
ansible.netcommon.ipaddr is deprecated. See:

 https://docs.ansible.com/ansible/latest/collections/ansible/netcommon/ipaddr_filter.html

Ensure the update_etc_hosts playbook uses the ansible.utils.ipaddr
plugin instead.

Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Daniel Gomez <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Commit b90d89d ("Switch to the cloud.terraform.terraform
module") introduced the use of the cloud.terraform module, and
commit 7ccb648 ("guestfs: Replace scripts/destroy_guestfs.sh
with an Ansible playbook") introduced the use of the
community.libvirt module. It would be friendly if kdevops could
pull in the Ansible modules it needs transparently.

The requirements.yml file is a manifest of Ansible collections that
the project needs to run. Installation of these collections is made
automatic by adding:

  ansible-galaxy install -r requirements.yml

to the "make ansible_cfg" step. This mechanism can keep cached
versions of collections up to date, and can also constrain a
cached collection to a specific version, if that's needed.

The initial file contains requirements I could find easily, and
should be updated over time as new collection dependencies are
introduced.

See also:

https://docs.ansible.com/ansible/latest/user_guide/collections_using.html

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Daniel Gomez <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Some workflows create several guests but use only one or two extra
storage drives per guest. Enable configurations that reduce the
amount of storage allocated per guest to conserve local persistent
storage space on the host.

Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Daniel Gomez <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
The size of some cloud OS images is too small for the ltp
installation (eg, the RHEL 8 OS image on Azure is about 10GB). This
causes the ltp workflow to fail 100% of the time during the test
build step.

A simple solution is to replace /opt with a symlink to /data/opt, as
kdevops can make /data large enough to hold the ltp install.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
During "make", I see this warning twice when preparing a terraform-
based kdevops run:

[WARNING]: provided hosts list is empty, only localhost is available.
   Note that the implicit localhost does not match 'all'

The install_terraform and gen_tfvars playbooks both run before the
inventory file is created. AIUI they have to be invoked with an
explicit inventory on the command line.

Fixes: 0987c30 ("Makefile: use inventory from ansible.cfg")
Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Eliminate this warning:

  [WARNING]: Reset is not implemented for this connection

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Since commit 1cf0800 ("gen_hosts: templates: include localhost
in the all group"), the devconfig playbook is no longer updating the
nfsd service host.

1. Actually all the service hosts need to be updated by devconfig,
   so ensure devconfig runs against the service group.

2. Since devconfig can run against most workflows, add blank
   [service] groups to any inventory template that doesn't already
   have it.

Fixes: 1cf0800 ("gen_hosts: templates: include localhost in the all group")
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Daniel Gomez <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Clean up: Sufficient time has passed that this compatibility setting
is no longer necessary.

Suggested-by: Luis Chamberlain <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Commit 42d2757 ("guestfs: enhance ssh config choice") added a
uniquifier to the name of the ssh config to avoid multiple kdevops
runs clobbering each other's ssh config entries. It uses the TOPDIR
pathname set in the current .config to benefit from the nice
convenience of generating an Ansible extra_var using "output yaml".

Commit 42d2757 ("guestfs: enhance ssh config choice") broke the
ability to share a .config with other users because
CONFIG_TOPDIR_PATH contains a fixed string pointing to the home
directory where the .config was created.

The TOPDIR and its checksum need to be generated not by "make
menuconfig" but rather by "make" in order that .config files (as
saved in defconfigs/) can be truly portable.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
I need an ansible.cfg that is generated by the Kconfig menu, but
whose pathname is selected dynamically based on where kdevops is
being run rather than having that path baked into the .config.

Reviewed-by: Luis Chamberlain <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
My CI workflows need the inventory location to be determined at run
time, not set by the kdevops .config file, so the .configs can be
portable amongst test runners and test branches.

Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Daniel Gomez <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
@chucklever chucklever merged commit 85bb885 into main Sep 5, 2025
10 checks passed
@chucklever chucklever deleted the cel-fixes branch September 5, 2025 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants