Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database and AZ machinery issue #2323

Closed
cccs-mog opened this issue Sep 18, 2024 · 14 comments
Closed

Database and AZ machinery issue #2323

cccs-mog opened this issue Sep 18, 2024 · 14 comments

Comments

@cccs-mog
Copy link
Contributor

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [X ] I am running the latest version
  • [X ] I did read the README!
  • [X ] I checked the documentation and found no answer
  • [X ] I checked to make sure that this issue has not already been filed
  • [X ] I'm reporting the issue to the correct repository (for multi-repository projects)
  • [ X] I have read and checked all configs (with all optional parts)

Expected Behavior

A call to self.db.add_machine() should result on a machine being created and thus the check in initialize_machinery() listing machine should not induce an error for the AZ machinery.

Current Behavior

No machine are created at all, the running of this function doesn't seem to produce any database commands being ran.

Failure Information (for bugs)

The session goes all the way throught initialize_machinery(). The AZ machinery overwrite the initialize and initialize_check function. In all cases at some point the machinery will attempt to add machines to the database the same way as the other machinery would do it, but for some reasons it doesn't work. The connection to the database is ok because tasks can be added to it and the schema was created at the beginning. Tested on both SQLite3 and PosgreSQL. This could be a timing issue which is specific to scaling machinery but I would expect that explicitely calling the addition of a machine in the database to actually do it.

Steps to Reproduce

  1. Clean install of CAPE
  2. Setup for the AZ machinery with VMSS
  3. Run the cape service/daemon as normal

Context

Fresh install on Ubuntu 20.04. Base installation made from cape2.sh. Every aspect seem to work fine except this.

Failure Logs

2024-09-18 18:26:45,412 [root] DEBUG: Imported "machinery" modules:
2024-09-18 18:26:45,412 [root] DEBUG: `-- Azure
2024-09-18 18:26:45,413 [root] DEBUG: Checking for locked tasks...
2024-09-18 18:26:45,417 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-09-18 18:26:45,417 [sqlalchemy.engine.Engine] INFO: BEGIN (implicit)
2024-09-18 18:26:45,448 INFO sqlalchemy.engine.Engine SELECT tasks.id AS tasks_id, tasks.target AS tasks_target, tasks.category AS tasks_category, tasks.cape AS tasks_cape, tasks.timeout AS tasks_timeout, tasks.priority AS tasks_priority, tasks.custom AS tasks_custom, tasks.machine AS tasks_machine, tasks.package AS tasks_package, tasks.route AS tasks_route, tasks.tags_tasks AS tasks_tags_tasks, tasks.options AS tasks_options, tasks.platform AS tasks_platform, tasks.memory AS tasks_memory, tasks.enforce_timeout AS tasks_enforce_timeout, tasks.clock AS tasks_clock, tasks.added_on AS tasks_added_on, tasks.started_on AS tasks_started_on, tasks.completed_on AS tasks_completed_on, tasks.status AS tasks_status, tasks.dropped_files AS tasks_dropped_files, tasks.running_processes AS tasks_running_processes, tasks.api_calls AS tasks_api_calls, tasks.domains AS tasks_domains, tasks.signatures_total AS tasks_signatures_total, tasks.signatures_alert AS tasks_signatures_alert, tasks.files_written AS tasks_files_written, tasks.registry_keys_modified AS tasks_registry_keys_modified, tasks.crash_issues AS tasks_crash_issues, tasks.anti_issues AS tasks_anti_issues, tasks.analysis_started_on AS tasks_analysis_started_on, tasks.analysis_finished_on AS tasks_analysis_finished_on, tasks.processing_started_on AS tasks_processing_started_on, tasks.processing_finished_on AS tasks_processing_finished_on, tasks.signatures_started_on AS tasks_signatures_started_on, tasks.signatures_finished_on AS tasks_signatures_finished_on, tasks.reporting_started_on AS tasks_reporting_started_on, tasks.reporting_finished_on AS tasks_reporting_finished_on, tasks.timedout AS tasks_timedout, tasks.sample_id AS tasks_sample_id, tasks.machine_id AS tasks_machine_id, tasks.shrike_url AS tasks_shrike_url, tasks.shrike_refer AS tasks_shrike_refer, tasks.shrike_msg AS tasks_shrike_msg, tasks.shrike_sid AS tasks_shrike_sid, tasks.parent_id AS tasks_parent_id, tasks.tlp AS tasks_tlp, tasks.user_id AS tasks_user_id, tasks.username AS tasks_username, tags_1.id AS tags_1_id, tags_1.name AS tags_1_name, guests_1.id AS guests_1_id, guests_1.status AS guests_1_status, guests_1.name AS guests_1_name, guests_1.label AS guests_1_label, guests_1.platform AS guests_1_platform, guests_1.manager AS guests_1_manager, guests_1.started_on AS guests_1_started_on, guests_1.shutdown_on AS guests_1_shutdown_on, guests_1.task_id AS guests_1_task_id, errors_1.id AS errors_1_id, errors_1.message AS errors_1_message, errors_1.task_id AS errors_1_task_id
FROM tasks LEFT OUTER JOIN (tasks_tags AS tasks_tags_1 JOIN tags AS tags_1 ON tags_1.id = tasks_tags_1.tag_id) ON tasks.id = tasks_tags_1.task_id LEFT OUTER JOIN guests AS guests_1 ON tasks.id = guests_1.task_id LEFT OUTER JOIN errors AS errors_1 ON tasks.id = errors_1.task_id
WHERE tasks.status = %(status_1)s ORDER BY tasks.added_on DESC
2024-09-18 18:26:45,448 [sqlalchemy.engine.Engine] INFO: SELECT tasks.id AS tasks_id, tasks.target AS tasks_target, tasks.category AS tasks_category, tasks.cape AS tasks_cape, tasks.timeout AS tasks_timeout, tasks.priority AS tasks_priority, tasks.custom AS tasks_custom, tasks.machine AS tasks_machine, tasks.package AS tasks_package, tasks.route AS tasks_route, tasks.tags_tasks AS tasks_tags_tasks, tasks.options AS tasks_options, tasks.platform AS tasks_platform, tasks.memory AS tasks_memory, tasks.enforce_timeout AS tasks_enforce_timeout, tasks.clock AS tasks_clock, tasks.added_on AS tasks_added_on, tasks.started_on AS tasks_started_on, tasks.completed_on AS tasks_completed_on, tasks.status AS tasks_status, tasks.dropped_files AS tasks_dropped_files, tasks.running_processes AS tasks_running_processes, tasks.api_calls AS tasks_api_calls, tasks.domains AS tasks_domains, tasks.signatures_total AS tasks_signatures_total, tasks.signatures_alert AS tasks_signatures_alert, tasks.files_written AS tasks_files_written, tasks.registry_keys_modified AS tasks_registry_keys_modified, tasks.crash_issues AS tasks_crash_issues, tasks.anti_issues AS tasks_anti_issues, tasks.analysis_started_on AS tasks_analysis_started_on, tasks.analysis_finished_on AS tasks_analysis_finished_on, tasks.processing_started_on AS tasks_processing_started_on, tasks.processing_finished_on AS tasks_processing_finished_on, tasks.signatures_started_on AS tasks_signatures_started_on, tasks.signatures_finished_on AS tasks_signatures_finished_on, tasks.reporting_started_on AS tasks_reporting_started_on, tasks.reporting_finished_on AS tasks_reporting_finished_on, tasks.timedout AS tasks_timedout, tasks.sample_id AS tasks_sample_id, tasks.machine_id AS tasks_machine_id, tasks.shrike_url AS tasks_shrike_url, tasks.shrike_refer AS tasks_shrike_refer, tasks.shrike_msg AS tasks_shrike_msg, tasks.shrike_sid AS tasks_shrike_sid, tasks.parent_id AS tasks_parent_id, tasks.tlp AS tasks_tlp, tasks.user_id AS tasks_user_id, tasks.username AS tasks_username, tags_1.id AS tags_1_id, tags_1.name AS tags_1_name, guests_1.id AS guests_1_id, guests_1.status AS guests_1_status, guests_1.name AS guests_1_name, guests_1.label AS guests_1_label, guests_1.platform AS guests_1_platform, guests_1.manager AS guests_1_manager, guests_1.started_on AS guests_1_started_on, guests_1.shutdown_on AS guests_1_shutdown_on, guests_1.task_id AS guests_1_task_id, errors_1.id AS errors_1_id, errors_1.message AS errors_1_message, errors_1.task_id AS errors_1_task_id
FROM tasks LEFT OUTER JOIN (tasks_tags AS tasks_tags_1 JOIN tags AS tags_1 ON tags_1.id = tasks_tags_1.tag_id) ON tasks.id = tasks_tags_1.task_id LEFT OUTER JOIN guests AS guests_1 ON tasks.id = guests_1.task_id LEFT OUTER JOIN errors AS errors_1 ON tasks.id = errors_1.task_id
WHERE tasks.status = %(status_1)s ORDER BY tasks.added_on DESC
2024-09-18 18:26:45,448 INFO sqlalchemy.engine.Engine [generated in 0.00079s] {'status_1': 'running'}
2024-09-18 18:26:45,448 [sqlalchemy.engine.Engine] INFO: [generated in 0.00079s] {'status_1': 'running'}
2024-09-18 18:26:45,456 INFO sqlalchemy.engine.Engine COMMIT
2024-09-18 18:26:45,456 [sqlalchemy.engine.Engine] INFO: COMMIT
2024-09-18 18:26:45,479 [lib.cuckoo.core.machinery_manager] INFO: Using MachineryManager[az] with max_machines_count=0
2024-09-18 18:26:45,479 [lib.cuckoo.core.scheduler] INFO: Creating scheduler with max_analysis_count=unlimited
2024-09-18 18:27:38,006 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-09-18 18:27:38,006 [sqlalchemy.engine.Engine] INFO: BEGIN (implicit)
2024-09-18 18:27:38,009 INFO sqlalchemy.engine.Engine DELETE FROM machines_tags
2024-09-18 18:27:38,009 [sqlalchemy.engine.Engine] INFO: DELETE FROM machines_tags
2024-09-18 18:27:38,009 INFO sqlalchemy.engine.Engine [generated in 0.00066s] {}
2024-09-18 18:27:38,009 [sqlalchemy.engine.Engine] INFO: [generated in 0.00066s] {}
2024-09-18 18:27:38,013 INFO sqlalchemy.engine.Engine DELETE FROM machines
2024-09-18 18:27:38,013 [sqlalchemy.engine.Engine] INFO: DELETE FROM machines
2024-09-18 18:27:38,013 INFO sqlalchemy.engine.Engine [generated in 0.00050s] {}
2024-09-18 18:27:38,013 [sqlalchemy.engine.Engine] INFO: [generated in 0.00050s] {}
2024-09-18 18:27:38,014 [modules.machinery.az] DEBUG: Connecting to Azure for the region 'XXX'.
2024-09-18 18:27:38,638 [modules.machinery.az] DEBUG: Trying <bound method GalleryImagesOperations.get of <azure.mgmt.compute.v2023_07_03.operations._operations.GalleryImagesOperations object at 0x7fab7172d810>>(('XXX', 'cape_compute_gallery', 'XXX'),{})
2024-09-18 18:27:38,761 [modules.machinery.az] DEBUG: Trying <bound method VirtualMachineScaleSetsOperations.list of <azure.mgmt.compute.v2024_07_01.operations._operations.VirtualMachineScaleSetsOperations object at 0x7fab7172df60>>(('XXX',),{})
2024-09-18 18:27:38,913 [modules.machinery.az] DEBUG: Trying <bound method SubnetsOperations.get of <azure.mgmt.network.operations._operations.SubnetsOperations object at 0x7fab7172f730>>(('XXX', 'VNET', 'XXX'),{})
2024-09-18 18:27:43,160 [modules.machinery.az] DEBUG: Trying <bound method VirtualMachineScaleSetsOperations.begin_reimage_all of <azure.mgmt.compute.v2024_07_01.operations._operations.VirtualMachineScaleSetsOperations object at 0x7fab7172f940>>(('XXX', 'vmss-cape-win10x64-60'),{'polling_interval': 1})
2024-09-18 18:32:08,967 [modules.machinery.az] DEBUG: Adding machines to database for vmss-cape-win10x64-60.
2024-09-18 18:32:08,981 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-09-18 18:32:08,981 [sqlalchemy.engine.Engine] INFO: BEGIN (implicit)
2024-09-18 18:32:08,990 INFO sqlalchemy.engine.Engine SELECT machines.id AS machines_id, machines.name AS machines_name, machines.label AS machines_label, machines.arch AS machines_arch, machines.ip AS machines_ip, machines.platform AS machines_platform, machines.interface AS machines_interface, machines.snapshot AS machines_snapshot, machines.locked AS machines_locked, machines.locked_changed_on AS machines_locked_changed_on, machines.status AS machines_status, machines.status_changed_on AS machines_status_changed_on, machines.resultserver_ip AS machines_resultserver_ip, machines.resultserver_port AS machines_resultserver_port, machines.reserved AS machines_reserved, tags_1.id AS tags_1_id, tags_1.name AS tags_1_name
FROM machines LEFT OUTER JOIN (machines_tags AS machines_tags_1 JOIN tags AS tags_1 ON tags_1.id = machines_tags_1.tag_id) ON machines.id = machines_tags_1.machine_id
WHERE machines.reserved = false
2024-09-18 18:32:08,990 [sqlalchemy.engine.Engine] INFO: SELECT machines.id AS machines_id, machines.name AS machines_name, machines.label AS machines_label, machines.arch AS machines_arch, machines.ip AS machines_ip, machines.platform AS machines_platform, machines.interface AS machines_interface, machines.snapshot AS machines_snapshot, machines.locked AS machines_locked, machines.locked_changed_on AS machines_locked_changed_on, machines.status AS machines_status, machines.status_changed_on AS machines_status_changed_on, machines.resultserver_ip AS machines_resultserver_ip, machines.resultserver_port AS machines_resultserver_port, machines.reserved AS machines_reserved, tags_1.id AS tags_1_id, tags_1.name AS tags_1_name
FROM machines LEFT OUTER JOIN (machines_tags AS machines_tags_1 JOIN tags AS tags_1 ON tags_1.id = machines_tags_1.tag_id) ON machines.id = machines_tags_1.machine_id
WHERE machines.reserved = false
2024-09-18 18:32:08,991 INFO sqlalchemy.engine.Engine [generated in 0.00067s] {}
2024-09-18 18:32:08,991 [sqlalchemy.engine.Engine] INFO: [generated in 0.00067s] {}
2024-09-18 18:32:08,995 [modules.machinery.az] DEBUG: Trying <bound method VirtualMachineScaleSetVMsOperations.list of <azure.mgmt.compute.v2024_07_01.operations._operations.VirtualMachineScaleSetVMsOperations object at 0x7fab717add20>>(('XXX', 'vmss-cape-win10x64-60'),{})
2024-09-18 18:32:08,998 [modules.machinery.az] DEBUG: Trying <bound method NetworkInterfacesOperations.list_virtual_machine_scale_set_network_interfaces of <azure.mgmt.network.operations._operations.NetworkInterfacesOperations object at 0x7fab6fe0c040>>(('XXX', 'vmss-cape-win10x64-60'),{'api_version': '2024-07-01'})
2024-09-18 18:32:18,186 INFO sqlalchemy.engine.Engine SELECT tags.id AS tags_id, tags.name AS tags_name
FROM tags
WHERE tags.name = %(name_1)s
LIMIT %(param_1)s
2024-09-18 18:32:18,186 [sqlalchemy.engine.Engine] INFO: SELECT tags.id AS tags_id, tags.name AS tags_name
FROM tags
WHERE tags.name = %(name_1)s
LIMIT %(param_1)s
2024-09-18 18:32:18,187 INFO sqlalchemy.engine.Engine [generated in 0.00113s] {'name_1': 'win10x64', 'param_1': 1}
2024-09-18 18:32:18,187 [sqlalchemy.engine.Engine] INFO: [generated in 0.00113s] {'name_1': 'win10x64', 'param_1': 1}
2024-09-18 18:32:19,321 [modules.machinery.az] DEBUG: Machine vmss-cape-win10x64-60_0 was created and available in 0s
2024-09-18 18:32:22,244 INFO sqlalchemy.engine.Engine SELECT machines.id AS machines_id, machines.name AS machines_name, machines.label AS machines_label, machines.arch AS machines_arch, machines.ip AS machines_ip, machines.platform AS machines_platform, machines.interface AS machines_interface, machines.snapshot AS machines_snapshot, machines.locked AS machines_locked, machines.locked_changed_on AS machines_locked_changed_on, machines.status AS machines_status, machines.status_changed_on AS machines_status_changed_on, machines.resultserver_ip AS machines_resultserver_ip, machines.resultserver_port AS machines_resultserver_port, machines.reserved AS machines_reserved, tags_1.id AS tags_1_id, tags_1.name AS tags_1_name
FROM machines LEFT OUTER JOIN (machines_tags AS machines_tags_1 JOIN tags AS tags_1 ON tags_1.id = machines_tags_1.tag_id) ON machines.id = machines_tags_1.machine_id
2024-09-18 18:32:22,244 [sqlalchemy.engine.Engine] INFO: SELECT machines.id AS machines_id, machines.name AS machines_name, machines.label AS machines_label, machines.arch AS machines_arch, machines.ip AS machines_ip, machines.platform AS machines_platform, machines.interface AS machines_interface, machines.snapshot AS machines_snapshot, machines.locked AS machines_locked, machines.locked_changed_on AS machines_locked_changed_on, machines.status AS machines_status, machines.status_changed_on AS machines_status_changed_on, machines.resultserver_ip AS machines_resultserver_ip, machines.resultserver_port AS machines_resultserver_port, machines.reserved AS machines_reserved, tags_1.id AS tags_1_id, tags_1.name AS tags_1_name
FROM machines LEFT OUTER JOIN (machines_tags AS machines_tags_1 JOIN tags AS tags_1 ON tags_1.id = machines_tags_1.tag_id) ON machines.id = machines_tags_1.machine_id
2024-09-18 18:32:22,244 INFO sqlalchemy.engine.Engine [generated in 0.00080s] {}
2024-09-18 18:32:22,244 [sqlalchemy.engine.Engine] INFO: [generated in 0.00080s] {}
2024-09-18 18:32:22,248 INFO sqlalchemy.engine.Engine ROLLBACK
2024-09-18 18:32:22,248 [sqlalchemy.engine.Engine] INFO: ROLLBACK
2024-09-18 18:32:22,251 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-09-18 18:32:22,251 [sqlalchemy.engine.Engine] INFO: BEGIN (implicit)
2024-09-18 18:32:22,256 INFO sqlalchemy.engine.Engine SELECT machines.id AS machines_id, machines.name AS machines_name, machines.label AS machines_label, machines.arch AS machines_arch, machines.ip AS machines_ip, machines.platform AS machines_platform, machines.interface AS machines_interface, machines.snapshot AS machines_snapshot, machines.locked AS machines_locked, machines.locked_changed_on AS machines_locked_changed_on, machines.status AS machines_status, machines.status_changed_on AS machines_status_changed_on, machines.resultserver_ip AS machines_resultserver_ip, machines.resultserver_port AS machines_resultserver_port, machines.reserved AS machines_reserved, tags_1.id AS tags_1_id, tags_1.name AS tags_1_name
FROM machines LEFT OUTER JOIN (machines_tags AS machines_tags_1 JOIN tags AS tags_1 ON tags_1.id = machines_tags_1.tag_id) ON machines.id = machines_tags_1.machine_id
WHERE machines.locked = true AND machines.reserved = false
2024-09-18 18:32:22,256 [sqlalchemy.engine.Engine] INFO: SELECT machines.id AS machines_id, machines.name AS machines_name, machines.label AS machines_label, machines.arch AS machines_arch, machines.ip AS machines_ip, machines.platform AS machines_platform, machines.interface AS machines_interface, machines.snapshot AS machines_snapshot, machines.locked AS machines_locked, machines.locked_changed_on AS machines_locked_changed_on, machines.status AS machines_status, machines.status_changed_on AS machines_status_changed_on, machines.resultserver_ip AS machines_resultserver_ip, machines.resultserver_port AS machines_resultserver_port, machines.reserved AS machines_reserved, tags_1.id AS tags_1_id, tags_1.name AS tags_1_name
FROM machines LEFT OUTER JOIN (machines_tags AS machines_tags_1 JOIN tags AS tags_1 ON tags_1.id = machines_tags_1.tag_id) ON machines.id = machines_tags_1.machine_id
WHERE machines.locked = true AND machines.reserved = false
2024-09-18 18:32:22,256 INFO sqlalchemy.engine.Engine [generated in 0.00056s] {}
2024-09-18 18:32:22,256 [sqlalchemy.engine.Engine] INFO: [generated in 0.00056s] {}
2024-09-18 18:32:22,259 INFO sqlalchemy.engine.Engine COMMIT
2024-09-18 18:32:22,259 [sqlalchemy.engine.Engine] INFO: COMMIT
2024-09-18 18:32:22,260 [root] CRITICAL: CuckooCriticalError: No machines available
2024-09-18 18:32:38,638 [modules.machinery.az] DEBUG: Monitoring the machine pools...
2024-09-18 18:32:38,678 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-09-18 18:32:38,678 [sqlalchemy.engine.Engine] INFO: BEGIN (implicit)
2024-09-18 18:32:38,680 INFO sqlalchemy.engine.Engine SELECT machines.id AS machines_id, machines.name AS machines_name, machines.label AS machines_label, machines.arch AS machines_arch, machines.ip AS machines_ip, machines.platform AS machines_platform, machines.interface AS machines_interface, machines.snapshot AS machines_snapshot, machines.locked AS machines_locked, machines.locked_changed_on AS machines_locked_changed_on, machines.status AS machines_status, machines.status_changed_on AS machines_status_changed_on, machines.resultserver_ip AS machines_resultserver_ip, machines.resultserver_port AS machines_resultserver_port, machines.reserved AS machines_reserved, tags_1.id AS tags_1_id, tags_1.name AS tags_1_name
FROM machines LEFT OUTER JOIN (machines_tags AS machines_tags_1 JOIN tags AS tags_1 ON tags_1.id = machines_tags_1.tag_id) ON machines.id = machines_tags_1.machine_id
WHERE machines.reserved = false
2024-09-18 18:32:38,680 [sqlalchemy.engine.Engine] INFO: SELECT machines.id AS machines_id, machines.name AS machines_name, machines.label AS machines_label, machines.arch AS machines_arch, machines.ip AS machines_ip, machines.platform AS machines_platform, machines.interface AS machines_interface, machines.snapshot AS machines_snapshot, machines.locked AS machines_locked, machines.locked_changed_on AS machines_locked_changed_on, machines.status AS machines_status, machines.status_changed_on AS machines_status_changed_on, machines.resultserver_ip AS machines_resultserver_ip, machines.resultserver_port AS machines_resultserver_port, machines.reserved AS machines_reserved, tags_1.id AS tags_1_id, tags_1.name AS tags_1_name
FROM machines LEFT OUTER JOIN (machines_tags AS machines_tags_1 JOIN tags AS tags_1 ON tags_1.id = machines_tags_1.tag_id) ON machines.id = machines_tags_1.machine_id
WHERE machines.reserved = false
2024-09-18 18:32:38,683 INFO sqlalchemy.engine.Engine [cached since 29.69s ago] {}
2024-09-18 18:32:38,683 [sqlalchemy.engine.Engine] INFO: [cached since 29.69s ago] {}
2024-09-18 18:32:38,695 INFO sqlalchemy.engine.Engine SELECT tasks.id AS tasks_id, tasks.target AS tasks_target, tasks.category AS tasks_category, tasks.cape AS tasks_cape, tasks.timeout AS tasks_timeout, tasks.priority AS tasks_priority, tasks.custom AS tasks_custom, tasks.machine AS tasks_machine, tasks.package AS tasks_package, tasks.route AS tasks_route, tasks.tags_tasks AS tasks_tags_tasks, tasks.options AS tasks_options, tasks.platform AS tasks_platform, tasks.memory AS tasks_memory, tasks.enforce_timeout AS tasks_enforce_timeout, tasks.clock AS tasks_clock, tasks.added_on AS tasks_added_on, tasks.started_on AS tasks_started_on, tasks.completed_on AS tasks_completed_on, tasks.status AS tasks_status, tasks.dropped_files AS tasks_dropped_files, tasks.running_processes AS tasks_running_processes, tasks.api_calls AS tasks_api_calls, tasks.domains AS tasks_domains, tasks.signatures_total AS tasks_signatures_total, tasks.signatures_alert AS tasks_signatures_alert, tasks.files_written AS tasks_files_written, tasks.registry_keys_modified AS tasks_registry_keys_modified, tasks.crash_issues AS tasks_crash_issues, tasks.anti_issues AS tasks_anti_issues, tasks.analysis_started_on AS tasks_analysis_started_on, tasks.analysis_finished_on AS tasks_analysis_finished_on, tasks.processing_started_on AS tasks_processing_started_on, tasks.processing_finished_on AS tasks_processing_finished_on, tasks.signatures_started_on AS tasks_signatures_started_on, tasks.signatures_finished_on AS tasks_signatures_finished_on, tasks.reporting_started_on AS tasks_reporting_started_on, tasks.reporting_finished_on AS tasks_reporting_finished_on, tasks.timedout AS tasks_timedout, tasks.sample_id AS tasks_sample_id, tasks.machine_id AS tasks_machine_id, tasks.shrike_url AS tasks_shrike_url, tasks.shrike_refer AS tasks_shrike_refer, tasks.shrike_msg AS tasks_shrike_msg, tasks.shrike_sid AS tasks_shrike_sid, tasks.parent_id AS tasks_parent_id, tasks.tlp AS tasks_tlp, tasks.user_id AS tasks_user_id, tasks.username AS tasks_username, tags_1.id AS tags_1_id, tags_1.name AS tags_1_name, guests_1.id AS guests_1_id, guests_1.status AS guests_1_status, guests_1.name AS guests_1_name, guests_1.label AS guests_1_label, guests_1.platform AS guests_1_platform, guests_1.manager AS guests_1_manager, guests_1.started_on AS guests_1_started_on, guests_1.shutdown_on AS guests_1_shutdown_on, guests_1.task_id AS guests_1_task_id, errors_1.id AS errors_1_id, errors_1.message AS errors_1_message, errors_1.task_id AS errors_1_task_id
FROM tasks LEFT OUTER JOIN (tasks_tags AS tasks_tags_1 JOIN tags AS tags_1 ON tags_1.id = tasks_tags_1.tag_id) ON tasks.id = tasks_tags_1.task_id LEFT OUTER JOIN guests AS guests_1 ON tasks.id = guests_1.task_id LEFT OUTER JOIN errors AS errors_1 ON tasks.id = errors_1.task_id
WHERE tasks.status = %(status_1)s ORDER BY tasks.added_on DESC
2024-09-18 18:32:38,695 [sqlalchemy.engine.Engine] INFO: SELECT tasks.id AS tasks_id, tasks.target AS tasks_target, tasks.category AS tasks_category, tasks.cape AS tasks_cape, tasks.timeout AS tasks_timeout, tasks.priority AS tasks_priority, tasks.custom AS tasks_custom, tasks.machine AS tasks_machine, tasks.package AS tasks_package, tasks.route AS tasks_route, tasks.tags_tasks AS tasks_tags_tasks, tasks.options AS tasks_options, tasks.platform AS tasks_platform, tasks.memory AS tasks_memory, tasks.enforce_timeout AS tasks_enforce_timeout, tasks.clock AS tasks_clock, tasks.added_on AS tasks_added_on, tasks.started_on AS tasks_started_on, tasks.completed_on AS tasks_completed_on, tasks.status AS tasks_status, tasks.dropped_files AS tasks_dropped_files, tasks.running_processes AS tasks_running_processes, tasks.api_calls AS tasks_api_calls, tasks.domains AS tasks_domains, tasks.signatures_total AS tasks_signatures_total, tasks.signatures_alert AS tasks_signatures_alert, tasks.files_written AS tasks_files_written, tasks.registry_keys_modified AS tasks_registry_keys_modified, tasks.crash_issues AS tasks_crash_issues, tasks.anti_issues AS tasks_anti_issues, tasks.analysis_started_on AS tasks_analysis_started_on, tasks.analysis_finished_on AS tasks_analysis_finished_on, tasks.processing_started_on AS tasks_processing_started_on, tasks.processing_finished_on AS tasks_processing_finished_on, tasks.signatures_started_on AS tasks_signatures_started_on, tasks.signatures_finished_on AS tasks_signatures_finished_on, tasks.reporting_started_on AS tasks_reporting_started_on, tasks.reporting_finished_on AS tasks_reporting_finished_on, tasks.timedout AS tasks_timedout, tasks.sample_id AS tasks_sample_id, tasks.machine_id AS tasks_machine_id, tasks.shrike_url AS tasks_shrike_url, tasks.shrike_refer AS tasks_shrike_refer, tasks.shrike_msg AS tasks_shrike_msg, tasks.shrike_sid AS tasks_shrike_sid, tasks.parent_id AS tasks_parent_id, tasks.tlp AS tasks_tlp, tasks.user_id AS tasks_user_id, tasks.username AS tasks_username, tags_1.id AS tags_1_id, tags_1.name AS tags_1_name, guests_1.id AS guests_1_id, guests_1.status AS guests_1_status, guests_1.name AS guests_1_name, guests_1.label AS guests_1_label, guests_1.platform AS guests_1_platform, guests_1.manager AS guests_1_manager, guests_1.started_on AS guests_1_started_on, guests_1.shutdown_on AS guests_1_shutdown_on, guests_1.task_id AS guests_1_task_id, errors_1.id AS errors_1_id, errors_1.message AS errors_1_message, errors_1.task_id AS errors_1_task_id
FROM tasks LEFT OUTER JOIN (tasks_tags AS tasks_tags_1 JOIN tags AS tags_1 ON tags_1.id = tasks_tags_1.tag_id) ON tasks.id = tasks_tags_1.task_id LEFT OUTER JOIN guests AS guests_1 ON tasks.id = guests_1.task_id LEFT OUTER JOIN errors AS errors_1 ON tasks.id = errors_1.task_id
WHERE tasks.status = %(status_1)s ORDER BY tasks.added_on DESC
2024-09-18 18:32:38,703 INFO sqlalchemy.engine.Engine [cached since 353.3s ago] {'status_1': 'pending'}
2024-09-18 18:32:38,703 [sqlalchemy.engine.Engine] INFO: [cached since 353.3s ago] {'status_1': 'pending'}
2024-09-18 18:32:38,711 INFO sqlalchemy.engine.Engine SELECT machines.id AS machines_id, machines.name AS machines_name, machines.label AS machines_label, machines.arch AS machines_arch, machines.ip AS machines_ip, machines.platform AS machines_platform, machines.interface AS machines_interface, machines.snapshot AS machines_snapshot, machines.locked AS machines_locked, machines.locked_changed_on AS machines_locked_changed_on, machines.status AS machines_status, machines.status_changed_on AS machines_status_changed_on, machines.resultserver_ip AS machines_resultserver_ip, machines.resultserver_port AS machines_resultserver_port, machines.reserved AS machines_reserved, tags_1.id AS tags_1_id, tags_1.name AS tags_1_name
FROM machines LEFT OUTER JOIN (machines_tags AS machines_tags_1 JOIN tags AS tags_1 ON tags_1.id = machines_tags_1.tag_id) ON machines.id = machines_tags_1.machine_id
WHERE machines.reserved = false
2024-09-18 18:32:38,711 [sqlalchemy.engine.Engine] INFO: SELECT machines.id AS machines_id, machines.name AS machines_name, machines.label AS machines_label, machines.arch AS machines_arch, machines.ip AS machines_ip, machines.platform AS machines_platform, machines.interface AS machines_interface, machines.snapshot AS machines_snapshot, machines.locked AS machines_locked, machines.locked_changed_on AS machines_locked_changed_on, machines.status AS machines_status, machines.status_changed_on AS machines_status_changed_on, machines.resultserver_ip AS machines_resultserver_ip, machines.resultserver_port AS machines_resultserver_port, machines.reserved AS machines_reserved, tags_1.id AS tags_1_id, tags_1.name AS tags_1_name
FROM machines LEFT OUTER JOIN (machines_tags AS machines_tags_1 JOIN tags AS tags_1 ON tags_1.id = machines_tags_1.tag_id) ON machines.id = machines_tags_1.machine_id
WHERE machines.reserved = false
2024-09-18 18:32:38,725 INFO sqlalchemy.engine.Engine [cached since 29.73s ago] {}
2024-09-18 18:32:38,725 [sqlalchemy.engine.Engine] INFO: [cached since 29.73s ago] {}
2024-09-18 18:32:38,731 [modules.machinery.az] DEBUG: Trying <bound method UsageOperations.list of <azure.mgmt.compute.v2024_07_01.operations._operations.UsageOperations object at 0x7fab720be980>>(('XXX',),{})
2024-09-18 18:32:39,030 [modules.machinery.az] DEBUG: Scaling vmsscape-win10x64-60 size from 0 -> 1
2024-09-18 18:32:39,045 [modules.machinery.az] DEBUG: Trying <bound method VirtualMachineScaleSetsOperations.get of <azure.mgmt.compute.v2024_07_01.operations._operations.VirtualMachineScaleSetsOperations object at 0x7fab7172f8e0>>(('XXX', 'vmsscape-win10x64-60'),{})
2024-09-18 18:32:39,133 [modules.machinery.az] DEBUG: Trying <bound method VirtualMachineScaleSetsOperations.begin_update of <azure.mgmt.compute.v2024_07_01.operations._operations.VirtualMachineScaleSetsOperations object at 0x7fab6fe8ec50>>(('XXX', 'vmss-cape-win10x64-60', <azure.mgmt.compute.v2024_07_01.models._models_py3.VirtualMachineScaleSet object at 0x7fab6fe8f490>),{'polling_interval': 1})

@doomedraven
Copy link
Collaborator

could it be related to another AZ issues that is in WIP? as we don't run it we can't help here

@cccs-mog
Copy link
Contributor Author

Hi @doomedraven, I can't say for sure. I am trying to resolve the issue and have tried a lot of things. Just wanted to open it here to see if anyone is either facing the same issue or was able to make Azure work since the revamp of the database which happened a while ago. Also maybe a quick glance from @tbeadle might help since he's probably way better than I am on DB issues.

@doomedraven
Copy link
Collaborator

yes make sense

@cccs-mog
Copy link
Contributor Author

So I was able to make it work. It seem that for some reasons the function add_machine() from the database require a flush() and commit(). I don't know all the involvement on that but it is strange to me that AZ would be the only machinery affected by this. It could be a versionning issue somehow. Python3.10.14, SQLAlchemy==1.4.50, SQLAlchemy-Utils==0.41.1,sqlparse==0.5.0 which is exactly the version from the requirements.txt and poetry.

@doomedraven
Copy link
Collaborator

so far versions looks ok %) but yes is strange, i did deployment of new servers on GCP 2 days ago and no issues with kvm

@ChrisThibodeaux
Copy link
Contributor

ChrisThibodeaux commented Sep 18, 2024

@cccs-mog Can you try these on your end in the following order. Before/after each you should start/stop the cape.service:

  1. Run cleaners.py --clean.
  2. Try deleting all of the VMSS instances in Azure. Once they are completely removed, run cleaners.py --clean.
  3. Try the same steps as 2, but delete the entire VMSS instead of the instances.

I noticed that I was getting errors when a VMSS from a previous cape.service run. I'm pretty sure it was only when the previous run's machines still existed at startup. For my case, running the cleaners util fixed that.

@ChrisThibodeaux
Copy link
Contributor

ChrisThibodeaux commented Sep 18, 2024

Sorry in advanced for the long comment. I have tracked down what causes my issue described above. You may not be experiencing the same thing, but something very similar.

scheduler.py - Note the session context manager.

    def start(self):
        """Start scheduler."""
        if self.machinery_manager:
            with self.db.session.begin():
                self.machinery_manager.initialize_machinery()

machinery_manager.py

    def initialize_machinery(self) -> None:
        """Initialize the machines in the database and initialize routing for them."""
        try:
            self.machinery.initialize()

abstracts.py

    def initialize(self) -> None:
        """Read, load, and verify machines configuration."""
        # Machine table is cleaned to be filled from configuration file
        # at each start.
        self.db.clean_machines()

        # Load.
        self._initialize()

        # Run initialization checks.
        self._initialize_check()

database.py - Here, existing machines in the database are removed correctly, but the transaction is still within the original session context manager.

    def clean_machines(self):
        """Clean old stored machines and related tables."""
        # Secondary table.
        # TODO: this is better done via cascade delete.
        # self.engine.execute(machines_tags.delete())

        self.session.execute(machines_tags.delete())
        self.session.query(Machine).delete()

Finally, az.py - We override _initialize and _initialize_check, still acting within the scheduler.py session context manager. In the az machinery, begin_reimage_all is called on an existing, required VMSS with greater than 0 machines. This that flow.

def _initialize_check(self):
    ...
    self._set_vmss_stage()

def _set_vmss_stage(self):
    ...
        for vmss, vals in self.required_vmsss.items():
            if vals["exists"] and not self.options.az.just_start:
                if machine_pools[vmss]["size"] == 0:
                    self._thr_scale_machine_pool(self.options.az.scale_sets[vmss].pool_tag, True if vals["platform"] else False),
                else:
                    # Reimage VMSS!
                    thr = threading.Thread(
                        target=self._thr_reimage_vmss,
                        args=(vmss,),
                    )
                    vmss_reimage_threads.append(thr)
                    thr.start()
    ...
        for thr in vmss_reimage_threads + vmss_creation_threads:
            thr.join()
    
def _thr_reimage_vmss(self, vmss_name):
    ...
    try:
        async_reimage_all = Azure._azure_api_call(
            self.options.az.sandbox_resource_group,
            vmss_name,
            polling_interval=1,
            operation=self.compute_client.virtual_machine_scale_sets.begin_reimage_all,
        )
        _ = self._handle_poller_result(async_reimage_all)
    except CuckooMachineError as e:
    ...
    with self.db.session.begin():
        self._add_machines_to_db(vmss_name)

That new session context manager at the end is the issue, but also necessary. While the main thread responsible for initializing the az machinery has still not closed, we create a subthread that creates its own session context manager. Because the clean_machines transaction still hasn't been committed, the machines from the previous run are present as far as the subthread knows when it hits the DB.

In _add_machines_to_db, the variable machines_in_db has the machines we assume were removed, so this check is going to skip adding the machine to the DB:

            machines_in_db = self.db.list_machines()
            db_machine_labels = [machine.label for machine in machines_in_db]
            ...
            # Get all VMs in the VMSS
            paged_vmss_vms = Azure._azure_api_call(
                self.options.az.sandbox_resource_group,
                vmss_name,
                operation=self.compute_client.virtual_machine_scale_set_vms.list,
            )
            ...
                for vmss_vm in paged_vmss_vms:
                if vmss_vm.name in db_machine_labels:
                    # Don't add it if it already exists!
                    continue

Snippet of logs demonstrating this behavior:

[lib.cuckoo.core.machinery_manager] INFO: Using MachineryManager[az] with max_machines_count=0
[lib.cuckoo.core.scheduler] INFO: Creating scheduler with max_analysis_count=unlimited
[lib.cuckoo.core.database] WARNING: In database._Database.clean_machines before delete: self.list_machines() = [<Machine(2,'cape-guest-vmss-win10x64_20')>]
[lib.cuckoo.core.database] WARNING: In database._Database.clean_machines after delete: self.list_machines() = []
[modules.machinery.az] WARNING: Start of _set_vmss_stage: self.db.list_machines() = []
...
[modules.machinery.az] WARNING: Before joining threads in _set_vmss_stage: self.db.list_machines() = []
# Note, the log below returns the same list of machines that we got before we deleted them in clean_machines
[modules.machinery.az] WARNING: In _thr_reimage_vmss: self.db.list_machines() = [<Machine(2,'cape-guest-vmss-win10x64_20')>]
[modules.machinery.az] WARNING: Start of _add_machines_to_db: self.db.list_machines() = [<Machine(2,'cape-guest-vmss-win10x64_20')>].
[modules.machinery.az] DEBUG: Adding machines to database for cape-guest-vmss-win10x64.
[modules.machinery.az] DEBUG: Trying <bound method NetworkInterfacesOperations.list_virtual_machine_scale_set_network_interfaces of <azure.mgmt.network.operations._operations.NetworkInterfacesOperations object at 0x75b0a7712ef0>>(('XXX', 'cape-guest-vmss-win10x64'),{})
[modules.machinery.az] WARNING:In 'if vmss_vm.name in db_machine_labels' block of _add_machines_to_db: cape-guest-vmss-win10x64_20 exists in the db, skipping.
[modules.machinery.az] WARNING: After subthreads exit in _set_vmss_stage: self.db.list_machines() = [].
[root] CRITICAL: CuckooCriticalError: No machines available

The problem you're running into seems a little different. You're machines are somehow being added to ready_vmss_vm_threads, as evident by this log [modules.machinery.az] DEBUG: Machine vmss-cape-win10x64-60_0 was created and available in 0s. I am having a hard time understanding how you are reaching this point in the code, though.

From your logs, I think this is the part that indicates our problem is at least related. Time stamps line up.

# Likely logs from the creation of the session context manager in the scheduler
2024-09-18 18:26:45,417 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-09-18 18:26:45,417 [sqlalchemy.engine.Engine] INFO: BEGIN (implicit)

# The critical failure
2024-09-18 18:32:22,260 [root] CRITICAL: CuckooCriticalError: No machines available

# Further on, logs showing cached with one transaction pending.
2024-09-18 18:32:38,703 INFO sqlalchemy.engine.Engine [cached since 353.3s ago] {'status_1': 'pending'}
2024-09-18 18:32:38,703 [sqlalchemy.engine.Engine] INFO: [cached since 353.3s ago] {'status_1': 'pending'}

@ChrisThibodeaux
Copy link
Contributor

ChrisThibodeaux commented Sep 19, 2024

All that to say, I don't have a good idea for a fix here. Maybe it would be possible to move the self.db.clean_machines() call out of the initialize method, but I don't know if that is going to break things for folks using other hypervisors.

For the time being, my first comment has been a sure-fire workaround for this.

@cccs-mog
Copy link
Contributor Author

Hi @ChrisThibodeaux, I also faced that issue in the past and applied a similar approach for a temporary fix. It is somehow of a pain but only on each restart. It doesn't seem like anything changed so I think it's because I started with a fresh db or something like that. In any case like you point out, the issue is similar and is caused by the session span without commit interlocking needs and conditions which are contradicting. I would assume it is possible to keep the logic intact for every machinery but separate the session manager in smaller units which would fix it for AZ. I will do some tests, then PR so the fix can be tested and hopefully work well for all machinery.

@ChrisThibodeaux
Copy link
Contributor

ChrisThibodeaux commented Sep 19, 2024

@cccs-mog Looking forward to the PR, would be great to have a fix for this.

I am wondering now if this is related to another issue I have seen, maybe you can provide some insight here. Suppose I have:

  1. CAPE running with a VMSS
  2. A given machine finishes a task and we attempt to reimage it
  3. Somewhere, somehow, something goes wrong. From that point on, once any machine running an analysis finishes, it is unable to pick up a new task.
  4. Reimaged machine is also never able to pick up another task

It has only ever happened to me a couple times, and unfortunately I did not have the DB logs enabled at the time. I have patched the az.py machinery with a fix locally that has prevented it from ever happening again. I am hoping that this is somehow related to the current issues we have discussed.

Edit:

Had it crop up right after I made this comment... Seems like a couple of issues to unpack, all stemming from a problem on my end with Azure taking so long to delete/reimage machines. No idea what that's about.. For now, I'm going to copy what you have in your PR and bump the timeout up to 300 seconds.

Those timeouts make the threads call delete_machine, but the DB removals don't persist (per the session manager issue). Since out monitor thread -> scaling thread can't delete the machine before the scheduler assigns it a new task, Azure timeouts for reimaging/deletions are critical failures.

I have a fix by checking the machine in the az.start method. I'll be putting a PR up for it and a few other minor changes when I can confirm it fixes the problem. Here are the logs from when this happened if you're curious.

**# Two machines, 36/37, finish their tasks.**
2024-09-20 02:02:09,378 [lib.cuckoo.core.guest] INFO: Task #2: End of analysis reached! (id=cape-guest-vmss-win10x64_37, ip=10.2.2.7)
2024-09-20 02:02:10,686 [lib.cuckoo.core.guest] INFO: Task #3: End of analysis reached! (id=cape-guest-vmss-win10x64_36, ip=10.2.2.6)
2024-09-20 02:02:15,448 [lib.cuckoo.core.resultserver] DEBUG: Task #2: Stopped tracking machine 10.2.2.7
2024-09-20 02:02:15,451 [modules.machinery.az] DEBUG: Stopping machine 'cape-guest-vmss-win10x64_37'
2024-09-20 02:02:15,568 [lib.cuckoo.core.resultserver] DEBUG: Task #3: Stopped tracking machine 10.2.2.6
2024-09-20 02:02:15,570 [modules.machinery.az] DEBUG: Stopping machine 'cape-guest-vmss-win10x64_36'
**# Reimaging thread makes API call to reimage 36/37.**
2024-09-20 02:02:18,796 [modules.machinery.az] DEBUG: Trying <bound method VirtualMachineScaleSetsOperations.begin_reimage_all of <azure.mgmt.compute.v2024_03_01.operations._operations.VirtualM>
...
**# Reimaging thread removes the machines from the database and queues them to be removed from the VMSS.**
2024-09-20 02:04:19,123 [modules.machinery.az] DEBUG: Reimaging machines ['36', '37'] in vmss-win10x64 took too long, deleting them from the DB and the VMSS.
2024-09-20 02:04:19,134 [modules.machinery.az] DEBUG: Reimaging instances ['36', '37'] in vmss-win10x64 took 120s
**# Thread finishes and returns. Machines appear back in the DB.**
**# Machine 37 is scheduled for another task. As it still exists on the VMSS, it runs.**
2024-09-20 02:04:20,877 [lib.cuckoo.core.machinery_manager] INFO: Task #7: found useable machine vmss-win10x64_37 (arch=x64, platform=windows)
2024-09-20 02:04:20,877 [lib.cuckoo.core.scheduler] INFO: Task #7: Processing task
...
2024-09-20 02:04:20,899 [lib.cuckoo.core.analysis_manager] INFO: Task #7: Starting analysis of FILE '/tmp/cuckoo-tmp/upload_8feacaqv/XXX'
...
2024-09-20 02:04:20,935 [lib.cuckoo.core.guest] INFO: Task #7: Starting analysis on guest (id=cape-guest-vmss-win10x64_37, ip=10.2.2.7)
2024-09-20 02:04:20,947 [lib.cuckoo.core.guest] INFO: Task #7: Guest is running CAPE Agent 0.17 (id=cape-guest-vmss-win10x64_37, ip=10.2.2.7)
2024-09-20 02:04:20,999 [lib.cuckoo.core.guest] DEBUG: Task #7: Uploading analyzer to guest (id=cape-guest-vmss-win10x64_37, ip=10.2.2.7, size=16933455)
2024-09-20 02:04:21,414 [lib.cuckoo.core.guest] INFO: Task #7: Uploading script files to guest (id=cape-guest-vmss-win10x64_37, ip=10.2.2.7)
2024-09-20 02:04:21,900 [lib.cuckoo.core.resultserver] DEBUG: Task #7: live log analysis.log initialized
**# This was the original task of machine 37**
2024-09-20 02:04:22,356 [lib.cuckoo.core.resultserver] DEBUG: Task #2 had connection reset by peer for <Context for None>
...
**# VMSS instance deletion API call for 36/37**
2024-09-20 02:04:23,867 [modules.machinery.az] DEBUG: Trying <bound method VirtualMachineScaleSetsOperations.begin_delete_instances of <azure.mgmt.compute.v2024_03_01.operations._operations.Virt>
...
**# After deletion call, funky /status failed logs start to appear.**
Sep 20 02:04:36 hostname python3[2854305]: 2024-09-20 02:04:36,011 [lib.cuckoo.core.guest] WARNING: Task #7: Virtual Machine cape-guest-vmss-win10x64_37 /status failed. This can indicate the guest losing network connectivity
Sep 20 02:04:37 hostname python3[2854305]: 2024-09-20 02:04:37,015 [lib.cuckoo.core.guest] DEBUG: Task #7: Analysis is still running (id=cape-guest-vmss-win10x64_37, ip=10.2.2.7)
...
**# Deletion times out.**
2024-09-20 02:06:24,174 [modules.machinery.az] DEBUG: Deleting machines ['36', '37'] in cape-guest-vmss-win10x64 took too long.
2024-09-20 02:06:24,174 [modules.machinery.az] DEBUG: Deleting instances ['36', '37'] in cape-guest-vmss-win10x64 took 120s
...
**# Finally, machine 37 "finishes" the task.**
2024-09-20 02:08:41,187 [lib.cuckoo.core.guest] DEBUG: Task #7: Analysis is still running (id=cape-guest-vmss-win10x64_37, ip=10.2.2.7)
2024-09-20 02:08:43,251 [lib.cuckoo.core.guest] WARNING: Task #7: Virtual Machine cape-guest-vmss-win10x64_37 /status failed. This can indicate the guest losing network connectivity
2024-09-20 02:08:44,255 [lib.cuckoo.core.guest] INFO: Task #7: End of analysis reached! (id=cape-guest-vmss-win10x64_37, ip=10.2.2.7)
2024-09-20 02:08:44,403 [lib.cuckoo.core.resultserver] DEBUG: Task #7: Stopped tracking machine 10.2.2.7
2024-09-20 02:08:44,404 [modules.machinery.az] DEBUG: Stopping machine 'cape-guest-vmss-win10x64_37'
**# Reimaging attempt is made**
2024-09-20 02:08:44,627 [modules.machinery.az] DEBUG: Trying <bound method VirtualMachineScaleSetsOperations.begin_reimage_all of <azure.mgmt.compute.v2024_03_01.operations._operations.VirtualMa>
**# Attempt fails because the machine was removed from the VMSS.**
2024-09-20 02:08:44,757 [modules.machinery.az] WARNING: Failed to <bound method VirtualMachineScaleSetsOperations.begin_reimage_all of <azure.mgmt.compute.v2024_03_01.operations._operations.Virt>
Code: InvalidParameter
Message: The provided instanceId 37 is not an active Virtual Machine Scale Set VM instanceId.
Target: instanceIds': '(InvalidParameter) The provided instanceId 37 is not an active Virtual Machine Scale Set VM instanceId.
Code: InvalidParameter
Message: The provided instanceId 37 is not an active Virtual Machine Scale Set VM instanceId.
Target: instanceIds'.
2024-09-20 02:08:44,758 [modules.machinery.az] ERROR: CuckooMachineError('(InvalidParameter) The provided instanceId 37 is not an active Virtual Machine Scale Set VM instanceId.\nCode: InvalidPa>
lib.cuckoo.common.exceptions.CuckooMachineError: (InvalidParameter) The provided instanceId 37 is not an active Virtual Machine Scale Set VM instanceId.
Code: InvalidParameter
Message: The provided instanceId 37 is not an active Virtual Machine Scale Set VM instanceId.
Target: instanceIds:(InvalidParameter) The provided instanceId 37 is not an active Virtual Machine Scale Set VM instanceId.
Code: InvalidParameter
Message: The provided instanceId 37 is not an active Virtual Machine Scale Set VM instanceId.
Target: instanceIds
2024-09-20 02:08:44,766 [modules.machinery.az] WARNING: Machine cape-guest-vmss-win10x64_37 does not exist anymore. Deleting from database.
2024-09-20 02:08:49,436 [lib.cuckoo.core.analysis_manager] INFO: Task #7: Completed analysis successfully.
2024-09-20 02:08:49,442 [lib.cuckoo.core.analysis_manager] INFO: Task #7: analysis procedure completed

@ChrisThibodeaux
Copy link
Contributor

Hey @cccs-mog, very big thanks to you and @cccs-kevin for all the work y'all have done on this module and in the replies across different issues. Both have been an immense help to me.

I have a PR with a fix for what you faced here. I addressed a few other bugs dealing with timeouts, machine failures, etc. I tried to keep my changes in step with the spirit of the machinery as it was written, so please let me know if anything looks out of place.

Hope this helps!

@doomedraven
Copy link
Collaborator

so i guess we can close this now?

@cccs-mog
Copy link
Contributor Author

Exactly. Thanks a lot to you two !

@kevoreilly
Copy link
Owner

Much gratitude 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants