
fix: Add SIGTERM handling #297

Open · wants to merge 6 commits into base: main
Conversation

raizo07 (Contributor) commented Nov 5, 2024:

Add SIGTERM handling to scripts/data/client.py


raizo07 (Contributor, Author) commented Nov 5, 2024:

@maciejka It exits correctly.
[screenshot]

logger.debug("Producer is waiting for weight to be released.")
weight_lock.wait()  # Wait for the condition to be met
Collaborator:

What will happen when SIGTERM is sent while a thread is waiting for a lock?

Contributor Author:

Threads waiting on the lock remain blocked until they are woken (by another thread's notify or by a wait timeout); only then can they check shutdown_event and exit.
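The behavior described above can be sketched as a wait loop that re-checks shutdown_event after every wake-up or timeout. Names like `weight_lock` and `shutdown_event` mirror the diff; the helper function itself is hypothetical, not the PR's actual code:

```python
import threading

# Hypothetical globals mirroring the discussion: shutdown_event records a
# SIGTERM request, weight_lock guards the shared weight counter.
shutdown_event = threading.Event()
weight_lock = threading.Condition()

def wait_for_capacity(has_capacity, timeout=1.0):
    """Wait on the condition, re-checking shutdown_event periodically so a
    SIGTERM received mid-wait is not missed. Returns False on shutdown."""
    with weight_lock:
        while not has_capacity() and not shutdown_event.is_set():
            # wait() returns after notify()/notify_all() or the timeout;
            # either way, the loop re-checks shutdown_event.
            weight_lock.wait(timeout=timeout)
    return not shutdown_event.is_set()
```

The timeout bounds how long a waiter can stay blocked without noticing the shutdown request, which is exactly the latency the reviewer is probing.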

Collaborator:

This means SIGTERM will not terminate the program immediately?

Contributor Author:

Yes, it will not terminate the program immediately. Instead, it initiates a graceful shutdown: active threads finish their current tasks, and waiting threads proceed once they acquire the lock, observe shutdown_event, and exit.

Collaborator:

The point of sending SIGTERM is to stop immediately.

raizo07 requested a review from maciejka on November 6, 2024, 07:34
…nding jobs, stop all running processes shutdown the thread pool without waiting and exit
def process_batch(job):
arguments_file = job.batch_file.as_posix().replace(".json", "-arguments.json")
if shutdown_event.is_set():
maciejka (Collaborator) commented Nov 6, 2024:

shutdown_event is checked in job_consumer, no need to recheck it here

Collaborator: [screenshot]

error = result.stdout or result.stderr
if result.returncode == -9:
match = re.search(r"gas_spent=(\d+)", result.stdout)
while process.poll() is None:
Collaborator:

Extract the process-running logic into a standalone function (name suggestion: run?) that will throw an exception in case of shutdown.

# If executor exists, shutdown immediately
if executor:
logger.info("Shutting down thread pool...")
executor.shutdown(wait=False)
Collaborator:

Why is wait False? Don't we want to wait until it has shut down?
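For context on the question: concurrent.futures supports both behaviors. A minimal sketch (the task is hypothetical; cancel_futures requires Python 3.9+):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task(n):
    # Stand-in for a batch job.
    time.sleep(0.1)
    return n * 2

executor = ThreadPoolExecutor(max_workers=2)
futures = [executor.submit(slow_task, i) for i in range(4)]

# wait=True blocks until the tasks already running have finished;
# cancel_futures=True drops the jobs still queued instead of running them.
# wait=False returns immediately while worker threads keep running in the
# background, which is rarely what a shutdown path wants.
executor.shutdown(wait=True, cancel_futures=True)

done = [f for f in futures if f.done() and not f.cancelled()]
cancelled = [f for f in futures if f.cancelled()]
```

After this shutdown call, every future is either completed or cancelled; nothing is left running unsupervised.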

scripts/data/client.py (outdated, resolved)
shutdown_event.set()

# Clear the job queue
while not job_queue.empty():
Collaborator:

A signal handler modifying job_queue does not look like good design. I would just set shutdown_event here.
In job_producer, when shutdown is detected, all consumers can be notified with notify_all. Clearing job_queue does not seem necessary.

def process_batch(job):
arguments_file = job.batch_file.as_posix().replace(".json", "-arguments.json")
if shutdown_event.is_set():
Collaborator: [screenshot]

af.write(str(format_args(job.batch_file, False, False)))

result = subprocess.run(
def run_process(arguments_file, job):
Collaborator:

run_process should be more generic: it should just manage the subprocess and return the error (stdout or stderr). Error interpretation should be done in process_batch. I want to keep a clear separation between functions that facilitate multiprocessing and those that handle business logic (job_generator, process_batch).

for _ in range(THREAD_POOL_SIZE):
job_queue.put(None)

if not shutdown_event.is_set():
Collaborator:

Why is it necessary? If there are no more jobs, we just let the consumers finish, no?

Collaborator:

If you let this happen during shutdown you will not need to wait with a timeout in line 224. The information that there is no more work will reach the consumers.
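The sentinel pattern from the diff above, sketched end-to-end: one None per consumer guarantees every worker drains the remaining jobs and then exits, with no timeout-based joins needed. This is a hypothetical minimal version of the scheme under discussion:

```python
import queue
import threading

THREAD_POOL_SIZE = 4
job_queue = queue.Queue()
processed = []

def job_consumer():
    while True:
        job = job_queue.get()
        if job is None:  # sentinel: no more work will ever arrive
            job_queue.task_done()
            break
        processed.append(job)
        job_queue.task_done()

def finish_producing():
    # One sentinel per consumer: because the queue is FIFO, all real
    # jobs are consumed before any worker sees its sentinel.
    for _ in range(THREAD_POOL_SIZE):
        job_queue.put(None)

threads = [threading.Thread(target=job_consumer) for _ in range(THREAD_POOL_SIZE)]
for t in threads:
    t.start()
for i in range(10):
    job_queue.put(i)
finish_producing()
for t in threads:
    t.join()
```

Since every consumer is woken by its own sentinel, join() returns promptly with no timeout parameter.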

def job_producer(job_gen):
global current_weight

try:
for job, weight in job_gen:
# Wait until there is enough weight capacity to add the new block
if shutdown_event.is_set():
Collaborator:

Is this check necessary? shutdown_event is checked a few lines below.

for height in height_range:
if shutdown_event.is_set():
Collaborator:

No need to check shutdown_event here, it is checked in job_producer:
[screenshot]

logger = logging.getLogger(__name__)

# Constants
MAX_WEIGHT_LIMIT = 8000 # Total weight limit for all jobs
THREAD_POOL_SIZE = os.cpu_count() # Number of threads for processing
QUEUE_MAX_SIZE = THREAD_POOL_SIZE * 2 # Maximum size of the job queue

BASE_DIR = Path(".client_cache")
BASE_DIR = Path(__file__).resolve().parent / ".client_cache" # Use absolute path
Collaborator:

Why is this change necessary?

af.write(str(format_args(job.batch_file, False, False)))

result = subprocess.run(
def run_process(arguments_file):
Collaborator:

It should take the whole command as an argument.



# Producer function: Generates data and adds jobs to the queue
def job_producer(job_gen):
Collaborator:

Why did you remove the comments?

scripts/data/client.py (resolved)
(job, weight) = work_to_do

# Process the block
job, weight = work_to_do
try:
Collaborator:

Why is the log removed here?


# Mark job as done
job_queue.task_done()
finally:
Collaborator:

Why is the except block removed? Exceptions might happen during execution. You probably need to handle the process-termination exception explicitly.

maciejka (Collaborator):

Hey @raizo07, there are many unanswered comments. Please respond to all of them.

raizo07 (Contributor, Author) commented Nov 23, 2024:

@maciejka Okay, I'll respond to them

Successfully merging this pull request may close these issues.

[feat] Add SIGTERM handling to scripts/data/client.py