Skip to content

Conversation

Gawthaman
Copy link

Integrating MongoDB MVP

@Copilot Copilot AI review requested due to automatic review settings July 7, 2025 21:48
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR integrates a MongoDB-backed MVP for persisting and replaying Ax optimization trials for the Branin benchmark.

  • Adds a MongoDB client connection and collections for storing experiments and trials
  • Implements loading of existing trials and replaying them before starting new ones
  • Persists each completed trial back to MongoDB
Comments suppressed due to low confidence (1)

scripts/mongodbintegrationmvp.py:22

  • [nitpick] The variable name tmongo_client is unclear. Rename it to mongo_client for better readability and consistency.
tmongo_client = MongoClient("mongodb://localhost:27017/") 

@sgbaird
Copy link
Member

sgbaird commented Jul 7, 2025

@Gawthaman thanks! Could you have a look at facebook/Ax#2975 (comment)? I think you may need to use the built in serialization function directly.

Copy link
Member

@sgbaird sgbaird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this avoids the issue of never switching to the next generation strategy, there are some other issues with it related to preserving space-filling properties. Let's try with Ax's built-in method (see comment).

remaining_sobol = max(SOBOL_TRIALS - n_existing, 0)

if remaining_sobol > 0:
generation_strategy = GenerationStrategy([
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please give this a try using Ax's built-in serialization method (save_to_json_file) and using the JSON snapshots that get sent to and reloaded from. In the current implementation, it doesn't preserve the Sobol sequence.

It should follow the same logic in https://colab.research.google.com/drive/1A2p1oUSsD8Edlu2haaB-FBSSjim3HTLS?usp=sharing

Please also add to each MongoDB document the timestamp of when the snapshot is saved and use the timestamp to load the most recent snapshot each time.

Copy link
Member

@sgbaird sgbaird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you were ready for a review, but I took a look and made a couple comments. Makes sense to have the save and load helper functions. Overall the logic looks solid to me


if ax_client is None:
# Create new experiment
generation_strategy = create_generation_strategy(SOBOL_TRIALS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please unwrap this function

Copy link
Member

@sgbaird sgbaird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gawthaman nice! Overall seems to capture the right logic. Many print statements and try-excepts can be removed. See other comments.

Once addressed, could you record a video demonstrating the usage of the script and share? (Running the campaign, artificially/ungracefully stopping partway at different points in the script, restarting it, etc.).

Probably upload to YouTube as unlisted and paste here (video would probably be over 100 MB). Narrated or unnarrated is fine.

except errors.ServerSelectionTimeoutError:
print("Failed to connect to MongoDB. Is MongoDB running?")
exit(1)
except Exception as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this explicit error handling? Seems like it would just bubble up naturally (unless you found that the error that bubbled up naturally was non-descript).

As a note for later, we'll set this up with a MongoDB Atlas cluster

SOBOL_TRIALS = 5


def save_ax_snapshot_to_mongodb(ax_client, experiment_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return the database ID of the snapshot?

if record:
# Save snapshot data to temporary file
temp_file = f"temp_{experiment_name}_snapshot.json"
with open(temp_file, "w") as f:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this handle the case where the file already exists? Worth adding the database ID to the filename in addition to experiment name to avoid write conflicts?



# Load existing experiment or create new one
ax_client = load_ax_snapshot_from_mongodb(obj1_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't reuse obj1_name like this. Instead define a separate variable. Could incorporate obj1_name via f-string. Probably good to also add a hard-coded set of 4 characters (randomly generated externally) to it.

max_parallelism=5,
model_kwargs={"seed": 999}, # For reproducibility
),
GenerationStep(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're using snapshots, probably ok to leave to defaults and not specify an explicit generation strategy

print(f"Created new experiment with {SOBOL_TRIALS} Sobol trials")

# Save initial snapshot
save_ax_snapshot_to_mongodb(ax_client, obj1_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about experiment name

print("Resuming existing experiment")

# Get current trial count to determine how many more trials to run
current_trials = ax_client.get_trials_data_frame()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point about needing to handle max trials in the for loop. I suppose an alternative would be to have a budget variable stored in MongoDB that gets updated, but ignore that for now. More of a note to self/musing.

f"Trial {trial_index}: result={results:.3f} | "
f"Best so far: {best_value:.3f}"
)
except Exception:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably remove these try-excepts

print(f"Total trials completed: {len(trials_df)}")
print(f"Best objective value: {trials_df[obj1_name].min():.6f}")

except Exception as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, no need for try except here

print(f"Error getting best parameters: {e}")

# Clean up MongoDB connection
mongo_client.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to close the client. I suppose a top-level try-except-finally could be implemented at some point if we find that lots of mongo connections are being leftover during restarts, but I think probably not an issue. Again, just a musing

@sgbaird
Copy link
Member

sgbaird commented Jul 24, 2025

Also good to switch to using a free-tier MongoDB Atlas instance prior to recording video, and showing the snapshots in the cloud.

Some (not all) bits from https://github.com/ACC-HelloWorld/5-data-logging#mongodb-setup-and-github-secrets can help with setup. Ignore AWS setup for example.

@Gawthaman Gawthaman requested a review from sgbaird July 28, 2025 18:18
@Gawthaman
Copy link
Author

Copy link
Member

@sgbaird sgbaird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Made a few comments. Overall looks clean and minimal. Thanks for making the edits I requested related to unwrapping

Please put this under a subdir called hitl-bo within https://github.com/Gawthaman/ac-training-lab/tree/main/scripts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Effectively a helper script for your demo video, correct?

return ''.join(random.choices(string.ascii_lowercase + string.digits, k=length))


def get_user_choice():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e., if the script gets interrupted after a user has already begun their wetlab experiment, they get to say whether they want to provide an update of results from the previous experiment or not?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, whenever the script is run, it asks the user if they want to start a new experiment or continue from the last experiment. It mainly works as a workaround since I dont think the script knows if the last experiment was finished or not, so if it was finished, then the user can just start a new experiment. It also gives the user the chance to start a new experiment from scratch if they decide they don't want to start from the last (interrupted) experiment.



# Get user choice and setup configuration
user_choice = get_user_choice()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nvm - experiment in the Ax definition (i.e., the whole campaign). This seems fine.

print(f"Trial {trial_index}: x1={x1:.3f}, x2={x2:.3f}")

# Save snapshot before running experiment (preserves pending trial)
save_ax_snapshot_to_mongodb(ax_client, experiment_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good that you're doing before and after snapshots. Not sure if we'd want to distinguish this difference in the payload (or filename)

obj1_name = "branin"
MAX_TRIALS = 19 # Configuration constant

# Experiment identifier (separate from objective name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Experiment identifier (separate from objective name)
# Experiment identifier (separate from objective name) with hardcoded unique ID


# Start the experiment as a subprocess
process = subprocess.Popen([
sys.executable, "Untitled-1.py"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see lots of references to "Untitled-1.py"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially working on this in a different temp folder to test out different scenarios, but I forgot to fix file names, I'll fix that asap

…ation scripts

- Deleted old persistence testing scripts: simple_test.py and test_persistence.py.
- Added new scripts for manual and automated testing of MongoDB persistence: check_persistence.py, mongodbintegration.py, mongodbintegrationmvp.py, simple_test.py, and test_persistence.py.
- Enhanced MongoDB connection handling and error reporting in all scripts.
- Implemented functionality to save and load experiment snapshots to/from MongoDB.
- Added user prompts for experiment continuation or creation in the new integration scripts.
- Improved trial management and recovery testing in the new test scripts.
Copy link
Member

@sgbaird sgbaird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll also need to think about what happens if someone decides to change the optimization setup later on. For example, adding a new parameter to explore. This script more or less assumes that the campaign doesn't change from start to end, and likewise that it starts from scratch (that's fine, this is meant to be a MWE, and the ability to stop/restart is well-received).

The previously saved AxClient objects will serve as a record regardless. While using these as checkpoints is convenient, it also reduces some of the flexibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HiTL BO tutorial - Integrate BO with MongoDB instead of relying solely on a locally stored save_to_json_file
2 participants