Network clustering #1053

Open
wants to merge 1 commit into
base: main

Conversation

edlerd
Collaborator

@edlerd edlerd commented Jan 11, 2025

Done

  • Adjust the network list to show cluster-member-specific networks
  • Split network list entries and detail pages into one entry/page per cluster member for physical interfaces
  • Add a pattern for cluster-specific inputs for the parent of a physical managed network. It is meant to be reused for other cluster-specific inputs, such as server settings or storage pool configuration

QA

  1. Run the LXD-UI:
  2. Perform the following QA steps:
    • Browse the network list in an unclustered backend, check the filters
    • Browse the network list in a clustered backend, use the filters, and check that clicking a cluster member chip applies the filter
    • Create and edit a physical network in an unclustered backend
    • Create and edit a physical network in a clustered backend. Ensure the connections diagram updates and the chips in it link correctly. Ensure the chips in the "parent" selector link correctly. Try changing and breaking the parent selector in the clustered backend when creating or editing a physical network.
    • Browse a physical unmanaged network in a clustered and unclustered backend


@edlerd edlerd force-pushed the network-clustering branch 6 times, most recently from 54c1afa to e2e4075 Compare January 15, 2025 15:44
@edlerd edlerd changed the title Network clustering (wip) Network clustering Jan 15, 2025
@edlerd edlerd marked this pull request as ready for review January 15, 2025 15:46
@edlerd edlerd force-pushed the network-clustering branch 2 times, most recently from add1260 to 2379d77 Compare January 15, 2025 16:49

@MasWho MasWho left a comment

Some interim code comments. The network list with filtering looks pretty good (cluster and non-cluster conditions) from QA perspective. Will go through the other QA items as well.

  isLoading: isClusterNetworksLoading,
} = useQuery({
  queryKey: [queryKeys.networks, "default", queryKeys.cluster],
  queryFn: () => fetchClusterMemberNetworks("default", clusterMembers),

I think there is a specific scenario we need to take into consideration here. It is possible to create OVN networks for specific restricted projects (even with IP ranges that overlap other existing OVN networks). I'm not sure whether fetching networks from the default project will surface those.

See this video ref, watch from around 17:00

Collaborator Author

Yes, good point. We should use the current project here and not "default". We want to fetch all real interfaces for each cluster member. Those are either available in the current project (if it has features.networks set to false) or not available. In the latter case, we should not show them.
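
For illustration, a minimal sketch of fetching networks per cluster member for the current project instead of a hard-coded "default". The type shapes, the injected `fetcher` and both function names are hypothetical and are not taken from this PR. The `project`, `target` and `recursion` query parameters are assumed to be accepted by the LXD networks endpoint.

```typescript
// Hypothetical shapes; the real LXD-UI types may differ.
interface ClusterMember {
  server_name: string;
}

interface NetworkSummary {
  name: string;
  managed: boolean;
}

interface MemberNetwork extends NetworkSummary {
  memberName: string;
}

// Assumed helper that performs the request and returns the parsed network list.
type Fetcher = (path: string) => Promise<NetworkSummary[]>;

// Build the request path for one cluster member in the given project.
const memberNetworksPath = (project: string, memberName: string): string =>
  "/1.0/networks" +
  `?project=${encodeURIComponent(project)}` +
  `&target=${encodeURIComponent(memberName)}` +
  "&recursion=1";

// Fetch the networks of every member and tag each entry with its member name.
const fetchNetworksForMembers = async (
  project: string,
  members: ClusterMember[],
  fetcher: Fetcher,
): Promise<MemberNetwork[]> => {
  const perMember = await Promise.all(
    members.map(async (member) => {
      const path = memberNetworksPath(project, member.server_name);
      const networks = await fetcher(path);
      return networks.map((network) => ({
        ...network,
        memberName: member.server_name,
      }));
    }),
  );
  // Flatten the per-member arrays into one list.
  return ([] as MemberNetwork[]).concat(...perMember);
};
```

Passing the current project through to the path builder keeps the query key and the request in sync, so switching projects would refetch rather than reuse the default project's result.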

Collaborator

It is a pretty niche case though, afaik this only applies to OVN networks due to their virtual nature

Collaborator Author

The uplink network for OVN needs to be in the default project. In the uplink selector we do fetch networks from the default project regardless of which project the user is currently in. I think we should not show those in the network list, though. So I think we are doing the right thing in all cases.

@mas-who
Collaborator

mas-who commented Jan 16, 2025

Some QA observations below:

  1. When clicking on the cluster member resource links in the network list table, all existing search params get cleared. Why not add the member search to the existing search params? Is it because it would be a bit weird when the member search is already present, since it would look like nothing is happening?

  2. Noticed that on smaller screen sizes the action buttons and the search filter are positioned weirdly. Maybe we can add the actions in a contextual menu similar to how we do it in the instance detail page. Then there will be more space for the search and filter as well. wdyt?
    Screenshot from 2025-01-16 13-40-30

  3. When creating a clustered physical network with the following parent configs
    Screenshot from 2025-01-16 14-00-19
    On submission I get the following error:
    Screenshot from 2025-01-16 14-01-03
    Trying to create the network with the same name fails at this point, as it already exists in LXD.
    Screenshot from 2025-01-16 14-01-46
    I think we should disable the submit button if parents are not selected for all members?
    NOTE: the edit case seems fine; the backend seems to block the operation and the configs do not get persisted.

  4. After creating a physical network with some parent interface across cluster members, trying to create another network using the same parent interfaces results in a creation error indicating they are in use. However, the network still gets created in an "Errored" state.

  5. Clicking on the member resource link on a network detail page does not result in search params being set after redirect to the network list page. Is that intended?

  6. After creating an OVN network with a physical uplink, it is possible to delete the physical uplink. However, in the network topology for the OVN network, the deleted uplink still shows up somehow.

  7. Observations for when a cluster member is down:
    a. Trying to create a physical network results in an error message "peer node 10.94.160.130:8443 is down". However, the network gets created and shows up in the network list with the "Cluster-wide" member category.
    b. It is not possible to edit a physical network with the same error message as above.
    c. It is not possible to delete the physical network with the same error message as above.
    d. Trying to visit the physical network detail page for the VM that is down results in a 500 error "Missing event connection with target cluster member". Currently the detail page loads for a long time and then shows a blank page; this should probably be handled by displaying the error message.
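
To make observation 3 concrete, here is a rough sketch of a check that could drive the disabled state of the submit button. `ClusterSpecificValues` is assumed to be a plain map from member name to parent interface, and `membersMissingParent` is a hypothetical helper, not code from this PR.

```typescript
// Assumed shape: cluster member name -> selected parent interface.
type ClusterSpecificValues = Record<string, string>;

// Return the members that have no parent selected (missing or blank value).
const membersMissingParent = (
  memberNames: string[],
  parents: ClusterSpecificValues | undefined,
): string[] =>
  memberNames.filter((name) => (parents?.[name] ?? "").trim() === "");

// The submit button would be disabled while this returns false.
const canSubmitPhysicalNetwork = (
  memberNames: string[],
  parents: ClusterSpecificValues | undefined,
): boolean => membersMissingParent(memberNames, parents).length === 0;
```

Returning the list of incomplete members, rather than only a boolean, would also allow the form to point at the members that still need a parent.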

@edlerd edlerd force-pushed the network-clustering branch 3 times, most recently from 8f766bf to b0927a5 Compare January 16, 2025 17:22
@edlerd
Collaborator Author

edlerd commented Jan 16, 2025

Resolved issues 1-3 and 5.

> 4. After creating a physical network with some parent interface across cluster members, trying to create another network using the same parent interfaces results in a creation error indicating they are in use. However, the network still gets created in an "Errored" state.

This sounds like the expected behaviour. Wdyt?

> 6. After creating an OVN network with a physical uplink, it is possible to delete the physical uplink. However, in the network topology for the OVN network, the deleted uplink still shows up somehow.

This shows up because the uplink config of the OVN network doesn't change when deleting the uplink network. It might be that LXD should refuse to delete the uplink, but that should be in the API then, not in the UI itself.

> 7. Observations for when a cluster member is down:
>     a. Trying to create a physical network results in an error message "peer node 10.94.160.130:8443 is down". However, the network gets created and shows up in the network list with the "Cluster-wide" member category.
>     b. It is not possible to edit a physical network with the same error message as above.
>     c. It is not possible to delete the physical network with the same error message as above.
>     d. Trying to visit the physical network detail page for the VM that is down results in a 500 error "Missing event connection with target cluster member". Currently the detail page loads for a long time and then shows a blank page; this should probably be handled by displaying the error message.

This needs future work.

@edlerd edlerd force-pushed the network-clustering branch 2 times, most recently from 016a2d5 to 5b8158b Compare January 16, 2025 17:48
@mas-who
Collaborator

mas-who commented Jan 17, 2025

> Resolved issues 1-3 and 5.
>
> > 4. After creating a physical network with some parent interface across cluster members, trying to create another network using the same parent interfaces results in a creation error indicating they are in use. However, the network still gets created in an "Errored" state.
>
> This sounds like the expected behaviour. Wdyt?

My main concern is that the network gets created even when it's not valid. Would it be possible to check upon parent selection if it is already used by another network and reflect that as an error message on the creation / edit form? If that's too complex I think the current behaviour is also fine.

> > 6. After creating an OVN network with a physical uplink, it is possible to delete the physical uplink. However, in the network topology for the OVN network, the deleted uplink still shows up somehow.
>
> This shows up because the uplink config of the OVN network doesn't change when deleting the uplink network. It might be that LXD should refuse to delete the uplink, but that should be in the API then, not in the UI itself.

Noted, perhaps we should raise this with the core team?

> > 7. Observations for when a cluster member is down:
> >     a. Trying to create a physical network results in an error message "peer node 10.94.160.130:8443 is down". However, the network gets created and shows up in the network list with the "Cluster-wide" member category.
> >     b. It is not possible to edit a physical network with the same error message as above.
> >     c. It is not possible to delete the physical network with the same error message as above.
> >     d. Trying to visit the physical network detail page for the VM that is down results in a 500 error "Missing event connection with target cluster member". Currently the detail page loads for a long time and then shows a blank page; this should probably be handled by displaying the error message.
>
> This needs future work.

Noted 👍
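
A rough sketch of the client-side check proposed in the reply on item 4 above (flagging a parent that another network already uses). The `NetworkOnMember` shape, the `config.parent` field and `findParentConflicts` are assumptions made for illustration. Whether reusing a parent must always be rejected is not settled in this thread, so the helper only reports matches and leaves the decision (error or warning) to the form.

```typescript
// Assumed shape: cluster member name -> selected parent interface.
type ClusterSpecificValues = Record<string, string>;

// Hypothetical view of an existing network as seen on one cluster member.
interface NetworkOnMember {
  name: string;
  memberName: string;
  config: { parent?: string };
}

// Map each member whose selected parent is already used to the network using it.
const findParentConflicts = (
  selection: ClusterSpecificValues,
  existing: NetworkOnMember[],
  ownName?: string,
): Record<string, string> => {
  const conflicts: Record<string, string> = {};
  existing.forEach((network) => {
    // When editing, the network being edited must not conflict with itself.
    if (network.name === ownName) {
      return;
    }
    const selected = selection[network.memberName];
    if (selected && network.config.parent === selected) {
      conflicts[network.memberName] = network.name;
    }
  });
  return conflicts;
};
```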

@edlerd edlerd force-pushed the network-clustering branch from 5b8158b to 9c42ee6 Compare January 17, 2025 08:45
@edlerd
Collaborator Author

edlerd commented Jan 17, 2025

Improved the error handling; that should resolve 7.

> > > 4. After creating a physical network with some parent interface across cluster members, trying to create another network using the same parent interfaces results in a creation error indicating they are in use. However, the network still gets created in an "Errored" state.
> >
> > This sounds like the expected behaviour. Wdyt?
>
> My main concern is that the network gets created even when it's not valid. Would it be possible to check upon parent selection if it is already used by another network and reflect that as an error message on the creation / edit form? If that's too complex I think the current behaviour is also fine.

I am not 100% sure we can never reuse an interface as parent. Reported this edge case to lxd: canonical/lxd#14810

> > > 6. After creating an OVN network with a physical uplink, it is possible to delete the physical uplink. However, in the network topology for the OVN network, the deleted uplink still shows up somehow.
> >
> > This shows up because the uplink config of the OVN network doesn't change when deleting the uplink network. It might be that LXD should refuse to delete the uplink, but that should be in the API then, not in the UI itself.
>
> Noted, perhaps we should raise this with the core team?

I tried to reproduce it, but was getting an error. I couldn't delete the uplink in use by the OVN network. So this might not be an issue after all.

@edlerd edlerd force-pushed the network-clustering branch 2 times, most recently from bf5ade3 to fb1b4e9 Compare January 17, 2025 11:55
@edlerd edlerd force-pushed the network-clustering branch from fb1b4e9 to b52956e Compare January 17, 2025 12:22
Collaborator

@mas-who mas-who left a comment

This is looking really good! Couldn't actually find that many issues QA wise, left some code comments

export const fetchNetworksFromClusterMembers = (
  project: string,
  clusterMembers: LxdClusterMember[],
): Promise<LXDNetworkOnClusterMember[]> => {
Collaborator

love how simple things look now with the latest type update! 👍

  project: string,
  parentsPerClusterMember?: ClusterSpecificValues,
): Promise<void> => {
  if (!parentsPerClusterMember) {
Collaborator

Not quite sure why we need this here. For this scenario, can't we just use updateNetwork directly wherever we are calling updateClusterNetwork? Perhaps the handling of an undefined parentsPerClusterMember should live in the component instead?

if (error) {
  notify.failure("Loading networks failed", error);
}
useEffect(() => {
Collaborator

Why does this need to be in a useEffect?

export const intersection = (lists: string[][]): string[] => {
  const result = [];

  for (let i = 0; i < lists.length; i++) {
Collaborator

A potential way to improve the efficiency and readability of this could be to hold an object whose keys are the unique strings from lists and whose values are the number of lists each string occurs in. As we iterate over the lists, we convert each list into a set, then increment the count for each unique string. At the end, every string whose count equals the number of input lists appears in all of them.
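
A minimal sketch of the counting approach described in this comment. It uses a `Map` rather than a plain object for the counts, which is a small deviation chosen to avoid clashes with inherited object keys. The original body of `intersection` is not shown in the thread, so this is only one possible rewrite.

```typescript
// Return the strings that occur in every one of the given lists.
const intersection = (lists: string[][]): string[] => {
  // Number of lists in which each string occurs.
  const counts = new Map<string, number>();

  for (const list of lists) {
    // De-duplicate first so a string repeated within one list counts once.
    new Set(list).forEach((item) => {
      counts.set(item, (counts.get(item) ?? 0) + 1);
    });
  }

  // Keep the strings that were counted once per input list.
  const result: string[] = [];
  counts.forEach((count, item) => {
    if (count === lists.length) {
      result.push(item);
    }
  });
  return result;
};
```

Each element is visited once, so the work grows with the total number of items rather than with repeated scans of every list. Results come back in order of first appearance.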

  networkOnMembers?: LXDNetworkOnClusterMember[],
): NetworkFormValues => {
  const parentPerClusterMember: ClusterSpecificValues = {};
  networkOnMembers?.map(
Collaborator

Suggested change
- networkOnMembers?.map(
+ networkOnMembers?.forEach(


const setValueForAllMembers = (value: string) => {
  const update: ClusterSpecificValues = {};
  options.map((item) => (update[item.memberName] = value));
Collaborator

Suggested change
- options.map((item) => (update[item.memberName] = value));
+ options.forEach((item) => (update[item.memberName] = value));
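
The reasoning behind the suggestion: `map` builds and returns a new array that is discarded here, while `forEach` signals that only the side effect matters. A self-contained sketch, with `MemberOption` and `valueForAllMembers` as hypothetical stand-ins for the component's own types and closure:

```typescript
// Assumed shape: cluster member name -> value.
type ClusterSpecificValues = Record<string, string>;

// Hypothetical option type; only the member name is needed here.
interface MemberOption {
  memberName: string;
}

// Assign the same value to every cluster member.
const valueForAllMembers = (
  options: MemberOption[],
  value: string,
): ClusterSpecificValues => {
  const update: ClusterSpecificValues = {};
  // forEach: we mutate `update` and do not need a returned array.
  options.forEach((item) => {
    update[item.memberName] = value;
  });
  return update;
};
```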

label="Same for all cluster members"
checked={!isSpecific}
onChange={() => {
  if (isSpecific) {
Collaborator

shouldn't we place the setValueForAllMembers logic inside a useEffect?

<ResourceLink
type="cluster-member"
value={item.memberName}
to={`/ui/project/${project}/networks?member=${item.memberName}`}
Collaborator

do we always want to redirect to the networks page with member filter? This feels a little strange, not sure what's the best way to approach it though, maybe expose this as a function prop?
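
One way the function-prop idea could look, sketched without React so it stays self-contained. `MemberLinkBuilder`, `ChipLinkOptions` and `resolveChipLink` are hypothetical names. The default builder mirrors the path used in the snippet above, with URL encoding added.

```typescript
// A caller-supplied function that decides where a member chip links to.
type MemberLinkBuilder = (project: string, memberName: string) => string;

// Default: the network list filtered by cluster member.
const networkListMemberLink: MemberLinkBuilder = (project, memberName) =>
  `/ui/project/${encodeURIComponent(project)}/networks` +
  `?member=${encodeURIComponent(memberName)}`;

interface ChipLinkOptions {
  project: string;
  memberName: string;
  // Optional override, e.g. passed down as a component prop.
  buildLink?: MemberLinkBuilder;
}

// Resolve the link target, falling back to the network list filter.
const resolveChipLink = ({
  project,
  memberName,
  buildLink = networkListMemberLink,
}: ChipLinkOptions): string => buildLink(project, memberName);
```

With this shape, the selector keeps its current behaviour by default, and another page could pass its own builder without the component knowing about that page's routes.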

Comment on lines +117 to +119
onChange={(value) => {
  void formik.setFieldValue("parentPerClusterMember", value);
}}
Collaborator

Suggested change
- onChange={(value) => {
-   void formik.setFieldValue("parentPerClusterMember", value);
- }}
+ onChange={(value) => void formik.setFieldValue("parentPerClusterMember", value)}

  network: LXDNetworkOnClusterMember;
}

const NetworkClusterMemberChip: FC<Props> = ({ network }) => {
Collaborator

nice extraction!

@mas-who
Collaborator

mas-who commented Jan 17, 2025

QA comments:

  1. On medium screen size, the network list table gets cut off, should we make the table width adjust with viewport width until they turn into cards?
    Screenshot from 2025-01-17 15-47-06

  2. When creating a physical network, should we pre-select a network interface when the "Same for all cluster members" option is checked?

5 participants