Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dotnet workload restore runtime jumps from 25 seconds to 4 minutes when nuget feed is AzDO instead of nuget.org #43870

Open
AArnott opened this issue Oct 3, 2024 · 20 comments
Labels
Area-Workloads needs team triage Requires a full team discussion
Milestone

Comments

@AArnott
Copy link
Contributor

AArnott commented Oct 3, 2024

Describe the bug

Frequent pipeline failures due to nuget.org not responding to package restore requests drives me to consider using a private Azure Artifacts feed instead of nuget.org in nuget.config.

But this change has an unexpected side-effect: the time to run dotnet workload restore jumps extremely:

Agent OS nuget.org AzDO feed Time multiplier
Windows 30 s 80 s 2.6X
Linux 17 s 58 s 3.4X
macOS 25 s 244 s 9.7X

The macOS agent is the only one to display the following errors, which it does only after the change:

Installing workloads: 

Failed to update the advertising manifest microsoft.net.sdk.android: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Failed to update the advertising manifest microsoft.net.sdk.ios: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Failed to update the advertising manifest microsoft.net.sdk.maccatalyst: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Failed to update the advertising manifest microsoft.net.sdk.macos: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Failed to update the advertising manifest microsoft.net.sdk.maui: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Failed to update the advertising manifest microsoft.net.sdk.tvos: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Failed to update the advertising manifest microsoft.net.workload.mono.toolchain.current: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Failed to update the advertising manifest microsoft.net.workload.emscripten.current: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Failed to update the advertising manifest microsoft.net.workload.emscripten.net6: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Failed to update the advertising manifest microsoft.net.workload.emscripten.net7: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Failed to update the advertising manifest microsoft.net.workload.mono.toolchain.net6: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Failed to update the advertising manifest microsoft.net.workload.mono.toolchain.net7: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Failed to update the advertising manifest microsoft.net.sdk.aspire: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Failed to update the advertising manifest microsoft.net.sdk.android: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Advertising manifest not updated. Manifest package for microsoft.net.sdk.android doesn't exist.
Failed to update the advertising manifest microsoft.net.sdk.ios: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Advertising manifest not updated. Manifest package for microsoft.net.sdk.ios doesn't exist.
Failed to update the advertising manifest microsoft.net.sdk.maccatalyst: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Advertising manifest not updated. Manifest package for microsoft.net.sdk.maccatalyst doesn't exist.
Failed to update the advertising manifest microsoft.net.sdk.macos: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Advertising manifest not updated. Manifest package for microsoft.net.sdk.macos doesn't exist.
Failed to update the advertising manifest microsoft.net.sdk.maui: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Advertising manifest not updated. Manifest package for microsoft.net.sdk.maui doesn't exist.
Failed to update the advertising manifest microsoft.net.sdk.tvos: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Advertising manifest not updated. Manifest package for microsoft.net.sdk.tvos doesn't exist.
Failed to update the advertising manifest microsoft.net.workload.mono.toolchain.current: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Advertising manifest not updated. Manifest package for microsoft.net.workload.mono.toolchain.current doesn't exist.
Failed to update the advertising manifest microsoft.net.workload.emscripten.current: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Advertising manifest not updated. Manifest package for microsoft.net.workload.emscripten.current doesn't exist.
Failed to update the advertising manifest microsoft.net.workload.emscripten.net6: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Advertising manifest not updated. Manifest package for microsoft.net.workload.emscripten.net6 doesn't exist.
Failed to update the advertising manifest microsoft.net.workload.emscripten.net7: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Advertising manifest not updated. Manifest package for microsoft.net.workload.emscripten.net7 doesn't exist.
Failed to update the advertising manifest microsoft.net.workload.mono.toolchain.net6: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Advertising manifest not updated. Manifest package for microsoft.net.workload.mono.toolchain.net6 doesn't exist.
Failed to update the advertising manifest microsoft.net.workload.mono.toolchain.net7: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..
Advertising manifest not updated. Manifest package for microsoft.net.workload.mono.toolchain.net7 doesn't exist.
Failed to update the advertising manifest microsoft.net.sdk.aspire: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json..

Eventually the work completes (successfully I guess, because the rest of the build succeeds.)

This is repeatable. And the macOS agent has the authentication necessary to pull from the private feed.

Further technical details

 OS Platform: Darwin
 RID:         osx-arm64
 Base Path:   /Users/buildagent/myagent/_work/_tool/dotnet/sdk/8.0.400/

.NET workloads installed:
Configured to use loose manifests when installing new manifests.
 [ios]
   Installation Source: SDK 8.0.400
   Manifest Version:    18.0.8303/8.0.100
   Manifest Path:       /Users/buildagent/myagent/_work/_tool/dotnet/sdk-manifests/8.0.100/microsoft.net.sdk.ios/18.0.8303/WorkloadManifest.json
   Install Type:        FileBased


Host:
  Version:      8.0.8
  Architecture: arm64
  Commit:       08338fcaa5

.NET SDKs installed:
  8.0.300 [/Users/buildagent/myagent/_work/_tool/dotnet/sdk]
  8.0.400 [/Users/buildagent/myagent/_work/_tool/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 8.0.5 [/Users/buildagent/myagent/_work/_tool/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 8.0.8 [/Users/buildagent/myagent/_work/_tool/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 8.0.5 [/Users/buildagent/myagent/_work/_tool/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 8.0.6 [/Users/buildagent/myagent/_work/_tool/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 8.0.7 [/Users/buildagent/myagent/_work/_tool/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 8.0.8 [/Users/buildagent/myagent/_work/_tool/dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
  None

Environment variables:
  DOTNET_ROOT       [/Users/buildagent/myagent/_work/_tool/dotnet]
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Workloads untriaged Request triage from a team member labels Oct 3, 2024
@marcpopMSFT marcpopMSFT added the needs team triage Requires a full team discussion label Oct 4, 2024
@marcpopMSFT
Copy link
Member

Triage: @AArnott does that feed exist and does it work with dotnet restore but not with workloads? Can you try adding --ignore-failed-sources? Something is not configured right with that feed and my guess is that the error case is not failing fast?

@marcpopMSFT marcpopMSFT added this to the Discussion milestone Oct 8, 2024
@marcpopMSFT marcpopMSFT removed the untriaged Request triage from a team member label Oct 8, 2024
@AArnott
Copy link
Contributor Author

AArnott commented Oct 9, 2024

@marcpopMSFT Yes, the feed exists, and the dotnet restore step worked successfully before dotnet workload restore runs.

Can you try adding --ignore-failed-sources?

My nuget.config only has one source specified, with <clear/> above that to prevent any ambient source from being added. I also set NUGET_PACKAGES to the agent temp dir so that it's fully restore from scratch on every run of the pipeline, so I'm certain that restore works (i.e. actually downloads packages), but workload restore fails.

@marcpopMSFT
Copy link
Member

@nkolev92 @aortiz-msft are there logs we can collect to see the time spent in nuget for this? Have you heard of any discrepancy between nuget.org and AzDO?

@nkolev92
Copy link
Contributor

Since we're talking about dotnet workload restore, I'd need to get some hints to that code so we can figure out which logs might be available.
All NuGet remote calls end up calling: HttpRetryHandler which will tell you long it takes.

Normally that would be on informational verbosity, assuming the logger is passed down.

Have you heard of any discrepancy between nuget.org and AzDO?

I'd generally expect nuget.org to be faster than AzDO.
Auth is one starting point. nuget.org is optimized for parallel downloads, with static blobs and CDNs, while AzDO does more compute, so it's not unusual for nuget.org to be noticeably faster than AzDO.

I don't know if it fully explains the 25s to 4 mins jump, as that'd be driven by the number of calls made, packages download and their size.

The logs posted here being full of lines saying the service index wasn't loaded are probably what I'd look into first, why there's so many failures fetching the service index resource.

Another thing I'd look into is whether the SourceRepository object is shared across all of these workload calls.

@AArnott
Copy link
Contributor Author

AArnott commented Oct 16, 2024

Normally that would be on informational verbosity, assuming the logger is passed down.

Probably not. Most msbuild args aren't copied to the inner invocation of msbuild, sadly.

@marcpopMSFT
Copy link
Member

@AArnott you could set the msbuild args as properties and they should get picked up by the inner invocation. Let us know if you have more info for us to dig into based on nkolev92's reply. We probably need to take a look at improving how we report nuget errors and data since we're a pass through in a lot of cases and it's hard to track down.

@marcpopMSFT
Copy link
Member

also, you say you're set up for auth. Do you get the unable to load message on all attempts to hit AzDO? I wonder if auth is not configured correctly (reminder that secure feeds don't work well with workloads on mac in particular because of limitations of the credential manager extension)

@marcpopMSFT
Copy link
Member

@AArnott any update whether you were seeting the unable to load message whenever hitting AzDO? Can you try enabling nuget logging and try on windows? NUGET_PLUGIN_ENABLE_LOG set to true and NUGET_PLUGIN_LOG_DIRECTORY_PATH to an available path

@AArnott
Copy link
Contributor Author

AArnott commented Nov 7, 2024

you could set the msbuild args as properties and they should get picked up by the inner invocation

I'm not familiar with how to do this. I have filed other bugs about dotnet workload restore not behaving like msbuild and the response has been that it being msbuild based is an implementation detail and thus I shouldn't expect to be able to control it (e.g. #42485).
I'm happy to try setting some properties though if it'll help.

Do you get the unable to load message on all attempts to hit AzDO?

Definitely not. Ultimately it works, as attested to by the fact that the build succeeds, and I'm on agents that have no local nuget cache to begin with, so nuget packages (and presumably workloads) must have been downloaded (from AzDO).

Can you try enabling nuget logging and try on windows? NUGET_PLUGIN_ENABLE_LOG set to true and NUGET_PLUGIN_LOG_DIRECTORY_PATH to an available path

Yes, I'll try that and get back to you.

@AArnott
Copy link
Contributor Author

AArnott commented Nov 7, 2024

@marcpopMSFT
Copy link
Member

@nkolev92 what should we be looking for in those logs?

@nkolev92
Copy link
Contributor

I quickly looked at the plugin logs focusing on the timings, to see if acquiring a token is taking long or anything like that.
All of the token acquisitions return very quickly, so auth isn't the thing that's taking too long here.

@marcpopMSFT
Copy link
Member

@nkolev92 so how do we determine what is taking the time? If it's not in the nuget logs, where might it be? Is the time on our side and we need our own perf trace from the SDK?

@nkolev92
Copy link
Contributor

We haven't looked at NuGet specific logs yet, so I think we need to go back to the original idea and run
dotnet workload restore --verbosity diag and see those logs.

When I run dotnet workload restore --verbosity diag locally, I do see see a lot of log messages:

[NuGet Manager] [Info] GET https://pkgs.dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_packaging/45bacae2-5efb-47c8-91e5-8ec20c22b4f8/nuget/v3/flat2/microsoft.net.sdk.aspire.manifest-8.0.100.msi.x64/8.2.2/microsoft.net.sdk.aspire.manifest-8.0.100.msi.x64.8.2.2.nupkg
[NuGet Manager] [Info] OK https://pkgs.dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_packaging/45bacae2-5efb-47c8-91e5-8ec20c22b4f8/nuget/v3/flat2/microsoft.net.workload.mono.toolchain.net6.manifest-9.0.100-rc.2.msi.x64/index.json 127ms

Each HTTP request will be logged and it'll tell us if some are taking longer.

@AArnott can you try that?

@AArnott
Copy link
Contributor Author

AArnott commented Dec 13, 2024

It's hard to test this, because #45447 is now blocking me. :(

@AArnott
Copy link
Contributor Author

AArnott commented Dec 13, 2024

I've found a workaround for that, and now all the agents are at least running again except for the mac, which is failing with:

Workload installation failed: Unable to load the service index for source https://pkgs.dev.azure.com/andrewarnott/Git/_packaging/PrivateConsumption/nuget/v3/index.json.

This, despite the fact that the immediately preceding Azure Pipelines task is the NuGet authenticate task, which authenticates this feed:

Installing the Azure Artifacts Credential Provider (.NET Framework) to '/Users/buildagent/.nuget/plugins/netfx/CredentialProvider.Microsoft'. This credential provider is compatible with nuget.exe 4.8.0.5385 or later, and MSBuild 15.8.166.59604 or later.

Installing the Azure Artifacts Credential Provider (.NET Core) to '/Users/buildagent/.nuget/plugins/netcore/CredentialProvider.Microsoft'. This credential provider is compatible with .NET SDK 6 or later.

Setting up the credential provider to use the identity 'Project Collection Build Service (andrewarnott)' for feeds in your organization/collection starting with:
  https://pkgs.dev.azure.com/andrewarnott/
  https://andrewarnott.pkgs.visualstudio.com/

@nkolev92
Copy link
Contributor

@AArnott

Can you add --verbosity diag to your workload restore call and post those logs?

@AArnott
Copy link
Contributor Author

AArnott commented Dec 17, 2024

I am still blocked on those logs from the mac. But here are the logs from linux, which in this run show it took 4.5 minutes, compared to the 27 seconds baseline. The increased time I attribute to the switch from nuget.org to Azure Artifacts feeds and (likely) the diagnostic level of the logs.

verbose logs.zip

@marcpopMSFT
Copy link
Member

@nkolev92 anything in the new logs that would help?

@nkolev92
Copy link
Contributor

nkolev92 commented Jan 8, 2025

Looking at the logs, they match the 4.5 mins Andrew is talking about.

  • 2024-12-13T03:23:46.8356679Z
  • 2024-12-13T03:28:15.3739076Z

There are ~166 http requests.

There seems to be steps in the process:

Workload manifests download:

  • 2024-12-13T03:24:18.9300862Z
  • 2024-12-13T03:24:22.1663003Z

The total time is just over 3s, but there's plenty of http requests that take ~400ms to download.
Based on the timestamps there's plenty of parallelization.

Then for the workload, 34 different packs are installed.

Installing pack Microsoft.Android.Sdk.Linux version 35.0.24...

  • 2024-12-13T03:24:25.4921219Z
  • 2024-12-13T03:24:30.7350657Z

Notable:
2024-12-13T03:24:30.7350657Z [NuGet Manager] [Info] OK https://pkgs.dev.azure.com/andrewarnott/43cb9df6-bc58-41fb-9368-d6f27c885421/_packaging/c1d30ce9-bcec-4436-b63d-10d78b42ebcc/nuget/v3/flat2/microsoft.android.sdk.linux/35.0.24/microsoft.android.sdk.linux.35.0.24.nupkg 5385ms

5s for this 1 nupkg to be downloaded.

Then it's:

Installing pack Microsoft.Android.Sdk.Linux version 34.0.145...

  • 2024-12-13T03:24:37.7706878Z
  • 2024-12-13T03:24:38.2911122Z

Going we can see that the installation of these workloads is happening sequentially and the download times of the nupkgs are:

  • 1313ms
  • 2400ms
  • 1569ms
  • 1578ms
  • 5555ms
  • 3372ms

By the time they're done, we've spent 2.5 mins there.

2024-12-13T03:24:25.4920842Z
2024-12-13T03:26:57.3953893Z

tldr;

The majority of the time is caught up in download the nupkgs from the packs (guessing they're large files, I don't have access to check the size). This is just the difference between AzDO & nuget.org.
Based on the logs, the pack installation is sequential. If possible, parallelizing the downloads might improve things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Workloads needs team triage Requires a full team discussion
Projects
None yet
Development

No branches or pull requests

3 participants