Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a local secure repositories in the test-suite. #9540

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

andreabedini
Copy link
Collaborator

The test-suite prelude provides the function withRemoteRepo to set up
a secure remote repository using hackage-repo-tool. The created
repository is then served using python3 internal http server.

Creating the http server and waiting for it to be ready turns out to be
very fragile. Moreover what we really need in the test-suite it not to
check any remote connectivity but to test features related to
hackage-security and index-state. This can be more reliably achieved
using a local secure repository (e.g. file://).

This change:

  • Renames withRemoteRepo to withSecureRepo, to better indicate the
    main difference from withRepo.
  • Configures the repository with a file:// url rather than http://
  • Removes the need for python3 and any retry logic.
  • At variance with withRepo, withSecureRepo should work on Windows.
  • Invokes hackage-repo-tool in such a way index timestamps can be made
    predictable and controllable from the test.

This solves some recent failures we have seen in CI.

Please read Github PR Conventions and then fill in one of these two templates.


Template Α: This PR modifies cabal behaviour

Include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Any changes that could be relevant to users have been recorded in the changelog.
  • The documentation has been updated, if necessary.
  • Manual QA notes have been included.
  • Tests have been added. (Ask for help if you don’t know how to write them! Ask for an exemption if tests are too complex for too little coverage!)

Template Β: This PR does not modify cabal behaviour (documentation, tests, refactoring, etc.)

Include the following checklist in your PR:

The test-suite prelude provides the function `withRemoteRepo` to set up
a secure remote repository using hackage-repo-tool. The created
repository is then served using python3 internal http server.

Creating the http server and waiting for it to be ready turns out to be
very fragile. Moreover what we really need in the test-suite it not to
check any remote connectivity but to test features related to
hackage-security and index-state. This can be more reliably achieved
using a local secure repository (e.g. file://).

This change:

- Renames withRemoteRepo to withSecureRepo, to better indicate the
  main difference from withRepo.
- Configures the repository with a file:// url rather than http://
- Removes the need for python3 and any retry logic.
- At variance with withRepo, withSecureRepo should work on Windows.
- Invokes hackage-repo-tool in such a way index timestamps can be made
  predictable and controllable from the test.
@ulysses4ever
Copy link
Collaborator

This sounds awesome! I’ll try to have a look this week. I wonder if this will bring any performance benefit…

Copy link
Collaborator

@ulysses4ever ulysses4ever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks!

I’m surprised withRemoteRepo had so few clients…

cabal-testsuite/src/Test/Cabal/Prelude.hs Outdated Show resolved Hide resolved
cabal-testsuite/src/Test/Cabal/Prelude.hs Outdated Show resolved Hide resolved
cabal-testsuite/src/Test/Cabal/Prelude.hs Outdated Show resolved Hide resolved
cabal-testsuite/src/Test/Cabal/Prelude.hs Outdated Show resolved Hide resolved
-- backoff starting from 50ms, up to a maximum wait of 60s
_ <- waitTcpVerbose putStrLn (limitRetriesByCumulativeDelay 60000000 $ exponentialBackoff 50000) "localhost" "8000"
runReaderT m (env { testHaveRepo = True }))
liftIO $ appendFile (testUserCabalConfigFile env)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an option, you could use cabal user-config update? But, perhaps, this doesn’t matter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uhm, TIL, how does that work? I have never used it before.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what's the right way to call cabal-install in this part of the code, but from the command line you can do something like this to experiment:

❯ CABAL_DIR=. cabal update
❯ CABAL_DIR=. cabal user-config update --augment="repository test-local-repo
    url: file:some/dir/file
    secure: True
    root-keys: mykeys"

and observe ./config. This seems to work for a repository, but trying to update remote-repo-cache doesn't do what is desired here: it updates the existing field from the default value ./packages to whatever new value you supply. Unless I misunderstand what appendFile achieves: if it overrides the earlier value of remote-repo-cache, then maybe it's the same.

I see now that it's not your code: you just moved around what was there before, so I want to clarify that my remark about user-config shouldn't derail the PR.

Co-authored-by: Artem Pelenitsyn <[email protected]>
Copy link
Collaborator

@mpickering mpickering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a very nice improvement to me. Good job @andreabedini

Copy link
Member

@Kleidukos Kleidukos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A fine PR, thank you!

@Mikolaj
Copy link
Member

Mikolaj commented Dec 21, 2023

Is this a fix for #9530? I can't find any reference to that issue in the description. I guess we'll know when CI runs for a couple of days and the curl timeouts no longer show up?

@ulysses4ever
Copy link
Collaborator

@andreabedini there's an issue with the new time dependency, which I tried to quickly fix (the branch is updated) but failed. The Data.Time.Format.ISO8601, which you use, only appeared in time-1.9 but we can't use "such new" version in the CI. The testsuite is build-type: Custom, and that's why (I think) you're limited in the selection of versions for boot libraries to whatever was shipped with GHC at the time. And we support pretty old GHCs... CUrrently, the oldest is GHC 8.4, which shipped time-1.8.0.2.

@ulysses4ever ulysses4ever force-pushed the testsuite-secure-repos branch from 23ce661 to b1edc91 Compare December 25, 2023 02:21
@ulysses4ever
Copy link
Collaborator

The time issue is solved via time-compat (thanks, Andrea, for the idea).

But we have a new challenge now. Windows strikes back…

stderr:
*** Exception: Command "C:\Windows\SYSTEM32\tar.exe" "-czf" "D:\a\cabal\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo\package\pkg-1.0.tar.gz" "--force-local" "-C" "D:\a\cabal\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\.\repo\\" "pkg-1.0" failed.
Output:
tar.exe: Option --force-local is not supported

GNU tar supports the options but other tar's don't. The current Windows
setup on github runners seem to use a tar that doesn't.
@ulysses4ever ulysses4ever force-pushed the testsuite-secure-repos branch from 372df0f to e9ea717 Compare December 25, 2023 16:04
@ulysses4ever
Copy link
Collaborator

I tried to remove --force-local (e9ea717) but now we get another error on Windows that doesn't seem to be related to tar...

-----BEGIN CABAL OUTPUT-----

Error: [Cabal-7085]

-----END CABAL OUTPUT-----

-----BEGIN CABAL OUTPUT-----

Error parsing config file D:\a\cabal\cabal\cabal-testsuite\PackageTests\NewUpdate\RejectFutureIndexStates\cabal.dist\home\.cabal\config:3:

Parse of field 'url' failed.

https://github.com/haskell/cabal/actions/runs/7322918701/job/19944966081?pr=9540

@jasagredo
Copy link
Collaborator

jasagredo commented Dec 28, 2023

I cannot reproduce the error on my MSYS2 because of this failing command:

+ "C:\msys64\usr\bin\tar.exe" "-czf" "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo\package\pkg-1.0.tar.gz" "-C" "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\.\repo\\" "pkg-1.0"
tar (child): Cannot connect to C: resolve failed
/usr/bin/tar: Child returned status 128
/usr/bin/tar: Error is not recoverable: exiting now
update-index-state.test.hs: Command "C:\msys64\usr\bin\tar.exe" "-czf" "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo\package\pkg-1.0.tar.gz" "-C" "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\.\repo\\" "pkg-1.0" failed.
Output:
tar (child): Cannot connect to C: resolve failed
/usr/bin/tar: Child returned status 128
/usr/bin/tar: Error is not recoverable: exiting now

Not sure what is going on here.

Note that:

cabal (e9ea717) [$] via λ lts-18.28
❯ tar -czf $(cygpath -u "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\RejectFutureIndexStates\cabal.dist\repo\package\pkg-1.0.tar.gz") -C $(cygpath -u "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\RejectFutureIndexStates\.\repo\\") pkg-1.0

cabal (e9ea717) [$?] via λ lts-18.28
➜ tar -czf C:\\Users\\Javier\\code\\cabal\\cabal-testsuite\\PackageTests\\NewUpdate\\RejectFutureIndexStates\\cabal.dist\\repo\\package\\pkg-1.0.tar.gz -C C:\\Users\\Javier\\code\\cabal\\cabal-testsuite\\PackageTests\\NewUpdate\\RejectFutureIndexStates\\.\\repo\\ pkg-1.0
tar (child): Cannot connect to C: resolve failed
tar: Child returned status 128
tar: Error is not recoverable: exiting now

So it seems GNU tar doesn't understand C:\\ paths. However sanitized *nix paths work properly. Perhaps we should account for this?

@geekosaur
Copy link
Collaborator

What's happening there is that standard tar recognizes tarball names with a colon in them and treats them as hostname:filename, handing them off to rmt. That's what --force-local was disabling.

@jasagredo
Copy link
Collaborator

jasagredo commented Dec 28, 2023

I replaced my tar.exe with the Windows one for now (just to test this) and I found what is going on with the url:

repository test-local-repo
  url: file:C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo
  secure: True
  root-keys: ...

This is what is being written to the configuration. While uris to files in windows seem to use (ref https://en.wikipedia.org/wiki/File_URI_scheme):

repository test-local-repo
  url: file:///C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist/repo
  secure: True
  root-keys: ...

Although that still doesn't work for me...

➜ cabal update --verbose
Including the following directories in PATH:
Project settings changed, reconfiguring...
Trying to locate mirrors via DNS for initial bootstrap of secure repository
'file:///C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist/repo'
...
Warning: Caught exception during _mirrors lookup:user error (DnsQuery_A failed
with 9003)
Warning: No mirrors found for
file:///C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist/repo
fdOpen: does not exist (The system cannot find the path specified.)
❯ ls /c/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist/repo
01-index.tar  01-index.tar.gz  index  mirrors.json  package  root.json  snapshot.json  timestamp.json

Also if I ctrl+click that file:///... URI in the Windows terminal it opens a explorer.exe in the expected folder, where the contents do exist.

@jasagredo
Copy link
Collaborator

So I think I have a veredict on what could be going wrong here. TL;DR: see below the horizontal rule at the end of this comment.

First of all, local secure repos are parsed in this chunk of code:

    withRepo _ callback | uriScheme remoteRepoURI == "file:" = do
      dir <- Sec.makeAbsolute $ Sec.fromFilePath (uriPath remoteRepoURI)

Now looking into those functions from Hackage.Security.Util.Paths, I see the following when playing with GHCi:

λ: u = "file:///C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist"
λ: Just uri = parseURI nu
λ: makeAbsolute  $ fromFilePath $ Network.URI.uriPath uri
Path "C:\\/Users/Javier/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist"

So it is interpreting the path as relative (see the duplicated Users/Javier/Users/Javier). Why is this happening? I dug a bit more:

λ: unPathNative (Path fp) = FP.Native.joinPath . FP.Posix.splitDirectories $ fp
λ: (\(FsPath p) -> putStrLn $ unPathNative p) $ fromFilePath $ Network.URI.uriPath uri
C:Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist

So at this point it is already wrong.

λ: (\(FsPath p) -> FP.Native.isAbsolute  $ unPathNative p) $ fromFilePath $ Network.URI.uriPath uri
False

But this looks fine, except for the first /:

λ: (\(FsPath (Path p)) -> putStrLn p) $ fromFilePath $ Network.URI.uriPath uri
/C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist

Which can be fixed with a call to FP.Native.normalise:

λ: (\(FsPath (Path p)) -> putStrLn $ FP.Native.normalise  p) $ fromFilePath $ Network.URI.uriPath uri
C:/Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist
λ: (\(FsPath (Path p)) -> FP.Native.isAbsolute $ FP.Native.normalise p) $ fromFilePath $ Network.URI.uriPath uri
True
λ: (\(FsPath (Path p)) -> FP.Native.isAbsolute p) $ fromFilePath $ Network.URI.uriPath uri
False

Note however that we should use the FP.Native.normalise, otherwise it doesn't work:

λ: (\(FsPath (Path p)) -> FP.Native.isAbsolute $ FP.Posix.normalise  p) $ fromFilePath $ Network.URI.uriPath uri
False

I'm unsure on how to follow here. The code in hackage-security says explicitly:

newtype Path a = Path FilePath -- always a Posix style path internally

So I think the issue is more general. Filepaths in windows will have a preceding / that makes the whole logic break, in particular this line:

fromFilePath :: FilePath -> FsPath
fromFilePath fp
    | FP.Native.isAbsolute fp = FsPath (mkPathNative fp  :: Path Absolute)

will return False. How should we handle this? It seems all RFC3986 parsers prepend this / at the beginning (https://timothygu.me/urltester/):
image

@jasagredo
Copy link
Collaborator

Another possible outcome of this would be: "hackage-security and in turn local repositories do not work on Windows systems", which I think would be suboptimal.

@ulysses4ever
Copy link
Collaborator

@gbaz, do you have thoughts on this issue related to hackage-security? (See just above.)

@hasufell
Copy link
Member

hasufell commented Dec 29, 2023

I was pinged to comment here and I'm not sure I understand the issue.

In file://C:/foo/bar C: is parsed as the authority:

      URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

      hier-part   = "//" authority path-abempty
                  / path-absolute
                  / path-rootless
                  / path-empty

We will lose :, because it's interpreted as an empty port:

      authority   = [ userinfo "@" ] host [ ":" port ]

      port        = *DIGIT

file:///C:/foo/bar is similar to file:/C:/foo/bar afaiu. In the first instance, // denotes a following authority component, but that is empty (indeed, the host component can be):

      authority   = [ userinfo "@" ] host [ ":" port ]

      host        = IP-literal / IPv4address / reg-name

      reg-name    = *( unreserved / pct-encoded / sub-delims )

Also see:

When authority is not present, the path cannot begin with two slash characters ("//"). These restrictions result in five different ABNF rules for a path (Section 3.3), only one of which will match any given URI reference.

file:/C:/foo/bar on the other hand follows path-absolute afaiu, which must also start with /:

      path          = path-abempty    ; begins with "/" or is empty
                    / path-absolute   ; begins with "/" but not "//"
                    / path-noscheme   ; begins with a non-colon segment
                    / path-rootless   ; begins with a segment
                    / path-empty      ; zero characters

      path-abempty  = *( "/" segment )
      path-absolute = "/" [ segment-nz *( "/" segment ) ]
      path-noscheme = segment-nz-nc *( "/" segment )
      path-rootless = segment-nz *( "/" segment )
      path-empty    = 0<pchar>


      segment       = *pchar
      segment-nz    = 1*pchar
      segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                    ; non-zero-length segment without any colon ":"

      pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

file:C:/foo/bar is likely also valid and parses under path-rootless.

The meaning of all these paths depends on the scheme, which is file here. RFC 3986 does not specify semantics or additional restrictions of the schemes.

URI schemes are registered with the IANA: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

The relevant RFC for file is 8089.

The ABNF here is:

      file-URI       = file-scheme ":" file-hier-part

      file-scheme    = "file"

      file-hier-part = ( "//" auth-path )
                     / local-path

      auth-path      = [ file-auth ] path-absolute

      local-path     = path-absolute

      file-auth      = "localhost"
                     / host

From what I understand here, all paths are considered "absolute":

The path component represents the absolute path to the file in the file system. See Appendix D for some discussion of system-specific concerns including absolute file paths and file system roots.

There is an appendix for windows/DOS paths: https://www.rfc-editor.org/rfc/rfc8089.html#appendix-E.2

So the updated ABNF is

      local-path     = [ drive-letter ] path-absolute

      drive-letter   = ALPHA ":"

And the spec gives two examples of absolute windows filepaths:

file:c:/path/to/file

file:///c:/path/to/file

These are equivalent. It doesn't say so, but I believe the third form is file:/c:/path/to/file as described above.

So the spec clearly shows what the semantics are for windows filepaths encoded in URI. I'm not sure what your question is. The filepath package does not deal with URIs. All paths must be assumed to be absolute according to the spec, so you could just normalize:

  • drop leading slashes and try to parse as absolute windows path

How you distinguish between windows and unix, I don't really know.

Edit: yes, leading slash is a relative path on windows.

@andreabedini
Copy link
Collaborator Author

andreabedini commented Jan 17, 2024

@jasagredo wow, thanks for the great invetigation. I will need some time to understand what to do.

@andreabedini
Copy link
Collaborator Author

@jasagredo You have done many experiments with file:/// but I am unsure what happens with

repository test-local-repo
  url: file:C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo
  secure: True
  root-keys: ...

as we are producing now.

This seems to be valid as per RFC 8089, and parsed correctly by network-uri (the path is C:\Users\...). So what is failing? If I understand correctly the failure you report in #9540 (comment) is related to gnutar?

@hasufell
Copy link
Member

This seems to be valid as per RFC 8089

No, backslashes are not permitted as per RFC 3986 and neither per RFC 8089. Quoting the latter:

Historically, some usages have copied entire file paths into the path
components of file URIs. Where DOS or Windows file paths were thus
copied, the resulting URI strings contained unencoded backslash ""
characters, which are forbidden by both [RFC1738] and [RFC3986].

It may be possible to translate or update such an invalid file URI by
replacing all backslashes "" with slashes "/" if it can be
determined with reasonable certainty that the backslashes are
intended as path separators.

None of the proposed ABNFs in RFC 8089 include backslashes. So you'll have to go with a normalization function before attempting proper parsing.

@hasufell
Copy link
Member

hasufell commented Jan 20, 2024

I've written a library that parses correctly, with the spec-deviation that when requesting to parse windows paths it would reject posix paths (otherwise it's hard to disambiguate and the ABNF is actually ambiguous): https://hackage.haskell.org/package/file-uri-0.1.0.0/docs/System-URI-File.html#v:parseFileURI

This means when you get a windows path, it either starts with a drive letter or is a UNC path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants