Commit 35884ed

tomsmeding authored and Mikolaj committed
Ignore invalid Unicode in pkg-config descriptions
Previously, if any pkg-config package on the system had invalid Unicode in its description field (as the Intel vpl package does at the time of writing, 2024-01-11; see haskell#9608), cabal would crash because it tried to interpret the entire `pkg-config --list-all` output as Unicode.

This change, as suggested by gbaz in haskell#9608 (comment), switches to reading the output as a lazy ByteString, splitting on the first space at the byte level, and then parsing only the package _name_ to a String. For further future-proofing, package names that do not decode as valid Unicode no longer crash Cabal; they are ignored instead.
1 parent f01e000 commit 35884ed


2 files changed: +17 -7 lines changed


cabal-install-solver/cabal-install-solver.cabal

+1
@@ -110,6 +110,7 @@ library
     , mtl >=2.0 && <2.4
     , pretty ^>=1.1
     , transformers >=0.4.2.0 && <0.7
+    , text
 
   if flag(debug-expensive-assertions)
     cpp-options: -DDEBUG_EXPENSIVE_ASSERTIONS

cabal-install-solver/src/Distribution/Solver/Types/PkgConfigDb.hs

+16 -7
@@ -23,17 +23,22 @@ module Distribution.Solver.Types.PkgConfigDb
 import Distribution.Solver.Compat.Prelude
 import Prelude ()
 
-import Control.Exception (handle)
-import Control.Monad (mapM)
-import qualified Data.Map as M
-import System.FilePath (splitSearchPath)
+import Control.Exception (handle)
+import Control.Monad (mapM)
+import qualified Data.ByteString.Lazy as LBS
+import Data.Either (rights)
+import qualified Data.Map as M
+import qualified Data.Text as T
+import qualified Data.Text.Encoding as T
+import System.FilePath (splitSearchPath)
 
 import Distribution.Compat.Environment (lookupEnv)
 import Distribution.Package (PkgconfigName, mkPkgconfigName)
 import Distribution.Parsec
 import Distribution.Simple.Program
        (ProgramDb, getProgramOutput, pkgConfigProgram, needProgram, ConfiguredProgram)
-import Distribution.Simple.Program.Run (getProgramInvocationOutputAndErrors, programInvocation)
+import Distribution.Simple.Program.Run
+       (getProgramInvocationOutputAndErrors, programInvocation, getProgramInvocationLBS)
 import Distribution.Simple.Utils (info)
 import Distribution.Types.PkgconfigVersion
 import Distribution.Types.PkgconfigVersionRange
@@ -63,10 +68,14 @@ readPkgConfigDb verbosity progdb = handle ioErrorHandler $ do
     case mpkgConfig of
       Nothing -> noPkgConfig "Cannot find pkg-config program"
       Just (pkgConfig, _) -> do
-        pkgList <- lines <$> getProgramOutput verbosity pkgConfig ["--list-all"]
+        -- To prevent malformed Unicode in the descriptions from crashing cabal,
+        -- read without interpreting any encoding first. (#9608)
+        pkgList <- LBS.split 10 <$> getProgramInvocationLBS verbosity (programInvocation pkgConfig ["--list-all"])
         -- The output of @pkg-config --list-all@ also includes a description
         -- for each package, which we do not need.
-        let pkgNames = map (takeWhile (not . isSpace)) pkgList
+        let pkgNamesLBS = map (LBS.takeWhile (not . isSpace . chr . fromIntegral)) pkgList
+        -- Now decode as UTF8 and convert to String, dropping any that fail decoding.
+        let pkgNames = rights $ map (fmap T.unpack . T.decodeUtf8' . LBS.toStrict) pkgNamesLBS
         (outs, _errs, exitCode) <-
                      getProgramInvocationOutputAndErrors verbosity
                        (programInvocation pkgConfig ("--modversion" : pkgNames))
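
The byte-level parsing described in the commit message can be tried outside Cabal with a small standalone sketch. Everything here is illustrative: `parsePkgNames` and `sample` are hypothetical names, the sample bytes are made up rather than taken from a real `pkg-config --list-all` run, and the final `filter` that drops empty names is a choice of this sketch, not something shown in the diff. It assumes the `bytestring` and `text` packages are available.

```haskell
import qualified Data.ByteString.Lazy as LBS
import qualified Data.ByteString.Lazy.Char8 as LBC
import Data.Char (chr, isSpace)
import Data.Either (rights)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Hypothetical helper (not part of Cabal): extract package names from the
-- raw bytes of a @pkg-config --list-all@-style listing.
parsePkgNames :: LBS.ByteString -> [String]
parsePkgNames raw =
  let -- Split on the newline byte (10) without decoding anything.
      rows = LBS.split 10 raw
      -- Keep only the bytes before the first whitespace byte of each row.
      nameBytes = map (LBS.takeWhile (not . isSpace . chr . fromIntegral)) rows
      -- Decode each name as UTF-8; invalid names become 'Left' values.
      decoded = map (fmap T.unpack . TE.decodeUtf8' . LBS.toStrict) nameBytes
   in -- 'rights' drops names that failed to decode. Dropping empty names
      -- (from blank rows) is an assumption of this sketch only.
      filter (not . null) (rights decoded)

-- | Made-up sample input: one clean row, one row whose description holds an
-- invalid byte, and one row whose name itself is not valid UTF-8.
sample :: LBS.ByteString
sample = LBS.concat
  [ LBC.pack "zlib   zlib - compression library\n"
  , LBC.pack "vpl    vpl - Intel", LBS.pack [0xAE], LBC.pack " video library\n"
  , LBS.pack [0xFF, 0xFE], LBC.pack "  broken - name is not UTF-8\n"
  ]

main :: IO ()
main = mapM_ putStrLn (parsePkgNames sample)
```

Running `main` prints `zlib` and `vpl`: the invalid byte in the second description is never decoded, and the third row is dropped because its name fails UTF-8 decoding.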
