Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve preg_split() function ReturnType #3757

Open
wants to merge 24 commits into
base: 2.0.x
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 17 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion resources/functionMap.php
Original file line number Diff line number Diff line change
Expand Up @@ -9081,7 +9081,7 @@
'preg_replace' => ['string|array|null', 'regex'=>'string|array', 'replace'=>'string|array', 'subject'=>'string|array', 'limit='=>'int', '&w_count='=>'int'],
'preg_replace_callback' => ['string|array|null', 'regex'=>'string|array', 'callback'=>'callable(array<int|string, string>):string', 'subject'=>'string|array', 'limit='=>'int', '&w_count='=>'int'],
'preg_replace_callback_array' => ['string|array|null', 'pattern'=>'array<string,callable>', 'subject'=>'string|array', 'limit='=>'int', '&w_count='=>'int'],
'preg_split' => ['list<string>|false', 'pattern'=>'string', 'subject'=>'string', 'limit='=>'?int', 'flags='=>'int'],
'preg_split' => ['__benevolent<list<string>|list<array{string, int<0, max>}>|false>', 'pattern'=>'string', 'subject'=>'string', 'limit='=>'?int', 'flags='=>'int'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other preg_ method are not using benevolent union, so I would think more consistent to not use a benevolent union here too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VincentLanglet @ondrejmirtes

I understand. I would like to remove benevolent union.

On the other hand, I think that preg_split should not return false unless there is an issue with the regular expression. Furthermore, in this PR, I have modified the code so that if the regular expression is incorrect, an error is returned early in the parsing process.

Therefore, if the regular expression is correct, I am considering not adding false as a Union.
(In this case, this bug can also be fixed.

public function testBug7554(): void
{
$errors = $this->runAnalyse(__DIR__ . '/data/bug-7554.php');
$this->assertCount(2, $errors);
$this->assertSame(sprintf('Parameter #1 $%s of function count expects array|Countable, list<array<int, int<0, max>|string>>|false given.', PHP_VERSION_ID < 80000 ? 'var' : 'value'), $errors[0]->getMessage());
$this->assertSame(26, $errors[0]->getLine());
$this->assertSame('Cannot access offset int<1, max> on list<array{string, int<0, max>}>|false.', $errors[1]->getMessage());
$this->assertSame(27, $errors[1]->getLine());
}
)

If you think not to use benevolent union, do you think it would be fine to remove false? I would like to hear your opinion on this. I would like to get your opinion before making any modifications.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like any other preg method I think it can returns false if an internal error occurs like

  • some memory limit is reached
  • too many recursion
  • some invalid encoding

And in the pho.ini there is some config like pcre.recursion_limit or pcre.backtrack_limit.

So I would keep a non-benevolent union AND false.

If we decide to remove false from the signature it should be removed from all the preg methods. But I dont think we should go this way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand.
I modified it to keep a non-benevolent union AND false.

37f9b3e

'prev' => ['mixed', '&rw_array_arg'=>'array|object'],
'print_r' => ['string|true', 'var'=>'mixed', 'return='=>'bool'],
'printf' => ['int', 'format'=>'string', '...values='=>'__stringAndStringable|int|float|null|bool'],
Expand Down
134 changes: 127 additions & 7 deletions src/Type/Php/PregSplitDynamicReturnTypeExtension.php
Original file line number Diff line number Diff line change
Expand Up @@ -7,17 +7,29 @@
use PHPStan\Reflection\FunctionReflection;
use PHPStan\TrinaryLogic;
use PHPStan\Type\Accessory\AccessoryArrayListType;
use PHPStan\Type\Accessory\AccessoryNonEmptyStringType;
use PHPStan\Type\Accessory\NonEmptyArrayType;
use PHPStan\Type\ArrayType;
use PHPStan\Type\BitwiseFlagHelper;
use PHPStan\Type\Constant\ConstantArrayType;
use PHPStan\Type\Constant\ConstantArrayTypeBuilder;
use PHPStan\Type\Constant\ConstantBooleanType;
use PHPStan\Type\Constant\ConstantIntegerType;
use PHPStan\Type\Constant\ConstantStringType;
use PHPStan\Type\DynamicFunctionReturnTypeExtension;
use PHPStan\Type\ErrorType;
use PHPStan\Type\IntegerRangeType;
use PHPStan\Type\IntegerType;
use PHPStan\Type\MixedType;
use PHPStan\Type\StringType;
use PHPStan\Type\Type;
use PHPStan\Type\TypeCombinator;
use PHPStan\Type\TypeUtils;
use function count;
use function is_array;
use function is_int;
use function preg_match;
use function preg_split;
use function strtolower;

final class PregSplitDynamicReturnTypeExtension implements DynamicFunctionReturnTypeExtension
Expand All @@ -36,17 +48,125 @@ public function isFunctionSupported(FunctionReflection $functionReflection): boo

public function getTypeFromFunctionCall(FunctionReflection $functionReflection, FuncCall $functionCall, Scope $scope): ?Type
{
$flagsArg = $functionCall->getArgs()[3] ?? null;
$args = $functionCall->getArgs();
if (count($args) < 2) {
return null;
}
$patternArg = $args[0];
$subjectArg = $args[1];
$limitArg = $args[2] ?? null;
$flagArg = $args[3] ?? null;
$patternType = $scope->getType($patternArg->value);
$patternConstantTypes = $patternType->getConstantStrings();
$subjectType = $scope->getType($subjectArg->value);
$subjectConstantTypes = $subjectType->getConstantStrings();

if (
count($patternConstantTypes) > 0
&& @preg_match($patternConstantTypes[0]->getValue(), '') === false
) {
return new ErrorType();
}

if ($limitArg === null) {
$limits = [-1];
} else {
$limitType = $scope->getType($limitArg->value);
$limits = $limitType->getConstantScalarValues();
}

if ($flagArg === null) {
$flags = [0];
} else {
$flagType = $scope->getType($flagArg->value);
$flags = $flagType->getConstantScalarValues();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By replacing it as follows, type checking within multiple Constant loops will no longer be necessary.

$flags = [];
$flagType = $scope->getType($flagArg->value);
foreach ($flagType->getConstantScalarValues() as $flag) {
    if (!is_int()) {
        return new ErrorType();
    }

    $flags[] = $flag;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved 8cb3030


if (count($patternConstantTypes) === 0 || count($subjectConstantTypes) === 0) {
$returnNonEmptyStrings = $flagArg !== null && $this->bitwiseFlagAnalyser->bitwiseOrContainsConstant($flagArg->value, $scope, 'PREG_SPLIT_NO_EMPTY')->yes();
if ($returnNonEmptyStrings) {
$returnStringType = TypeCombinator::intersect(
new StringType(),
new AccessoryNonEmptyStringType(),
);
} else {
$returnStringType = new StringType();
}

if ($flagsArg !== null && $this->bitwiseFlagAnalyser->bitwiseOrContainsConstant($flagsArg->value, $scope, 'PREG_SPLIT_OFFSET_CAPTURE')->yes()) {
$type = new ArrayType(
new IntegerType(),
new ConstantArrayType([new ConstantIntegerType(0), new ConstantIntegerType(1)], [new StringType(), IntegerRangeType::fromInterval(0, null)], [2], [], TrinaryLogic::createYes()),
$capturedArrayType = new ConstantArrayType(
[new ConstantIntegerType(0), new ConstantIntegerType(1)],
[$returnStringType, IntegerRangeType::fromInterval(0, null)],
[2],
[],
TrinaryLogic::createYes(),
);
return TypeCombinator::union(TypeCombinator::intersect($type, new AccessoryArrayListType()), new ConstantBooleanType(false));

$returnInternalValueType = $returnStringType;
if ($flagArg !== null) {
$flagState = $this->bitwiseFlagAnalyser->bitwiseOrContainsConstant($flagArg->value, $scope, 'PREG_SPLIT_OFFSET_CAPTURE');
if ($flagState->yes()) {
$capturedArrayListType = TypeCombinator::intersect(
new ArrayType(new IntegerType(), $capturedArrayType),
new AccessoryArrayListType(),
);

if ($subjectType->isNonEmptyString()->yes()) {
$capturedArrayListType = TypeCombinator::intersect($capturedArrayListType, new NonEmptyArrayType());
}

return TypeUtils::toBenevolentUnion(TypeCombinator::union($capturedArrayListType, new ConstantBooleanType(false)));
}
if ($flagState->maybe()) {
$returnInternalValueType = TypeCombinator::union(new StringType(), $capturedArrayType);
}
}

$returnListType = TypeCombinator::intersect(new ArrayType(new MixedType(), $returnInternalValueType), new AccessoryArrayListType());
if ($subjectType->isNonEmptyString()->yes()) {
$returnListType = TypeCombinator::intersect(
$returnListType,
new NonEmptyArrayType(),
);
}

return TypeUtils::toBenevolentUnion(TypeCombinator::union($returnListType, new ConstantBooleanType(false)));
}

$resultTypes = [];
foreach ($patternConstantTypes as $patternConstantType) {
foreach ($subjectConstantTypes as $subjectConstantType) {
foreach ($limits as $limit) {
if (!is_int($limit)) {
return null;
}
foreach ($flags as $flag) {
if (!is_int($flag)) {
return null;
}
$result = @preg_split($patternConstantType->getValue(), $subjectConstantType->getValue(), $limit, $flag);
if ($result === false) {
continue;
}
$constantArray = ConstantArrayTypeBuilder::createEmpty();
foreach ($result as $key => $value) {
if (is_array($value)) {
$valueConstantArray = ConstantArrayTypeBuilder::createEmpty();
$valueConstantArray->setOffsetValueType(new ConstantIntegerType(0), new ConstantStringType($value[0]));
$valueConstantArray->setOffsetValueType(new ConstantIntegerType(1), new ConstantIntegerType($value[1]));
$returnInternalValueType = $valueConstantArray->getArray();
} else {
$returnInternalValueType = new ConstantStringType($value);
}
$constantArray->setOffsetValueType(new ConstantIntegerType($key), $returnInternalValueType);
}

$resultTypes[] = $constantArray->getArray();
}
}
}
}

return null;
return TypeCombinator::union(...$resultTypes);
}

}
19 changes: 5 additions & 14 deletions tests/PHPStan/Analyser/AnalyserIntegrationTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,6 @@
use PHPStan\Type\Constant\ConstantStringType;
use function extension_loaded;
use function restore_error_handler;
use function sprintf;
use const PHP_VERSION_ID;

class AnalyserIntegrationTest extends PHPStanTestCase
Expand Down Expand Up @@ -842,13 +841,11 @@ public function testOffsetAccess(): void
public function testUnresolvableParameter(): void
{
$errors = $this->runAnalyse(__DIR__ . '/data/unresolvable-parameter.php');
$this->assertCount(3, $errors);
$this->assertSame('Parameter #2 $array of function array_map expects array, list<string>|false given.', $errors[0]->getMessage());
$this->assertSame(18, $errors[0]->getLine());
$this->assertSame('Method UnresolvableParameter\Collection::pipeInto() has parameter $class with no type specified.', $errors[1]->getMessage());
$this->assertCount(2, $errors);
$this->assertSame('Method UnresolvableParameter\Collection::pipeInto() has parameter $class with no type specified.', $errors[0]->getMessage());
$this->assertSame(30, $errors[0]->getLine());
$this->assertSame('PHPDoc tag @param for parameter $class contains unresolvable type.', $errors[1]->getMessage());
$this->assertSame(30, $errors[1]->getLine());
$this->assertSame('PHPDoc tag @param for parameter $class contains unresolvable type.', $errors[2]->getMessage());
$this->assertSame(30, $errors[2]->getLine());
}

public function testBug7248(): void
Expand Down Expand Up @@ -890,13 +887,7 @@ public function testBug7500(): void
public function testBug7554(): void
{
$errors = $this->runAnalyse(__DIR__ . '/data/bug-7554.php');
$this->assertCount(2, $errors);

$this->assertSame(sprintf('Parameter #1 $%s of function count expects array|Countable, list<array<int, int<0, max>|string>>|false given.', PHP_VERSION_ID < 80000 ? 'var' : 'value'), $errors[0]->getMessage());
$this->assertSame(26, $errors[0]->getLine());

$this->assertSame('Cannot access offset int<1, max> on list<array{string, int<0, max>}>|false.', $errors[1]->getMessage());
$this->assertSame(27, $errors[1]->getLine());
$this->assertCount(0, $errors);
}

public function testBug7637(): void
Expand Down
60 changes: 50 additions & 10 deletions tests/PHPStan/Analyser/nsrt/preg_split.php
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,50 @@ class HelloWorld
{
public function doFoo()
{
assertType('list<string>|false', preg_split('/-/', '1-2-3'));
assertType('list<string>|false', preg_split('/-/', '1-2-3', -1, PREG_SPLIT_NO_EMPTY));
assertType('list<array{string, int<0, max>}>|false', preg_split('/-/', '1-2-3', -1, PREG_SPLIT_OFFSET_CAPTURE));
assertType('list<array{string, int<0, max>}>|false', preg_split('/-/', '1-2-3', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE));
assertType('*ERROR*', preg_split('/[0-9a]', '1-2-3'));
assertType("array{''}", preg_split('/-/', ''));
assertType("array{}", preg_split('/-/', '', -1, PREG_SPLIT_NO_EMPTY));
assertType("array{'1', '-', '2', '-', '3'}", preg_split('/ *(-) */', '1- 2-3', -1, PREG_SPLIT_DELIM_CAPTURE));
assertType("array{array{'', 0}}", preg_split('/-/', '', -1, PREG_SPLIT_OFFSET_CAPTURE));
assertType("array{}", preg_split('/-/', '', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE));
assertType("array{'1', '2', '3'}", preg_split('/-/', '1-2-3'));
assertType("array{'1', '2', '3'}", preg_split('/-/', '1-2-3', -1, PREG_SPLIT_NO_EMPTY));
assertType("array{'1', '3'}", preg_split('/-/', '1--3', -1, PREG_SPLIT_NO_EMPTY));
assertType("array{array{'1', 0}, array{'2', 2}, array{'3', 4}}", preg_split('/-/', '1-2-3', -1, PREG_SPLIT_OFFSET_CAPTURE));
assertType("array{array{'1', 0}, array{'2', 2}, array{'3', 4}}", preg_split('/-/', '1-2-3', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE));
assertType("array{array{'1', 0}, array{'', 2}, array{'3', 3}}", preg_split('/-/', '1--3', -1, PREG_SPLIT_OFFSET_CAPTURE));
assertType("array{array{'1', 0}, array{'3', 3}}", preg_split('/-/', '1--3', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE));
}

public function doWithVariables(string $pattern, string $subject, int $offset, int $flags): void
{
assertType('(list<array{string, int<0, max>}|string>|false)', preg_split($pattern, $subject, $offset, $flags));
assertType('(list<array{string, int<0, max>}|string>|false)', preg_split("//", $subject, $offset, $flags));

assertType('(non-empty-list<array{string, int<0, max>}|string>|false)', preg_split($pattern, "1-2-3", $offset, $flags));
assertType('(list<array{string, int<0, max>}|string>|false)', preg_split($pattern, $subject, -1, $flags));
assertType('(list<non-empty-string>|false)', preg_split($pattern, $subject, $offset, PREG_SPLIT_NO_EMPTY));
assertType('(list<array{string, int<0, max>}>|false)', preg_split($pattern, $subject, $offset, PREG_SPLIT_OFFSET_CAPTURE));
assertType("(list<string>|false)", preg_split($pattern, $subject, $offset, PREG_SPLIT_DELIM_CAPTURE));
assertType('(list<array{string, int<0, max>}>|false)', preg_split($pattern, $subject, $offset, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE));
}

/**
* @param non-empty-string $nonEmptySubject
*/
public function doWithNonEmptySubject(string $pattern, string $nonEmptySubject, int $offset, int $flags): void
{
assertType('(non-empty-list<string>|false)', preg_split("//", $nonEmptySubject));

assertType('(non-empty-list<array{string, int<0, max>}|string>|false)', preg_split($pattern, $nonEmptySubject, $offset, $flags));
assertType('(non-empty-list<array{string, int<0, max>}|string>|false)', preg_split("//", $nonEmptySubject, $offset, $flags));

assertType('(non-empty-list<array{string, int<0, max>}>|false)', preg_split("/-/", $nonEmptySubject, $offset, PREG_SPLIT_OFFSET_CAPTURE));
assertType('(non-empty-list<non-empty-string>|false)', preg_split("/-/", $nonEmptySubject, $offset, PREG_SPLIT_NO_EMPTY));
assertType('(non-empty-list<string>|false)', preg_split("/-/", $nonEmptySubject, $offset, PREG_SPLIT_DELIM_CAPTURE));
assertType('(non-empty-list<array{string, int<0, max>}>|false)', preg_split("/-/", $nonEmptySubject, $offset, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE));
assertType('(non-empty-list<array{non-empty-string, int<0, max>}>|false)', preg_split("/-/", $nonEmptySubject, $offset, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE));
assertType('(non-empty-list<non-empty-string>|false)', preg_split("/-/", $nonEmptySubject, $offset, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE));
}

/**
Expand All @@ -24,24 +64,24 @@ public function doFoo()
*/
public static function splitWithOffset($pattern, $subject, $limit = -1, $flags = 0)
{
assertType('list<array{string, int<0, max>}>|false', preg_split($pattern, $subject, $limit, $flags | PREG_SPLIT_OFFSET_CAPTURE));
assertType('list<array{string, int<0, max>}>|false', preg_split($pattern, $subject, $limit, PREG_SPLIT_OFFSET_CAPTURE | $flags));

assertType('list<array{string, int<0, max>}>|false', preg_split($pattern, $subject, $limit, PREG_SPLIT_OFFSET_CAPTURE | $flags | PREG_SPLIT_NO_EMPTY));
assertType('(list<array{string, int<0, max>}>|false)', preg_split($pattern, $subject, $limit, $flags | PREG_SPLIT_OFFSET_CAPTURE));
assertType('(list<array{string, int<0, max>}>|false)', preg_split($pattern, $subject, $limit, PREG_SPLIT_OFFSET_CAPTURE | $flags));
assertType('(list<array{non-empty-string, int<0, max>}>|false)', preg_split($pattern, $subject, $limit, PREG_SPLIT_OFFSET_CAPTURE | $flags | PREG_SPLIT_NO_EMPTY));
}

/**
* @param string $pattern
* @param string $subject
* @param int $limit
*/
public static function dynamicFlags($pattern, $subject, $limit = -1) {
public static function dynamicFlags($pattern, $subject, $limit = -1)
{
$flags = PREG_SPLIT_OFFSET_CAPTURE;

if ($subject === '1-2-3') {
$flags |= PREG_SPLIT_NO_EMPTY;
}

assertType('list<array{string, int<0, max>}>|false', preg_split($pattern, $subject, $limit, $flags));
assertType('(list<array{string, int<0, max>}>|false)', preg_split($pattern, $subject, $limit, $flags));
}
}
Loading