Multiline Option with ^ and $ anchors  #57

@kmalski

Hi,

I am struggling to configure the Option passed to the search method correctly when using Syntax.ECMAScript. I would expect that, with Option.DEFAULT / Option.NONE, a regex that uses the ^ and $ anchors and contains no explicit newline would fail to match input containing a newline character. For example:

byte[] pattern = "^[a-z]{1,10}$".getBytes();
byte[] str = "a\nb".getBytes();

Regex regex = new Regex(pattern, 0, pattern.length, Option.NONE, UTF8Encoding.INSTANCE, Syntax.ECMAScript);
Matcher matcher = regex.matcher(str);
int result = matcher.search(0, str.length, Option.DEFAULT);

This should return -1 but currently returns 0. Even passing Option.SINGLELINE does not change this. What made it work was passing the negated Option.MULTILINE:

int result = matcher.search(0, str.length, -Option.MULTILINE);

I have tested this case with several online regex tools and with the JavaScript regex implementation in my browser, and this example never matches (as I expect). Only when I add the multiline option do I get the same result as with the Joni library.

Setting the syntax to Java works as expected and gives the same result as this snippet using the built-in Java regex:

String pattern = "^[a-z]{1,10}$";
String str = "a\nb";

Pattern p = Pattern.compile(pattern);
java.util.regex.Matcher m = p.matcher(str);
boolean result = m.find();
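
To make the comparison concrete, here is a minimal sketch that uses only java.util.regex and runs the same pattern and input with and without Pattern.MULTILINE. The class and method names are made up for illustration and are not part of Joni; the sketch only shows the built-in Java behaviour I am comparing against.

```java
import java.util.regex.Pattern;

public class AnchorModes {
    static final String PATTERN = "^[a-z]{1,10}$";

    // Default flags: ^ matches only at the start of the input and
    // $ only at the end (or just before a final line terminator).
    static boolean findDefault(String input) {
        return Pattern.compile(PATTERN).matcher(input).find();
    }

    // Pattern.MULTILINE: ^ and $ also match just after and just
    // before any line terminator inside the input.
    static boolean findMultiline(String input) {
        return Pattern.compile(PATTERN, Pattern.MULTILINE).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "a\nb";
        System.out.println("default:   " + findDefault(input));
        System.out.println("multiline: " + findMultiline(input));
    }
}
```

For "a\nb" the default call finds nothing, while the MULTILINE call finds a match starting at index 0, which corresponds to the 0 that Joni returns with Syntax.ECMAScript.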

Is the MULTILINE option the default for the library's ECMAScript syntax, and should it be? I was digging into ECMAScript, and it looks like multiline = false is the default (the user has to pass the m flag explicitly).
