Releases: mudge/re2

1.4.0: Fix crash when using RE2::Scanner#scan with an invalid regular expression

29 Mar 12:28
7415e55

As reported by @serch in #52, fix a crash when using RE2::Scanner#scan with an invalid regular expression.

This was caused by the underlying re2 library returning -1 as the number of capturing groups for an invalid regular expression; that value was then used to initialize memory when incrementally scanning with RE2::Scanner. We now explicitly check that an expression is valid and return nil if it isn't.

> pattern = RE2::Regexp.new('???')
/tmp/re2-20210214-21445-1a81zn5/re2-2021-02-02/re2/re2.cc:205: Error parsing '???': no argument for repetition operator: ??
=> #<RE2::Regexp /???/>
> scanner = pattern.scan('Some text')
=> #<RE2::Scanner:0x00007f951d981840>
> scanner.scan
/tmp/re2-20210214-21445-1a81zn5/re2-2021-02-02/re2/re2.cc:890: Invalid RE2: no argument for repetition operator: ??
=> nil
> scanner.to_a
/tmp/re2-20210214-21445-1a81zn5/re2-2021-02-02/re2/re2.cc:890: Invalid RE2: no argument for repetition operator: ??
=> []
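
If you would rather fail loudly than receive nil or an empty array, one option is to check the pattern before scanning. The sketch below is only an illustration: scanner_for! is a hypothetical helper (not part of the gem) that relies on RE2::Regexp#ok? and RE2::Regexp#error, and FakePattern is a stand-in so the snippet runs without the gem installed.

```ruby
# Hypothetical helper (not part of the gem): refuse to scan with an
# invalid pattern. Works with any object responding to ok?, error and
# scan, which RE2::Regexp does.
def scanner_for!(pattern, text)
  raise ArgumentError, "invalid pattern: #{pattern.error}" unless pattern.ok?

  pattern.scan(text)
end

# Stand-in for RE2::Regexp so this sketch runs without the gem.
FakePattern = Struct.new(:ok, :error) do
  def ok?
    ok
  end

  def scan(text)
    [:scanner, text]
  end
end

valid_result = scanner_for!(FakePattern.new(true, nil), 'Some text')

begin
  scanner_for!(FakePattern.new(false, 'no argument for repetition operator: ??'), 'Some text')
rescue ArgumentError => e
  invalid_message = e.message
end
```

With the real gem you would pass RE2::Regexp.new('???') in place of FakePattern.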

This also fixes an edge case when calling RE2::Regexp#match with a negative number of matches: this previously raised a NoMemoryError but now raises a more informative ArgumentError:

> RE2::Regexp.new('(\d+)').match('1 2 3', -1)
ArgumentError: number of matches should be >= 0
from (pry):5:in `match'

1.3.0: Support Homebrew on Apple Silicon by default

12 Mar 13:43
372b6df

GitHub: #50

To make installation easier for users of Homebrew on Apple Silicon, add /opt/homebrew to the default paths searched when trying to find the underlying re2 library. While doing so, add an extra fallback to /usr (instead of only searching /usr/local).

Note we still search /usr/local first to avoid accidentally changing behaviour for existing users (e.g. suddenly compiling against a different version of re2 in /usr).
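
To make the search order concrete, here is a minimal sketch of a "first prefix that has the header wins" lookup. It is not the gem's actual extconf.rb: find_re2_prefix and DEFAULT_PREFIXES are hypothetical names, and the relative order of /opt/homebrew and /usr is an assumption (the only ordering stated above is that /usr/local comes first).

```ruby
# Candidate prefixes. /usr/local is first, as described above; the order
# of the remaining two is an assumption made for this sketch.
DEFAULT_PREFIXES = ['/usr/local', '/opt/homebrew', '/usr'].freeze

# Return the first prefix that contains include/re2/re2.h, or nil.
# The existence check is injectable so the logic can be exercised
# without touching the real filesystem.
def find_re2_prefix(prefixes = DEFAULT_PREFIXES, exists: File.method(:exist?))
  prefixes.find do |prefix|
    exists.call(File.join(prefix, 'include', 're2', 're2.h'))
  end
end

# Simulate a machine with re2 under both /opt/homebrew and /usr.
installed = ['/opt/homebrew/include/re2/re2.h', '/usr/include/re2/re2.h']
chosen = find_re2_prefix(exists: ->(path) { installed.include?(path) })
```

On a machine that also has re2 under /usr/local, the same lookup keeps returning /usr/local, which is the behaviour existing users rely on.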

1.2.0: Stop using deprecated re2 APIs

18 Apr 11:08

GitHub: #40

As re2 has deprecated and now removed the utf8 option, re-implement the option in the gem in terms of the encoding and set_encoding API.

This should be entirely backward-compatible as the encoding API has been present since the initial release in 2010.
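
As a rough picture of what "in terms of encoding and set_encoding" means, here is a toy Ruby model. The real change lives in the gem's C++ extension; ToyOptions, apply_utf8_option and utf8? are hypothetical names that only mimic the shape of re2's options object, assuming it exposes a UTF-8 and a Latin-1 encoding.

```ruby
# Toy stand-in for re2's options object (an assumption for illustration;
# the real object is C++ and is not exposed like this in Ruby).
class ToyOptions
  ENCODING_UTF8 = :utf8
  ENCODING_LATIN1 = :latin1

  attr_reader :encoding

  def initialize
    @encoding = ENCODING_UTF8
  end

  def set_encoding(encoding)
    @encoding = encoding
  end
end

# Setting the gem's boolean utf8 option becomes a call to set_encoding.
def apply_utf8_option(options, utf8)
  options.set_encoding(utf8 ? ToyOptions::ENCODING_UTF8 : ToyOptions::ENCODING_LATIN1)
end

# Reading the utf8 option back becomes a comparison against encoding.
def utf8?(options)
  options.encoding == ToyOptions::ENCODING_UTF8
end

options = ToyOptions.new
apply_utf8_option(options, false)
latin1_selected = !utf8?(options)
```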

Thanks to @buzzdeee for reporting this upcoming breaking change.

1.0.0: One Point Oh

14 Nov 22:21

After some time in production use at @altmetric, and six years after its first release, it's time for re2 1.0.0.

The only notable change from 0.7.0 is support for recent versions of the underlying re2 library which require C++11 support.

The gem still supports older versions of re2 but any attempt to compile newer versions will require a compiler with appropriate language support (e.g. clang 3.4 on Ubuntu 12.04).

0.7.0: MatchData begin & end

25 Jan 16:27
v0.7.0

Thanks to an issue raised by @driskell about functionality missing from RE2's MatchData (compared to MRI's) in #20, I'm happy to announce version 0.7.0 of re2, now including RE2::MatchData#begin and RE2::MatchData#end for finding the offset of matches in your searches.

The API is the same as the standard library's begin and end:

m = RE2('w(o+)').match('he said woohoo!')
m.begin(0)
# => 8
m.end(0)
# => 11

It also works with RE2's named captures:

m = RE2('w(?P<cheers>o+)').match('he said woohoo!')
m.begin('cheers')
# => 9
m.end(:cheers)
# => 11
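
For comparison, here is the same lookup using Ruby's built-in Regexp and MatchData, with no re2 involved. The only difference in this sketch is the named-capture syntax, which is (?<name>...) in Ruby rather than RE2's (?P<name>...).

```ruby
# Ruby's standard library, not RE2.
m = /w(?<cheers>o+)/.match('he said woohoo!')

whole_match = [m.begin(0), m.end(0)]
by_symbol   = [m.begin(:cheers), m.end(:cheers)]
by_string   = [m.begin('cheers'), m.end('cheers')]

puts whole_match.inspect # prints [8, 11]
puts by_symbol.inspect   # prints [9, 11]
puts by_string.inspect   # prints [9, 11]
```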

Note that on versions of Ruby prior to 1.9, the offset will be in bytes, while later Ruby versions will return the offset in characters. This is consistent with other string functions (such as length and slicing with []) and so gives the least surprising behaviour when dealing with multibyte characters. The specs for this behaviour illustrate the difference, as they cannot rely on the exact return value of begin and end.
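
The difference between the two kinds of offset is easiest to see with a multibyte character before the match. This sketch uses Ruby's built-in Regexp rather than RE2 so it runs without the gem; the byte offset is derived from pre_match, which is just one way of computing it.

```ruby
# "\u00e9" is a single character that takes two bytes in UTF-8.
text = "h\u00e9 said woohoo!"
m = /w(o+)/.match(text)

char_offset = m.begin(0)           # counted in characters
byte_offset = m.pre_match.bytesize # counted in bytes

# Character offsets line up with [] slicing and length...
char_slice = text[m.begin(0)...m.end(0)]
# ...while byte offsets line up with byteslice.
byte_slice = text.byteslice(byte_offset, m[0].bytesize)

puts char_offset # prints 8
puts byte_offset # prints 9
```

Both slices return "woo", but only the character offset can be passed straight to [].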

As a technical aside: the trickiest part of implementing this was efficiently calculating the length of the offset string as different implementations of Ruby vary in their string functions. Prior to Ruby 1.9, the offset is calculated using simple pointer arithmetic but other versions will try to use rb_str_sublen when available, falling back to rb_str_length (and incurring the cost of an extra string allocation) on implementations such as Rubinius.

Many thanks to @driskell for originally contributing this and providing invaluable feedback during its development.

0.6.0: Introducing RE2::Scanner

01 Feb 23:48

Scanning

Thanks to a suggestion from Matthias Kadenbach, re2 now contains an API for incrementally scanning a string for matches. To use it, call scan on an instance of RE2::Regexp with the string you want to search:

scanner = RE2('(\d+)').scan("Some 1 long 23 string 4 containing 567 numbers")
scanner.scan #=> ["1"]
scanner.scan #=> ["23"]

The scanner in the example above is an instance of RE2::Scanner which has one main method -- scan -- which returns the next match. Once no more matches are found, scan will return nil. You can use rewind to reset a scanner back to the beginning of the string.

The RE2::Scanner class also implements Ruby's Enumerable interface so you can call each and to_enum on it:

scanner = RE2('(\d+)').scan("Some 1 long 23 string 4 containing 567 numbers")
scanner.each do |match|
  puts match
end
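
If it helps to see the semantics of scan, rewind and each in one place, here is a toy scanner written in plain Ruby on top of the built-in Regexp. It is a stand-in for illustration only, not how RE2::Scanner is implemented: ToyScanner is a hypothetical class, and having each rewind before iterating is an assumption of this sketch.

```ruby
# Toy model of an incremental scanner (not the gem's implementation).
class ToyScanner
  include Enumerable

  def initialize(regexp, text)
    @regexp = regexp
    @text = text
    @pos = 0
  end

  # Return the captures of the next match, or nil once exhausted.
  def scan
    return nil if @pos > @text.length

    match = @regexp.match(@text, @pos)
    if match.nil?
      @pos = @text.length + 1 # remember that we are finished
      return nil
    end

    # Continue after this match; step one extra character on an empty
    # match so the scanner always makes progress.
    @pos = match.end(0) == match.begin(0) ? match.end(0) + 1 : match.end(0)
    match.captures
  end

  # Reset back to the beginning of the string.
  def rewind
    @pos = 0
    self
  end

  # Yield every match from the start (rewinding first is an assumption).
  def each
    return to_enum(:each) unless block_given?

    rewind
    while (captures = scan)
      yield captures
    end
    self
  end
end

scanner = ToyScanner.new(/(\d+)/, 'Some 1 long 23 string 4 containing 567 numbers')
first = scanner.scan
second = scanner.scan
all_matches = scanner.to_a # to_a comes from Enumerable, built on each
```

Because the class includes Enumerable, methods such as to_a, map and first come for free once each is defined.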

No more in-place replacement

This release removes methods that previously altered strings in-place. This means re2_sub! and re2_gsub! are gone and RE2.Replace and RE2.GlobalReplace now return new strings rather than modifying their input.

Encoding awareness

Again, thanks to a bug report by Matthias Kadenbach: in Ruby 1.9 and later, re2 will now set the correct encoding for strings.

m = RE2('(\w+)', :utf8 => true).match("foo")
m[1].encoding # => #<Encoding:UTF-8>