![Steven Levithan Profile](https://pbs.twimg.com/profile_images/3100867631/f64591ef5cfd4b649e4fdba1175b1a36_x96.jpeg)
Steven Levithan
@slevithan
Followers: 851 · Following: 589 · Statuses: 1K
Creator → Regex+, Oniguruma-To-ES, https://t.co/HwKEPCL2wV, https://t.co/GBaROcUNIY Coauthor → Regular Expressions Cookbook, High Performance JavaScript
Joined May 2008
Oniguruma-To-ES v3.1.0 includes a fancy feature that I think is new in JS: lazy construction of RegExp objects, deferred until first use in a search (with nothing observably different before/after). Laziness can also be made conditional on the length of the pattern, since only extremely long patterns are slow to construct in V8. This can be useful, e.g., in TextMate grammars that have long lists of (sometimes extremely long) regexes that aren't always used.
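The idea of deferring construction can be illustrated with a minimal sketch. This is not the Oniguruma-To-ES implementation: the `LazyRegExp` class, its `minLazyLength` parameter, and the `isCompiled` getter are hypothetical names invented here, and the wrapper only covers `exec` and `test`.

```javascript
// Hypothetical sketch of lazy RegExp construction; names are illustrative only.
class LazyRegExp {
  #source;
  #flags;
  #compiled = null;
  lastIndex = 0;

  // minLazyLength (assumed parameter): patterns shorter than this are
  // compiled immediately, since only very long patterns are slow to build.
  constructor(source, flags = "", minLazyLength = 0) {
    this.#source = source;
    this.#flags = flags;
    if (source.length < minLazyLength) {
      this.#compile();
    }
  }

  // Exposed only so the deferral can be observed in this demo.
  get isCompiled() {
    return this.#compiled !== null;
  }

  #compile() {
    if (this.#compiled === null) {
      this.#compiled = new RegExp(this.#source, this.#flags);
    }
    return this.#compiled;
  }

  exec(str) {
    const re = this.#compile(); // the real RegExp is built on first use
    re.lastIndex = this.lastIndex; // keep lastIndex in sync with the wrapper
    const match = re.exec(str);
    this.lastIndex = re.lastIndex;
    return match;
  }

  test(str) {
    return this.exec(str) !== null;
  }
}

// Usage: nothing is compiled until the first search.
const lazyDemo = new LazyRegExp("a+b", "g");
const compiledBeforeUse = lazyDemo.isCompiled;
const demoMatched = lazyDemo.test("xaab");
const compiledAfterUse = lazyDemo.isCompiled;
```

One difference in this simplified version is that an invalid pattern would throw on first use rather than at construction time.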
@erikcorry I take back that unpredictability is desirable. Regex authors will almost always know that e.g. 100K backtracks is unexpected, so that works! Re: fragility of counting backtracks, just an idea: what about counting orders of magnitude? 1K, 10K, 100K, 1M, and 10M are useful thresholds.
Just shipped: oniguruma-to-es reaches v1.0.0 ✨ New release includes lots of edge case fixes reported by @TheRedCMD, making its emulation even more accurate
.@fabiospampinato possibly of interest: @antfu7 just added support for precompiled grammars (with regexes that have been pre-run through oniguruma-to-es) in Shiki 1.26
RT @antfu7: In 2024, we have redistributed a total of $4,371.83 USD to the open-source community! Thanks to everyone who supported this ide…
Re: leftmost-first alternation, yeah, I left out almost all the great things inherited without modification from Perl and elsewhere (hence my comment that JS got many things right). Re: \w\d\s\b, it’s complicated—lots of tradeoffs/nuances, and the legacy of JS would not have made it a good idea for flags u/v to change their behavior. That would need its own flag, but then \s (only) is already Unicode based, which complicates that. Availability of Unicode properties with \p made all of these a non-issue except \b. JS still needs a sane solution for Unicode-aware \b. What do you mean that \w\b are partially affected by Unicode flags? I guess you mean for their inversions \W\B due to code-point-based matching? If so, that also applies to \D\S.
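The behaviors mentioned above can be checked directly in a JS console. The snippet below is a small illustration rather than anything from the thread itself; the `WORD` class used for the boundary workaround is one assumed definition of a "word character" among several reasonable ones.

```javascript
// \s is already Unicode-based: it matches U+00A0 (no-break space).
const sMatchesNbsp = /\s/.test("\u00A0"); // true

// \w and \d stay ASCII-only, even with flag u.
const wMatchesAccented = /\w/u.test("\u00E9"); // false (é)
const dMatchesArabicDigit = /\d/u.test("\u0663"); // false (Arabic-Indic digit three)

// Unicode properties via \p cover those cases.
const pLMatchesAccented = /\p{L}/u.test("\u00E9"); // true
const pNdMatchesArabicDigit = /\p{Nd}/u.test("\u0663"); // true

// \b is defined in terms of \w, so there is no boundary after a trailing é.
const cafe = "caf\u00E9";
const asciiBoundaryMatches = /\bcaf\u00E9\b/u.test(cafe); // false

// One possible workaround: lookarounds over an assumed Unicode word class.
const WORD = "[\\p{L}\\p{M}\\p{N}_]";
const unicodeBoundary = new RegExp(`(?<!${WORD})${cafe}(?!${WORD})`, "u");
const unicodeBoundaryMatches = unicodeBoundary.test(cafe); // true

// Inverted classes see code units without flag u, and code points with it.
const emoji = "\u{1F600}"; // two UTF-16 code units
const nonWordByCodeUnit = /^\W$/.test(emoji); // false
const nonWordByCodePoint = /^\W$/u.test(emoji); // true
const nonDigitByCodePoint = /^\D$/u.test(emoji); // true
```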
@erikcorry Are there implementation challenges with a timeout option? Not as satisfying/fancy as a backtracking cap, but timeout is an easier way for devs to think about it and wouldn’t affect backtracking-related optimizations.
@erikcorry Agree and disagree on default /s. In principle, obviously yes since JS regex matching isn’t line based. But default /s would break portability (no popular flavor does that) and it would exacerbate problems with the common misuse of too-permissive quantified dot.
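A short illustration of why a default /s could make an overly permissive quantified dot worse. The sample text and variable names are made up for this sketch.

```javascript
// Two lines, each containing one quoted string.
const sample = 'a "one"\nb "two"';

// Without flag s, the dot stops at the line break, which limits the damage.
const dotDefault = /".*"/.exec(sample)[0]; // '"one"'

// With flag s, the same greedy dot runs across lines to the last quote.
const dotAll = /".*"/s.exec(sample)[0]; // '"one"\nb "two"'

// A negated character class says what is meant and is unaffected by flag s.
const negatedClass = /"[^"]*"/.exec(sample)[0]; // '"one"'
```

The negated class is the more precise pattern in either mode; the default dot behavior merely hides the overmatch in the common single-line case.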