Steven Levithan Profile
Steven Levithan

@slevithan

Followers
851
Following
589
Statuses
1K

Creator → Regex+, Oniguruma-To-ES, https://t.co/HwKEPCL2wV, https://t.co/GBaROcUNIY
Coauthor → Regular Expressions Cookbook, High Performance JavaScript

Joined May 2008
@slevithan
Steven Levithan
6 days
Oniguruma-To-ES v3.1.0 includes a fancy feature that I think is new in JS: lazy construction of RegExp objects, deferred until first use in a search (with nothing observably different before/after). It can also defer construction based on the length of the pattern, since only extremely long patterns are slow to construct in V8. This can be useful e.g. in TextMate grammars, which have long lists of (sometimes extremely long) regexes that aren't always used.
0
0
1
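A minimal sketch of the general idea of deferred construction, not the Oniguruma-To-ES implementation. `LazyRegExp`, `makeRegex`, and the `lazyMinLength` threshold are hypothetical names chosen for illustration. Unlike the feature described in the post, this wrapper is observably different from a real RegExp (it is not a RegExp instance, and syntax errors surface at first use rather than at creation).

```typescript
// Hypothetical wrapper, not the Oniguruma-To-ES API.
class LazyRegExp {
  private readonly pattern: string;
  private readonly flags: string;
  private compiled: RegExp | null = null;

  constructor(pattern: string, flags: string = "") {
    this.pattern = pattern;
    this.flags = flags;
  }

  // True once the underlying RegExp has been built.
  get isCompiled(): boolean {
    return this.compiled !== null;
  }

  // Build the RegExp on first use, then reuse it.
  private get regex(): RegExp {
    if (this.compiled === null) {
      this.compiled = new RegExp(this.pattern, this.flags);
    }
    return this.compiled;
  }

  test(input: string): boolean {
    return this.regex.test(input);
  }

  exec(input: string): RegExpExecArray | null {
    return this.regex.exec(input);
  }
}

// Hypothetical threshold parameter: short patterns are compiled eagerly,
// since only very long patterns are expensive to construct.
function makeRegex(
  pattern: string,
  flags = "",
  lazyMinLength = 3000,
): RegExp | LazyRegExp {
  return pattern.length >= lazyMinLength
    ? new LazyRegExp(pattern, flags)
    : new RegExp(pattern, flags);
}
```

With a list of hundreds of patterns, only the long ones that are actually searched with would ever pay the construction cost.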
@slevithan
Steven Levithan
8 days
@erikcorry I take back that unpredictability is desirable. Regex authors will almost always know that e.g. 100K backtracks is unexpected--so that works! Re: fragility of counting backtracks, just an idea: what about counting order of magnitude? 1K, 10K, 100K, 1M, 10M are useful thresholds
0
0
0
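As far as I know, the standard JS RegExp API has no backtracking cap, so the idea can only be illustrated with a toy. The sketch below hand-codes a backtracking matcher for the single fixed pattern ^(?:a|aa)+$ and aborts once a step budget is exceeded. `matchWithCap` and `CappedResult` are hypothetical names, and "steps" here counts match attempts, which is only one possible definition of a backtrack.

```typescript
type CappedResult = {
  status: "matched" | "failed" | "aborted";
  steps: number;
};

// Toy backtracking matcher for ^(?:a|aa)+$ with a step budget.
function matchWithCap(input: string, cap: number): CappedResult {
  let steps = 0;
  let aborted = false;

  const tryFrom = (pos: number): boolean => {
    if (aborted) return false;
    steps++;
    if (steps > cap) {
      aborted = true; // budget exhausted: unwind without a verdict
      return false;
    }
    if (pos === input.length) return pos > 0; // need at least one repetition
    for (const alt of ["a", "aa"]) {
      // Try each alternative in order; fall through to the next on failure.
      if (input.startsWith(alt, pos) && tryFrom(pos + alt.length)) return true;
    }
    return false;
  };

  const matched = tryFrom(0);
  return { status: aborted ? "aborted" : matched ? "matched" : "failed", steps };
}

// Order-of-magnitude budgets like those suggested in the post.
const quick = matchWithCap("aaaa", 1_000);
const runaway = matchWithCap("a".repeat(30) + "b", 100_000);
```

Here `quick` matches in a handful of steps, while `runaway` hits the 100K budget because the failing input forces the matcher through an exponentially growing number of a/aa splits.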
@slevithan
Steven Levithan
26 days
oniguruma-to-es hits a big milestone with v2.0 by comprehensively supporting the extremely flexible \G anchor (which has no direct equivalent in JS regexes). See
0
0
1
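For context, a sketch of the simplest case only: when \G appears at the very start of a pattern, JS's sticky flag (y) gives similar behavior, because each match must begin exactly at lastIndex. This is not how oniguruma-to-es handles \G in general, and `matchContiguous` is a hypothetical helper.

```typescript
// Collect matches that follow one another with no gap, starting at index 0.
// Approximates an Oniguruma pattern like \G\d+,? using the sticky flag.
function matchContiguous(re: RegExp, input: string): string[] {
  if (!re.sticky) throw new Error("expected a sticky (y) regex");
  const out: string[] = [];
  re.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(input)) !== null) {
    if (m[0].length === 0) break; // guard against looping on empty matches
    out.push(m[0]);
  }
  return out;
}

const contiguous = matchContiguous(/\d+,?/y, "1,22,333 x 4,5");
// contiguous is ["1,", "22,", "333"]: the chain stops at the space,
// whereas a global (g) search would also find "4," and "5".
```

Uses of \G elsewhere in a pattern (inside alternations, after other tokens, in lookarounds) have no such one-flag mapping.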
@slevithan
Steven Levithan
1 month
Ever wondered why there aren't good tools for converting between different regex flavors automatically? Would you use a universal regex translator in your own projects? Want to help build one?
0
0
1
@slevithan
Steven Levithan
1 month
I think it's a long shot that VS Code will adopt oniguruma-to-es, but let's see what they say 😊
0
0
1
@slevithan
Steven Levithan
1 month
Just shipped: oniguruma-to-es reaches v1.0.0 ✨ New release includes lots of edge case fixes reported by @TheRedCMD, making its emulation even more accurate
0
2
4
@slevithan
Steven Levithan
1 month
What do you think is missing that would make it easier to use or make regexes in JavaScript better?
0
0
1
@slevithan
Steven Levithan
1 month
.@fabiospampinato possibly of interest: @antfu7 just added support for precompiled grammars (with regexes that have been pre-run through oniguruma-to-es) in Shiki 1.26
0
0
2
@slevithan
Steven Levithan
1 month
RT @antfu7: In 2024, we have redistributed a total of $4,371.83 USD to the open-source community! Thanks to everyone who supported this ide…
0
6
0
@slevithan
Steven Levithan
1 month
Those 2 are top of mind from dealing with them in a different context for oniguruma-to-es 🙂 There's also SpecialCasing.txt but that doesn't change anything for \w. Not sure whether that file is already covered by CaseFolding.txt (I'm not very knowledgeable about working with UCD files)
1
0
0
@slevithan
Steven Levithan
1 month
@gibson042 @fabiospampinato If there are others, I’d love to know about them.
1
0
0
@slevithan
Steven Levithan
1 month
Re: leftmost-first alternation, yeah, I left out almost all the great things inherited without modification from Perl and elsewhere (hence my comment that JS got many things right). Re: \w\d\s\b, it’s complicated—lots of tradeoffs/nuances, and the legacy of JS would not have made it a good idea for flags u/v to change their behavior. That would need its own flag, but then \s (only) is already Unicode based, which complicates that. Availability of Unicode properties with \p made all of these a non-issue except \b. JS still needs a sane solution for Unicode-aware \b. What do you mean that \w\b are partially affected by Unicode flags? I guess you mean for their inversions \W\B due to code-point-based matching? If so, that also applies to \D\S.
1
0
0
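A short sketch of the behavior discussed above, plus one possible emulation of a Unicode-aware \b using lookarounds. The definition of a word character as [\p{L}\p{N}_] is an assumption made for illustration (other definitions exist), and `wholeWord` is a hypothetical helper.

```typescript
// \w stays ASCII-only even with flag u, while \s is already Unicode-based.
const asciiOnlyW = /\w/u.test("\u00E9"); // false: "é" is not in [A-Za-z0-9_]
const letterProp = /\p{L}/u.test("\u00E9"); // true: \p covers the gap
const unicodeS = /\s/.test("\u00A0"); // true: no-break space counts as \s

// Assumed word-character definition for this sketch.
const WORD = String.raw`[\p{L}\p{N}_]`;
// A boundary is a position with a word character on exactly one side.
const UNICODE_B = String.raw`(?:(?<!${WORD})(?=${WORD})|(?<=${WORD})(?!${WORD}))`;

function wholeWord(word: string): RegExp {
  // Escape regex syntax characters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`${UNICODE_B}${escaped}${UNICODE_B}`, "u");
}

// Built-in \b sees a boundary between "f" and "é", so "caf" looks like a word.
const asciiBoundary = /\bcaf\b/.test("caf\u00E9"); // true
const unicodeBoundary = wholeWord("caf").test("caf\u00E9"); // false
```

The lookbehind syntax requires an ES2018-capable engine.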
@slevithan
Steven Levithan
1 month
@erikcorry Are there implementation challenges with a timeout option? Not as satisfying/fancy as a backtracking cap, but timeout is an easier way for devs to think about it and wouldn’t affect backtracking-related optimizations.
1
0
1
@slevithan
Steven Levithan
1 month
@erikcorry Agree and disagree on default /s. In principle, obviously yes since JS regex matching isn’t line based. But default /s would break portability (no popular flavor does that) and it would exacerbate problems with the common misuse of too-permissive quantified dot.
0
0
1
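To make the dot concern concrete, a small sketch comparing a quantified dot with a negated character class, with and without flag s. The sample strings are made up for illustration.

```typescript
const quoted = 'say "one" and "two"';

// Greedy dot runs to the last quote on the line.
const withDot = quoted.match(/".*"/g); // ['"one" and "two"']
// A negated class stops at the first closing quote.
const withClass = quoted.match(/"[^"\n]*"/g); // ['"one"', '"two"']

const twoLines = '"one"\n"two"';
// Without flag s, the dot cannot cross the newline, which limits the damage.
const dotDefault = twoLines.match(/".*"/g); // ['"one"', '"two"']
// With flag s, the same pattern swallows both lines in one match.
const dotAll = twoLines.match(/".*"/gs); // ['"one"\n"two"']
```

If s were the default, patterns like the first one would overmatch across line breaks as well as within a line.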