How to do a case-insensitive query in tree-sitter

I'm trying to create and use a tree-sitter grammar in a language server I am implementing, in order to support features like finding all references to a variable. Given the grammar, I would be able to write a query to find all of the references to a variable with a specific name (ex. myVar). However, the language I am writing a language server for uses case-insensitive variable names (ex. myVar can be referenced as MYVAR, MyVaR, myvar, etc.).
How would I be able to write a tree-sitter query to match a pattern where a token must case-insensitively match a particular string?
I could write the query to not filter by the variable name and implement my own filtering of the results, but I was wondering if there was a way to handle this within the query itself rather than implementing custom filtering code.
Here is a simplified example case to show what I mean.
Given the following grammar, I want to query for all of the set_statements that set a new value to the variable myVar.
module.exports = grammar({
  name: 'mylang',
  rules: {
    source_file: $ => repeat($._statement),
    _statement: $ => choice(
      $.set_statement,
    ),
    set_statement: $ => seq(
      'set',
      field("variable", $.identifier),
      field("value", $._expression),
    ),
    _expression: $ => choice(
      $.identifier,
      $.integer_literal,
    ),
    identifier: $ => /[a-zA-Z0-9]+/,
    integer_literal: $ => /[0-9]+/,
  }
});
Normally I would be able to do this with a query like the following.
((set_statement
  variable: (identifier) @variable)
 (#eq? @variable "myVar"))
However, as we can see with the following example of running the query, this only picks up on the references to myVar that use the same casing as the query.
$ cat set_testing.txt
set myVar 0
set MYVAR 23
set myVar2 72
set MyVaR 14
$ tree-sitter query find_variable.query set_testing.txt
pattern: 0
capture: variable, start: (0, 4), text: "myVar"
I want to create a query that would instead find:
$ tree-sitter query find_variable.query set_testing.txt
pattern: 0
capture: variable, start: (0, 4), text: "myVar"
pattern: 0
capture: variable, start: (1, 4), text: "MYVAR"
pattern: 0
capture: variable, start: (3, 4), text: "MyVaR"

Change your query to match against a regular expression that covers every upper/lowercase combination of the identifier, in this case myvar.
If you change find_variable.query to use #match? with a regular expression covering all case combinations:
((set_statement
  variable: (identifier) @variable)
 (#match? @variable "^[mM][yY][vV][aA][rR]$"))
Now running tree-sitter query find_variable.query set_testing.txt returns:
pattern: 0
capture: variable, start: (0, 4), text: "myVar"
pattern: 0
capture: variable, start: (1, 4), text: "MYVAR"
pattern: 0
capture: variable, start: (3, 4), text: "MyVaR"
Tree-sitter does not support case-insensitive regular expression searches (see issue #261), so the regular expressions are a little longer.
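Writing these bracket classes by hand gets tedious for longer names. A minimal Python sketch that generates both the pattern and the query text; the helper names `case_insensitive_pattern` and `build_query` are hypothetical, and it assumes identifiers contain only ASCII letters and digits, as in the grammar above:

```python
import re


def case_insensitive_pattern(name: str) -> str:
    """Build an anchored regex matching `name` in any letter casing.

    Each letter becomes a two-character class such as [mM]; any other
    character is escaped so it matches literally. Assumes ASCII input.
    """
    parts = []
    for ch in name:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper:
            parts.append(f"[{lower}{upper}]")
        else:
            parts.append(re.escape(ch))
    return "^" + "".join(parts) + "$"


def build_query(name: str) -> str:
    """Return tree-sitter query text for set_statements assigning `name`.

    Assumes `name` is alphanumeric, so the generated pattern needs no
    extra escaping inside the query's double-quoted string.
    """
    pattern = case_insensitive_pattern(name)
    return (
        "((set_statement\n"
        "  variable: (identifier) @variable)\n"
        f' (#match? @variable "{pattern}"))'
    )


if __name__ == "__main__":
    print(case_insensitive_pattern("myVar"))
    print(build_query("myVar"))
```

The output of `build_query` can be written to find_variable.query and run with the same `tree-sitter query` command as above.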


How to extract parameter names and values using regular expressions

I would like to know how to extract the values of all these parameters.
My regular expression:
Parameter names and values that should match:
name='John Doe'
name=John Doe
organization=Acme Widgets Inc.
When I run my expression, it only finds the first parameter that matches, in this case name='John Doe'.
The desired output is: name John Doe
I am having extra trouble figuring out how to extract the parameter names and values without the quotes and equals signs.
Try this:
The keyword will be in capture group 1 (there's no need for [] around \w).
There's no need for a capture group around the equal sign.
['"]? allows an optional quote (single or double) after the equals sign. There's no need to put it in a capture group.
([^'"\n]+) then matches everything that isn't another quote or newline. So it will capture everything until either a quote or the end of the line. This value will be put into group 2.
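The answer's regex itself did not survive, but the pieces described above can be assembled and checked. A minimal Python sketch, assuming the pattern is `(\w+)=['"]?([^'"\n]+)` (reconstructed from the description, not copied from the answer) and that each parameter sits on its own line:

```python
import re

# Reconstructed from the description: group 1 is the name, an optional
# quote follows the equals sign, and group 2 runs up to the next quote
# or the end of the line.
PARAM_RE = re.compile(r"""(\w+)=['"]?([^'"\n]+)""")

text = """name='John Doe'
name=John Doe
organization=Acme Widgets Inc."""

# findall returns one (name, value) tuple per match, covering every line,
# not just the first one.
pairs = PARAM_RE.findall(text)
for key, value in pairs:
    print(key, value)
```

Using `findall` (or a global flag in other engines) is what makes every line match rather than only the first.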
I hope this will be useful for you.
I tested the pattern in:
var str = `
name='John Doe'
name=John Doe
organization=Acme Widgets Inc.
`;
// Keep each line that contains "=", then strip the quotes and turn "=" into a space
console.log(str.match(/^.*?=.*?$/gm).map(s => s.replace(/("|')/g, '').replace(/=/g, ' ')));

Single RegEx to catch multiple options and replace with their corresponding replacements

The problem goes like this:
value match: 218\d{3}(\d{4}) replace with 10\1 to get 10 followed by last 4 digits
for example 2181234567 would become 104567
value match: 332\d{3}(\d{4}) replace with 11\1 to get 11 followed by last 4 digits
for example 3321234567 would become 114567
value match: 420\d{3}(\d{4}) replace with 12\1 to get 12 followed by last 4 digits
for example 4201234567 would become 124567
...and so on
Is there a better way to catch different values and replace with their corresponding replacements in a single RegEx than creating multiple expressions?
Something like (218|332|420)\d{3}(\d{4}) replaced with (10\4|11\4|12\4), so that each match gets only its corresponding result.
Edit: I didn't specify the use case. It's for my PBX, which just uses regex to match patterns and then replace them with the values I want them to go out with. No code, just plain regex in the GUI.
It's also for personal use, if I can get it to work in Notepad++.
Find what: (?:(218)|(332)|(420))\d{3}(\d{4})(?=@domain\.com)
Replace with: (?{1}10$4)(?{2}11$4)(?{3}12$4)
CHECK Wrap around
CHECK Regular expression
Replace all
(?: # non capture group
(218) # group 1, 218
| # OR
(332) # group 2, 332
| # OR
(420) # group 3, 420
) # end group
\d{3} # 3 digits
(\d{4}) # group 4, 4 digits
(?=@domain\.com) # positive lookahead, make sure we have "@domain.com" after
# that allows keeping "@domain.com"
# if you want to remove it from the result, just put "@domain\.com"
# without lookahead.
(?{1} # if group 1 exists
10 # insert "10"
$4 # insert content of group 4
) # endif
(?{2}11$4) # same as above
(?{3}12$4) # same as above
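Python's `re.sub` has no `(?{1}...)` conditional replacement syntax, but passing a function achieves the same effect by checking which alternative group took part in the match. A minimal sketch, assuming the numbers are followed by `@domain.com` as in the lookahead above (the sample address is hypothetical):

```python
import re

# Groups 1-3 are the alternative prefixes; group 4 holds the last 4 digits.
PATTERN = re.compile(r"(?:(218)|(332)|(420))\d{3}(\d{4})(?=@domain\.com)")

# Replacement prefix for each alternative group number.
PREFIX_BY_GROUP = {1: "10", 2: "11", 3: "12"}


def conditional_replace(match: re.Match) -> str:
    # Mirrors (?{1}10$4)(?{2}11$4)(?{3}12$4): emit the prefix of
    # whichever group participated in the match, then group 4.
    for group, prefix in PREFIX_BY_GROUP.items():
        if match.group(group) is not None:
            return prefix + match.group(4)
    return match.group(0)  # not reached: one alternative always matches


print(PATTERN.sub(conditional_replace, "2181234567@domain.com"))
```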
I don't think you can use a single regular expression to conditionally replace text as per your example. You either need to chain multiple search & replace, or use a function that does a lookup based on the first captured group (first three digits).
You did not specify the language, and regular expression syntax varies between languages. Here is a JavaScript code snippet that uses the function-with-lookup approach:
var str1 = '';
var str2 = '';
var str3 = '';
var strMap = {
  '218': '10',
  '332': '11',
  '420': '12'
  // add more as needed
};
function fixName(str) {
  var re = /(\d{3})\d{3}(\d{4})(?=@domain\.com)/;
  var result = str.replace(re, function(m, p1, p2) {
    // look up the replacement prefix; keep the match unchanged if unmapped
    return p1 in strMap ? strMap[p1] + p2 : m;
  });
  return result;
}
var result1 = fixName(str1);
var result2 = fixName(str2);
var result3 = fixName(str3);
console.log('str1: ' + str1 + ', result1: ' + result1);
console.log('str2: ' + str2 + ', result2: ' + result2);
console.log('str3: ' + str3 + ', result3: ' + result3);
str1:, result1:
str2:, result2:
str3:, result3:
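The same lookup-function approach in Python, as a sketch: `re.sub` accepts a function in place of the replacement string. The dictionary mirrors `strMap` above; the sample inputs and the fallback that leaves unknown prefixes untouched are assumptions added for illustration.

```python
import re

# Maps the first three digits to their replacement prefix (mirrors strMap).
PREFIX_MAP = {"218": "10", "332": "11", "420": "12"}

NUMBER_RE = re.compile(r"(\d{3})\d{3}(\d{4})(?=@domain\.com)")


def fix_name(s: str) -> str:
    def lookup(m: re.Match) -> str:
        prefix = PREFIX_MAP.get(m.group(1))
        # Leave the match unchanged if the prefix has no mapping.
        return prefix + m.group(2) if prefix is not None else m.group(0)

    return NUMBER_RE.sub(lookup, s)


for s in ["2181234567@domain.com", "3321234567@domain.com", "9991234567@domain.com"]:
    print(s, "->", fix_name(s))
```

Adding a new prefix then only means adding a dictionary entry, rather than another alternative group in the pattern.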
@Toto has a nice answer. There is another method if the (?{1}...) operator is not available (thanks, Toto, I did not know about this Notepad++ feature).
More details on my answer here:
Append to the end of the doc:
Search for:
Replace with
watch in action:

Groovy regex PatternSyntaxException when parsing GString-style variables

Groovy here. I'm being given a String with GString-style variables in it like:
String target = 'How now brown ${animal}. The ${role} has oddly-shaped ${bodyPart}.'
Keep in mind, this is not intended to be used as an actual GString!!! That is, I'm not going to have 3 string variables (animal, role and bodyPart, respectively) that Groovy will be resolving at runtime. Instead, I'm looking to do 2 distinct things to these "target" strings:
I want to be able to find all instances of these variable refs ("${*}") in the target string and replace each one with a ?; and
I also need to find all instances of these variable refs and obtain a list (allowing dupes) of their names (which in the above example would be [animal,role,bodyPart])
My best attempt thus far:
class TargetStringUtils {
    private static final String VARIABLE_PATTERN = "\${*}"

    // Example input: 'How now brown ${animal}. The ${role} has oddly-shaped ${bodyPart}.'
    // Example desired output: 'How now brown ?. The ? has oddly-shaped ?.'
    static String replaceVarsWithQuestionMarks(String target) {
        target.replaceAll(VARIABLE_PATTERN, '?')
    }

    // Example input: 'How now brown ${animal}. The ${role} has oddly-shaped ${bodyPart}.'
    // Example desired output: [animal,role,bodyPart] (list of strings)
    static List<String> collectVariableRefs(String target) {
        target.findAll(VARIABLE_PATTERN)
    }
}
...produces a PatternSyntaxException any time I run either method:
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition near index 0
Any ideas where I'm going awry?
The issue is that you have not escaped the pattern properly. Also, findAll will only collect the whole matches, while you need to capture the subpattern inside the {}.
def target = 'How now brown ${animal}. The ${role} has oddly-shaped ${bodyPart}.'
println target.replaceAll(/\$\{([^{}]*)\}/, '?') // => How now brown ?. The ? has oddly-shaped ?.
def lst = new ArrayList<>();
def m = target =~ /\$\{([^{}]*)\}/
(0..<m.count).each { lst.add(m[it][1]) }
println lst // => [animal, role, bodyPart]
See this Groovy demo
Inside a /\$\{([^{}]*)\}/ slashy string, you can use single backslashes to escape the special regex metacharacters, and the whole regex pattern looks cleaner.
\$ - will match a literal $
\{ - will match a literal {
([^{}]*) - Group 1 capturing any characters other than { and }, 0 or more times
\} - a literal }.
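For comparison, the same pattern ports directly to Python's `re` module (a sketch, not part of the original answer); a raw string plays the role of Groovy's slashy string, so the single backslashes carry over unchanged:

```python
import re

target = "How now brown ${animal}. The ${role} has oddly-shaped ${bodyPart}."

# \$ and \{ / \} are literal characters; group 1 captures the name.
VAR_RE = re.compile(r"\$\{([^{}]*)\}")

# Replace every ${...} reference with a question mark.
replaced = VAR_RE.sub("?", target)
print(replaced)

# With exactly one capture group, findall returns just the captured names.
names = VAR_RE.findall(target)
print(names)
```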

In emacs, can I use alternation in the regexp for align-regexp?

For example, I have the following snippet:
'abc' => 1,
'abcabc' =>2,
'abcabcabc' => 3,
And I want to format it to:
'abc'       => 1,
'abcabc'    => 2,
'abcabcabc' => 3,
I know there are easier ways to do it, but here I just want to practice my understanding of align-regexp. I've tried this command, but it does not work:
C-u M-x align-regexp \(\s-+\)=\|\(>\s-*\)\d 1 1 y
Where am I going wrong?
So the question is: with \(\s-+\)=\|\(>\s-*\)\d matching either \(\s-+\)= or \(>\s-*\)\d [1], can we use align-regexp to align on each of those alternatives throughout a line?
The answer is no: align-regexp modifies one specific matched group of the regexp. In this case it was group 1, and group 1 is the \(\s-+\) at the beginning. Group 1 of the regexp does not vary depending on what was actually matched, so it never refers to \(>\s-*\) [2].
If you can express your regexp such that a single group of the regexp really is the one that should be replaced for every match throughout the line, however, you can get the effect you want.
For example, >?\(\s-*\)[0-9=] would, at least for the data shown, give the desired result.
[1] In Emacs, \d matches a literal d. That should be [0-9].
[2] You generally don't want any non-whitespace in the alignment group, as Emacs replaces the content of that group.
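The group-numbering point can be checked outside Emacs. A small Python sketch, with the caveat that Python spells Emacs's `\(...\)`, `\|` and `\s-` as `(...)`, `|` and `\s`, so these are translations of the regexps rather than the exact align-regexp input:

```python
import re

line = "'abcabc' =>2,"

# Translation of \(\s-+\)=\|\(>\s-*\)[0-9]: two alternatives, two groups.
alternation = re.compile(r"(\s+)=|(>\s*)[0-9]")
for m in alternation.finditer(line):
    # Group 1 is None whenever the second alternative matched.
    print(repr(m.group(0)), m.group(1), m.group(2))

# Translation of >?\(\s-*\)[0-9=]: one group shared by every match.
single_group = re.compile(r">?(\s*)[0-9=]")
for m in single_group.finditer(line):
    print(repr(m.group(0)), repr(m.group(1)))
```

With the alternation, group 1 is only set on the first match, which is why align-regexp never touches the space after `=>`; with the single-group version, group 1 exists on both matches.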

.NET regex with quote and space

I'm trying to create a regex to match this:
/tags/ud617/?sort=active&page=2" >2
So basically, "[number]" is the only dynamic part:
/tags/ud617/?sort=active&page=[number]" >[number]
The closest I've been able to get (in PowerShell) is:
[regex]::matches('/tags/ud617/?sort=active&page=2" >2
But this doesn't provide me with a full match of the dynamic string.
Ultimately, I'll be creating a capture group:
Seems easy enough:
$regex = '/tags/ud617/\?sort=active&page=(\d+)"\s>2'
'/tags/ud617/?sort=active&page=2" >2' -match $regex > $null
[regex]::matches('/tags/ud617/?sort=active&page=3000 >2','/tags/ud617/\?sort=active&page=(\d+) >(\d+)')
Groups : {/tags/ud617/?sort=active&page=3000 >2, 3000, 2}
Success : True
Captures : {/tags/ud617/?sort=active&page=3000 >2}
Index : 0
Length : 41
Value : /tags/ud617/?sort=active&page=3000 >2
This captures the page value and the number after the greater-than sign (i.e. 2).
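To capture both numbers from the exact string in the question (which includes the closing quote before the space), the two patterns above can be combined. A Python sketch for quick testing; the combined pattern is an assumption built from the answers, and .NET's `Regex` should treat these particular constructs (`\?`, `(\d+)`, `\s`) the same way:

```python
import re

s = '/tags/ud617/?sort=active&page=2" >2'

# \? escapes the literal question mark; each (\d+) captures one number.
pattern = re.compile(r'/tags/ud617/\?sort=active&page=(\d+)"\s>(\d+)')

m = pattern.search(s)
if m:
    page, count = m.groups()
    print(page, count)
```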