Regex: Syntax pop on two linefeeds

I'm currently writing a Sublime syntax mode (YAML, for v3) for a language which has an unusual comment format.
Documentation comments:
- start with the symbol # as the first character in a line of code, and
- end with two newlines.
A simple example is this:
# The following function returns
the opposite of what you think it does.
code...
and a worst-case example:
#
This is a comment,
this is still the same comment.
This, too. These don't matter: # foobar ##
code...
My current approach is to use the stack.
Push:
  - match: '#'
    scope: punctuation.definition.comment.mona
    push: doc_comment
Pop:
  doc_comment:
    - meta_scope: comment.line.mona
    - match: '\n\n'
      pop: true
That doesn't work, though. I tried to fix this by using the s flag, thinking that it would change how newlines are matched, but it produces a Sublime error (invalid option for capture group).
How can I match this comment format correctly in a Sublime Text 3 YAML syntax?

Apparently I'm a bit thick today. The answer was obvious: match on an empty line:
  - match: '^$'
    pop: true
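
A minimal Python sketch of why the empty-line rule works, assuming (as Sublime syntax definitions do) that patterns are applied to one line at a time, so a pattern such as '\n\n' never sees two newlines together. The function name doc_comment_spans and the sample input are illustrative only and not part of any Sublime API; the blank line before code... is assumed, since the comment format requires it.

```python
def doc_comment_spans(lines):
    """Return (start, end) line-index pairs for doc comments.

    A comment opens on a line whose first character is '#' and closes at
    the first empty line (the '^$' rule), or at the end of the input.
    """
    spans = []
    start = None
    for i, line in enumerate(lines):
        if start is None:
            if line.startswith("#"):
                start = i  # "push": enter the comment context
        elif line == "":
            spans.append((start, i))  # "pop": an empty line ends the comment
            start = None
    if start is not None:
        spans.append((start, len(lines)))  # unterminated comment runs to EOF
    return spans


source = [
    "#",
    "This is a comment,",
    "this is still the same comment.",
    "This, too. These don't matter: # foobar ##",
    "",
    "code...",
]
spans = doc_comment_spans(source)
```

Here the later # characters inside the comment are ignored because the scanner is already in the comment context, which mirrors what the pushed context does in the syntax file.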


Ansible: Insert word in GRUB cmdline

I'd like to use Ansible's lineinfile or replace module in order to add the word splash to the cmdline in GRUB.
It should work for all the following examples:
Example 1:
Before: GRUB_CMDLINE_DEFAULT=""
After: GRUB_CMDLINE_DEFAULT="splash"
Example 2:
Before: GRUB_CMDLINE_DEFAULT="quiet"
After: GRUB_CMDLINE_DEFAULT="quiet splash"
Example 3:
Before: GRUB_CMDLINE_DEFAULT="quiet nomodeset"
After: GRUB_CMDLINE_DEFAULT="quiet nomodeset splash"
The post Ansible: insert a single word on an existing line in a file explained well how this could be done without quotes. However, I can't get it to insert the word within the quotes.
What is the required entry in the Ansible role or playbook in order to add the word splash to the cmdline as shown?
You can do this without a shell command, using two lineinfile tasks.
In your example you're searching for splash:
- name: check if splash is configured in the boot command
  lineinfile:
    backup: true
    path: /etc/default/grub
    regexp: '^GRUB_CMDLINE_LINUX=".*splash'
    state: absent
  check_mode: true
  register: grub_cmdline_check
  changed_when: false
- name: insert splash if missing
  lineinfile:
    backrefs: true
    path: /etc/default/grub
    regexp: "^(GRUB_CMDLINE_LINUX=\".*)\"$"
    line: '\1 splash"'
  when: grub_cmdline_check.found == 0
  notify: update grub
The trick is to try to remove the line if splash appears anywhere in it, but only as a dry run (check_mode: true). If the term was found (found > 0), we don't need to update the line. If it wasn't found, we need to insert it, so we append it at the end using backrefs.
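
The check-then-append logic can be sketched outside Ansible. This is only an illustration in Python using the same two regular expressions as the tasks above; the function name ensure_splash is hypothetical, and it operates on a single line rather than on a file.

```python
import re

# Same patterns as the two tasks: one to detect splash, one to capture
# everything up to the closing quote so the word can be appended.
CHECK = re.compile(r'^GRUB_CMDLINE_LINUX=".*splash')
APPEND = re.compile(r'^(GRUB_CMDLINE_LINUX=".*)"$')


def ensure_splash(line):
    """Append ' splash' inside the quotes unless splash is already there."""
    if CHECK.search(line):
        return line  # already configured, nothing to change
    return APPEND.sub(r'\1 splash"', line)


once = ensure_splash('GRUB_CMDLINE_LINUX="quiet"')
twice = ensure_splash(once)
```

Running the function a second time leaves the line unchanged, which is the idempotency the two-task approach is after.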
Inspired by Adam's answer, I use this one to enable IOMMU:
- name: Enable IOMMU
  ansible.builtin.lineinfile:
    path: /etc/default/grub
    regexp: '^GRUB_CMDLINE_LINUX_DEFAULT="((?:(?!intel_iommu=on).)*?)"$'
    line: 'GRUB_CMDLINE_LINUX_DEFAULT="\1 intel_iommu=on"'
    backup: true
    backrefs: true
  notify: update-grub
Please note that I had to set backrefs to true for the \1 reference to work; otherwise the captured group was not substituted.
Idempotency works fine as well.
EDIT: Please note this snippet only works with an Intel CPU and might need to be updated to fit your platform.
A possible solution is the definition of two entries as follows:
- name: "Checking GRUB cmdline"
  shell: "grep 'GRUB_CMDLINE_LINUX_DEFAULT=.*splash.*' /etc/default/grub"
  register: grub_cfg_grep
  changed_when: false
  failed_when: false
- name: "Configuring GRUB cmdline"
  replace:
    path: '/etc/default/grub'
    regexp: '^GRUB_CMDLINE_LINUX_DEFAULT="((\w.?)*)"$'
    replace: 'GRUB_CMDLINE_LINUX_DEFAULT="\1 splash"'
  when: '"splash" not in grub_cfg_grep.stdout'
Explanation: We first check whether the splash keyword is present in the required line using grep. Since grep returns a non-zero exit code when a string is not found, we suppress the error using failed_when: false. The output of grep is saved to the grub_cfg_grep variable.
Next, we make the replace task conditional on the keyword splash not being in the standard output of grep. The regular expression takes the old content in the quotes and adds the splash keyword after it.
Note: In the case of an empty string before the execution, the result reads " splash" (with a space in front) but it is still a valid cmdline.
The difficulty is this line in the replace module page: "It is up to the user to maintain idempotence by ensuring that the same pattern would never match any replacements made." (https://docs.ansible.com/ansible/latest/modules/replace_module.html#id4) It's easy to insert the item but actually quite tricky to make it idempotent, so that the target file doesn't grow every time you run the task.
I found a way to do it in one shot with the replace module. You should be able to adapt this. My task checks the GRUB_CMDLINE_LINUX_DEFAULT line for "vt.default_red" and inserts some colour codes if not found.
My method was to copy-and-paste various nearly-there examples into the regex tester website and fiddle until it worked. I still don't grok the result, but it worked in my tests at https://www.regextester.com/ and it works in my playbook.
One problem I had was that Ansible's regex implementation apparently doesn't support conditionals, which gave me odd errors for a while.
- name: colours | configured grub command
  replace:
    path: /etc/default/grub
    regexp: '^GRUB_CMDLINE_LINUX_DEFAULT="((?:(?!vt\.default_red).)*?)"$'
    replace: 'GRUB_CMDLINE_LINUX_DEFAULT="\1 vt.default_red=0xee,..."'
The regex matches the literal string (GRUB_CMDLINE_LINUX_DEFAULT= and a double quote mark) at the start and the double quote mark at the end. Deconstructing the rest...
( - open capture group #1 (creates backref #1)
(?: - open a non-capturing group
(?! - negative lookahead (i.e. don't match if the following string comes next)
vt\.default_red - the string to look for, literal dot is escaped
) - close negative lookahead
.) - match a single char (the lookahead consumes nothing, so this is what advances through the line) and close the non-capturing group
* - try to match the non-capturing group zero or more times
? - ... lazily (i.e. get the smallest possible match)
) - close capture group #1
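
To see the one-shot approach behave idempotently, here is a small Python sketch. It uses the same negative-lookahead construction, written with a non-capturing group (?:...), but looks for splash instead of vt.default_red; re.MULTILINE is set on the assumption that the pattern is applied to the whole file content, and the function name add_splash is hypothetical.

```python
import re

# Capture the quoted value only if 'splash' does not occur anywhere in it:
# each character is consumed only after the lookahead confirms that
# 'splash' does not start at that position.
PATTERN = re.compile(
    r'^GRUB_CMDLINE_LINUX_DEFAULT="((?:(?!splash).)*?)"$',
    re.MULTILINE,
)


def add_splash(text):
    """Append ' splash' inside the quotes; no-op if it is already present."""
    return PATTERN.sub(r'GRUB_CMDLINE_LINUX_DEFAULT="\1 splash"', text)


before = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset"'
after = add_splash(before)
again = add_splash(after)
```

On the second call the lookahead fails at the position where splash starts, so the pattern cannot reach the closing quote and nothing is replaced.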
Another option in Ansible is to use perl:
- name: Change items in the file
  ansible.builtin.command:
    cmd: perl -i -pe 's/DEFAULT="/DEFAULT="splash /' /etc/default/grub
Another way of looking at it. This is an old conversation, but it is still relevant.

Parsing whitespace-oriented conf file with Regex

I'm trying to parse a gitolite.conf file, which is a whitespace-oriented conf file with a few regexes. The worst problem is that some options might appear anywhere:
@staff = dilbert alice # line 1
@projects = foo bar # line 2
repo @projects baz # line 3
    RW+ = @staff # line 4
    - master = ashok # line 5
    RW = ashok # line 6
    R = wally # line 7
    config hooks.emailprefix = '[%GL_REPO] ' # line 8
Check the "master" attribute. Some repos have them, others do not. It's a real pain.
This answer assumes a goal of extracting key/value pairs into capturing groups, where key consists of contiguous non-whitespace before = and value includes everything after = but before #, trimmed of leading/trailing whitespace.
Basic version
([^\s]+)\s*=\s*((?:\s*[^\s#]+)*)
More advanced version
The regex above doesn't handle quoted strings very well (e.g. prefix = ' Quoted with # and leading/trailing whitespace '). Regex isn't great at this kind of thing but simple cases can be handled as follows:
([^\s]+)\s*=\s*('[^']*'|"[^"]*"|(?:(?:\s*[^\s#]+)*))
Here's the demo if you need to see what is captured and play around with it more: Debuggex Demo
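
As a rough check of what the more advanced pattern captures, here is a Python sketch. The helper name parse_line is hypothetical, and the sample lines assume gitolite's @group syntax; lines without an = sign simply yield no match.

```python
import re

# The "more advanced version" from above: key, '=', then either a quoted
# string or a run of whitespace-separated words that stops before '#'.
KEY_VALUE = re.compile(
    r"""([^\s]+)\s*=\s*('[^']*'|"[^"]*"|(?:(?:\s*[^\s#]+)*))"""
)


def parse_line(line):
    """Return (key, value) for a 'key = value' line, or None."""
    m = KEY_VALUE.search(line)
    return (m.group(1), m.group(2)) if m else None


group_line = parse_line("@staff = dilbert alice # line 1")
quoted_line = parse_line("    config hooks.emailprefix = '[%GL_REPO] ' # line 8")
repo_line = parse_line("repo @projects baz # line 3")
```

Note that the key is only the last non-whitespace token before the = sign, so for line 8 the leading word config is not part of the key, and the quoted value keeps its quotes.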
First, you should know that this isn't entirely possible with Regex. Regex is a great tool for parsing regular languages (including some types of configuration files), but as soon as you get into "Well, this line is actually a header line and we need all lines under it, and some lines might have this token, and others might not", it gets quite messy. I'm not saying it's impossible, but you're going to waste a lot of time debugging your Regex pattern instead of just writing a parser in whatever language you're using this with.
Second, if you're going to ask a question about Regex, it is always helpful to know what you want out of the expression. Do you want to tokenize everything, do you only want the configuration keys, do you only want the comments?
That being said, I took my best guess, here's an expression to get you started:
^(?:([^=#]+?)\s.?=?\s.?([^=#]+?)\s.?(?:#|$))
With this expression, please apply the g and m flags (global and multiline). In PCRE, this would look like:
/^(?:([^=#]+?)\s.?=?\s.?([^=#]+?)\s.?(?:#|$))/gm
There are two capture groups, one is whatever is before the = sign, and the other is whatever is after. If there is no = sign, the first capture group contains everything. Anything after "#" is ignored.
Here's a fiddle to demonstrate: http://www.rexfiddle.net/eQexbZU

vim syntax highlighting: match, contains

I am trying to make an angularjs syntax highlighting file for vim. A piece of the file is:
syn match ngMethods /\.[0-9A-Za-z_\-\$]\+\s*\((\|=\)/ contains=AngularMethods
syn keyword AngularMethods contained $addControl $anchorScroll $animate ...
syn match ngObjMethods /^\s*[0-9A-Za-z_\-\$]\+\s*:/ contains=AngularObjectMethods
syn keyword AngularObjectMethods contained compile controller link ...
etc...
Down below I have:
hi def link AngularMethods Function
hi def link AngularObjectMethods Function
The first regular expression (for AngularMethods) is supposed to capture things like $addControl in the following:
myelement.$addControl()
myelement.$addControl = function ()
The second regular expression (for AngularObjectMethods) captures things like compile in:
compile : function () {}
The AngularMethods one does NOT work but the latter one does. Can anyone see the problem? I've also tried using the regexes:
/\.\zs[0-9A-Za-z_\-\$]\+\ze\s*\((\|=\)/
/\.[0-9A-Za-z_\-\$]\+\s*\((\|=\)\#=/
The former matches the exact word. The latter is something I saw in another syntax file. Any ideas? Thanks for your help!
Edit:
Kent (below) was correct about the keyword. This uncovered the real problem which is that I have another regex:
syn match ngProperties /\.[0-9A-Za-z_\-\$]\+\s*[^(=]/ contains=AngularProperties
syn keyword AngularProperties contained $attr $dirty $error ...
which is supposed to be the complement of the ngMethods regex. If I comment out the ngProperties regex, the ngMethods regex works. This means ngProperties is bad. It is supposed to catch things like $attr in:
var myAttribute = element.$attr;
I will try to fix this. Can someone post the correct regex just in case?
The regex is not the problem for your syntax.
The very likely cause of the problem is that your iskeyword option doesn't include the dollar sign ($).
To test this, either:
remove the $ from contained $addControl $anchorScroll, to see if it works
or
execute :set iskeyword+=$ to see if it works.

Selecting URLs using RegExp but ignoring them when surrounded by double quotes

I've searched around quite a bit now, but I can't get any suggestions to work in my situation. I've seen success with negative lookahead or lookaround, but I really don't understand it.
I wish to use RegExp to find URLs in blocks of text but ignore them when quoted. While not perfect yet I have the following to find URLs:
(https?\://)?(\w+\.)+\w{2,}(:[0-9])?\/?((/?\w+)+)?(\.\w+)?
I want it to match the following:
www.test.com:50/stuff
http://player.vimeo.com/video/63317960
odd.name.amazone.com/pizza
But not match:
"www.test.com:50/stuff
http://plAyerz.vimeo.com/video/63317960"
"odd.name.amazone.com/pizza"
Edit:
To clarify, I could be passing a full paragraph of text through the expression. Sample paragraph of what I'd like below:
I would like the following link to be found www.example.com. However this link should be ignored "www.example.com". It would be nice, but not required, to have "www.example.com and www.example.com" ignored as well.
A sample of a different one I have working below. language is php:
$articleEntry = "Hey guys! Check out this cool video on Vimeo: player.vimeo.com/video/63317960";
$pattern = array('/\n+/', '/(https?\:\/\/)?(player\.vimeo\.com\/video\/[0-9]+)/');
$replace = array('<br/><br/>',
'<iframe src="http://$2?color=40cc20" width="500" height="281" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe>');
$articleEntry = preg_replace($pattern,$replace,$articleEntry);
The result of the above will replace any new lines "\n" with a double break "<br/><br/>" and will embed the Vimeo video by replacing the Vimeo address with an iframe and link.
I've found a solution!
(?=(([^"]+"){2})*[^"]*$)((https?:\/\/)?(\w+\.)+\w{2,}(:[0-9]+)?((\/\w+)+(\.\w+)?)?\/?)
The first part, from (? to *$), is what makes it work for me. I found this as an answer in java Regex - split but ignore text inside quotes? by https://stackoverflow.com/users/548225/anubhava
While I had read that question before, I had overlooked his answer because it wasn't the one that "solved" the question. I just changed the single quote to double quote and it works out for me.
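
A short Python sketch of how the lookahead filters out quoted URLs, using the solution pattern unchanged. The lookahead only succeeds where the rest of the text contains an even number of double quotes, which is taken to mean the current position is outside any quoted span. The function name find_unquoted_urls and the sample sentence are illustrative.

```python
import re

URL_OUTSIDE_QUOTES = re.compile(
    r'(?=(([^"]+"){2})*[^"]*$)'  # an even number of quotes remains ahead
    r'((https?:\/\/)?(\w+\.)+\w{2,}(:[0-9]+)?((\/\w+)+(\.\w+)?)?\/?)'
)


def find_unquoted_urls(text):
    """Return URL-like matches that are not inside double quotes."""
    # Groups 1 and 2 belong to the lookahead; group 3 is the whole URL.
    return [m.group(3) for m in URL_OUTSIDE_QUOTES.finditer(text)]


text = 'Find www.example.com. But ignore "www.example.com" here.'
urls = find_unquoted_urls(text)
```

This relies on quotes being balanced in the input; a stray unmatched quote would flip which regions count as quoted.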
Add ^ and $ to your regex:
^(https?\://)?(\w+\.)+\w{2,}(:[0-9])?\/?((/?\w+)+)?(\.\w+)?$
Please note that you might need to escape the slashes after http (meaning https?\:\/\/).
Update:
If you want it to be case sensitive, you shouldn't use \w but [a-z]. \w matches all letters, digits, and the underscore, so you should be careful when using it.

Why is it selecting this file?

I have the following statement:
Directory.GetFiles(filePath, "A*.pdf")
.Where(file => Regex.IsMatch(Path.GetFileName(file), "[Aa][i-lI-L].*"))
.Skip((pageNum - 1) * pageSize)
.Take(pageSize)
.Select(path => new FileInfo(path))
.ToArray()
My problem is that the above statement also finds the file "Adali.pdf", which it should not, but I cannot figure out why.
The above statement should only select files starting with a, and where the second letter is in the range i-l.
Because it matches Adali at the 3rd and 4th characters (al):
Adali
  --
Try using ^ in your regex which allows looking for start of the string (regex cheatsheet):
Regex.IsMatch(..., "^[Aa][i-lI-L].*")
Also I doubt you need asterisk at all.
PS: As a side note, this question doesn't seem to be written that well. You should try debugging this code yourself, and in particular you should try checking your regex against your cases without LINQ. I'm sure this has nothing to do with LINQ (the tag you have in your question); the issue is about regular expressions (which you didn't mention in the tags at all).
You are not anchoring the string. This makes the regex match the al in Adali.pdf.
Change the regex to ^[Aa][i-lI-L].* (or just ^[Aa][i-lI-L] if you don't need anything besides matching).
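
The effect of the missing anchor can be reproduced in a few lines. This sketch uses Python's re.search, which, like Regex.IsMatch, looks for a match anywhere in the string; the function name second_letter_in_range and the file names other than Adali.pdf are made up for illustration.

```python
import re


def second_letter_in_range(file_name, anchored=True):
    """Check for 'a' followed by a letter in i-l, optionally only at the start."""
    pattern = r"^[Aa][i-lI-L]" if anchored else r"[Aa][i-lI-L]"
    return re.search(pattern, file_name) is not None


unanchored_hit = second_letter_in_range("Adali.pdf", anchored=False)  # finds "al"
anchored_hit = second_letter_in_range("Adali.pdf")
wanted_file = second_letter_in_range("Alice.pdf")
```

Without the anchor the search succeeds on the al in the middle of Adali; with ^ only names that actually begin with the two letters pass.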
You should do this:
var f = Directory.GetFiles(tb_Path.Text, "A*.pdf").Where(file => Regex.IsMatch(Path.GetFileName(file), "[Aa][i-lI-L].pdf")).ToArray();
When you use ".*", Adali is accepted by the regex.