vim - code folding by expression - regex

I have some source code with curly-bracket code blocks.
I want to be able to fold the blocks that have an if condition in front of them, and leave the other code blocks unfolded.
Example input:
print "this is a test"
if a == b {
    { x = 1
      y = 2
      z = 3
    }
    k = [1, 2, 3]
}
{ l = 5 }
return "foo"
Expected output:
print "this is a test"
if a == b {
+-- 6 lines:
}
{ l = 5 }
return "foo"
I've read a couple of related answers, but I still have no idea how to approach the problem.
Any suggestions?

Assuming that the closing '}' brace of the if block is at the beginning of a line, you can use:
:g/if.*{/+,/^}/-fold
This folds the statements within the {} braces of the if, excluding the braces themselves.
This is achieved through the + and - offsets placed after the addresses that define the range (the two addresses are separated by a comma): + makes the range start one line below each line matched by /if.*{/, and - makes it end one line above the next line matching /^}/.
If your closing '}' braces are indented, or in any other situation where the command above does not apply, look for other patterns you can exploit and adapt the ex command accordingly.
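To check which lines that :g command would fold, here is a small Python sketch (not Vim code) that mimics how the range +,/^}/- is resolved for each line matching if.*{. It assumes the /^}/ search does not wrap around the end of the buffer and that an if-block with nothing between its braces is skipped; the function name g_fold_ranges is made up for illustration.

```python
import re

def g_fold_ranges(lines):
    """Return the 1-based (first, last) line ranges that
    :g/if.*{/+,/^}/-fold would fold in the given lines.

    Assumptions: the /^}/ search does not wrap past the end of the buffer,
    and an if-block with nothing between its braces is skipped.
    """
    ranges = []
    for i, line in enumerate(lines, start=1):
        if not re.search(r"if.*\{", line):
            continue
        # With ',' the second address /^}/ searches forward from the :g line itself
        for j in range(i + 1, len(lines) + 1):
            if lines[j - 1].startswith("}"):
                if i + 1 <= j - 1:
                    ranges.append((i + 1, j - 1))  # '+' and '-' drop the brace lines
                break
    return ranges
```

With the inner braces indented, the body of the if (lines 3 to 7 of the example) is the only range. If every line started in column 0, the inner } on line 6 would end the range early at line 5, which is why the answer relies on the if's closing brace being the only one at the start of a line.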

Related

Further define a GAWK match and divide operation

I have some TXT files with numbers in them that I need to divide by 4.
The text line I'm matching and changing is:
scale = 23 23
My little GAWK file looks like this:
/scale [\=] [0-9]+ [0-9]+/ {
$3 = int($3/4)
$4 = int($4/4) }
{print}
So I successfully get "scale = 5 5"
However, I have 3 more requirements and would love some help:
1) The "scale" line should only be changed when it follows another match, "detail", somewhere above it.
(So instead of simply matching every "scale = ", it would be "detail(.....)scale = ", with any number of letters, digits or newlines between them.)
2) These "scale" values should never be lower than 1.
(Dividing anything lower than 6 should always give a result of 1; simply changing "scale = 0" to "scale = 1" afterwards will do.)
3) Values should preferably round up instead of down.
(So instead of 5 from 23, the result is actually 5.75 and should round up to 6. This isn't SO important, but it would be nice.)
Something like this perhaps?
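awk '/detail/ { d = 1 }   # a "detail" line arms the flag
d && /scale = [0-9]+ [0-9]+/ {
    # values below 6 become 1; larger ones are divided by 4 and rounded
    $3 = $3 < 6 ? 1 : sprintf("%1.0f", $3/4)
    $4 = $4 < 6 ? 1 : sprintf("%1.0f", $4/4)
    d = 0   # only the first "scale" after each "detail" is changed
}
1'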
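sprintf with a suitable format specifier applies rounding (see e.g. https://www.gnu.org/software/gawk/manual/html_node/Round-Function.html). Note that %1.0f rounds to the nearest integer rather than always up: 23/4 = 5.75 becomes 6, but 21/4 = 5.25 becomes 5.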
The ternary operator x ? y : z produces y if x is true, otherwise z.
Notice also the minor simplifications: = needs neither a backslash nor a character class, and {print} can be shortened to just 1 (a pattern that is always true, with no action, prints every record).
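If requirement 3 should be met strictly (always rounding up), integer ceiling division does that, and clamping with max() covers requirement 2. A rough Python sketch of the same detail/scale state machine, for comparison with the awk version; the names rescale and divisor are illustrative, not from the question, and this variant rounds up instead of to nearest.

```python
import re

SCALE_RE = re.compile(r"scale = (\d+) (\d+)")

def rescale(lines, divisor=4):
    """Divide both numbers on the first 'scale = N M' line after each 'detail' line.

    Results are rounded up (ceiling division) and never drop below 1.
    """
    out = []
    armed = False  # same role as the awk flag d
    for line in lines:
        if "detail" in line:
            armed = True
        m = SCALE_RE.search(line)
        if armed and m:
            # -(-n // d) is integer ceiling division; max() enforces the minimum of 1
            a, b = (max(1, -(-int(v) // divisor)) for v in m.groups())
            line = line[:m.start()] + f"scale = {a} {b}" + line[m.end():]
            armed = False
        out.append(line)
    return out
```

With this variant, "scale = 23 23" after a "detail" line becomes "scale = 6 6", and "scale = 0 2" becomes "scale = 1 1".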

How to split expression containing brackets correctly

I am trying to write an expression handler that correctly splits brackets. Until today it has worked very well, but I've now encountered a problem I hadn't thought of.
I split the expression by the content of brackets first; once these are evaluated, I replace the original content with the results and repeat until there are no brackets remaining.
The expression may contain macros/variables. Macros are denoted by text wrapped in dollar signs, e.g. $macro$.
A typical expression:
($exampleA$ * 3) + ($exampleB$ / 2)
Macros are replaced before the expression is evaluated. The above works fine because the process is as follows:
Split the expression by brackets; this results in two expressions:
$exampleA$ * 3
$exampleB$ / 2
Each expression is then evaluated. If exampleA = 3 and exampleB = 6:
$exampleA$ * 3 = 3 * 3 = 9
$exampleB$ / 2 = 6 / 2 = 3
The expression is then rebuilt using the results:
9 + 3
The final expression without any brackets is then evaluated to:
12
This works fine until an expression with nested brackets is used:
((($exampleA$ * 3) + ($exampleB$ / 2) * 2) - 1)
This breaks completely because the regular expression I'm using:
regex("(?<=\\()[^)]*(?=\\))");
Results in:
($exampleA$ * 3
$exampleB$ / 2
So how can I decode this correctly? I want the above to be broken down to:
$exampleA$ * 3
$exampleB$ / 2
I am not exactly sure what you are trying to do. If you want to match the innermost expressions, wouldn't this help?:
regex("(?<=\\()[^()]*(?=\\))");
By the way, are the parentheses in your example unbalanced on purpose?
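The innermost-group idea plugs straight into the questioner's evaluate-and-substitute loop. A minimal Python sketch, assuming macros are plain $name$ placeholders and only numbers and + - * / appear. It uses a capturing variant, \(([^()]*)\), so the brackets themselves are replaced along with their content. eval_flat is a hypothetical stand-in for the questioner's own evaluator, built on the ast module so no arbitrary code is executed.

```python
import ast
import operator
import re

# Innermost groups only: the captured text may not contain '(' or ')'
INNERMOST = re.compile(r"\(([^()]*)\)")

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def eval_flat(text):
    """Hypothetical stand-in evaluator for bracket-free + - * / arithmetic."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported expression: {text!r}")
    return walk(ast.parse(text, mode="eval").body)

def evaluate(expr, macros):
    # Replace $name$ macros first, as in the question
    for name, value in macros.items():
        expr = expr.replace(f"${name}$", str(value))
    # Evaluate innermost groups and splice the results back until no brackets remain
    while "(" in expr:
        expr, count = INNERMOST.subn(lambda m: str(eval_flat(m.group(1))), expr)
        if count == 0:
            raise ValueError("unbalanced parentheses")
    return eval_flat(expr)
```

For the nested example with exampleA = 3 and exampleB = 6, the loop reduces ((( 3 * 3) + (6 / 2) * 2) - 1) in three passes and returns 14.0.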
Traditional regex cannot handle recursive structures like nested brackets.
Depending on which regex flavor you are using, you may be able to use regex recursion. Otherwise, you will probably need a new method for parsing the groups. I think the traditional way is to represent the expression as a stack: start with an empty stack, push when you find a '(', pop when you find a ')'.
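The stack approach described above could look like this in Python (a sketch; the name paren_groups is illustrative). Each ')' pops the position of its matching '(', so groups come out in the order they close, which means every group is listed after the groups nested inside it.

```python
def paren_groups(expr):
    """Return the contents of every parenthesized group, in closing order.

    Uses an explicit stack of '(' positions; raises ValueError on unbalanced input.
    """
    stack = []   # indices of '(' that are still open
    groups = []
    for i, ch in enumerate(expr):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {i}")
            start = stack.pop()
            groups.append(expr[start + 1:i])
    if stack:
        raise ValueError(f"unmatched '(' at index {stack[-1]}")
    return groups
```

For the nested example, the first two groups returned are the innermost ones, $exampleA$ * 3 and $exampleB$ / 2.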
You can't really do this with regex. You really need a recursive method, like this:
using System;
using System.Data;
public class Program
{
    public static void Main()
    {
        Console.WriteLine(EvaluateExpression("(1 + 2) * 7"));
    }
    public static int EvaluateExpression(string expression)
    {
        // Recursively evaluate parentheses as sub-expressions
        var expr = expression.ToLower();
        while (expr.Contains("("))
        {
            // Find the first opening bracket
            var count = 1;
            var pStart = expr.IndexOf('(');
            var pos = pStart + 1;
            // Find the matching closing bracket by tracking the nesting depth
            while (pos < expr.Length && count > 0)
            {
                if (expr[pos] == '(') count++;
                if (expr[pos] == ')') count--;
                pos++;
            }
            // Error if there is no matching closing bracket
            if (count > 0) throw new InvalidOperationException("Closing parenthesis not found.");
            // Divide the expression into prefix, sub-expression and suffix
            var pre = expr.Substring(0, pStart);
            var subexpr = expr.Substring(pStart + 1, pos - pStart - 2);
            var post = expr.Substring(pos);
            // Recursively evaluate the sub-expression and splice its value back in
            expr = string.Format("{0} {1} {2}", pre, EvaluateExpression(subexpr), post);
        }
        // Replace this line with your own logic to evaluate 'expr', a sub-expression with all brackets removed.
        // Convert.ToInt32 copes with Compute returning an int, a decimal or a double.
        return Convert.ToInt32(new DataTable().Compute(expr, null));
    }
}
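I'm assuming you're using C# here, but you should get the idea and be able to translate it into whatever language you use.
If you use the following regex, you can capture the contents as group(1); group(0) will include the parentheses. Because [^()] excludes parentheses, it only matches innermost groups (apart from the sequences "( and )", which the first two alternatives allow).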
"\\(((?:\"\\(|\\)\"|[^()])+)\\)"
Hope it helps!

regex with all components optional, how to avoid empty matches

I have to process a comma-separated string which contains triplets of values and translate them to runtime types. The input looks like:
"1x2y3z,80r160g255b,48h30m50s,1x3z,255b,1h,..."
So each substring should be transformed this way:
"1x2y3z" should become Vector3 with x = 1, y = 2, z = 3
"80r160g255b" should become Color with r = 80, g = 160, b = 255
"48h30m50s" should become Time with h = 48, m = 30, s = 50
The problem I'm facing is that all the components are optional (but they preserve order) so the following strings are also valid Vector3, Color and Time values:
"1x3z" Vector3 x = 1, y = 0, z = 3
"255b" Color r = 0, g = 0, b = 255
"1h" Time h = 1, m = 0, s = 0
What have I tried so far?
All components optional
((?:\d+A)?(?:\d+B)?(?:\d+C)?)
The A, B and C are replaced with the correct letter for each case. The expression almost works, but it gives twice the expected number of results (one match for the string and another match for an empty string just after the first match), for example:
"1h1m1s" two matches [1]: "1h1m1s" [2]: ""
"11x50z" two matches [1]: "11x50z" [2]: ""
"11111h" two matches [1]: "11111h" [2]: ""
This isn't unexpected... after all an empty string matches the expression when ALL of the components are empty; so in order to fix this issue I've tried the following:
1 to 3 quantifier
((?:\d+[ABC]){1,3})
But now, the expression matches strings with wrong ordering or even repeated components!:
"1s1m1h" one match, should not match at all! (wrong order)
"11z50z" one match, should not match at all! (repeated components)
"1r1r1b" one match, should not match at all! (repeated components)
For my last attempt, I tried this variant of my first expression:
Match from begin ^ to the end $
^((?:\d+A)?(?:\d+B)?(?:\d+C)?)$
It works better than the first version, but it still matches the empty string. Also, I would have to tokenize the input first and then pass each token to the expression, so that the begin (^) and end ($) anchors can match.
EDIT: Lookahead attempt (thanks to Casimir et Hippolyte)
After reading about (and trying to understand) regex lookaheads, and with the help of Casimir et Hippolyte's answer, I tried the suggested expression:
\b(?=[^,])(?=.)((?:\d+A)?(?:\d+B)?(?:\d+C)?)\b
Against the following test string:
"48h30m50s,1h,1h1m1s,11111h,1s1m1h,1h1h1h,1s,1m,1443s,adfank,12322134445688,48h"
And the results were amazing! It detects complete valid matches flawlessly (other expressions gave me 3 matches on "1s1m1h" or "1h1h1h", which weren't supposed to match at all). Unfortunately, it captures an empty match every time an invalid token is found, so a "" is detected just before "1s1m1h", "1h1h1h", "adfank" and "12322134445688". So I modified the lookahead condition to get the expression below:
\b(?=(?:\d+[ABC]){1,3})(?=.)((?:\d+A)?(?:\d+B)?(?:\d+C)?)\b
It gets rid of the empty matches in any string which doesn't match (?:\d+[ABC]){1,3}, so the empty matches just before "adfank" and "12322134445688" are gone, but the ones just before "1s1m1h" and "1h1h1h" are still detected.
So the question is: is there a regular expression which matches triplets of values in a given order, where every component is optional but at least one must be present, and which doesn't match empty strings?
The regex engine I'm using is the C++11 one (std::regex).
Yes, you can add a lookahead at the beginning to ensure there is at least one character:
^(?=.)((?:\d+A)?(?:\d+B)?(?:\d+C)?)$
If you need to find this kind of substring in a larger string (i.e. without tokenizing first), you can remove the anchors and use a more explicit subpattern in the lookahead:
(?=\d+[ABC])((?:\d+A)?(?:\d+B)?(?:\d+C)?)
In this case, to avoid false positives (since you are looking for very small strings that can be part of something else), you can add word boundaries to the pattern:
\b(?=\d+[ABC])((?:\d+A)?(?:\d+B)?(?:\d+C)?)\b
Note: in a comma delimited string: (?=\d+[ABC]) can be replaced by (?=[^,])
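When the input is tokenized first, the anchored pattern with the (?=.) lookahead also makes it easy to turn each token into numbers. A Python sketch, assuming the input has already been split on commas and that missing components default to 0 as in the question; re.fullmatch plays the role of the ^...$ anchors, and parse_triplet is an illustrative name.

```python
import re

def parse_triplet(token, letters):
    """Parse e.g. '1x3z' with letters 'xyz' into (1, 0, 3); return None if invalid.

    The (?=.) lookahead rejects the empty string, and the three optional
    groups, tried in order, enforce the component order.
    """
    a, b, c = letters
    pattern = rf"(?=.)(?:(\d+){a})?(?:(\d+){b})?(?:(\d+){c})?"
    m = re.fullmatch(pattern, token)
    if m is None:
        return None
    # Missing components default to 0
    return tuple(int(g) if g is not None else 0 for g in m.groups())
```

For example, [parse_triplet(t, "hms") for t in "48h30m50s,1h,1s1m1h".split(",")] gives (48, 30, 50), (1, 0, 0) and None.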
I think this might do the trick.
I am keying on either the beginning of the string (^) or the comma separator (,) to fix the start of each match: (?:^|,).
Example:
#include <iostream>
#include <regex>
#include <string>
// Each match starts at the beginning of the string or right after a comma;
// group 1 holds the (ordered, all-optional) triplet itself.
const std::regex r(R"~((?:^|,)((?:\d+[xrh])?(?:\d+[ygm])?(?:\d+[zbs])?))~");
int main()
{
    std::string test = "1x2y3z,80r160g255b,48h30m50s,1x3z,255b";
    std::sregex_iterator iter(test.begin(), test.end(), r);
    std::sregex_iterator end_iter;
    for (; iter != end_iter; ++iter)
        std::cout << iter->str(1) << '\n';
}
Output:
1x2y3z
80r160g255b
48h30m50s
1x3z
255b
Is that what you are after?
EDIT:
If you really want to go to town and make empty expressions unmatched, then as far as I can tell you have to spell out every non-empty, in-order combination of components, like this:
const std::string A = "(?:\\d+[xrh])";
const std::string B = "(?:\\d+[ygm])";
const std::string C = "(?:\\d+[zbs])";
const std::regex r("(?:^|,)(" + A + B + C + "|" + A + B + "|" + A + C + "|" + B + C + "|" + A + "|" + B + "|" + C + ")");
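A shorter alternative to listing every ordered combination is to start each alternative with the first component that is actually present: A(B)?(C)?, B(C)?, or C. None of the three can match an empty string, unlike a lookahead such as (?=\d+[ABC]), which can succeed while the optional group itself matches nothing. The sketch below is Python, and triplet_regex is an illustrative name. The pattern uses only (?:...) groups and a lookahead, so it should also be accepted by std::regex's default ECMAScript grammar, but that part is untested here.

```python
import re

def triplet_regex(a, b, c):
    """Pattern for an ordered, all-optional triplet with at least one component.

    a, b and c are the (regex) suffixes of the three components, e.g. "h", "m", "s".
    Each alternative begins with the first component that is actually present,
    so an empty token can never match and no lookbehind is needed.
    """
    A, B, C = (rf"\d+{x}" for x in (a, b, c))
    first = rf"{A}(?:{B})?(?:{C})?"   # starts with component A
    second = rf"{B}(?:{C})?"          # A absent, starts with B
    third = C                         # only C present
    # (?:^|,) and (?=,|$) make the whole comma-separated token match, or nothing
    return rf"(?:^|,)({first}|{second}|{third})(?=,|$)"
```

On the question's test string, re.findall(triplet_regex("h", "m", "s"), text) returns only the eight valid tokens, with no empty strings for "1s1m1h", "1h1h1h", "adfank" or "12322134445688".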

Regex for parenthesis (JavaScript)

This is the regexp I created so far:
\((.+?)\)
This is my test string: (2+2) + (2+3*(2+3))
The matches I get are:
(2+2)
And
(2+3*(2+3)
I want my matches to be:
(2+2)
And
(2+3*(2+3))
How should I modify my regular expression?
You cannot parse parenthesized expressions with a regular expression.
There is a mathematical proof that regular expressions can't do this.
Parenthesized expressions are a context-free grammar, and can thus be recognized by pushdown automata (stack-machines).
You can, however, define a regular expression that works on any expression whose nesting depth is at most some fixed N (even though the expression gets complex quickly).
You just need to acknowledge that your parentheses might contain an arbitrary number of further parenthesized groups.
\(([^()]*(\([^()]*\)[^()]*)*)\)
It works like this:
\(([^()]* matches an open parenthesis, followed by any run of characters that are not parentheses;
(\([^()]*\)[^()]*)* optionally, there may be another group, formed by an open parenthesis, some non-parenthesis characters and a matching closing parenthesis, followed by more non-parenthesis characters. This can be repeated any number of times. Finally, there must be
)\) a closing parenthesis that matches the first one.
This should work for nesting depth 2. If you want nesting depth 3, you have to further recurse, allowing each of the groups I described at point (2) to have a nested parenthesized group.
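Rather than writing the deeper levels by hand, that recursion can be generated. A Python sketch (the function name is illustrative) that wraps the previous level in the same "non-parentheses, group, non-parentheses" shell; depth 2 yields \([^()]*(?:\([^()]*\)[^()]*)*\). The generated pattern uses no lookbehind or recursion, so the same string should also work in a JavaScript RegExp.

```python
import re

def nested_paren_regex(depth):
    """Regex for a parenthesized group nested at most `depth` levels deep.

    Level 1 is a group with no parentheses inside; each further level wraps the
    previous one as: "(" non-parens (group non-parens)* ")".
    """
    pattern = r"\([^()]*\)"
    for _ in range(depth - 1):
        pattern = r"\([^()]*(?:" + pattern + r"[^()]*)*\)"
    return pattern
```

re.findall(nested_paren_regex(2), "(2+2) + (2+3*(2+3))") returns the two matches the question asks for. On "((1+(2))*3)", depth 2 only finds the inner "(1+(2))", while depth 3 matches the whole string.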
Things get much easier if you use a stack, for example:
// Returns the outermost parenthesized groups of 'input', e.g.
// findOuterGroups("(2+2) + (2+3*(2+3))") -> ["(2+2)", "(2+3*(2+3))"]
function findOuterGroups(input) {
  var foundMatches = [];
  var mStack = [];      // indices of the "(" that are still open
  var paren = /[()]/g;  // finds the next parenthesis of either kind
  var m;
  while ((m = paren.exec(input)) !== null) {
    if (m[0] === "(") {
      mStack.push(m.index);
    } else if (mStack.length > 0) {
      var startIdx = mStack.pop();
      // An empty stack means we just closed an outermost group
      if (mStack.length === 0) {
        foundMatches.push(input.substring(startIdx, m.index + 1));
      }
    }
    // a ")" with an empty stack is unbalanced and simply ignored
  }
  return foundMatches;
}
How about you parse it yourself using a loop without the help of regex?
Here is one simple way:
You would have a variable, say "level", which keeps track of how many open parentheses you are currently inside (initialize it to 0).
You would also need a string buffer to hold each of your matches (e.g. (2+2) or (2+3*(2+3))).
Finally, you would need somewhere to move the contents of your buffer to whenever you finish reading a match.
As you read the string character by character, increment level by 1 when you come across "(" and decrement it by 1 when you come across ")". Whenever you are inside a group (level above 0, counting the "(" that opens it and the ")" that closes it), put the character into the buffer.
When you come across ")" AND the level hits 0, you know you have a match. Move the contents of the buffer to your list of matches, clear the buffer, and continue.
This method assumes that every "(" has a corresponding ")" in the input string. It handles an arbitrary nesting depth.
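A direct transcription of those steps into Python (a sketch; the names are illustrative). Characters are buffered only while level is above 0, so text between groups, such as " + ", is not collected, and a stray ")" at level 0 is ignored.

```python
def top_level_groups(text):
    """Collect the outermost parenthesized groups using a depth counter and a buffer."""
    matches = []
    buffer = []
    level = 0
    for ch in text:
        if ch == "(":
            level += 1
        if level > 0:
            buffer.append(ch)  # only characters inside a group are buffered
        if ch == ")" and level > 0:
            level -= 1
            if level == 0:     # the outermost group just closed
                matches.append("".join(buffer))
                buffer.clear()
    return matches
```

top_level_groups("(2+2) + (2+3*(2+3))") returns ["(2+2)", "(2+3*(2+3))"].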

Regex to calculate straight poker hand - Using ASCII CODE

In another question I learned how to detect a straight poker hand using a regex (here).
Now, out of curiosity, the question is: can I use a regex to detect the same thing using ASCII codes?
Something like:
regex: [C][C+1][C+2][C+3][C+4], where C is the ASCII code (or something like that)
Matches: 45678, 23456
Doesn't match: 45679 or 23459 (not in sequence)
Your main problem is really going to be that you're not using ASCII-consecutive encodings for your hands, you're using numerics for non-face cards, and non-consecutive, non-ordered characters for face cards.
You need to detect, at the start of the strings, 2345A, 23456, 34567, ..., 6789T, 789TJ, 89TJQ, 9TJQK and TJQKA.
These are not consecutive ASCII codes and, even if they were, you would run into problems since both A2345 and TJQKA are valid and you won't get A being both less than and greater than the other characters in the same character set.
If it has to be done by a regex, then the following regex segment:
(2345A|23456|34567|45678|56789|6789T|789TJ|89TJQ|9TJQK|TJQKA)
is probably the easiest and most readable one you'll get.
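When used from code, that alternation only works once the hand is sorted into rank order 23456789TJQKA (so an ace-low straight appears as 2345A). A Python sketch under that assumption; RANK_ORDER and is_straight are illustrative names.

```python
import re

RANK_ORDER = "23456789TJQKA"  # ascending ranks; T is the ten
STRAIGHT = re.compile(r"2345A|23456|34567|45678|56789|6789T|789TJ|89TJQ|9TJQK|TJQKA")

def is_straight(ranks):
    """True if the five ranks form a straight (the ace may be low or high)."""
    hand = "".join(sorted(ranks, key=RANK_ORDER.index))  # e.g. "A2345" -> "2345A"
    return STRAIGHT.fullmatch(hand) is not None
```

Sorting first means the regex never has to deal with the input order; a wrap-around hand such as KA234 sorts to 234KA and is correctly rejected.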
There is no regex that will do what you want as the other answers have pointed out, but you did say that you want to learn regex, so here's another meta-regex approach that may be instructional.
Here's a Java snippet that, given a string, programmatically generates the pattern that will match any substring of that string of length 5.
String seq = "ABCDEFGHIJKLMNOP";
System.out.printf("^(%s)$",
seq.replaceAll(
"(?=(.{5}).).",
"$1|"
)
);
The output is (as seen on ideone.com):
^(ABCDE|BCDEF|CDEFG|DEFGH|EFGHI|FGHIJ|GHIJK|HIJKL|IJKLM|JKLMN|KLMNO|LMNOP)$
You can use this to conveniently generate the regex pattern to match straight poker hands, by initializing seq as appropriate.
How it works
. metacharacter matches "any" character (line separators may be an exception depending on the mode we're in).
The {5} is an exact repetition specifier: .{5} matches exactly five characters.
(?=…) is positive lookahead; it asserts that a given pattern can be matched, but since it's only an assertion, it doesn't actually make (i.e. consume) the match from the input string.
Simply (…) is a capturing group. It creates a backreference that you can use perhaps later in the pattern, or in substitutions, or however you see fit.
The pattern is repeated here for convenience:
match one char
at a time
|
(?=(.{5}).).
\_________/
must be able to see 6 chars ahead
(capture the first 5)
The pattern works by matching one character . at a time. Before that character is matched, however, we assert (?=…) that we can see a total of 6 characters ahead (.{5})., capturing (…) into group 1 the first .{5}. For every such match, we replace with $1|, that is, whatever was captured by group 1, followed by the alternation metacharacter.
Let's consider what happens when we apply this to a shorter String seq = "ABCDEFG";. The ↑ denotes our current position.
=== INPUT ===    === OUTPUT ===
A B C D E F G    ABCDE|BCDEFG
↑
We can assert (?=(.{5}).), matching ABCDEF
in the lookahead. ABCDE is captured.
We now match A, and replace with ABCDE|
A B C D E F G    ABCDE|BCDEF|CDEFG
  ↑
We can assert (?=(.{5}).), matching BCDEFG
in the lookahead. BCDEF is captured.
We now match B, and replace with BCDEF|
A B C D E F G    ABCDE|BCDEF|CDEFG
    ↑
Can't assert (?=(.{5}).), skip forward
A B C D E F G    ABCDE|BCDEF|CDEFG
      ↑
Can't assert (?=(.{5}).), skip forward
A B C D E F G    ABCDE|BCDEF|CDEFG
        ↑
Can't assert (?=(.{5}).), skip forward
      :
      :
A B C D E F G    ABCDE|BCDEF|CDEFG
              ↑
Can't assert (?=(.{5}).), and we are at
the end of the string, so we're done.
So we get ABCDE|BCDEF|CDEFG, which are all the substrings of length 5 of seq.
References
regular-expressions.info/Dot, Repetition, Grouping, Lookaround
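A Python port of the Java snippet above, for experimenting. It uses the same lookahead trick with Python's re.sub instead of String.replaceAll, and \1 in the replacement stands in for Java's $1. The name substrings_pattern and the n parameter are illustrative.

```python
import re

def substrings_pattern(seq, n=5):
    """Alternation of every length-n substring of seq, anchored with ^ and $."""
    # At each position where n+1 characters are visible, replace the current
    # character by the next n characters plus "|"; the untouched tail of
    # seq is the last length-n substring.
    body = re.sub(rf"(?=(.{{{n}}}).).", r"\1|", seq)
    return f"^({body})$"
```

Initializing seq with the ace at both ends, substrings_pattern("A23456789TJQKA"), yields ^(A2345|23456|...|9TJQK|TJQKA)$, which matches sorted straights such as 789TJ and rejects KA234.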
Something like regex: [C][C+1][C+2][C+3][C+4], where C is the ASCII code (or something like that)
You cannot do anything remotely close to this in most regex flavors. This is simply not the kind of pattern that regex is designed for.
There is no mainstream regex pattern that will succinctly match any two consecutive characters that differ by x in their ASCII encoding.
For instructional purposes...
Here you go (see also on ideone.com):
String alpha = "ABCDEFGHIJKLMN";
String p = alpha.replaceAll(".(?=(.))", "$0(?=$1|\\$)|") + "$";
System.out.println(p);
// A(?=B|$)|B(?=C|$)|C(?=D|$)|D(?=E|$)|E(?=F|$)|F(?=G|$)|G(?=H|$)|
// H(?=I|$)|I(?=J|$)|J(?=K|$)|K(?=L|$)|L(?=M|$)|M(?=N|$)|N$
String p5 = String.format("(?:%s){5}", p);
String[] tests = {
"ABCDE", // true
"JKLMN", // true
"AAAAA", // false
"ABCDEFGH", // false
"ABCD", // false
"ACEGI", // false
"FGHIJ", // true
};
for (String test : tests) {
System.out.printf("[%s] : %s%n",
test,
test.matches(p5)
);
}
This uses meta-regexing technique to generate a pattern. That pattern ensures that each character is followed by the right character (or the end of the string), using lookahead. That pattern is then meta-regexed to be matched repeatedly 5 times.
You can substitute alpha with your poker sequence as necessary.
Note that this is an ABSOLUTELY IMPRACTICAL solution. It's much more readable to e.g. just check if alpha.contains(test) && (test.length() == 5).
Related questions
How does the regular expression (?<=#)[^#]+(?=#) work?
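A Python port of the Java snippet above, for experimenting. In a Python replacement string, $ has no special meaning (so it needs no escaping), while \g<0> and \1 stand in for Java's $0 and $1. The name consecutive_pattern is illustrative.

```python
import re

def consecutive_pattern(alpha, n=5):
    """Pattern matching exactly n chars, each followed by its successor in alpha."""
    # Every char except the last becomes  X(?=Y|$)|  where Y is the next char
    # in alpha; the last char is left as-is and anchored with $
    p = re.sub(r".(?=(.))", r"\g<0>(?=\1|$)|", alpha) + "$"
    return f"(?:{p}){{{n}}}"
```

The results of re.fullmatch(consecutive_pattern("ABCDEFGHIJKLMN"), test) agree with the true/false comments in the Java listing. One caveat if alpha is the poker sequence with the ace at both ends (A23456789TJQKA): each character is only checked against its own successor, so wrap-around hands such as KA234 are accepted as well.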
SOLVED!
See http://jsfiddle.net/g48K9/3
I solved it in JS, using a closure.
String.prototype.isSequence = function () {
  var hand = String(this);
  if (hand === "A2345") return true; // an exception: the ace-low straight
  var result = false;
  // The callback closes over 'result' and records whether each card's
  // code is exactly one less than the next card's code.
  hand.replace(/^(\w)(\w)(\w)(\w)(\w)$/, function (a, g1, g2, g3, g4, g5) {
    result = code(g1) == code(g2) - 1 &&
             code(g2) == code(g3) - 1 &&
             code(g3) == code(g4) - 1 &&
             code(g4) == code(g5) - 1;
    return a;
  });
  return result; // a real boolean, not the string returned by replace()
};
function code(card) {
  switch (card) {
    case "T": return 58;
    case "J": return 59;
    case "Q": return 60;
    case "K": return 61;
    case "A": return 62;
    default: return card.charCodeAt(0);
  }
}
test("23456");
test("23444");
test("789TJ");
test("TJQKA");
test("8JQKA");
function test(cards) {
  alert("cards " + cards + ": " + cards.isSequence());
}
Just to clarify, these are the ASCII codes, with the face cards remapped so that they follow 9 consecutively:
2 = 50
3 = 51
4 = 52
5 = 53
6 = 54
7 = 55
8 = 56
9 = 57
T = 84 -> 58
J = 74 -> 59
Q = 81 -> 60
K = 75 -> 61
A = 65 -> 62
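The same remapping idea in Python, for comparison (a sketch that mirrors the JavaScript above rather than replacing it; FACE_CODES, code and is_sequence are illustrative names). Once T, J, Q, K and A are moved to 58-62, a straight is simply five codes that each increase by exactly 1.

```python
# Remapped codes from the table above: 2-9 keep their ASCII codes (50-57),
# while T, J, Q, K and A are moved to 58-62 so that they follow 9 directly.
FACE_CODES = {"T": 58, "J": 59, "Q": 60, "K": 61, "A": 62}

def code(card):
    return FACE_CODES.get(card, ord(card))

def is_sequence(cards):
    if cards == "A2345":  # the ace-low straight is handled as an exception
        return True
    codes = [code(c) for c in cards]
    # a straight is five codes that each go up by exactly 1
    return len(codes) == 5 and all(b - a == 1 for a, b in zip(codes, codes[1:]))
```

The five hands tested in the JavaScript version give the same answers here: 23456, 789TJ and TJQKA are sequences, while 23444 and 8JQKA are not.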