Extracting portion of a file within braces using regular expression - regex

I have a file of the following form:
some text
some more text
. . .
. . .
data {
1 2 3 5 yes 10
2 3 4 5 no 11
}
some text
some text
I want to extract the data portion of the file with a regular expression, using the following procedure:
proc ExtractData {fileName} {
set sgd [open $fileName r]
set sgdContents [read $sgd]
regexp "data \\{(?.*)\\}" $sgdContents -> data
puts $data
}
But this is giving the following error:
couldn't compile regular expression pattern: quantifier operand invalid
I am not able to figure out what is wrong with the regular expression. Any help would be highly appreciated.

Use this regular expression
regexp {data \{(.*)\}} $sgdContents wholematch submatch
puts $submatch
wholematch matches the entire pattern. In your case it is
data {
1 2 3 5 yes 10
2 3 4 5 no 11
}
And submatch matches only the content inside braces like below:
1 2 3 5 yes 10
2 3 4 5 no 11

The following regexp line works
regexp "data \\{\\\n(.*?)\\\n\\s*\\}" $sgdContents -> data
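The only major thing wrong with the original regular expression was the misplaced non-greedy indicator (?). It has to follow the quantifier it modifies, as in .*?, where it tells the engine to match as few characters as possible, so the capture stops at the first point where the rest of the pattern can match. In (?.*) it follows the opening parenthesis, where there is nothing for it to quantify, hence the "quantifier operand invalid" error.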
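The difference between the greedy (.*) and the non-greedy (.*?) capture can be sketched in Python, used here only as an illustration and not as a replacement for the Tcl code. The sample text, including the second "other { ... }" block, and the names SAMPLE, GREEDY and NON_GREEDY are assumptions made for this sketch. Python needs re.DOTALL for . to match newlines.

```python
import re

# Hypothetical file contents: the data block from the question followed by
# a second braced block, added to show where the greedy match goes wrong.
SAMPLE = (
    "some text\n"
    "data {\n"
    "1 2 3 5 yes 10\n"
    "2 3 4 5 no 11\n"
    "}\n"
    "other {\n"
    "x\n"
    "}\n"
)

# Greedy: .* runs to the LAST closing brace in the text.
GREEDY = re.compile(r"data \{(.*)\}", re.DOTALL)

# Non-greedy: .*? stops at the FIRST newline + optional whitespace + brace.
NON_GREEDY = re.compile(r"data \{\n(.*?)\n\s*\}", re.DOTALL)

greedy_data = GREEDY.search(SAMPLE).group(1)
non_greedy_data = NON_GREEDY.search(SAMPLE).group(1)

if __name__ == "__main__":
    print(repr(greedy_data))
    print(repr(non_greedy_data))
```

With only one braced block in the file both patterns capture the same rows; the non-greedy form matters as soon as another closing brace appears later in the text.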

Related

Regular expression for boolean expression almost working

My regular expression is:
(?:^ *)?(?:\\(*|())[0-9](?: +(?:AND|OR) +(?:\\(|[0-9]))?(?: *\\)|\\1)
And I am trying to use this as a test string:
1 AND 2 OR (3 AND 4 OR (2 AND 1))
If I replace all matches it finds, I end up with 1 OR (1 OR 1) when the final string should just be 1 (replacing each match with 1).
I think it is the +(?:\(|[0-9])) part. The regex seems to disregard instances of number JOIN (number
I pulled this regex from the second answer on this question
And the comments say it is supposed to find situations of 3 AND (1 etc... but it is not when I use it.
Does anyone know how I might modify this regular expression to properly group a boolean expression?

Regular expression for mobile number error in case 1 1 1 1 1 1 [duplicate]

I want a regular expression for mobile number validation. I have tried the regular expression below. If I enter 1 1 1 1 1 1 it is not accepted. Could you please help me find the mistake in my regular expression?
Regular expression is: ^\s*\+?\s*([0-9][\s-]*){6,}$
You can use a regex like this:
/^[+]?[(]?[0-9]{3}[)]?[-\s.]?[0-9]{3}[-\s.]?[0-9]{4,6}$/im
Here i makes the expression case-insensitive and m enables multi-line mode, so ^ and $ also match at line boundaries.
Or
^([0-9]*){6,}$
Check any number with this regex
My suggestion to match "1 1 1 1 1 1" would be:
^\s*([0-9\s-]){6,}$
The error in your regular expression is the multiple repeat at its beginning:
\s*+
Check the result on pythex.
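As a quick check of the suggested pattern ^\s*([0-9\s-]){6,}$, here is a minimal Python sketch. The helper name is_accepted and the test inputs other than "1 1 1 1 1 1" are made up for illustration.

```python
import re

# The pattern suggested above: at least six characters, each of which is
# a digit, whitespace or a hyphen.
SUGGESTED = re.compile(r"^\s*([0-9\s-]){6,}$")


def is_accepted(candidate):
    """Return True if the whole candidate string matches SUGGESTED."""
    return SUGGESTED.search(candidate) is not None


if __name__ == "__main__":
    for candidate in ["1 1 1 1 1 1", "123456", "12345", "abc123456", "1 1 1 "]:
        print(repr(candidate), is_accepted(candidate))
```

Note that this pattern counts characters, not digits: "1 1 1 " is accepted even though it contains only three digits, because the spaces count toward the six. If six actual digits are required, a pattern that repeats a digit followed by optional separators would be the stricter choice.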

Convert a regex expression to erlang's re syntax?

I am having a hard time trying to convert the following regular expression into Erlang syntax.
What I have is a test string like this:
1,2 ==> 3 #SUP: 1 #CONF: 1.0
And the regex that I created with regex101 is this (see below):
([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)
But I am getting weird match results if I convert it to erlang - here is my attempt:
{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
Also, I get more than four matches. What am I doing wrong?
Here is the regex101 version:
https://regex101.com/r/xJ9fP2/1
I don't know much about Erlang, but I will try to explain. With your regex:
>{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
>re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
{match,[{0, 28},{0,3},{8,1},{16,1},{25,3}]}
Each pair has the form {Start, Length}: the first number is the starting index of the match, and the second is the total number of matched characters from that starting index.
Reason for more than four groups
The first match always covers the entire string matched by the complete regex; the rest are the four captured groups you want. So there are 5 groups in total.
([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)
First group: ([\\d,]+)
Second group: (\\d+)
Third group: (\\d)
Fourth group: (\\d+.\\d+)
Zeroth group: the complete regex, which matches the entire string and is the first match you are getting.
How to find desired answer
Here we want everything except the first group (which is the entire match of the regex), so we can use all_but_first to skip it:
> re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M, [{capture, all_but_first, list}]).
{match,["1,2","3","1","1.0"]}
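For readers more familiar with Python, the same distinction between the whole match (group 0) and the four captured groups can be sketched as follows. This is an illustrative analogue, not Erlang code. The names PATTERN, SAMPLE and spans_as_start_length are made up for this sketch, and because Python reports (start, end) positions, the helper converts them to (start, length) to mirror the re:run output above.

```python
import re

# Same regex as in the question, written as a Python raw string.
PATTERN = re.compile(r"([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)")
SAMPLE = "1,2 ==> 3 #SUP: 1 #CONF: 1.0"


def spans_as_start_length(match):
    """Return (start, length) for group 0 and every captured group."""
    return [
        (match.start(i), match.end(i) - match.start(i))
        for i in range(match.re.groups + 1)
    ]


match = PATTERN.search(SAMPLE)
whole = match.group(0)            # group 0: the entire match
captured = list(match.groups())   # groups 1..4 only, like all_but_first
spans = spans_as_start_length(match)

if __name__ == "__main__":
    print(spans)
    print(captured)
```

Here match.groups() plays the role of {capture, all_but_first, list}: it leaves out group 0 and returns only the parenthesized groups.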
More info can be found here
If you are in doubt about what the string actually contains, you can print it and check:
1> RE = "([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)".
"([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)"
2> io:format("RE: /~s/~n", [RE]).
RE: /([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)/
For the rest of the issue, there is a great answer by rock321987.

Change all numbers' tag using RegEx

Here is my simple text file:
1. Text About Question 1
2. Text About Question 2
.
.
20. Text About Question 20
I have 250 text files, each with exactly 20 questions, and I want to convert them to XML by adding a "question" tag at the beginning of every numbered line, so they will look like:
<question>1. Text About Question 1
<question>2. Text About Question 2
.
.
<question>20. Text About Question 20<question>
I have tried this regex: copy (\d{1}.) replace \1, which only affects the numbers from 1 to 9. From 10 onward it splits the number, like:
1<question>0. Text About Question 10
As a second attempt, the regex (\d{2}.) only affects the numbers from 10 to 20, so the result looks like:
1. Text About Question 1
2. Text About Question 2
.
.
<question>20. Text About Question 20</question>
I couldn't combine it with (\d{1}.) because that regex adds the same tags again to the numbers between 10 and 20, and the result looks like:
<question>1. Text About Question 1 </question>
<question>2. Text About Question 2</question>
.
.
<question><question>20. Text About Question 20</question>
Is there a proper way to tag each question from 1 to 20 using regex?
You want to match all numbers between 1 and 20. Here is the regex for that
^[1-9]\.$|^1[0-9]\.$|^20\.$
Breakdown
^ - Start of line
[1-9] - Any digit between 1 and 9. Note 0 is not included
\. - Escape character before a period. Otherwise it will match any character
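$ - End of line. With the $ in place, each alternative only matches a line that consists of the number and the period alone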
| - Or
^1[0-9]\.$ - Starts with a 1 and is between 10 and 19.
|^20\.$ - Or starts and ends with 20.
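One way to apply this idea to whole lines can be sketched in Python. This is an assumption-laden sketch, not the answer's exact regex: the $ anchors are dropped after the period because text follows the number on every question line, the three alternatives are folded into one group, and each line is wrapped in an opening and a closing tag, which is a guess at the desired output. The names QUESTION_LINE and tag_questions are made up.

```python
import re

# A line that starts with 1-20 followed by a period, plus the rest of the line.
# re.MULTILINE makes ^ and $ match at every line boundary.
QUESTION_LINE = re.compile(r"^((?:[1-9]|1[0-9]|20)\..*)$", re.MULTILINE)


def tag_questions(text):
    """Wrap every line numbered 1. to 20. in <question>...</question>."""
    return QUESTION_LINE.sub(r"<question>\1</question>", text)


if __name__ == "__main__":
    sample = (
        "1. Text About Question 1\n"
        "10. Text About Question 10\n"
        "20. Text About Question 20\n"
        "21. Not part of the set\n"
    )
    print(tag_questions(sample))
```

Because the whole number is matched before the period, 10 is never split into 1 and 0, and each line is tagged exactly once.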

Not matching in perl regex

I have a variable and I want to print success if it doesn't contain a specific string. But it's always printing success, even when the string is there.
$mystring = " 1 2 3 4 5 TEST=/my/user/test this/is/test
3 4 5 6 8 NEW=/my/new/offer this/is/offer
3 4 5 2 2 FINAL=/final/test/offer /lets/see/this";
if (($mystring !~ m/1 2 3 4 5 TEST=\/my\/user\/test this\/is\/test/i) or
($mystring !~ m/3 4 5 2 2 FINAL=\/final\/test\/offer \/lets\/see\/this/i))
{
print "success";
}
It's printing success even if $mystring contains the string. Any help will be appreciated.
Your script is missing a ; at the end of the $mystring declaration. And the second regular expression is unterminated, missing /i at the end.
With those changes your script works fine. It prints "success" if one of the regexes does not match. In your example script, both regexes match, and it does not print "success".
If you mean to print "success" if either regex matches, use =~ instead of !~.
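The logic of combining two negated matches can be sketched in Python, used here only to illustrate the boolean behaviour, not as a Perl replacement. The names TEXT, FIRST, SECOND, any_missing and all_missing are made up, and the single spaces in TEXT are an assumption about the original string. "Not A or not B" is true when at least one string is absent; "not A and not B" is true only when both are absent.

```python
import re

# Sample text modelled on the question.
TEXT = (
    " 1 2 3 4 5 TEST=/my/user/test this/is/test\n"
    "3 4 5 6 8 NEW=/my/new/offer this/is/offer\n"
    "3 4 5 2 2 FINAL=/final/test/offer /lets/see/this"
)

FIRST = re.compile(r"1 2 3 4 5 TEST=/my/user/test this/is/test", re.IGNORECASE)
SECOND = re.compile(r"3 4 5 2 2 FINAL=/final/test/offer /lets/see/this", re.IGNORECASE)


def any_missing(text):
    """Mirror of `!~ A or !~ B`: True if at least one pattern is absent."""
    return not FIRST.search(text) or not SECOND.search(text)


def all_missing(text):
    """Mirror of `!~ A and !~ B`: True only if both patterns are absent."""
    return not FIRST.search(text) and not SECOND.search(text)


if __name__ == "__main__":
    print(any_missing(TEXT), all_missing(TEXT))
```

If "success" should be printed only when neither string occurs, the and form is the one to use; the or form fires as soon as either string is missing.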