regular expression to find in-between content - regex

I am trying to find the content between %%EndPageSetup and LH(%%[Page: 1]%%) = using a regular expression. I tried various patterns but am not getting the correct output. Can someone please help me with this?
%%EndPageSetup
/DeviceGray dup setcolorspace
/colspABC exch def ‹ … scol
… „A VM? Pscript_WinNT_Incr begin
%%BeginResource: file Pscript_T42Hdr
5.0 0 /asc42 0.0 d/sF42{/asc42 ~ d Ji}bind d/bS42{0 asc42 -M}bind
d/eS42{0 asc42 neg
-M}b/Is2015?{version cvi 2015 ge}bind d/AllocGlyphStorage{Is2015?{!}{{string}
forall}?}bind d/Type42DictBegin{25
dict /FontName ~ d/Encoding ~ d 4
array astore cvx/FontBBox ~
d/PaintType 0 d/FontType 42
d/FontMatrix[1 0 0 1 0 0]d
/CharStrings 256 dict/.notdef 0 d &
E d/sfnts}bind d/Type42DictEnd{& #
/FontName get ~ definefont ! E}bind
d/RDS{string currentfile ~ readstring
!} executeonly
d/PrepFor2015{Is2015?{/GlyphDirectory
16 dict d sfnts 0 get # 2 ^
(glyx)putinterval 2 ^(locx)putinterval
! !}{! !}?}bind d/AddT42Char{Is2015?
{findfont/GlyphDirectory get ` d E !
!}{findfont/sfnts get 4 ^ get 3 ^ 2 ^
LH(%%[Page: 1]%%) =
Thanks.

this may work
/EndPageSetup(.*?)LH\((?:.*?)\[Page: 1\](?:.*?)\) =/

This works with your examples
%%EndPageSetup(.*?)LH\(%%\[.*?Page.*?\]%%\) =
See it here online on Regexr
Make sure to activate the s (dotall) modifier, so that the dot (.) can also match newline characters.
Your result is then in capture group 1.
How to activate the modifier and how to get the result depends on your language.
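As one illustration of "depends on your language", here is a minimal Python sketch. It assumes a pattern along the lines of the one above, with the LH prefix included in the closing delimiter so that it does not end up in the capture. The sample text is a shortened, hypothetical stand-in for the PostScript dump in the question.

```python
import re

# Shortened, hypothetical stand-in for the PostScript dump in the question.
sample = (
    "%%EndPageSetup\n"
    "/DeviceGray dup setcolorspace\n"
    "/colspABC exch def\n"
    "LH(%%[Page: 1]%%) =\n"
)

# re.DOTALL is Python's name for the s (dotall) modifier:
# it lets "." match newline characters as well.
pattern = re.compile(
    r"%%EndPageSetup(.*?)LH\(%%\[.*?Page.*?\]%%\) =",
    re.DOTALL,
)

match = pattern.search(sample)
# Capture group 1 holds everything between the two markers.
between = match.group(1) if match else None
print(repr(between))
```

Without re.DOTALL the search finds nothing here, because the content spans several lines.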

This should work:
(?:%%EndPageSetup)(.*\n)*(?=LH\(%%\[Page: 1\]%%\) =)
Explanation
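The last part, (?=LH\(%%\[Page: 1\]%%\) =), is a positive lookahead, not a capture group: it asserts that the end marker follows without including it in the match.
The capture group (.*\n) matches one whole line including its line break. The * after it repeats the group 0 or more times. Because the group is repeated, it retains only the last line it matched, so read the content from the overall match rather than from group 1.
The non-capturing group (?:%%EndPageSetup) matches the start marker without creating a capture group. The marker is still part of the overall match.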
Note
You can use lookbehinds too, but JavaScript engines older than ES2018 don't support them.
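For what the lookbehind variant might look like, here is a hedged Python sketch. Python's re module only accepts fixed-width lookbehinds, which holds here because the start marker plus its newline has a fixed length. The sample text is again a shortened, hypothetical stand-in for the real file.

```python
import re

# Shortened, hypothetical stand-in for the PostScript dump in the question.
sample = (
    "%%EndPageSetup\n"
    "/DeviceGray dup setcolorspace\n"
    "/colspABC exch def\n"
    "LH(%%[Page: 1]%%) =\n"
)

# The lookbehind asserts the start marker, the lookahead asserts the end
# marker; neither is part of the match, so group(0) is the content itself.
pattern = re.compile(
    r"(?<=%%EndPageSetup\n).*?(?=LH\(%%\[Page: 1\]%%\) =)",
    re.DOTALL,
)

match = pattern.search(sample)
content = match.group(0) if match else None
print(repr(content))
```

With both markers kept out of the match, no capture group is needed at all.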

Related

Convert a regex expression to erlang's re syntax?

I am having a hard time trying to convert the following regular expression into Erlang syntax.
What I have is a test string like this:
1,2 ==> 3 #SUP: 1 #CONF: 1.0
And the regex that I created with regex101 is this (see below):
([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)
But I am getting weird match results if I convert it to erlang - here is my attempt:
{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
Also, I get more than four matches. What am I doing wrong?
Here is the regex101 version:
https://regex101.com/r/xJ9fP2/1
I don't know much about erlang, but I will try to explain. With your regex
>{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
>re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
{match,[{0, 28},{0,3},{8,1},{16,1},{25,3}]}
         ^  ^^
         |  ||
         |  Total number of matched characters from starting index
         Starting index of match
Reason for more than four groups
The first tuple always describes the entire string matched by the complete regex; the rest are the four captured groups you want. So there are 5 entries in total.
([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)
<------->         <---->             <--->              <--------->
First group       Second group       Third group        Fourth group
<----------------------------------------------------------------->
This regex matches entire string and is first match you are getting
(Zero'th group)
How to find desired answer
Here we want anything except the first group (which is entire match by regex). So we can use all_but_first to avoid the first group
> re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M, [{capture, all_but_first, list}]).
{match,["1,2","3","1","1.0"]}
More info can be found here
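The same split between "whole match" and "captured groups" exists in most regex APIs. As a cross-check, here is a minimal Python sketch, used only for illustration since the question is about Erlang. Note that Python reports spans as (start, end), whereas Erlang's re:run reports {start, length}.

```python
import re

# Same pattern as in the question, written as a Python raw string.
pattern = re.compile(
    r"([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)"
)
m = pattern.search("1,2 ==> 3 #SUP: 1 #CONF: 1.0")

whole = m.group(0)   # group 0: the entire match
groups = m.groups()  # groups 1..4 only, like all_but_first in Erlang

# (start, end) for group 0 and the four captures.
spans = [m.span(i) for i in range(5)]

print(whole)
print(groups)
print(spans)
```

Converting each (start, end) pair to (start, end - start) gives the same numbers as the Erlang result above.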
If you are in doubt about what the string actually contains, you can print it and check:
1> RE = "([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)".
"([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)"
2> io:format("RE: /~s/~n", [RE]).
RE: /([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)/
For the rest of the issue, there is a great answer by rock321987.

Visual Basic - RegEx - Overall Length Check regardless the number of matches

I have the following problem :
This is my RegEx-Pattern :
\d*[a-z A-Z][a-zA-Z0-9 _?!()\/\\]*
It allows anything except numbers that stand alone, like 1, 11, 111, and so on.
My question: how can I limit the overall length of the input regardless of the matches?
I tried several options, such as putting {1,30} after each part, and wrapping the regex in a group with ( ) followed by {1,30}, but it still doesn't work.
If anyone could help me I would appreciate it :).
Allowed string:
Group1
Group 1
1Group
Group!?()\/
Group !()\?!
a1 a1 a1 a1
Not Allowed:
1
11
And so on. {1,30} after a part only restricts how many times that part can repeat. What I want to know is: how can I set a maximum length for the whole regex above, so that the input is cut off at 30 characters regardless of the matches?
To disallow input that consists only of digits, you can use a negative look-ahead (?!\d+$), and to limit the length of the input, use a limiting quantifier {1,30}:
(?!\d+$)[a-zA-Z0-9 _?!()\/\\]{1,30}
See demo
Note that if you plan to match whole strings, you'd need anchors: ^ at the beginning will anchor the regex to the beginning of string, and $ will anchor at the end.
^(?!\d+$)[a-zA-Z0-9 _?!()\/\\]{1,30}$
See another demo
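The question is about Visual Basic, but the anchored pattern can be checked against the examples from the question in any flavor that supports look-aheads. A minimal Python sketch, where the function name is_valid is made up for this illustration:

```python
import re

# Anchored pattern from the answer: reject digit-only input via the
# negative look-ahead, and cap the total length at 30 characters.
NAME_RE = re.compile(r"^(?!\d+$)[a-zA-Z0-9 _?!()\/\\]{1,30}$")


def is_valid(text: str) -> bool:
    """Return True if text passes the pattern (hypothetical helper)."""
    return NAME_RE.fullmatch(text) is not None


allowed = ["Group1", "Group 1", "1Group", r"Group!?()\/", r"Group !()\?!", "a1 a1 a1 a1"]
rejected = ["1", "11", "a" * 31, ""]

for text in allowed:
    print(repr(text), is_valid(text))
for text in rejected:
    print(repr(text), is_valid(text))
```

The {1,30} quantifier applies to the single character class, which is why it limits the total length rather than the number of repetitions of some sub-match.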

Select text in regex between 2 strings

I have the following line :
3EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 SHP 70W OVOIDE AI E27 SON PIA PLUS
I'd like to get the string : EI AMANDINE MRV SHP 70 W. So I decided to select the strings between 1 (can also be 2, 3 or 99) and 0 (can also be 1, 2, 3, 4 or 5).
I tried :
(0|1|2|3|99)(.*)(0|1|2|3|4|5)
But I have this result :
EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 SHP 70W OVOIDE AI E
that is not what I want to obtain.
Do you have an idea in regex to make that selection work ?
Thanks !
You were pretty close! Try this:
\b(?:0|1|2|3|99) ([^0|1|2|3|99].*?) (?:0|1|2|3|4|5)\b
Regex101
I think that you want to match "word" 4 to 9?
Your desired match will be in group 2 (group 1 only holds the last of the three skipped words).
^(\S+\s){3}((\S+\s){6})
Enable the multiline option if you have a whole file of subject strings.
You can try with:
\s(?:[0-3]|99)\s([A-Z].*?)\b(?:[0-5])\b
DEMO
and get the string from group 1 ($1). Or, if your language supports lookarounds, try:
(?<=\s[0-3]\s|99)[A-Z].+?(?=\s[0-5]\s)
DEMO
to get the match directly.
Another solution that is based on matching all initial space + digit sequences:
\b(?:(?:[0-3]|99)\b\s*)+(.*?)\s*\b(?:[0-5])\b
See demo
The result is in Group 1.
With \b(?:(?:[0-3]|99)\b\s*)+ the rightmost number from the allowed leading set is picked.
You can use the following regex:
(?:(?:[0-3]|99)\s)+(.*?)\s(?:[0-5])\s
See demo https://regex101.com/r/iX6oE1/6
Also note that to match a range of single digits you can use a character class such as [0-3] instead of several alternatives joined with |.
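To see the last pattern at work on the line from the question, here is a small Python sketch (Python chosen only for illustration; the question does not name a language):

```python
import re

line = (
    "3EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 "
    "SHP 70W OVOIDE AI E27 SON PIA PLUS"
)

# One or more leading "digit + space" tokens from the allowed set,
# then a lazy capture up to the first standalone digit 0-5.
pattern = re.compile(r"(?:(?:[0-3]|99)\s)+(.*?)\s(?:[0-5])\s")

m = pattern.search(line)
label = m.group(1) if m else None
print(label)
```

The lazy (.*?) is what stops the capture at the first standalone 0 instead of running on to a later digit, which was the problem with the greedy (.*) in the question.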

Regex - capture all repeated iteration

I have a variable like this
var = "!123abcabc123!"
I'm trying to capture every '123' and 'abc' in this var.
The regex (abc|123) retrieves what I want, but...
My question is: when I try the regex !(abc|123)*! it retrieves only the last iteration. What can I do to get this output?
MATCH 1
1. [1-4] `123`
MATCH 2
1. [4-7] `abc`
MATCH 3
1. [7-10] `abc`
MATCH 4
1. [10-13] `123`
https://regex101.com/r/mD4vM8/3
Thank you!!
If your regex flavor supports \G (and \K, which this pattern also uses), you are free to use this:
(?:!|\G(?!^))\K(abc|123)(?=(?:abc|123)*!)
DEMO
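In a flavor without \G and \K (Python's standard re module supports neither), a two-step approach gives the same result: first validate the whole !...! wrapper, then iterate over the tokens inside it. A minimal sketch, using this swapped-in technique rather than the \G pattern above:

```python
import re

var = "!123abcabc123!"

# Step 1: make sure the whole string has the shape !(abc|123)*!
# and capture everything between the exclamation marks.
outer = re.fullmatch(r"!((?:abc|123)*)!", var)

tokens = []
if outer:
    offset = outer.start(1)  # position of the inner text within var
    # Step 2: walk over every abc/123 token inside the wrapper.
    for m in re.finditer(r"abc|123", outer.group(1)):
        tokens.append((m.start() + offset, m.end() + offset, m.group()))

for start, end, text in tokens:
    print(f"[{start}-{end}] {text}")
```

A repeated capture group such as (abc|123)* keeps only its last iteration in most engines, which is why the tokens have to be collected by separate matches.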

regular expression to match -100 to 0

I need a regular expression to match all numbers inclusive between -100 and 0.
So valid values are:
0
-100
-40
Invalid are:
1
100
40
Thank you!
Use this pattern:
/^(?:0|-100|-[1-9]\d?)$/
OK, so I'm late, but here goes:
(?:                # Either match:
  -                # a minus sign, followed by
  (?:              # either...
    100            # 100
  |                # or
    [1-9]\d?       # a number between 1 and 99
  )
|                  # or...
  (?<!-)           # (unless preceded by a minus sign)
  \b0              # the number 0 on its own
)
\b                 # and make sure that the number ends here.
(?!\.)             # except in a decimal dot.
This will find negative integer numbers (-100 to -1) and 0 in normal text. No leading zeroes allowed.
If you already have the number isolated, then
^(?:-(?:100|[1-9]\d?)|0)$
is enough if you don't want to allow leading zeroes or -0.
If you don't care about leading zeroes or -0, then use
^-?0*(?:100|\d\d?)$
...Now what do you do if your boss tells you "Oh, by the way, from tomorrow on, we need to allow values between -184.78 and 33.53"?
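As a sanity check on the strict pattern ^(?:-(?:100|[1-9]\d?)|0)$, a brute-force comparison against the numeric range can be sketched in Python. This is an illustration only: the helper name and the tested range of -150 to 150 are choices made here, not part of the answer.

```python
import re

# Strict pattern from the answer: no leading zeroes, no "-0".
STRICT_RE = re.compile(r"^(?:-(?:100|[1-9]\d?)|0)$")


def matches_strict(text: str) -> bool:
    """Return True if text is an integer from -100 to 0 (hypothetical helper)."""
    return STRICT_RE.fullmatch(text) is not None


# Compare the regex verdict with a plain numeric comparison.
mismatches = [
    n for n in range(-150, 151)
    if matches_strict(str(n)) != (-100 <= n <= 0)
]
print(mismatches)

# Inputs the strict pattern is meant to reject.
for text in ["-0", "00", "-007", "-4.5"]:
    print(repr(text), matches_strict(text))
```

A check like this is cheap to rerun whenever the range changes, which is exactly the situation the question about the boss describes.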
How about using a capture group and then programmatically testing the value e.g.
(-?\p{Digit}{1,3})
and then testing the captured value to ensure that it is within your range?
Try ^(-[1-9][0-9]?|-100|0)$
But perhaps it would be simpler to cast it to a number and then quickly check the range.
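A rough sketch of that numeric approach in Python, assuming the input arrives as a string. The function name in_range and its default bounds are made up for this example.

```python
def in_range(text: str, low: int = -100, high: int = 0) -> bool:
    """Parse text as an integer and check low <= value <= high.

    Note: int() is more lenient than the regexes above; it accepts
    surrounding whitespace, a leading "+", "-0" and leading zeroes.
    """
    try:
        value = int(text)
    except ValueError:
        # Not an integer at all (e.g. "abc" or "-4.5").
        return False
    return low <= value <= high


for text in ["0", "-100", "-40", "1", "100", "40", "-101", "abc", "-4.5"]:
    print(repr(text), in_range(text))
```

Changing the bounds later only means passing different numbers, rather than rewriting a pattern.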
I'm new to regular expressions. Would this work?
(-100|((-[1-9]?[0-9])|\b0))