Regex for non-continuous bit 1 segments

Given a series of bits, e.g. 10100011110010101 and 010001001010100, how can I write a regular expression that matches the patterns where no two 1s are neighbors?
For example, 010001001010100 is a good one, but 10100011110010101 is not, because it contains neighboring 1s.
edit:
Sorry, my statement above may be misleading. I don't want to check whether a sequence is free of neighboring 1s; I want to use a regex to find all such sequences.
Assume I have a very long series of bits. I want to write a regex to find all subsequences in which no 1s are neighbors.

Simply "11" will return true if neighboring ones exist.
Regards

You can try the following regex:
^0*(10+)*1?$
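Here is a minimal Python sketch of how this whole-string pattern could be used as a validator. Python is my choice here (the question names no language), and the helper name `has_no_adjacent_ones` is hypothetical.

```python
import re

# Whole-string check: optional leading 0s, then any number of
# "a 1 followed by one or more 0s", then an optional final 1.
NO_ADJACENT_ONES = re.compile(r"^0*(10+)*1?$")

def has_no_adjacent_ones(bits):
    return NO_ADJACENT_ONES.match(bits) is not None

print(has_no_adjacent_ones("010001001010100"))    # True
print(has_no_adjacent_ones("10100011110010101"))  # False
```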

The following regex matches any run of zeros in which any embedded 1s are not adjacent to another 1. A lone 1 at the beginning or end of the string is accepted as well.
(^1)?0+(10+)*(1$)?
A test with your example strings yields:
bash$ grep -Eo '(^1)?0+(10+)*(1$)?' <<<10100011110010101
101000
0010101
bash$ grep -Eo '(^1)?0+(10+)*(1$)?' <<<010001001010100
010001001010100
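The same search can be sketched in Python, assuming you want a list of the matching runs; the function name `non_adjacent_runs` is hypothetical. Note that `re.findall` would return the capture groups here, so the sketch collects `group(0)` from `re.finditer` instead.

```python
import re

# Same pattern as the grep -Eo call: (^1)? and (1$)? allow a lone 1 only
# at the very start or the very end of the input.
RUNS = re.compile(r"(^1)?0+(10+)*(1$)?")

def non_adjacent_runs(bits):
    # finditer + group(0) yields whole matches; findall would yield the groups
    return [m.group(0) for m in RUNS.finditer(bits)]

print(non_adjacent_runs("10100011110010101"))  # ['101000', '0010101']
print(non_adjacent_runs("010001001010100"))    # ['010001001010100']
```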

Search for 11+, i.e. a 1 followed by at least one 1.

You can use this, if your regex flavor supports lookarounds:
(?<!1)1(?!1)
(?<!1): not preceded by 1
(?!1): not followed by 1
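If your regex flavor doesn't support lookarounds, you can use a capturing group and two non-capturing groups instead:
(?:^|0)(1)(?:0|$)
(Note that the capturing group is useful only if you want the offset of the captured 1, obtained with an appropriate function. Also, because the non-capturing groups consume the surrounding 0s, a 1 that shares a 0 with the previous match, like the middle 1 in 10101, is not found.)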
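To compare the two patterns, here is a small Python sketch that reports the offsets of isolated 1s; the function names are hypothetical. With the fallback pattern the offset comes from `m.start(1)`, i.e. the captured 1 rather than the consumed leading 0.

```python
import re

LOOKAROUND = re.compile(r"(?<!1)1(?!1)")     # zero-width checks on both sides
FALLBACK = re.compile(r"(?:^|0)(1)(?:0|$)")  # consumes the neighbouring 0s

def isolated_ones(bits):
    return [m.start() for m in LOOKAROUND.finditer(bits)]

def isolated_ones_fallback(bits):
    # start(1) is the offset of the captured 1, not of the leading 0
    return [m.start(1) for m in FALLBACK.finditer(bits)]

print(isolated_ones("10100011110010101"))  # [0, 2, 12, 14, 16]
print(isolated_ones("10101"))              # [0, 2, 4]
print(isolated_ones_fallback("10101"))     # [0, 4]  (middle 1 shares a consumed 0)
```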


How to match a string between two words, but only the "closest" of the two words?

I am new to regex and am trying to capture a certain pattern. There are two words, name1 and host, and I want to capture everything between them. The problem is that sometimes the text in between contains another 'name1'. When it does, the match includes everything from the earlier name1 up to the next 'host', so I end up capturing two strings from two different 'name1' occurrences.
This is the example I have:
name1{want-this-string}host,name1{want-this-string}host,name1{dont-want-this-string},name1{dont-want-this-either}name1{want-this-string}host
and this is the regex I'm using right now:
(?<=\bname1\b).*?(?=\bhost\b)
My expected output is that it matches the three {want-this-string} parts and not the {dont-want-this} ones, so basically:
{want-this-string}{want-this-string}{want-this-string}
But right now it's grabbing the first two {want-this-string} and then this whole section:
{dont-want-this-string},name1{dont-want-this-either}name1{want-this-string}
If you have GNU grep, you may use
grep -oP '\bname1\{\K[^{}]*(?=}host\b)' file
With pcregrep (which you can install on macOS if you are using that OS), you can use it like this:
pcregrep -oM '\bname1\{\K[^{}]*(?=}host\b)' file
Details
\bname1\{ - whole word name1 followed by {
\K - match reset operator that discards the text matched so far from the reported match
[^{}]* - 0 or more chars other than { and }
(?=}host\b) - }host, ending at a word boundary, must appear immediately to the right of the current location.
See the online grep demo:
s="name1{want-this-string}host,name1{want-this-string}host,name1{dont-want-this-string},name1{dont-want-this-either}name1{want-this-string}host"
grep -oP '\bname1\{\K[^{}]*(?=}host\b)' <<< "$s"
Output:
want-this-string
want-this-string
want-this-string
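Python's `re` module has no `\K`, so a sketch of the same idea in Python (my choice of language, not part of the answer) puts the wanted text in a capture group instead; `findall` then returns just that group.

```python
import re

# Capture the text between "name1{" and "}host"; [^{}]* cannot run past a
# brace, so a name1{...} that is not followed by "host" never matches.
WANTED = re.compile(r"\bname1\{([^{}]*)\}host\b")

s = ("name1{want-this-string}host,name1{want-this-string}host,"
     "name1{dont-want-this-string},name1{dont-want-this-either}"
     "name1{want-this-string}host")
print(WANTED.findall(s))
# ['want-this-string', 'want-this-string', 'want-this-string']
```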
I'm not quite sure whether this pattern would cover all of our desired and potential inputs, but we could start designing an expression from our cases with a left boundary and, if necessary, a right boundary, maybe something like this:
(^name1|}name1)({.+?})?|(host,name1)({.+?})(host,name1)
The second alternative could be simplified considerably:
(host,name1)({.+?})(host,name1)
We include it here only to illustrate a right boundary that captures just the first instance of the (host,name1) value.

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches has what I can describe as positioners. These are shown as a question mark, followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work with is the linked image. The creators can't even confirm which regex flavour it is; they believe it's Python, but I'm unsure.
Update
Unfortunately, none of the answers below have worked so far. I tested each one in the live system (rather than only in a regex tester, in case it behaves differently), but they still didn't work.
There is no replace feature, and as mentioned before, I'm not sure it is Python. I appreciate your help.
Do two regex operations.
First, do a regex replace that replaces the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
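A minimal Python sketch of the two-step approach, assuming the replace and the match can run as separate calls; `find_targets` is a hypothetical name.

```python
import re

POSITIONER = re.compile(r"\?[0-9]{2}")  # a ? followed by exactly two digits
TARGET = re.compile(r"T[0-9]{7}")

def find_targets(text):
    cleaned = POSITIONER.sub("", text)  # step 1: strip the positioners
    return TARGET.findall(cleaned)      # step 2: match on the cleaned text

print(find_targets("T123?214567\nT?211234567"))  # ['T1234567', 'T1234567']
```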
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, this uses two capture groups, one before and one after the '?21' sequence. You'll need to concatenate the two groups.
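In Python (assumed here only for illustration), concatenating the two groups could look like this; `join_around_positioner` is a hypothetical helper that returns None when a line has no positioner.

```python
import re

SPLIT = re.compile(r"(T.*?)\?\d{2}(.*)")

def join_around_positioner(line):
    m = SPLIT.search(line)
    # Glue together the text before and after the ?NN positioner
    return m.group(1) + m.group(2) if m else None

print(join_around_positioner("T123?214567"))  # T1234567
print(join_around_positioner("T?211234567"))  # T1234567
```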
First, match ?21 and replace it with a distinctive character such as #:
\?21
Then you can use this regex to find what you want:
(T(?:\d{7}|[\#\d]{8}))\s
The target string is captured in group 1 (\1).
Finally, replace # with ?21 or something you like.
A Python script might look like this:
import re
ss="""T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
rexpre= re.compile(r'\?21')
# The trailing \s requires whitespace after a target, so T5435433 on the last line is not matched
regx= re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')
for m in regx.findall(rexpre.sub('#',ss)):
    print(m)
print()
for m in regx.findall(rexpre.sub('#',ss)):
    print(re.sub('#',r'?21', m))
Output:
T123#4567
T#1234567
T1234567
T1234434#

T123?214567
T?211234567
T1234567
T1234434?21
If using a replace function is an option for you, this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then use group 1 followed by group 2, \1\2, as the replacement.
Using word boundaries \b (or the anchors ^ and $ for the start and end of the line), this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
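A short Python sketch of that replacement, assuming the lines are processed as one multi-line string (the sample text is my own, built from the question's examples).

```python
import re

# Group 1: T plus leading digits; optional ?NN positioner (not captured);
# group 2: the remaining digits.
PATTERN = re.compile(r"\b(T\d*)(?:\?\d{2})?(\d+)\b")

text = "T123?214567\nT?211234567\nT0000001"
print(PATTERN.sub(r"\1\2", text))
# T1234567
# T1234567
# T0000001
```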
Is the below what you want?
Use RegExReplace with the multiline option (m) and enable replacing all occurrences:
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2

How to use an RE to match a line of ===== and the line above

I want to match two lines like the following using a regular expression:
abcmnoxyz
=========
The first line is essentially random, the second line will be all the same character of a limited number of possibles (=, - and maybe a couple more). The lines can probably be required to be the same length but it would be nice if they didn't have to be. It would be OK to have multiple REs, one for each possible 'underline' character.
Can anyone come up with a way to do this?
This regex should do what you're trying to do:
regex = "(.*)\n(.)\2{2,}$"
Group 1 gives you the line before the repeated line.
EXPLANATION
(.*)\n: match anything followed by a newline
(.)\2{2,}: capture one character, then check that it is followed by the same character at least two more times (three or more in total). You don't need to worry about which character is repeated.
If only certain characters may be repeated, you can use a character class such as [=-] instead of the dot (.).
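A Python sketch of the same idea. The ^ anchor and the re.M flag are my additions so that ^ and $ match at every line boundary, and [=-] replaces the dot as suggested above; the raw string keeps \2 a backreference rather than an escape.

```python
import re

# Group 1: the line above; group 2: the underline character, which must
# repeat at least twice more (3+ characters in total).
UNDERLINED = re.compile(r"^(.*)\n([=-])\2{2,}$", re.M)

doc = "intro text\nabcmnoxyz\n=========\nmore text\nSection\n-------\n"
print(UNDERLINED.findall(doc))
# [('abcmnoxyz', '='), ('Section', '-')]
```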
Use Grep's -B Flag
Matching with Alternation
Given your example, you can use extended regular expressions with alternation and an interval expression such as {5,}. The -B flag tells grep how many lines before the match to include in the output.
$ grep -E -B1 '^(={5,}|-{5,})$' sample.txt
abcmnoxyz
=========
You can add alternations for additional characters if you want, although boundary markers ought to be as consistent as you can make them. You can also adjust the minimum number of sequential characters required for a match to suit your needs. I used a five-character range in the example because that's what was posted as the criterion in your original topic sentence, and because a shorter boundary marker is more likely to accidentally match truly random text.
Matching with a Character Class
Also, note that the following does the same job but is a bit more concise. It uses a character class and a backreference to avoid alternations, which can get messy if you add many more boundary characters. (Backreferences in -E patterns are a GNU grep extension rather than POSIX ERE.) Both versions are equally effective at matching your example.
$ grep -E -B1 '^([=-])\1{4,}$' sample.txt
abcmnoxyz
=========
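For readers without grep, here is a rough Python equivalent of the -B1 idea; `underlined_headings` is a hypothetical name, and the pattern is the backreference version above applied to each line with `fullmatch`.

```python
import re

UNDERLINE = re.compile(r"([=-])\1{4,}")  # 5+ copies of the same = or - character

def underlined_headings(text):
    lines = text.splitlines()
    # Like grep -B1: for each underline line, also report the line before it
    return [(lines[i - 1], line) for i, line in enumerate(lines)
            if i > 0 and UNDERLINE.fullmatch(line)]

print(underlined_headings("abcmnoxyz\n=========\nplain\n"))
# [('abcmnoxyz', '=========')]
```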
A regex like this
^([^=\v]+)\v=+$
will do.
Explanation:
^([^=\v]+) # 1 or more characters that are neither '=' nor vertical whitespace (\v in PCRE)
\v=+$ # vertical whitespace (e.g. a newline) followed by 1 or more '='
If you want to extend this to more characters like '-' you could do this:
^([^=\-\v]+)\v(-|=)\2+$
And, thanks to Ashish Ranjan: if you want to allow = and/or - on the first line, use something like this:
^(.+)\v(-|=)\2+$
which would even allow a first line like "=====". I doubt the OP had this in mind, though.
Hope this works
^([a-z]{1,})\n([=-]{1,})
Try \n or \r\n, depending on the file format (Unix or DOS).
\1 will give you first line
\2 will give you second line
If the pattern occurs repeatedly in the file, you will get many matches.
This works regardless of how many characters are on each line.
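A Python sketch of this pattern. Writing the line break as \r?\n is my own adjustment so one pattern handles both Unix (\n) and DOS (\r\n) files, and re.M lets ^ match at the start of every line.

```python
import re

HEADING = re.compile(r"^([a-z]{1,})\r?\n([=-]{1,})", re.M)

print(HEADING.findall("abcmnoxyz\n=========\n"))      # [('abcmnoxyz', '=========')]
print(HEADING.findall("abcmnoxyz\r\n=========\r\n"))  # [('abcmnoxyz', '=========')]
```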

Get last characters up to specific character

Let's say I have a string something-123.
I need to get the last 5 (or fewer) characters of it, but only up to the - if there is one in the string, so the result would be thing; if the string has no -, like something123, the result would be ng123; and if the string is like 123, the result would be 123.
I know how to match the last 5 characters:
/.{5}$/
I know how to match everything up to the first -:
/[^-]*/
But I cannot figure out how to combine them. To make things worse, I need to get the match without extracting it from specific groups or similar advanced regex features, because I want to use it in SQL Anywhere. Please help.
Thank you all for the help, but it looks like a complete regex solution is going to be too complicated for my problem, so I did it very simply: SELECT right(regexp_substr('something-123', '[^-]*'), 4).
One option is to group the result:
(.{4})-
Now you have captured the result but without the -.
Or, using lookarounds:
.{4}(?=-)
which matches any 4 characters that appear before "-".
You can use:
.{5}(?=(?:-[^-]*)?$)
We match 5 symbols other than a newline only before the last - in the string or at the very end of the string ((?=(?:-[^-]*)?$)). You only need to collect the matches, no need checking groups/submatches.
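A quick Python check of this lookahead pattern on the question's three example strings (Python is used only for illustration). Only the whole match is read, and a string shorter than five characters, such as 123, gets no match.

```python
import re

# .{5} followed by a lookahead: the last "-..." tail or the end of the string
LAST5 = re.compile(r".{5}(?=(?:-[^-]*)?$)")

for s in ["something-123", "something123", "123"]:
    m = LAST5.search(s)
    print(s, "->", m.group(0) if m else None)
# something-123 -> thing
# something123 -> ng123
# 123 -> None
```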
UPDATE
To match any 1 to 5 characters other than a hyphen before the first hyphen (if present in the string), you can use
([^-]{1,5})(?:(?:-[^-]*)*)?$
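Here the trailing non-capturing group (not a lookahead) requires that only -+non-hyphen sequences follow the captured substring up to the end of the string, so the result is in group 1.
A faster alternative:
^[^-]*?([^-]{1,5})(?:-|$)
This regex lazily skips non-hyphen characters from the start of the string, then captures 1 to 5 non-hyphen characters that are followed by the first - or by the end of the string.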
Note that here, the value we need is in Group 1.
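A Python sketch that reads group 1 from this faster pattern; `last_up_to_five` is a hypothetical name. Unlike the .{5} lookahead version, it also handles strings shorter than five characters.

```python
import re

# Lazily skip non-hyphens, then capture 1-5 non-hyphens that end at the
# first "-" or at the end of the string.
TAIL = re.compile(r"^[^-]*?([^-]{1,5})(?:-|$)")

def last_up_to_five(s):
    m = TAIL.search(s)
    return m.group(1) if m else None

for s in ["something-123", "something123", "123"]:
    print(s, "->", last_up_to_five(s))
# something-123 -> thing
# something123 -> ng123
# 123 -> 123
```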
How about:
(.{5})(?:-[^-]+)?$
The result is in group 1
Try this regex:
(.{1,5})(?:-.*|$)
Group 1 has the result you need

TextMate: Regex replacing $1 with following 0

I'm trying to fix a file full of 1- and 2-digit numbers to make them all 2 digits long.
The file is of the form:
10,5,2
2,4,5
7,7,12
...
I've managed to match the problem numbers with:
(^|,)(\d)(,|$)
All I want to do now is replace the offending string with:
${1}0$2$3
but TextMate gives me:
10${1}05,2
Any ideas?
Thanks in advance,
Ross
According to this, TextMate supports word boundary anchors, so you could also search for \b\d\b and replace all with 0$0. (Thanks to Peter Boughton for the suggestion!)
This has the advantage of catching all the numbers in one go. Your solution will have to be applied at least twice, because after a successful replacement the regex engine has already consumed the comma before the next number.
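The consumed-comma effect can be reproduced in Python (used here only as an illustration; TextMate itself is not involved). Python's \g<1> is an unambiguous group reference, so \g<1>0 means "group 1, then a literal 0".

```python
import re

# The question's pattern, with ^ and $ matching at each line (re.M)
PROBLEM = re.compile(r"(^|,)(\d)(,|$)", re.M)

def pad_once(text):
    return PROBLEM.sub(r"\g<1>0\2\3", text)

once = pad_once("2,4,5")
print(once)            # 02,4,05  (the comma before 4 was consumed by the first match)
print(pad_once(once))  # 02,04,05
```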
Note: Tim's solution is simpler and solves this problem, but I'll leave this here for reference, in case someone has a similar but more complex problem, which using lookarounds can support.
A simpler way than your expression is to replace:
(?<!\d)\d(?!\d)
With:
0$0
Which is "replace all single digits with 0 then itself".
The regex is:
Negative lookbehind to not find a digit (?<!\d)
A single digit: \d
Negative lookahead to not find a digit (?!\d)
Since this is a positional match (not a character match), it handles both commas and start/end positions.
The $0 part says "entire match" - since the lookbehind/ahead match positions, this will contain the single digit that was matched.
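For comparison, a Python sketch of the lookaround version; Python spells the whole match as \g<0> rather than $0, and `pad` is a hypothetical name. Because nothing around the digit is consumed, one pass suffices.

```python
import re

# Lookbehind/lookahead only inspect the neighbours, so every lone digit
# is found in a single pass.
SINGLE_DIGIT = re.compile(r"(?<!\d)\d(?!\d)")

def pad(text):
    return SINGLE_DIGIT.sub(r"0\g<0>", text)

print(pad("10,5,2\n2,4,5\n7,7,12"))
# 10,05,02
# 02,04,05
# 07,07,12
```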
To anyone coming here: as @Amarghosh suggested, it's a bug, or at least intentional behavior that causes problems.
I just had this problem and had to use the following workaround: if you set up another capture group and then use a conditional insertion, it works. For example, I had a string like <WebObject name=Frage01 and wanted to replace the 01 with 02, so I used the regex (<WebObject name=(Frage|Antwort))(01), which captures the main string in $1 (with the nested Frage|Antwort in $2) and the end number in $3.
Then the replacement was $1(?2:02).
The (?2:02) is a conditional insertion. Here it always finds something, because group 2 always participates in the match, but it was necessary to work around the odd problem of appending a number directly after $n. Hope that helps someone. Conditional insertions are documented in the TextMate manual.
In TextMate 1.5.11 (1635), ${1} does not work (as the OP described).
I appreciate the many suggestions about changing the search pattern, but there is a much simpler solution if you want to separate a capture group from a following digit: \u.
It is a TextMate specific replacement syntax, that converts the following character to uppercase. As there is no uppercase for numbers, it does nothing and moves on. It is described in the link from Tim Pietzcker's answer.
In my case I had to clean up a CSV file where box measurements were given in cm x cm x mm, so I had to add a zero to the first two numbers.
Text: "80 x 40 x 5 mm"
Desired text: "800 x 400 x 5 mm"
Find: (\d+) x (\d+) x (\d+)
Replace: $1\u0 x $2\u0 x $3
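Outside TextMate, some flavors avoid the ambiguity with explicit group syntax. As one example (Python, not TextMate), \g<1>0 means "group 1, then 0", so no \u trick is needed; the text after the match (" mm") is left unchanged.

```python
import re

BOX = re.compile(r"(\d+) x (\d+) x (\d+)")

print(BOX.sub(r"\g<1>0 x \g<2>0 x \3", "80 x 40 x 5 mm"))  # 800 x 400 x 5 mm
```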
Regarding support for 10 or more capture groups, I do not know whether this is a bug. But as the OP and @rossmcf wrote, $10 is replaced with nothing.
You don't need ${1}: replacement strings support at most nine groups, so $10 won't be mistaken for group 10.
Replace with $10$2$3