Splitting a string into a pattern of commands

I need to take a string of concatenated keyword commands and numbers, and put the commands and the numbers into lists.
Pattern:
{command words} by {number} {command words} by {number} etc...
Input string:
"turn right by 1 turn left by 99 up by 11 left by 28"
I thought I might split on the word " by ", but that leaves the second piece containing both the number and the next command (e.g. 1 turn left).
Regex:
\sby\s
Desired Output:
turn right by 1
turn left by 99
up by 11
left by 28
Desired Lists:
turn right,turn left,up,left
1,99,11,28
How can I split a long string of commands that follow that pattern?
The text is one big long string with no punctuation. The word by is always followed by a number and the pattern is consistent. The first part may contain one or two keyword commands.

Brief
Your strings all seem to share the same structure: word or words by 111 (one or more words, followed by the literal word by, followed by at least one digit).
Code
See regex in use here
(\w[\w ]*?)\s+by\s+(\d+)
Results
Input
turn right by 1 turn left by 99 up by 11 left by 28
Output
Full Match: turn right by 1
Group 1: turn right
Group 2: 1
Full Match: turn left by 99
Group 1: turn left
Group 2: 99
Full Match: up by 11
Group 1: up
Group 2: 11
Full Match: left by 28
Group 1: left
Group 2: 28
Explanation
(\w[\w ]*?) Capture the following into capture group 1
\w[\w ]*? Any word character, followed by anything in the set [\w ] (any word character or space) any number of times, but as few as possible
\s+by\s+ One or more whitespace characters, followed by the literal word by, followed by one or more whitespace characters.
(\d+) Capture one or more digits into capture group 2
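To get the two lists the question asks for, the pattern can be applied repeatedly and the groups of each match collected. Here is a minimal Python sketch, assuming the command words never contain the word by themselves. The function name split_commands is made up for this illustration.

```python
import re

# Same pattern as in the answer: lazy command words, literal "by", digits.
COMMAND_RE = re.compile(r"(\w[\w ]*?)\s+by\s+(\d+)")


def split_commands(text):
    """Return (commands, numbers) extracted from a 'words by N words by N' string."""
    pairs = COMMAND_RE.findall(text)  # list of (group 1, group 2) tuples
    commands = [command for command, _ in pairs]
    numbers = [int(number) for _, number in pairs]
    return commands, numbers


commands, numbers = split_commands(
    "turn right by 1 turn left by 99 up by 11 left by 28"
)
print(commands)
print(numbers)
```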

Related

Write regex patterns for matching single digit or double digit where tens place value is 2 or 4

Below is my regex for matching a two-digit number whose tens digit is 2 or 4, and it works fine.
^(?=[2,4])\d{1,2}$
As soon as I add an alternative for a single digit to the regex above, it starts matching single digits as well as every two-digit number.
^(?=\d|[2,4])\d{1,2}$
I want the sample input below to be matched.
0
1
2
3
24
44
48
29
28
The input below should not be matched.
99
11
33
55
77
It would also be a great help to know why my regex is not working.

You get a difference in matches because the positive lookahead asserts that what you specify must appear immediately to the right. In the first pattern that is either 2, 4 or a comma, and in the second pattern just a single digit.
You don't have a comma in your example data, so in that case you can match an optional 2 or 4 using just [24]? followed by a digit, without any lookarounds.
^[24]?\d$
See a regex demo.
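The suggested pattern can be checked against the sample data from the question. This is a small Python sketch, and the variable names are only illustrative.

```python
import re

PATTERN = re.compile(r"^[24]?\d$")

should_match = ["0", "1", "2", "3", "24", "44", "48", "29", "28"]
should_not_match = ["99", "11", "33", "55", "77"]

# Keep only the inputs the pattern accepts.
accepted = [s for s in should_match + should_not_match if PATTERN.match(s)]
print(accepted)
```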

Try this: ^(\d|[24]\d)$
Test regex here: https://regex101.com/r/aZo7fK/1
^ matches the start of the string
(\d|[24]\d) matches either a single digit (0-9) or a two-digit number that starts with 2 or 4
$ matches the end of the string
Note that the class is written [24] rather than [2,4]: inside a character class a comma is a literal character, so [2,4] would also accept a comma in the tens position.

I suggest
^[24]?[0-9]$
pattern; where
^ - anchor, start of the text
[24]? - optional 2 or 4 digit for tens (written without a comma, because inside [...] a comma is a literal character)
[0-9] - mandatory digit 0..9 for units
$ - anchor, end of the text
Edit: Now, let's have a look at your current patterns; the first is
^(?=[2,4])\d{1,2}$
Here
(?=[2,4]) - look ahead for 2 or 4 (or a literal comma, since , is an ordinary character inside [...])
\d{1,2} - one or two digits
As we can see, 3 doesn't match: the lookahead fails to find 2 or 4. As for your second attempt, the
^(?=\d|[2,4])\d{1,2}$
pattern, where
(?=\d|[2,4]) - look ahead for ANY digit (note that |[2,4] is redundant)
\d{1,2} - one or two digits
the pattern matches too many texts; technically it matches any one- or two-digit number, e.g. for:
79
we have
(?=\d|[2,4]) - look ahead - succeeds with 7
\d{1,2} - one or two digits - succeeds with 79
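The behaviour of the two original patterns can be reproduced in a few lines of Python. This is only a sketch: the patterns are copied unchanged from the question, and the sample inputs are chosen here for illustration.

```python
import re

# The two patterns from the question, copied unchanged.
first = re.compile(r"^(?=[2,4])\d{1,2}$")
second = re.compile(r"^(?=\d|[2,4])\d{1,2}$")

samples = ["3", "4", "24", "79"]

# The first lookahead only lets through inputs that start with 2 or 4.
first_hits = [s for s in samples if first.match(s)]
# The second lookahead accepts any leading digit, so nothing is filtered out.
second_hits = [s for s in samples if second.match(s)]

print(first_hits)
print(second_hits)
```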

Exclude a combination of characters with regex or add a letter

I'm trying to adjust KODI's search filter with regex so the scrapers recognize tv shows from their original file names.
They either come in this pattern:
"TV show name S04E01 some extra info" or this "TV show name 01 some extra info"
The first is not recognized because "S04" scrambles the search in a number of ways, so it needs to go.
The second is not recognized because it needs an 'e' before the number; otherwise it won't be recognized as an episode number.
So I see two approaches:
Make the filter ignore s01-99.
Prepend an 'e' to any freestanding two-digit number, but I'm not sure regex can even do that.
I have no experience with regex, but I've been playing around and came up with this, which unsurprisingly doesn't do the trick:
^(?!s{00,99})\d{2}$

You may either find matches of the \b([0-9]{2})\b regex and replace them with E$1, or use the \bs(0[1-9]|[1-9][0-9])\b pattern in an ignore filter.
Details
\b([0-9]{2})\b - matches and captures into Group 1 any two digits that are not adjacent to letters, digits or _. The E$1 replacement means that the matched text (two digits) is replaced with itself (since $1 refers to the Group 1 value) with E prepended.
\bs(0[1-9]|[1-9][0-9])\b - matches an s followed by a number between 01 and 99, because (0[1-9]|[1-9][0-9]) is a capturing group matching either 0 and then any digit from 1 to 9 ([1-9]), or (|) any digit from 1 to 9 ([1-9]) and then any digit ([0-9]).
NOTE: If you need to generate a number range regex, you may use this JSFiddle of mine.
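Both patterns can be tried outside KODI with a short Python sketch. Several things here are assumptions made for illustration: Python writes the $1 backreference as \1, the re.IGNORECASE flag is added so that an upper-case S is accepted, and the helper name prepend_episode_marker is invented. Because of the trailing \b, the season pattern only fires when the season number is not immediately followed by another word character, so it matches a standalone "S04" but not the "S04" inside "S04E01".

```python
import re

EPISODE_RE = re.compile(r"\b([0-9]{2})\b")
SEASON_RE = re.compile(r"\bs(0[1-9]|[1-9][0-9])\b", re.IGNORECASE)


def prepend_episode_marker(name):
    """Put an E in front of every freestanding two-digit number."""
    return EPISODE_RE.sub(r"E\1", name)  # \1 is Python's spelling of $1


renamed = prepend_episode_marker("TV show name 01 some extra info")
standalone = SEASON_RE.search("TV show name S04 E01")
fused = SEASON_RE.search("TV show name S04E01 some extra info")

print(renamed)
print(standalone.group(0) if standalone else None)
print(fused)
```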

Regular expression to find strings whose text after leading numbers does not start with "

I have strings like this:
1 2 3 4 5 "Test test"
1 2 3 4 5 Test test"
I need to find the second string, the one where the text after the numbers does not start with ".
I have read many topics on Stack Overflow but I can't find an answer that works for me.
The regular expression has to work in Visual Studio Code on a .txt file.
Thanks so much for your help.
I tried:
^(?![0-9]+\t[0-9]+\t[0-9]+\t[0-9]+\t[0-9]+")
but it didn't work.

I've made the following assumptions about what is required:
the string must begin with one or more instances of one or more digits followed by one or more spaces; and
the last instance of one or more digits followed by one or more spaces must be followed by a character that is not a digit, space or double quote.
That can be tested with the following regular expression.
^(?:\d+ +)+[^"\d ].*$
Demo
As shown at the link, this regular expression matches the last four strings below, but not the first two.
1 2 3 4 5 "Test test
11 22 33 44 "Test test"
11 22 33 44 The test"
1 2 3 4 5 Test test"
1 2 3 4 5 The "Test test"
11 22 33 44 The "Test test"
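The same check can be run locally. The following Python sketch applies the expression to the six test strings and reports which ones match; the variable names are only illustrative.

```python
import re

LINE_RE = re.compile(r'^(?:\d+ +)+[^"\d ].*$')

lines = [
    '1 2 3 4 5 "Test test',
    '11 22 33 44 "Test test"',
    '11 22 33 44 The test"',
    '1 2 3 4 5 Test test"',
    '1 2 3 4 5 The "Test test"',
    '11 22 33 44 The "Test test"',
]

# Collect the lines whose first character after the numbers is not a quote.
matching = [line for line in lines if LINE_RE.match(line)]
for line in lines:
    print(line in matching, line)
```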

It can be tricky to match on what isn't there, because everything that doesn't match a pattern is a match for the negation of that pattern.
You are looking for runs of digits followed by runs of whitespace, and this sequence itself repeats:
(\d+\s+)+
You want the above to be followed by a character that is not a digit, whitespace or the double-quote character, [^\d\s"], and then by anything, .*
([^\d\s"])
Put it together
(\d+\s+)+([^\d\s"].*)
You can also make the groups non-capturing. This does not change what the pattern matches, but it uses less memory because the engine doesn't store the text of each group as it searches. This can be significant on large documents, especially when backreferences cause deep recursion.
(?:\d+\s+)+(?:[^\d\s"].*)
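The difference between the capturing and non-capturing forms is easy to see in Python, shown here as a small sketch on the second string from the question. Both versions match the same text; only the stored groups differ.

```python
import re

line = '1 2 3 4 5 Test test"'

capturing = re.match(r'(\d+\s+)+([^\d\s"].*)', line)
non_capturing = re.match(r'(?:\d+\s+)+(?:[^\d\s"].*)', line)

# Both patterns match exactly the same text.
same_match = capturing.group(0) == non_capturing.group(0)
print(same_match)

# A repeated capturing group keeps only its last repetition.
print(capturing.groups())
# Non-capturing groups store nothing.
print(non_capturing.groups())
```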

You're very close. You need to change the outer [] to (). You also need to put .* after the negative lookahead to match the rest of the line when the lookahead fails.
And you don't have tabs between the numbers, you have spaces, so \t should be \s.
^(?![0-9]+\s[0-9]+\s[0-9]+\s[0-9]+\s[0-9]+\s+").*
DEMO
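To illustrate the negative lookahead version, here is a Python sketch that filters the two lines from the question. The list name kept is chosen here for illustration.

```python
import re

LOOKAHEAD_RE = re.compile(
    r'^(?![0-9]+\s[0-9]+\s[0-9]+\s[0-9]+\s[0-9]+\s+").*'
)

lines = [
    '1 2 3 4 5 "Test test"',
    '1 2 3 4 5 Test test"',
]

# A line is kept when the five numbers are NOT followed by a double quote.
kept = [line for line in lines if LOOKAHEAD_RE.match(line)]
print(kept)
```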

How would I find values in a file, but only on lines that don't start with #?

I've got a document that looks something like this:
# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
(..and so on..)
In other words, it starts off with a few comment lines, then the rest of the document is just a bunch of numbers separated by spaces.
I'm trying to write a regex that gets all digits on all lines that don't start with #, but I can't seem to get it.
I've read over answers such as
Regular Expressions: Is there an AND operator?
Regex: Find a character anywhere in a document but only on lines that begin with a specific word
and pawed through sites such as http://regular-expressions.info, but I still can't get an expression that works (the best I can get is a lengthy version of ^[^#].*).
So how can I match digits (or text, or whatever) in a string, but only on lines that don't start with a certain character?

Your regex ^[^#].* uses a negated character class, which matches any character that is not a # at the start of the string ^, and after that matches any character zero or more times.
This would for example also match t test
What you might do is use an alternation that either matches a whole line starting with a #, ^#.*$, or captures one or more digits in a group, (\d+).
Your digits are captured in group 1. You could change the (\d+) to, for example, a character class ([\w+.]+) to match more than only digits.
(?:^#.*$|(\d+))
Details
(?: Non capturing group
^#.*$ Match from the start of the line ^ a # followed by any character zero or more times .* until the end of the string $
| Or
(\d+) capture one or more digits in a group
) Close non capturing group
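In Python the alternation could be used as sketched below. The re.MULTILINE flag is an assumption needed so that ^ and $ work per line. re.findall returns an empty string for every match where group 1 did not take part (the comment lines), so those entries are filtered out.

```python
import re

document = """# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
"""

PATTERN = re.compile(r"(?:^#.*$|(\d+))", re.MULTILINE)

# Comment lines are consumed whole and yield '' for group 1.
captures = PATTERN.findall(document)
numbers = [int(c) for c in captures if c]
print(numbers)
```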

I think a way simpler method would be to replace the lines with "" first with this regex:
^#.*
And then you can just match all the numbers with this:
-?\d+ (-? is for negative)
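The two-step approach might look like this in Python. It is a sketch that assumes multi-line mode, so that ^ matches at the start of every line.

```python
import re

document = """# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
"""

# Step 1: blank out every comment line.
without_comments = re.sub(r"^#.*", "", document, flags=re.MULTILINE)

# Step 2: collect the (optionally negative) integers that remain.
numbers = [int(n) for n in re.findall(r"-?\d+", without_comments)]
print(numbers)
```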

How can I replace this expression in chain regex (notepad++)?

I have this text:
14 two 25 three 12 four 40 five 10
I want to obtain "14 two 14 25 three 14 25 12 four 14 25 12 40 five 14 25 12 40 10".
For example, when I replace (14 two ) with (14 two 14 ), the search resumes after the 14; I can't make it resume after two.
Is there an alternative?
For example, can I use a group that comes before the match, and is not included in it, in the replacement?
Please help.

This should do the trick for you:
Regex: ((?:\s?\d+\s?)+)((?:[a-zA-Z](?![^a-zA-Z]+\1))+)
Replacement: $1$2 $1
You will need to click the "Replace All" button repeatedly for this to work: it cannot be done in one pass and has to be repeated for as long as a match is found (online PHP example).
Explanation:
\s: Match a single space character
?: the previous expression must be matched 0 or 1 time.
\s?: Match a space character 0 or 1 time.
\d: Match a digit character (the equivalent of [0-9]).
+: The previous expression must be matched at least one time (up to infinity).
\d+: Match as many digit characters as possible (but at least one).
(): Capture group
(?:): Non-capturing group
((?:\s?\d+\s?)+): Match an optional space character followed by one or more digit characters followed by an optional space character. The expression is surrounded by a non-capturing group followed by a plus. That means the regex will try to match as many combinations of space and digit characters as it can (so you can end up with something like '14 25 12 40').
The capture group is meant to keep the value so it can be reused in the replacement. You cannot simply add the plus at the end of the capture group without the non-capturing group inside it, because it would only remember the last digits captured ('12' instead of the whole '14 25 12' used to build '14 25 12 40').
[a-zA-Z]: Match any English letters in any case (lower, upper).
\1: Reference to what has been captured in the first group.
(?!): Negative lookahead.
[^]: Negated character class, so [^a-zA-Z] means match anything that is not an English letter.
((?:[a-zA-Z](?![^a-zA-Z]+\1))+): The negative lookahead is meant to make sure that we don't always end up matching the first "14 two" in the input text. Without it, we would end up in an infinite loop giving results such as "14 two 14 14 14 14 14 14 25 three 12 four 40 five 10" (the "14" before "25" being repeated until you reach the timeout).
Basically, for every English letter we match, we look ahead to assert that the content of the first capture group (for example "14") is not present in the digit sequence that follows.
For the replacement, $1$2 $1 means put the content of capture groups 1 and 2, add a space and put the content of capture group 1 once more.
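If running a script is an option, the desired output can also be produced in a single pass. The following Python sketch does not use the Notepad++ replacement above; it swaps in a different technique, a re.sub callback that remembers every number seen so far. The function name expand_numbers is invented for this illustration.

```python
import re


def expand_numbers(text):
    """Replace each number with all numbers seen so far, including itself."""
    seen = []

    def replace(match):
        seen.append(match.group(0))
        return " ".join(seen)

    return re.sub(r"\d+", replace, text)


result = expand_numbers("14 two 25 three 12 four 40 five 10")
print(result)
```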
