I need to find the last character of each line and then add extra text after it (for example "word"), but without applying this rule to empty lines (lines with no characters).
My expression, ([^\n])$, works for empty lines, too.
Use
(\S[^\S\n]*)$
Replace with: $1 word
Explanation
--------------------------------------------------------------------------------
  (                     group and capture to \1:
--------------------------------------------------------------------------------
    \S                  non-whitespace (all but \n, \r, \t, \f,
                        and " ")
--------------------------------------------------------------------------------
    [^\S\n]*            any character except: non-whitespace
                        (all but \n, \r, \t, \f, and " "), '\n'
                        (newline) (0 or more times (matching the
                        most amount possible))
--------------------------------------------------------------------------------
  )                     end of \1
--------------------------------------------------------------------------------
  $                     before an optional \n, and the end of the
                        string
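As a rough illustration of the same replacement outside an editor, here is a minimal JavaScript sketch. The name appendWord and the sample text are made up for this example, and it assumes \n line endings. The m flag is what lets $ match at the end of every line.

```javascript
// Hypothetical helper: appends " <word>" after the last non-whitespace
// character (plus any trailing spaces or tabs) of every non-empty line.
function appendWord(text, word) {
  // g: process every line, m: let $ match at each line end.
  return text.replace(/(\S[^\S\n]*)$/gm, (match) => `${match} ${word}`);
}

const sample = "first line\n\nsecond";
console.log(JSON.stringify(appendWord(sample, "word")));
// → "first line word\n\nsecond word"
```

Lines that are empty or contain only whitespace have no \S to anchor on, so they are left untouched.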
Related
Assume I have this text:
blah blah Bob Loblaw Law blah
keep1 { i want this } blop
blah blob keep2 { and
this too } blaw blat
etc...
And I want to end up with
keep1 { i want this }
keep2 { and
this too }
or perhaps:
keep1 { i want this }
keep2 { and this too }
I haven't figured out how to get Atom's regexp find/replace mechanism to discard everything across multiple lines outside of a specific matching string. Hints?
update:
Of the many things I've tried, this gets me closest:
[\S\s]+?(keep\d\s+\{[\S\s]+?\})
which results in:
keep1 { i want this }
keep2 { and
this too }
blaw blat
etc...
This is probably good enough -- I can edit the trailing shards -- but it would be useful to know how to trim those as well.
You may use this simple regex replace in Atom for this task:
\b(keep\d+\s*{[^}]*})|.+?
Replace it with: $1
RegEx Details:
\b: Word boundary
(keep\d+\s*{[^}]*}): In capture group #1 match a string that starts with keep followed by 1+ digits followed by 0+ whitespaces followed by any text that is inside {...} spanning across the lines as well. This assumes { and } are balanced and there is no escaping of { and }.
|: OR
.+?: Lazily match 1+ of any character other than a line break
PS: If you want to remove the leading line break, then use:
\n?\b(keep\d+\s*{[^}]*})|.+?
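To see what this replacement does outside Atom, here is a small JavaScript sketch run against the sample text from the question. The names KEEP_OR_ANYTHING and keepBlocks are invented for the example, and the braces are escaped only to be explicit; the behaviour in Atom itself may differ in details.

```javascript
// Same pattern as the PS variant above; escaping { and } is an assumption
// of this sketch and does not change the meaning.
const KEEP_OR_ANYTHING = /\n?\b(keep\d+\s*\{[^}]*\})|.+?/g;

// Hypothetical helper: keeps only the keepN { ... } blocks.
function keepBlocks(text) {
  // $1 is empty whenever the ".+?" alternative matched, so that text is dropped.
  return text.replace(KEEP_OR_ANYTHING, "$1");
}

const input = [
  "blah blah Bob Loblaw Law blah",
  "keep1 { i want this } blop",
  "blah blob keep2 { and",
  "this too } blaw blat",
  "etc...",
].join("\n");

console.log(keepBlocks(input));
```

Line breaks that are not directly followed by a keep block are matched by neither alternative, so they stay in the output. That is why the two blocks still end up on separate lines.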
Use
[\s\S]*?(keep\d\s+\{[^{}]*\})|(?:(?!keep\d\s+\{[^{}]*\})[\s\S])+$
EXPLANATION
--------------------------------------------------------------------------------
  [\s\S]*?              any character of: whitespace (\n, \r, \t,
                        \f, and " "), non-whitespace (all but \n,
                        \r, \t, \f, and " ") (0 or more times
                        (matching the least amount possible))
--------------------------------------------------------------------------------
  (                     group and capture to \1:
--------------------------------------------------------------------------------
    keep                'keep'
--------------------------------------------------------------------------------
    \d                  digits (0-9)
--------------------------------------------------------------------------------
    \s+                 whitespace (\n, \r, \t, \f, and " ") (1
                        or more times (matching the most amount
                        possible))
--------------------------------------------------------------------------------
    \{                  '{'
--------------------------------------------------------------------------------
    [^{}]*              any character except: '{', '}' (0 or
                        more times (matching the most amount
                        possible))
--------------------------------------------------------------------------------
    \}                  '}'
--------------------------------------------------------------------------------
  )                     end of \1
--------------------------------------------------------------------------------
  |                     OR
--------------------------------------------------------------------------------
  (?:                   group, but do not capture (1 or more times
                        (matching the most amount possible)):
--------------------------------------------------------------------------------
    (?!                 look ahead to see if there is not:
--------------------------------------------------------------------------------
      keep              'keep'
--------------------------------------------------------------------------------
      \d                digits (0-9)
--------------------------------------------------------------------------------
      \s+               whitespace (\n, \r, \t, \f, and " ")
                        (1 or more times (matching the most
                        amount possible))
--------------------------------------------------------------------------------
      \{                '{'
--------------------------------------------------------------------------------
      [^{}]*            any character except: '{', '}' (0 or
                        more times (matching the most amount
                        possible))
--------------------------------------------------------------------------------
      \}                '}'
--------------------------------------------------------------------------------
    )                   end of look-ahead
--------------------------------------------------------------------------------
    [\s\S]              any character of: whitespace (\n, \r,
                        \t, \f, and " "), non-whitespace (all
                        but \n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  )+                    end of grouping
--------------------------------------------------------------------------------
  $                     before an optional \n, and the end of the
                        string
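The answer above does not state the replacement, so the following JavaScript sketch makes its own assumption: each captured block is written out followed by a line break, and the trailing remainder (matched by the second alternative) is dropped. BLOCK_OR_TAIL and extractBlocks are hypothetical names.

```javascript
const BLOCK_OR_TAIL =
  /[\s\S]*?(keep\d\s+\{[^{}]*\})|(?:(?!keep\d\s+\{[^{}]*\})[\s\S])+$/g;

// Hypothetical helper. The replacement rule is an assumption of this sketch:
// a captured block is emitted on its own line, anything else is removed.
function extractBlocks(text) {
  return text.replace(BLOCK_OR_TAIL, (_, block) => (block ? `${block}\n` : ""));
}

const text = [
  "blah blah Bob Loblaw Law blah",
  "keep1 { i want this } blop",
  "blah blob keep2 { and",
  "this too } blaw blat",
  "etc...",
].join("\n");

console.log(extractBlocks(text));
```

The first alternative swallows everything up to and including the next block, while the second one only gets a chance once no further block exists, which is how the trailing shards get trimmed.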
I'm trying to write a regex for matching user input that will be turned into italic format using markdown.
In the string I need to find the following pattern: an asterisk followed by any kind of non-whitespace character, and ending with any kind of non-whitespace character followed by an asterisk.
So basically: substring *substring substring substring* substring should spit out *substring substring substring*.
So far I came up only with /\*(?:(?!\*).)+\*/, which matches everything between two asterisks, but it doesn't take into consideration whether the substring between the asterisks starts or ends with whitespace, which it shouldn't.
Thank you for your input! :)
Use
\*(?![*\s])(?:[^*]*[^*\s])?\*
EXPLANATION
--------------------------------------------------------------------------------
  \*                    '*'
--------------------------------------------------------------------------------
  (?!                   look ahead to see if there is not:
--------------------------------------------------------------------------------
    [*\s]               any character of: '*', whitespace (\n,
                        \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  )                     end of look-ahead
--------------------------------------------------------------------------------
  (?:                   group, but do not capture (optional
                        (matching the most amount possible)):
--------------------------------------------------------------------------------
    [^*]*               any character except: '*' (0 or more
                        times (matching the most amount
                        possible))
--------------------------------------------------------------------------------
    [^*\s]              any character except: '*', whitespace
                        (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  )?                    end of grouping
--------------------------------------------------------------------------------
  \*                    '*'
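A quick way to try the pattern is a small JavaScript sketch like the one below. ITALIC and findItalics are names invented for the example; the first sample string is the one from the question.

```javascript
const ITALIC = /\*(?![*\s])(?:[^*]*[^*\s])?\*/g;

// Hypothetical helper: returns every *...* span whose content neither
// starts nor ends with whitespace (an empty array when there is none).
function findItalics(text) {
  return text.match(ITALIC) || [];
}

// One match: *substring substring substring*
console.log(findItalics("substring *substring substring substring* substring"));

// No match: both asterisks are followed by a space.
console.log(findItalics("a * not italic * b"));
```

The look-ahead rejects an opening asterisk that is followed by whitespace or another asterisk, and the final [^*\s] forces the last character before the closing asterisk to be non-whitespace.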
I have a text file with hundreds of lines. Each line contains the information below:
software.cisco.com , Added by IT, ZZ 6584
What I am trying to do is insert a carriage return where the first comma is. I'm able to do this with search/replace using the \n expression. The problem is that it inserts a carriage return twice, leaving me with 3 lines. I am trying to insert a carriage return at the first comma only and keep the rest of the line.
Before:
software.cisco.com , Added by IT, ZZ 6584
After:
software.cisco.com
#Added by IT, ZZ 6584
Use
^(.*?),\s*
Replacement: $1\n#
EXPLANATION
--------------------------------------------------------------------------------
  ^                     the beginning of the string
--------------------------------------------------------------------------------
  (                     group and capture to \1:
--------------------------------------------------------------------------------
    .*?                 any character except \n (0 or more times
                        (matching the least amount possible))
--------------------------------------------------------------------------------
  )                     end of \1
--------------------------------------------------------------------------------
  ,                     ','
--------------------------------------------------------------------------------
  \s*                   whitespace (\n, \r, \t, \f, and " ") (0 or
                        more times (matching the most amount
                        possible))
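For a file with many lines, the same replacement can be sketched in JavaScript as below. splitAtFirstComma is a hypothetical name, and the g and m flags are an assumption needed so that ^ matches at the start of every line rather than only at the start of the text.

```javascript
// Hypothetical helper: for each line, replaces the first comma (and the
// whitespace after it) with a line break followed by "#".
function splitAtFirstComma(text) {
  // m: ^ matches at the start of every line, g: handle all lines.
  return text.replace(/^(.*?),\s*/gm, "$1\n#");
}

console.log(splitAtFirstComma("software.cisco.com , Added by IT, ZZ 6584"));
```

Because the lazy group captures everything before the comma, the space in front of the comma is kept at the end of the first output line. Putting \s* before the comma as well, outside the group, would be one way to drop it.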
Correct code:
"key1=val1;key2=val2;key3=val3" -- Correct, as each pair has ";" at the end except the last pair
Incorrect code:
"key1=val1;key2=val2; key3=val3;" -- Invalid, as the last pair has ";" at the end
"key1=val1;;;key2=val2;;;key3=val3" -- Invalid, as there are multiple ";" in the middle
I got the regex below from an old Stack Overflow link, but it does not work for the cases above:
^(?:\s*\w+\s*=\s*[^;]*;)+$
You might use
^\w+\s*=\s*\w+(?:;\s*\w+\s*=\s*\w+)*$
Explanation
^                    Start of string
\w+\s*=\s*\w+        Match 1+ word chars, = and 1+ word chars, with optional whitespace chars around =
(?:                  Non-capture group
  ;\s*\w+\s*=\s*\w+  Match ; and the same pattern as mentioned above
)*                   Close the group and repeat 0+ times
$                    End of string
With doubled backslashes (for use inside a string literal):
^\\w+\\s*=\\s*\\w+(?:;\\s*\\w+\\s*=\\s*\\w+)*$
Also, a variant that avoids repeating the key=value pattern:
^(?:\s*\w+\s*=\s*\w+(?:;(?!\s*$)|\s*$))+\s*$
EXPLANATION
--------------------------------------------------------------------------------
  ^                     the beginning of the string
--------------------------------------------------------------------------------
  (?:                   group, but do not capture (1 or more times
                        (matching the most amount possible)):
--------------------------------------------------------------------------------
    \s*                 whitespace (\n, \r, \t, \f, and " ") (0
                        or more times (matching the most amount
                        possible))
--------------------------------------------------------------------------------
    \w+                 word characters (a-z, A-Z, 0-9, _) (1 or
                        more times (matching the most amount
                        possible))
--------------------------------------------------------------------------------
    \s*                 whitespace (\n, \r, \t, \f, and " ") (0
                        or more times (matching the most amount
                        possible))
--------------------------------------------------------------------------------
    =                   '='
--------------------------------------------------------------------------------
    \s*                 whitespace (\n, \r, \t, \f, and " ") (0
                        or more times (matching the most amount
                        possible))
--------------------------------------------------------------------------------
    \w+                 word characters (a-z, A-Z, 0-9, _) (1 or
                        more times (matching the most amount
                        possible))
--------------------------------------------------------------------------------
    (?:                 group, but do not capture:
--------------------------------------------------------------------------------
      ;                 ';'
--------------------------------------------------------------------------------
      (?!               look ahead to see if there is not:
--------------------------------------------------------------------------------
        \s*             whitespace (\n, \r, \t, \f, and " ")
                        (0 or more times (matching the most
                        amount possible))
--------------------------------------------------------------------------------
        $               before an optional \n, and the end
                        of the string
--------------------------------------------------------------------------------
      )                 end of look-ahead
--------------------------------------------------------------------------------
      |                 OR
--------------------------------------------------------------------------------
      \s*               whitespace (\n, \r, \t, \f, and " ")
                        (0 or more times (matching the most
                        amount possible))
--------------------------------------------------------------------------------
      $                 before an optional \n, and the end of
                        the string
--------------------------------------------------------------------------------
    )                   end of grouping
--------------------------------------------------------------------------------
  )+                    end of grouping
--------------------------------------------------------------------------------
  \s*                   whitespace (\n, \r, \t, \f, and " ") (0 or
                        more times (matching the most amount
                        possible))
--------------------------------------------------------------------------------
  $                     before an optional \n, and the end of the
                        string
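To compare the two patterns above against the three strings from the question, a small JavaScript sketch along these lines can be used. The constant and function names are invented for the example.

```javascript
const REPEATED_PAIR = /^\w+\s*=\s*\w+(?:;\s*\w+\s*=\s*\w+)*$/;
const LOOKAHEAD_VARIANT = /^(?:\s*\w+\s*=\s*\w+(?:;(?!\s*$)|\s*$))+\s*$/;

const samples = [
  "key1=val1;key2=val2;key3=val3",     // valid
  "key1=val1;key2=val2; key3=val3;",   // invalid: trailing ";"
  "key1=val1;;;key2=val2;;;key3=val3", // invalid: repeated ";"
];

// Hypothetical helper: returns one row per input with each pattern's verdict.
function checkSamples(inputs) {
  return inputs.map((input) => ({
    input,
    repeatedPair: REPEATED_PAIR.test(input),
    lookaheadVariant: LOOKAHEAD_VARIANT.test(input),
  }));
}

console.table(checkSamples(samples));
```

Both patterns accept the first string and reject the other two. They are not identical in general, though: the second one also tolerates leading and trailing whitespace around the whole string.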
Try the following:
.*\w;\w.*\w;\w.*[^;]$
Explanation:
.* --> matches any character (except line breaks), 0 or more times
\w --> matches any word character
[^;]$ --> excludes any line ending with ;
I find things like this much easier without regex. For example, with JavaScript:
function isValid(string) {
  // Valid when every ";"-separated pair splits into exactly two parts on "=".
  return string.split(/;/).map(e => e.split(/=/)).every(e => e.length === 2);
}
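One thing to keep in mind with the split approach is that a pair such as =val1 still splits into two parts, so an empty key or value passes the check. A slightly stricter sketch (isValidStrict is a made-up name, and trimming the parts is an assumption about what should count as empty) could also require both parts to be non-empty:

```javascript
// Hypothetical stricter variant: every ";"-separated pair must split into
// exactly two non-empty parts on "=".
function isValidStrict(input) {
  return input.split(";").every((pair) => {
    const parts = pair.split("=");
    return parts.length === 2 && parts.every((part) => part.trim() !== "");
  });
}

console.log(isValidStrict("key1=val1;key2=val2;key3=val3"));     // true
console.log(isValidStrict("key1=val1;key2=val2; key3=val3;"));   // false
console.log(isValidStrict("key1=val1;;;key2=val2;;;key3=val3")); // false
console.log(isValidStrict("=val1"));                             // false
```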
I have the following JSON:
{"field1": "someText",
"field2": "Text Again",
"field3": "Text Again"}
I need to match the first occurrence of any phrase starting with a capital letter (such as "Text Again", for example).
I have written the following:
("[A-Za-z]+\s[A-Za-z]+")
It does work fine when testing with https://regex101.com/, for instance. However, it does not seem to correctly function as part of the usage of ReplaceTextWithMapping (Apache NiFi). Is the regex incorrect?
Thank you for your help
Description
:\s*"\s*(?=[A-Z])(?![^"]*?\s[a-z])([A-Za-z\s]+)"
This regular expression does the following:
- finds the first title-case string on the value side of what appears to be a JSON-encoded string
- ensures each word is capitalized
- returns the value inside the quotes as capture group 1
Example
Live Demo
https://regex101.com/r/eO0xW6/1
Source String
{"field1": "someText",
"field2": "Text again",
"field3": "Text Again"}
First Match
Text Again
Explanation
Summary
:\s*"              ensures that we're only checking the value side of the JSON
\s*                matches any spaces after the opening quote, if they exist
(?=[A-Z])          ensures the first character in the string is uppercase
(?![^"]*?\s[a-z])  looks for any space that is followed by a lowercase character; if found, this isn't a match
([A-Za-z\s]+)      captures all the characters inside the quotes
"                  matches the closing quote
Detailed
NODE                    EXPLANATION
----------------------------------------------------------------------
  :                     ':'
----------------------------------------------------------------------
  \s*                   whitespace (\n, \r, \t, \f, and " ") (0 or
                        more times (matching the most amount
                        possible))
----------------------------------------------------------------------
  "                     '"'
----------------------------------------------------------------------
  \s*                   whitespace (\n, \r, \t, \f, and " ") (0 or
                        more times (matching the most amount
                        possible))
----------------------------------------------------------------------
  (?=                   look ahead to see if there is:
----------------------------------------------------------------------
    [A-Z]               any character of: 'A' to 'Z'
----------------------------------------------------------------------
  )                     end of look-ahead
----------------------------------------------------------------------
  (?!                   look ahead to see if there is not:
----------------------------------------------------------------------
    [^"]*?              any character except: '"' (0 or more
                        times (matching the least amount
                        possible))
----------------------------------------------------------------------
    \s                  whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    [a-z]               any character of: 'a' to 'z'
----------------------------------------------------------------------
  )                     end of look-ahead
----------------------------------------------------------------------
  (                     group and capture to \1:
----------------------------------------------------------------------
    [A-Za-z\s]+         any character of: 'A' to 'Z', 'a' to
                        'z', whitespace (\n, \r, \t, \f, and "
                        ") (1 or more times (matching the most
                        amount possible))
----------------------------------------------------------------------
  )                     end of \1
----------------------------------------------------------------------
  "                     '"'
----------------------------------------------------------------------
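The expression can be checked against the source string from the example with a short JavaScript sketch. TITLE_CASE_VALUE and firstTitleCaseValue are names made up here, and this only exercises the regex itself; it says nothing about how ReplaceTextWithMapping in NiFi evaluates it.

```javascript
const TITLE_CASE_VALUE = /:\s*"\s*(?=[A-Z])(?![^"]*?\s[a-z])([A-Za-z\s]+)"/;

// Hypothetical helper: returns capture group 1 of the first match, or null.
function firstTitleCaseValue(jsonText) {
  const match = TITLE_CASE_VALUE.exec(jsonText);
  return match ? match[1] : null;
}

const source =
  '{"field1": "someText",\n"field2": "Text again",\n"field3": "Text Again"}';

console.log(firstTitleCaseValue(source)); // → Text Again
```

field1 is skipped because its value starts with a lowercase letter, and field2 is skipped because the negative look-ahead finds a space followed by a lowercase letter.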
I have posted my findings on the issue to the Apache NiFi mailing list:
http://apache-nifi-developer-list.39713.n7.nabble.com/Issues-with-Regex-used-with-ReplaceTextWithMapping-where-am-I-going-wrong-tc10592.html
I have not received any confirmation from the community, but it seems to me that, although the regex [A-Z][A-Za-z]*\s[A-Z][A-Za-z]* is correct in this case, the processor (ReplaceTextWithMapping) does not deal well with whitespace (\s), and the string contains a space between the two words.