Regex for matching predefined rules for italic text formatting

I'm trying to write a regex for matching user input that will be turned into italic format using markdown.
In the string I need to find the following pattern: an asterisk followed by any non-whitespace character, and ending with any non-whitespace character followed by an asterisk.
So basically: substring *substring substring substring* substring should spit out *substring substring substring*.
So far I have only come up with /\*(?:(?!\*).)+\*/, which matches everything between two asterisks, but it doesn't take into account whether the substring between the asterisks starts or ends with whitespace, which it shouldn't.
Thank you for your input! :)

Use
\*(?![*\s])(?:[^*]*[^*\s])?\*
See regex proof.
EXPLANATION
\*            '*'
(?!           look ahead to see if there is not:
  [*\s]         any character of: '*', whitespace (\n, \r, \t, \f, and " ")
)             end of look-ahead
(?:           group, but do not capture (optional (matching the most amount possible)):
  [^*]*         any character except: '*' (0 or more times (matching the most amount possible))
  [^*\s]        any character except: '*', whitespace (\n, \r, \t, \f, and " ")
)?            end of grouping
\*            '*'
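
To try the pattern outside a regex tester, here is a minimal JavaScript sketch. The helper name findItalicSpans and the second test string are made up for illustration; only the pattern itself comes from the answer.

```javascript
// Pattern from the answer above, with the global flag so every span is found.
const ITALIC_RE = /\*(?![*\s])(?:[^*]*[^*\s])?\*/g;

// Hypothetical helper: returns every italic candidate in `text`, or [] if none.
function findItalicSpans(text) {
  return text.match(ITALIC_RE) || [];
}

// The example from the question: one span is found.
console.log(findItalicSpans("substring *substring substring substring* substring"));

// Spans that start or end with whitespace are not matched.
console.log(findItalicSpans("* padded start* and *padded end *"));
```

Because the middle part is [^*]*, a span can never contain another asterisk, so nested or bold (**...**) markup would need a different pattern.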

Related

Atom regexp: discarding multiline text around blocks

Assume I have this text:
blah blah Bob Loblaw Law blah
keep1 { i want this } blop
blah blob keep2 { and
this too } blaw blat
etc...
And I want to end up with
keep1 { i want this }
keep2 { and
this too }
or perhaps:
keep1 { i want this }
keep2 { and this too }
I haven't figured out how to get Atom's regexp find/replace mechanism to discard everything across multiple lines outside of a specific matching string. Hints?
update:
Of the many things I've tried, this gets me closest:
[\S\s]+?(keep\d\s+\{[\S\s]+?\})
which results in:
keep1 { i want this }
keep2 { and
this too }
blaw blat
etc...
This is probably good enough -- I can edit the trailing shards -- but it would be useful to know how to trim those as well.

You may use this simple regex replace in Atom for this task:
\b(keep\d+\s*{[^}]*})|.+?
Replace it with: $1
RegEx Demo
RegEx Details:
\b: Word boundary
(keep\d+\s*{[^}]*}): In capture group #1 match a string that starts with keep followed by 1+ digits followed by 0+ whitespaces followed by any text that is inside {...} spanning across the lines as well. This assumes { and } are balanced and there is no escaping of { and }.
|: OR
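.+?: Lazily match 1+ of any character except a line break. Since nothing follows it, each match consumes a single character, so everything outside a keep block is removed one character at a time (replaced with the empty $1).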
PS: If you want to remove leading line break then use:
\n?\b(keep\d+\s*{[^}]*})|.+?
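
Atom's find/replace uses JavaScript regular expressions, so the same replacement can be sketched in plain JavaScript. The names keepBlocks and atomSample are made up for this example, and the braces are escaped here, which should behave the same as the unescaped form above.

```javascript
// Both patterns from the answer, with escaped braces and the global flag.
const KEEP_OR_ANY = /\b(keep\d+\s*\{[^}]*\})|.+?/g;
const KEEP_OR_ANY_TRIM = /\n?\b(keep\d+\s*\{[^}]*\})|.+?/g;

// Sample text from the question.
const atomSample = [
  "blah blah Bob Loblaw Law blah",
  "keep1 { i want this } blop",
  "blah blob keep2 { and",
  "this too } blaw blat",
  "etc...",
].join("\n");

// Hypothetical helper: replaces every match with capture group 1.
// When the second alternative (.+?) matched, $1 is empty, so the text is dropped.
function keepBlocks(text, pattern = KEEP_OR_ANY) {
  return text.replace(pattern, "$1");
}

console.log(JSON.stringify(keepBlocks(atomSample)));
console.log(JSON.stringify(keepBlocks(atomSample, KEEP_OR_ANY_TRIM)));
```

Line breaks that do not sit directly in front of a keep block are left in place by both variants, because . does not match a newline.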

Use
[\s\S]*?(keep\d\s+\{[^{}]*\})|(?:(?!keep\d\s+\{[^{}]*\})[\s\S])+$
See proof.
EXPLANATION
[\s\S]*?      any character of: whitespace (\n, \r, \t, \f, and " "), non-whitespace (all but \n, \r, \t, \f, and " ") (0 or more times (matching the least amount possible))
(             group and capture to \1:
  keep          'keep'
  \d            digits (0-9)
  \s+           whitespace (\n, \r, \t, \f, and " ") (1 or more times (matching the most amount possible))
  \{            '{'
  [^{}]*        any character except: '{', '}' (0 or more times (matching the most amount possible))
  \}            '}'
)             end of \1
|             OR
(?:           group, but do not capture (1 or more times (matching the most amount possible)):
  (?!           look ahead to see if there is not:
    keep          'keep'
    \d            digits (0-9)
    \s+           whitespace (\n, \r, \t, \f, and " ") (1 or more times (matching the most amount possible))
    \{            '{'
    [^{}]*        any character except: '{', '}' (0 or more times (matching the most amount possible))
    \}            '}'
  )             end of look-ahead
  [\s\S]        any character of: whitespace (\n, \r, \t, \f, and " "), non-whitespace (all but \n, \r, \t, \f, and " ")
)+            end of grouping
$             before an optional \n, and the end of the string
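
A rough JavaScript sketch of this pattern in use. The answer does not say what to replace with, so the replacement "$1\n" (the captured block plus a line break) is an assumption here, as are the names extractKeepBlocks and blockSample.

```javascript
// Pattern from the answer: either "anything, then a keep block" (block captured
// in group 1), or a trailing run of text that contains no further keep block.
const KEEP_BLOCK_RE =
  /[\s\S]*?(keep\d\s+\{[^{}]*\})|(?:(?!keep\d\s+\{[^{}]*\})[\s\S])+$/g;

// Sample text from the question.
const blockSample = [
  "blah blah Bob Loblaw Law blah",
  "keep1 { i want this } blop",
  "blah blob keep2 { and",
  "this too } blaw blat",
  "etc...",
].join("\n");

// Hypothetical helper. ASSUMPTION: each match is replaced with "$1\n";
// for the trailing alternative $1 is empty, so only a line break remains.
function extractKeepBlocks(text) {
  return text.replace(KEEP_BLOCK_RE, "$1\n");
}

console.log(JSON.stringify(extractKeepBlocks(blockSample)));
```

The second alternative is what trims the trailing shards ("blaw blat", "etc...") that the pattern in the question left behind.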

Regex match specific word with dollar sign?

I have sentences like:
$COIN has a new price target increase to $400
I only want to match $COIN with regex, I am wondering how to do this?
If I do something like .*\\$.* it also matches $400. I would just like to match the $SOMEWORDNOSPACE only. Is that possible?
Thanks

If everything after $ until the end of the word is a capital letter: \$[A-Z]+
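This matches the $ (\$) and then one or more capital letters ([A-Z]+). The match stops at the first character outside the A-Z range, so a trailing \b is not needed to end it (add one only if something like $COINbase should be rejected rather than matched as $COIN). If the match must not start in the middle of a word, prefix the pattern with \B: because $ is not a word character, \B in front of it succeeds only at the start of the string or after another non-word character, so the $COIN in US$COIN is skipped. In that case the regex would be \B\$[A-Z]+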
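
A quick JavaScript check of both variants. The constant names and the "US$COIN" test string are invented for illustration.

```javascript
// Plain version and the \B-prefixed version from the answer above.
const TICKER_RE = /\$[A-Z]+/g;
const TICKER_NOT_MIDWORD_RE = /\B\$[A-Z]+/g;

const tickerSentence = "$COIN has a new price target increase to $400";

// $400 is skipped by both because a digit is not in [A-Z].
console.log(tickerSentence.match(TICKER_RE));
console.log(tickerSentence.match(TICKER_NOT_MIDWORD_RE));

// Only the \B version refuses a $ that directly follows a word character.
console.log("US$COIN".match(TICKER_RE));
console.log("US$COIN".match(TICKER_NOT_MIDWORD_RE));
```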

Use
(?<!\S)\$[A-Z]+(?!\S)
See proof
EXPLANATION
(?<!          look behind to see if there is not:
  \S            non-whitespace (all but \n, \r, \t, \f, and " ")
)             end of look-behind
\$            '$'
[A-Z]+        any character of: 'A' to 'Z' (1 or more times (matching the most amount possible))
(?!           look ahead to see if there is not:
  \S            non-whitespace (all but \n, \r, \t, \f, and " ")
)             end of look-ahead
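
A small JavaScript sketch of the lookaround version, assuming an engine with lookbehind support (for example a recent Node.js). The helper name findTickers and the extra test strings are made up.

```javascript
// Pattern from the answer: the ticker must be delimited by whitespace
// (or the start/end of the string) on both sides.
const STANDALONE_TICKER_RE = /(?<!\S)\$[A-Z]+(?!\S)/g;

// Hypothetical helper: returns all standalone tickers, or [] if none.
function findTickers(text) {
  return text.match(STANDALONE_TICKER_RE) || [];
}

console.log(findTickers("$COIN has a new price target increase to $400"));
console.log(findTickers("US$COIN and $COINbase"));
console.log(findTickers("buy $COIN, now"));
```

Note that this version is strict: a ticker followed directly by punctuation, as in the last call, is not matched, because the comma is a non-whitespace character.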

find any last character in line except empty lines

I need to find the last character of each line and then add extra text after it (for example "word"), but without applying this rule to empty lines (lines with no characters).
My expression, ([^\n])$, is applied to empty lines, too.

Use
(\S[^\S\n]*)$
Replace with $1 word. See proof.
Explanation
(             group and capture to \1:
  \S            non-whitespace (all but \n, \r, \t, \f, and " ")
  [^\S\n]*      any character except: non-whitespace (all but \n, \r, \t, \f, and " "), '\n' (newline) (0 or more times (matching the most amount possible))
)             end of \1
$             before an optional \n, and the end of the string
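
A minimal JavaScript sketch of the replacement, using the pattern with the trailing $ that the explanation describes. It assumes the multiline flag so that $ matches at the end of every line, and the helper name appendToNonEmptyLines and its default suffix are hypothetical.

```javascript
// Last non-whitespace character of a line plus any trailing spaces/tabs.
// ASSUMPTION: the m flag is needed so $ matches at each line end.
const LAST_CHAR_RE = /(\S[^\S\n]*)$/gm;

// Hypothetical helper: appends `suffix` to every line that has at least one
// non-whitespace character. Empty and whitespace-only lines are left alone.
function appendToNonEmptyLines(text, suffix = " word") {
  // A replacer function is used so a "$" inside `suffix` is taken literally.
  return text.replace(LAST_CHAR_RE, (match, tail) => tail + suffix);
}

console.log(JSON.stringify(appendToNonEmptyLines("abc\n\ndef")));
```

If a line ends in spaces or tabs, they are part of the captured group, so the suffix lands after them rather than directly after the last visible character.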

Replace all occurrences except for the 3rd

I am scanning a QR code and need a script to replace the commas with a tab (\t).
My results are:
820-20171-002, ,Nov 24, 2020,,,13,283.40,,Mike Shmow
My problem is that I don't want the comma inside the date (the one after "Nov 24") to be replaced. Right now I have the following, which does work to replace commas with a tab.
decodeResults[0].content.replace(/,/g, "\t");
I am trying to replace the /,/g with an expression to replace all commas except for the 3rd occurrence.

Use
.replace(/(?<!\b[a-zA-Z]{3}\s+\d{1,2}(?=,\s*\d{4})),/g, '\t')
See proof
Explanation
(?<!          Negative lookbehind start, fail if pattern matches
  \b            the boundary between a word char (\w) and something that is not a word char
  [a-zA-Z]{3}   any character of: 'a' to 'z', 'A' to 'Z' (3 times)
  \s+           whitespace (\n, \r, \t, \f, and " ") (1 or more times (matching the most amount possible))
  \d{1,2}       digits (0-9) (between 1 and 2 times (matching the most amount possible))
  (?=           look ahead to see if there is:
    ,             ','
    \s*           whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
    \d{4}         digits (0-9) (4 times)
  )             end of look-ahead
)             end of negative lookbehind
,             ','
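
Applied to the sample from the question, the replacement can be sketched like this. The names commasToTabs and scanned are hypothetical, and the string stands in for decodeResults[0].content. Lookbehind needs a reasonably recent JavaScript engine.

```javascript
// Pattern from the answer: a comma that is NOT preceded by "Mon DD" where
// that comma is followed by a four-digit year (e.g. "Nov 24, 2020").
const COMMA_NOT_IN_DATE = /(?<!\b[a-zA-Z]{3}\s+\d{1,2}(?=,\s*\d{4})),/g;

// Hypothetical helper wrapping the replace call.
function commasToTabs(content) {
  return content.replace(COMMA_NOT_IN_DATE, "\t");
}

const scanned = "820-20171-002, ,Nov 24, 2020,,,13,283.40,,Mike Shmow";

// JSON.stringify makes the tabs visible as \t in the console.
console.log(JSON.stringify(commasToTabs(scanned)));
```

This keys on the shape of the date rather than on the comma being the 3rd one, so it keeps working if the number of fields before the date changes, but it assumes the date always has the form of a three-letter month, a day, a comma, and a four-digit year.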

Need help stopping at parenthesis

I'm new with regex and really need some help here. I'm trying to create a regex that finds everything before the first space and open parenthesis in the example below. Basically, I'm trying to just keep the country name or names and exclude everything after.
Falkland Islands (Malvinas)
I tried this but it isn't working:
(\w+)(?=[\s+(\w+\s+])

Use
^.*?(?=\s*\()
See proof here.
Explanation
^             the beginning of the string
.*?           any character except \n (0 or more times (matching the least amount possible))
(?=           look ahead to see if there is:
  \s*           whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
  \(            '('
)             end of look-ahead
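
A short JavaScript sketch of how this could be used. The helper name countryName is made up, and returning the input unchanged when there is no parenthesis is an assumption, not something the question specifies.

```javascript
// Pattern from the answer: everything from the start of the string up to
// (but not including) optional whitespace followed by "(".
const BEFORE_PAREN_RE = /^.*?(?=\s*\()/;

// Hypothetical helper. ASSUMPTION: labels without "(" are returned as-is.
function countryName(label) {
  const match = label.match(BEFORE_PAREN_RE);
  return match ? match[0] : label;
}

console.log(countryName("Falkland Islands (Malvinas)"));
console.log(countryName("France"));
```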
