I would like to check whether a given text begins with some currency symbols, like $€£¥. How can I achieve that using a regex?
It depends on your language, but something like ^[\$€£¥].* should work.
[] is a character group matching one of the characters inside.
You might have to write \$ because the $ sign sometimes has a special meaning in regexes.
.* matches "everything else" (except a newline).
Edit: After re-reading your question: If you really want to match some currency symbols (maybe more than one), try ^[\$€£¥]+.*
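In Python, for instance, the check could be sketched like this. The helper name starts_with_currency is made up for illustration, and the symbol set is just the one from the question:

```python
import re

# Character class with the four symbols from the question. Inside [...]
# the "$" is literal in Python, so no backslash is needed here.
CURRENCY_START = re.compile(r"[$€£¥]")

def starts_with_currency(text):
    """Return True if text begins with one of the listed symbols."""
    # re.match only matches at the start of the string, so "^" is implied.
    return CURRENCY_START.match(text) is not None

print(starts_with_currency("$100"))   # True
print(starts_with_currency("100 $"))  # False
```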
Which regex flavor? If it's one that supports Unicode properties, you can use this:
^\p{Sc}
(I didn't add quotes or regex delimiters because I don't know which flavor you're using.)
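Python's built-in re module does not understand \p{Sc}, so there the same idea has to be expressed differently. One possible sketch, assuming Python and a hypothetical helper name, checks the Unicode general category of the first character with the standard unicodedata module:

```python
import unicodedata

def starts_with_currency_symbol(text):
    """Return True if the first character is in Unicode category Sc."""
    # "Sc" is the general category "Symbol, currency", the same set
    # that \p{Sc} refers to in flavors with Unicode property support.
    return bool(text) and unicodedata.category(text[0]) == "Sc"

print(starts_with_currency_symbol("₩1000"))  # True
print(starts_with_currency_symbol("abc"))    # False
```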
I have this text like below:
아니다
bukan
싫다
tidak suka
훌륭하다
bagus
I am trying to remove each English (Latin alphabet) line and attach it to the end of the Korean line above it, like this:
아니다bukan
싫다tidak suka
훌륭하다bagus
I have finally found a regular expression that almost works:
[가-힣]\R
However, it turns the text file into this:
아니bukan
싫tidak suka
훌륭하bagus
The problem is that it also removes the last Korean character of each line.
How can I solve this problem?
C++ std::regex does not support Unicode property classes like \p{Hangul}, but you may use the equivalent character class, [\u1100-\u11FF\u302E\u302F\u3131-\u318E\u3200-\u321E\u3260-\u327E\uA960-\uA97C\uAC00-\uD7A3\uD7B0-\uD7C6\uD7CB-\uD7FB\uFFA0-\uFFBE\uFFC2-\uFFC7\uFFCA-\uFFCF\uFFD2-\uFFD7\uFFDA-\uFFDC], see this reference.
Besides, \R is not supported either. You can probably just use \r?\n to match Windows/Linux style line endings, or (?:\r\n?|\n) to also support classic Mac OS (CR-only) line endings.
Next, if you match and consume a Korean char, when replacing, you need to capture it into a capturing group and use a backreference to the group in the replacement pattern.
So, you may use
([\u1100-\u11FF\u302E\u302F\u3131-\u318E\u3200-\u321E\u3260-\u327E\uA960-\uA97C\uAC00-\uD7A3\uD7B0-\uD7C6\uD7CB-\uD7FB\uFFA0-\uFFBE\uFFC2-\uFFC7\uFFCA-\uFFCF\uFFD2-\uFFD7\uFFDA-\uFFDC])(?:\r\n?|\n)
Replace with $1 to put back the Korean char into the resulting string.
See the regex demo online.
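The capture-and-backreference idea is not tied to one flavor. As a rough sketch in Python (an assumption, since the answer above targets C++), using only the precomposed syllable range that the question's [가-힣] class covers rather than the full list of Hangul blocks:

```python
import re

# Assumption: only precomposed Hangul syllables (U+AC00 to U+D7A3) occur,
# which is what the [가-힣] class from the question covers.
HANGUL_THEN_NEWLINE = re.compile(r"([\uAC00-\uD7A3])(?:\r\n?|\n)")

text = "아니다\nbukan\n싫다\ntidak suka\n훌륭하다\nbagus"

# \1 puts the captured Korean character back, so only the line break
# that follows it is dropped.
joined = HANGUL_THEN_NEWLINE.sub(r"\1", text)
print(joined)
```

Line breaks after the Latin-alphabet lines are left alone because they are not preceded by a character from the class.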
The regex for the set of all Korean characters in Unicode is this:
\p{Hangul}
There is more information here: https://www.regular-expressions.info/unicode.html
Maybe you also need a + after your group of characters?
Use the regular expression (\p{Hangul}+)\R instead of what you're using now, and replace it with $1 so that the Korean text is kept.
I am trying to highlight (or find) any word that is directly preceded by a specific word, define, and then another specific word (as) when it follows that word, and so on. Basically, I need to find words whose match depends on other regex matches, while still targeting each word independently.
For example, having the following string:
define MyFile as File
In that case, define is searched using the regex statement \b-?define\b. I also need to find MyFile if it is preceded directly by define. Plus, as needs to be found as well only if it is preceded directly by a word, in this case MyFile, which is preceded by define, and this goes on and on.
How can this be done? I have messed around quite a bit to find how to highlight MyFile correctly, without any success. As for the specific recursive search of as and File, I am clueless.
Keep in mind that all the regex expressions must be separate, since I will use this as a Sublime Text custom syntax highlight match finder.
define\s([\w]+)\sas\s([\w]+)$
This regex captures the word that follows define (separated by one whitespace character) in group 1 and the word that follows as in group 2.
Check this regex: https://regex101.com/r/aQ0yO0/2
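To see what the two groups contain, here is a small sketch in Python (an assumption; the question itself is about Sublime Text, and the variable names are illustrative):

```python
import re

pattern = re.compile(r"define\s(\w+)\sas\s(\w+)$")

m = pattern.search("define MyFile as File")
# Group 1 is the word after "define", group 2 the word after "as".
name, type_name = m.groups()
print(name, type_name)  # MyFile File
```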
Without knowing what the data looks like, this is a naive way of doing it, but it's pretty intuitive. It doesn't use regex, though; the other answers are good ways to do it with regex.
seq = "word1 defined as blah blahh blahhh word2 defined as hello helloo"
words_of_interest = []
list_of_words = seq.split(" ")
for i, word in enumerate(list_of_words):
    # Collect the word that comes right before each "defined".
    if word == "defined" and i > 0:
        words_of_interest.append(list_of_words[i - 1])
print(words_of_interest)
# ['word1', 'word2']
The regular expression is always going to encompass the "define" as well. The trick is to use capture groups and refer to them afterwards. The specific way how to do this depends on the "flavor" of your regex.
As I'm not familiar with Sublime's regex, I'm just going to present an example in sed:
$ sed -e 's/define \([A-Za-z]*\)/include \1/g' <<< "define MyFile as File"
include MyFile as File
This example replaces all "define"s with "include"s - and adds whatever was captured by what's inside the group (the regex [A-Za-z]* in this case). Not too useful, but hopefully explanatory :)
The capture group is denoted by the escaped parentheses, and (in sed) referenced by the escaped number (representing the index) of the group.
I believe it's capture groups as a concept that you're looking for, rather than any specific regex.
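For comparison, the same substitution can be sketched in Python, where groups use unescaped parentheses and the replacement refers to them as \1:

```python
import re

line = "define MyFile as File"

# ([A-Za-z]*) captures the word after "define"; \1 re-inserts it.
result = re.sub(r"define ([A-Za-z]*)", r"include \1", line)
print(result)  # include MyFile as File
```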
I need a regular expression to find a specific line in a file that occurs somewhere after another line. For example, I may want to find the string "friend", but only when it occurs on a line after a line containing the string "hello". So for example:
hello there
how are you
my friend
should pass, but
how are you
my friend
hello
or
hello friend
how are you
should not pass.
The only thing I've thought of is something like hello[.\s]*\n[.\s]*friend, which does not work.
EDIT: I'm using a customized program that has a lot of limitations. I don't have access to switches or custom modes. I need a single regular expression that works in the standard Python regex mode.
hello[.\s]*\n[.\s]*friend
First, note that a dot inside a character class matches a literal dot, not "any character", so you really want alternation rather than a character class for this. But also note that a "match all" dot will also match spaces, so you don't even need alternation.
So overall, you really just need this:
hello.*?friend
Now comes the problem of matching across newline characters. By default the "match all" dot does not match newlines. You can set a flag/modifier to make it match them, but how you do that depends on what language you are using. In PHP or Perl, you can use the s modifier, e.g.
php:
preg_match('~hello.*?friend~s',$content);
edit:
If you are trying to use regex in something like an editor (or otherwise can't add flags/modifiers), most editors have an option to flag it as such. If not, you can try alternation with newline chars like so:
hello(.|\r?\n)*friend
You need to require a newline character between the two words.
hello(?:.*\n)+.*friend
This expects at least one newline character in between.
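A quick way to check this pattern against the three samples from the question, sketched with Python's re module and no flags (the names LATER_LINE and results are illustrative only):

```python
import re

# "hello", then one or more complete lines, then "friend" on a later line.
LATER_LINE = re.compile(r"hello(?:.*\n)+.*friend")

samples = [
    "hello there\nhow are you\nmy friend",  # friend after hello: should pass
    "how are you\nmy friend\nhello",        # friend before hello: should fail
    "hello friend\nhow are you",            # same line only: should fail
]

results = [LATER_LINE.search(s) is not None for s in samples]
print(results)  # [True, False, False]
```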
I'm by no means a regex expert (particularly not in Python), but my RegexBuddy app thinks this will work:
(?s)hello.*\n+.*friend
The (?s) is an inline way of specifying the "dot matches newline" option, which is what lets .* span the lines between "hello" and "friend".
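The effect of the inline flag can be seen in a short Python sketch; the variable names are made up for illustration:

```python
import re

text = "hello there\nhow are you\nmy friend"

# Without the flag, "." stops at each newline, so the pattern can only
# reach a "friend" on the line directly after "hello".
without_flag = re.search(r"hello.*\n+.*friend", text)

# With (?s) at the start of the pattern, ".*" can run across the
# intermediate line as well.
with_flag = re.search(r"(?s)hello.*\n+.*friend", text)

print(without_flag is None, with_flag is not None)  # True True
```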
I know this is an elementary RegEx possibility, but I can't seem to determine the right expression to use.
What I am looking to do is find and replace "foo" and only "foo" in a set of different situations such as abc_foo, abc_foo[something], abc-foo-something, and all other combinations, except when it becomes part of another word like "foobar". The basic 'whole word' search function was close but doesn't help when variables and underscores are factored in.
It's actually not that elementary to match a string that has no word characters around it:
If your language supports negative lookbehind, which is not always the case, it would be simple:
(?<!\w)foo(?!\w)
However, there is a workaround: match the string together with the surrounding non-word characters (including _, which is a word character but which you want to treat as a non-word character) and use capturing groups to sort it all out:
(^|[\W_])foo([\W_]|$)
E.g. in JavaScript syntax:
str.replace(/(^|[\W_])foo([\W_]|$)/g, "$1replacement$2");
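The same replacement can be sketched in Python, shown here next to a lookaround variant that is not from the answer above: it uses [^\W_] ("word character other than underscore") so that _ counts as a boundary. The replacement string X and the variable names are placeholders.

```python
import re

text = "abc_foo, abc_foo[something], abc-foo-something, foobar"

# Capture-group approach: the neighbouring characters are matched too,
# then put back through \1 and \2.
replaced = re.sub(r"(^|[\W_])foo([\W_]|$)", r"\1X\2", text)
print(replaced)  # abc_X, abc_X[something], abc-X-something, foobar

# Lookaround variant: nothing around "foo" is consumed.
replaced_la = re.sub(r"(?<![^\W_])foo(?![^\W_])", "X", text)
print(replaced_la)  # abc_X, abc_X[something], abc-X-something, foobar
```

One difference worth knowing: because the capture-group version consumes the separator, two occurrences that share a single separator (as in "foo foo") are not both replaced, while the lookaround version handles them.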
You can use a negative lookahead assertion to do this. Using regex search, foo(?!bar) will match any instance of foo not followed by bar, and the following text is not part of the match, only foo is.
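A small Python sketch of that lookahead (X is just a placeholder replacement):

```python
import re

text = "foo foobar foobaz"

# "foo" is replaced unless it is immediately followed by "bar".
result = re.sub(r"foo(?!bar)", "X", text)
print(result)  # X foobar Xbaz
```

Note that foobaz is still changed: the lookahead excludes only the specific suffix "bar", not every longer word.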
I want to find all strings containing at least 1 Cyrillic character (basically /.*[А-я].*/) but with exception of comments.
Comment is a string or part of a string which starts with 2 or more / characters.
Currently I have this regex, which does part of the trick:
^(?=^.*?[А-я]+).*?((?=[\/]{2,})|(^(?:(?![\/]{2,}).)*$))
But I'd like to get less bloated and faster expression.
And as an additional question: could anyone explain why this one works? I put it together by trial and error, but I'm not sure I completely understand how it works, because when I try to change any part of it, it stops working.
The following regex will match any Cyrillic character that is not preceded by a double forward slash:
(?<!/{2}.*)[А-я]
It specifies that it should not be preceded by a double slash by using a negative lookbehind.
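You haven't specified what flavour of regex you're using, but be aware that some flavours restrict lookarounds. For example, JavaScript historically had no lookbehind at all, and PCRE does not accept an unbounded lookbehind such as the one above. You are using three lookaheads in your regex, so I presume lookarounds in general are OK.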
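Where an unbounded lookbehind is not available (Python's re module, for one, only accepts fixed-width lookbehind), a different technique can give the same result: step through the line one character at a time, refusing to step onto "//", until a Cyrillic letter is reached. This is a sketch of that alternative, not the lookbehind regex above, and the helper name is hypothetical:

```python
import re

# (?:(?!//).)* consumes characters only while no "//" starts at the
# current position; [А-я] must then be found before any comment begins.
CYRILLIC_BEFORE_COMMENT = re.compile(r"(?:(?!//).)*[А-я]")

def has_cyrillic_outside_comment(line):
    """True if line has a Cyrillic letter before any '//' comment."""
    # re.match anchors the attempt at the start of the line.
    return CYRILLIC_BEFORE_COMMENT.match(line) is not None

print(has_cyrillic_outside_comment("привет // comment"))  # True
print(has_cyrillic_outside_comment("int x; // привет"))   # False
```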