How do I only find newlines in a Regex? - regex

I have a regex which finds newlines encoded in strings as \n (not the actual newline character), but it also finds encoded escaped newlines (\\n), like before the word "anything" in the string below.
var rg = '/(\\n)/g'
var str = 'so you can do pretty much \\nanything you want with it. \n\nAt runtime Carota has'
How can I find all of the newlines and none of the escaped newlines?
Here is a link with an example. https://regexr.com/4fna7

You probably want a negative lookbehind. It asserts that the text immediately before the match is not a given pattern, without including that text in the match.
Your rewritten regex would look like:
(?<!\\)(\\n)
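As a quick check of the lookbehind, here is a minimal sketch in Python rather than JavaScript (Python's re module accepts the same (?<!...) syntax). It assumes the sample text contains literal backslash characters, as in the question; the names sample and ENCODED_NEWLINE are made up for the illustration.

```python
import re

# Raw strings keep every backslash literal, so "\n" below is two characters
# (backslash + n) and "\\n" is three (backslash + backslash + n).
sample = r'so you can do pretty much \\nanything you want with it. \n\nAt runtime Carota has'

# Match backslash + n only when it is NOT immediately preceded by a backslash.
ENCODED_NEWLINE = re.compile(r'(?<!\\)\\n')

matches = [m.start() for m in ENCODED_NEWLINE.finditer(sample)]
print(len(matches))  # 2: the two encoded newlines before "At", not the escaped one
```

In JavaScript itself, lookbehind only works in engines that implement the ES2018 regex additions.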

One way to do this is to begin with a negated character class to ensure you do not pick up the double backslash:
var rg = '/[^\\](\\n)/g'
var str = 'so you can do pretty much \\nanything you want with it. \n\nAt runtime Carota has'
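One thing to keep in mind with the negated character class is that it consumes the character in front of the \n, so it cannot match two encoded newlines in a row (or one at the very start of the string). The following Python sketch compares it with a zero-width lookbehind on a shortened version of the sample; the variable names are assumptions for the illustration.

```python
import re

# Raw string: every backslash below is a literal character.
sample = r'pretty much \\nanything you want with it. \n\nAt runtime'

# Negated character class: the preceding character becomes part of the match.
consuming = re.findall(r'[^\\]\\n', sample)

# Negative lookbehind: the preceding character is only inspected, not consumed.
lookbehind = re.findall(r'(?<!\\)\\n', sample)

print(len(consuming))   # 1: the second encoded newline is skipped
print(len(lookbehind))  # 2
```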

I'm assuming you're using JavaScript. In which case you can simply use the Regex literals like so:
/\n/
That would match all newline characters. If you can't use a Regex literal, JS also offers a constructor which takes a string
new RegExp('\\n')
In order to match the \\n you will need to escape the backslash:
/\\n|\n/
with constructor:
new RegExp('\\\\n|\\n')
Hope that helps.

Related

data mismatch with spark regexp_extract [duplicate]

In Java regex, how do I distinguish the metacharacter . (dot) from an ordinary dot as used in any sentence? And how should I handle other metacharacters such as *, +, \d, and so on?
If you want the dot or other characters with a special meaning in regexes to be a normal character, you have to escape it with a backslash. Since regexes in Java are normal Java strings, you need to escape the backslash itself, so you need two backslashes e.g. \\.
Solutions proposed by the other members don't work for me.
But I found this:
to escape a dot in a Java regex, write [.]
Perl-style regular expressions (which the Java regex engine is more or less based upon) treat the following characters as special characters:
.^$|*+?()[{\ have special meaning outside of character classes,
]^-\ have special meaning inside of character classes ([...]).
So you need to escape those (and only those) symbols depending on context (or, in the case of character classes, place them in positions where they can't be misinterpreted).
Needlessly escaping other characters may work, but some regex engines will treat this as syntax errors, for example \_ will cause an error in .NET.
Some others will lead to false results, for example \< is interpreted as a literal < in Perl, but in egrep it means "word boundary".
So write -?\d+\.\d+\$ to match 1.50$, -2.00$ etc. and [(){}[\]] for a character class that matches all kinds of brackets/braces/parentheses.
If you need to transform a user input string into a regex-safe form, use java.util.regex.Pattern.quote.
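The same two ideas (hand-escaping metacharacters, and escaping an arbitrary input string) can be sketched in Python, where re.escape plays a role comparable to Pattern.quote. This is an illustration under that assumption, not Java code; the names price, user_input and literal are invented for the example.

```python
import re

# Hand-escaped pattern from the text: optional minus, digits, a literal dot,
# digits, and a literal dollar sign.
price = re.compile(r'-?\d+\.\d+\$')
print(bool(price.fullmatch('1.50$')))   # True
print(bool(price.fullmatch('-2.00$')))  # True
print(bool(price.fullmatch('1x50$')))   # False: the escaped dot is literal

# Escaping arbitrary input so that every character is matched literally.
user_input = '1.50$ (approx.)'
literal = re.compile(re.escape(user_input))
print(bool(literal.search('price: 1.50$ (approx.)')))  # True
print(bool(literal.search('price: 1x50$ (approx.)')))  # False
```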
Further reading: Jan Goyvaerts's blog RegexGuru on escaping metacharacters
Escape special characters with a backslash. \., \*, \+, \\d, and so on. If you are unsure, you may escape any non-alphabetical character whether it is special or not. See the javadoc for java.util.regex.Pattern for further information.
Here is code you can copy and paste directly:
String imageName = "picture1.jpg";
String[] imageNameArray = imageName.split("\\.");
for (int i = 0; i < imageNameArray.length; i++) {
    System.out.println(imageNameArray[i]);
}
And what if there are accidentally spaces before or after the "."? It's good practice to handle those spaces as well.
String imageName = "picture1 . jpg";
String[] imageNameArray = imageName.split("\\s*\\.\\s*");
for (int i = 0; i < imageNameArray.length; i++) {
    System.out.println(imageNameArray[i]);
}
Here, \\s* consumes any spaces around the dot so that you get only the split strings you need. Note that the dot itself must still be escaped as \\.; an unescaped . would split on every character.
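For comparison, the same split can be sketched in Python. The dot has to be escaped so that it is treated as a literal dot rather than as "any character"; the variable names are assumptions for the illustration.

```python
import re

image_name = 'picture1 . jpg'

# \s* on both sides swallows optional whitespace; \. is a literal dot.
parts = re.split(r'\s*\.\s*', image_name)
print(parts)  # ['picture1', 'jpg']
```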
I wanted to match a string that ends with ".*"
For this I had to use the following:
"^.*\\.\\*$"
Kinda silly if you think about it :D
Here's what it means: at the start of the string there can be any character zero or more times, followed by a dot (.), followed by a star (*) at the end of the string.
I hope this comes in handy for someone. Thanks for the backslash thing to Fabian.
If you want to check whether your sentence ends with ".", add [.]$ or \.$ (written "\\.$" in a Java string literal) to the end of your pattern.
While doing some basic array work in jGRASP, I found that in an accessor method for a char[][] array I could use the char literal '.' to place a single dot.
I was trying to split using .folder. For this use case, the solution to use \\.folder and [.]folder didn't work.
The following code worked for me
String[] pathSplited = Pattern.compile("([.])(folder)").split(completeFilePath);

Regex split by delimiter if no char before delimiter

test;136;1234567890;Som/;e;Test;test /;123;qwertyuio;dfghjk
I need to split it to:
test, 136, 1234567890, Som/;e, Test, test /;123, qwertyuio, dfghjk
The delimiter is ";" but this character can also occur in the text, so in that case I add "/" before ";" in my code. However, I don't know how to exclude it from the regex search.
Thanks for help !
Use a negative look behind:
String[] parts = str.split("(?<!/);");
If you want to cater for a term that ends with a /, you could escape that too:
String[] parts = str.split("(?<!/);|(?<=//);");
If you want to allow terms to end with "/;", use an AST language parser.
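The basic lookbehind split can be checked against the sample line from the question. This is a Python sketch rather than Java (the lookbehind syntax is the same in both); the variable names are assumptions for the illustration.

```python
import re

line = 'test;136;1234567890;Som/;e;Test;test /;123;qwertyuio;dfghjk'

# Split on ";" only when it is not immediately preceded by "/".
parts = re.split(r'(?<!/);', line)
print(parts)
# ['test', '136', '1234567890', 'Som/;e', 'Test', 'test /;123', 'qwertyuio', 'dfghjk']
```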
I wrote regex to find everything excluding those delimiters:
[^((?<!/);|(?<=//);)]*

Replacing multiple string Replace() with a single Regex.Replace()

if I'm doing something like that:
someString.Replace("abc","").Replace("def","").Replace(@"c:\Windows","")
How can I replace that with
Regex.Replace(someString," \\here I don't know what the pattern should be")
I've tried this:
Regex.Replace(someString, @"(?:abc|def|c:\Windows)")
but it didn't work
UPD...
The problem is when I pass the path like that
Regex.Replace(someString, @"(?:abc|def|"+aPath+")")
"But it didn't work" isn't very helpful!
Try this:
someString = Regex.Replace(someString, @"(?:abc|def|ghi|c:\\Windows)", "")
It worked when I tried it. I think your code doesn't work because you forgot the replacement string, and because you have to escape the backslash in the path.
I'm assuming the thing that "didn't work" is your C:\windows replacement. You need
someString = Regex.Replace(someString, @"(?:abc|def|C:\\windows)","");
The problem is that you need to escape your backslash. An unescaped backslash has meaning in regex. In particular, in this case, \W matches any non-word character.
Edit: to escape an arbitrary string, you can use Regex.Escape(yourString);
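The idea of escaping each literal before joining them into one alternation can be sketched as follows. This is a Python illustration, with re.escape standing in for Regex.Escape; the helper remove_all and the sample values are hypothetical.

```python
import re

def remove_all(text, literals):
    """Remove every occurrence of each literal string in a single pass."""
    # Longest first, so a longer literal is not shadowed by a shorter prefix.
    ordered = sorted(literals, key=len, reverse=True)
    # re.escape neutralises regex metacharacters such as the backslash.
    pattern = '|'.join(re.escape(item) for item in ordered)
    return re.sub(pattern, '', text)

a_path = r'c:\Windows'  # arbitrary path supplied at runtime
result = remove_all(r'xabcydefz c:\Windows\system', ['abc', 'def', a_path])
print(result)  # xyz \system
```

A single pass is not always identical to chained replacements, because a chained call can also match text that only appears after an earlier removal.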

How do I need to escape search string in vim?

I need to search and replace this:
ExecIf($["${debug}" = "1"]?NoOp
with this:
GoSub(chanlog,s,1(1,[${CHANNEL}]
I can't seem to do it in vim, and I'm not sure what needs to be escaped, as nothing I've tried works.
If you want to change a long string with lots of punctuation characters, and it's an exact match (you don't want any of them to be treated as regex syntax) you can use the nomagic option, to have the search pattern interpreted as a literal string.
:set nomagic
:%s/ExecIf($["${debug}" = "1"]?NoOp/GoSub(chanlog,s,1(1,[${CHANNEL}]/
:set magic
You still have to watch out for the delimiters (the slashes of the s/// command) but you can use any character for that, it doesn't have to be a slash, so when you have something like this and there are slashes in the search or replace string, just pick something else, like s#foo#bar# or s:bar:baz:.
If you're having problems with which characters to escape in a vim substitution (:s//), remember the nomagic concept, and in particular the nomagic version of a substitute: :snomagic// or :sno//. nomagic means: interpret each character literally.
So this should work without worrying about escaping characters in the substitution:
:sno/ExecIf($["${debug}" = "1"]?NoOp/GoSub(chanlog,s,1(1,[${CHANNEL}]/
Get to know magic vs. nomagic, :sno//, and \v, \V:
:help magic
The "very nomagic" version of a search for your string uses \V:
/\VExecIf($["${debug}" = "1"]?NoOp
you have to escape the [] and the spaces:
:s/ExecIf($\["${debug}"\ =\ "1"\]?NoOp/GoSub(chanlog,s,1(1,\[${CHANNEL}\]/
just a bit trial and error

How to ignore whitespace in a regular expression subject string?

Is there a simple way to ignore the white space in a target string when searching for matches using a regular expression pattern? For example, if my search is for "cats", I would want "c ats" or "ca ts" to match. I can't strip out the whitespace beforehand because I need to find the begin and end index of the match (including any whitespace) in order to highlight that match and any whitespace needs to be there for formatting purposes.
You can stick optional whitespace characters \s* in between every other character in your regex. Although granted, it will get a bit lengthy.
/cats/ -> /c\s*a\s*t\s*s/
While the accepted answer is technically correct, a more practical approach, if possible, is to just strip whitespace out of both the regular expression and the search string.
If you want to search for "my cats", instead of:
myString.match(/m\s*y\s*c\s*a\s*t\s*s\s*/g)
Just do:
myString.replace(/\s+/g,"").match(/mycats/g)
Warning: You can't automate this on the regular expression by just replacing all spaces with empty strings because they may occur in a negation or otherwise make your regular expression invalid.
Addressing Steven's comment to Sam Dufel's answer
Thanks, sounds like that's the way to go. But I just realized that I only want the optional whitespace characters if they follow a newline. So for example, "c\n ats" or "ca\n ts" should match. But wouldn't want "c ats" to match if there is no newline. Any ideas on how that might be done?
This should do the trick:
/c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s/
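To see which inputs this pattern accepts, here is a small Python check (the pattern syntax is the same as in JavaScript). The candidate strings are assumptions chosen to mirror the examples in the comment above.

```python
import re

# Between letters, allow an optional newline followed by any whitespace.
pattern = re.compile(r'c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s')

results = {}
for candidate in ['cats', 'c\n ats', 'ca\n ts', 'c ats']:
    results[candidate] = bool(pattern.search(candidate))
    print(repr(candidate), results[candidate])
# Only 'c ats' (whitespace without a newline) fails to match.
```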
See this page for all the different variations of 'cats' that this matches.
You can also solve this using conditionals, but they are not supported in the javascript flavor of regex.
You could put \s* in between every character in your search string, so if you were looking for cats you would use c\s*a\s*t\s*s
It's long but you could build the string dynamically of course.
You can see it working here: http://www.rubular.com/r/zzWwvppSpE
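Building the pattern dynamically could look like the following Python sketch. The helper name spaced_pattern is hypothetical; each character is escaped first so that search terms containing metacharacters stay literal.

```python
import re

def spaced_pattern(word):
    """Compile a regex matching `word` with optional whitespace between characters."""
    return re.compile(r'\s*'.join(re.escape(ch) for ch in word))

text = 'my c ats and ca ts'
# span() gives begin and end indices in the original text, whitespace included.
spans = [m.span() for m in spaced_pattern('cats').finditer(text)]
print(spans)  # [(3, 8), (13, 18)]
```

Because the search runs on the original string, the indices can be used directly for highlighting.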
If you only want to allow spaces, then
\bc *a *t *s\b
should do it. To also allow tabs, use
\bc[ \t]*a[ \t]*t[ \t]*s\b
Remove the \b anchors if you also want to find cats within words like bobcats or catsup.
This approach can be used to automate this
(the following exemplary solution is in python, although obviously it can be ported to any language):
you can strip the whitespace beforehand AND save the positions of non-whitespace characters so you can use them later to find out the matched string boundary positions in the original string like the following:
import re
def regex_search_ignore_space(regex, string):
    no_spaces = ''
    char_positions = []
    for pos, char in enumerate(string):
        if re.match(r'\S', char):  # uppercase \S matches non-whitespace chars
            no_spaces += char
            char_positions.append(pos)
    match = re.search(regex, no_spaces)
    if not match:
        return match
    # match.start() and match.end() are indices into the spaceless string
    # (as we have searched in it); map them back to the original string.
    start = char_positions[match.start()]
    # match.end() is exclusive, so take the last matched character's
    # original position and add 1.
    end = char_positions[match.end() - 1] + 1
    # The match WITH spaces is returned.
    return string[start:end]
with_spaces = 'a li on and a cat'
print(regex_search_ignore_space('lion', with_spaces))
# prints 'li on'
If you want to go further you can construct the match object and return it instead, so the use of this helper will be more handy.
And the performance of this function can of course also be optimized, this example is just to show the path to a solution.
The accepted answer will not work if and when you are passing a dynamic value (such as "current value" in an array loop) as the regex test value. You would not be able to input the optional white spaces without getting some really ugly regex.
Konrad Hoffner's solution is therefore better in such cases, as it strips whitespace from both the regex and the test string. The test is then conducted as though neither contained whitespace.