RegExp to match visible non-letter characters before line break

I am working on a VBScript regexp that detects a tag containing text followed by a CRLF before the closing tag.
I am currently using \w+[:;?!.,""\)\]-~]*(\s)*(\r\n\s*)(<\/.*>)
Reading the expression from the end, I am matching any closing tag, a CRLF optionally followed by whitespace, optional spaces before the CRLF, and, optionally, any other visible non-letter characters that occur after a word.
This is to match things like
myword! CRLF</tag>
mywordCRLF</tag>
myword CRLF</tag>
myword...CRLF </tag>
etc.
However, I do not want to match below, as I need to detect tags containing TEXT and linebreaks.
</otherclosingtag> CRLF </tag>
I am concerned about the \w+[:;?!.,""\)\]-~]* bit: it doesn't look right to me, and I would need to list quite a large number of characters there.
I tried replacing it with \S and \W, but both seem to match CRLF characters as well.
Any ideas?
Cheers!

How about using a non-greedy quantifier:
\w+\W*?\r\n\s*(<\/.*>)
or
\w+[^\r\n]*\r\n\s*(<\/.*>)

The solution that I used:
\w+[^\r\n<>]*(\r\n\s*)(<\/.*>)
It matches a word (so not a tag), then anything that is not CR, LF, < or > (so it doesn't match <openingtag> CRLF</closingtag>).
This is a modified version of what M42 proposed; I added <> to make sure we don't match a tag.
Thanks for suggestions!
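One way to check a pattern like this is to run it against the sample inputs from the question. The sketch below does that in Python rather than VBScript, which is an assumption made purely for illustration; for plain ASCII input these constructs should behave the same way. The names ACCEPTED, SUGGESTED, should_match and should_not_match are hypothetical.

```python
import re

# Pattern from the solution above.
ACCEPTED = re.compile(r'\w+[^\r\n<>]*(\r\n\s*)(<\/.*>)')
# Second pattern suggested in the earlier answer, without the <> exclusion.
SUGGESTED = re.compile(r'\w+[^\r\n]*\r\n\s*(<\/.*>)')

# Inputs that should match: text, then CRLF, then a closing tag.
should_match = [
    "myword! \r\n</tag>",
    "myword\r\n</tag>",
    "myword \r\n</tag>",
    "myword...\r\n </tag>",
]
# Input that should not match: only another closing tag before the CRLF.
should_not_match = "</otherclosingtag> \r\n </tag>"

accepted_hits = [bool(ACCEPTED.search(s)) for s in should_match]
accepted_rejects = ACCEPTED.search(should_not_match) is None
suggested_rejects = SUGGESTED.search(should_not_match) is None

print(accepted_hits, accepted_rejects, suggested_rejects)
```

With these inputs, the version without <> still matches the unwanted case, because \w+ can start inside "otherclosingtag" and [^\r\n]* then consumes the ">". Excluding < and > from the class is what blocks that path.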

Try this:
^.*[\n\t\s]*</.*>$ --> BAD
^.*[\r\n\t\s]*</.*>$

Related

Regex to exclude quoted strings

I know that there are tons of similar questions; I have read hundreds, but with my limited knowledge of English and my even more limited knowledge of regex, I am still in the fog.
I need to process a fairly large text file that contains paragraphs in two formats: enclosed in double quotes or not. In both cases a paragraph can contain one or more carriage returns. I have to process only the paragraphs enclosed in double quotes. So "This is \r a phrase" must be processed (actually I have to replace the \r with a dummy character like '#'), while 'This is \r a comment' must be excluded.
I tried this pattern: "[\s\S(\r)]+"
This correctly selects only the enclosed paragraphs, but the regex debugger does not report the \r group to be replaced.

Try this pattern: "[\s\S](\r)[\s\S]"
You need to escape the \ character, since \r means something specific with RegEx.
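A different technique that may be easier to reason about is to match each double-quoted paragraph as a whole and rewrite it in a replacement callback, instead of trying to capture the \r itself. The Python sketch below illustrates that idea; it is not the pattern from the answer above. The function name mark_returns_in_quotes is hypothetical, and it assumes that quoted paragraphs contain no escaped double quotes and that unquoted text contains no stray double quotes.

```python
import re

def mark_returns_in_quotes(text: str, marker: str = "#") -> str:
    """Replace carriage returns with marker, but only inside "..." spans."""
    # "[^"]*" matches one double-quoted span, line breaks included.
    quoted = re.compile(r'"[^"]*"')
    # The callback rewrites each quoted span; text outside is left untouched.
    return quoted.sub(lambda m: m.group(0).replace("\r", marker), text)

sample = "\"This is \r a phrase\"\n'This is \r a comment'"
result = mark_returns_in_quotes(sample)
print(repr(result))
```

The single-quoted comment keeps its carriage return because the pattern never matches it.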

Is there regex to remove space and newline from xml input file

I would like to change an xml which is in format
<input>My
Input</input>
<input2>My
input2</input2>
to
<input>My Input</input>
<input2>My input2</input2>
The input XML file has more than 10000 records in the above format, which prevents the software from working properly.
I need a regex to fix it in one stroke.
I tried ('//n','') but it is not functioning as expected.

If your regex flavor supports Lookbehinds, you may use something like this:
(?<!>)(\s)*[\r\n]+
..and replace with \1.
This will match any number of new-line characters, preceded by zero or more other whitespace characters and not preceded by the > character. Then, it will replace them with a whitespace character (if present) or nothing.
If Lookbehind is not supported, you may use:
([^>])(\s)*[\r\n]+
..and replace with \1\2.
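As a rough illustration of the lookbehind approach, here is a Python sketch of a variant that always substitutes a single space, since the sample input in the question has no whitespace before the line break and the desired output contains a space there. The function name join_broken_lines is hypothetical, and the sketch assumes "\n" line endings and that each closing tag is immediately followed by the line break.

```python
import re

def join_broken_lines(xml_text: str) -> str:
    # Match optional whitespace plus one or more line breaks, unless the
    # match would start right after ">" (that is, right after a tag).
    # Each such run is replaced by a single space.
    return re.sub(r'(?<!>)\s*[\r\n]+', ' ', xml_text)

broken = "<input>My\nInput</input>\n<input2>My\ninput2</input2>"
fixed = join_broken_lines(broken)
print(fixed)
```

The line break between the two records survives because it directly follows ">", so the negative lookbehind rejects it.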

regex match file with multiple extension

I have several strings like this
XYZ_TEST_2017.txt
ASD_TEST_2017.txt.tmp
I need to extract only those strings ending with .txt
So I'm using this regex:
[A-Z]{3}_TEST_[0-9]{4}.txt
However I still get the strings with multiple extensions like the second one (.txt.tmp)
See my regex demo.
How can I handle it?

To have your regex match everything up to the end, append an "end-of-text marker" ($) to your pattern like this:
[A-Z]{3}_TEST_[0-9]{4}\.txt$
As you may have noticed, I also escaped the dot, otherwise this filename would match as well:
SOM_TEST_1234Etxt
The dot (.) would match any character (depending on your flags, even newline and carriage return), in this case, the E before txt.
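The difference between the two patterns can be checked with a short Python sketch (Python is an assumption here, as the question does not name a language; the variable names are hypothetical):

```python
import re

names = ["XYZ_TEST_2017.txt", "ASD_TEST_2017.txt.tmp", "SOM_TEST_1234Etxt"]

# Pattern from the question: unescaped dot, no end anchor.
unescaped = re.compile(r'[A-Z]{3}_TEST_[0-9]{4}.txt')
# Pattern from this answer: literal dot, anchored at the end.
anchored = re.compile(r'[A-Z]{3}_TEST_[0-9]{4}\.txt$')

loose_matches = [n for n in names if unescaped.search(n)]
strict_matches = [n for n in names if anchored.search(n)]

print(loose_matches)
print(strict_matches)
```

The unescaped, unanchored pattern accepts all three names, while the anchored one keeps only the plain .txt file.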

Eclipse Add text to first line of all files

I need to add text to the first line of all my JSPs in Eclipse. This is the regex I am using: \A.* but somehow it selects the whole first line; I just want to prepend text to the start of the file. Any help will be very much appreciated.

The .* pattern matches any 0+ chars other than line break characters, so it matches the first line.
It seems that Eclipse Find/Replace regex feature does not match entirely zero-width patterns (e.g. (?=,) will not find and insert a text before commas).
A workaround is to match and capture some text with a capturing group (...) (where ... stands for a consuming pattern) and use $1 in the replacement pattern to reinsert the matched text.
Use
\A(.*)
Replace with MY_NEW_TEXT_HERE_AT_THE_START_OF_FILE$1.
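The same capture-and-reinsert idea can be tried outside Eclipse. The Python sketch below only mirrors the technique and says nothing about Eclipse's own behaviour; the function name prepend_to_file_text and the sample header are hypothetical.

```python
import re

def prepend_to_file_text(text: str, header: str) -> str:
    # \A anchors at the very start of the input; (.*) captures the first
    # line so it can be written back after the new text.
    # A function replacement keeps backslashes in header from being
    # interpreted as escape sequences.
    return re.sub(r'\A(.*)', lambda m: header + m.group(1), text, count=1)

page = "<html>\n<body></body>\n</html>"
updated = prepend_to_file_text(page, "<%-- header --%>\n")
print(updated)
```

Because \A can only match at the start of the input, only the first line is touched, and an empty file simply becomes the header.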

Regex to match tag contents while simultaneously omitting leading and trailing whitespace

I am trying to write a regex that matches entire contents of a tag, minus any leading or trailing whitespace. Here is a boiled-down example of the input:
<tag> text </tag>
I want only the following to be matched (note how the whitespace before and after the match has been trimmed):
"text"
I am currently trying to use this regex in .NET (Powershell):
(?<=<tag>(\s)*).*?(?=(\s)*</tag>)
However, this regex matches "text" plus the leading whitespace inside of the tag, which is undesired. How can I fix my regex to work as expected?

You should not use regex to parse HTML.
Use a parser instead.
Also:
Regex to remove body tag attributes (C#)
Also also: RegEx match open tags except XHTML self-contained tags
If all that doesn't convince you, then don't use the dot in the middle of your expression. Use the alphanumeric escape. Your dot is consuming whitespace. Use \w (I think) instead.

Drop the lookarounds; they just make the job more complicated than it needs to be. Instead, use a capturing group to pick out the part you want:
<tag>\s*(.*?)\s*</tag>
The part you want is available as $matches[1].
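For comparison, the same capturing-group pattern can be sketched in Python, where the group is read with group(1) rather than $matches[1]. The function name tag_contents is hypothetical, and the sketch assumes the tag content sits on one line (matching across line breaks would need the DOTALL flag).

```python
import re

def tag_contents(text: str):
    # \s* outside the group consumes the padding; the lazy (.*?) keeps
    # everything in between, including inner spaces.
    m = re.search(r'<tag>\s*(.*?)\s*</tag>', text)
    return m.group(1) if m else None

trimmed = tag_contents("<tag> text </tag>")
inner_spaces = tag_contents("<tag>  two words  </tag>")
missing = tag_contents("no tag here")
print(repr(trimmed), repr(inner_spaces), repr(missing))
```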

Use these regular expressions to strip trailing and leading whitespaces. /^\s+/ and /\s+$/

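string test = "<tag> test </tag>";
string pattern3 = @"<tag>(.*?)</tag>";
// Requires using System and System.Text.RegularExpressions; Trim() strips the surrounding whitespace.
Console.WriteLine("{0}", Regex.Match(test, pattern3).Groups[1].Value.Trim());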
