How to linebreak in REGEX with brackets


I have a list of data that looks exactly like this:
51.9499, 7.555780000000027; 51.49705, 9.389030000000048; 51.249182, 6.991165099999989; 47.3163508, 11.09513949999996; 51.33424979999999, 12.574196000000029; 50.0297493, 19.196331099999952; 47.8270212, 16.25014150000004;
and I want to tidy it up by inserting a line break after each "; " so that it looks like this instead:
51.9499, 7.555780000000027;
51.49705, 9.389030000000048;
51.249182, 6.991165099999989;
...
I am using Adobe Brackets and I tried putting a hard line break into the replace dialog, but that doesn't work. What would work instead?

In the replace bar, click the .* icon to switch the search to regular expression mode.
Then search for ;\s* and replace it with ;\n to put each pair on its own line.
This outputs the following:
51.9499, 7.555780000000027;
51.49705, 9.389030000000048;
51.249182, 6.991165099999989;
47.3163508, 11.09513949999996;
51.33424979999999, 12.574196000000029;
50.0297493, 19.196331099999952;
47.8270212, 16.25014150000004;
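For comparison, the same substitution can be sketched outside the editor. This is a minimal Python illustration, not what Brackets runs internally; the variable names are made up for the example and the input is a shortened copy of the list from the question.

```python
import re

# Shortened copy of the data from the question.
raw = "51.9499, 7.555780000000027; 51.49705, 9.389030000000048; 51.249182, 6.991165099999989;"

# Replace each semicolon plus any whitespace after it with a semicolon and a newline.
pretty = re.sub(r";\s*", ";\n", raw)
print(pretty)
```

Because \s* also matches zero characters, the last semicolon gets a trailing newline as well.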

Related

Using regex multiple capture groups to split up a string

I have a file that looks like this...
"1234567123456","V","0","0","BLAH","BLAH","BLAH","BLAH"
"1234567123456","D","TEST1 "
"1234567123456","D","TEST 2~TEST3"
"1234567123456","R","TEST4~TEST5"
"1234567123457","V","0","0","BLAH","BLAH","BLAH","BLAH"
"1234567123457","D","TEST 6"
"1234567123457","D","TEST7"
"1234567123457","R","TEST 8~TEST9~TEST,10"
All I'm trying to do is split the D and R lines, where ~ is used as a separator. The end result would be:
"1234567123456","V","0","0","BLAH","BLAH","BLAH","BLAH"
"1234567123456","D","TEST1 "
"1234567123456","D","TEST 2"
"1234567123456","D","TEST3"
"1234567123456","R","TEST4"
"1234567123456","R","TEST5"
"1234567123457","V","0","0","BLAH","BLAH","BLAH","BLAH"
"1234567123457","D","TEST 6"
"1234567123457","D","TEST7"
"1234567123457","R","TEST 8"
"1234567123457","R","TEST9"
"1234567123457","R","TEST,10"
I'm using regex on applications like Textpad and Notepad++. I have not figured out how to use a regex like /.+/g because the applications do not like the forward slashes. So I don't think I can use things like the global modifier. I currently have the following regex...
//In a program like Textpad/Notepad++
<FIND> "(.{13})","D","([^~]*)~(.*)
<REPLACE> "\1","D","\2"\n"\1","D","\3
Now if I run a find and replace with the above params a few times it would work fine (for the D lines only). The problem is there is an unknown number of lines to be made. For example...
"1234567123456","D","TEST1~TEST2~TEST3~TEST4~TEST5"
"1234567123457","D","TEST1~TEST2~TEST3"
"1234567123458","D","TEST1~TEST2"
"1234567123459","D","TEST1~TEST2~TEST3~TEST4"
I was hoping to use a repeated capture group to make this work. I found a page (the first link under RESOURCES) that explains the common confusion between repeating a capturing group and capturing a repeated group. I need to capture a repeated group, but I could not make mine work. Does anyone have an idea?
Note: If I could also get rid of leading and trailing spaces, e.g. "1234567123456","D","TEST1 " ending up as "1234567123456","D","TEST1", that would be even better, but it is not necessary.
RESOURCES:
http://www.regular-expressions.info/captureall.html
http://regex101.com/
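One way around the repeated-group problem, if a script is an option, is to capture the whole third field once and split it in code. The sketch below is a Python illustration only, not a Textpad or Notepad++ recipe; the names ROW and expand_row are hypothetical, and it assumes the 13-character key used in the pattern above.

```python
import re

# Matches D and R rows only: key, row type, and the whole ~-separated field.
ROW = re.compile(r'^"(.{13})","([DR])","(.*)"$')

def expand_row(line):
    """Return one output row per ~-separated value; other rows pass through unchanged."""
    match = ROW.match(line)
    if match is None:
        return [line]
    key, kind, values = match.groups()
    # strip() also removes the leading and trailing spaces mentioned in the note.
    return [f'"{key}","{kind}","{value.strip()}"' for value in values.split("~")]

sample = [
    '"1234567123456","V","0","0","BLAH","BLAH","BLAH","BLAH"',
    '"1234567123456","D","TEST1 "',
    '"1234567123457","R","TEST 8~TEST9~TEST,10"',
]
for line in sample:
    for row in expand_row(line):
        print(row)
```

Splitting after the match handles any number of ~ separators in a single pass, which a fixed number of capture groups cannot do.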

Regular Expression filter for Meta Fields

I want to parse text content to extract some parameters with Regular Expression.
My text looks like below:
//_META_FIELD{Parameter: S}
I want to extract the content that starts with "//_META_FIELD{" and ends with "}",
so that the result is: Parameter: S
Can anyone help?

This Regex will find what you are looking for:
#^//_META_FIELD{(.+?)}$#m
The ^ makes sure the match starts at the beginning of a line, and the $ makes sure nothing follows the closing }. You can remove both if you don't need them.

The regex should look something like this:
^//_META_FIELD\{(.*?)\}$
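As a quick check of the second pattern, here is a small Python sketch. The sample text, including the second meta field, is invented for the example; re.MULTILINE plays the role of the m modifier in the first answer.

```python
import re

# Hypothetical input: two meta-field lines mixed with other content.
text = "int x;\n//_META_FIELD{Parameter: S}\n//_META_FIELD{Other: T}\n"

# MULTILINE lets ^ and $ match at the start and end of every line.
fields = re.findall(r"^//_META_FIELD\{(.*?)\}$", text, flags=re.MULTILINE)
print(fields)
```

The capture group returns only the text between the braces, so no extra trimming step is needed.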

Using regexp with an html string to extract text

I have the following html string:
F.V.Adamian, G.G.Akopian
I want to form a single plain text string with the author names so that it looks something like (I can fine tune the punctuation later):
F.V.Adamian, G.G.Akopian.
I'm trying to use 'regexp' in Matlab. When I do the following:
regexpi(htmlstring,'">.*</a>','match')
I get:
">F.V.Adamian</a>, G.G.Akopian,
Why? I'm trying to get it to output every match (hence I did not use the 'once' option), that is, all characters between "> and </a>, which is the author's name. It works fine for the first one but not for the second. I am happy to strip the "> and </a> with a regexprep afterwards.
I see that regexprep(htmlstr, '<.*?>','') works and does what I want, but I don't understand why.

In .*? the ? tells .* to be lazy rather than greedy. By default, .* matches as much as it can. With the ? added, it matches as little as it can.
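The difference is easy to see side by side. The sketch below uses Python rather than Matlab, and the html string is a guess at the kind of markup in the question (the href values are invented), so treat it as an illustration only.

```python
import re

# Hypothetical markup: two author links separated by commas.
html = '<a href="/a1">F.V.Adamian</a>, <a href="/a2">G.G.Akopian</a>,'

# Greedy: .* runs to the last </a>, so both authors land in one match.
greedy = re.findall(r'">.*</a>', html)

# Lazy: .*? stops at the first </a>, giving one match per author.
lazy = re.findall(r'">.*?</a>', html)

# Removing every tag lazily leaves only the plain text.
plain = re.sub(r"<.*?>", "", html)

print(greedy)
print(lazy)
print(plain)
```

This is also why the lazy '<.*?>' replacement in the question works: each tag is matched on its own instead of everything from the first < to the last >.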

finding text between <script></script> tags with RegEx for Coldfusion including linebreaks

I am trying to extract javascript code from HTML content that I receive via CFHTTP request.
I have this simple regex that catches everything as long as there is no line break in the code between the tags.
var result=REMatch("<script[^>]*>(.*?)</script>",html);
This will catch:
<script>testtesttest</script>
but not
<script>
testtest
</script>
I have tried to use (?m) for multiline, but it doesn't work like that.
I am using the reference to figure it out but I am just not getting it with regex.
Heads up: normally there would be JavaScript between the script tags, not simple text, so the content also includes characters like {}();:-_ etc.
Can anyone help me out?
Cheers
[[UPDATE]]
Thanks guys, I will try the solutions. I favor regex, but I will look into the HTML parser too.

(?m) multiline mode makes ^ and $ match at line breaks (not just at the start and end of the string, as is the default). What you're trying to do here is make . match newlines, and for that you want (?s) (dot-all mode).
However, I probably wouldn't do this with regex, since an HTML parser is a more robust solution. Here's how to do it with jSoup:
var result = jsoup.parse(html).select('script').text();
More details on using jSoup in CF are available here, or alternatively you can use the TagSoup parser, which ships with CF10 (so you don't need to worry about jars/etc).
If you really want regex, then you can use this:
var result = rematch('<script[^>]*>(?:[^<]+|<(?!/script>))+',html);
Unlike (?s).*?, this avoids matching empty blocks, but it will still fail in certain edge cases. If accuracy is required, use an HTML parser.
To extract just the text from the first script block, you can strip the script tag with this:
result = ListRest( result[1] , '>' );

You can use dot matches all mode or replace . with [\s\S] to get the same effect.
<script[^>]*>[\s\S]*?</script> would match everything including newlines.
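To make the effect of dot-all mode concrete, here is a short Python sketch rather than ColdFusion. The html string is made up for the example, and re.DOTALL is Python's counterpart of the (?s) flag.

```python
import re

# Hypothetical page: one multi-line script block and one single-line block.
html = "<p>x</p><script>\nvar a = {b: 1};\n</script><script>one()</script>"

# Default mode: . does not match newlines, so the multi-line block is skipped.
default_mode = re.findall(r"<script[^>]*>(.*?)</script>", html)

# Dot-all mode: . matches newlines too.
dotall_mode = re.findall(r"<script[^>]*>(.*?)</script>", html, flags=re.DOTALL)

# [\s\S] matches any character, with no flag needed.
char_class = re.findall(r"<script[^>]*>([\s\S]*?)</script>", html)

print(default_mode)
print(dotall_mode)
print(char_class)
```

The last two calls return the same list, which is the equivalence described above.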

Find/Replace regex to remove html tags

Using find and replace, what regex would remove the tags surrounding something like this:
<option value="863">Viticulture and Enology</option>
Note: the option value changes to different numbers, but using a regular expression to remove numbers is acceptable
I am still trying to learn but I can't get it to work.
I'm not using it to parse HTML. I have data from one of our company websites that we need in Excel, but our designer deleted the original data file and we need it back. I have a list of the options and need to remove the HTML tags, using find and replace in Notepad++.

This works for me in Notepad++ 5.8.6 (UNICODE):
search : <option value="\d+">(.*?)</option>
replace : $1
Be sure to select "Regular expression" and ". matches newline"
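The same find and replace can be tried in a script before running it over the real file. This is a Python sketch only; note that Python writes the backreference as \1 where the Notepad++ replace field above uses $1.

```python
import re

line = '<option value="863">Viticulture and Enology</option>'

# Keep only the captured label and drop the surrounding option tags.
label = re.sub(r'<option value="\d+">(.*?)</option>', r"\1", line)
print(label)
```

Because the pattern names the tag explicitly, anything that is not an option element is left untouched.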

I did this with the following regular expression:
Find: <.*?>|</.*?>
Replace with: \r\n (a new line)
This regular expression (<.*?>|</.*?>) matches every tag, so the values between your HTML tags are all that remain. For example, with this input:
<option value="123">1</option><option value="1234">2</option><option value="1235">3</option><option value="1236">4</option><option value="1237">5</option>
I need the values between the option tags (1, 2, 3, 4, 5). Replacing each tag with a line break leaves each value on its own line.

This works perfectly for me:
Select "Regular Expression" in "Find" Mode.
Enter [<].*?> in "Find What" field and leave the "Replace With" field empty.
Note that you need to have version 5.9 of Notepad++ for the ? operator to work.
as found here:
digoCOdigo - strip html tags in notepad++

Something like this would work (as long as you know the format of the HTML won't change):
<option value="(\d+)">(.+)</option>

String s = "<option value=\"863\">Viticulture and Enology</option>";
s.replaceAll ("(<option value=\"[0-9]+\">)([^<]+)</option>", "$2")
res1: java.lang.String = Viticulture and Enology
(Tested with scala, therefore the res1:)
With sed, you would use a little different syntax:
echo '<option value="863">Viticulture and Enology</option>'|sed -re 's|(<option value="[0-9]+">)([^<]+)</option>|\2|'
For Notepad++, I don't know the details, but "[0-9]+" should mean 'at least one digit' and "[^<]+" means 'anything but an opening less-than sign, one or more times'. Escaping and backreference syntax may differ.
Regexes are problematic here: if the tags span multiple lines or are hidden inside a comment, a regex will not handle them correctly.
However, a lot of HTML is generated in a regex-friendly way, always fitting on one line and never commented out. Or you are writing throwaway code and can check your input beforehand.
