I have text as:
[img]http://cimislia.net/uploads/posts/2013-07/1373995142_vrc.png[/img]
[color=#3333FF]Дата выхода: 26 октября 2012
Жанр: Racing, Simulator, 3D
Разработчик: SCS Software
Издательство: Excalibur Publishing
I want to remove everything after [/img] so above text will be:
[img]http://cimislia.net/uploads/posts/2013-07/1373995142_vrc.png[/img]
Can somebody help please? How to do this with regex?
This regex
\[img\].*\[/img\]
will match only [img] tags and everything between.
How to use it depends on the programming language you are using.
Example in C#:
var text = @"[img]http://cimislia.net/uploads/posts/2013-07/1373995142_vrc.png[/img]
[color=#3333FF]Дата выхода: 26 октября 2012
Жанр: Racing, Simulator, 3D
Разработчик: SCS Software
Издательство: Excalibur Publishing
";
Regex regex = new Regex("\\[img\\].*\\[/img\\]");
var imgOnly = regex.Match(text).Value;
Alternatively, you can search for \[/img\](.|\n)* and replace the match with [/img].
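As a minimal Python sketch of this search-and-replace idea (the sample string is shortened and the variable names are purely illustrative), the DOTALL flag lets `.` cross newlines, so everything after the closing tag is dropped:

```python
import re

# Shortened sample input; the lines after [/img] stand in for the unwanted text.
text = (
    "[img]http://cimislia.net/uploads/posts/2013-07/1373995142_vrc.png[/img]\n"
    "[color=#3333FF]Genre: Racing, Simulator, 3D\n"
    "Developer: SCS Software\n"
)

# Replace the closing tag plus everything after it with just the closing tag.
# re.DOTALL makes "." match newlines, so the match runs to the end of the text.
trimmed = re.sub(r"\[/img\].*", "[/img]", text, count=1, flags=re.DOTALL)

print(trimmed)
```

This assumes the text contains a single [img] block; with several blocks, everything after the first [/img] would be removed.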
I am new to using regex. I would like to use Notion to create a personal reference manager. My idea is to extract information from one column containing a bibtex entry to another column, that would contain, for instance, the title of the paper.
The approach that has worked best so far:
replaceAll(
replaceAll(prop("Bibtex"), "^((.|\n)*)[tT]itle(\\s|.*)=(\\s|.*){", ""),
"}((.|\n)*)",
""
)
but it fails if the title has any curly brackets. For instance, the Bibtex entry
@article{xu2015experimental,
title = {Experimental Detection of a Majorana Mode in the core of a Magnetic Vortex inside a Topological Insulator-Superconductor ${\mathrm{Bi}}{2}{\mathrm{Te}}{3}/{\mathrm{NbSe}}_{2}$ Heterostructure},
author = {Xu, Jin-Peng and Wang, Mei-Xiao and Liu, Zhi Long and Ge, Jian-Feng and Yang, Xiaojun and Liu, Canhua and Xu, Zhu An and Guan, Dandan and Gao, Chun Lei and Qian, Dong and Liu, Ying and Wang, Qiang-Hua and Zhang, Fu-Chun and Xue, Qi-Kun and Jia, Jin-Feng},
journal = {Phys. Rev. Lett.},
volume = {114}, issue = {1},
pages = {017001},
numpages = {5},
year = {2015},
month = {Jan},
publisher = {American Physical Society},
doi = {10.1103/PhysRevLett.114.017001},
url = {https://link.aps.org/doi/10.1103/PhysRevLett.114.017001} }
becomes
@article{xu2015experimental,
title = {Experimental Detection of a Majorana Mode in the core of a Magnetic Vortex inside a Topological Insulator-Superconductor ${\mathrm{Bi
instead of
Experimental Detection of a Majorana Mode in the core of a Magnetic Vortex inside a Topological Insulator-Superconductor ${\mathrm{Bi}}{2}{\mathrm{Te}}{3}/{\mathrm{NbSe}}_{2}$ Heterostructure
Any help would be appreciated.
If I understand correctly, make the match for any character or newline non-greedy and anchor the pattern to the start of the line.
^[tT]itle={((.|\n)*?)},
regex101.com example
Edit: This also works for the new example (allowing for optional whitespace before the word title and around the equals sign):
^\s*?[tT]itle\s?=\s?{((.|\n)*?)},
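Outside Notion, the same pattern can be tried in Python as a rough sketch. The shortened BibTeX entry and the variable names below are illustrative only, and the braces are escaped here for clarity:

```python
import re

# Shortened, illustrative BibTeX entry with nested braces in the title.
entry = """@article{xu2015experimental,
  title = {Experimental Detection of a {Majorana} Mode},
  author = {Xu, Jin-Peng and Wang, Mei-Xiao},
  year = {2015},
}"""

# re.MULTILINE lets "^" match at the start of every line, so the pattern
# anchors on the line that begins with "title". The lazy group stops at
# the first "}," it meets.
pattern = re.compile(r"^\s*?[tT]itle\s?=\s?\{((.|\n)*?)\},", re.MULTILINE)

match = pattern.search(entry)
title = match.group(1) if match else None
print(title)
```

Because the lazy group ends at the first `},`, a title that itself contains `},` would be cut short; handling that case reliably needs a brace-counting parser rather than a regex.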
Regular expression (maybe) to find the word/string surrounded by other words.
===========================================================================
For example I have below sentences
1.I’m setting up a new server, The key is ABC and want to support UTF-8 fully in my web application. Where do I need to set the encoding/charsets?”
2.XYZ is the key for the new server I am setting and it is located at address 111 abc
3.key as of the date is WWW for the new server I am setting at 111, ABC London
4.The key for server is LMN and it is being setup at location 111, abc London.
The set of keys will be finite, with only around 10 values, though a key value itself can take any form. I have used ABC, XYZ, WWW, LMN as examples above.
I should be able to identify that a key exists in the sentence and extract its value (ABC, XYZ, WWW, LMN) from all the above examples.
I have basically tried finding it with if/then/else, which is very cumbersome, and I don't have good code to show yet, but I will update when I can.
Another option could be to use spaCy with dependency parsing.
Any help will be greatly appreciated.
This expression is likely to return the desired output, not sure though:
^(?=.*\b(ABC|XYZ|WWW|LMN)\b).*$
DEMO
Test
import re
regex = r"^(?=.*\b(ABC|XYZ|WWW|LMN)\b).*$"
test_str = """
1.I’m setting up a new server, The key is ABC and want to support UTF-8 fully in my web application. Where do I need to set the encoding/charsets?”
2.XYZ is the key for the new server I am setting and it is located at address 111 abc
3.Key as of the date is WWW for the new server I am setting at 111, ABC London
4.The key for server is LMN and it is being setup at location 111, abc London.
"""
print(re.findall(regex, test_str,re.M))
Output
['ABC', 'XYZ', 'ABC', 'LMN']
The expression is explained on the top right panel of regex101.com, if you wish to explore/simplify/modify it, and in this link, you can watch how it would match against some sample inputs, if you like.
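Note that the greedy `.*` inside the lookahead returns the last key on a line, which is why sentence 3 yields 'ABC' rather than 'WWW' in the output above. A hedged alternative sketch, assuming the first known key in each sentence is the wanted one (an assumption, not something the question guarantees), is to search for the first occurrence directly. The sentences are shortened and the names `KEYS` and `first_key` are illustrative:

```python
import re

# Hypothetical list of known key values (the question says there are about 10).
KEYS = ["ABC", "XYZ", "WWW", "LMN"]

# Whole-word, case-sensitive match on any known key.
pattern = re.compile(r"\b(" + "|".join(map(re.escape, KEYS)) + r")\b")

sentences = [
    "The key is ABC and I want to support UTF-8 fully.",
    "XYZ is the key for the new server located at address 111 abc",
    "Key as of the date is WWW for the new server at 111, ABC London",
    "The key for server is LMN and it is set up at location 111, abc London.",
    "No key mentioned here.",
]

def first_key(sentence):
    """Return the first known key found in the sentence, or None."""
    m = pattern.search(sentence)
    return m.group(1) if m else None

found = [first_key(s) for s in sentences]
print(found)
```

Matching is case-sensitive, so the lowercase "abc" in the addresses is ignored; if the first-occurrence assumption does not hold for the real data, a dependency-parsing approach may be more robust.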
Looking for help on building a regex that captures a 1-line string after a specific word.
The challenge I'm running into is that the program where I need to build this regex uses a single line format, in other words dot matches new line. So the formula I created isn't working. See more details below. Any advice or tips?
More specific regex task:
I'm trying to grab the line that comes after the word Details from entries like the ones below. The goal is to pull out 100% Silk or 100% Velvet. This is the material of the product, which always comes after Details.
Raw data:
<p>Loose fitted blouse green/yellow lily print.
V-neck opening with a closure string.
Small tie string on left side of top.</p>
<h3>Details</h3> <p>100% Silk.</p>
<p>Made in Portugal.</p> <h3>Fit</h3>
<p>Model is 5’10,” size 2 wearing size 34.</p> <p>Size 34 measurements</p>
OR
<p>The velvet version of this dress. High waist fit with hook and zipper closure.
Seams run along edges of pants to create a box-like.</p>
<h3>Details</h3> <p>100% Velvet.</p>
<p>Made in the United States.</p>
<h3>Fit</h3> <p>Model is 5’10”, size 2 and wearing size M pants.</p> <p>Size M measurements Length: 37.5"</p>
<p>These pants run small. We recommend sizing up.</p>
Here is the current formula I created that's not working:
Replace (.*)(\bDetails\s+(.*)) with $3
The output gives the below:
<p>100% Silk.</p>
<p>Made in Portugal.</p>
<h3>Fit</h3>
<p>Model is 5’10,” size 2 wearing size 34.</p>
<p>Size 34 measurements</p>
OR
<p>100% Velvet.</p>
<p>Made in the United States.</p>
<h3>Fit</h3> <p>Model is 5’10”, size 2 and wearing size M pants.</p> <p>Size M measurements Length: 37.5"</p>
<p>These pants run small. We recommend sizing up.</p>
How do I capture just the desired string? Let me know if you have any tips! Thank you!
It is difficult to provide a working solution in your situation: you mention your program has "limited regex features" but don't explain what the limitations are.
Here is a Regex you can try to work with to capture the target string
^(?:<h3>Details<\/h3>)(.*)$
I would personally use BeautifulSoup for something like this, but here are two regex solutions you could use:
1. Match the rest of the line after "Details", then pull out the data.
import re
matches = re.findall(r'(?<=Details<).*$', text, flags=re.M)  # re.M lets $ match at the end of each line
matches = [i.strip('<>') for i in matches]
matches = [i.split('<')[0] for i in [j.split('>')[-1] for j in matches]]
2. Replace "Details<...>data" with "Detailsdata", then find the data.
text = re.sub(r'Details<.*?<.*?>', 'Details', text)  # drop the tags between "Details" and the data
matches = re.findall(r'(?<=Details).*?(?=<)', text)
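Since the tool in the question treats the input as a single line (dot matches newline), a lazy capture group is one way to stop at the end of the first paragraph after the heading. The sketch below uses Python only to illustrate the pattern; it assumes the material always sits in the first `<p>` after `<h3>Details</h3>`, and the name `material` is hypothetical:

```python
import re

# Shortened sample taken from the raw data in the question.
html = """<p>Loose fitted blouse green/yellow lily print.
V-neck opening with a closure string.</p>
<h3>Details</h3> <p>100% Silk.</p>
<p>Made in Portugal.</p> <h3>Fit</h3>
<p>Size 34 measurements</p>"""

# re.DOTALL mimics the "dot matches newline" behaviour described in the
# question; the lazy (.*?) stops at the first closing </p>.
pattern = re.compile(r"<h3>Details</h3>\s*<p>(.*?)</p>", re.DOTALL)

def material(text):
    """Return the text of the first paragraph after the Details heading."""
    m = pattern.search(text)
    return m.group(1).strip() if m else None

print(material(html))
```

In a replace-style tool, the equivalent would be to wrap the pattern in `.*` on both sides and replace the whole match with the captured group, provided the tool supports lazy quantifiers.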
Hello I am using Talend to prepare product data for import into DB. I want to use the extract string parts function for Talend.
I have the following data in one cell. (The length of the data varies; it is not a fixed-width format.)
Measurement: Ring Head Width: 6.8 Ring Height: 5.5 Ring Shank Width: 1.1 Ladies Band Width: 2.5 Ladies band shank Width: 1.2
I need help creating a regex format to match each measurement value and extract it to a new column.
What would be the regex to match the following text?
Ring Head Width: 6.8
and extract the numeric value following it, which is
6.8
Similarly I want to create regex for all the above measurements. I am assuming the format will be the same.
Thanks for your time and help.
If you don't mind using multiple actions to achieve this result, I suggest that you use:
the "Split text in parts" action on ":"
and then use "remove whitespaces" to have a clean value.
If you really need to keep it to one action, you can use the "Remove part of the text" action with a regex, which is based on the Java Pattern class.
Using the regex ".*:\s" works fine.
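For extracting one named measurement at a time, a capture group after the label is another option. The following is a rough Python sketch of the pattern only (Talend itself uses Java regex, and the helper name `measurement` is hypothetical); it assumes every value is a plain decimal number following "label: ":

```python
import re

data = (
    "Measurement: Ring Head Width: 6.8 Ring Height: 5.5 "
    "Ring Shank Width: 1.1 Ladies Band Width: 2.5 "
    "Ladies band shank Width: 1.2"
)

def measurement(label, text):
    """Return the number that follows 'label:' in text, or None if absent."""
    # re.escape guards against regex metacharacters in the label;
    # the group captures an integer or a decimal value.
    m = re.search(re.escape(label) + r":\s*(\d+(?:\.\d+)?)", text)
    return float(m.group(1)) if m else None

print(measurement("Ring Head Width", data))
print(measurement("Ladies band shank Width", data))
```

The regex built for a given label, for example `Ring Head Width:\s*(\d+(?:\.\d+)?)`, uses only syntax that Java's Pattern class also supports.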
Looking at the xml file created by HitManPro I can see numerous entries like this one;
[Item type="Malware" malwareName="Trojan" score="0.0" status="None"]
These are the false positives.
I would like to replace the existing regex query that I use in a script (LabTech) with one that would look for anything like:
score="5.1" up to score="999.0"
I am new to regex queries, and I am having trouble building the search for digits inside the string score=" ".
Any help would be much appreciated. Below is a sample XML from hitmanPro
regards,
Oscar Romero
<br>
HitmanPro Scan Completed Successfully.
Threats Found!
<hr>
Scan Date: 2015-10-17T15:16:31<BR>
<p>"
[Log computer="computer name" windows="6.1.1.7601.X64/12" scan="Normal" version="3.7.9.246" date="2015-10-17T15:16:31" timeSpentInSecs="125" filesProcessed="15922"]
[Item type="Malware" malwareName="Malware" score="90.0" status="None"]
[Scanners]
[Scanner id="Bitdefender" name="Gen:Variant.Kazy.751212" /]
[/Scanners]
[File path="C:\Program Files (x86)\ESET\ESET Remote Administrator\Server\era.exe" hash="F7BB46D48B994539AFD400641CE8E4F85114FC7BA05A1BAA0D092F3A92817F13" /]
[Startup]
[Key path="HKLM\SYSTEM\CurrentControlSet\Services\ERA_SERVER\" /]
[/Startup]
[/Item]
[/Log]
"</p>
There must be a shorter version than this, but this should work.
score="(0\.[1-9]|[1-9]\.[0-9]|[1-9][0-9]\.[0-9]|[1-9][0-9][0-9]\.[0-9])"
Matches:
0.1
1.0
10.4
100.9
100.0
999.9
99.9
9.9
(etc.)
Does Not Match
0.0
0
(etc.)
Is regex the way to go?
As for whether regex is the right tool for the job, I probably agree with @Makoto that it isn't, unless you're doing a quick scan of the results as an FYI rather than filtering results as part of a larger tool or application. In other words, except for the simplest cases, I agree with @Makoto that you want some XML parsing tool.
I am not familiar with LabTech.
Anyway, the regex query that you can use:
\sscore="((?:5\.[1-9])|(?:[6-9]\.[0-9])|(?:[1-9]{1}[0-9]{1,2}\.[0-9]))"\s
or
\sscore="(5\.[1-9]|[6-9]\.[0-9]|[1-9]{1}[0-9]{1,2}\.[0-9])"\s
if you prefer without the (?: ... )
UPDATE:
Okay, I made further changes to support the 5.1 minimum, and max 999.9
PS: This is my first answer on StackOverflow
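Where the script can run code rather than only a regex, a different technique is to capture the score and compare it numerically, which avoids encoding the range in the pattern. The Python sketch below is an illustration under assumptions: `is_threat` and the 5.0 threshold are hypothetical names and values, and the third sample line is invented for the demonstration. It also checks that the range pattern from the answer agrees on the same lines:

```python
import re

# Captures any numeric score, e.g. score="90.0".
SCORE = re.compile(r'score="(\d+(?:\.\d+)?)"')

# The range pattern from the answer above (5.1 up to 999.9).
ANSWER_PATTERN = re.compile(
    r'\sscore="(5\.[1-9]|[6-9]\.[0-9]|[1-9]{1}[0-9]{1,2}\.[0-9])"\s'
)

def is_threat(line, threshold=5.0):
    """True when the line has a score strictly above the threshold."""
    m = SCORE.search(line)
    return m is not None and float(m.group(1)) > threshold

lines = [
    '[Item type="Malware" malwareName="Trojan" score="0.0" status="None"]',
    '[Item type="Malware" malwareName="Malware" score="90.0" status="None"]',
    # Hypothetical line, added only to exercise the lower bound.
    '[Item type="Malware" malwareName="Example" score="5.1" status="None"]',
]

numeric = [is_threat(line) for line in lines]
by_regex = [ANSWER_PATTERN.search(line) is not None for line in lines]
print(numeric, by_regex)
```

The numeric comparison makes the threshold a single parameter, whereas the pure regex has to be rewritten whenever the range changes.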