Looking at the xml file created by HitManPro I can see numerous entries like this one;
[Item type="Malware" malwareName="Trojan" score="0.0" status="None"]
This are the false positives.
I would like to replace the existing RegEX query that I use in a script (LabTech) with one that would look for anything like;
score="5.1" up to score="999.0"
I am new to Reg Ex queries, and I am having trouble building the search for digits inside the string score=" " .
Any help would be much appreciated. Below is a sample XML from hitmanPro
regards,
Oscar Romero
<br>
HitmanPro Scan Completed Successfully.
Threats Found!
<hr>
Scan Date: 2015-10-17T15:16:31<BR>
<p>"
[Log computer="computer name" windows="6.1.1.7601.X64/12" scan="Normal" version="3.7.9.246" date="2015-10-17T15:16:31" timeSpentInSecs="125" filesProcessed="15922"]
[Item type="Malware" malwareName="Malware" score="90.0" status="None"]
[Scanners]
[Scanner id="Bitdefender" name="Gen:Variant.Kazy.751212" /]
[/Scanners]
[File path="C:\Program Files (x86)\ESET\ESET Remote Administrator\Server\era.exe" hash="F7BB46D48B994539AFD400641CE8E4F85114FC7BA05A1BAA0D092F3A92817F13" /]
[Startup]
[Key path="HKLM\SYSTEM\CurrentControlSet\Services\ERA_SERVER\" /]
[/Startup]
[/Item]
[/Log]
"</p>
There must be a shorter version than this, but this should work.
score="(0\.[1-9]|[1-9]\.[0-9]|[1-9][0-9]\.[0-9]|[1-9][0-9][0-9]\.[0-9])"
Matches:
0.1
1.0
10.4
100.9
100.0
999.9
99.9
9.9
(etc.)
Does Not Match
0.0
0
(etc.)
Is regex the way to go?
As for whether regex is the right tool for the job, I probably agree with #Makoto that it isn't - unless you're doing a quick scan of the results as an FYI, rather than filtering results as part of a larger tool or application. In other words, except for the simplest cases, I agree with #Makoto that you want some xml parsing tool.
I have no idea on LabTech.
Anyway, the regex query that you can use:
\sscore="((?:5\.[1-9])|(?:[6-9]\.[0-9])|(?:[1-9]{1}[0-9]{1,2}\.[0-9]))"\s
or
\sscore="(5\.[1-9]|[6-9]\.[0-9]|[1-9]{1}[0-9]{1,2}\.[0-9])"\s
if you prefer without the (?: ... )
UPDATE:
Okay, I made further changes to support the 5.1 minimum, and max 999.9
PS: This is my first answer on StackOverflow
Related
Im trying to get the number between '-' and '-' in google sheets but after trying many things I still havent been able to find the solution.
Data record 1
England Premier League
West Ham vs Crystal Palace
2.090 - 3.47 - 3.770
Expected value = 3.47
Data record 2
England League Two
Carlisle vs Scunthorpe
2.830 - 3.15 - 2.820
Expected value = 3.15
Hopefully someone can help me out
Try either of the following
option 1.
=INDEX(IFERROR(REGEXEXTRACT(AE1:AE4," \d+\.\d+ ")*1))
option 2.
=INDEX(IFERROR(REGEXEXTRACT(AE1:AE4,".* - (\d+\.\d+) ")))
(Do adjust the formula according to your ranges and locale)
use:
=INDEX(IFNA(REGEXEXTRACT(A1:A, "- (\d+(?:.\d+)?) -")*1))
I have two dates in cells
A1=05.11.2021 18:16
B1=05.11.2021 20:16
I need to find difference in hours between two dates. Result should be (B1-A1)=2 I can't find an answer on the Internet, I ask for help.
use:
=TEXT((DATE(
REGEXEXTRACT(B1, "\d{4}"),
REGEXEXTRACT(B1, "\.(\d+)\."),
REGEXEXTRACT(B1, "^\d+"))+INDEX(SPLIT(B1, " "),,2))-(DATE(
REGEXEXTRACT(A1, "\d{4}"),
REGEXEXTRACT(A1, "\.(\d+)\."),
REGEXEXTRACT(A1, "^\d+"))+INDEX(SPLIT(A1, " "),,2)), "[h]")
arrayformula:
=INDEX(IFNA(TEXT((DATE(
REGEXEXTRACT(B1:B, "\d{4}"),
REGEXEXTRACT(B1:B, "\.(\d+)\."),
REGEXEXTRACT(B1:B, "^\d+"))+INDEX(SPLIT(B1:B, " "),,2))-(DATE(
REGEXEXTRACT(A1:A, "\d{4}"),
REGEXEXTRACT(A1:A, "\.(\d+)\."),
REGEXEXTRACT(A1:A, "^\d+"))+INDEX(SPLIT(A1:A, " "),,2)), "[h]")))
shorter:
=INDEX(IFERROR(1/(1/(TEXT(
REGEXREPLACE(B1:B, "(\d+).(\d+).(\d{4})", "$2/$1/$3")-
REGEXREPLACE(A1:A, "(\d+).(\d+).(\d{4})", "$2/$1/$3"), "[h]")))))
EDIT:
As what #basic mentioned in the above comment, you can format the cell where your output goes or use text with h for hour difference and [h] for the whole duration in hours (got from Cooper's answer). See usage and difference below:
Text:
=text(B1-A1, "h")
or
=text(B1-A1, "[h]")
Update:
Make sure your Date Times uses proper delimiters. / and - are acceptable (e.g. 5/11/2021 18:16:00 or 5-11-2021 18:16:00). (This depends entirely on your locale.)
If you want to show it having . as delimiter, just use a custom Date Time format and use . as its delimiter.
Using custom format:
Actual value vs Display value:
If you don't want to do any changes to the date time and want to have it as text, then replace them using regexreplace before using them in text.
RegexReplace:
=text(REGEXREPLACE(B1, "\.", "/") - REGEXREPLACE(A1, "\.", "/"), "h")
or
=text(REGEXREPLACE(B1, "\.", "/") - REGEXREPLACE(A1, "\.", "/"), "[h]")
Does pandas have a built-in string matching function for exact matches and not regex? The code below for tropical_two has a slightly higher count. Documentation tells me it does a regex search.
tropical = reviews['description'].map(lambda x: "tropical" in x).sum()
print(tropical)
tropical_two = reviews['description'].str.count("tropical").sum()
print(tropical_two)
The first way is the answer key from Kaggle but something about it seems less readable and intuitive to me compared to a .str function because when I run this it returns True instead of 2 so I am a little confused about if the answer key method is actually counting all occurrences of "tropical" and not just the first.
def in_str(text):
return "tropical" in text
in_str("tropical is tropical")
First 2 lines of dataframe:
0 Italy Aromas include tropical fruit, broom, brimston... Vulkà Bianco 87 NaN Sicily & Sardinia Etna NaN Kerin O’Keefe #kerinokeefe Nicosia 2013 Vulkà Bianco (Etna) White Blend Nicosia
1 Portugal This is ripe and fruity, a wine that is smooth... Avidagos 87 15.0 Douro NaN NaN Roger Voss #vossroger Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red Quinta dos Avidagos
Notebook here, tropical code in cell #2
https://www.kaggle.com/mikexie0/exercise-summary-functions-and-maps
You may use str.count with word boundary markers to match the exact search term:
tropical_two = reviews['description'].str.count(r'\btropical\b').sum()
print(tropical_two)
There may not be the need for a separate exact API, as str.count can be used for exact matches as well.
I have a text file with data formatted as below. Figured out how to format the second part of the file to format it for upload into a db table. Hitting a wall trying to get the just the first 7 lines to format in the same way.
If it wasn't obvious, I'm trying to get it pipe delimited with the exact same number of columns, so I can easily upload it to the db.
Year: 2019 Period: 03
Office: NY
Dept: Sales
Acct: 111222333
SubAcct: 11122234-8
blahblahblahblahblahblahblah
Status: Pending
1000
AAAAAAAAAA
100,000.00
2000
BBBBBBBBBB
200,000.00
3000
CCCCCCCCCC
300,000.00
4000
DDDDDDDDDD
400,000.00
some kind folks answered my question about the bottom part, using the following code I can format that to look like so -
(.*)\r?\n(.*)\r?\n(.*)(?:\r?\n|$)
substitute with |||||||$1|$2|$3\n
|||||||1000|AAAAAAAAAA|100,000.00
|||||||2000|BBBBBBBBBB|200,000.00
|||||||3000|CCCCCCCCCC|300,000.00
|||||||4000|DDDDDDDDDD|400,000.00
just need help formatting the top part - to look like this, so the entire file matches with the exact same number of columns.
Year: 2019|Period: 03|Office: NY|Dept: Sales|Acct: 111222333|SubAcct: 11122234-8|blahblahblahblahblahblahblah|Status: Pending|||
I'm ok with having multiple passes on the file to get the desired end result.
I've helped you on your previous question, so I will focus now on the first part of your file.
You can use this regex:
\n|\b(?=Period)
Working demo
And use | as the replacement string
If you don't want the previous space before Period, then you can use:
\n|\s(?=Period)
Looking for help on building a regex that captures a 1-line string after a specific word.
The challenge I'm running into is that the program where I need to build this regex uses a single line format, in other words dot matches new line. So the formula I created isn't working. See more details below. Any advice or tips?
More specific regex task:
I'm trying to grab the line that comes after the word Details from entries like below. The goal is pull out 100% Silk, or 100% Velvet. This is the material of the product that always comes after Details.
Raw data:
<p>Loose fitted blouse green/yellow lily print.
V-neck opening with a closure string.
Small tie string on left side of top.</p>
<h3>Details</h3> <p>100% Silk.</p>
<p>Made in Portugal.</p> <h3>Fit</h3>
<p>Model is 5‰Ûª10,‰Û size 2 wearing size 34.</p> <p>Size 34 measurements</p>
OR
<p>The velvet version of this dress. High waist fit with hook and zipper closure.
Seams run along edges of pants to create a box-like.</p>
<h3>Details</h3> <p>100% Velvet.</p>
<p>Made in the United States.</p>
<h3>Fit</h3> <p>Model is 5‰Ûª10‰Û, size 2 and wearing size M pants.</p> <p>Size M measurements Length: 37.5"åÊ</p>
<p>These pants run small. We recommend sizing up.</p>
Here is the current formula I created that's not working:
Replace (.)(\bDetails\s+(.)) with $3
The output gives the below:
<p>100% Silk.</p>
<p>Made in Portugal.</p>
<h3>Fit</h3>
<p>Model is 5‰Ûª10,‰Û size 2 wearing size 34.</p>
<p>Size 34 measurements</p>
OR
<p>100% Velvet.</p>
<p>Made in the United States.</p>
<h3>Fit</h3> <p>Model is 5‰Ûª10‰Û, size 2 and wearing size M pants.</p> <p>Size M measurements Length: 37.5"åÊ</p>
<p>These pants run small. We recommend sizing up.</p>
`
How do I capture just the desired string? Let me know if you have any tips! Thank you!
Difficult to provide a working solution in your situation as you mention your program has "limited regex features" but don't explain what limitations.
Here is a Regex you can try to work with to capture the target string
^(?:<h3>Details<\/h3>)(.*)$
I would personally use BeautifulSoup for something like this, but here are two solutions you could use:
Match the line after "Details", then pull out the data.
matches = re.findall('(?<=Details<).*$', text)
matches = [i.strip('<>') for i in matches]
matches = [i.split('<')[0] for i in [j.split('>')[-1] for j in matches]]
Replace "Details<...>data" with "Detailsdata", then find the data.
text = re.sub('Details<.*?<.*>', '', text)
matches = re.findall('(?<=Details).*?(?=<)', text)