regex expression (using 'replace' operator for powershell) - replace

I was wondering if someone could help. I have the following string and want to extract ONLY the '2.90GHz':
Intel(R) Xeon(R) CPU E5-2690 0 # 2.90GHz
I have managed to use the following ('([0-9].[0-9][0-9]GHz$)') to get the '2.90GHz'. However, when I try to 'negate' the pattern, it doesn't work,
i.e. ('^[([0-9].[0-9][0-9]GHz$)]'). What am I doing wrong? Any help would be appreciated.
Abdul

I don't know where you are getting this string from, but Win32_Processor has MaxClockSpeed and CurrentClockSpeed properties that you can round using Math and string formatting. But, to answer your question:
$var.Split('#')[-1] should do; then trim any spaces if needed.
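For comparison, here is a minimal Python sketch of the same ideas (Python is used only for illustration, and the variable names are made up). It also shows one way to get the "negated" result the question asks for: `[...]` is a character class, so wrapping a group in it with a leading `^` does not negate the group. A common alternative is to match the whole string and keep only the captured part; in PowerShell that would presumably be done with `-replace` and `$1`.

```python
import re

cpu = "Intel(R) Xeon(R) CPU E5-2690 0 # 2.90GHz"

# Approach from the answer: take whatever follows the last '#' and trim it.
by_split = cpu.split("#")[-1].strip()

# Regex approach: escape the dot so it only matches a literal '.'.
match = re.search(r"[0-9]\.[0-9]{2}GHz$", cpu)
by_regex = match.group(0) if match else None

# "Negation" via replace: match everything, keep only the captured group.
by_replace = re.sub(r"^.*?([0-9]\.[0-9]{2}GHz)$", r"\1", cpu)

print(by_split, by_regex, by_replace)
```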

How to write regular expression if the reference to the sentence is not clear

I'm learning NLP and I ran into a problem when I tried to use a regular expression to answer the following questions:
How much did A drop?
How much did B drop?
The given sentences are below:
1. At about 3:45, A careened to still another limit, of 30 points down, and trading was locked again.
2. Futures traders say A was signaling that B could fall as much as 200 points.
3. A had plunged 12 points
I tried to extract the correct answers, 30 and 12, and my regular expression is:
'\s?A (.+ )?(fall|drop|go\sdown|down|fell|plunged)(\sas\smuch\sas)? (\d+)'
Obviously, it's not correct: it gives the answer "200" for 'A' and misses '30'.
Could someone please show me how to write a regex for this situation?
Any response will be greatly appreciated!
If we assume that the text you want to match has the format Letter ... VERB ... [0-9]+ points, then we can try using the following pattern:
\b[A-Z]\b.*?(?:fall|drop|go\sdown|down|fell|plunged|careened).*?(\d+) points
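As a quick check, the sketch below runs that pattern in Python against the three sentences from the question. Note that it still returns 200 for the second sentence, because the pattern accepts any standalone capital letter as the subject. The `drop_for` helper is a hypothetical variant that is not part of the answer: it names the subject explicitly and refuses to skip over another standalone capital letter, which is only one possible way to separate A from B.

```python
import re

VERBS = r"(?:fall|drop|go\sdown|down|fell|plunged|careened)"

# Pattern from the answer, unchanged.
ANSWER_PATTERN = re.compile(r"\b[A-Z]\b.*?" + VERBS + r".*?(\d+) points")

# Hypothetical variant: consume characters lazily, but never step onto
# another standalone capital letter (a different subject).
NO_OTHER_SUBJECT = r"(?:(?!\b[A-Z]\b).)*?"

def drop_for(subject, sentence):
    """Return the number of points tied to `subject`, or None."""
    pattern = (r"\b" + re.escape(subject) + r"\b" + NO_OTHER_SUBJECT
               + VERBS + NO_OTHER_SUBJECT + r"(\d+) points")
    match = re.search(pattern, sentence)
    return match.group(1) if match else None

sentences = [
    "At about 3:45, A careened to still another limit, of 30 points down, "
    "and trading was locked again.",
    "Futures traders say A was signaling that B could fall as much as 200 points.",
    "A had plunged 12 points",
]

answer_results = [ANSWER_PATTERN.search(s).group(1) for s in sentences]
a_results = [drop_for("A", s) for s in sentences]
b_results = [drop_for("B", s) for s in sentences]
print(answer_results, a_results, b_results)
```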

Trying to write an SQL query with regexp_matches() look behind positive in postgresql

From a PostgreSQL database, I'm trying to match 6 or more digits that come after a string that looks like "(OCoLC)" and I thought I had a working regular expression that would fit that description:
(?<=\(ocolc\))[0-9]{6,}
Here are some strings that it should return the digits for:
|a(OCoLC)08507541 will return 08507541
|a(OCoLC)174097142 will return 174097142
etc...
This seems to work to match strings when I test it on regex101.com, but when I incorporate it into my query:
SELECT
    regexp_matches(v.field_content, '(?<=\(ocolc\))[0-9]{6,}', 'gi')
FROM
    varfield AS v
LIMIT
    1;
I get this message:
ERROR: invalid regular expression: quantifier operand invalid
I'm not sure why it doesn't seem to like that expression.
UPDATE
I ended up just resorting to using a case statement, as that seemed to be the best way to work around this...
SELECT
    CASE
        WHEN v.field_content ~* '\(ocolc\)[0-9]{6,}'
        THEN (regexp_matches(v.field_content, '[0-9]{6,}', 'gi'))[1]
        ELSE v.field_content
    END
FROM
    varfield AS v
As electricjelly noted, I'm really after just the numeric characters, but they have to be preceded by the "(OCoLC)" string, or they're not what I'm after. This is part of a larger query, so I'm running a second case statement to set a boolean flag in cases where the start of the string wasn't "(OCoLC)". This seems to be more helpful anyway, as I'm probably going to want to preserve those other values somehow.
After looking over your question, it seems your error is caused by a syntax problem rather than by the feature not being available in your version of PostgreSQL, as I tested it on 9.6 and received the same error.
However, what you seem to want is to pull the numbers from a given field, as in
|a(OCoLC)08507541 becomes 08507541
An easy way to accomplish this is to use regexp_replace.
The function call would be (table.field is a placeholder for your column):
regexp_replace(table.field, '\D', '', 'g')
The \D finds every non-digit character and replaces it with nothing (hence the ''), leaving only the digits.
After some more searching, it looks like lookbehind is only supported in PostgreSQL server versions >= 9.6:
https://www.postgresql.org/docs/9.6/static/functions-matching.html#POSIX-CONSTRAINTS-TABLE
The version I am running is version 9.4.6
https://www.postgresql.org/message-id/E1ZsIsY-0006z6-6T#gemulon.postgresql.org
So, the answer is it's not available for this version of PostgreSQL, but presumably this would work just fine in the latest version of the server.
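On servers without lookbehind, a capture group can usually stand in for it: match the "(OCoLC)" prefix as part of the pattern and capture only the digits. The Python sketch below compares the two patterns on the sample strings from the question plus one made-up non-matching row ("|a(DLC)12345678" is hypothetical). Since regexp_matches returns the captured groups, something like (regexp_matches(v.field_content, '\(ocolc\)([0-9]{6,})', 'i'))[1] should behave the same way in PostgreSQL, though that has not been tested against 9.4 here.

```python
import re

# Lookbehind version from the question (fine in Python, not in old PostgreSQL).
LOOKBEHIND = re.compile(r"(?<=\(ocolc\))[0-9]{6,}", re.IGNORECASE)

# Capture-group version: consume the prefix, capture only the digits.
CAPTURE = re.compile(r"\(ocolc\)([0-9]{6,})", re.IGNORECASE)

def with_lookbehind(text):
    match = LOOKBEHIND.search(text)
    return match.group(0) if match else None

def with_capture(text):
    match = CAPTURE.search(text)
    return match.group(1) if match else None

rows = ["|a(OCoLC)08507541", "|a(OCoLC)174097142", "|a(DLC)12345678"]
lookbehind_results = [with_lookbehind(r) for r in rows]
capture_results = [with_capture(r) for r in rows]
print(capture_results)
```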

Extract all data in between two double quotes

I'm trying to use a PowerShell regex to pull version data from the AssemblyInfo.cs file. The regex below is my best attempt; however, it only pulls the string [assembly: AssemblyVersion(. I've put this regex into a couple of web regex testers and it LOOKS like it's doing what I want, but this is my first crack at using a PowerShell regex, so I could be looking at it wrong.
$s = '[assembly: AssemblyVersion("1.0.0.0")]'
$prog = [regex]::match($s, '([^"]+)"').Groups[1].Value
You also need to include the opening double quote; otherwise the match starts capturing at the beginning of the string and runs until the first " is reached.
$prog = [regex]::match($s, '"([^"]+)"').Groups[1].Value
                            ^
Try this regex "([^"]+)"
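The difference between the two patterns is easy to see side by side. This is a small Python sketch rather than PowerShell, but the two patterns are the ones discussed above and the variable names are illustrative only.

```python
import re

s = '[assembly: AssemblyVersion("1.0.0.0")]'

# Without the opening quote, the group starts at the beginning of the string
# and runs up to the first double quote.
without_opening_quote = re.search(r'([^"]+)"', s).group(1)

# With the opening quote, the group is forced to sit between two quotes.
with_opening_quote = re.search(r'"([^"]+)"', s).group(1)

print(repr(without_opening_quote))
print(repr(with_opening_quote))
```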
Regular expressions can get hard to read, so best practice is to make them as simple as they can be while still solving all possible cases you might see. You are trying to retrieve the only numerical sequence in the entire string, so we should look for that and bypass using groups.
$s = '[assembly: AssemblyVersion("1.0.0.0")]'
$prog = [regex]::match($s, '[\d\.]+').Value
$prog
1.0.0.0
For the generic solution of data between double quotes, the other answers are great. If I were parsing AssemblyInfo.cs for the version string however, I would be more explicit.
# Escape the dots and allow multi-digit parts so that e.g. 1.10.0.0 is captured whole
$versionString = [regex]::match($s, 'AssemblyVersion\("([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)"').Groups[1].Value
$version = [version]$versionString
$versionString
1.0.0.0
$version
Major  Minor  Build  Revision
-----  -----  -----  --------
1      0      0      0
Update/Edit:
Related to parsing the version (again, if this is not a generic question about parsing text between double quotes): I would not actually have a version in the M.m.b.r format in my file, because I have always found that Major.Minor is enough, and a format like 1.2.* gives you some extra information without any effort.
See "Compile date and time" and "Can I generate the compile date in my C# code to determine the expiry for a demo version?".
When using a * for the third and fourth part of the assembly version, then these two parts are set automatically at compile time to the following values:
third part is the number of days since 2000-01-01
fourth part is the number of seconds since midnight divided by two (although some MSDN pages say it is a random number)
Something to think about I guess in the larger picture of versions, requiring 1.2.*, allowing 1.2, or 1.2.3, or only accepting 1.2.3.4, etc.
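If the two wildcard parts really are encoded as described above (days since 2000-01-01, and seconds since midnight divided by two), the build time can be recovered with a little arithmetic. The following Python sketch assumes exactly that encoding and nothing more; `build_timestamp` is a made-up helper name, and time zone handling is ignored.

```python
from datetime import datetime, timedelta

def build_timestamp(build: int, revision: int) -> datetime:
    """Decode the auto-generated third and fourth version parts.

    Assumes build = days since 2000-01-01 and
    revision = seconds since midnight, divided by two.
    """
    return datetime(2000, 1, 1) + timedelta(days=build, seconds=revision * 2)

# 366 days after 2000-01-01 (a leap year), 21600 * 2 seconds after midnight.
example = build_timestamp(366, 21600)
print(example.isoformat())
```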

Are my regex just wrong or is there a buggy behaviour in td-agent's format behaviour?

I am using fluentd, elasticsearch and kibana to organize logs. Unfortunately, these logs are not written using any standard like apache, so I had to come up with the regex for the format myself. I used this site here to verify that they are working: http://fluentular.herokuapp.com/ .
The logs have roughly this format here:
DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] Start Activitiy 'barbecue' zu verabeiten.
the format regex I am using is as follows:
format /(?<pri>([INFO]|[DEBUG]|[ERROR])+)...(?<date>(\d{2}\.\d{2}\.\d{4})).(?<time>(\d{2}:\d{2}:\d{2})).\[(?<subject>(.*))\].(?<msg>(.*))/
Now, judging by that website that is supposed to test specifically fluentd's behaviour with regexes, the output SHOULD be this one:
Record
Key Value
pri DEBUG
date 24.04.2014
subject SingleActivityStrategy
msg Start Activitiy 'barbecue' zu verabeiten.
Instead, though, I have this ?bug? where pri is always shortened to DEBU. The same goes for ERROR, which becomes ERRO; only INFO stays INFO. I am not very experienced with regular expressions and I find it hard to believe that this is a bug; still, it confuses me, and any help is greatly appreciated.
I'm not sure I can link the complete config file, because I don't personally own these log files and I am trying to keep this at a level where my boss won't get mad at me for posting sensitive information. Should it definitely be needed, I will post it later, after asking him how much I can reveal.
In general, the logs always look roughly like this:
First the priority, which is either DEBUG, ERROR or INFO, next the date, next what we call the subject, which is always written in [ ], and finally just a message.
Here is a link to fluentular with the format I am using and a teststring that produces the right result in fluentular, but not in my config file:
Fluentular
Sorry I couldn't make it work like a regular link to just click on.
Another link to test out regex with my format and test string is this one:
http://rubular.com/r/dfXOkQYNXP
tl;dr version:
my td-agent format regex cuts off the last letter, although fluentular says it shouldn't. My fault or a bug?
How the regex would look if you're trying to match the data specifically:
(INFO|DEBUG|ERROR)\:\s+(\d{2}\.\d{2}\.\d{4})\s(\d{2}:\d{2}:\d{2})\s\[(.*)\](.*)
In your format string, you were using . and ... where your spaces and colon should be. I'm not too sure why this works in Fluentular, but you should match the \: explicitly, as well as each space between the values.
So you'd be looking at the following regular expression with the Fluentd fields (which are grouping names):
(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))
Meaning your td-agent.conf should look like:
<source>
  type tail
  path /var/log/foo/bar.log
  pos_file /var/log/td-agent/foo-bar.log.pos
  tag foo.bar
  format /(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))/
</source>
I would also take a look into comparing Logstash vs. Fluentd. I like Logstash far more because you create Grok filters to match the type of data you want, and it makes formatting your fields much easier because you are providing an abstraction layer, but you essentially will get the same data.
And I would watch out when you're using sites like Rubular, as they are fairly particular about multi-line matching and the like. I'd suggest something like Regexr which gives immediate feedback and you can set global and multiline matching as well.
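One plausible explanation for the truncated DEBU is that [DEBUG] in the original format is a character class (any one of the letters D, E, B, U, G) rather than the literal word, so after backtracking the following ... can take the final letter along with the colon and space. The Python sketch below reproduces this with the sample log line from the question. Python writes named groups as (?P<name>...) instead of Ruby's (?<name>...), so the patterns are translated accordingly, and td-agent itself may behave differently.

```python
import re

line = ("DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] "
        "Start Activitiy 'barbecue' zu verabeiten.")

# Original format: character classes plus '...' for ": ".
original = re.compile(
    r"(?P<pri>([INFO]|[DEBUG]|[ERROR])+)...(?P<date>(\d{2}\.\d{2}\.\d{4}))."
    r"(?P<time>(\d{2}:\d{2}:\d{2})).\[(?P<subject>(.*))\].(?P<msg>(.*))"
)

# Corrected format: literal alternatives, explicit colon and whitespace.
fixed = re.compile(
    r"(?P<pri>INFO|ERROR|DEBUG):\s+(?P<date>\d{2}\.\d{2}\.\d{4})\s"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s\[(?P<subject>.*)\]\s(?P<msg>.*)"
)

original_pri = original.search(line).group("pri")
fixed_fields = fixed.search(line).groupdict()
print(original_pri, fixed_fields["pri"])
```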

Using Regex to validate the number of words in a text area

I am attempting to write an MVC model validation that verifies that there are 10 or more words in a string. The string is being populated correctly, so I did not include the HTML. I have done a fair bit of research, and it seems that something along the lines of what I have tried should work but, for whatever reason, mine always seems to fail. Any ideas as to what I am doing wrong here?
(using System.ComponentModel.DataAnnotations, in an MVC 4 VB.NET environment)
I have tried ([\w]+){10,}, ((\\S+)\s?){10,}, [\b]{20,}, [\w+\w?]{10,}, (\b(\w+?)\b){10,}, ([\w]+?\s){10}, ([\w]+?\s){9}[\w], ([\S]+\s){9}[\S], ([a-zA-Z0-9,.'":;$-]+\s+){10,} and several more variations on the same basic idea.
<Required(ErrorMessage:="The Description of Operations field is required"), RegularExpression("([\w]+){20,}", ErrorMessage:="ERROZ")>
Public Property DescOfOperations As String = String.Empty
The correct solution was ([\S]+\s+){9}[\S\s]+
EDIT: Moved the accepted version to the top and removed the unused versions. If the whole sequence needs to match, then something like this should work (also accounting for double spaces):
([\S]+\s+){9}[\S\s]+
Or:
([\w]+?\s+){9}[\w]+
Give this a try:
([a-zA-Z0-9,.'":;$-]+\s){10,}
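To illustrate why whole-string matching matters here, the Python sketch below uses `re.fullmatch` as a stand-in for the validator, on the assumption that the validation attribute only accepts a match that covers the entire input, as the answer suspects. It checks the accepted pattern and also shows why a pattern such as ([\w]+){10,} counts characters rather than words. The sample strings and the `has_ten_words` helper are made up for the example.

```python
import re

# Accepted pattern: nine "word + whitespace" units, then at least one more character.
TEN_WORDS = re.compile(r"(\S+\s+){9}[\S\s]+")

def has_ten_words(text: str) -> bool:
    """True if the whole text matches the accepted ten-word pattern."""
    return TEN_WORDS.fullmatch(text) is not None

ten = "one two three four five six seven eight nine ten"
nine = "one two three four five six seven eight nine"

# (\w+){10,} never crosses a space, so it accepts one 10-letter word
# and rejects a real ten-word sentence when the whole string must match.
single_long_word_ok = re.fullmatch(r"(\w+){10,}", "abcdefghij") is not None
ten_words_ok = re.fullmatch(r"(\w+){10,}", ten) is not None

print(has_ten_words(ten), has_ten_words(nine), single_long_word_ok, ten_words_ok)
```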