Regex, how to select all items outside of selection group - regex

I'm a Regex noob and am pretty sure I'm not going about this in the most efficient way - wanted to get some advice.
I have a Regex expression ((\w+\b.*?){100}){1} which selects the first 100 words of my string, the length of which varies.
What I want is to select the entire string except for the first 100 words.
Is there syntax I can add to my current expression to do this, or am I better off trying to directly select the rest of the text instead.
Also, if anyone has any good resources for improving my Regex knowledge, I'd be very appreciative. Thus far I've found http://gskinner.com/RegExr/ to be very helpful.
Thanks in advance!

If you use this, you can refer to everything else as group 3 noted as $3
This one will treat hyphenated words as one word.
(\w+(-\w+|\b).*?){100}(.*)
Regex training Here
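As a rough illustration of the pattern above, here is a minimal Python sketch. The helper name rest_after_words and the re.DOTALL flag are assumptions added for the example, and the word count is a parameter so it can be tried on short strings.

```python
import re

def rest_after_words(text, n):
    """Return whatever follows the first n words, or '' if the pattern does not match."""
    # Same shape as the pattern above, with the word count filled in.
    # re.DOTALL lets '.' cross line breaks (an assumption about the input).
    pattern = r"(\w+(-\w+|\b).*?){" + str(n) + r"}(.*)"
    match = re.match(pattern, text, re.DOTALL)
    # Group 3 is the trailing (.*), i.e. everything after the counted words.
    return match.group(3) if match else ""

print(rest_after_words("one two three four five", 2))
```

The remainder keeps its leading whitespace, so strip it if that matters. On long inputs with fewer than n words the pattern may backtrack heavily before giving up.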

Related

Get a part of the second group

I'm having some difficulties with regex.
Here is an example of the string on which I'm doing regex:
This is some useless information (first;second;third;fourth;fifth;sixth) (seventh;eigth;ninth;tenth)
I am looking for a regex that will allow me to pick only one of the words in parentheses, like 'ninth'. The word I need to pick depends on where I am in my program, so I will just adapt the regex once I know how to write it.
The best I have found for the moment is : (?<=\()([^]]+?)(?=\)).*?
That allows me to match the whole content of the group between parentheses.
Can someone help me please?
If you need to match the contents between parentheses given a variable input parameter, it can be done like this:
(?<=\()(?:(?![()]).)*?(?<=[(;])(ninth)(?=[);])(?:(?![()]).)*(?=\))
It is dynamically constructed by joining the three parts.
(?<=\()(?:(?![()]).)*?(?<=[(;])( + variable + )(?=[);])(?:(?![()]).)*(?=\))
https://regex101.com/r/6yxQyp/1
Where the variable is captured in group 1 if needed.
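The three-part construction can be sketched in Python as follows. The helper name find_in_parens is hypothetical, and escaping the variable with re.escape is an addition not mentioned in the answer; it only matters if the word may contain regex metacharacters.

```python
import re

TEXT = ("This is some useless information "
        "(first;second;third;fourth;fifth;sixth) (seventh;eigth;ninth;tenth)")

def find_in_parens(word, text):
    """Return the match for word as a whole item inside (...), or None."""
    pattern = (
        r"(?<=\()(?:(?![()]).)*?(?<=[(;])("   # part 1: up to the start of an item
        + re.escape(word)                      # part 2: the variable, escaped
        + r")(?=[);])(?:(?![()]).)*(?=\))"     # part 3: rest of the same group
    )
    return re.search(pattern, text)

m = find_in_parens("ninth", TEXT)
print(m.group(1), "|", m.group(0))
```

Group 1 holds the word itself and group 0 the whole parenthesised content that contains it. A partial item such as "nin" does not match, because the lookahead requires a ';' or ')' right after the word.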
Thank you for your help.
I finally found a regex that allows me to get the value I wanted.
To reuse my example :
This is some useless information (first;second;third;fourth;fifth;sixth) (seventh;eigth;ninth;tenth)
The words 'first' and 'seventh' will always be there, so if I want to get the value of second I will use :
\(first;(\w+)
To get ninth's value I will use :
\(seventh;\w+;(\w+)
Hope this will help someone else !
Have a good day :)
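The positional approach above could be generalised along these lines in Python. This is only a sketch: field_after and its skip parameter are hypothetical names, and it assumes, as in the example, that every item consists of word characters only.

```python
import re

TEXT = ("This is some useless information "
        "(first;second;third;fourth;fifth;sixth) (seventh;eigth;ninth;tenth)")

def field_after(anchor, skip, text):
    """Capture the item that follows `anchor` after stepping over `skip` items."""
    # With anchor="seventh" and skip=1 this behaves like \(seventh;\w+;(\w+)
    pattern = r"\(" + re.escape(anchor) + r"(?:;\w+){" + str(skip) + r"};(\w+)"
    m = re.search(pattern, text)
    return m.group(1) if m else None

print(field_after("first", 0, TEXT))
print(field_after("seventh", 1, TEXT))
```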

Search and replace with particular phrase

I need help with a mass search and replace using regex.
I have longer strings in which I need to look for any number followed by a particular string, e.g. 321BS, and I need to replace just the text string I was looking for. So I need to find BS in "gf test test2 321BS test" (the pattern is always the same, only the position differs) and change just BS.
Can you please help me find a regex for this?
Update: I need to keep the number and change just the text string. I will be doing this in Notepad++. However, I need a general function for this if possible. I am a rookie in regex. Moreover, is it possible to do it in Trados SDL Studio? Or how can I do it in an Excel file in bulk?
Thank you very much!
Your question is a bit vague; however, as I understand it, you want to match any digits followed by BS, i.e. 123BS. You want to keep 123 but replace BS?
Regex: (\d+)BS matches 123BS
In notepad++ you can:
match (\d+)BS
replace \1NEWTEXT
This will replace 123BS with 123NEWTEXT.
\1 will substitute the capture group (\d+), which matches 1 or more digits.
You could do this in Trados Studio using an app. The SDLXLIFF Toolkit may be the most appropriate for you. The advantage over Notepad++ is that it's controlled and will only affect the translatable text and not anything that might break the integrity of the file if you make a mistake. You can also handle multiple files, or even multiple Trados Studio projects in one go.
The syntax would be very similar to the suggestion above... you would:
match (\d+)BS
replace $1NEWTEXT
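Outside Notepad++ and Trados Studio, the same capture-group replacement could be scripted. Here is a minimal Python sketch; the function name replace_suffix is hypothetical, and it assumes the text to process is already available as plain strings.

```python
import re

def replace_suffix(text, old, new):
    """Replace `old` with `new` wherever `old` directly follows a number."""
    # (\d+) captures the number; the callback puts it back in front of `new`,
    # so nothing in `new` is ever interpreted as a backreference.
    return re.sub(r"(\d+)" + re.escape(old),
                  lambda m: m.group(1) + new,
                  text)

print(replace_suffix("gf test test2 321BS test", "BS", "NEWTEXT"))
```

An occurrence of BS that does not follow digits is left untouched, which matches the requirement of changing only the number-plus-text pattern.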

Regex to match sentences with jumbled words but preserving sentence order

I want to match sentences in such a way that the words within a sentence can be in any order, but the sentences themselves must stay in the same order.
e.g.
My name is Sam. I love regex.
Acceptable input:
My Sam is name. regex I love.
name is My Sam. I regex love.
Invalid input:
I love regex. My name is Sam.
regex I love. is My name Sam.
Sample regex I have come up with so far to solve the above problem:
^((?=.*\bMy\b)(?=.*\bSam\b)(?=.*\bis\b)(?=.*\bname\b))((?=.*\bregex\b)(?=.*\bI\b)(?=.*\blove\b)).*$
Which is not working as expected.
Can this problem be solved by regex? What would be the recommended approach to solve this?
Note: Please ignore . I am using it just for clarity.
I think you are looking for something else than regex. If you would want to do this, the most efficient way would be to compare an array of expected words and 'check' if they all appear once in a sentence. This is completely dependent on which context you are using. If you need a regex that literally finds what you stated in your example, you could use something like this:
/(My|name|is|Sam) (My|name|is|Sam) (My|name|is|Sam) (My|name|is|Sam)\. (I|love|regex) (I|love|regex) (I|love|regex)\./g
But as you can see, this regex grows quickly with the number of words in your sentence, and it does not check that each word is used exactly once (it also accepts "My My My My."). Also, it's really inefficient compared to parsing the text with something else.
I couldn't achieve this with a single regex; instead I did the following:
Virtually divided the text into multiple sentence blocks.
Maintained a sentence block -> regex configuration.
The regex configuration depends on the rule applicable to that sentence block.
Applied the regex to the text to identify whether such a block exists.
Finally, verified whether the blocks appear in the configured order.
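The non-regex route suggested in the answers (compare the expected words of each sentence, then check the sentence order) might look like this in Python. It is a sketch under assumptions: the function names are hypothetical, sentences are split on '.', and the comparison is case-sensitive.

```python
import re
from collections import Counter

def sentence_bags(text):
    """Split text on '.' and count the words of each non-empty sentence."""
    sentences = [s for s in text.split(".") if s.strip()]
    return [Counter(re.findall(r"\w+", s)) for s in sentences]

def same_sentences_any_word_order(expected, candidate):
    # Lists compare element by element, so sentence order is enforced,
    # while each Counter ignores the order of words inside a sentence.
    return sentence_bags(expected) == sentence_bags(candidate)

EXPECTED = "My name is Sam. I love regex."
print(same_sentences_any_word_order(EXPECTED, "My Sam is name. regex I love."))
print(same_sentences_any_word_order(EXPECTED, "I love regex. My name is Sam."))
```

Using a Counter per sentence also rejects input that repeats or omits a word, which the alternation-based regex cannot do.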

Regular Expression to find CVE Matches

I am pretty new to the concept of regex, so I am hoping an expert user can help me craft the right expression to find all the matches in a string. I have a string that contains a lot of support information for vulnerability data. In that string is a series of CVE references in the format CVE-2015-4000. Can anyone provide a sample regex for finding all occurrences of that? Obviously, the numeric part changes throughout the string...
Generally you should always include your previous efforts in your question, what exactly you expect to match, etc. But since I am aware of the format and this is an easy one...
CVE-\d{4}-\d{4,7}
This matches first CVE- then a 4-digit number for the year identifier and then a 4 to 7 digit number to identify the vulnerability as per the new standard.
See this in action here.
If you need an exact match without any syntax or logic violations, you can try this:
^(CVE-(1999|2\d{3})-(0\d{2}[1-9]|[1-9]\d{3,}))$
You can run this against the test data supplied by MITRE here to test your code or test it online here.
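To try the strict pattern on single identifiers, a small Python check might look like this. The names STRICT_CVE and is_valid_cve are hypothetical, and fullmatch is used so that a trailing newline is not accepted.

```python
import re

# The strict pattern from the answer above, unchanged.
STRICT_CVE = re.compile(r"^(CVE-(1999|2\d{3})-(0\d{2}[1-9]|[1-9]\d{3,}))$")

def is_valid_cve(identifier):
    """True if the whole string is a CVE ID accepted by the strict pattern."""
    return STRICT_CVE.fullmatch(identifier) is not None

for candidate in ["CVE-2015-4000", "CVE-2015-0000", "CVE-1998-1234"]:
    print(candidate, is_valid_cve(candidate))
```

One thing to be aware of: as written, the pattern rejects any four-digit ID that starts with 0 and ends in 0, such as CVE-2015-0010, which may or may not be what you want.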
I will add my two cents to the accepted answer. In case we want to match "CVE" case-insensitively, we can use the following regex:
r'(?i)\bcve\-\d{4}-\d{4,7}'
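For completeness, here is a minimal Python sketch that collects every occurrence in a string. The names CVE_PATTERN and find_cves and the sample text are made up for the example, and re.IGNORECASE is used in place of the inline (?i) flag.

```python
import re

# Case-insensitive version of CVE-\d{4}-\d{4,7}, anchored on a word boundary.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,7}", re.IGNORECASE)

def find_cves(text):
    """Return all CVE identifiers found in text, in order of appearance."""
    return CVE_PATTERN.findall(text)

sample = "Fixes CVE-2015-4000 and cve-2014-0160; see also CVE-2016-1234567."
print(find_cves(sample))
```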

Combining regex groups to one group or exlude a character from a match

I have this string, and I need to get the datetime out of it by using regex. I have little to no experience with regex and am stuck.
As an example, take this string: Vic-nc_20150406_0100
I want to get the following result: 201504060100
How am I to accomplish this? So far I've come up with this expression: ([0-9]{8})_([0-9]{4}), although the result is two groups (20150406 and 0100).
Another expression I've come up with is ([0-9]{8}_[0-9]{4}), now the result is 20150406_0100.
I either need to combine the groups or filter out the [_] somehow. Can anybody help me out?
Thanks in advance!
If you want to replace, then just take the value of two groups.
Find (\d{8})_(\d{4})
Replace \1\2 or $1$2, depending on your programming language.
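If you are extracting rather than replacing, the two groups can simply be concatenated. Here is a small Python sketch; the function name extract_timestamp is hypothetical, and the final strptime call assumes the digits mean year, month, day, hour and minute.

```python
import re
from datetime import datetime

def extract_timestamp(name):
    """Return the date and time digits joined together, or None if absent."""
    match = re.search(r"(\d{8})_(\d{4})", name)
    if match is None:
        return None
    # Joining the two groups drops the underscore between them.
    return match.group(1) + match.group(2)

stamp = extract_timestamp("Vic-nc_20150406_0100")
print(stamp)
# Optional: turn the digits into a datetime (assumed layout YYYYMMDDHHMM).
print(datetime.strptime(stamp, "%Y%m%d%H%M"))
```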