How to create capturing groups in regular expressions? [duplicate] - regex

This question already has answers here:
How do I regex match with grouping with unknown number of groups
(6 answers)
Closed 3 years ago.
I'm trying to use Regex to classify groups of data dependent on how many sectors are within an array. E.g.:
group = One journey
group|group = Two journeys
group|group|group = Three journeys
Could someone tell me the best practice way to do this please?
EDIT: Apologies but I'm pretty new to RegEx and still trying to work things out. I don't know which language I'm using but the tool I'm building these into is Adobe Analytics - using the Classification Rule Builder.
Also, this question has been marked as duplicate but I can't say I found the other thread particularly helpful.
I've also tried experimenting using Regex101 but still can't get my head around this. Thanks.

For such a case, you need to wrap the part you want to match in a capturing group; the exact syntax depends on the language you are using. For example, in Python you can use:
(\w+)
This regex captures each run of word characters (roughly [a-zA-Z0-9_]), so it matches each piece of text between the pipes. Counting the matches then tells you how many groups there are.
By the way, to test your regex and learn by trial and error, you can use tools like this one.
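As a rough illustration of the counting idea, here is a minimal Python sketch. It assumes the sectors are pipe-separated runs of word characters, as in the question. The classify function and the LABELS table are hypothetical names introduced only for this example, and the Classification Rule Builder in Adobe Analytics will have its own way of expressing the rule.

```python
import re

# Hypothetical lookup table built from the labels in the question.
LABELS = {1: "One journey", 2: "Two journeys", 3: "Three journeys"}


def classify(value):
    """Count the word-character runs in value and map the count to a label."""
    sectors = re.findall(r"\w+", value)  # one match per sector between pipes
    count = len(sectors)
    # Fall back to a generic label for counts not listed above (an assumption).
    return LABELS.get(count, f"{count} journeys")


print(classify("group|group"))
```

The count, not the captured text, drives the classification here, so the same pattern works for any number of sectors.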

Related

Replacing a certain number of characters after a match in regular expression [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I want to find any instance of
\/Network\/Cambric Corporate Center\/
and replace the next 8 characters with nothing.
So something like this
\/Network\/Cambric Corporate Center\/1164.DEV1164
Turns into this
\/Network\/Cambric Corporate Center\/1164
I'm working with a built-in regex replace block in a visual programming tool, and this is my first post here, so please ask if more info is needed. I basically just need the result to look like this
\/Network\/Cambric Corporate Center\/1164
and I'm open to another solution that does not use replace.
It is for a large, frequently updated data source that I need to edit to make it more compatible with arrays.
Try (\/Network\/Cambric Corporate Center\/).{8} and replace with $1. This keeps the first group (the path prefix) and drops the eight characters that follow it.
Here's the regex with the replacement: https://regex101.com/r/F4Y4VD/1
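For readers who want to try the same replacement outside regex101, here is a small Python sketch. It assumes the data contains plain forward slashes (treating the \/ in the question as escaping), and the names PREFIX and strip_eight are hypothetical. Note that Python writes the backreference as \1 rather than $1.

```python
import re

# Assumed literal prefix; re.escape handles the space and any special characters.
PREFIX = "/Network/Cambric Corporate Center/"

# Group 1 is the prefix, followed by exactly eight arbitrary characters.
PATTERN = re.compile("(" + re.escape(PREFIX) + ").{8}")


def strip_eight(text):
    """Remove the eight characters that follow every occurrence of PREFIX."""
    return PATTERN.sub(r"\1", text)


print(strip_eight("/Network/Cambric Corporate Center/1164.DEV1164"))
```

If fewer than eight characters follow the prefix on a line, the pattern does not match and the text is left unchanged.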

How to create a regexp that ends in a line break? [duplicate]

This question already has answers here:
Differences between`[.]` vs `.` in regex
(2 answers)
Closed 3 years ago.
I have (looong) inputs that are lists of sentences/bullets like the following:
Broker and broker´s fees: 不適合
Specific purpose or use for the present acquisition or disposal: 因應內部管理需要,調整投資架構
Other issues to be disclosed: 無
In order to "translate" the Chinese text, I want to create objects, in a regexp fashion, so I can later transform the second captured group according to what it says.
I thought something like the following would work:
Specific_purpose = /(Specific purpose or use for the present acquisition or disposal: )([.]+)(\n)/
Other_issues = /(Other issues to be disclosed: )([.]+)(\n)/
i.e. each of these regexps should be composed of captured group 1 (the title in English), captured group 2 (the section in Chinese) and captured group 3, the new line that indicates where the object ends.
Still, the code does not work and I cannot even get Ruby to find the needed objects in the input. If, for example, I add:
if input.include? Specific_purpose.to_s
puts "Yes, I found such bullet "
else
puts "No, there is no such bullet"
end
I keep getting "No, there is no such bullet", no matter how I rewrite the regexp.
Am I doing something wrong here? How do I create a regexp that will match everything until the line break?
Your original attempt fails because [.] matches only a literal dot, and include? does a plain substring comparison rather than a regex match. Since each line contains a colon that separates the English text from the Chinese text, you can capture the English part in group 1 and the Chinese part in group 2. Try using this regex:
(.+):\s+(.+)
Demo
Let me know if you face any issues.
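The question uses Ruby, but the pattern itself is portable. As a hedged sketch in Python, the following applies the same regex line by line and collects the captured pairs; parse_bullets and LINE_RE are hypothetical names, and the sample input is taken from the question.

```python
import re

# Group 1: English title before the colon; group 2: the text after it.
# The dot does not match a newline, so each match stays on one line.
LINE_RE = re.compile(r"(.+):\s+(.+)")


def parse_bullets(text):
    """Return a dict mapping each English title to the text that follows it."""
    return {m.group(1): m.group(2) for m in LINE_RE.finditer(text)}


sample = (
    "Broker and broker´s fees: 不適合\n"
    "Other issues to be disclosed: 無\n"
)
for title, body in parse_bullets(sample).items():
    print(title, "->", body)
```

Because the first group is greedy, a line containing several colons is split at the last colon that is followed by whitespace.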

Regex, take last match before suffix [duplicate]

This question already has answers here:
Tempered Greedy Token - What is different about placing the dot before the negative lookahead?
(3 answers)
Closed 4 years ago.
I know this is going to sound like the kind of question that's been asked hundreds of times. But I've been searching for over an hour and none of the solutions I found worked in my case.
I have many different numbers of the form
\d*'?\d+\.\d\d
An example of string I work with would be
The base item costs 1'245.48, the tax is of 18.45 and the bonus of 250.00, the total price is of 1'013.93. In case of trouble, contact our e-mail. Bank account 784.45
I want to get ONLY the last match corresponding to my regex before e-mail, i.e. 1'013.93. I would like to use only regex, with no extra Python, JavaScript or anything else.
I have tried code inspired by this Regex Last occurrence?, this How to capture only last match in Regex, this Find Last Occurrence of Regex Word, and many other expressions of my own, but so far there always seems to be one piece missing
For example, after successfully selecting the very last number with (\d*'?\d+\.\d\d)(?!.*\d*'?\d+\.\d\d), I tried (\d*'?\d+\.\d\d)(?!.*\d*'?\d+\.\d\d)(?=e-mail), which does not match anything.
Any insights?
You could try this:
((\d+')?\d+(\.\d+)?)(?=[^\d]+e-mail)
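The first group matches the number you want. The lookahead requires that only non-digit characters separate the number from e-mail, so earlier numbers are rejected.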
Something like this with an extra number format check:
((\d{1,3}')*(\d{1,3})\.\d{2})(?=\D+e-mail)
Demo
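To check the stricter pattern against the example sentence from the question, here is a minimal Python sketch. Python is used only as a test harness, and last_amount_before_email is a hypothetical helper name. The regex does the selection on its own: the lookahead lets only non-digits sit between the number and e-mail.

```python
import re

# Number with optional apostrophe-separated thousands and two decimals,
# followed (without consuming) by non-digits and then the word "e-mail".
AMOUNT_RE = re.compile(r"((\d{1,3}')*(\d{1,3})\.\d{2})(?=\D+e-mail)")


def last_amount_before_email(text):
    """Return the amount closest before 'e-mail', or None if there is none."""
    match = AMOUNT_RE.search(text)
    return match.group(1) if match else None


sentence = (
    "The base item costs 1'245.48, the tax is of 18.45 and the bonus of "
    "250.00, the total price is of 1'013.93. In case of trouble, contact "
    "our e-mail. Bank account 784.45"
)
print(last_amount_before_email(sentence))
```

Earlier amounts fail the lookahead because another digit appears before e-mail is reached, and 784.45 fails because no e-mail follows it.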

Remove first char from string - Regex [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I have started using Workflow on iOS to help speed up tasks at work. One of those is entering delivery records into the computer (via the iPad barcode scan function) instead of manually writing down the ref code and then typing it in.
Workflow has a "Replace Text" function that can be used with regexs to strip out characters etc.
I have managed to find a regex to get rid of the last digit in a scan (a checksum digit, always a capital letter).
The regex is simple.
.{0}-$.
This goes in the "Find Text" field. The "Replace With" is left empty. It works wonderfully.
How can I adapt this to work with other scan types where I want to specifically get rid of the FIRST character only? I've searched the forums but can only find long, hard-to-interpret regexes that I am sure won't do what I am trying to achieve, which is something simple by comparison.
An example of what I mean is converting "Y300006944" to "300006944".
You can use the following regex:
^.(.*)$
and use the backreference $1 as the replacement. Group 1 holds everything after the first character, so the first character is dropped.
Good luck.
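The effect of that find-and-replace can be sketched in Python as follows. This is only an illustration of the regex, not of how Workflow applies it internally; drop_first_char is a hypothetical name, and Python writes the backreference as \1 instead of $1.

```python
import re


def drop_first_char(code):
    """Remove the first character by keeping only what group 1 captured."""
    # ^. consumes the first character; (.*)$ captures the rest of the string.
    return re.sub(r"^.(.*)$", r"\1", code)


print(drop_first_char("Y300006944"))
```

An equivalent and shorter form would be to find ^. and replace it with nothing.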
Thanks to those who contributed something useful :)
I got it resolved by using the "Split Text" function in Workflow for iOS.
I gave it the command to split based on a custom char, "Y" in this case. It's enough in this simple case.

Capture everything after one word [duplicate]

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 6 years ago.
I am trying to make a regular expression capture any words in the specific line after the word Attachment:
This question is for work, so it is not a homework or test question. I took the paragraph below as an example from www.regular-expressions.info. I did not major in computers but Psychology so this is completely foreign to me. I've read the manuals for the last two days, and because this is going over my head, I don't know how to begin.
I have a task which involves me linking the attachments to a specific file with the same name saved in a folder (at least 500 attachments) on Adobe PDF. What I did before was to manually select the words and link it to a specific file in a folder, but it is tedious to do when they can go up to 500 attachments.
I was aware of an application plug-in called EVERMAP that you can download for Adobe to automatically link specific words to a specific file in a folder. However, it requires me to use regular expressions which again, I don't know how to use.
I will bold the words I want to capture in the paragraph below.
The repetition operator manual expand the match as far as they, and only come back if they must to satisfy the remainder.
Attachment: The repetition operator manual
The asterisk or star tells the engine to attempt to match the preceding token zero or more times. The plus tells the engine to attempt to match the preceding token once or more.
Attachment: Asterisk and stars engine
Attachment: (.+) should work in your case unless there are other exceptions to this rule. The regex simply tells the parser to capture one or more characters after the word Attachment:. See here for the sample
Like @Kevin said, the regex is simple. Use Attachment: (.+).
Maybe you are confused about how to use regex. I don't know about the Evermap plugin, but you can copy all the text from the PDF into Sublime Text (a text editor that opens .txt files and has a lot of extra features) and do the regex part there. Since you are not a programmer, the simplest route is to have the regex also remove the irrelevant lines. So the regex will be:
`^\s*Attachment:\s*(.+)$|^(?!Attachment:).+$`
And replace it with:
`\1`
\1 is a variable containing the value captured by the first group in ().
In Sublime Text, open Find and Replace, then apply the regex there. Don't forget to turn on regex mode.
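For anyone who prefers a script to an editor, the extraction step can be sketched in Python. This is a minimal sketch that uses only the first alternative of the pattern above and assumes the PDF text has already been copied into a string; extract_attachments is a hypothetical name.

```python
import re

# With re.MULTILINE, ^ and $ match at the start and end of every line.
ATTACHMENT_RE = re.compile(r"^\s*Attachment:\s*(.+)$", re.MULTILINE)


def extract_attachments(text):
    """Return the text following 'Attachment:' on each line where it appears."""
    return ATTACHMENT_RE.findall(text)


sample = (
    "The plus tells the engine to attempt to match the preceding token.\n"
    "Attachment: The repetition operator manual\n"
    "The asterisk or star tells the engine to match zero or more times.\n"
    "Attachment: Asterisk and stars engine\n"
)
print(extract_attachments(sample))
```

Collecting the matches directly avoids the blank lines that the replace-based approach leaves behind for every non-attachment line.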