How to find all symbols between characters - regex

I need to match every - character that lies between the ### markers.
Input string: ### qwerty-qwerty-qwerty-qwerty - - - ###
My attempt so far:
(?<=###)\s?([\-]*)\s?(?=###)
Thanks in advance.
http://regex101.com/r/jL9lZ9/1

You could try the regex below to match the - symbols that are present within ###. It relies on the (*SKIP)(*F) backtracking verbs, so it needs a PCRE-style engine:
(?:^(?:(?!###).)*(?=###.*?###)|(?<=###)(?:(?!###).)*$)(*SKIP)(*F)|-
DEMO

(?!.*?###.*?###.*?)(?=.*?###)-
This works as well.
See Demo:
http://regex101.com/r/jL9lZ9/4
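
As a rough check, the lookahead pattern above can be tried in Python, whose standard re module supports these lookaheads (but not the (*SKIP)(*F) verbs). The function name inner_hyphens is only illustrative, and the sketch assumes a single ### ... ### pair per line, which is what the pattern relies on.

```python
import re

# Pattern from the answer above: a "-" that still has a "###" ahead of it,
# but not two of them (so it sits after the opening and before the closing marker).
INNER_HYPHEN = re.compile(r"(?!.*?###.*?###.*?)(?=.*?###)-")


def inner_hyphens(line):
    """Return the indices of every '-' found between the two ### markers."""
    return [m.start() for m in INNER_HYPHEN.finditer(line)]


line = "### qwerty-qwerty-qwerty-qwerty - - - ###"
print(len(inner_hyphens(line)))
```

Hyphens before the opening ### or after the closing ### are skipped, because they see two markers ahead or none at all.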

This one is a two-step approach: slightly different, but easier to understand. Use it when you want to extract a pattern (-) that lies between a specific delimiter (###).
text="- qwe--### -- - qwerty- --## -qwerty --- ###- qwerty-- qw- - ### - rty--"
Note that there is a double hash (##) in here; assume that we want only the -s between triple hashes (###).
First extract the enclosed text with (?<=###)[^#]*(?=###). Be aware that [^#]* stops at any single #, so a segment that itself contains ## (like the first one above) is not matched; (?<=###)(?:(?!###).)*(?=###) tolerates stray # characters.
After that, simply match - within the extracted text.
You can replace the delimiter and the search pattern as required.
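
The two steps could look roughly like this in Python. This is a sketch under assumptions: instead of the lookarounds above it consumes the delimiters in pairs with ###(.*?)###, so text between a closing marker and the next opening marker is not treated as enclosed. The function name hyphens_in_pairs and its delim parameter are made up for illustration.

```python
import re


def hyphens_in_pairs(text, delim="###"):
    """Return one list of '-' matches per delimiter-enclosed segment."""
    d = re.escape(delim)
    # Step 1: lazily capture the text between an opening and a closing delimiter.
    segments = re.findall(d + r"(.*?)" + d, text, flags=re.DOTALL)
    # Step 2: collect the hyphens inside each captured segment.
    return [re.findall(r"-", segment) for segment in segments]


text = "- qwe--### -- - qwerty- --## -qwerty --- ###- qwerty-- qw- - ### - rty--"
for hyphens in hyphens_in_pairs(text):
    print(len(hyphens))
```

Because the lazy .*? only stops at a full ###, the stray ## inside the first segment does not end the match.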

Related

can regex be used to index/slice parts of string?

So I have a list of serial numbers in the following format:
Serial Number: CN073GTT74445714892L
I was wondering if regex can be used to extract just the last 6 chars?
So in this case, it is 14892L
I forgot to mention: there is other, unrelated text in the document, so how would I make sure the pattern only matches after "Serial Number: "?
EDIT - this worked (?<=\s.{29}).{6}$

You can do it with a regex:
.{6}$
Demo
But you can also do it without a regex, which is the advisable solution. E.g. in Ruby:
"CN073GTT74445714892L"[-6..-1]
in Python:
In [4]: "CN073GTT74445714892L"[-6:]
Out[4]: '14892L'

Regex is ideally used to identify patterns. If you are only interested in the last 6 characters, plain string manipulation will work too.
E.g. in Python, you could use:
serial = "CN073GTT74445714892L"
serial[-6:]
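
To cover the follow-up about unrelated text in the document, one option is to combine both ideas: let a regex locate the value after the label, then slice it. The sketch below assumes the serial consists of word characters and directly follows "Serial Number:"; the sample document and the function name are hypothetical.

```python
import re

# Hypothetical document; only the "Serial Number:" line comes from the question.
document = """Some unrelated text
Serial Number: CN073GTT74445714892L
More unrelated text 123456"""


def last_six_of_serials(text):
    """Find every serial after the label and keep its last 6 characters."""
    serials = re.findall(r"Serial Number:\s*(\w+)", text, flags=re.IGNORECASE)
    return [serial[-6:] for serial in serials]


print(last_six_of_serials(document))
```

Capturing the whole serial first avoids hard-coding its length, as the (?<=\s.{29}) lookbehind in the question does.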

HiveQL - extract regular expression that matches a pattern at the end of the string

This might be a silly question, but I can't seem to overcome this myself -
I have a field with strings, which sometimes end with 3 numbers separated by dots, for example
- 2353535.123213.124
- data.2354234.1324.1314
- data.old-24234.2341.4325
and sometimes not
- aaaa.53535
- data.old-3521
- data.AFG34fsaf34
Whenever the first case occurs, I need to extract the 3-numbers pattern from the end of the string. Meaning:
- 2353535.123213.124 -> 2353535.123213.124
- data.2354234.1324.1314 -> 2354234.1324.1314
- data.old-24234.2341.4325 -> 24234.2341.4325
- aaaa.53535 -> Do nothing
Is that possible?
If not through hiveQL (although this is preferable), even a java regular expression extraction would be helpful (to use in a custom UDF).

\\d+(?:\\.\\d+){2}$
You can use this Java regular expression (the backslashes are doubled, as in a Java string literal). See demo
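
The pattern can be checked against the sample values from the question with a short Python sketch (Python is used here only for testing; the raw string needs single backslashes). In Hive the same pattern would presumably be passed to the built-in regexp_extract function, but that call is not shown or tested here. The helper name trailing_triple is illustrative.

```python
import re

# Same pattern as above: three dot-separated numbers anchored at the end.
TRAILING_TRIPLE = re.compile(r"\d+(?:\.\d+){2}$")

samples = [
    "2353535.123213.124",
    "data.2354234.1324.1314",
    "data.old-24234.2341.4325",
    "aaaa.53535",
    "data.old-3521",
    "data.AFG34fsaf34",
]


def trailing_triple(value):
    """Return the trailing 3-number group, or None if the string has none."""
    match = TRAILING_TRIPLE.search(value)
    return match.group(0) if match else None


for value in samples:
    print(value, "->", trailing_triple(value))
```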

How to group provided string correctly?

I have the following regex:
^([A-Za-z]{2,3}\d{6}|\d{5}|\d{3})((\d{3})?)(\d{2}|\d{3}|\d{6})(\d{2}|\d{3})$
I use this regex to match different, yet similar strings:
# MOR644-004-007-001
MOR644004007001 # string provided
# VUF00101-050-08-01
VUF001010500801 # string provided
# MF001317-077944-01
MF00131707794401 # string provided
These strings need to be grouped as shown in the comment above each string; my problem is that the regex does not group them correctly.
The first string: MOR644004007001 is grouped: (MOR644004) (007) (001) which should be (MOR644) (004) (007) (001)
The second string: VUF001010500801 is grouped (VUF001010) (500) (801) which should be (VUF00101) (050) (08) (01)
How can I change ([A-Za-z]{2,3}\d{6}|\d{5}|\d{3})((\d{3})?) so that it would group the provided string correctly?

I am not sure that you can do what you want to.
Let's consider the first two strings:
# MOR644-004-007-001
MOR644004007001 # string provided
# VUF00101-050-08-01
VUF001010500801 # string provided
Both strings consist of 3 letters followed by 12 digits. Thus, given a regex R that does not depend on particular (sequences of) characters or digits (i.e., it uses [A-Za-z] and \d but not, say, MO or 0070), R will split both strings in the same way.
So, if you want the strings to be grouped differently, you need to look at particular occurrences of certain characters or digits. We need more data from you in order to give you an answer.
Finally, I suggest you take a look at this tool:
http://regex.inginf.units.it/ (demo: http://regex.inginf.units.it/demo.html). It is a research project that automatically generates a regex from (many) examples of extraction. I warmly suggest you try it, especially if you are sure that an underlying pattern exists in your case (e.g. strings beginning with VUF must be matched differently from strings beginning with MOR) but you are unable to find it. Again, you will need to provide many examples to the engine. Needless to say, if a generic pattern does not exist, the tool won't find it ;)

Considering your comment to Serv I'd say the (only?) solution is to have one regex for each possibility, like -
MOR(\d{3})(\d{3})(\d{3})(\d{3})|VUF(\d{5})(\d{3})(\d{2})(\d{2})|MF(\d{6})(\d{6})(\d{2})
and then use the execution environment (JS/php/python - you haven't provided which one) to piece the parts together.
See example on regex101 here. Note that substitution, only as an example, matches only the second string.
Regards
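
Piecing the parts together in the execution environment might look like this in Python. It is only a sketch: the per-prefix layouts are inferred from the three sample strings in the question, the prefix is kept inside the first group so it can be rejoined, and the names PATTERNS and format_code are invented for illustration.

```python
import re

# One pattern per known prefix; the digit counts are assumptions
# read off the three sample strings in the question.
PATTERNS = [
    re.compile(r"(MOR\d{3})(\d{3})(\d{3})(\d{3})"),
    re.compile(r"(VUF\d{5})(\d{3})(\d{2})(\d{2})"),
    re.compile(r"(MF\d{6})(\d{6})(\d{2})"),
]


def format_code(code):
    """Split a code using the first layout that matches it completely."""
    for pattern in PATTERNS:
        match = pattern.fullmatch(code)
        if match:
            return "-".join(match.groups())
    return None  # unknown prefix or unexpected length


for code in ("MOR644004007001", "VUF001010500801", "MF00131707794401"):
    print(code, "->", format_code(code))
```

Keeping the layouts in a list means a new prefix only needs one more entry rather than a larger alternation.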

Take a look at this. I have used what is called a named group. As others pointed out earlier, it's better to have one regex per string format. I have shown it here for the first string, MOR644004007001; you can easily extend it to the other two strings:
import re
# MOR644-004-007-001
MOR = "MOR644004007001" # string provided
# VUF00101-050-08-01
VUF = "VUF001010500801" # string provided
# MF001317-077944-01
MF = "MF00131707794401" # string provided
# Named groups: a 6-character prefix followed by three groups of 3 digits
MORcompile = re.compile(r'(?P<first>\w{6})(?P<second>\d{3})(?P<third>\d{3})(?P<fourth>\d{3})')
MORsearch = MORcompile.search(MOR.strip())
print(MORsearch.group('first'))
print(MORsearch.group('second'))
print(MORsearch.group('third'))
print(MORsearch.group('fourth'))
MOR644
004
007
001

RegEx to verify: abc123(30x2) and variations there of

I'm trying to develop a regex to verify a pattern that will match the following:
abc123
Ab3TF56G
BD356-2
abc123(3x4)
Ab3TF56G(24x37)
BD356-2(105x04)
abc123 (3x4)
Ab3TF56G (24x37)
BD356-2 (105x04)
abc123(3x4x10)
Ab3TF56G(24x37x3)
BD356-2(105x04x14)
abc123 (3x4x10)
Ab3TF56G (24x37x3)
BD356-2 (105x04x14)
I'm admittedly terrible at RegEx, but am following the guide at: www.regexr.com, and have come up with this so far:
([A-Za-z0-9])\((\d[x^)]\d+)\)+
Unfortunately, it stops working when I start trying to account for the possible dash and parentheses.
• The alpha-numeric set can be any length
• That sequence can, but does not require a dash followed by an integer
• Which can also be followed by a open & close parentheses with integers separated by the "x" character (basically dimensions)
Any help would be much appreciated.
EDIT
In addition, the following should fail:
abc123 (3x4x10)shs
sdlk234(3x)
sdlk234(3x0)
sdlk234-2 (3x)333
Ab3T F56G

Try this:
([a-zA-Z0-9-]+)\s?(\([\dx]+\))?
See it working here: https://regex101.com/r/pU9oR4/1
Here is a graphical representation: https://www.debuggex.com/r/uVGo8mrIUYhXHxjP
EDIT
With your examples of what shouldn't match, it turns out to be a bit harder, so your new pattern should be:
^([a-zA-Z0-9-]+\b)([\s\d-])?(\((?:(?!0)[\d]+)((x(?:(?!0\b)[\d]+))(x(?:(?!0\b)[\d]+))?)\))?$
(edited again)
See it working here: https://www.debuggex.com/r/dxPPbPw0mUKQPRWg
I also added validation so that the following don't match:
sdlk234(3x0x0)
sdlk234(3x1x0)
sdlk234(0x1x1)
This follows your logic for dimensions.

101 Regexp Demo
^[\w-]+\s*(\((?!0\b)\d+(x(?!0\b)\d+)+\))?$
(?!0\b): negative lookahead; makes sure that what follows is not a 0 immediately followed by a word boundary, i.e. a standalone 0
\b: asserts position at a word boundary (^\w|\w$|\W\w|\w\W)

add character before first word in line

I want to add a minus sign "-" in front of the first word of each line in Vim. The lines contain spaces for indentation, and the indentation must not be touched. E.g.
As Is
list point 1
sub list point 2
and so on...
I want
- list point 1
- sub list point 2
- and so on...
I can find the first word, but I struggle with replacing it in the correct way.
^\s*\w
in Vim
/^\s*\w
But in the replacement I always remove the complete matched part...
:s/^\s*\w/- \w/
Which leads to
- ist point 1
- ub list point 2
- nd so on...

Use & which is replaced with the matched string:
:%s/\w/- &

I'm late to the party but:
:%norm! I- <CR>
And another one with :s:
:%s/^\s*/&- /

An alternative to falsetrue's answer: You can capture the first word character and print it out along with the leading -:
%s/\(\w\)/- \1/

:normal cmd may help too:
:%norm! wi-
Note that there is a space after the -.
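
For comparison outside Vim, the same substitution can be sketched in Python with re.sub. This is an illustration rather than a Vim solution: it keeps the leading spaces or tabs of each line and inserts "- " before the first non-blank character (\S, slightly broader than the \w used above). The function name and the sample indentation are assumptions.

```python
import re


def add_bullets(text):
    """Insert '- ' before the first non-blank character of every line."""
    # Group 1 holds the indentation, so it is written back unchanged.
    return re.sub(r"^([ \t]*)(?=\S)", r"\1- ", text, flags=re.MULTILINE)


sample = "list point 1\n  sub list point 2\n    and so on..."
print(add_bullets(sample))
```

Blank lines are left alone, because the lookahead requires a non-blank character on the same line.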
