Regular expression - Get specific part of string [closed] - regex

I have the following sentence:
b.g The big bag of bits was bugged.
How can I exclude the b.g from it by using a regular expression?
I am sure I need a negative lookahead but I cannot get it right yet.
Something like
^(?!b\.g)

I would do it this way:
[^\S].*
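What [^\S] does is match a single whitespace character (it is a double negative, equivalent to \s). The search therefore begins matching at the first space, and .* captures the rest of the line, so the match includes that leading space. No need in this case for a negative or positive lookbehind.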
Demo: regex101
If you prefer to do it with a positive lookbehind, you can do it this way:
(?<=b\.g).*
Demo: regex101
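To see what the two patterns above actually return, here is a minimal Python sketch. The third pattern, with \s added inside the lookbehind, is a hypothetical variant that is not part of the answer; it is only there to show one way of dropping the leading space.

```python
import re

sentence = "b.g The big bag of bits was bugged."

# [^\S] is equivalent to \s, so the match begins at the first space.
by_whitespace = re.search(r"[^\S].*", sentence).group(0)

# The lookbehind requires "b.g" immediately before the start of the match.
by_lookbehind = re.search(r"(?<=b\.g).*", sentence).group(0)

# Hypothetical variant: also require one whitespace character in the lookbehind.
without_space = re.search(r"(?<=b\.g\s).*", sentence).group(0)

print(repr(by_whitespace))   # ' The big bag of bits was bugged.'
print(repr(by_lookbehind))   # ' The big bag of bits was bugged.'
print(repr(without_space))   # 'The big bag of bits was bugged.'
```

Both original patterns keep the space that follows b.g, so a strip() or the extra \s may be needed depending on what you want.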

sed 's/^...//' strips the first 3 characters, "b.g", but I doubt that's what you're really asking. Your ^ anchor appears to be a red herring.
You already have correct escaping for . period, just stick with that:
sed 's/b\.g//'
Python's positive lookbehind ?<= may be what you are trying to find words to express:
>>> m = re.search(r'(?<=b\.g)(.*)', 'b.g The big bag of bits was bugged.')
>>> print(m.group(1))
The big bag of bits was bugged.

In Python you could do something like this:
import re
w = 'b.g The big bag of bits was bugged.'
print(w)
# Escape the dot so it matches a literal '.', not any character
d = re.compile(r'^b\.g\s')
a = d.sub('', w)
print(a)
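One detail worth checking in a pattern like this is whether the dot is escaped, because an unescaped . matches any character. A small sketch of the difference, using made-up sample strings:

```python
import re

unescaped = re.compile(r"^b.g\s")   # '.' matches any character
escaped = re.compile(r"^b\.g\s")    # '\.' matches only a literal dot

# Sample strings are invented for illustration.
samples = ["b.g The big bag", "bag of bits", "bug report"]

for s in samples:
    # Left: result with the unescaped dot; right: result with the escaped dot.
    print(repr(unescaped.sub("", s)), "|", repr(escaped.sub("", s)))
```

With the unescaped dot, the prefixes "bag " and "bug " are stripped as well, which is probably not what was intended.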

Related

How to use regexp to identify the number of hydrogens in a chemical formula? [closed]

Which expression should I use to identify the number of hydrogen atoms in a chemical formula?
For example:
C40H51N11O19 - 51 hydrogens
C2HO - 1 hydrogen
CO2 - no hydrogens (empty)
Any suggestions?
Thanks!
Cheers!
You can start with this regex:
H\d*
H -> matches the literal character H
\d* -> matches zero or more digits
See the example and try other regexes yourself at:
https://regex101.com/r/vdvH8S/2
But regex won't convert the result for you; it only finds the match.
You need to process the result yourself:
H with a number: extract the number
only H: 1
no match: 0
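The three rules above could be sketched in Python roughly as follows. The function name count_hydrogens is made up for this example, and two things go beyond the answer and are assumptions: the lookahead (?![a-z]) that skips symbols such as He or Hg, and summing over every H group in the formula.

```python
import re

# H not followed by a lowercase letter (so "He", "Hg" are skipped),
# then an optional count captured in group 1.
H_PATTERN = re.compile(r"H(?![a-z])(\d*)")

def count_hydrogens(formula: str) -> int:
    """Return the number of hydrogen atoms, 0 if there is no H."""
    total = 0
    for match in H_PATTERN.finditer(formula):
        digits = match.group(1)
        # "H with a number" -> that number; "only H" -> 1
        total += int(digits) if digits else 1
    return total

for formula in ["C40H51N11O19", "C2HO", "CO2"]:
    print(formula, count_hydrogens(formula))
```

Returning 0 for "no match" follows the rule in the answer; if you need an empty result instead, return None when no H was found.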
A regular expression that matches H with the following digits would be:
/H(\d+)/g
The 'H' is a literal character match for the H in the given chemical formula.
() declares a capture group, so you can then grab the captured digits without the H in whatever programming language you are using.
\d matches any digit, and the + quantifier makes it match one or more. Note that this requires at least one digit, so the bare H in C2HO is not matched.
There is no catch-all pattern here; you might be better off using something other than a regex.

ELO credit card regular expression [closed]

I need a regular expression for the ELO credit card that accepts only numbers whose first 6 digits are among those listed below. The total length is 16, and all 16 characters must be digits. Letters are not allowed.
Allowed prefixes:
401178, 401179, 431274, 438935, 451416, 457393, 457631, 457632,
504175, 627780, 636297, 636368, 655000, 655001, 651652, 651653,
651654, 650485, 650486, 650487, 650488, 506699 to 506778 and 509000
to 509999
Use an alternation, with a bit of extra work to cover the two numerical ranges you have.
^(?:401178|401179|431274|438935|451416|457393|457631|457632|504175|627780|636297|636368|
655000|655001|651652|651653|651654|650485|650486|650487|650488|506699|5067[0-6][0-9]|
50677[0-8]|509\d{3})\d{10}$
Here is how we handle the two ranges:
506699 to 506778
506699| matches 506699
5067[0-6][0-9]| matches 506700 through and including 506769
50677[0-8] matches 506770 through and including 506778
509000 to 509999
509\d{3} matches 509000 through and including 509999
i.e. 509 followed by any 3 digits
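A decomposition like this is easy to get subtly wrong, so it can help to brute-force it. The sketch below compares the range part of the pattern against plain integer comparisons for every number from 500000 to 509999; the names RANGE_PATTERN and in_ranges are made up for the example.

```python
import re

# Only the part of the alternation that covers the two numeric ranges.
RANGE_PATTERN = re.compile(r"506699|5067[0-6][0-9]|50677[0-8]|509\d{3}")

def in_ranges(prefix: int) -> bool:
    """Reference check written with plain integer comparisons."""
    return 506699 <= prefix <= 506778 or 509000 <= prefix <= 509999

# Collect every number where the regex and the reference check disagree.
mismatches = [
    n for n in range(500000, 510000)
    if (RANGE_PATTERN.fullmatch(str(n)) is not None) != in_ranges(n)
]

print(mismatches)  # an empty list means the regex covers exactly the two ranges
```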
Demo here:
Regex101
You can try this:
^(?:40117[8-9]|431274|438935|451416|457393|45763[1-2]|504175
|627780|636297|636368|65500[0-1]|65165[2-4]|65048[5-8]|506699
|5067[0-6]\d|50677[0-8]|509\d{3})\d{10}$
Simple Explanation
^ Start of the line
( start of the group
?: makes the group non-capturing
40117[8-9] means 40117 followed by 8 or 9 (the same applies to the similar patterns)
| means OR
5067[0-6]\d means 5067 + a digit from 0 to 6 + any single digit
\d{10} means the next 10 characters must be digits (after the valid 6-digit prefix)
$ end of the line
Basically, you need alternation with some range operators to shorten the regex.
The trickiest part is defining the range 506699 to 506778, which can be represented as 506699|5067[0-6]\d|50677[0-8].
(?x)^(?:
40117[89]|431274|438935|451416|457393|457631|457632|504175
|627780|636297|636368|65500[01]|65165[234]|65048[5-8]
|506699|5067[0-6]\d|50677[0-8]
|509\d{3}
)\d{10}$
Demo: https://regex101.com/r/BbnHeQ/2
NB: the (?x) flag allows whitespace characters in the regex, which makes long expressions easier to read.
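For reference, the same alternation can be written in Python with re.VERBOSE, which plays the role of (?x). This is only a sketch: the name is_elo_number is made up, the prefix list is copied from the question, and re.ASCII is an added assumption so that \d accepts only 0-9. It checks the prefix and length only, not a checksum.

```python
import re

ELO_PATTERN = re.compile(
    r"""
    (?:
        40117[89] | 431274 | 438935 | 451416 | 457393 | 45763[12]
      | 504175 | 627780 | 636297 | 636368
      | 65500[01] | 65165[2-4] | 65048[5-8]
      | 506699 | 5067[0-6]\d | 50677[0-8]   # 506699 to 506778
      | 509\d{3}                            # 509000 to 509999
    )
    \d{10}                                  # the remaining 10 digits
    """,
    re.VERBOSE | re.ASCII,
)

def is_elo_number(card_number: str) -> bool:
    """True if the string is 16 digits starting with an allowed prefix."""
    return ELO_PATTERN.fullmatch(card_number) is not None

# Made-up card numbers, used only to exercise the prefixes.
for number in ["4011781234567890", "5067101234567890", "5067791234567890"]:
    print(number, is_elo_number(number))
```

fullmatch is used instead of ^ and $ so that a trailing newline is not silently accepted.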

Using Regex to separate Asian market numerical stock tickers [closed]

I'm pulling some trading data and having issues using regex to separate the tickers from the percentage of holding.
Inputs
"94324.13%"
"007007.13%"
"0354202.91%"
Desired Output
"9432|4.13%" (ticker is 4 numbers)
"00700|7.13%" (ticker is 5 numbers)
"035420|2.91%" (ticker is 6 numbers)
The main issue is that the length of the ticker may vary anywhere from 4 to 6 digits.
With the given information it is not possible to have a 100% accurate split of the two parts. For instance:
123410.05%
... could split in either of the following two:
1234|10.05%
12341|0.05%
And if percentages might not have a zero before the decimal point, then this would also be a possible split:
123410|.05%
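The ambiguity can be made concrete by listing every split that is consistent with a 4 to 6 digit ticker. In this sketch the helper candidate_splits and the percentage pattern (optional digits, a dot, digits, a percent sign) are assumptions made for illustration:

```python
import re

# Assumed shape of a percentage: optional integer part, '.', decimals, '%'.
PERCENT = re.compile(r"\d*\.\d+%")

def candidate_splits(raw: str) -> list[str]:
    """Return every ticker|percentage split with a 4 to 6 digit ticker."""
    splits = []
    for ticker_len in range(4, 7):
        ticker, rest = raw[:ticker_len], raw[ticker_len:]
        if ticker.isdigit() and PERCENT.fullmatch(rest):
            splits.append(f"{ticker}|{rest}")
    return splits

print(candidate_splits("123410.05%"))
# ['1234|10.05%', '12341|0.05%', '123410|.05%']
```

Any input that yields more than one candidate cannot be split reliably without extra information, such as a list of known tickers.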
The following regex replace will assume the percentage has one digit before the decimal point, and possibly a minus sign:
Find:
/^(\d{4,6})(\-?\d.*)$/gm
Replace:
\1|\2
See it on regex101.com.
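The same find and replace can be tried in Python against the three inputs from the question. This is a sketch; split_ticker is a made-up helper name, and the pattern is the one above without the /.../gm delimiters.

```python
import re

# Greedy \d{4,6} takes as many ticker digits as it can while still leaving
# one digit (and an optional minus sign) for the percentage.
SPLIT = re.compile(r"^(\d{4,6})(-?\d.*)$")

def split_ticker(raw: str) -> str:
    """Insert '|' between the ticker and the percentage."""
    return SPLIT.sub(r"\1|\2", raw)

for raw in ["94324.13%", "007007.13%", "0354202.91%"]:
    print(split_ticker(raw))
# 9432|4.13%
# 00700|7.13%
# 035420|2.91%
```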
I'd like to suggest this regex:
(\d{4,6})(\d+\.\d{1,2}%)
Here is a full demo:
Python:
import re
data = "007007.13%"
rx = re.compile(r"(\d{4,6})(\d+\.\d{1,2}%)")
formatted_text = rx.sub(r'\1|\2', data)
print(formatted_text)
# it will print
# 00700|7.13%
Javascript:
var re = /(\d{4,6})(\d+\.\d{1,2}%)/g;
var str = '"007007.13%"';
var subst = '$1|$2';
var result = str.replace(re, subst);

Can I insert variables in Regular Expression? [closed]

I want to use regex to extract specific information from a text. Here is an example in semi-pseudocode (feel free to reply in semi-pseudocode too):
list=["orange","green","grey"]
text= "The Orange is orange"
for word in list:
    if word == re.compile(r'word, text):
        capture Orange in order to have the noun
Beware! My question is whether it is possible to use a variable (such as word above) inside a regex, so that I can loop over a list and check whether any of its words appear in a text.
Do not focus on how to capture the Orange.
I think Biffen has the right idea: you're in a world of pain if you're using this for POS tagging. Anyway, this lets you match words in your text variable:
for word in list:
    if word in text:
        print(word)  # do what you want with word
If you want to use regex, you can build patterns from strings and use parentheses to capture. Then use group() to access the captured text:
import re
for word in list:
    # re.escape keeps any special characters in word literal
    pattern = re.compile(".*(" + re.escape(word) + ").*")
    m = pattern.match(text)
    if m:
        print(m.group(1))
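As a self-contained sketch of building a pattern from a variable: the word boundaries (\b) and re.IGNORECASE are additions that the question did not ask for, used here so that "Orange" is found as well as "orange". The names words and found are made up for the example.

```python
import re

words = ["orange", "green", "grey"]
text = "The Orange is orange"

found = {}
for word in words:
    # re.escape protects any regex metacharacters inside the variable.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    matches = pattern.findall(text)
    if matches:
        found[word] = matches

print(found)  # {'orange': ['Orange', 'orange']}
```

An f-string such as rf"\b{re.escape(word)}\b" would work equally well; the point is that the pattern is an ordinary string until it is compiled.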

Regex to accept numbers and alphabets in a string but not only alphabets [closed]

I need a regular expression that accepts letters and numbers in a text field. If the user enters only letters or only numbers, the input should not be accepted. How can I do this?
You could use a regex like the one below:
/^(?=.*\d)(?=.*[a-z])[\da-z]+$/i
Javascript example:
var regex = /^(?=.*\d)(?=.*[a-z])[\da-z]+$/i;
console.log(regex.test('aaa')); // false
console.log(regex.test('111')); // false
console.log(regex.test('aaa111')); // true
console.log(regex.test('111aaa')); // true
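The same two-lookahead idea carries over to Python. In this sketch, is_mixed is a made-up helper name, and re.ASCII is an added assumption so that only 0-9 and unaccented Latin letters count.

```python
import re

# One lookahead demands a digit somewhere, the other demands a letter;
# the character class then restricts the whole string to letters and digits.
MIXED = re.compile(r"(?=.*\d)(?=.*[a-z])[\da-z]+", re.IGNORECASE | re.ASCII)

def is_mixed(value: str) -> bool:
    """True only for strings containing both letters and digits, nothing else."""
    return MIXED.fullmatch(value) is not None

for value in ["aaa", "111", "aaa111", "111aaa"]:
    print(value, is_mixed(value))
```

fullmatch replaces the ^ and $ anchors of the JavaScript version.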
I think someone will write a better regex than this one, but it works:
^(?:[a-zA-Z]+\d+[a-zA-Z\d]*|\d+[a-zA-Z]+[a-zA-Z\d]*)$
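Anchoring deserves a check in alternations like this one. In a pattern of the form ^A|B$, the ^ binds only to A and the $ only to B; wrapping the alternatives in a group makes both anchors apply to both branches. A small Python sketch of the difference, with made-up sample strings:

```python
import re

A = r"[a-zA-Z]+\d+[a-zA-Z\d]*"
B = r"\d+[a-zA-Z]+[a-zA-Z\d]*"

loose = re.compile("^(" + A + ")|(" + B + ")$")     # ^ only on A, $ only on B
strict = re.compile("^(?:" + A + "|" + B + ")$")    # both anchors on both branches

for value in ["abc123", "123abc", "abc123!!!", "!!!123abc"]:
    print(value, bool(loose.search(value)), bool(strict.search(value)))
```

The last two strings contain characters other than letters and digits, yet the loose form still reports a match for them.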
(\d+[A-Za-z]+)|([A-Za-z]+\d+) should work for you.
"\d" represents a single digit, i.e. any one of [0-9].
Works perfectly in Java. You can check specifically for any other language or editor.
Pass-
123aaa
aaa123
Rejects-
123
aaa
(?i)(?=[a-z0-9]*[0-9])(?=[a-z0-9]*[a-z])[a-z0-9]*
Will match a string only if it has at least one number and one letter.
Matches:
abc1
123a
Does not match:
abc
123