I'd like a way to retrieve a piece of text from a string in a C# script.
The format of the text is 4 digits, then _, then 1 to 2 digits:
test_p_2008_1_Annexe_1_prix
test_p_2008_100_Annexe_1_prix
test_p_2008_1
test_p_2008_100
For these 4 examples, I need to get:
2008_1
2008_100
2008_1
2008_100
Maybe with a regex, but I'm not good enough with them.
I think you're trying to retrieve text in the format 4 digits, then _, then 1 to 3 digits (2008_100 has 3 digits after the underscore):
@"\d{4}_\d{1,3}"
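Code:
using System;
using System.Text.RegularExpressions;

string input = @"test_p_2008_1_Annexe_1_prix
test_p_2008_100_Annexe_1_prix
test_p_2008_1
test_p_2008_100";
// 4 digits, an underscore, then 1 to 3 digits
Regex rgx = new Regex(@"\d{4}_\d{1,3}");
foreach (Match m in rgx.Matches(input))
    Console.WriteLine(m.Groups[0].Value);  // group 0 is the whole match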
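For comparison, here is a minimal Python sketch of the same pattern. It assumes you only want the first match per file name. The helper name extract_id is hypothetical. re.search returns None when nothing matches, so that case is handled explicitly.

```python
import re

# 4 digits, an underscore, then 1 to 3 digits (same pattern as the C# answer)
YEAR_ID = re.compile(r"\d{4}_\d{1,3}")

def extract_id(name):
    """Return the first 'dddd_d..' token in name, or None (hypothetical helper)."""
    m = YEAR_ID.search(name)
    return m.group(0) if m else None

names = [
    "test_p_2008_1_Annexe_1_prix",
    "test_p_2008_100_Annexe_1_prix",
    "test_p_2008_1",
    "test_p_2008_100",
]

for name in names:
    print(extract_id(name))
```

The pattern has no right-hand anchor, and \d{1,3} stops after three digits. A name containing 2008_1000 would therefore yield 2008_100.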
I'm having trouble writing code that picks up the pattern I want. I want to grab the first number that appears after the words "5 Months" in a .txt file. Any other characters (A-Z, parentheses, $, %, etc.) should be ignored. I keep getting a VBA error: "Invalid procedure call or argument".
Currently, I have code that looks like this:
Dim reg4 As Object: Set reg4 = CreateObject("vbscript.regexp")
reg4.Pattern = "5 Months\s*([\d+]\.[\d+])\s*"
Dim MCS As Object
Set MCS = reg4.Execute(myText)
Dim Months5 As String: Months5 = MCS(0).submatches(0)  ' the error stems from this line
Here, myText is a string holding the contents of a text file. My main problem is that the file is not always in a standardized format. When I try to extract the first number after "5 Months", I get that error.
The text file could look like:
EXAMPLE 1
5 Months
($) (%) (Months) (%) (%) (%) ($) (Months)
0.00 0.0000 0.000
OR
EXAMPLE 2
5 Months
0.00
0.000
0.000
In both cases, I would ideally be able to extract that first number "0.00" in its entire form, while ignoring any other characters such as (%) or ($) as shown in example 1.
Does anyone have suggestions on how to rewrite the pattern so it picks up the first number, including the digits after its decimal point?
Many thanks in advance!
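Your regex does not fit the text you showed. [\d+] is a character class that matches exactly one character (a digit or a literal +), so [\d+]\.[\d+] allows only one digit on each side of the dot. Also, \s* cannot skip the ($) (%) header line in example 1. When there is no match, MCS is empty, and MCS(0) raises the error. You can use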
\b5 Months[\s\S]*?(\d+(?:\.\d+)?)
Pattern details:
\b - a word boundary
5 Months - a literal text
[\s\S]*? - any 0 or more chars, as few as possible
(\d+(?:\.\d+)?) - Capturing group 1: one or more digits followed by an optional sequence of a . and one or more digits.
Test run in VBA:
Sub TestFn()
    Dim reg4 As Object: Set reg4 = CreateObject("vbscript.regexp")
    ' "5 Months", then as few chars as possible, then the first number
    reg4.Pattern = "\b5 Months[\s\S]*?(\d+(?:\.\d+)?)"
    Dim myText As String
    myText = "5 Months" & vbCrLf & vbCrLf & "0.00"
    Dim MCS As Object
    Set MCS = reg4.Execute(myText)
    ' Check for a match first: MCS(0) on an empty collection raises the error
    If MCS.Count > 0 Then
        Dim Months5 As String: Months5 = MCS(0).SubMatches(0)
        Debug.Print Months5
    End If
End Sub
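The same pattern can be checked against both sample layouts from the question. Below is a minimal Python sketch for comparison. The helper name first_number_after_5_months is hypothetical. Sample texts use \n line breaks in place of the file's real line endings.

```python
import re

# Same pattern as the VBA answer: "5 Months", then as few characters as
# possible, then the first number (digits with an optional decimal part).
FIVE_MONTHS = re.compile(r"\b5 Months[\s\S]*?(\d+(?:\.\d+)?)")

def first_number_after_5_months(text):
    """Return the first number after '5 Months', or None (hypothetical helper)."""
    m = FIVE_MONTHS.search(text)
    # Checking for a match avoids the equivalent of indexing an empty
    # MatchCollection, which is what raised the VBA error.
    return m.group(1) if m else None

example1 = "5 Months\n($) (%) (Months) (%) (%) (%) ($) (Months)\n0.00 0.0000 0.000"
example2 = "5 Months\n0.00\n0.000\n0.000"

print(first_number_after_5_months(example1))
print(first_number_after_5_months(example2))
```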
import pandas as pd
df= pd.DataFrame({'Data':['123456A122 119999 This 1234522261 1A1619 BL171111 A-1-24',
'134456 dont 12-23-34-45-5-6 Z112 NOT 01-22-2001',
'mix: 1A25629Q88 or A13B ok'],
'IDs': ['A11','B22','C33'],
})
I have the DataFrame shown above. I am using the following to get only consecutive digits:
reg = r'((?:[\d]-?){6,})'
df['new'] = df['Data'].str.findall(reg)
Result (Data and IDs columns omitted):
   new
0  [123456, 119999, 1234522261, 171111]
1  [134456, 12-23-34-45-5-6, 01-22-2001]
2  []
This picks up many things I don't want, like 171111 from BL171111 and 123456 from 123456A122.
I would like the following output, which only picks up standalone runs of exactly 6 consecutive digits:
(Data and IDs columns omitted)
   new
0  [119999]
1  [134456]
2  []
How do I change my regex to do so?
Change your regex to use word boundaries (\b) and limit the number of digits to exactly 6, like this:
reg = r'(\b\d{6}\b)'
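This looks for a word boundary, exactly 6 digits, and another word boundary. Digit runs that touch letters (BL171111, 123456A122) or are longer than 6 (1234522261) are therefore rejected.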
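The answer's regex can be checked without pandas. df['Data'].str.findall(reg) applies the same findall to each row, so the sketch below runs re.findall over the question's three rows. One caveat: - is not a word character, so \b still accepts a 6-digit run next to a dash. SIX_DIGITS_NO_DASH is a hypothetical stricter variant that uses lookarounds. It is only needed if such values should be excluded.

```python
import re

# Rows from the question's 'Data' column; Series.str.findall applies
# the same findall to each row.
rows = [
    "123456A122 119999 This 1234522261 1A1619 BL171111 A-1-24",
    "134456 dont 12-23-34-45-5-6 Z112 NOT 01-22-2001",
    "mix: 1A25629Q88 or A13B ok",
]

# The answer's pattern: exactly 6 digits between word boundaries.
SIX_DIGITS = re.compile(r"(\b\d{6}\b)")

# Hypothetical stricter variant: also refuse a '-' (or word char) on either side.
SIX_DIGITS_NO_DASH = re.compile(r"(?<![\w-])\d{6}(?![\w-])")

print([SIX_DIGITS.findall(r) for r in rows])
print(SIX_DIGITS.findall("123456-78 654321"))
print(SIX_DIGITS_NO_DASH.findall("123456-78 654321"))
```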
Hi, I have a question about regex.
This sounds very simple and I remember doing it, but somehow it got deleted and I am finding it hard to get it back.
I want to extract a group of digits from a line.
If the group has 3 or more digits, select it.
E.g.:
ga3rdparty/phpMyAdmin/i0ndex.php?&t0oken=abf540063shakk
This line can be different every time, but there will be only 1 group of more than 2 digits.
OUTPUT: 540063
Thank you in advance
You can use \d{3,}, where 3 is the minimum number of digits. Take a look at the following Python code:
import re
text = "ga3rdparty/phpMyAdmin/i0ndex.php?&t0oken=abf540063shakk"
pattern = re.compile(r'\d{3,}')  # runs of 3 or more digits
for match in pattern.findall(text):
    print(match)  # prints 540063
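The question says the line contains only one such group. In that case, re.search can return that single value directly instead of looping. The helper first_digit_run and its min_digits parameter are hypothetical names for this sketch. It returns None when no run is long enough.

```python
import re

def first_digit_run(text, min_digits=3):
    """Return the first run of at least min_digits digits, or None (hypothetical helper)."""
    m = re.search(r"\d{%d,}" % min_digits, text)
    return m.group(0) if m else None

line = "ga3rdparty/phpMyAdmin/i0ndex.php?&t0oken=abf540063shakk"
print(first_digit_run(line))
```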
I have a C# project that requires me to capture a string value from an HTML stream.
The pattern I need to match is:
XXXX-abc
Where:
XXXX = a 4-digit integer
followed by a -
abc = a 3-character alphanumeric string.
I looked at txt2re.com and got
string re1="(\\d)"; // Any Single Digit 1
string re2="(\\d)"; // Any Single Digit 2
string re3="(\\d)"; // Any Single Digit 3
string re4="(\\d)"; // Any Single Digit 4
string re5="(-)"; // Any Single Character 1
string re6="((?:[a-z][a-z]*[0-9]+[a-z0-9]*))"; // Alphanum 1
The thing I am having difficulty with is combining it into one expression instead of 6.
I know I can do:
Regex r = new Regex(re1+re2+re3+re4+re5+re6,RegexOptions.IgnoreCase|RegexOptions.Singleline);
However, my OCD cringes at this method :)
You can use the expression \d{4}-\w{3}: 4 digits, followed by -, followed by 3 word characters. Note that \w also matches _. If you need strictly alphanumeric characters, use [A-Za-z0-9]{3} instead.
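To make the \w caveat concrete, here is a small Python sketch comparing the answer's pattern with a stricter ASCII-only variant. The sample HTML string is made up for illustration. The stricter class assumes "alphanumeric" means ASCII letters and digits.

```python
import re

# The answer's pattern: 4 digits, '-', then 3 word characters.
# \w also matches '_' (and, for str patterns, Unicode letters/digits).
LOOSE = re.compile(r"\d{4}-\w{3}")

# Stricter variant (an assumption about what "alphanumeric" means here):
# ASCII letters and digits only.
STRICT = re.compile(r"\d{4}-[A-Za-z0-9]{3}")

html = "<td>2013-ab1</td><td>1234-x_y</td>"  # hypothetical sample
print(LOOSE.findall(html))
print(STRICT.findall(html))
```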
I have the following string, which contains product reviews. I want to move fragments like "1 of 3" and "4 of 13" onto their own lines.
Input string
The mapping features have a lot of inaccuracies 1 of 3
am a little disappointed in the new 4S. 4 of 13
Output string
The mapping features have a lot of inaccuracies
1 of 3
am a little disappointed in the new 4S
4 of 13
I was trying Regex.Replace because it changes all occurrences in the string.
I located the text using @"\d+ of \d+",
but how can I keep the variable numbers in the replacement text? Or can you suggest a different method?
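Capture the match so you can use it in the replacement:
@"(\d+ of \d+)"
In the replacement string, $1 refers to the captured text. (.NET also accepts $0 for the whole match, in which case the group is optional.)
Something like:
string output = Regex.Replace(input,
    @"(\d+ of \d+)",
    // a newline before and after each "N of M"; $1 is the captured text
    string.Format("{0}$1{0}", Environment.NewLine));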
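For comparison, here is a minimal Python sketch of the same capture-and-reinsert idea. Python writes the back-reference as \1 rather than $1. This variant also consumes the whitespace around each counter (the extra \s* is an assumption). That way no trailing space is left before the new line break, and the result matches the desired output. The helper name split_counters is hypothetical.

```python
import re

# The answer's capture group, with \s* on both sides (an assumption) so the
# surrounding spaces/newlines are replaced instead of left behind.
COUNTER = re.compile(r"\s*(\d+ of \d+)\s*")

def split_counters(text):
    """Put each 'N of M' counter on its own line (hypothetical helper)."""
    # \1 re-inserts the captured counter, like $1 in .NET
    return COUNTER.sub(r"\n\1\n", text).strip()

review = ("The mapping features have a lot of inaccuracies 1 of 3\n"
          "am a little disappointed in the new 4S. 4 of 13")
print(split_counters(review))
```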