Regex extract only specific character and EOL - regex

I am trying to extract some text using regex.
I want to extract only those lines that contain "pour 1e" or "Pour 1€" and nothing more.
The regex must be case-insensitive.
Here is my regex, which doesn't work the way I want:
/Pour ([0-9.,])(€|e)/im
and this is my text:
Tesseract Open Source OCR Engine v3.01 with Leptonica
CARDEURS
Horaire dejour de flhllll 5 19h00
pour 1€
pour 1€ supplémentaire
pour 1€ supplémentaire
pour 1€ supplémentaire
pour 1€ supplémentaire
par€ supplémentaire
Horaire de nuit de 19h00 5 flhllll
pour 1,50€
pour 1€ supplémentaire + 300 minutes
pour 1€ supplémentaire + 420 minutes
La joumée de 24 heures
35 minutes
+ 30 minutes
+ 35 minutes
+ 40 minutes
+ 45 minutes
+ 50 minutes
60 minutes
15€
Tesseract Open Source OCR Engine v3.01 with Leptonica
TARIFS
PARKING CARNOT
Homim de juur de 8:00 3 19:00 H01-aim de null de 19:00 5 8:00
mains d‘ ggg heme : G1-atuit moins d‘ ggg heure : Gmtuil
Pour 1e
Pour 1e supplémenlaire
Pour 1e suppléulentaire
Pour 1e supplémmmm
Pour 1e supplémmmm
Par e supplémenlaiI€
40 minutes
+ 40 minutes
+ 45 minutes
+ 50 minutes
+ 55 minutes
+ 55 minules
Pour 1e so nzinules
Pour 1e supplémenlaiI€ + 300 minllles
Pour 1e 5upplémenlai1Q + 420 minules
La journée a
e 24 heums 15€

You need to anchor the expression with ^ and $ which match beginning/end of line when /m is active. For example:
/^pour [0-9]+[0-9,.]*[e€]$/im
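A minimal Python sketch of what the anchors change, assuming the OCR output is held in one multi-line string. The sample is a shortened excerpt of the text in the question, and the variable names are illustrative. re.MULTILINE plays the role of /m and re.IGNORECASE the role of /i.

```python
import re

# Shortened excerpt of the OCR text from the question.
text = "\n".join([
    "pour 1€",
    "pour 1€ supplémentaire",
    "pour 1,50€",
    "Pour 1e",
    "Pour 1e supplémenlaire",
    "15€",
])

flags = re.IGNORECASE | re.MULTILINE

# Unanchored: also matches the beginning of the "supplémentaire" lines.
unanchored = re.findall(r"pour [0-9]+[0-9,.]*[e€]", text, flags)

# Anchored with ^ and $: only lines that hold the price and nothing else.
anchored = re.findall(r"^pour [0-9]+[0-9,.]*[e€]$", text, flags)

print(len(unanchored))  # 5
print(anchored)         # ['pour 1€', 'pour 1,50€', 'Pour 1e']
```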

Use square brackets [] to specify a set of characters to match, a caret ^ to match the beginning of the line, and a dollar sign $ to match the end of the line. (If the whole text is matched as one multi-line string, ^ and $ only work per line when the m flag is set.) Depending on which regex implementation you are using, you may be able to pass the i flag to make the match case-insensitive:
/^Pour 1[€e]$/i
Or handle case explicitly with character classes:
/^[Pp][Oo][Uu][Rr] 1[€e]$/
For matching repetitions, use * to match 0 or more of the preceding element, + to match 1 or more, and ? to match 0 or 1.
In place of the 1 in the previous pattern, you could use
[0-9.]+ to match any 1 or more digits or decimal points
[0-9]+\.?[0-9]* to match at least 1 digit followed by an optional decimal point and more digits
[0-9]+[0-9,]*\.?[0-9]* to match at least 1 digit, optionally more digits and commas, followed by an optional decimal point and more digits
You can also use curly braces {} to explicitly specify a number of repetitions (these must be escaped with a backslash \ in some regex engines)
[0-9]{1,3} would match 1, 2, or 3 digits
[0-9]{3} would match exactly 3 digits
You can use parentheses () to group part of a regex pattern for backreference or repetition.
So to match a line that starts with "Pour " followed by 1 or more digits, then an optional comma or decimal point with 2 digits, then the euro symbol or letter e, and any number of trailing spaces, but no other characters until end of line, and be case-insensitive:
/^Pour [0-9]+([,.][0-9][0-9])?[€e][ ]*$/i
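As a rough illustration, the final pattern can be applied one line at a time in Python, in which case no multi-line flag is needed. The helper name price_lines and the sample string are assumptions made for this sketch, not part of the question.

```python
import re

# Final pattern from above. Without re.MULTILINE, ^ and $ anchor the whole
# string, so the pattern is applied to each line separately.
PRICE_LINE = re.compile(r"^Pour [0-9]+([,.][0-9][0-9])?[€e][ ]*$", re.IGNORECASE)


def price_lines(text):
    """Return the lines of text that consist only of 'Pour <amount>€'."""
    return [line for line in text.splitlines() if PRICE_LINE.match(line)]


sample = (
    "pour 1€\n"
    "pour 1€ supplémentaire\n"
    "pour 1,50€  \n"        # trailing spaces are allowed by [ ]*
    "Pour 1e\n"
    "par€ supplémentaire\n"
    "Pour 1,5e"             # only one decimal digit, so it is rejected
)

print(price_lines(sample))  # ['pour 1€', 'pour 1,50€  ', 'Pour 1e']
```

Note that the matched line keeps its trailing spaces; call strip() on the result if they are not wanted.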

Related

how to find the second number in a string with Regex expression

I want a regular expression that finds the last match.
the result that I want is "103-220898-44"
The expression that I'm using is ([^\d]|^)\d{3}-\d{6}-\d{2}([^\d]|$). This doesn't work because it matches the first result "100-520006-90" and I want the last "103-220898-44"
Example A
Transferencia exitosa
Comprobante No. 0000065600
26 May 2022 - 03:32 p.m.
Producto origen
Cuenta de Ahorro
Ahorros
100-520006-90
Producto destino
nameeee nene
Ahorros / ala
103-220898-44
Valor enviado
$ 1.000.00
/.*(\d{3}-\d{6}-\d{2})(?:[^\d]|$)/gs
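If you add .* to the beginning of your regex, it captures only the last occurrence: .* is greedy, so it first consumes as much text as possible and then backtracks to the last position where the rest of the pattern can match. You also need the single-line flag (s) so that . matches newline characters as well.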
Note: I replaced some (...) strings with (?:...) since their aim is grouping, not capturing.
Demo: https://regex101.com/r/fygL1X/2
// Greedy .* with the s flag runs to the end of the text, then backtracks to the last match.
const regex = new RegExp('.*(\\d{3}-\\d{6}-\\d{2})(?:[^\\d]|$)', 'gs')
const str = `Transferencia exitosa
Comprobante No. 0000065600
26 May 2022 - 03:32 p.m.
Producto origen
Cuenta de Ahorro
Ahorros
100-520006-90
Producto destino
nameeee nene
Ahorros / ala
103-220898-44
Valor enviado
\$ 1.000.00`;
// exec returns null when nothing matches, so check before reading group 1.
const m = regex.exec(str)
if (m) console.log(m[1]) // 103-220898-44
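The same idea can be sketched in Python, assuming a shortened version of the sample text. The second variant, which collects every match and keeps the final one, is an alternative that is not part of the answer above. It uses lookarounds instead of the [^\d] groups.

```python
import re

# Shortened version of the transfer receipt from the question.
text = """Producto origen
Ahorros
100-520006-90
Producto destino
Ahorros / ala
103-220898-44
Valor enviado
$ 1.000.00"""

# Greedy .* with DOTALL runs to the end of the text and backtracks to the
# last position where the account pattern matches.
greedy = re.search(r".*(\d{3}-\d{6}-\d{2})(?:\D|$)", text, re.DOTALL)
last_by_greedy = greedy.group(1) if greedy else None

# Alternative: find every occurrence and keep the final one.
all_matches = re.findall(r"(?<!\d)\d{3}-\d{6}-\d{2}(?!\d)", text)
last_by_findall = all_matches[-1] if all_matches else None

print(last_by_greedy)   # 103-220898-44
print(last_by_findall)  # 103-220898-44
```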

Start extracting after a repeated string in regex

How can I extract string_with_letters_and_special_caracters from this sequence?
sequence_one \n sequence_two \n sequence_three \n string_with_letters_and_special_caracters 0000000 \n sequence_four
I can't manage to start the match after the last \n preceding string_with_letters_and_special_caracters.
(Here \n is the repeated string.)
For example, \\n(\D+)\d+ starts extracting from the first \n.
Example: I want to extract "- Dimensions : L." or "Dimensions" here, which precedes an expression I already have a pattern for:
https://regex101.com/r/jLqxxo/1
Thank you!
You seem to want
-\s*Dimensions\s*:\s*L\.\s*(\d+)\D+(\d+)\D+(\d+)
See the regex demo and the Python demo:
import re
s=r'''FICHE TECHNIQUE\n- Pieds du canapé en bois.\n- Assise et dossier en polyester effet velours.\n- Canapé idéal pour deux personnes.\n\nCARACTERISTIQUES TECHNIQUES\n- Dimensions : L. 128 x l. 71 x H. 80 cm.\n- Hauteur d'assise : H. 47 cm.\n- Poids : 15,14 kg.\n\n'''
m = re.search(r'-\s*Dimensions\s*:\s*L\.\s*(\d+)\D+(\d+)\D+(\d+)',s)
if m:
    print(m.group(1)) # => 128
    print(m.group(2)) # => 71
    print(m.group(3)) # => 80
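For the generic sequence in the question, one possible approach (not taken from the answer above) is a tempered pattern that refuses to step over another literal \n. This is a sketch, assuming \n appears in the data as a backslash followed by the letter n, as it does in the raw string of the demo.

```python
import re

# Raw string: "\n" here is a literal backslash followed by "n".
s = r"sequence_one \n sequence_two \n sequence_three \n string_with_letters_and_special_caracters 0000000 \n sequence_four"

# (?:(?!\\n)\D)+ consumes non-digit characters one at a time, but stops at the
# next literal "\n", so the capture can only begin after the last "\n" that
# precedes the digits.
m = re.search(r"\\n((?:(?!\\n)\D)+)\d+", s)
extracted = m.group(1).strip() if m else None

print(extracted)  # string_with_letters_and_special_caracters
```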

dart regex remove space phone

I tried all the regex solutions from "REGEX Remove Space", but none of them matched.
I work with Dart and Flutter, and I am trying to capture only the digits from this type of string:
case 1
aaaaaaaaa 06 12 34 56 78 aaaaaa
case 2
aaaaaaaa 0612345678 aaaaaa
case 3
aaaaaa +336 12 34 56 78 aaaaa
I want to get only 0612345678, with no spaces and no +33: just 10 digits. In the case of +33, I need to replace +33 with 0.
Currently I have this pattern, \D*(\d+)\D*?, which only works for case 2.
You may match and capture an optional +33 and then a digit followed with spaces or digits, and then check if Group 1 matched and then build the result accordingly.
Here is an example solution (tested):
void main() {
  const strs = ['aaaaaaaaa 06 12 34 56 78 aaaaaa', 'aaaaaaaa 0612345678 aaaaaa', 'aaaaaa +336 12 34 56 78 aaaaa', 'more +33 6 12 34 56 78'];
  // Group 1: optional +33 prefix. Group 2: a digit followed by digits or spaces.
  final rx = RegExp(r"(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)");
  for (final s in strs) {
    final match = rx.firstMatch(s);
    if (match != null) {
      // Group 2 always participates in a successful match, so ! is safe here.
      final digits = match.group(2)!.replaceAll(" ", "");
      // A captured +33 prefix is replaced with a leading 0.
      final result = match.group(1) != null ? "0$digits" : digits;
      print(result);
    }
  }
}
Prints 0612345678 for each of the four input strings.
The pattern is
(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)
See its demo here.
(?:^|\D) - start of string or any char other than a digit
(\+33)? - Group 1 that captures +33 1 or 0 times
\s* - zero or more whitespace characters
(\d[\d ]*) - Group 2: a digit followed by any number of spaces and/or digits
(?!\d) - no digit immediately to the right is allowed.
Spaces are removed from Group 2 with a match.group(2).replaceAll(" ", "") since one can't match discontinuous strings within one match operation.
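For comparison, here is a rough Python port of the same approach. The function name normalize_phone is an assumption made for this sketch. The pattern is the one explained above.

```python
import re

# Same pattern as above: optional +33 in group 1, digits and spaces in group 2.
PHONE = re.compile(r"(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)")


def normalize_phone(text):
    """Return the first phone number found in text as digits only, or None."""
    m = PHONE.search(text)
    if m is None:
        return None
    digits = m.group(2).replace(" ", "")
    # A captured +33 prefix is replaced with a leading 0.
    return "0" + digits if m.group(1) else digits


samples = [
    "aaaaaaaaa 06 12 34 56 78 aaaaaa",
    "aaaaaaaa 0612345678 aaaaaa",
    "aaaaaa +336 12 34 56 78 aaaaa",
    "more +33 6 12 34 56 78",
]
for s in samples:
    print(normalize_phone(s))  # 0612345678 each time
```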

How can I extract the first digits from string

I am trying to use PowerShell to extract the first digits from a long string. How can I use regex to get only the leading run of digits from a string?
String 1:
000660007501S W RUSSELL DLC NO 41 SLY 2.5 FT OF ELY 313.82 FT OF FOLLOWING DESCRIBED PORTION OF SAMUEL W RUSSELL DONATION CLAIM` 1000
string 2:
010454040006ALDERBROOK DIV NO 05 62000 14000040
string 3:
012000012000ALEXANDER ACRE TRS S 1/2 OF LOT 38 TGW LOT 39 TGW N 45.96 FT OF E 109.23 FT LOT 40 LESS ANY POR PLTD DEVON LANE 13000 38-39-40
I was able to do it like this:
$accountnumber = $p.Substring(0,16) -replace '\D+',''
$Parcelnumber = $accountnumber.Substring(0,10)
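A regex-only variant, sketched here in Python, matches the run of digits at the very start of the string instead of cutting a fixed-width substring first. The helper name leading_digits is hypothetical. The sketch assumes the description never begins with a digit directly after the account number, because such a digit would become part of the match.

```python
import re


def leading_digits(s):
    """Return the run of digits at the very start of s ('' if there is none)."""
    m = re.match(r"\d+", s)  # re.match only looks at the start of the string
    return m.group(0) if m else ""


record = "010454040006ALDERBROOK DIV NO 05 62000 14000040"
account_number = leading_digits(record)
parcel_number = account_number[:10]

print(account_number)  # 010454040006
print(parcel_number)   # 0104540400
```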

Delete Numeration Lines from Subtitle

I have this subtitle text with many, many lines. Before each time range and its text there is a numeration line (1, 2, 3, 4, 5 ... 111):
Legend:
1 = numeration
2 = numeration
00:14:xx:xx = times
quando a te... = text
text example:
1
00:14:38,511 --> 00:14:45,747
quando a te venne il Salvatore,
2
00:14:55,595 --> 00:15:06,699
...volle da te prendere il battesimo,...
ma il prete rifiuto
10
00:15:16,082 --> 00:15:27,050
e si consacrò al martirio,
213
00:15:34,467 --> 00:15:46,174
ci diede un pegno di salvezza:
ecco! ci siamo andiamo a ubriarci
I want to delete the numeration lines:
1
2
10
213
This should be the end result:
00:14:38,511 --> 00:14:45,747
quando a te venne il Salvatore,
00:14:55,595 --> 00:15:06,699
...volle da te prendere il battesimo,...
ma il prete rifiuto
00:15:16,082 --> 00:15:27,050
e si consacrò al martirio,
00:15:34,467 --> 00:15:46,174
ci diede un pegno di salvezza:
ecco! ci siamo andiamo a ubriarci
Search: (?m)^\d+$[\r\n]+
Replace: empty string
In engines that don't support inline modifiers such as (?m), you'll usually add the m flag at the end of the pattern, like so:
/^\d+$[\r\n]+/m
Explanation
(?m) turns on multi-line mode, allowing ^ and $ to match on each line
The ^ anchor asserts that we are at the beginning of a line (because multi-line mode is on)
\d+ matches digits
The $ anchor asserts that we are at the end of a line
[\r\n]+ matches line breaks
We replace with the empty string
Alternatively, you can simply use the following:
Find: ^\d+\s+
Replace: (empty)
Explanation:
^ # the beginning of a line (in tools where ^ matches per line, i.e. multi-line mode)
\d+ # digits (0-9) (1 or more times)
\s+ # whitespace (\n, \r, \t, \f, and " ") (1 or more times)
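A short Python sketch of the first search-and-replace, assuming the subtitle file has already been read into a string with \n line endings. In Python's re module, $ in multi-line mode matches only before \n, so a file with \r\n endings would need to be normalized first.

```python
import re

# First two subtitle entries from the question, with \n line endings.
subtitles = (
    "1\n"
    "00:14:38,511 --> 00:14:45,747\n"
    "quando a te venne il Salvatore,\n"
    "2\n"
    "00:14:55,595 --> 00:15:06,699\n"
    "...volle da te prendere il battesimo,...\n"
    "ma il prete rifiuto\n"
)

# Remove every line that consists only of digits, together with its line break.
cleaned = re.sub(r"^\d+$[\r\n]+", "", subtitles, flags=re.MULTILINE)

print(cleaned)
```

The stricter ^\d+$ form leaves a text line such as "10 anni fa" untouched, whereas ^\d+\s+ would strip its leading number.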