I have some data that I want to rearrange (each record is on a single line). I have tried several approaches, but I can't get any of them to work.
Here is an example of the text:
DATA1="8DE" DATA2="322" DATA3="20" DATA4="19.99" DATA5="0.01"
DATA1="FE4" DATA2="222" DATA4="400" DATA3="400" DATA5="0.00"
DATA1="CE3" DATA2="444" DATA4="60" DATA5="0.00" DATA3="60"
DATA1="MME" DATA3="20" DATA4="20" DATA5="0.00"
DATA2="667" DATA4="30" DATA3="30" DATA5="0.00" DATA1="MH4"
This should be the output:
8DE 322 20 19.99 0.01
FE4 222 400 400 0.00
CE3 444 60 60 0.00
MME 20 20 0.00
MH4 667 30 30 0.00
I have tried the following but to no avail:
FIND: DATA1=\"(.*?)\"|DATA2=\"(.*?)\"|DATA3=\"(.*?)\"|DATA4=\"(.*?)\"|DATA5=\"(.*?)\"
REPLACE: \1 \2 \3 \4 \5
and
FIND: DATA1=\"(?<d1>.*?)\"|DATA2=\"(?<d2>.*?)\"|DATA3=\"(?<d3>.*?)\"|DATA4=\"(?<d4>.*?)\"|DATA5=\"(?<d5>.*?)\"
REPLACE: $+{d1} $+{d2} $+{d3} $+{d4} $+{d5}
I would be grateful if someone could help or point me to the right answer (and sorry for any misunderstanding, as English is not my first language).
The regex
^(?=.*\bDATA1="([^"]+)"\h*)?(?=.*\bDATA2="([^"]+)"\h*)?(?=.*\bDATA3="([^"]+)"\h*)?(?=.*\bDATA4="([^"]+)"\h*)?(?=.*\bDATA5="([^"]+)"\h*)?.*
This regex uses optional lookaheads to locate each DATAx (where x is the number) and capture the value between the double quotes into a capture group, then matches the whole line (so that it can be replaced).
The replacement
$1\t\t$2\t\t$3\t\t$4\t\t$5
This replacement just references the capture groups and adds tab characters between them while reordering them in the order of DATA [1,2,3,4,5].
The result
8DE 322 20 19.99 0.01
FE4 222 400 400 0.00
CE3 444 60 60 0.00
MME 20 20 0.00
MH4 667 30 30 0.00
See it working
See the regex in use here
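Outside an editor, the same reordering can be sketched in Python. This is only a minimal sketch and not part of the answer above: the function name reorder and the choice to join the values with a single space are assumptions, and keys that are missing from a line (such as DATA2 on the fourth sample line) are simply skipped.

```python
import re

# One key="value" pair; the key is limited to DATA1..DATA5 as in the question.
PAIR = re.compile(r'\b(DATA[1-5])="([^"]*)"')
ORDER = ("DATA1", "DATA2", "DATA3", "DATA4", "DATA5")


def reorder(line):
    """Return the quoted values of one line in DATA1..DATA5 order."""
    values = dict(PAIR.findall(line))
    # Keys that do not occur on the line are left out rather than padded.
    return " ".join(values[key] for key in ORDER if key in values)


if __name__ == "__main__":
    print(reorder('DATA2="667" DATA4="30" DATA3="30" DATA5="0.00" DATA1="MH4"'))
```

Collecting the pairs into a dictionary first avoids the five lookaheads, at the cost of leaving the editor.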
I would like to extract data from the sample data below using a regex.
I have tried \d{2}/\d{4} and get, for example, 39/2021. I need to get the 23 that sits between the two spaces after it, i.e. whatever number appears between those two spaces following my expression.
Sample Data
Backlog 25 567 07/2022 120 2510
39/2021 23 590 08/2022 120 2630
40/2021 120 710 09/2022 120 2750
41/2021 120 830 10/2022 120 2870
42/2021 120 950 11/2022 120 2990
45/2021 120 1070 12/2022 120 3110
47/2021 120 1190 13/2022 120 3230
48/2021 120 1310 14/2022 240 3470
49/2021 120 1430 15/2022 120 3590
50/2021 120 1550 16/2022 120 3710
51/2021 120 1670 17/2022 240 3950
52/2021 120 1790 18/2022 120 4070
02/2022 120 1910 19/2022 120 4190
03/2022 120 2030 20/2022 120 4310
04/2022 120 2150 21/2022 240 4550
05/2022 120 2270 22/2022 120 4670
06/2022 120 2390 23/2022 120 4790
I have added a picture reference for the output.
You can use a capture group, matching a space before the digits, and then either assert a whitespace boundary after them or match the following space:
\b\d{2}/\d{4} (\d+)(?!\S)
The pattern matches:
\b A word boundary
\d{2}/\d{4} Match 2 digits / 4 digits
(\d+) Capture 1+ digits in group 1
(?!\S) Negative lookahead, assert a whitespace boundary to the right
Regex demo
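If there should be a literal space at the left and at the right, match the trailing space instead of asserting the boundary (written here as [ ] so that it stays visible):
\b\d{2}/\d{4} (\d+)[ ]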
Regex demo
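For anyone who wants to try the first pattern in code, here is a minimal Python sketch. The function name numbers_after_week is made up for illustration. Note that each sample row holds two week/year pairs, so the pattern matches twice per row; the first capture is the one asked about.

```python
import re

# Week/year, one space, then the digits to capture, followed by
# whitespace or the end of the line.
PATTERN = re.compile(r"\b\d{2}/\d{4} (\d+)(?!\S)")


def numbers_after_week(line):
    """Return every number that directly follows a NN/NNNN token."""
    return PATTERN.findall(line)


if __name__ == "__main__":
    print(numbers_after_week("39/2021 23 590 08/2022 120 2630"))  # ['23', '120']
```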
I have a problem that my Googling tells me can be solved with regex, but I'm completely unfamiliar with it; I tried following some tutorials, but I'm entirely lost. I have this sample data set:
59 65 21366 CLEMENTINES 4.89 2.00 9.78
59 61 22384 PORK BACK RIBS 6.50 2.40 15.59
59 65 30669 BANANAS 1.89 1.00 1.89
59 13 391314 KODIAK POWER CAKES 14.69 1.00 14.69
59 65 392373 BAJA CHOPPED SALAD KIT 2.99 1.00 2.99
59 39 429227 FILA MENS ANKLE SOCK 6PK 9.99 1.00 9.99
59 65 1056187 ASIAN CASHEW SALAD KIT 2.99 1.00 2.99
59 28 1159696 SHOPKINS GG/TWOZIES ASST 5.97 1.00 5.97
59 13 1221327 KODIAK POWER CAKES -3.00 -3.00 COUPON
59 14 1270070 KLEENEX ULTRA SOFT 12 PCK 16.49 1.00 16.49
59 21 5221111 10 DRAWER STORAGE CART 29.99 1.00 29.99
59 17 1019 HALF + HALF 1 L 1.99 1.00 1.99
I want to import it into a spreadsheet. Visually I can see what I want: 3 numeric columns at the beginning, then a description that may or may not contain spaces, then usually 3 numeric columns, but sometimes 2 plus a word (see the line that ends in "coupon").
But because of the spaces and lack of quotes, my Excel skills (which are also marginal) don't allow me to import this in a sensible way.
I thought of doing multiple processes: pull off the 3 columns at the left and then 3 columns at the right... but in Excel I see no way to operate "from the right".
Any help appreciated. Thanks.
[edit] I realize from the comments that my ignorance has resulted in a poor question.
I didn't realize "Regex" was specific to language, etc. I am trying to import a csv into Excel, but I was using Notepad++ to perform the regex operations. I don't know what "flavor" that uses but the answer below helped greatly.
You can match this with:
^(\S*) (\S*) (\S*) (.*) (\S*) (\S*) (\S*)$
^ matches the start of a line
\S* matches zero or more non-whitespace characters
.* matches anything, including spaces
the parentheses capture the matches into capture groups
$ matches the end of a line.
You haven't said what tool you intend to use to do this.
One way is with a Perl one-liner:
perl -pe 's/^(\S*) (\S*) (\S*) (.*) (\S*) (\S*) (\S*)$/"$1","$2","$3","$4","$5","$6","$7"/' input.txt
Returning:
"59","65","21366","CLEMENTINES","4.89","2.00","9.78"
...
"59","13","1221327","KODIAK POWER CAKES","-3.00","-3.00","COUPON"
... etc.
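If Perl is not at hand, the same split can be sketched in Python and written out as CSV that Excel can open. This is a sketch under assumptions: the names split_row and to_csv are invented here, \S+ is used instead of \S* so that empty fields are rejected, and lines that do not fit the seven-column shape are silently dropped.

```python
import csv
import io
import re

# Three leading tokens, a free-text description, three trailing tokens.
ROW = re.compile(r"(\S+) (\S+) (\S+) (.*) (\S+) (\S+) (\S+)")


def split_row(line):
    """Split one line into seven fields, or return None if it does not fit."""
    match = ROW.fullmatch(line.strip())
    return list(match.groups()) if match else None


def to_csv(lines):
    """Return the matching lines as fully quoted CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in lines:
        row = split_row(line)
        if row is not None:
            writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(["59 13 1221327 KODIAK POWER CAKES -3.00 -3.00 COUPON"]), end="")
```

Because .* is greedy and the whole line must match, the description takes everything except the last three space-separated tokens, which is the "from the right" behaviour the question asks about.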
I'm struggling to find a solution to this regex, which appears to be fairly straightforward. I need to match a pattern that follows another matching pattern.
I need to capture the "Mean:" that follows "Kerberos-wsfed" in the following:
Kerberos:
Historical:
Between 26 and 50 milliseconds: 10262
Between 50 and 100 milliseconds: 658
Between 101 and 200 milliseconds: 9406
Between 201 and 500 milliseconds: 6046
Between 501 milliseconds and 1 second: 1646
Between 1 and 5 seconds: 1399
Between 6 and 10 seconds: 13
Between 11 and 30 seconds: 34
Between 31 seconds and 1 minute: 7
Between 1 minute and 2 minutes: 1
Mean: 268, Mode: 36, Median: 123
Total: 29472
Kerberos-wsfed:
Historical:
Between 26 and 50 milliseconds: 3151
Between 50 and 100 milliseconds: 129
Between 101 and 200 milliseconds: 650
Between 201 and 500 milliseconds: 411
Between 501 milliseconds and 1 second: 171
Between 1 and 5 seconds: 119
Between 6 and 10 seconds: 4
Between 11 and 30 seconds: 6
Between 1 minute and 2 minutes: 1
Mean: 176, Mode: 33, Median: 37
Total: 4642
I can match (?:Kerberos-wsfed:) and I can match Mean:, but I need the value of Mean that comes after Kerberos-wsfed, and that is where I'm having difficulty. Thanks for the assistance.
Use the regex
Kerberos-wsfed[\s\S]*?Mean: *(\d+)
The mean value is contained in the capturing group 1, that is $1 or \1 depending on your programming language.
See demo.
Try this regular expression: #Kerberos-wsfed:.+?Mean:\s+(\d+)#s
You can use a single space or \s instead of \s+ if you're sure of the file format.
The value 176 will be in group 1 of the match.
Demo: https://regex101.com/r/gwkUPJ/1
Using capturing group:
Kerberos-wsfed:[\s\S]*Mean:\s(\d+)
Kerberos-wsfed: matches the literal as-is
[\s\S]* allows any number of characters in between (including line delimiters); it is greedy, so if several Mean: lines follow, it runs to the last one, and [\s\S]*? stops at the first instead
Mean:\s matches the literal Mean: followed by one whitespace character \s
Finally (\d+), which is wrapped in the first capturing group, captures the value you are looking for. It matches one or more digits
Regex 101 Demo
The value that you are looking for (176) will be in the first capturing group which is $1 or the first one based on your language. For instance, in PHP:
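$re = '/Kerberos-wsfed:[\s\S]*Mean:\s(\d+)/';
// $str is assumed to hold the log text shown in the question
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
echo $matches[0][1];
// Output: 176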
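The same idea in Python, as a minimal sketch. The function name mean_after and the shortened sample text are assumptions made for illustration; the lazy [\s\S]*? is used so that the first Mean: after the section header wins even if more sections follow.

```python
import re


def mean_after(section, text):
    """Return the first Mean value that follows the given section header."""
    pattern = re.escape(section) + r":[\s\S]*?Mean:\s*(\d+)"
    match = re.search(pattern, text)
    return int(match.group(1)) if match else None


# Shortened version of the log from the question.
SAMPLE = """Kerberos:
Historical:
Mean: 268, Mode: 36, Median: 123
Total: 29472
Kerberos-wsfed:
Historical:
Mean: 176, Mode: 33, Median: 37
Total: 4642
"""

if __name__ == "__main__":
    print(mean_after("Kerberos-wsfed", SAMPLE))  # 176
```

The colon after the escaped section name keeps "Kerberos" from matching the "Kerberos-wsfed" header.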
Given the following list of phone numbers
8144658695
812 673 5748
812 453 6783
812-348-7584
(617) 536 6584
834-674-8595
Write a single regular expression (use vim on loki) to reformat the numbers so they look like this
814 465 8695
812 673 5748
812 453 6783
812 348 7584
617 536 6584
834 674 8595
I am using the search and replace command. My regular expression using back referencing:
:%s/\(\d\d\d\)\(\d\d\d\)\(\d\d\d\d\)/\1 \2 \3\g
only formats the first line.
Any ideas?
Try this:
:%s,.*\(\d\d\d\).*\(\d\d\d\).*\(\d\d\d\d\).*,\1 \2 \3,
First, use a count to repeat a pattern; repeating the pattern by hand is a bad habit:
\d\{3} "instead of \d\d\d
Then you also have to match the whitespace and other separators:
:%s/.*\(\d\{3}\).*\(\d\{3}\).*\(\d\{4}\).*/\1 \2 \3/g
Or even better, switch the whole regex to "very magic" mode with \v, so the groups and counts need no backslashes:
:%s/\v.*(\d{3}).*(\d{3}).*(\d{4}).*/\1 \2 \3/g
This greatly increases readability
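For comparison outside vim, here is a minimal Python sketch. It uses a different technique from the substitutions above: it strips every non-digit and then regroups the digits as 3-3-4. The function name format_phone and the decision to return the input unchanged when it does not contain exactly ten digits are assumptions.

```python
import re


def format_phone(raw):
    """Reformat a 10-digit phone number as 'NNN NNN NNNN'."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes and parentheses
    if len(digits) != 10:
        return raw  # leave anything unexpected untouched
    return f"{digits[:3]} {digits[3:6]} {digits[6:]}"


if __name__ == "__main__":
    for number in ["8144658695", "812-348-7584", "(617) 536 6584"]:
        print(format_phone(number))
```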
It's perhaps quite simple, but I can't figure it out:
I have a random number (can be 1,2,3 or 4 digits)
It's repeating on a second line:
2131
2131
How can I remove the first number?
EDIT: Sorry I didn't explain it better. These lines are in a plain text file. I'm using BBEdit as my editor. The actual file looks like this (only with approx. 10,000 lines):
336
336
rinde
337
337
diving
338
338
graffiti
339
339
forest
340
340
mountain
If possible the result should look like this:
336 - rinde
337 - diving
338 - graffiti
339 - forest
340 - mountain
Search:
^(\d{1,4})\n(?:\1\n)+([a-z]+$)
Replace:
\1 - \2
I don't have access to BBEdit, but apparently you have to check the "Grep" option to enable regex search-n-replace. (I don't know why they call it that, since it seems to be powered by the PCRE library, which is much more powerful than grep.)
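The same search and replace can be tried in Python, which helps when testing the pattern before running it on 10,000 lines. This is a minimal sketch; the function name collapse is invented, and re.MULTILINE is needed so that ^ and $ match at every line.

```python
import re

# A number, one or more repeats of that same number, then a lowercase word.
PATTERN = re.compile(r"^(\d{1,4})\n(?:\1\n)+([a-z]+$)", re.MULTILINE)


def collapse(text):
    """Turn 'N / N / word' line groups into 'N - word' lines."""
    return PATTERN.sub(r"\1 - \2", text)


if __name__ == "__main__":
    print(collapse("336\n336\nrinde\n337\n337\ndiving\n"), end="")
```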
Since you didn't mention any programming language or tools, I assume those numbers are in a file, one per line, with any repeated numbers on neighbouring lines. The uniq command can solve your problem:
kent$ echo "1234
dquote> 1234
dquote> 431
dquote> 431
dquote> 222
dquote> 222
dquote> 234"|uniq
1234
431
222
234
Another way find: /^(\d{1,4})\n(?=\1$)/ replace: ""
modifiers mg (multi-line and global)
$str =
'1234
1234
431
431
222
222
222
234
234';
$str =~ s/^(\d{1,4})\n(?=\1$)//mg;
print $str;
Output:
1234
431
222
234
Added On the revised sample, you could do something like this:
Find: /(?=^(\d{1,4}))(?:\1\n)+\s*([^\n\d]*$)/
Replace: $1 - $2
Mods: /mg (multi-line, global)
Test:
$str =
'
336
336
rinde
337
337
337
diving
338
338
graffiti
339
337
339
forest
340
340
mountain
';
$str =~ s/(?=^(\d{1,4}))(?:\1\n)+\s*([^\n\d]*$)/$1 - $2/mg;
print $str;
Output:
336 - rinde
337 - diving
338 - graffiti
339
337
339 - forest
340 - mountain
Added2 - I was more impressed with the OP's later desired output format than with the original question. It has many elements to it, so, unable to control myself, I generated a way too complicated regex.
Search: /^(\d{1,4})\n+(?:\1\n+)*\s*(?:((?:(?:\w|[^\S\n])*[a-zA-Z](?:\w|[^\S\n])*))\s*(?:\n|$)|)/
Replace: $1 - $2\n
Modifiers: mg (multi-line, global)
Expanded-
# Find:
s{ # Find a single unique digit pattern on a line (group 1)
^(\d{1,4})\n+ # Grp 1, capture a digit sequence
(?:\1\n+)* # Optionally consume the sequence many times,
\s* # and whitespaces (cleanup)
# Get the next word (group 2)
(?:
# Either find a valid word
( # Grp2
(?:
(?:\w|[^\S\n])* # Optional \w or non-newline whitespaces
[a-zA-Z] # with at least one alpha character
(?:\w|[^\S\n])*
)
)
\s* # Consume whitespaces (cleanup),
(?:\n|$) # a newline
# or, end of string
|
# OR, dont find anything (clears group 2)
)
}
# Replace (rewrite the new block)
{$1 - $2\n}xmg; # modifiers expanded, multi-line, global
find:
((\d{1,4})\r(\D{1,10}))|(\d{1,6})
replace:
\2 - \3
You should be able to clean it up from there quite easily!
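Detecting such a pattern is not possible with a strictly regular expression; it requires backreferences, which most regex engines provide as an extension.
Alternatively, you can split the string by the "\n" and then compare adjacent lines.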
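The split-and-compare approach might look like this in Python. It is only a sketch: the function name drop_adjacent_duplicates is made up, and itertools.groupby is used to do the comparison of neighbouring lines.

```python
from itertools import groupby


def drop_adjacent_duplicates(text):
    """Keep one copy of each run of identical neighbouring lines."""
    lines = text.split("\n")
    # groupby yields one key per run of equal consecutive items.
    return "\n".join(key for key, _run in groupby(lines))


if __name__ == "__main__":
    print(drop_adjacent_duplicates("2131\n2131"))  # 2131
```

Like uniq, this only removes repeats that sit next to each other; the same number appearing again later in the text is kept.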