Select a specified number of digits following a pattern - regex

I need a regex to select a part of a string. Possible strings:
"Til.: 1231231231 Fax: 1231231231 Kin.: 1231231231"
"Til.: 1231231231"
"Til.: 1231231231 Kin.: 1231231231"
In all of these cases, I need to select the 10 digits following the "Fax: " string, if it exists.

You can use this regex:
Fax:\s?(\d{10})
As you didn't specify the language, I can only suggest testing it in your browser's console; type this:
"Til.: 1231231231 Fax: 1231231231 Kin.: 1231231231".match(/Fax:\s?(\d{10})/)

With grep:
grep -oP "(?<=Fax: )[^ ]+"
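To check both answers outside a browser, here is a minimal JavaScript sketch: one function uses the capture group from the regex answer, the other uses a lookbehind that mirrors the grep approach. The function names are hypothetical, and the lookbehind assumes an ES2018+ engine (current Node.js or browsers).

```javascript
// Sample strings from the question.
const faxSamples = [
  "Til.: 1231231231 Fax: 1231231231 Kin.: 1231231231",
  "Til.: 1231231231",
  "Til.: 1231231231 Kin.: 1231231231",
];

// Capture-group approach: the 10 digits are in group 1.
// match() returns null when "Fax:" is absent, so return null in that case.
function faxByGroup(s) {
  const m = s.match(/Fax:\s?(\d{10})/);
  return m ? m[1] : null;
}

// Lookbehind approach: "Fax: " must precede the digits but is not part of the match.
function faxByLookbehind(s) {
  const m = s.match(/(?<=Fax: )\d{10}/);
  return m ? m[0] : null;
}

for (const s of faxSamples) {
  console.log(faxByGroup(s), faxByLookbehind(s));
}
```

Both return null for the two strings without a fax number, so the caller can tell "absent" apart from "found".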

Related

Regex grep command to select only passwords that start with a number and end with a number

I'm trying to write a grep regex that selects only passwords that start with a number and end with a number. The format of the txt file is:
password, #OfUsersWhoUseThisPassword
(the comma is included). For example:
123456, 25969
12345678, 8667
1234, 5786
qwerty, 5455
dragon, 4321
Regex:
^[0-9]{1}.*[0-9]{1}(?=,)
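Try this:
grep -oP '^\d.*?\d(?=,)' /tmp/file
-o prints only the matching part of each line.
-P enables Perl-compatible regular expressions, which grep needs for \d and the lookahead.
(?=,) is a lookahead: it requires a comma right after the final digit without including the comma in the output.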
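The same pattern can be tried on the sample lines in JavaScript, which also supports lookahead. This sketch mimics grep -o by keeping only the matched text; the variable names are illustrative.

```javascript
// Sample lines in the "password, count" format from the question.
const passwordLines = ["123456, 25969", "12345678, 8667", "1234, 5786", "qwerty, 5455", "dragon, 4321"];

// Same pattern as grep -oP '^\d.*?\d(?=,)': start with a digit, end with a digit,
// and require (but do not consume) the comma via the lookahead (?=,).
const passwordPattern = /^\d.*?\d(?=,)/;

// Keep only the matched part of each matching line, like grep -o.
const passwords = passwordLines
  .map((line) => line.match(passwordPattern))
  .filter((m) => m !== null)
  .map((m) => m[0]);

console.log(passwords);
```

Note that this pattern, like the one in the first answer, needs at least two characters, so a one-digit password such as 7 would not be selected.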

Regex replace phone numbers with XXXXXXXXXX

I have a text file consisting of a few phone numbers and other important data. I would like to replace all the phone numbers with a predefined text, let's say XXXXXXXXXX.
How can I do it using sed/awk? The regex
^\s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?\s*$
did not work for me.
Input:
Add me 7598128789
Pls add mi 9761500634
Add 8870504046
spam post
magar maddam is not required
all hero hain
All follows
Output:
Add me XXXXXXXXXX
Pls add mi XXXXXXXXXX
Add XXXXXXXXXX
spam post
magar maddam is not required
all hero hain
All follows
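The regex in the question is anchored with ^ and $, so it can only match a line that consists of nothing but a phone number; it also relies on \d and (?:...), which the regular expressions of sed and awk do not support. The JavaScript sketch below (a different engine, used only to illustrate the anchoring issue) shows the pattern failing on an input line, succeeding on a bare number, and matching again once the anchors are dropped.

```javascript
// The pattern from the question, unchanged. ^ and $ anchor it to the whole string.
const phoneLine = /^\s*(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?\s*$/;

// A line with text before the number: the anchors prevent a match.
console.log(phoneLine.test("Add me 7598128789"));

// A line holding only the number: it matches.
console.log(phoneLine.test("7598128789"));

// Without the anchors (and the surrounding \s*), the same pattern finds the number inside a line.
const unanchored = /(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})(?: *x(\d+))?/;
console.log(unanchored.test("Add me 7598128789"));
```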
You can do it in Perl:
perl -pe 's/\d{10}/XXXXXXXXXX/g' a
Try:
gawk '{gsub(/[0-9]{10}/,"XXXXXXXXXX");print}' Input_file
This simply substitutes every sequence of 10 consecutive digits with a string of 10 X characters and then prints the line.
With GNU sed (the \b word boundaries keep it from matching 10 digits inside a longer number):
sed -r 's/\b[0-9]{10}\b/XXXXXXXXXX/g' filename
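For comparison, the word-boundary version can be sketched in JavaScript on the input from the question; `mask` is a hypothetical helper name.

```javascript
// Input lines from the question.
const phoneInput = [
  "Add me 7598128789",
  "Pls add mi 9761500634",
  "Add 8870504046",
  "spam post",
  "magar maddam is not required",
  "all hero hain",
  "All follows",
];

// \b...\b mirrors the GNU sed answer: only standalone runs of exactly 10 digits
// are replaced; the g flag replaces every such run on a line.
const mask = (line) => line.replace(/\b\d{10}\b/g, "XXXXXXXXXX");

const masked = phoneInput.map(mask);
console.log(masked.join("\n"));
```

Without the \b boundaries, as in the perl and gawk answers, the first 10 digits of a longer number (for example an 11-digit one) would also be replaced.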

Sed remove only first occurrence of a string

I have several strings in my text file which look like this:
Brisbane, Queensland, Australia|BNE
I know how to use the sed command to replace one character with another. This time I want to replace the comma-space sequence with a pipe, but only for the first match, so that the country name is not affected.
I need to convert it to something like this:
Brisbane|Queensland, Australia|BNE
As you can see, only the first comma-space was replaced, not the second one, so the name "Queensland, Australia" stays complete. Can someone help me achieve this? Thanks.
Here is a sample of my file:
Brisbane, Queensland, Australia|BNE
Bristol, United Kingdom|BRS
Bristol, VA|TRI
Brive-La-Gaillarde, France - Laroche|BVE
Brno, Czech Republic - Bus service|ZDN
Brno, Czech Republic - Turany|BRQ
If I run sed 's/, /|/' file.txt, it doesn't work.
The output should look like this:
Brisbane|Queensland, Australia|BNE
Simply don't use the g option. Your sed command should look like this:
sed 's/, /|/'
By default, the s command replaces only the first occurrence of a pattern in the pattern space, unless you pass the g option.
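JavaScript's String.prototype.replace behaves analogously to sed here: a plain string pattern replaces only the first occurrence, while a regex with the g flag replaces all of them. This is a sketch in a different tool, not sed itself, using three of the sample lines.

```javascript
// Sample lines from the question.
const cityLines = [
  "Brisbane, Queensland, Australia|BNE",
  "Bristol, United Kingdom|BRS",
  "Bristol, VA|TRI",
];

// A string pattern changes only the first ", ", like s/, /|/ without g.
const firstOnly = cityLines.map((line) => line.replace(", ", "|"));

// A regex with the g flag changes every ", ", like s/, /|/g.
const everyOne = cityLines.map((line) => line.replace(/, /g, "|"));

console.log(firstOnly.join("\n"));
console.log(everyOne.join("\n"));
```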
Since you have not posted the expected output for your whole test file, we can only guess what you need. Here is my guess:
awk -F", *" 'NF>2{$0=$1"|"$2 OFS $3}1' OFS=", " file
Brisbane|Queensland, Australia|BNE
Bristol, United Kingdom|BRS
Bristol, VA|TRI
Brive-La-Gaillarde, France - Laroche|BVE
Brno, Czech Republic - Bus service|ZDN
Brno, Czech Republic - Turany|BRQ
As you can see, it counts the fields to decide whether a | is needed; if so, it reconstructs the line.
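The field-counting logic of that awk one-liner can be sketched in JavaScript; `rebuild` is a hypothetical name, and the split on /, */ stands in for awk's -F", *".

```javascript
// Sample lines from the question.
const airportLines = [
  "Brisbane, Queensland, Australia|BNE",
  "Bristol, United Kingdom|BRS",
  "Brno, Czech Republic - Turany|BRQ",
];

// Mirrors awk -F", *" 'NF>2{$0=$1"|"$2 OFS $3}1' OFS=", ":
// split on a comma plus optional spaces and only rebuild lines with more than two fields.
function rebuild(line) {
  const fields = line.split(/, */);
  if (fields.length > 2) {
    // Like the awk version, this keeps only the first three fields.
    return fields[0] + "|" + fields[1] + ", " + fields[2];
  }
  return line; // two fields or fewer: print the line unchanged
}

console.log(airportLines.map(rebuild).join("\n"));
```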

Regex code for address separated by commas

How can I extract the state, which is the text before the third comma, using only a regex?
54 West 21st Street Suite 603, New York,New York,United States, 10010
I've managed to extract the rest the way I wanted, but this one is a problem.
Also, how can I extract "United States"?
It looks like you want to use capturing groups:
.*,.*,(.*),(.*),.*
The first capturing group will be "New York" and the second will be "United States" (try it on Rubular).
Or you can split by commas (which will probably be even simpler), as @Jerry points out, assuming the language or tool you're using supports that.
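Both suggestions can be compared on the sample address in JavaScript; this is a sketch, and the variable names are illustrative.

```javascript
const address = "54 West 21st Street Suite 603, New York,New York,United States, 10010";

// Capture-group approach: this string has exactly four commas, so each literal
// comma in the pattern must match one of them and the groups land on fields 3 and 4.
const addrMatch = address.match(/.*,.*,(.*),(.*),.*/);
const stateByRegex = addrMatch[1];
const countryByRegex = addrMatch[2];

// Split approach: fields are zero-indexed, so the state is index 2 and the country index 3.
const addrFields = address.split(",");
const stateBySplit = addrFields[2];
const countryBySplit = addrFields[3];

console.log(stateByRegex, "|", countryByRegex);
console.log(stateBySplit, "|", countryBySplit);
```

If an address could contain extra commas, the greedy .* segments would shift, which is one reason splitting tends to be the more robust choice.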
You can use this regex:
(?:[^,]*,){2}([^,]*)
And use captured group #1 for your desired string.
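The {2} quantifier in that pattern is what selects the third field, so changing it selects other fields. The sketch below wraps the idea in a hypothetical helper `fieldAt`; the ^ anchor is an addition that only makes the leftmost start explicit.

```javascript
// Hypothetical helper: return the zero-based comma-separated field n using the
// pattern (?:[^,]*,){n}([^,]*), anchored with ^ at the start of the string.
function fieldAt(str, n) {
  const re = new RegExp("^(?:[^,]*,){" + n + "}([^,]*)");
  const m = str.match(re);
  return m ? m[1] : null; // null when the string has fewer than n commas
}

const addr = "54 West 21st Street Suite 603, New York,New York,United States, 10010";
console.log(fieldAt(addr, 2)); // state
console.log(fieldAt(addr, 3)); // country
```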
TL;DR
A lot depends on your regular expression engine, and whether you really need a regular expression or field-splitting. You can do field-splitting in Ruby and Awk (among others), but sed and grep only do regular expressions. See some examples below to get you started.
Ruby
str = '54 West 21st Street Suite 603, New York,New York,United States, 10010'
str.match(/(?:.*?,){2}([^,]+)/)
$1
#=> "New York"
GNU sed
$ echo '54 West 21st Street Suite 603, New York,New York,United States, 10010' |
sed -rn 's/([^,]+,){2}([^,]+).*/\2/p'
GNU awk
$ echo '54 West 21st Street Suite 603, New York,New York,United States, 10010' |
awk -F, '{print $3}'

Grep Regex - Words in brackets?

I want to know the grep regex to match everything that isn't a specific word. I know how to match everything that isn't a single specific character:
gibberish blah[^.]*jack
That would match blah, jack, and everything in between, as long as the part in between doesn't contain a period. But is it possible to do something like this?
gibberish blah[^joe]*jack
Can I match blah, jack, and everything in between, as long as the part in between doesn't contain the word "joe"?
UPDATE:
I can also use AWK if that would better suit this purpose.
So basically, I just want to get the sentence "gibberish blah other words jack", as long as "joe" isn't in the other words.
Update 2 (The Answer, to a different question):
Sorry, I am tired. The sentence actually can contain the word "joe", but not two of them. So "gibberish blah jill joe moo jack" would be accepted, but "gibberish blah jill joe moo joe jack" wouldn't.
Anyway, I figured out the solution to my problem: just grep for "gibberish.*jack" and then count how many times "joe" occurs in that sentence (for example with grep -o 'joe' | wc -l). If the count is 1, it's OK, but if it is 2 or more, the sentence is wrong.
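That two-step check can be sketched in JavaScript as a single function. The name `acceptSentence` is hypothetical; the \b word boundaries (so that a word like "joey" is not counted) and treating zero occurrences as acceptable are assumptions.

```javascript
function acceptSentence(s) {
  // Must look like "gibberish ... jack", as with grep "gibberish.*jack".
  if (!/gibberish.*jack/.test(s)) return false;
  // Count whole-word occurrences of "joe"; match() returns null when there are none.
  const joes = (s.match(/\bjoe\b/g) || []).length;
  return joes <= 1;
}

console.log(acceptSentence("gibberish blah jill joe moo jack"));
console.log(acceptSentence("gibberish blah jill joe moo joe jack"));
```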
So, sorry for asking a question that wouldn't even solve my problem. I will mark sputnick's answer as the right one, since it looks like it would solve the original post's problem.
What you're looking for is called lookaround, an advanced regex technique from PCRE and Perl that most modern languages support. grep can handle these expressions if it has the -P switch; if yours doesn't, try pcregrep instead (or any modern language).
See
http://www.perlmonks.org/?node_id=518444
http://www.regular-expressions.info/lookaround.html
NOTE
If you just want to negate a regex, a simple grep -v "regex" may be sufficient (it depends on your needs):
$ echo 'gibberish blah other words jack' | grep -v 'joe'
gibberish blah other words jack
$ echo 'gibberish blah joe other words jack' | grep -v 'joe'
$
See
man grep | less +/invert-match
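A lookbehind is zero-width, so quantifying it does not consume anything: blah(?<!joe)*jack can only match blah immediately followed by jack. Instead, check a negative lookahead before each character in between (for example with grep -P):
grep -P 'gibberish blah(?:(?!joe).)*jack'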
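The sketch below tests, on the sentences from the question, a negative lookahead checked before every character between blah and jack (a construct sometimes called a tempered greedy token). JavaScript is used here only because it supports lookahead.

```javascript
// (?:(?!joe).)* consumes one character at a time, but only after checking that
// "joe" does not start at that position, so the gap can never contain "joe".
const noJoe = /gibberish blah(?:(?!joe).)*jack/;

console.log(noJoe.test("gibberish blah other words jack"));
console.log(noJoe.test("gibberish blah joe other words jack"));
```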