I have a text file in which, on some lines, the first gap from the left is two spaces wide, and I want it to be one space wide. What is the script for this in bash?
123  2 5//problem
1 2 5
1 2 5
1  32 5//problem
What I want:
123 2 5
1 2 5
1 2 5
1 32 5
The tr way (note that this squeezes every run of spaces on the line, not only the first):
tr -s ' ' < test.txt
Using sed:
sed 's/^\([^ ][^ ]*[ ]\)[ ]*/\1/' input
Starting from the left
^
match and capture non-space characters and a space
\([^ ][^ ]*[ ]\)
and any number of additional spaces:
[ ]* # remove the star if you only care about exactly 2 spaces
and replace these with the captured part:
\1
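As a quick check of the expression above, here is a minimal sketch. The function name collapse_first_gap is made up for illustration, and the sample lines assume that the problem lines contain two spaces after the first word.

```shell
# Hypothetical helper: keep the first word plus one space, drop any extra spaces after it.
collapse_first_gap() {
    sed 's/^\([^ ][^ ]*[ ]\)[ ]*/\1/'
}

# Only the first gap on each line is touched.
printf '%s\n' '123  2 5' '1 2 5' '1  32 5' | collapse_first_gap
```

A line that starts with a space is left unchanged, because the expression requires a non-space character right after ^.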
Edit: I realized that David's answer was almost right.
You can use sed.
sed -e 's/ \+/ /' x
This replaces the first occurrence of one or more spaces with a single space. (\+ is a GNU extension in basic regular expressions; a portable spelling is a space followed by " *".)
But you can do it purely in bash as well:
while read -r a b ; do echo "$a" "$b" ; done < x
This splits each line at the first run of whitespace and echoes back the first word and the rest of the line. The result is that there is only one space between the first word and the rest of the line. The -r option stops read from interpreting backslashes in the input.
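To see how the read loop differs from tr -s, here is a small sketch. The function name first_gap_with_read and the sample lines are assumptions for illustration only.

```shell
# Hypothetical helper: rebuild each line as "<first word> <rest of line>".
first_gap_with_read() {
    while IFS=' ' read -r first rest; do
        printf '%s %s\n' "$first" "$rest"
    done
}

# The loop normalises only the first gap; later gaps are kept as they are.
printf '%s\n' '123  2 5' '1 2  5' | first_gap_with_read

# tr -s squeezes every run of spaces on the line.
printf '%s\n' '1 2  5' | tr -s ' '
```

One caveat of this sketch: a line that contains a single word comes back with a trailing space, because the format string always prints the separator.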
Using only sed (Ubuntu 20.04), I need to print the sentences that contain at least two numbers.
Then, print only the first two words of each sentence.
I was able to do the second part, but I do not know how to do the first.
This is the file:
ab c1d
dea 1 a zz7 www44
xy12 abc xyz
xy1 ab XYZ
xy ab X2YZ 3
And this is what I've done so far:
sed -E "s/^[ ]*([^ ]+[ ]+[^ ]+).*/\1/" $* > 123
If you just want to use sed to print the first two words of lines that contain at least two consecutive digits:
sed -nE '/[0-9]{2,}/p' ./yourFile.txt | sed -E 's/^\s*(\S+\s+\S+).*$/\1/'
/[0-9]{2,}/ : lines that contain a run of at least two consecutive digits (a line whose digits are separated, such as xy ab X2YZ 3, is not selected)
s/^\s*(\S+\s+\S+).*$/\1/ : a line that begins with zero or more whitespace characters, then a captured group of (one or more non-space characters)(one or more whitespace characters)(one or more non-space characters), and then any remaining characters
EXAMPLE :
input :
ab c1d
dea 1 a zz7 www44
xy12 abc xyz
xy1 ab XYZ
xy ab X2YZ 3
output :
dea 1
xy12 abc
If you also want to collapse the run of spaces between the first two words of each line, you can pipe it through sed one more time (the | goes at the end of each line so that the shell continues the pipeline):
sed -nE '/[0-9]{2,}/p' ./yourFile.txt |
sed -E 's/^\s*(\S+\s+\S+).*$/\1/' |
sed -E 's/\s+/ /'
s/\s+/ / : s for substitute, \s+ matches the first run of consecutive whitespace characters, / / replaces it with a single space
In that case the output will be:
dea 1
xy12 abc
You can use
sed -En '/[0-9][^0-9]*[0-9]/{s/^ *([^ ]+ +[^ ]+).*/\1/p}' file
awk '/[0-9][^0-9]*[0-9]/{print $1" "$2}' file
In both cases, a line with at least two digits is detected with the /[0-9][^0-9]*[0-9]/ regex (a digit, zero or more characters other than digits, a digit). In the sed solution, the first two words are captured and the rest of the line is matched and removed. In the awk solution, only the first two words (the first and second fields) are printed, joined with a single space.
See an online demo:
s=' ab c1d
dea 1 a zz7 www44
xy12 abc xyz
xy1 ab XYZ
xy ab X2YZ 3'
sed -En '/[0-9][^0-9]*[0-9]/{s/^([^[:space:]]+ +[^[:space:]]+).*/\1/p}' <<< "$s"
echo "Now, awk..."
awk '/[0-9][^0-9]*[0-9]/{print $1" "$2}' <<< "$s"
Both return the first two words; sed keeps all the spaces between them intact:
dea 1
xy12 abc
xy ab
awk keeps just one:
dea 1
xy12 abc
xy ab
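The two answers above select different lines, and a short sketch makes the difference visible. The helper names are invented for this example; the sample data is the file from the question.

```shell
# Hypothetical helpers: select lines by how their digits are arranged.
two_consecutive_digits() { sed -nE '/[0-9]{2,}/p'; }
two_digits_anywhere()    { sed -nE '/[0-9][^0-9]*[0-9]/p'; }

sample=$(printf '%s\n' 'ab c1d' 'dea 1 a zz7 www44' 'xy12 abc xyz' 'xy1 ab XYZ' 'xy ab X2YZ 3')

# Selects only the lines containing a run such as 44 or 12.
printf '%s\n' "$sample" | two_consecutive_digits

# Also selects "xy ab X2YZ 3", whose two digits are not adjacent.
printf '%s\n' "$sample" | two_digits_anywhere
```

Which of the two is wanted depends on whether "at least 2 numbers" in the question means two digits anywhere on the line or a number with two or more digits.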
Problem
I want to get any text that consists of one to three digits followed by a %, but without the %, using sed.
What I tried
I guess the following regex should match the right pattern: [0-9]{1,3}%.
Then I can use this sed command to capture the digits and print only them:
sed -nE 's/.*([0-9]{1,3})%.*/\1/p'
Example
However when i run it, it shows :
$ echo "100%" | sed -nE 's/.*([0-9]{1,3})%.*/\1/p'
0
instead of
100
Obviously, there's something wrong with my sed command, and I think the problem comes from here:
[0-9]{1,3}
which apparently doesn't do what I want it to do.
edit:
Solution
The .* at the start of sed -nE 's/.*([0-9]{1,3})%.*/\1/p' "ate" the first two digits.
The right way to write it, according to Wicktor's answer, is:
sed -nE 's/(.*[^0-9])?([0-9]{1,3})%.*/\2/p'
The leading .* is greedy: it consumes as many characters as it can, digits included, and leaves only the last of the three digits in 100% for the capture group.
Use
sed -nE 's/(.*[^0-9])?([0-9]{1,3})%.*/\2/p'
Details
(.*[^0-9])? - (Group 1) an optional sequence of any 0 or more chars up to the non-digit char including it
([0-9]{1,3}) - (Group 2) one to three digits
% - a % char
.* - the rest of the string.
The match is replaced with the contents of Group 2, and that is the only value printed, since -n suppresses the default line output and the p flag prints only the lines where a substitution was made.
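The effect of the greedy .* can be reproduced side by side with the corrected expression. This is a minimal sketch; the function name extract_percent and the second sample string are made up for illustration.

```shell
# Hypothetical helper wrapping the corrected expression from above.
extract_percent() {
    sed -nE 's/(.*[^0-9])?([0-9]{1,3})%.*/\2/p'
}

# Original attempt: the greedy .* leaves a single digit for the group.
echo '100%' | sed -nE 's/.*([0-9]{1,3})%.*/\1/p'

# Corrected version: the optional prefix has to end in a non-digit.
echo '100%' | extract_percent
echo 'load: 42% used' | extract_percent
```

Lines without a digit followed by % produce no output, because the substitution fails and -n suppresses the default print.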
It will be easier to use a cut + grep option:
echo "abc 100%" | cut -d% -f1 | grep -oE '[0-9]{1,3}'
100
echo "100%" | cut -d% -f1 | grep -oE '[0-9]{1,3}'
100
Or else you may use this awk:
echo "100%" | awk 'match($0, /[0-9]{1,3}%/){print substr($0, RSTART, RLENGTH-1)}'
100
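Or else, if you have GNU grep, use the -P (PCRE) option (ggrep is the name under which GNU grep is commonly installed on macOS; on Linux it is usually plain grep):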
echo "abc 100%" | ggrep -oP '[0-9]{1,3}(?=%)'
100
This might work for you (GNU sed):
sed -En 's/.*\<([0-9]{1,3})%.*/\1/p' file
This is a filtering exercise, so use the -n option.
Use a capture group to grab 1 to 3 digits followed by %, replace the whole line with the back reference \1, and print the result if the substitution succeeded.
N.B. The \< ensures the digits start on a word boundary, \b could also be used. The -E option is employed to reduce the number of back slashes which would normally be necessary to quote (,),{ and } metacharacters.
I would like to extract 1, 10, and 100 from:
1 one -args 123
10 ten -args 123
100 one hundred -args 123
However this regex returns 100:
echo -e " 1 one\n 10 ten\n100 one hundred" | grep -Po '^(?=[ ]*)\d+(?=.*)'
100
Not ignoring the preceding spaces returns the numbers (but of course with undesired spaces):
echo -e " 1 one\n 10 ten\n100 one hundred" | grep -Po '^[ ]*\d+(?=.*)'
1
10
100
Have I misunderstood non capturing regex groups in grep / Perl (grep version 2.2, Perl as the -P flag should use its regex) or is this a bug? I notice the release notes for 2.6 says "This release fixes an unexpectedly large number of flaws, from outright bugs (surprisingly many, considering this is "grep")".
If someone with 2.6 could try these examples that would be valuable to determine if this is a bug (in 2.2) or intended behaviour.
The issue is what grep considers a 'match'. Unless you tell grep that part of the overall match should be excluded, -o prints the entire matched text, regardless of any groups inside it.
Given:
$ echo "$txt"
1 one -args 123
10 ten -args 123
100 one hundred -args 123
You can get just the first column of digits without leading spaces several ways.
With GNU grep:
$ echo "$txt" | grep -Po '^[ ]*\K\d+'
1
10
100
Here \K acts like a variable-length lookbehind: it resets the start of the reported match to the current position. The part to the left of \K must still match, but it is not included in the text that grep prints.
awk:
$ echo "$txt" | awk '/^[ ]*[0-9]+/{print $1}'
sed:
$ echo "$txt" | sed 's/^[ ]*\([0-9]*\).*/\1/'
Perl:
$ echo "$txt" | perl -lne 'print $1 if /^[ ]*\K(\d+)/'
And then if you want the matches on a single line, run through xargs:
$ echo "$txt" | grep -Po '^[ ]*\K(\d+)' | xargs
1 10 100
Or, if you are using awk or Perl, just change the way the value is printed so that it is not followed by a newline.
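Where grep has no -P option, the same result can be had with plain sed. The sketch below is one possible portable variant, not the answer's own code; the function name first_number and the extra "no number here" line are assumptions added for illustration.

```shell
# Hypothetical helper: print the leading number of each line and
# skip lines that do not start with (optional spaces and) digits.
first_number() {
    sed -n 's/^ *\([0-9][0-9]*\).*/\1/p'
}

txt=$(printf '%s\n' '  1 one -args 123' ' 10 ten -args 123' '100 one hundred -args 123' 'no number here')

# One number per line.
printf '%s\n' "$txt" | first_number

# All numbers on a single line, joined with spaces.
printf '%s\n' "$txt" | first_number | paste -sd ' ' -
```

Using -n together with the p flag means lines without a leading number are dropped instead of being printed as blank or unchanged lines.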
You can delete the unwanted spaces this way :
echo -e " 1 one\n 10 ten\n100 one hundred" | grep -Po '^[ ]*(\d+)' | tr -d ' '
As for why it does not work: it is not a bug. It works as intended; you just misinterpreted how it should work.
If we focus on this ^(?=[ ]*)\d+:
The (?=[ ]*) part is a lookahead assertion. So it means that the regex engine tries to check if the ^ is followed by zero or more spaces. But the assertion itself is not part of the match, so in reality this code means :
- Match a ^ that is followed by 0 or more spaces
- After this ^, match one or more digits
So your code will only match when a digit is the first character of the line. The lookahead won't help you on your use case.
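The behaviour described above can be checked directly. This sketch assumes a GNU grep built with PCRE support (the -P option), and it assumes the sample lines carry leading spaces as in the question.

```shell
sample=$(printf '%s\n' '  1 one' ' 10 ten' '100 one hundred')

# The lookahead consumes nothing, so \d+ must still begin at the start of the line.
printf '%s\n' "$sample" | grep -Po '^(?=[ ]*)\d+'

# Here the spaces are consumed, and \K then drops them from the reported match.
printf '%s\n' "$sample" | grep -Po '^[ ]*\K\d+'
```

The first command reports only the line that starts with a digit; the second reports the number from every line.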
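I think the anchor is what breaks the lookahead version. A lookbehind would be the natural fit, but lookbehinds must have a fixed length, so (?<=[ ]*) is not allowed (I always run into that one). Dropping the anchor works for this sample, although both lookaheads then always succeed and the pattern simply matches every run of digits on the line: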
echo -e " 1 one\n 10 ten\n100 one hundred" | grep -Po '(?=[ ]*)\d+(?=.*)'
As for a better tool, I would use awk as it is suited to any column driven data. So if you were running it off of ps you could do something like:
ps | awk '/stuff you want to look for here/{print $1}'
awk will take care of all the white space by default
I have a series of files that use fixed-width delimiting instead of comma-separated delimiting. They all look like this:
2015/09/29 659027 RIH619 25 105.80IN921186
2015/09/29 659027 RIH619 25 105.80IN921186
2015/09/29 659027 RIH619 25 105.80IN921186
2015/09/29 659027 RIH619 25 105.80IN921186
I would like to replace all the spaces with commas. I have a piece of code that accomplishes this:
sed -r 's/^\s+//;s/\s+/,/g'
After running the code I get this result:
2015/09/29,659027,RIH619,25,105.80IN921186
2015/09/29,659027,RIH619,25,105.80IN921186
2015/09/29,659027,RIH619,25,105.80IN921186
2015/09/29,659027,RIH619,25,105.80IN921186
My problem is that the files I get don't have a space between the amount and the reference. My output needs to look like this:
2015/09/29,659027,RIH619,25,105.80,IN921186
2015/09/29,659027,RIH619,25,105.80,IN921186
2015/09/29,659027,RIH619,25,105.80,IN921186
2015/09/29,659027,RIH619,25,105.80,IN921186
What I tried is:
sed -r 's/^\s+//;s/\.\d\d\D+/\.\d\d,\D/;s/\s+/,/g'
But it didn't seem to do anything
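With tr and sed (sed does not understand \d or \D, which is why the attempt above did nothing; use [0-9] instead, and -s so that a run of spaces becomes a single comma):
tr -s ' ' ',' <file | sed -r 's/(\.[0-9]{2})/\1,/'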
You can use this single sed for both:
sed -r 's/[[:blank:]]+/,/g; s/([[:digit:]])([[:alpha:]])/\1,\2/g' file
2015/09/29,659027,RIH619,25,105.80,IN921186
2015/09/29,659027,RIH619,25,105.80,IN921186
2015/09/29,659027,RIH619,25,105.80,IN921186
2015/09/29,659027,RIH619,25,105.80,IN921186
([[:digit:]]) matches a digit and captures it in group#1
([[:alpha:]]) matches an alphabetic character and captures it in group#2
\1,\2 places a comma between 2 groups.
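Putting the pieces together, a minimal sketch could look like this. It adds the leading-blank removal from the question's own command; the function name to_csv and the padded sample line are assumptions for illustration.

```shell
# Hypothetical helper: strip leading blanks, turn each run of blanks into one comma,
# then put a comma between a digit and a letter that directly follows it.
to_csv() {
    sed -E 's/^[[:blank:]]+//; s/[[:blank:]]+/,/g; s/([[:digit:]])([[:alpha:]])/\1,\2/g'
}

printf '%s\n' '  2015/09/29  659027 RIH619    25    105.80IN921186' | to_csv
```

Note that the last substitution applies wherever a digit is followed by a letter, so a field such as A1B would also be split into A1,B. That is fine for the sample data but worth checking against real files.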
GNU awk has fixed field width support (the FIELDWIDTHS variable) that is good for this sort of thing:
$ echo "2015/09/29 659027 RIH619 25 105.80IN921186" |
awk 'BEGIN { FIELDWIDTHS="10 1 6 1 6 1 2 1 6 8"; OFS="," }{ print $1,$3,$5,$7,$9,$10 }'
2015/09/29,659027,RIH619,25,105.80,IN921186
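FIELDWIDTHS is specific to GNU awk. For other awk implementations, a rough equivalent can be written with substr. In this sketch the function name fixed_to_csv is invented, and the column positions are assumptions read off the single-spaced sample line above, so they would need adjusting for the real file layout.

```shell
# Hypothetical helper; start positions and lengths mirror the field widths used above.
fixed_to_csv() {
    awk 'BEGIN { OFS = "," }
         { print substr($0, 1, 10), substr($0, 12, 6), substr($0, 19, 6),
                 substr($0, 26, 2), substr($0, 29, 6), substr($0, 35, 8) }'
}

echo '2015/09/29 659027 RIH619 25 105.80IN921186' | fixed_to_csv
```

Because the fields are cut by position, this does not depend on there being a separator between the amount and the reference.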
I have the following script to remove all lines before a line which matches with a word:
str='
1
2
3
banana
4
5
6
banana
8
9
10
'
echo "$str" | awk -v pattern=banana '
print_it {print}
$0 ~ pattern {print_it = 1}
'
It returns:
4
5
6
banana
8
9
10
But I want to include the first match too. This is the desired output:
banana
4
5
6
banana
8
9
10
How could I do this? Do you have any better idea with another command?
I've also tried sed '0,/^banana$/d', but seems it only works with files, and I want to use it with a variable.
And how could I get all lines before a match using awk?
I mean. With banana in the regex this would be the output:
1
2
3
This awk should do:
echo "$str" | awk '/banana/ {f=1} f'
banana
4
5
6
banana
8
9
10
sed -n '/^banana$/,$p'
Should do what you want. -n instructs sed to print nothing by default, and the p command specifies that all addressed lines should be printed. This will work on a stream, and is different than the awk solution since this requires the entire line to match 'banana' exactly whereas your awk solution merely requires 'banana' to be in the string, but I'm copying your sed example. Not sure what you mean by "use it with a variable". If you mean that you want the string 'banana' to be in a variable, you can easily do sed -n "/$variable/,\$p" (note the double quotes and the escaped $) or sed -n "/^$variable\$/,\$p" or sed -n "/^$variable"'$/,$p'. You can also echo "$str" | sed -n '/banana/,$p' just like you do with awk.
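The variable form mentioned above can be wrapped in a small function. This is only a sketch: the name from_match is made up, and the argument is inserted into the sed script as a regular expression, so a pattern containing / would need escaping.

```shell
# Hypothetical helper: print from the first line matching the regex in $1 to the end of input.
from_match() {
    sed -n "/$1/,\$p"
}

str=$(printf '%s\n' 1 2 3 banana 4 5 6 banana 8 9 10)

printf '%s\n' "$str" | from_match banana       # banana anywhere in the line
printf '%s\n' "$str" | from_match '^banana$'   # the whole line must be banana
```

Both calls read from a pipe, which shows that the address range works on a stream as well as on a file.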
Just invert the commands in the awk:
echo "$str" | awk -v pattern=banana '
$0 ~ pattern {print_it = 1}   # if the line matches, activate the flag
print_it {print}              # if the flag is active, print the line
'
The print_it flag is activated when the pattern is found. From that moment on (including that line), lines are printed while the flag is on. Previously the print was done before the check, so the first matching line was skipped.
cat in.txt | awk "/banana/,0"
If you don't want to keep the matched line itself, you can use the following (the 0 address is a GNU sed extension):
cat in.txt | sed "0,/banana/d"
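The question also asks how to get all the lines before the match, which the answers above do not cover. One possible sketch with awk follows; the function name before_match is an assumption, and the pattern is treated as a regular expression.

```shell
# Hypothetical helper: print lines until the first one matching the regex in $1, then stop reading.
before_match() {
    awk -v pattern="$1" '$0 ~ pattern { exit } { print }'
}

printf '%s\n' 1 2 3 banana 4 5 6 banana 8 9 10 | before_match banana
```

If no line matches, the whole input is printed, since the exit rule is never triggered.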