Removing Leading 0 and applying Regex to Sed - regex

I have several file names, for ease I've put them in a file as follows:
01.action1.txt
04action2.txt
12.action6.txt
2.action3.txt
020.action9.txt
10action4.txt
15action7.txt
021action10.txt
11.action5.txt
18.action8.txt
As you can see, the formats aren't consistent. What I'm trying to do is extract the leading number from each file name: 1, 4, 12, 2, 20, etc.
I have the following regex
(\.)?action\d{1,}.txt
This successfully matches .action[number].txt, but I also need to match the leading 0 and use the whole thing in a sed substitution that replaces it with nothing, so I'm left with only the leading numbers. I'm having trouble matching the leading 0 and applying the whole thing to sed.
Thanks

With GNU sed:
sed -r 's/0*([0-9]*).*/\1/' file
Output:
1
4
12
2
20
10
15
21
11
18
See: The Stack Overflow Regular Expressions FAQ
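For a sed without -r, the same substitution can be written with basic regular expressions. A minimal sketch, assuming every name starts with its number; the names variable and the strip_leading_zeros function are made up for illustration:

```shell
# Sample names copied from the question (a subset).
names='01.action1.txt
04action2.txt
020.action9.txt
10action4.txt'

# POSIX BRE form: skip leading zeros, capture the digits that follow,
# and discard the rest of the line.
strip_leading_zeros() {
  sed 's/^0*\([0-9]*\).*/\1/'
}

printf '%s\n' "$names" | strip_leading_zeros
```

A name whose number is just 0 would come out empty, because 0* consumes every zero before the capture group starts.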

I don't know if the below awk is helpful but it works as well:
awk '{print $1 + 0}' file
1
4
12
2
20
10
15
21
11
18

Related

Bash script to split a file by grep everything till the second time match in a column into one file and the rest into another

I am trying to split a file with data like
2 0.2345
58 0.3608
59 0.3504
60 0.4175
65 0.3995
66 0.3972
67 0.4411
411 0.3455
2 1.3867
3 1.4532
4 1.2925
5 1.2473
6 1.2605
7 1.2463
8 1.1667
9 1.1312
10 1.1502
11 1.1190
12 1.0346
13 1.0291
409 0.8025
410 0.8695
411 0.9154
For this kind of data, I am trying to split this into two files:
File 1 : 2 -411 (first Column match)
File 2 : 2-411 (second occurrence in the first column)
For this, I wrote these two one liners:
awk '1;/411/{exit}' $1 > File1_$1 ;
awk '/411/,0' $1 | awk '{if (NR!=1) {print}}' > File2_$1
The problem is that if "411" also occurs in the second column (as in "67 0.4411"), my script cuts prematurely at that line.
I am unable to restrict the match to the first column; 411 can occur in the second column any number of times, and those occurrences are not of interest.
Any help would be greatly appreciated.
One idea is to use this combination of commands:
awk '{ if ($1 >= 2 && $1 <= 411) print $0 }{if ($1=="411") exit}' input > f1
then
grep -v -f f1 input > f2
If your input file is larger, you should repeat step 2.
I don't know much about Bash, but for the regex I think you should anchor the pattern to the start of the line, like ^411; a word boundary such as \b411 would still match a second-column value like 0.4115.
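Comparing the first field exactly, rather than searching the whole line, avoids the false match on values such as 0.4411. A minimal awk sketch, assuming the first block ends at the first line whose first column equals the marker; the function name, the shortened sample data and the output file names are illustrative only:

```shell
# Usage: split_by_first_column INPUT MARKER OUT1 OUT2
split_by_first_column() {
  awk -v end="$2" -v out1="$3" -v out2="$4" '
    { print > (seen ? out2 : out1) }   # route the line to the current output file
    $1 == end { seen = 1 }             # switch after the first exact match in column 1
  ' "$1"
}

tmp=$(mktemp -d)
printf '%s\n' '2 0.2345' '67 0.4411' '411 0.3455' '2 1.3867' '411 0.9154' > "$tmp/input"
split_by_first_column "$tmp/input" 411 "$tmp/File1" "$tmp/File2"
```

The line "67 0.4411" stays in the first file because only $1 is compared with the marker.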

Regex for soccer data

Why isn't my regex working? It just returns back the original file. My file looks like this (for a few hundred lines):
1 Germany 1765 0 Equal
2 Argentina 1631 0 Equal
3 Colombia 1488 1 Up
4 Netherlands 1456 -1 Down
5 Belgium 1444 0 Equal
6 Brazil 1291 1 Up
7 Uruguay 1243 -1 Down
8 Spain 1228 -1 Down
9 France 1202 1 Up
...
192 US Virgin Islands 28 -1 Down
And I want this:
Germany,1
Argentina,2
Colombia,3
...
US Virgin Islands,192
This is the regex I tried:
sed 's/\([0-9]*\)\t\([a-zA-Z]*\)/\2,\1/g' <fifa.csv >fifa.csv
But it just returns the original file.
EDIT:
Now I tried
sed 's/\([0-9]*\)\t\([a-zA-Z]*\)/\2,\1/g' <fifa.csv >fifa.csv
and got
,1 Germany,,1765Equal,0,
,2 Argentina,,1631Equal,0,
,3 Colombia,,1488Up,1,
,4 Netherlands,,1456-Down,1,
,5 Belgium,,1444Equal,0,
You could try the below sed command if the fields are tab-separated.
sed 's/^\([0-9]\+\)\t\([^\t]*\).*/\2,\1/' file
Add the inline-edit option -i to save the changes made.
sed -i 's/^\([0-9]\+\)\t\([^\t]*\).*/\2,\1/' file
^ anchors the match to the start of the line. \+ repeats the previous character one or more times; sed uses BRE by default, so the + has to be escaped to get that meaning (the escaped form is a GNU extension). [^\t]* matches zero or more characters other than a tab.
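The same pattern can be spelled without GNU-specific escapes. A sketch under the same assumption of tab-separated fields; the tab variable and the rank_to_csv function are hypothetical names:

```shell
# A literal tab, since \t inside a sed pattern is a GNU extension.
tab=$(printf '\t')

# [0-9][0-9]* is the portable spelling of [0-9]\+ (one or more digits).
rank_to_csv() {
  sed "s/^\([0-9][0-9]*\)${tab}\([^${tab}]*\).*/\2,\1/"
}

printf '1\tGermany\t1765\t0\tEqual\n192\tUS Virgin Islands\t28\t-1\tDown\n' | rank_to_csv
```

Because the second group stops only at a tab, a name containing spaces is kept whole.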
The following is what you are looking for. The -i option specifies that files are to be edited in-place.
sed -i 's/^\([0-9]\+\)\t\([^\t]*\).*/\2,\1/' fifa.csv
awk '{print( $2 "," $1)}' YourFile
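Not sed, but easier to manage. Note that awk splits on any whitespace by default, so a multi-word name such as US Virgin Islands is cut to its first word.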
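If the fields really are tab-separated, telling awk to split on tabs keeps multi-word names intact. A small sketch; the name_and_rank function and the sample lines are illustrative:

```shell
# Split on tabs only, so spaces inside a name stay part of field 2.
# OFS=',' makes print join the two fields with a comma.
name_and_rank() {
  awk -F'\t' -v OFS=',' '{ print $2, $1 }'
}

printf '1\tGermany\t1765\t0\tEqual\n192\tUS Virgin Islands\t28\t-1\tDown\n' | name_and_rank
```

Writing the result to a different file and renaming it afterwards avoids truncating the input, which a redirection like <fifa.csv >fifa.csv does before the command reads anything.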

Use SED to replace string of fixed length at certain position - arbitrary pattern

I'm trying to replace a fixed-length string of arbitrary digits at a certain position with a specified string.
I have to:
For every line beginning with 1, replace the existing value in columns 4-13 with " 123456789", where column 4 is a space.
so a sample file looks like this in the first line:
110 000000000000000000000000000000000000000
and I want
110 123456789000000000000000000000000000000
So far I have:
sed -i "/^1/ s/(.{10})/ 123456789/4" $DEST/$FILE_NAME$DATE.txt
This doesn't do anything though...
With sed:
sed '/^1/s/\(.\{4\}\)\(.\{9\}\)/\1123456789/' "$DEST/$FILE_NAME$DATE.txt"
The preceding regex /^1/ makes the following substitute command apply only to lines starting with a 1.
The substitute command itself captures the first 4 chars (110<space>) and the following 9 chars (000000000) into separate groups, keeps the first 4 chars and replaces the following nine with 123456789.
Btw, if you have GNU sed, you might simplify the command to:
sed -r '/^1/s/(.{4})(.{9})/\1123456789/'
... which looks simpler for understanding, but is not portable across all different sed versions.
Using awk, simple to understand solution
awk '/^1/ {print substr($0,1,4)"123456789"substr($0,14)}' file
110 123456789000000000000000000000000000000
If the line starts with 1, print the first 4 characters + 123456789 + the rest of the line starting from position 14. Lines that do not start with 1 are not printed.
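When the other lines should pass through unchanged, the line can be rewritten in place and every line printed. A hedged variation on the same substr idea; the replace_cols function and the shortened sample lines are made up for this example:

```shell
# Rewrite $0 only on lines starting with 1; the trailing 1 prints every line.
replace_cols() {
  awk '/^1/ { $0 = substr($0, 1, 4) "123456789" substr($0, 14) } 1'
}

printf '%s\n' '110 000000000000' '210 000000000000' | replace_cols
```

This assumes matching lines are at least 13 characters long; a shorter line would simply be extended with the new digits.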

Substring in UNIX

Suppose I have a string "123456789".
I want to extract the 3rd, 6th, and 8th element. I guess I can use
cut -3, -6, -8
But if this gives
368
Suppose I want to separate them by a white space to get
3 6 8
What should I do?
Actually shell parameter expansion lets you do substring slicing directly, so you could just do:
x='123456789'
echo "${x:3:1}" "${x:6:1}" "${x:8:1}"
Update
To do this over an entire file, read the line in a loop:
while IFS= read -r x; do
    echo "${x:3:1}" "${x:6:1}" "${x:8:1}"
done < file
(By the way, bash slicing is zero-indexed, so if you want the numbers '3', '6' and '8' you'd really want ${x:2:1}, ${x:5:1} and ${x:7:1}.)
You can use the sed tool and issue this command in your terminal:
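For comparison, awk's substr counts from 1, so the positions can be written exactly as in the question. A small sketch that uses awk instead of parameter expansion; the function name is hypothetical:

```shell
# substr(s, start, length) is 1-indexed in awk.
# The commas in print join the pieces with a single space (the default OFS).
pick_3_6_8() {
  awk '{ print substr($0, 3, 1), substr($0, 6, 1), substr($0, 8, 1) }'
}

printf '%s\n' 123456789 | pick_3_6_8
```

Since awk already reads its input line by line, the same function works on a whole file without a shell loop.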
sed -r "s/^..(.)..(.).(.).*$/\1 \2 \3/"
Explained RegEx: http://regex101.com/r/fH7zW6
To "generalize" this on a file you can pipe it after a cat like so:
cat file.txt|sed -r "s/^..(.)..(.).(.).*$/\1 \2 \3/"
Perl one-liner.
perl -lne '@A = split //; print "$A[2] $A[5] $A[7]"' file
Using cut:
$ cat input
1234567890
2345678901
3456789012
4567890123
5678901234
$ cut -b3,6,8 --output-delimiter=" " input
3 6 8
4 7 9
5 8 0
6 9 1
7 0 2
The -b option selects only the specified bytes. The output delimiter can be specified using --output-delimiter.
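--output-delimiter is a GNU extension, so on other systems the spacing has to be added in a second step. One possible sketch, assuming single-byte characters; spaced_chars is an illustrative name:

```shell
# Select characters 3, 6 and 8, then put a space after each character
# and trim the trailing one.
spaced_chars() {
  cut -c3,6,8 | sed 's/./& /g; s/ $//'
}

printf '%s\n' 1234567890 2345678901 | spaced_chars
```

The sed step works here only because cut has already reduced each line to the wanted characters.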

How to print a range of lines with sed except the one matching the range-end pattern?

I wonder if there is a sed-only way to print a range of lines, determined by patterns to be matched, except the one last line matching the end pattern.
Consider following example. I have a file
line 1
line 2
line 3
ABC line 4
+ line 5
+ line 6
+ line 7
line 8
line 9
line 10
line 11
line 12
I want to get everything starting with ABC (including) and all the lines beginning with a +:
ABC line 4
+ line 5
+ line 6
+ line 7
I tried it with
sed -n '/ABC/I,/^[^+]/ p' file
but this gives one line too much:
ABC line 4
+ line 5
+ line 6
+ line 7
line 8
What's the easiest way (sed-only) to leave this last line out?
There might be better ways but I could come up with this sed 1 liner:
sed -rn '/ABC/,/^[^+]/{/(ABC|^\+)/!d;p;}' file
Another sed 1 liner is
sed -n '/ABC/,/^[^+]/{x;/^$/!p;}' file
One more sed 1 liner (and probably better)
sed -n '/ABC/I{h;:A;$!n;/^+/{H;$!bA};g;p;}' file
The easiest way (I'll learn something new if anyone can solve this with one call to sed), is to add an extra sed at the end, i.e.
sed -n '/ABC/I,/^[^+]/ p' file | sed '$d'
ABC line 4
+ line 5
+ line 6
+ line 7
Cheating, I know, but that is the beauty of the Unix pipe philosophy. Keep whittling down your data until you get what you want ;-)
I hope this helps.
This might work for you:
sed '/^ABC/{:a;n;/^\(ABC\|+\)/ba};d' file
EDIT: to allow adjacent ABC sections.
Well, you have selected your answer. But why aren't you using /^(ABC|\+)/? Or am I misunderstanding your requirement?
If you want to find those + lines AFTER a search for ABC is found
awk '/ABC/{f=1;print} f &&/^\+/ { print }' file
This is much simpler to understand than crafting cryptic sed expressions. When ABC is found, set a flag. When a line starting with + is found and the flag is set, print the line.
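A flag that is set once and never cleared stays on for the rest of the file, so a + line far below the ABC block would also print. A hedged variation that clears the flag on the first line that is neither ABC nor a + line; abc_block and the sample input are made up for illustration:

```shell
abc_block() {
  awk '
    /ABC/        { f = 1; print; next }   # start of a block
    f && /^[+]/  { print; next }          # continuation line inside a block
                 { f = 0 }                # any other line ends the block
  '
}

printf '%s\n' 'line 3' 'ABC line 4' '+ line 5' '+ line 6' 'line 8' '+ line 9' | abc_block
```

Here "+ line 9" is not printed because "line 8" has already ended the block.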