I have a file that contains a pattern at the beginning of each line:
./bob/some/text/path/index.html
./bob/some/other/path/index.html
./bob/some/text/path/index1.html
./sue/some/text/path/index.html
./sue/some/text/path/index2.html
./sue/some/other/path/index.html
./john/some/text/path/index.html
./john/some/other/path/index.html
./john/some/more/text/index1.html
... etc.
I came up with the following code to match the ./{name}/ pattern and would like to print one occurrence of each name, but it either prints every line matching that pattern or, with the -m 1 flag, prints just one and stops.
I've tried it as a simple grep line (below) and also put it in a for loop:
name=$(grep -iEoha -m 1 '\.\/([^/]*)\/' ./without_localnamespace.txt)
echo $name
My expected results are:
./bob/
./sue/
./john/
Actual results are:
./bob/
With awk, printing ./name/ only for the first line of each name:
awk -F'/' '!a[$2]++{print $1 FS $2 FS}' input
./bob/
./sue/
./john/
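The awk one-liner keys the array a on the second /-separated field, so !a[$2]++ is true only the first time each name appears. Below is a minimal sketch wrapping it in a shell function; the name first_dirs is hypothetical, and it assumes every line starts with ./name/. Unlike sort -u, it keeps the names in the order they first appear.

```shell
#!/bin/sh
# first_dirs (hypothetical helper): print "./name/" once per distinct name,
# in the order the names first appear in the input.
first_dirs() {
    # With -F'/', "./bob/some/path" splits into $1=".", $2="bob", ...
    # !a[$2]++ is true only the first time a given $2 is seen.
    awk -F'/' '!a[$2]++ { print $1 FS $2 FS }' "$@"
}

# Usage: feed a few sample lines on standard input (or pass file names).
printf '%s\n' \
    './bob/some/text/path/index.html' \
    './bob/some/other/path/index.html' \
    './sue/some/text/path/index.html' \
    './john/some/other/path/index.html' | first_dirs
```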
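You can do this with cut and sort; note that it prints only the bare names (bob, john, sue), without the surrounding ./ and /: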
cut -d "/" -f2 ./without_localnamespace.txt | sort -u
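You seem to want unique occurrences. The -m 1 option makes grep stop reading after the first matching line, which is why only ./bob/ is printed. Drop it and remove the duplicates with uniq instead: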
grep -Eoha '\./[^/]*/' ./without_localnamespace.txt | uniq
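uniq only collapses adjacent duplicate lines, so grep ... | uniq relies on the lines for each name being grouped together, as they are in the sample. If they might not be, sort -u removes duplicates regardless of order (at the cost of sorting the output). A minimal sketch, assuming GNU grep; the function name unique_dirs is hypothetical:

```shell
#!/bin/sh
# unique_dirs (hypothetical helper): print each distinct leading "./name/"
# exactly once, even if lines for the same name are not adjacent.
unique_dirs() {
    # -o prints only the matched part; ^ anchors the match at the start of
    # the line, so only the leading ./name/ of each line is printed.
    # sort -u drops duplicates regardless of input order (output is sorted).
    grep -Eo '^\./[^/]*/' "$@" | sort -u
}

# Usage: the two sue lines are not adjacent, which plain uniq would miss.
printf '%s\n' \
    './sue/some/text/path/index.html' \
    './bob/some/other/path/index.html' \
    './sue/some/other/path/index.html' | unique_dirs
```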
Regarding the pattern, you do not need to escape forward slashes; they are not special regex metacharacters. The -i flag is redundant here, too, because the pattern contains no letters.
Similar questions have been asked, but they are for PowerShell.
I have a Markdown file like:
.
.
.
## See also
- [a](./A.md)
- [A Child](./AChild.md)
.
.
.
- [b](./B.md)
.
.
.
## Introduction
.
.
.
I wish to replace all occurrences of .md) with .html) between ## See also and ## Introduction, to get:
.
.
.
## See also
- [a](./A.html)
- [A Child](./AChild.html)
.
.
.
- [b](./B.html)
.
.
.
## Introduction
.
.
.
I tried this in Bash:
orig="\.md)"; new="\.html)"; sed "s~$orig~$new~" t.md -i
But this replaces every occurrence in the file, whereas I want the replacement to happen only between ## See also and ## Introduction.
Could you please suggest changes? I am using awk and sed because I am a little familiar with them. I also know a little Python; is it recommended to do such scripting in Python (if it is too complicated for sed or awk)?
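Restrict the substitution with a sed address range, so it applies only from the ## See also line through the next ## Introduction line:
$ sed '/## See also/,/## Introduction/s/\.md)/.html)/g' file
Add -i, as in your attempt, to edit the file in place.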
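Since you are familiar with awk, the same range-limited replacement can be written with a flag variable that is switched on at ## See also and off at ## Introduction. This is a minimal sketch of that swapped-in awk technique, not the sed answer itself; the function name md_links_to_html and the variable inrange are hypothetical:

```shell
#!/bin/sh
# md_links_to_html (hypothetical helper): rewrite ".md)" to ".html)" only on
# lines from "## See also" through "## Introduction", like the sed range.
md_links_to_html() {
    awk '
        /## See also/     { inrange = 1 }                     # range starts here
        inrange           { gsub(/\.md[)]/, ".html)") }       # [)] is a literal )
        /## Introduction/ { inrange = 0 }                     # range ends (inclusive)
        { print }                                             # print every line
    ' "$@"
}

# Usage: only the link between the two headings is changed.
printf '%s\n' '- [x](./X.md)' '## See also' '- [a](./A.md)' \
    '## Introduction' '- [y](./Y.md)' | md_links_to_html
```

awk has no in-place option in POSIX, so to update t.md you would write to a temporary file and move it over the original.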
I have the following files from two different categories:
Category 1:
MAA
MAB
MAC
MAD
MAE
MAF
MAG
MAH
MAJ
MBA
MBB
MBC
MBD
MBE
MDA
MDD
and Category 2:
MCA
MCB
MCC
MCD
MCE
MCF
MCG
MDB
So my question is: how can I write a regular expression that finds files from category 1 only?
I don't want a hard-coded script; I'm hoping for some logic from brilliant people.
I am trying this:
find . -regex "*[M][A,B,D][A,B,C,D,E,F,J].txt"
It's quite simple:
ls -l | grep "MAA\|MAB\|MAC\|MAD\|MAE\|MAF\|MAG\|MAH\|MAJ\|MBA\|MBB\|MBC\|MBD\|MBE\|MDA\|MDD"
OK, so you don't want it hard-coded. Then you should state the patterns that should NOT match and exclude them with grep -v:
ls -l | grep -v "MC." | grep -v "pattern2" | ....
Your question is not very precise, but from your attempt I conclude that you are looking for files with names ending in ....MAA.txt, ...MAB.txt and so on, located either in your working directory or somewhere below it.
You also didn't mention which shell you are using. Here is an example using zsh; no need to write a regular expression here:
ls ./**/*M{AA,AB,AC,AD,AE,AF,AG,AH,AJ,BA,BB,BC,BD,BE,DA,DD}.txt
I am trying this: find . -regex "*[M][A,B,D][A,B,C,D,E,F,J].txt"
The errors in this are:
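The wildcard for any characters in a regex is .*, unlike just * in a normal filename pattern. Also, -regex has to match the whole path (such as ./MAA.txt), not just the file name, so the expression needs a leading .*.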
You forgot G and H in the third bracket expression.
You didn't exclude the category 2 name MDB.
Besides:
The characters of a bracket expression are not to be separated by ,.
A bracket expression with a single item ([M]) can be replaced by just the item (M).
This leads to:
find . -regex ".*M[ABD].*" -not -name "MDB*"
or, without regex:
find . -name "M[ABD]*" -not -name "MDB*"
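To check the -name version quickly, here is a minimal sketch that builds a throwaway directory with names from both categories. The helper name category1_files is hypothetical; it uses the POSIX ! instead of GNU -not and adds -type f so directories are skipped.

```shell
#!/bin/sh
# category1_files (hypothetical helper): list files under DIR whose names start
# with MA, MB or MD, excluding the category 2 name MDB.
category1_files() {
    find "$1" -type f -name 'M[ABD]*' ! -name 'MDB*'
}

# Usage: MCA.txt (category 2) and MDB.txt (category 2) should not be listed.
dir=$(mktemp -d)
touch "$dir/MAA.txt" "$dir/MBE.txt" "$dir/MDD.txt" "$dir/MCA.txt" "$dir/MDB.txt"
category1_files "$dir" | sort
rm -r "$dir"
```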
Hi all, I have the code below:
find . -type f -exec sed -i 's#EText-No.#New EText-No. #g' {} +
I have been using the script to find and replace some characters in multiple files in folders and subfolders.
I have discovered that some values occur multiple times. Hence I need to modify my script to replace only the second instance of an attribute:
find . -type f -exec sed -i '/Subject/{:a;N;/Subject.*Subject/!Ta;s/Subject/SecondSubject/2;}/g' {} +
I am trying to use the code above to achieve this, but it does not seem to work. I need the code to use "#" as the separator, like the code above, because I have backslash characters in my file.
Any idea how I might make the code work using the # separator?
Thanks for your help
ORIGINAL FILE BEFORE PROCESSING
<tr><th scope="row">Subject</th><td>United States -- Biography</td></tr><tr><th scope="row">Subject</th><td>United States -- Short Stories</td></tr><tr><th scope="row">EText-No.</th><td>24200</td></tr><tr><th scope="row">Release Date</th><td>2008-01-07</td></tr><tr>
After processing
<tr><th scope="row">Subject</th><td>United States -- Biography</td></tr><tr><th scope="row">SecondSubject</th><td>United States -- Short Stories</td></tr><tr><th scope="row">EText-No.</th><td>24200</td></tr><tr><th scope="row">Release Date</th><td>2008-01-07</td></tr><tr>
Please note that the second Subject is changed from 'Subject' to 'SecondSubject'.
Try this, with # as the s delimiter as you asked (GNU sed):
sed -i '/Subject/{:a;s#\(Subject.*\)Subject#\1SecondSubject#;tb;N;ba;:b;}'
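Because .* is greedy, the command above replaces the last "Subject" in the pattern space, which is the second one only when there are exactly two. If a line appended to the pattern space (with the N command) contains more than one occurrence of the word "Subject", you can use this command instead; the 2 flag of s targets the first occurrence in the appended line (the second occurrence in the pattern space):
sed -i '/Subject/{:a;/Subject.*Subject/!{N;ba;};s/Subject/SecondSubject/2;}'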
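A minimal sketch of the /2 approach as a reusable function, applied to input shaped like the HTML line from the question. The name second_subject is hypothetical, the s command uses # as the delimiter as requested, and GNU sed is assumed (labels terminated by ;, and . matching the newlines that N inserts). To process a whole tree, the same sed script can go into the question's find . -type f -exec sed -i '...' {} + command; with -i, GNU sed treats each file separately.

```shell
#!/bin/sh
# second_subject (hypothetical helper): rename only the second "Subject"
# found once the pattern space holds at least two of them.
second_subject() {
    # /Subject/          start when a line contains Subject
    # :a ... !{N;ba;}    keep appending lines until there are two occurrences
    # s#...#...#2        replace the 2nd occurrence only (the 2 flag)
    sed '/Subject/{:a;/Subject.*Subject/!{N;ba;};s#Subject#SecondSubject#2;}' "$@"
}

# Usage: both Subject rows are on one line, as in the question.
printf '%s\n' '<tr><th scope="row">Subject</th><td>A</td></tr><tr><th scope="row">Subject</th><td>B</td></tr>' |
    second_subject
```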
I have a folder that contains subfolders, with more files in them.
The files are named in the following way
abc.DEF.xxxxxx.dat
I'm trying to find duplicate files by matching only the 'xxxxxx' part of the above pattern, ignoring the rest. The extension .dat doesn't change, but the lengths of abc and DEF might. The order of the period-separated parts also doesn't change.
I'm guessing I need to use find in the following way:
find -regextype posix-extended -regex '\w+\.\w+\.\w+\.dat'
I need help coming up with the regular expression. Thanks.
Example:
For a file named 'epg.ktt.crwqdd.dat', I need to find duplicate files containing 'crwqdd'.
You can use awk for that:
find /path -type f -name '*.dat' | awk -F. 'a[$4]++'
Explanation:
Suppose find gives the following output:
./abd.DdF.TTDFDF.dat
./cdd.DxdsdF.xxxxxx.dat
./abc.DEF.xxxxxx.dat
./abd.DdF.xxxxxx.dat
./abd.DEF.xxxxxx.dat
In other words, you want to count the occurrences of the part between the last two dots (just before .dat) and print the lines where that part appears for the second time or later.
To achieve this, we split the file names on ., which gives us 5(!) fields:
echo ./abd.DEF.xxxxxx.dat | awk -F. '{print $1 " " $2 " " $3 " " $4 " " $5}'
 /abd DEF xxxxxx dat
Note the first, empty field. The pattern of interest is $4.
To count the occurrences of a pattern in $4, we use an associative array a and increment its value on each occurrence. Because a[$4]++ returns the value before incrementing, the condition is true from the second occurrence on. Unoptimized, the awk command looks like this:
... | awk -F. '{if(a[$4]++ > 0){print}}'
An awk program can also be written in the form:
CONDITION { ACTION }
which gives us:
... | awk -F. 'a[$4]++ > 0 {print}'
print is the default action in awk; it prints the whole current line. Because it is the default action, it can be omitted. The > 0 check can be omitted too, because awk treats a nonzero number as true. This gives us the final command:
... | awk -F. 'a[$4]++'
To generalize the command, note that the pattern of interest is not necessarily the 4th field but always the next-to-last one (for example, when directory names also contain dots). This can be expressed with awk's number-of-fields variable, NF:
... | awk -F. 'a[$(NF-1)]++'
Output:
./abc.DEF.xxxxxx.dat
./abd.DdF.xxxxxx.dat
./abd.DEF.xxxxxx.dat
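The final command prints only the second and later files for each name, so the first file of a group (./cdd.DxdsdF.xxxxxx.dat in the sample) is not shown. If you want every member of each duplicate group, one option is to buffer all lines and decide in the END block. This is a minimal sketch of that variant, not part of the answer above; the function name dup_groups and the arrays key, line and count are hypothetical:

```shell
#!/bin/sh
# dup_groups (hypothetical helper): like awk -F. 'a[$(NF-1)]++', but prints
# every file whose next-to-last field occurs more than once, including the first.
dup_groups() {
    awk -F. '
        { key[NR] = $(NF-1); line[NR] = $0; count[$(NF-1)]++ }   # remember and count
        END {
            for (i = 1; i <= NR; i++)             # replay lines in input order
                if (count[key[i]] > 1) print line[i]
        }
    ' "$@"
}

# Usage with find (the /path placeholder is from the answer above):
#   find /path -type f -name '*.dat' | dup_groups
printf '%s\n' ./abd.DdF.TTDFDF.dat ./cdd.DxdsdF.xxxxxx.dat ./abc.DEF.xxxxxx.dat \
    ./abd.DdF.xxxxxx.dat ./abd.DEF.xxxxxx.dat | dup_groups
```

The trade-off is memory: every file name is kept until the end of input, which the single-pass a[$(NF-1)]++ version avoids.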