sed issue - Extract specific words from file - regex

I would like to get some help with SED.
I'm trying to extract some words from a file; all the words that I need are preceded by text like this:
39;,bugs.pr~%3D~'TEXT23
I need to get TEXT23 for example.
What I did was, first, replace 39;,bugs.pr~%3D~' with IDEX, which is my flag, then search for IDEX and extract 8 characters from that word.

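The following sed command should strip the prefix and print only the lines that contained it, leaving the text you want. The script is wrapped in double quotes because the pattern itself contains a single quote.
sed "s/^39;,bugs\.pr~%3D~'//p;d" file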
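If more text follows the wanted word on the same line, a capture group can isolate it. This is a minimal sketch, assuming the word is the run of letters and digits right after the marker; the sample line and the text around the marker are made up for illustration.

```shell
# Hypothetical sample line: only the marker and TEXT23 come from the question.
line="abc 39;,bugs.pr~%3D~'TEXT23 xyz"

# -n suppresses automatic printing; the p flag prints only lines where the
# substitution succeeded. \(...\) captures the alphanumeric run after the marker.
result=$(printf '%s\n' "$line" |
  sed -n "s/.*39;,bugs\.pr~%3D~'\([[:alnum:]]*\).*/\1/p")

printf '%s\n' "$result"   # prints TEXT23
```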

Related

sed search and replace between string and last occurence of character

I currently have a bunch of .md5sum files, each with an md5sum hash value and its corresponding file name with a full absolute path. I'd like to change these files from absolute paths to relative ones. I think I have it pretty close.
> cat example.md5sum
197f76c53d2918764cfa6463b7221dec /example/path/to/file/example.null
> cat example.md5sum | sed 's/( ).*\// \.\//'
197f76c53d2918764cfa6463b7221dec /example/path/to/file/example.null
Throwing the regex ( ).*\/ into notepad++ returns /example/path/to/file/ which is what I want. Moving it over to sed does not produce the same match.
The end goal here as mentioned previously is the following:
197f76c53d2918764cfa6463b7221dec ./example.null
Looks like a job for sed.
sed -i.bak 's:/.*/:./:' file ...
The -i option tells sed to modify files "in-place" rather than sending the results to stdout. With the substitute command, you can use alternate delimiters -- in this case, I've used a colon, since the text you're matching and using as replacement includes slashes. Makes things easier to read.
I haven't bothered to match the whitespace before the path, because an md5sum file has a pretty predictable format.
Back up your input files before experimenting.
Note that this is shell agnostic -- you can run it in tcsh or bash or anything else that is able to launch sed with options.
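The substitution can be tried on the sample line from the question without touching any file. In sed's default basic regular expressions an unescaped ( ) pair matches literal parentheses, which is why the attempt in the question left the line unchanged. The second command below is a sketch of that attempt with -E (extended regular expressions) switched on, assuming a sed that supports -E.

```shell
line='197f76c53d2918764cfa6463b7221dec /example/path/to/file/example.null'

# Colon as delimiter: everything from the first slash to the last becomes "./".
result_colon=$(printf '%s\n' "$line" | sed 's:/.*/:./:')

# The question's pattern with -E, so that ( ) is a group rather than literal text.
result_ere=$(printf '%s\n' "$line" | sed -E 's/( ).*\// .\//')

printf '%s\n' "$result_colon" "$result_ere"
# both lines print: 197f76c53d2918764cfa6463b7221dec ./example.null
```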

Finding words which match regex in multiple text files

So, I'm new to manipulating data from the command line, and also a beginner at regex.
I have multiple .txt files in multiple subdirectories. What I want to do is to find all words which have a certain number of consecutive consonants.
What I've tried so far is something like this:
find . | grep -orhn '[bdfghjklmnprstvxzþ]\{2\}' > ../words.txt
Which only prints out something like:
2:rt
2:gr
2:xl
3:gr
3:st
3:kk
I want to get the whole word, not just the two consecutive consonants (and the numbers and colon. I don't know where that comes from since it's not in the original data, but it really doesn't matter for what I am trying to do).
Do you have a tip?
The -n option prints the line number of each match, which is where the numbers and colons come from.
My suggestion is to try matching the word characters before and after.
This is what I tried and seemed to work.
grep -orh '\w\+[bdfghjklmnprstvxzþ]\{2\}\w\+'
The -o option will only show what is matching, which is the entire word.
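The -r option makes grep search directories recursively on its own, so find is not needed: run the command from the top directory or pass the directory as the last argument. Piping find into grep only feeds it the list of file names on standard input, not the files' contents.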
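A small self-contained run of the idea, as a sketch: the directory, file names and sample words are invented, þ is left out of the set to keep the example locale-independent, and \w is a GNU grep extension. It uses \w* instead of \w\+ so that words beginning or ending with the consonant pair also match.

```shell
# Build a throwaway directory tree with two hypothetical text files.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'one strong tea\n' > "$dir/a.txt"
printf 'a bright idea\n'  > "$dir/sub/b.txt"

# -o: print only the match, -r: recurse into "$dir", -h: omit file names.
# sort makes the order independent of how grep walks the directories.
result=$(grep -orh '\w*[bdfghjklmnprstvxz]\{2\}\w*' "$dir" | sort)

rm -rf "$dir"
printf '%s\n' "$result"
# prints:
# bright
# strong
```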

Remove all hyperlinks in a text file, linux scripting

I am very new to scripting, but I want to learn it.
What I have to do is to remove all occurrences of something like http://* from a text file. I want to do it with the sed command and regular expressions.
Here is what I have come up with so far:
sed 's/http:\/\/.*/ /' < input.txt > output.txt
This code replaces all the hyperlinks with a space. But the problem is that it also removes the rest of the line.
How can I fix this problem? I have tried adding a space, "http://.* ", or an end-of-word anchor, "http://.*\>", and other tricks that I found on the internet, but they didn't work.
And is there a better way to do so instead of using sed?
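Sed is a fine way to do this. The problem is that .* is greedy and matches everything up to the end of the line. Try changing your regex to s!http://[^[:space:]]*! !g, which stops at the first whitespace character after the link; the g flag handles several links on one line.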
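Applied to a line with two links, the suggested expression leaves the surrounding words alone. The sample text and URLs below are made up for illustration.

```shell
# Hypothetical input line containing two links.
line='see http://example.com/page for details and http://example.org too'

# ! is used as the delimiter so the slashes in http:// need no escaping.
result=$(printf '%s\n' "$line" | sed 's!http://[^[:space:]]*! !g')

# Each link is replaced by a single space; the other words are kept.
printf '%s\n' "$result"
```

To process a file as in the question, the same script can be run as sed 's!http://[^[:space:]]*! !g' < input.txt > output.txt.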

sed add text around regex

I would like to be able to go:
sed "s/^\(\w+\)$/leftside\1rightside/"
and have the group matched by \(\w+\) appear in between 'leftside' and 'rightside'.
But it seems like I have to pipe it twice, one for the left of the text, another time for the right. If anyone knows a way to do it in one pass, I'd appreciate it.
It's not working because the regex is probably not what you intended. As written, text is added at the beginning and end of a line only if the line consists entirely of word characters (and only if your version of sed supports the \w notation). You also didn't escape the +, which you need to do when not using the -r option.
Try starting with sed "s/^\(.*\)$/leftside\1rightside/" or just sed "s/.*/leftside&rightside/" and working from that.
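Both approaches in one pass, as a sketch. The second command assumes a sed that accepts -E and spells \w out as [[:alnum:]_], so it wraps only lines made up of a single word; the input strings are arbitrary samples.

```shell
# & in the replacement stands for the whole match, so any line gets wrapped.
wrapped_any=$(printf 'hello world\n' | sed 's/.*/leftside&rightside/')

# With -E the + needs no backslash; only single-word lines are wrapped.
wrapped_word=$(printf 'hello\nhello world\n' |
  sed -E 's/^([[:alnum:]_]+)$/leftside\1rightside/')

printf '%s\n' "$wrapped_any"    # leftsidehello worldrightside
printf '%s\n' "$wrapped_word"   # leftsidehellorightside, then: hello world
```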

Regular expression with sed

I'm having a hard time selecting text from a file using a regular expression. I'm trying to replace a specific piece of text in a file which is full of lines like this:
/home/user/test2/data/train/train38.wav /home/user/test2/data/train/train38.mfc
I'm trying to replace the bolded text. The problem is that I don't know how to select only the bolded text, since I need to use .wav in my regexp and the filename and the location of the file are also going to differ.
Hope you can help
Best regards,
Jökull
This assumes that what you want to replace is the string between the last two slashes in the first path.
sed 's|\([^/]*/\)[^/]*\(/[^/]* .*\)|\1FOO\2|' filename
produces:
/home/user/test2/data/FOO/train38.wav /home/user/test2/data/train/train38.mfc
sed processes lines one at a time, so you can omit the global option and it will change only the first 'train' on each line:
sed 's/train/FOO/' testdat
versus
sed 's/train/FOO/g' testdat
which is a global replace.
This is quite a bit more readable and less error-prone than some of the other possibilities, but of course there are applications which will not simplify quite as readily.
sed 's;\(\(/[^/]\+\)*\)/train\(\(/[^/]\+\)*\)\.wav;\1/FOO\3.wav;'
You can do it like this
sed -e 's/\<train\>/plane/g'
The \< tells sed to match the beginning of a word and the \> tells it to match the end of a word.
The g at the end means global so it performs the match and replace on the entire line and does not stop after the first successful match as it would normally do without g.
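Run side by side on the sample line from the question, the three variants differ as follows. This is a sketch; \< and \> are GNU sed extensions, so the last command assumes GNU sed.

```shell
line='/home/user/test2/data/train/train38.wav /home/user/test2/data/train/train38.mfc'

# Without g: only the first "train" on the line (the directory) changes.
first=$(printf '%s\n' "$line" | sed 's/train/FOO/')

# With g: every "train" changes, including the one inside train38.
all=$(printf '%s\n' "$line" | sed 's/train/FOO/g')

# Word boundaries: "train38" is one word, so only the directory names change.
words=$(printf '%s\n' "$line" | sed 's/\<train\>/plane/g')

printf '%s\n' "$first" "$all" "$words"
# /home/user/test2/data/FOO/train38.wav /home/user/test2/data/train/train38.mfc
# /home/user/test2/data/FOO/FOO38.wav /home/user/test2/data/FOO/FOO38.mfc
# /home/user/test2/data/plane/train38.wav /home/user/test2/data/plane/train38.mfc
```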