I want to find a substring in a string and replace a different string on the same line

I have a bunch of text files that have a line that starts with {title: and ends with }
I want to know the linux command line to replace the } with either _c} or -c}
For example,
{title: In a World of}
To end up being
{title: In a World of_c}
Originally I started with
find . -type f -exec sed -i 's/title: /title c_: /g' {} \;
but decided I wanted to end up with
{title: In a World of_c}
Instead of
{title: c_In a World of}
Any help appreciated
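One possible approach, sketched under the assumption that each title sits on a single line that begins with {title: and ends with }. The sed address /^{title:/ restricts the substitution to those lines, and }$ only matches the closing brace at the end of the line. The function name add_suffix and the sample text are made up for illustration.

```shell
# Hypothetical sample standing in for the contents of one text file.
sample='{title: In a World of}
Some other line}
{artist: Someone}'

# On lines starting with "{title:", replace the final "}" with "_c}".
# Other lines pass through unchanged.
add_suffix() {
  sed '/^{title:/ s/}$/_c}/'
}

printf '%s\n' "$sample" | add_suffix
```

To edit files in place with GNU sed, the same expression could go into the original find command: find . -type f -exec sed -i '/^{title:/ s/}$/_c}/' {} +. Swap _c} for -c} to get the other variant.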

AWK\SED Replace both ^(beginning) and $(end) of a string in a single command

I've been looking around but couldn't find a way to do it with both AWK and SED.
I was wondering if there's a way to replace a string's start and end in a single command.
more specifically, there's a file with a lot of words in it, and I would like to add something before the word and after the word.
Thanks,
Roy
Since you said: more specifically, there's a file with a lot of words in it, and I would like to add something before the word and after the word.
The only thing you need is $&, which holds the matched text itself. You can simply write whatever you want before and after this variable. That's it.
For example say you have this file:
this is line 1.
this is line 2.
this is line 3.
And I tested with perl:
perl -lne 'print "beginning->", $&, "<-end" if /.+/g' file
which the output is:
beginning->this is line 1.<-end
beginning->this is line 2.<-end
beginning->this is line 3.<-end
Maybe you would like to match only one word. The same approach still works, for example:
perl -lne 'use English; print "$PREMATCH", "[$MATCH]","$POSTMATCH" if /line/g' file
Here I matched line and put brackets around it: [ then the match then ]
The output:
this is [line] 1.
this is [line] 2.
this is [line] 3.
NOTE
As you can see, the only things you need are the prematch, the match and the postmatch. Since you gave no specific examples, I tested this with Perl. If you are interested in Perl you can use it as is, or you may prefer to use sed or awk.
If you want to wrap a particular word with markers you can use & in the replacement string to achieve what you want.
For example to put square brackets around every occurrence of the word bird:
$ echo "hello bird, are you really a bird?" | sed "s/\bbird\b/[&]/g"
hello [bird], are you really a [bird]?
to replace a string's start and end in a single command
Let's say we have a test file with line:
tag hello, world tag
To enclose each tag word with angle brackets < ... > we can apply:
awk approach with gsub() function:
awk '{ gsub(/\<tag\>/, "<&>"); print}' test_file
word boundaries \<, \> may differ depending on awk implementations
sed approach:
sed 's/\btag\b/<&>/g' test_file
The output(for both approaches):
<tag> hello, world <tag>
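For the literal reading of the question (adding text at both the start and the end of every line in one command), a minimal sketch follows. It assumes the markers beginning-> and <-end from the Perl answer. The function names are hypothetical.

```shell
# sed: .* matches the whole line and & re-inserts it between the two markers.
wrap_lines() {
  sed 's/.*/beginning->&<-end/'
}

# awk: $0 is the whole line; string concatenation is done by juxtaposition.
wrap_lines_awk() {
  awk '{ print "beginning->" $0 "<-end" }'
}

printf 'this is line 1.\nthis is line 2.\n' | wrap_lines
printf 'this is line 1.\nthis is line 2.\n' | wrap_lines_awk
```

Both commands print each input line wrapped in the two markers, so there is no need to anchor ^ and $ in separate substitutions.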

perl oneliner search replace pattern

I have thousands of rows where some contain the following:
id_s,title_dk
KKS2826,"Søslag ved Øland og Gulland, 1564",12312,2x2
KKS935,"Vignet til Edvard Brandes, afhandling om Johan Wiehe", 1233, 4x4
I'm looking for a Perl one-liner that deletes any comma that occurs within quotation marks (the second column), but obviously not the others, since they are delimiters.
So desired output would be:
id_s,title_dk
KKS2826,"Søslag ved Øland og Gulland 1564",12312,2x2
KKS935,"Vignet til Edvard Brandes afhandling om Johan Wiehe", 1233, 4x4
I have been playing with this: perl -ne 's/(?<!,),//g; print;'
But I can't figure out how to keep the other commas.
Easy using Text::CSV_XS:
perl -CS -MText::CSV_XS=csv -we '
my $aoa = csv(in => shift, allow_whitespace => 1);
$_->[1] =~ s/,//g for @$aoa;
csv(in => $aoa, out => *STDOUT, always_quote => 0);
' input.csv > output.csv
Try this one liner
perl -p -e 's/"([^"]*)"/my $m=$1;$m=~ s:,::g; $m /eg' file.txt
Script updated per Borodin's comment, because the script above also removes the quotes (").
perl -p -e 's/ (?<=") ([^"]*) (?=")/$1=~ s:,::rg; /xeg' file.txt
In the second one I used a positive lookbehind and a positive lookahead, together with the non-destructive modifier (r). The non-destructive modifier requires Perl 5.14 or later.

How can I use sed to find a line starting with AAA but NOT end with BBB

I'm trying to create a script to append oracleserver to /etc/hosts as an alias of localhost. Which means I need to:
Locate the line that ^127.0.0.1 and NOT oracleserver$
Then, append oracleserver to this line
I know the best practice is probably using negative look ahead. However, sed does not have look around feature: What's wrong with my lookahead regex in GNU sed?. Can anyone provide me some possible solutions?
sed -i '/oracleserver$/! s/^127\.0\.0\.1.*$/& oracleserver/' filename
/oracleserver$/! - on lines not ending with oracleserver
^127\.0\.0\.1.*$ - replace the whole line if it is starting with 127.0.0.1
& oracleserver - with the line plus a space separator ' ' (required) and oracleserver after that
Just use awk with && to combine the two conditions:
awk '/^127\.0\.0\.1/ && !/oracleserver$/ { $0 = $0 " oracleserver" } 1' file
This appends the string when the first pattern is matched but the second one isn't. The 1 at the end is always true, so awk prints each line (the default action is { print }).
I wouldn't use sed but instead perl:
Locate the line that ^127.0.0.1 and NOT oracleserver$
perl -pe 'if ( m/^127\.0\.0\.1/ and not m/oracleserver$/ ) { s/$/ oracleserver/ }'
Should do the trick. You can add -i.bak to inplace edit too.
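Since the script may run more than once, it can be worth checking that the edit is idempotent. The sketch below runs the sed command from the first answer twice over a made-up hosts fragment (the 192.168.0.5 entry is purely illustrative) and compares the results. The function name add_alias is hypothetical.

```shell
# Hypothetical /etc/hosts content used only for this demonstration.
hosts='127.0.0.1 localhost
192.168.0.5 db'

# Append " oracleserver" to the 127.0.0.1 line unless it already ends with it.
add_alias() {
  sed '/oracleserver$/! s/^127\.0\.0\.1.*$/& oracleserver/'
}

once=$(printf '%s\n' "$hosts" | add_alias)
twice=$(printf '%s\n' "$once" | add_alias)

printf '%s\n' "$once"
[ "$once" = "$twice" ] && echo "second run changed nothing"
```

The second pass leaves the text unchanged because the /oracleserver$/! address skips the line that was already extended.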

Extracting string from html file or curl output

I have HTML files, some of which are "minified", meaning a whole website can be on just one line.
I want to filter the value of ?idsite= which contains numbers. So a html contains something like this: img src="//stats.domains.com/piwik.php?idsite=44.
So the plain output should be "44".
I tried grep but it echos the whole line and just highlights the value.
With perl it could be something like:
echo "Whole bunch of stuff \
img src=\"stats.domains.com/piwik.php?idsite=44\" " \
| perl -nE 'say /.*idsite=(..)\"/ '
(assumes that idsite is always two characters ! :-). Your regex will need to be more sophisticated than this most likely).
Putting the snippet from the page you reference above in an HTML file (non-minified) and substituting 44 for the parameter variable, this bit of Perl will extract the "44":
perl -nE 'say /.*idsite=(..)/ if /idsite/ ' idsite.html
Translating the one liner to a sed command line would be similar:
echo "Whole bunch of stuff \
img src=\"stats.domains.com/piwik.php?idsite=44\" " \
| sed -En "s/^.*idsite=(..)\"/\1/p"
This is the sed from FreeBSD (it should work on OS X); the -E switch enables extended ("modern") regular expressions.
Doing it in awk is left as an exercise for another community member :-)
Here is a perl way to extract only the trailing digits of strings like src="//stats.domains.com/piwik.php?idsite=44" and run on a bash command line:
echo $src|perl -ne '$_ =~m /(\d+$)/; print $1'
Here is a python way to do the same thing:
import re
print(', '.join(re.findall(r'\d+$', src)))
If there will be a lot of src strings to process, it would be best to compile the regex when using Python as follows:
import re
p = re.compile(r'\d+$')
print(', '.join(p.findall(src)))
The import and the compilation only have to be done once.
Here is a Ruby way to do it:
puts src.scan( /\d+$/ ).first
In all cases the regexes end with "$" which matches the end of the string. That is why they match and extract only digits (\d+) at the end of the string.
If you don't need to check whether the idsite is in the value of a src attribute, then all you need is
perl -nE'say $1 if /\bidsite=(\d+)/' myfile.html
$ cat site.html
lorem ipsum idsite='4934' fasdf a
other line
$ sed -n '/idsite/ { s/.*idsite=[^0-9]*\([0-9]\+\).*$/\1/; p }' < site.html
4934
Let me know in case you need an explanation of what is going on.
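Because the original attempt with grep printed the whole line, here is a sketch using grep's -o option, which prints only the matched part. It assumes a grep that supports -o and -E (GNU and BSD grep both do) and that the value consists of digits only. The function name extract_idsite is made up.

```shell
# Print only the "idsite=<digits>" matches, then keep the part after "=".
extract_idsite() {
  grep -oE 'idsite=[0-9]+' | cut -d= -f2
}

# Minified-style input: everything on one line.
printf '%s\n' '<p>x</p><img src="//stats.domains.com/piwik.php?idsite=44"><p>y</p>' | extract_idsite
```

If a line contains several idsite parameters, each value is printed on its own line.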

unix : search a file if a string is present between two patterns

I have a file in the format given below. I want to check whether a word, e.g. 'hello', is present in the lines following schema: and before the next DocName. If it is present, how many such schemas contain it?
How can I do this in one line using grep/awk/sed?
The expected output is: assuming I am searching if word 'hello' is present, then in this case it is present in 1st, 2nd and 4th schema, so the output is 3, since we have three 'hello' present in three schemas. Note even if there are multiple occurrences of 'hello' in first schema, it is still counted as one.
:
:
:
DocName: abjrkj.txt
schema:
abs
askj
djsk
djsk
hello
adj
hello
DocName: abjrkj.txt
schema:
abs
askj
djsk
djsk
adj
hello
DocName: aasjrkj.txt
schema:
absasd
askjas
djsksa
djskasd
adjsg
DocName: ghhd.txt
schema:
absg
fdgaskj
dgdjsk
dgdfdjsk
drgadj
hello
:
:
:
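Try this. It treats DocName: as the record separator, so each schema is one record (a multi-character RS needs gawk or mawk).
awk -v RS='DocName:' '/hello/ { ++i }
END { print i }' file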
If you absolutely require a one-line solution (why??) the whitespace can be condensed to just one space.
Here is sed solution:
sed ':a; N; s/\n/ /; $!ba; s/DocName/\n&/g' < file | sed -n '/DocName/{/hello/p}' | wc
The algorithm: it reads the whole file into the pattern space, replacing every \n with a space, and then puts a \n before every DocName string. The result is piped through a second sed that prints only the lines containing both DocName and hello, and wc finally prints three numbers, the first of which is the requested count. To see the matching lines themselves, omit the | wc stage. A more elegant sed solution using the pattern and hold spaces may exist.
Since your input file has schemas separated by blank lines you can use awk in paragraph mode and then it's simply:
$ awk -v RS= '/hello/{++c} END{print c}' file
3
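If the file has no blank lines between schemas and a multi-character RS is not available, the count can also be kept with a small state machine. This is only a sketch under the assumption that every block starts with a DocName: line followed by a schema: line. The function name count_schemas and the sample data are made up.

```shell
# Hypothetical sample: hello appears in the first and third schema only.
sample='DocName: a.txt
schema:
abs
hello
hello
DocName: b.txt
schema:
abs
DocName: hello.txt
schema:
hello'

# Count schemas that contain the word given as $1 (substring match).
count_schemas() {
  awk -v word="$1" '
    /^DocName:/ { in_schema = 0; next }            # DocName lines are never searched
    /^schema:/  { in_schema = 1; seen = 0; next }  # a new schema starts here
    in_schema && !seen && index($0, word) { seen = 1; count++ }
    END { print count + 0 }                        # prints 0 when nothing matched
  '
}

printf '%s\n' "$sample" | count_schemas hello
```

The seen flag ensures that repeated occurrences inside one schema are counted once, and skipping the DocName: lines keeps a file name such as hello.txt from being counted.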