Similar questions have been asked, but they are for PowerShell.
I have a Markdown file like:
.
.
.
## See also
- [a](./A.md)
- [A Child](./AChild.md)
.
.
.
- [b](./B.md)
.
.
.
## Introduction
.
.
.
I wish to replace all occurrences of .md) with .html) between ## See also and ## Introduction :
.
.
.
## See also
- [a](./A.html)
- [A Child](./AChild.html)
.
.
.
- [b](./B.html)
.
.
.
## Introduction
.
.
.
I tried this in Bash:
orig="\.md)"; new="\.html)"; sed "s~$orig~$new~" t.md -i
But this replaces every occurrence in the file. I want the replacement to happen only between ## See also and ## Introduction.
Could you please suggest changes? I am using awk and sed because I am a little familiar with them. I also know a little Python; is it recommended to do this kind of scripting in Python (if it is too complicated for sed or awk)?
$ sed '/## See also/,/## Introduction/s/\.md)/.html)/g' file
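Since the question also asks about Python, here is a minimal sketch of the same range-restricted replacement. The function name convert_links and the sample lines are made up for illustration; the heading texts are taken from the question.

```python
def convert_links(lines, start="## See also", end="## Introduction"):
    """Replace '.md)' with '.html)' only between the two headings."""
    inside = False
    out = []
    for line in lines:
        if line.startswith(start):
            inside = True
        elif line.startswith(end):
            # Unlike the sed range, the end heading itself is left untouched.
            inside = False
        if inside:
            line = line.replace(".md)", ".html)")
        out.append(line)
    return out


# Hypothetical sample input: links outside the section must not change.
sample = [
    "- [x](./X.md)",
    "## See also",
    "- [a](./A.md)",
    "- [A Child](./AChild.md)",
    "## Introduction",
    "- [y](./Y.md)",
]
result = convert_links(sample)
```

For a one-off edit the sed one-liner is shorter; a script like this is mainly useful if more rules need to be added later.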
I have an XML file similar to this:
<Level1Node>
.
.
<Level2Node val="Retain"/>
.
.
</Level1Node>
<Level1Node>
.
.
<Level2Node val="Replace"/>
.
.
</Level1Node>
<Level1Node>
.
.
<Level2Node val="Retain"/>
.
.
</Level1Node>
I need to remove only the node below:
<Level1Node>
.
.
<Level2Node val="Replace"/>
.
.
</Level1Node>
To make the match non-greedy, I used the regex below:
perl -0 -pe "s|<Level1Node>.*?<Level2Node val="Retain"/>.*?</Level1Node>||gs" myxmlfile
But the non-greedy quantifier only limits where the match ends, not where it starts. How can I make the match start at the last <Level1Node> before the Level2Node?
You will need to use a negative lookahead to make sure you do not match closing Level1Node tags where you don't want to:
perl -0 -pe 's|<Level1Node>(?:(?!<\/Level1Node>).)*<Level2Node val="Retain"\/>(?:(?!<\/Level1Node>).)*<\/Level1Node>||gs' tmp.txt
Details:
<Level1Node>
(?:(?!<\/Level1Node>).)* # Everything except </Level1Node>
<Level2Node val="Retain"\/>
(?:(?!<\/Level1Node>).)* # Everything except </Level1Node>
<\/Level1Node>
?: is only here so that the parentheses are not interpreted as a capturing group.
If you plan to run this on a large file, you should probably check the cost of the negative lookahead; it might be high.
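The same negative-lookahead idea can be tried out in Python. This is only a sketch: the sample string is a shortened stand-in for the real file, and it targets val="Replace", the node the question says should be removed.

```python
import re

# Each "(?:(?!</Level1Node>).)*" run matches any characters but refuses to
# step over a closing tag, so a match can never span two Level1Node blocks.
PATTERN = re.compile(
    r"<Level1Node>"
    r"(?:(?!</Level1Node>).)*"
    r'<Level2Node val="Replace"/>'
    r"(?:(?!</Level1Node>).)*"
    r"</Level1Node>\n?",
    re.DOTALL,  # let "." match newlines, like the /s flag in Perl
)

# Hypothetical, shortened sample input.
xml = (
    '<Level1Node>\n<Level2Node val="Retain"/>\n</Level1Node>\n'
    '<Level1Node>\n<Level2Node val="Replace"/>\n</Level1Node>\n'
    '<Level1Node>\n<Level2Node val="Retain"/>\n</Level1Node>\n'
)
cleaned = PATTERN.sub("", xml)
```

Only the middle block is removed, because a match starting at the first <Level1Node> cannot reach the Replace node without crossing a closing tag.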
Use a proper parser! It's way simpler.
perl -MXML::LibXML -e'
my $doc = XML::LibXML->new->parse_file($ARGV[0]);
$_->unbindNode() for $doc->findnodes(q{//Level1Node[Level2Node[@val!="Retain"]]});
$doc->toFH(\*STDOUT);
' tmp.txt
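For comparison, a rough Python equivalent using the standard library's xml.etree.ElementTree. It assumes the Level1Node elements sit directly under a single root element (the wrapper named root here is invented for the sample), whereas the XPath above searches at any depth.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the <root> wrapper is an assumption.
source = """<root>
<Level1Node><Level2Node val="Retain"/></Level1Node>
<Level1Node><Level2Node val="Replace"/></Level1Node>
<Level1Node><Level2Node val="Retain"/></Level1Node>
</root>"""

root = ET.fromstring(source)

# Iterate over a copy of the list, since we remove children while looping.
for node in list(root.findall("Level1Node")):
    # A missing val attribute is treated as "Retain", so such nodes are kept,
    # matching the behaviour of @val != "Retain" in XPath.
    if any(child.get("val", "Retain") != "Retain"
           for child in node.findall("Level2Node")):
        root.remove(node)

remaining = [n.find("Level2Node").get("val") for n in root.findall("Level1Node")]
output = ET.tostring(root, encoding="unicode")
```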
I have the following files from 2 different categories:
Category 1:
MAA
MAB
MAC
MAD
MAE
MAF
MAG
MAH
MAJ
MBA
MBB
MBC
MBD
MBE
MDA
MDD
and Category 2:
MCA
MCB
MCC
MCD
MCE
MCF
MCG
MDB
So my question is: how can I write a regular expression that finds files from category 1 only?
I don't want a hard-coded script; I am hoping for a more general rule.
I am trying this:
find . -regex "*[M][A,B,D][A,B,C,D,E,F,J].txt"
It's quite simple:
ls -l | grep "MAA\|MAB\|MAC\|MAD\|MAE\|MAF\|MAG\|MAH\|MAJ\|MBA\|MBB\|MBC\|MBD\|MBE\|MDA\|MDD"
OK, so you don't want anything hardcoded. In that case, state the patterns that should NOT match and exclude them with grep -v:
ls -l | grep -v "MC." | grep -v "pattern2" | ....
Your question is not very precise, but from your attempt I conclude that you are looking for files whose names end in ...MAA.txt, ...MAB.txt and so on, located either in your working directory or somewhere below it.
You also didn't mention which shell you are using. Here is an example using zsh; there is no need to write a regular expression here:
ls ./**/*M{AA,AB,AC,AD,AE,AF,AG,AH,AJ,BA,BB,BC,BD,BE,DA,DD}.txt
I am trying this : find . -regex "*[M][A,B,D][A,B,C,D,E,F,J].txt"
The errors in this are:
The wildcard for any characters in a regex is .*, unlike just * in a normal filename pattern.
You forgot G and H in the third bracket expression.
You didn't exclude the category 2 name MDB.
Besides:
The characters of a bracket expression are not to be separated by ,.
A bracket expression with a single item ([M]) can be replaced by just the item (M).
This leads to:
find . -regex ".*M[ABD].*" -not -name "MDB*"
or, without regex:
find . -name "M[ABD]*" -not -name "MDB*"
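To check a pattern against both lists before running find, a small Python sketch can help. The regex below is one possible way to fold the MDB exclusion into a single expression with a negative lookahead; it is not taken from the answers above, and it assumes three-letter names without the extension.

```python
import re

# M, then A, B or D, then any capital letter, but never the name MDB.
CATEGORY_1 = re.compile(r"M(?!DB)[ABD][A-Z]")

cat1 = "MAA MAB MAC MAD MAE MAF MAG MAH MAJ MBA MBB MBC MBD MBE MDA MDD".split()
cat2 = "MCA MCB MCC MCD MCE MCF MCG MDB".split()


def is_category_1(stem):
    """Return True if the bare file name (no extension) is in category 1."""
    return CATEGORY_1.fullmatch(stem) is not None


matched = [name for name in cat1 + cat2 if is_category_1(name)]
```

Python's re module supports lookaheads, but the regex dialects accepted by find -regex generally do not, which is why the find commands above use a separate -not -name test instead.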
So I have two functions in my vimrc which I use a lot:
function! FindAndReplaceAllConfirm(from, to)
exec '%s/' . a:from . '/' . a:to . '/gc'
endfunction
function! FindAndReplaceAll(from, to)
exec '%s/' . a:from . '/' . a:to . '/g'
endfunction
The problem arises when, for example, I'm replacing Foo with FooBar. Sometimes the file already contains FooBar, and I don't want FooBar to become FooFooBar. How can I exclude matches like this?
You can add word boundaries \< and \> to match and replace only exact words as in the following function:
function! FindAndReplaceAll(from, to)
exec '%s/\<' . a:from . '\>/' . a:to . '/g'
endfunction
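The effect of the word boundaries can be illustrated outside Vim. The sketch below uses Python's \b, which plays roughly the role of Vim's \< and \>; the helper name replace_whole_word is invented for the example.

```python
import re


def replace_whole_word(text, old, new):
    """Replace old with new only where old stands alone as a whole word."""
    pattern = r"\b" + re.escape(old) + r"\b"
    # A function is used as the replacement so that backslashes in `new`
    # are inserted literally instead of being read as escape sequences.
    return re.sub(pattern, lambda match: new, text)


before = "Foo FooBar foo(Foo)"
after = replace_whole_word(before, "Foo", "FooBar")
```

The existing FooBar is left alone because there is no word boundary between Foo and Bar.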
I have variable names ending with an underscore (_), followed by a year code:
clear
set obs 1
foreach var in age_58 age_64 age_75 age_184 age_93 age99 {
generate `var' = rnormal()
}
list
+----------------------------------------------------------------------+
| age_58 age_64 age_75 age_184 age_93 age99 |
|----------------------------------------------------------------------|
1. | .1162236 -.8781271 1.199268 -1.475732 .9077238 -.0858719 |
+----------------------------------------------------------------------+
I would like to rename them into:
age58 age64 age75 age184 age93 age99
I know I can do this by renaming one variable at a time as follows:
rename age_58 age58
rename age_64 age64
rename age_75 age75
rename age_184 age184
rename age_93 age93
How can I remove the underscore from all the variable names at once?
In Stata 13 and later versions, this can be done in one line using the built-in command rename.
One merely has to specify the relevant rules, which can include wildcard characters:
rename *_# *#
list
+----------------------------------------------------------------------+
| age58 age64 age75 age184 age93 age99 |
|----------------------------------------------------------------------|
1. | .1162236 -.8781271 1.199268 -1.475732 .9077238 -.0858719 |
+----------------------------------------------------------------------+
Type help rename group for details on the various available specifiers.
For Stata 8 up, the community-contributed command renvars offers a solution:
renvars age_*, subst(_)
For documentation and download, see
. search renvars, historical
Search of official help files, FAQs, Examples, SJs, and STBs
SJ-5-4 dm88_1 . . . . . . . . . . . . . . . . . Software update for renvars
(help renvars if installed) . . . . . . . . . N. J. Cox and J. Weesie
Q4/05 SJ 5(4):607
trimend() option added and help file updated
STB-60 dm88 . . . . . . . . Renaming variables, multiply and systematically
(help renvars if installed) . . . . . . . . . N. J. Cox and J. Weesie
3/01 pp.4--6; STB Reprints Vol 10, pp.41--44
renames variables by changing prefixes, postfixes, substrings,
or as specified by a user supplied rule
For the 2001 paper, see this .pdf file.
You can loop over the variables using the macro extended function subinstr:
foreach var of varlist * {
local newname : subinstr local var "_" "", all
if "`newname'" != "`var'" {
rename `var' `newname'
}
}
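The loop above removes every underscore, while a rule such as *_# *# only touches an underscore directly before trailing digits. The difference between the two substitutions can be sketched in Python; this only models the string rules and is not Stata code, and the name my_var_58 is a made-up example.

```python
import re

names = ["age_58", "age_64", "age_75", "age_184", "age_93", "age99", "my_var_58"]

# Rule 1: drop an underscore only when it directly precedes trailing digits.
suffix_only = [re.sub(r"_(\d+)$", r"\1", name) for name in names]

# Rule 2: drop every underscore, as the subinstr loop with "all" does.
all_underscores = [name.replace("_", "") for name in names]
```

For the age variables in the question both rules give the same names; they differ only for names with more than one underscore.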
I have a text file containing the output of the ipconfig command, and I want to extract the value of Host Name (i.e. S4333AAB45) using a regex.
Windows IP Configuration
Host Name . . . . . . . . . . . . : S4333AAB45
Primary Dns Suffix . . . . . . . :
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
I tried the following, and it didn't work:
/\bHost Name\s+(\d+)/
Here is what I would use:
/\s+Host Name.*: (\w+)$/
Use Field Splitting with AWK
You don't say what regular expression engine you're using, or why you need to use a regular expression to match the host name portion. If you have access to AWK, you can treat this as a field-splitting issue instead. For example:
awk '/\<Host Name\>/ { print $NF }' /tmp/foo
Use Known Line Positions
Assuming you've got Cygwin or similar installed, you can use the position of the interesting record to get the data you want without a regular expression at all. For example:
cat /tmp/foo | head -n3 | cut -s -d: -f2 | tr -d ' '
Just replace the cat command with your call to ipconfig instead, and you should get the results you want.
Use sed Instead
You can also use sed to find the line you're interested in, and print out just the trailing word on the line. For example:
sed -n '/\<Host Name\>/ s/.*[[:space:]]\([[:alnum:]]\+\)$/\1/p' /tmp/foo
Your host had a letter "S" as the first character of the host name, so "(\d+)" wouldn't be correct for matching your host name. You also failed to account for the dots and colon on the host name line. So the answer from weexpectedTHIS should do the trick. But for your information, here's how you could get the host name without first creating an intermediate file.
$ipconfig = `ipconfig /all`;
($host) = $ipconfig =~ /^\s*Host Name.*:\s*(\w+)/m;
You would need the "/m" in there so that the "^" will match the start of any line in the multi-line contents of $ipconfig. I tend to use "\s*" instead of "\s+" as a sort of insurance against future changes in the output format (where white space is often removed or expanded in newer versions of a command).
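The same pattern can be tested in Python against the sample from the question. This is a sketch: the sample string is typed in by hand rather than captured from ipconfig, and \w+ is kept from the Perl version, so a host name containing a hyphen would be cut short.

```python
import re

# re.MULTILINE makes "^" match at the start of every line, like /m in Perl.
HOST_RE = re.compile(r"^\s*Host Name[ .]*:\s*(\w+)", re.MULTILINE)

sample = """Windows IP Configuration
   Host Name . . . . . . . . . . . . : S4333AAB45
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
"""

match = HOST_RE.search(sample)
host = match.group(1) if match else None

# The pattern from the question finds nothing: after "Host Name" comes a
# space and then a dot, so \d+ has no digit to match.
original_attempt = re.search(r"\bHost Name\s+(\d+)", sample)
```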