Why can't I match certain strings with ( | ) in regex? - regex

I have a question about matching a string pattern.
I want to copy certain files whose names contain particular identifying characters.
For example:
20190108JPYUSDabced.csv
20190108CHNUSDabced.csv
20190108IJKUSDabcde.csv
I want to use a single command to copy just the first two files:
cp 20190108(JPY|CHN)USDabced.csv
This does not work. I receive the error:
-bash: syntax error near unexpected token `('

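Bash does not apply regular expressions to file names on the command line; brace expansion is meant for this. Add the destination as the last argument (the path below is a placeholder); with only the two expanded names, cp would copy the first file over the second:
$ cp 20190108{JPY,CHN}USDabced.csv /path/to/destination/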
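To make the difference concrete, here is a minimal Python sketch (Python is used purely for illustration; the variable names are made up for this example). It builds the two names that the brace expansion produces and then shows that the ( | ) alternation from the question selects the same files once it is handed to an actual regex engine instead of the shell.

```python
import re

# What {JPY,CHN} expands to: one file name per alternative.
codes = ("JPY", "CHN")
expanded = [f"20190108{code}USDabced.csv" for code in codes]
print(expanded)

# The file names from the question.
filenames = [
    "20190108JPYUSDabced.csv",
    "20190108CHNUSDabced.csv",
    "20190108IJKUSDabcde.csv",
]

# The ( | ) alternation is regex syntax, so it works in a regex engine.
pattern = re.compile(r"20190108(JPY|CHN)USDabced\.csv")
selected = [name for name in filenames if pattern.fullmatch(name)]
print(selected)
```

Both lists contain only the JPY and CHN files; the IJK file is left out.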

Why am I getting the error: Unmatched ( in regex; marked by <-- HERE?

I am having trouble figuring out why I am getting the error in the title.
This is the line of code I'm inputting into the command line:
perl -pi -e 's/(\/(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)' myfilepath
Basically, what I'm trying to do is go through a body of text, find all the URLS and append something to the end of the domain. For example:
https://thisisalink.com/navigate/page <-- I want to ignore the ]
I keep getting this error when I run that code though:
Unmatched [ in regex; marked by <-- HERE in *)|[ <-- HERE A-Z0-9+&##%=~_|5.030003)/gxi)/ at -e line 1, <> line 1.
How to fix this issue?
$] is a special variable that contains the version of the Perl interpreter in use. Hence, [A-Z0-9+&##%=~_|$] is interpolated as [A-Z0-9+&##%=~_|5.032001 (on my Perl 5.32.1), and the opening [ is thus unmatched. To fix this, escape the $ using \$:
[A-Z0-9+&##%=~_|\$]
Similarly, earlier in the regex, you are using [...$?...], but $? is also a special variable, containing "the status returned by the last pipe close, backtick (``) command, successful call to wait() or waitpid(), or from the system() operator". This does not cause an error, since the value is an integer, but the class will not match either $ or ? as you'd like. Once again, escape the $, writing \$?.
In general, when you want to match a literal $, you should probably escape it.

Need to extract entry names from file to populate list or variable

I have this config file with entry names encased in brackets: []. I need to extract each entry name into a list or variable to be used in a for loop. Still new and fumbling with some commands. I have a feeling grep is my answer but I don't know where to start. Any help would be appreciated.
[dropbox]
type = dropbox
scope = dropbox
token = {"access_token":"my_token"}
[drive2]
type = drive
scope = drive
token = {"access_token":"other_token"}
You can use sed:
sed -rn 's/(^\[)(.*)(\]$)/\2/p' configfile
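Enable extended regular expressions with -r and suppress automatic printing with -n. The pattern splits each line of the file (configfile) into three groups: the start of the line followed by [, then anything (.*), and then ] followed by the end of the line. The whole line is replaced with just the second group, and the p flag prints the result.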
You can use GNU grep:
printf '[dropbox]\ntype = dropbox\n' | grep -Po '\[\K[^\]]*'
# Prints: dropbox
Here, grep uses the following options:
-P : Use Perl regexes.
-o : Print only the matches, one per output line, rather than the entire lines.
\[\K[^\]]* : an escaped literal [, followed by the escape sequence \K, which tells the regex engine to pretend that the match starts at that point, followed by any non-] character repeated 0 or more times ([^\]]*).
SEE ALSO:
grep manual
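Since the goal is a list to loop over, the same bracket-anchored pattern can also be sketched in Python. This is only an illustration, assuming the config text from the question has been read into a string; config_text and section_names are names invented for this example.

```python
import re

# Sample config from the question, held in a string for this sketch.
config_text = """\
[dropbox]
type = dropbox
scope = dropbox
token = {"access_token":"my_token"}
[drive2]
type = drive
scope = drive
token = {"access_token":"other_token"}
"""

# With re.MULTILINE, ^ and $ anchor at each line's start and end.
# The single capture group keeps only the text between the brackets.
section_names = re.findall(r"^\[(.*)\]$", config_text, flags=re.MULTILINE)

for name in section_names:
    print(name)
```

The loop prints dropbox and drive2, one per line.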

git grep <regex containing newline>

I'm trying to grep all line breaks after some binary operators in a project using git bash on a Windows machine.
Tried the following commands which did not work:
$ git grep "[+-*\|%]\ *\n"
fatal: command line, '[+-*\|%]\ *\n': Invalid range end
$ git grep "[+\-*\|%]\ *\n"
fatal: command line, '[+\-*\|%]\ *\n': Invalid range end
OK, I don't know how to include "-" in a character set, but still after removing it the \n matches the character n literally:
$ git grep "[+*%] *\n"
somefile.py: self[:] = '|' + name + '='
^^^
Escaping the backslash once (\\n) has no effect, and escaping it twice (\\\n) causes the regex to match \n (literally).
What is the correct way to grep here?
I don't know how to include "-" in a character set
There is no need to escape the dash character (-) to include it in a character set. If you put it first or last in the set, it loses its special meaning.
Also, there is no need to escape | inside a character set. Apart from ^ (when it is the first character in the set), - (when it is neither the first nor the last character in the set), ] and \ (when it is used to escape ]), all characters have their literal meaning (i.e., no special meaning) in a character set.
There is also no need to put \n in the regexp. Grep tools, git grep included, by default match the regexp against one line at a time. If you need the regexp to match only at the end of a line, put $ (the end-of-line anchor) as its last character.
Your regexp should be [-+*|%] *$.
Put together, the complete command line is:
git grep '[-+*|%] *$'
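As a quick check of that pattern outside git, here is a small Python sketch. It assumes Python's re module treats this particular character set the same way (a leading - and an unescaped | are both literal there); the sample lines are invented for the example, apart from the one taken from the question's output.

```python
import re

# Leading "-" is literal; "|" needs no escape inside the set.
# " *$" allows trailing spaces before the end of the line.
pattern = re.compile(r"[-+*|%] *$")

lines = [
    "total = price +",
    "ratio = a /",
    "self[:] = '|' + name + '='",   # line from the question: must not match
    "mask = flags |   ",
    "done = True",
]

# Keep only the lines that end with one of the operators.
matched = [line for line in lines if pattern.search(line)]
print(matched)
```

Only the lines ending in + and | (the latter with trailing spaces) are reported.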
How to find a newline in the middle of a line
For lack of a better option, I think I'll start with:
sudo apt install pcregrep
git grep --cached -Il '' | xargs pcregrep -Mb 'y\nl'
This combines:
How to list all text (non-binary) files in a git repository?
https://unix.stackexchange.com/questions/112132/how-can-i-grep-patterns-across-multiple-lines/112134#112134
The output clearly shows the filename and line number, e.g.:
myfile.txt:123:my
love
myfile.txt:234:my
life
otherfile.txt:11:my
lion
Tested on Ubuntu 22.04.
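If pcregrep is not available, the idea of matching across a line break can be sketched in Python by searching the whole text at once rather than line by line. The sample text and variable names below are assumptions made for this illustration; the reported number is the line on which each match starts.

```python
import re

# Hypothetical file contents; a real script would read them from a file.
text = "my\nlove\nplain line\nmy\nlife\n"

# "y", then a newline, then "l": a match that spans two lines.
pattern = re.compile(r"y\nl")

match_lines = []
for m in pattern.finditer(text):
    # Count the newlines before the match start to get a 1-based line number.
    match_lines.append(text.count("\n", 0, m.start()) + 1)

print(match_lines)
```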

sed randomized last digits using expression

I need to parse a file and randomize the last digits of a given string when the pattern is found.
I am able to perform the desired result when using a simple case but it fails for a more complex case.
I am wondering what is wrong with the second case.
This example here works.
echo 'AB111-1-13' | sed 's/\(AB111\)-\([0-9]*\)-\([0-9]*\)/echo \1-\2-$(echo \3*$RANDOM | bc )/ge'
But this one doesn't work.
echo '<http://name/link#AB111-1-13>' | sed 's/\(AB111\)-\([0-9]*\)-\([0-9]*\)/echo \1-\2-$(echo \3*$RANDOM | bc )/ge'
Any ideas?
EDIT
This is the error message when trying to run the second example.
sh: -c: line 0: syntax error near unexpected token `newline'
sh: -c: line 0:'
The GNU sed e flag executes the pattern space as a shell command.
In your first example your pattern space starts as AB111-1-13 and becomes echo AB111-1-$(echo 13*$RANDOM | bc ), which is a valid shell command and gets executed. (I should point out that bc is entirely unnecessary here, as the shell can perform integer arithmetic just fine by itself: echo $((13 * RANDOM)).)
But in your second example your pattern space starts as <http://name/link#AB111-1-13> and becomes <http://name/link#echo AB111-1-$(echo 13*$RANDOM | bc )>, which is very much not a valid shell command, and so, presumably, you get a shell error (would have been good of you to include it in the question though) when it tries to get executed.
So don't use sed for this. Use something that can evaluate arbitrary expressions like awk or perl or python, etc.
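As one possible shape for the Python route, here is a minimal sketch. It assumes that multiplying the last number by a random integer in bash's $RANDOM range (0 to 32767) is the intended behaviour, and it uses [0-9]+ rather than [0-9]* so that empty numbers are not matched. The function names are hypothetical.

```python
import random
import re

PATTERN = re.compile(r"(AB111)-([0-9]+)-([0-9]+)")

def randomize_last(match, rng=random):
    # Multiply the last number by a random integer, like \3*$RANDOM.
    scaled = int(match.group(3)) * rng.randint(0, 32767)
    return f"{match.group(1)}-{match.group(2)}-{scaled}"

def randomize_line(line, rng=random):
    # Only the matched part is rewritten; the rest of the line is untouched.
    return PATTERN.sub(lambda m: randomize_last(m, rng), line)

print(randomize_line("AB111-1-13"))
print(randomize_line("<http://name/link#AB111-1-13>"))
```

Because the replacement is computed by a function instead of being executed as a shell command, the surrounding < > and URL text cause no trouble.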

Unknown option while trying to use sed with a regular expression to replace a string

I am trying to replace all the patterns in a file that are of the following form:
> hello, test< by ><link>hello, test</link><
For this purpose I used the following command:
sed -i 's/>[a-zA-Z0-9_ ]*</><link>\1<link></g' finename.txt
>[a-zA-Z0-9_ ]*<: find all the alphanumerical patterns between ><
<link>\1<link>: Replace them with ><link>pattern</link><
However I am seeing the following error message
character 37: invalid reference \1 on the right-hand side of the s command
What's wrong with the expression?
The \1 references the first capture group and you don't have any capture groups set up hence \1 is invalid. What you want is:
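$ echo ">hello, test<" | sed -r 's/>([a-zA-Z0-9,_ ]*)</><link>\1<\/link></g'
><link>hello, test</link><
Unescaped parentheses for capture groups belong to extended regular expressions, so you need the -r option (in basic regular expressions you would write \( and \) instead). The slash in the closing tag is escaped as \/ because / is the delimiter of the s command. Also note that your example input contains a comma, but your character class does not include that character.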
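The capture-group mechanics are the same in most regex engines. As a cross-check, here is a short Python sketch of the substitution, assuming the closing tag </link> from the question's description is what is wanted; the variable names are made up for this example.

```python
import re

text = ">hello, test<"

# The parentheses define group 1; \1 in the replacement refers back to it.
# The comma is included in the character class so "hello, test" matches.
result = re.sub(r">([a-zA-Z0-9,_ ]*)<", r"><link>\1</link><", text)
print(result)
```

Without the parentheses there would be no group 1 to refer to, which is the situation the error message describes.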