Editing this Script to my needs - regex

I want to use this script to build a custom wordlist.
Wordlist Script
The script builds a wordlist that contains only lowercase letters, but I want lowercase and uppercase letters as well as numbers.
The output should look like this example:
test
123test
test123
Test
123Test
Test123
I don't know how to change it, and I would be really happy if you could help me out.
I tried some tutorials for grep and regex, but I didn't understand them.

Replace line 18 of the script:
page=`grep '' -R "./temp/" | sed -e :a -e 's/<[^>]*>//g;/</N;//ba' | tr " " "\n" | tr '[:upper:]' '[:lower:]' | sed -e '/[^a-zA-Z]/d' -e '/^.\{9,25\}$/!d' | sort -u`;
with this:
page=`grep '' -R "./temp/" | sed -e :a -e 's/<[^>]*>//g;/</N;//ba' | tr " " "\n" | sort -u`;
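If you look at the original pipeline, you can see that it
replaces " " with "\n",
changes case (tr '[:upper:]' '[:lower:]'),
deletes words containing non-letters and filters by length (the second sed),
sorts and removes duplicates (sort -u).
You can remove stages from that pipe chain and see how the output changes.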

Delete this bit from the script:
tr '[:upper:]' '[:lower:]' |
That will leave case alone.
There's also a bit in wordlist.sh that drops every word containing a non-letter (which is why numbers never show up) and only keeps words from 9 to 25 characters. You could delete it, or change the character class and the range if you prefer:
`sed -e '/[^a-zA-Z]/d' -e '/^.\{9,25\}$/!d' |`
Or you could try a simpler strategy: download and install w3m, a command-line web browser, and replace the complicated line in wordlist.sh with this:
page=`grep -h '' -R "./temp/" | w3m -dump -T text/html | grep -o '\w\+' | sort -u`
The grep is a (weird) way to get all the text from the HTML files (-h keeps the file names out of the output). w3m -dump -T text/html then reads that HTML from standard input and gets rid of all the tags and other non-display stuff, and grep -o '\w\+' matches any word.
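As a minimal sketch of the filter the question asks for (letters in either case plus digits), the stages described above can be combined as follows. The function name extract_words is hypothetical and not part of wordlist.sh, the tag stripping only handles tags that open and close on the same line, and the length filter is left out on purpose.

```shell
# Hypothetical helper: read HTML on stdin, print unique alphanumeric words.
extract_words() {
  sed -e 's/<[^>]*>//g' |                   # drop tags that open and close on one line
    tr -s '[:space:]' '\n' |                # put every word on its own line
    sed -e '/[^a-zA-Z0-9]/d' -e '/^$/d' |   # delete words with other characters, and empty lines
    LC_ALL=C sort -u                        # sort and remove duplicates
}

printf '%s\n' '<p>test 123test Test123 foo-bar</p>' | extract_words
```

Because case is left alone, "Test" and "test" are kept as separate entries. A length stage such as -e '/^.\{9,25\}$/!d' could be added to the second sed if a range is wanted.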

Related

how to sed for pattern before and after match

I currently am trying to get specific parameters from a url.
My url looks like: https://private.io/report-artifact/dsop-pipeline-artifacts/container-scan-reports/redhat/ubi/ubi7/7.8/2020-02-14T222203.548_2868/ubi7-7.8.tar
I want just redhat/ubi/ubi7/7.8
I can get redhat/ubi/ubi7/7.8/2020-02-14T222203.548_2868/ubi7-7.8.tar by doing,
echo https://private.io/report-artifact/dsop-pipeline-artifacts/container-scan-reports/redhat/ubi/ubi7/7.8/2020-02-14T222203.548_2868/ubi7-7.8.tar | sed 's|.*/container-scan-reports/||'
Thus I want to remove /2020-02-14T222203.548_2868/ubi7-7.8.tar
I also would like to change the / to a - so that I have redhat-ubi-ubi7-7.8

With GNU sed:
Get the 4 following path elements after .*/container-scan-reports/ and replace all / with -:
url='https://private.io/report-artifact/dsop-pipeline-artifacts/container-scan-reports/redhat/ubi/ubi7/7.8/2020-02-14T222203.548_2868/ubi7-7.8.tar'
echo "$url" | sed -E 's|.*/container-scan-reports/(([^/]*/){3}[^/]*).*|\1|;s|/|-|g'
Or you could get everything after .*/container-scan-reports/, but not the last two path elements:
echo "$url" | sed -E 's|.*/container-scan-reports/(.*)/[^/]*/[^/]*|\1|;s|/|-|g'

When you know the position in the string you can use cut
echo "${string}" | cut -d/ -f 7-10 | tr '/' '-'
Another way with sed is
echo "${string}" | sed -E 's#([^/]*/){6}([^/]*)/([^/]*)/([^/]*)/([^/]*).*#\2-\3-\4-\5#'
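If the marker directory is known and the last two path elements are always the timestamp and the file name, shell parameter expansion may be enough on its own. This is only a sketch under those assumptions, and the function name scan_path is made up for illustration.

```shell
# Hypothetical helper: print the path between the marker directory and the
# last two path elements, with "/" turned into "-".
scan_path() {
  rest=${1#*/container-scan-reports/}   # drop everything through the marker directory
  rest=${rest%/*/*}                     # drop the last two path elements
  printf '%s\n' "$rest" | tr '/' '-'
}

scan_path 'https://private.io/report-artifact/dsop-pipeline-artifacts/container-scan-reports/redhat/ubi/ubi7/7.8/2020-02-14T222203.548_2868/ubi7-7.8.tar'
```

Unlike the cut variant, this does not depend on how many path elements precede container-scan-reports.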

Using sed captured group variable as input for bash command

I have text like:
TEXT="I need to replace the hostname [[google.com]] with it's ip in side the text"
Is there a way to use something like below, but working?
sed -Ee "s/\[\[(.*)\]\]/`host -t A \1 | rev | cut -d " " -f1 | rev`/g" <<< $TEXT
looks like the value of \1 is not being passed to the shell command used inside sed.
Thanks

Backquote interpolation is performed by the shell, not by sed. This means that your backquotes will either be replaced by the output of a command before the sed command is run, or (if you correctly quote them) they will not be replaced at all, and sed will see the backquotes.
You appear to be trying to have sed perform a replacement, then have the shell perform backquote interpolation.
You can get the backquotes past the shell by quoting them properly:
$ echo "" | sed -e 's/^/`hostname`/'
`hostname`
However, in that case you will have to use the resulting string in a shell command line to cause backquote interpolation again.
Depending on how you feel about awk, perl, or python, I'd suggest you use one of them to do this job in a single pass. Alternatively, you could make a first pass extracting the hostnames into a command without backquotes, then execute the commands to get the IP addresses you want, then replace them in another pass.
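The multi-pass idea (extract the hostname, look it up, then substitute) can be sketched without sed at all. In the sketch below, resolve is a hypothetical stand-in that always prints a fixed documentation address so the example runs offline; a real version would call host, dig, or getent. Only the first [[...]] placeholder is handled.

```shell
# Hypothetical stand-in for a real DNS lookup; replace the body with a call
# to host, dig, or getent as needed.
resolve() {
  printf '%s\n' '192.0.2.1'
}

# Replace the first [[name]] placeholder in the text with resolve's output.
replace_host() {
  text=$1
  case $text in
    *'[['*']]'*)
      prefix=${text%%'[['*}     # everything before the first [[
      rest=${text#*'[['}        # everything after the first [[
      name=${rest%%']]'*}       # the hostname between [[ and ]]
      suffix=${rest#*']]'}      # everything after the closing ]]
      printf '%s%s%s\n' "$prefix" "$(resolve "$name")" "$suffix"
      ;;
    *)
      printf '%s\n' "$text"     # no placeholder: print the text unchanged
      ;;
  esac
}

replace_host "I need to replace the hostname [[google.com]] with its IP"
```

The lookup runs as an ordinary shell command between the extraction and the substitution, which is the step a sed back reference inside backquotes cannot provide.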

It's got to be a two part command, one to get a variable that bash can use, the other to do a straight-up /s/ replacement with sed.
TEXT="I need to replace the hostname [[google.com]] with it's ip in side the text"
DOMAIN=$(echo $TEXT | sed -e 's/^.*\[\[//' -e 's/\]\].*$//')
echo $TEXT | sed -e 's/\[\[.*\]\]/'$(host -tA $DOMAIN | rev | cut -d " " -f1 | rev)'/'
But more cleanly, using parameter expansion to get the last field (see "How to split a string in shell and get the last field"):
TEXT="I need to replace the hostname [[google.com]] with it's ip in side the text"
DOMAIN=$(echo $TEXT | sed -e 's/^.*\[\[//' -e 's/\]\].*$//')
HOSTLOOKUP=$(host -tA $DOMAIN)
echo $TEXT | sed -e 's/\[\[.*\]\]/'${HOSTLOOKUP##* }/
The short version is that you can't mix sed and bash the way you're expecting to.

This works:
#!/bin/bash
txt="I need to replace the hostname [[google.com]] with it's ip in side the text"
host_name=$(sed -E 's/^[^[]*\[\[//; s/^(.*)\]\].*$/\1/' <<<"$txt")
ip_addr=$(host -tA "$host_name" | sed -E 's/.* ([0-9.]*)$/\1/')
echo "$txt" | sed -E 's/\[\[.*\]\]/'"$ip_addr/"
# I need to replace the hostname 172.217.4.174 with it's ip in side the text

Thank you all,
I came up with the solution below:
function host_to_ip () {
    host -t A "$1" | head -n 1 | rev | cut -d" " -f1 | rev
}
function resolve_hosts () {
    local host_placeholders=$(grep -o -e "##.*##" "$1")
    # Intentionally unquoted: split into one placeholder per word
    for HOST in $host_placeholders
    do
        sed -i -e "s/$HOST/$(host_to_ip "$(sed -Ee 's/##(.*)##/\1/g' <<< "$HOST")")/g" "$1"
    done
}
Here resolve_hosts takes a text file as its argument.

Regex w/grep against tnsnames.ora

I am trying to print out the contents of a TNS entry from the tnsnames.ora file to make sure it is correct from an Oracle RAC environment.
So if I do something like:
grep -A 4 "mydb.mydomain.com" $ORACLE_HOME/network/admin/tnsnames.ora
I will get back:
mydb.mydomain.com =
(DESCRIPTION =
(ADDRESS =
(PROTOCOL = TCP)(HOST = myhost.mydomain.com)(PORT = 1521))
  (CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME=mydb)))
Which is what I want. Now I have an environment variable being set for the JDBC connection string by an external program when the shell script gets called like:
export DB_URL='#myhost.mydomain.com:1521/mydb'
So I need to get TNS alias mydb.mydomain.com out of the above string. I'm not sure how to do multiple matches and reorder the matches with regex and need some help.
grep #.+: $DB_URL
I assume will get the
#myhost.mydomain.com:
but I'm looking for
mydb.mydomain.com
So I'm stuck at this part. How do I get the TNS alias and then pipe/combine it with the initial grep to display the text for the TNS entry?
Thanks
update:
@mklement0, @Walter A: I tried your approaches, but they are not exactly what I was looking for.
echo "#myhost.mydomain.com:1521/mydb" | grep -Po "#\K[^:]*"
echo "#myhost.mydomain.com:1521/mydb" | sed 's/.*#\(.*\):.*/\1/'
echo "#myhost.mydomain.com:1521/mydb" | cut -d"#" -f2 | cut -d":" -f1
echo "#myhost.mydomain.com:1521/mydb" | tr "#:" "\t" | cut -f2
echo "#myhost.mydomain.com:1521/mydb" | awk -F'[#:]' '{ print $2 }'
All these methods get me back: myhost.mydomain.com
What I am looking for is actually: mydb.mydomain.com

Note:
- For brevity, the commands below use bash/ksh/zsh here-string syntax to send strings to stdin (<<<"$var"). If your shell doesn't support this, use printf %s "$var" | ... instead.
The following awk command will extract the desired string (mydb.mydomain.com) from $DB_URL (#myhost.mydomain.com:1521/mydb):
awk -F '[#:/]' '{ sub("^[^.]+", "", $2); print $4 $2 }' <<<"$DB_URL"
-F '[#:/]' tells awk to split the input into fields on any of #, :, or /. With your input, the parts of interest are in the second field ($2) and the fourth field ($4). The sub() call removes the first dot-separated component from $2 (leaving .mydomain.com), and the print call pieces the result together.
To put it all together:
domain=$(awk -F '[#:/]' '{ sub("^[^.]+", "", $2); print $4 $2 }' <<<"$DB_URL")
grep -F -A 4 "$domain" "$ORACLE_HOME/network/admin/tnsnames.ora"
You don't strictly need intermediate variable $domain, but I've added it for clarity.
Note how -F was added to grep to specify that the search term should be treated as a literal, so that characters such as . aren't treated as regex metacharacters.
Alternatively, for more robust matching, use a regex that is anchored to the start of the line with ^, and \-escape the . chars (using shell parameter expansion) to ensure their treatment as literals:
grep -A 4 "^${domain//./\.}" "$ORACLE_HOME/network/admin/tnsnames.ora"

You can get a part of a string with
# Only GNU-grep
echo "#myhost.mydomain.com:1521/mydb" | grep -Po "#\K[^:]*"
# or
echo "#myhost.mydomain.com:1521/mydb" | sed 's/.*#\(.*\):.*/\1/'
# or
echo "#myhost.mydomain.com:1521/mydb" | cut -d"#" -f2 | cut -d":" -f1
# or, when the string already is in a var
echo "${DB_URL#*#}" | cut -d":" -f1
# or using a temp var
tmpvar="${DB_URL#*#}"
echo "${tmpvar%:*}"
I had skipped the alternative awk, which was already given by @mklement0:
echo "#myhost.mydomain.com:1521/mydb" | awk -F'[#:]' '{ print $2 }'
The awk solution is straightforward. If you want to use the same approach without awk, you can do something like
echo "#myhost.mydomain.com:1521/mydb" | tr "#:" "\t" | cut -f2
or the ugly
echo "#myhost.mydomain.com:1521/mydb" | (IFS='#:' read -r _ url _; echo "$url")
What is happening here?
After setting the new IFS, I take the second word of the input. The first and third words are caught in the dummy variable _ (you could have named them dummyvar1 and dummyvar2). The pipe | creates a subprocess, so you need () to keep reading and displaying the variable url in the same process.
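The commands above all return the host part. To build the alias the question actually asks for (mydb.mydomain.com), the same parameter-expansion technique can be applied a few more times. This is a sketch that assumes the string always has the shape #host.domain:port/service; the function name tns_alias is hypothetical.

```shell
# Hypothetical helper: turn "#host.domain:port/service" into "service.domain".
tns_alias() {
  url=${1#*'#'}        # myhost.mydomain.com:1521/mydb
  host=${url%%:*}      # myhost.mydomain.com
  service=${url##*/}   # mydb
  printf '%s.%s\n' "$service" "${host#*.}"   # service name + domain without the first label
}

tns_alias '#myhost.mydomain.com:1521/mydb'
```

The result could then be passed to grep -F -A 4 against tnsnames.ora in the same way as $domain in the awk answer.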

Escape dollar sign in regexp for sed

I will introduce what my question is about before actually asking - feel free to skip this section!
Some background info about my setup
To update files manually in a software system, I am creating a bash script to remove all files that are not present in the new version, using diff:
for i in $(diff -r old new 2>/dev/null | grep "Only in old" | cut -d "/" -f 3- | sed "s/: /\//g"); do echo "rm -f $i" >> REMOVEOLDFILES.sh; done
This works fine. However, apparently my files often have a dollar sign ($) in the filename, this is due to some permutations of the GWT framework. Here is one example line from the above created bash script:
rm -f var/lib/tomcat7/webapps/ROOT/WEB-INF/classes/ExampleFile$3$1$1$1$2$1$1.class
Executing this script would not remove the wanted files, because bash reads these as argument variables. Hence I have to escape the dollar signs with "\$".
My actual question
I now want to add a sed-Command in the aforementioned pipeline, replacing this dollar sign. As a matter of fact, sed also reads the dollar sign as special character for regular expressions, so obviously I have to escape it as well.
But somehow this doesn't work and I could not find an explanation after googling a lot.
Here are some variations I have tried:
echo "Bla$bla" | sed "s/\$/2/g" # Output: Bla2
echo "Bla$bla" | sed 's/$$/2/g' # Output: Bla
echo "Bla$bla" | sed 's/\\$/2/g' # Output: Bla
echo "Bla$bla" | sed 's/#"\$"/2/g' # Output: Bla
echo "Bla$bla" | sed 's/\\\$/2/g' # Output: Bla
The desired output in this example should be "Bla2bla".
What am I missing?
I am using GNU sed 4.2.2
EDIT
I just realized that the above example is wrong to begin with: inside double quotes the shell already expands $bla as a variable before echo runs, so sed never sees the dollar sign. Here is a proper example:
Create a textfile test with the content bla$bla
cat test gives bla$bla
cat test | sed "s/$/2/g" gives bla$bla2
cat test | sed "s/\$/2/g" gives bla$bla2
cat test | sed "s/\\$/2/g" gives bla2bla
Hence, the last version is the answer. Remember: when testing, first make sure your test is correct before you question the test object.

Inside a double-quoted sed script, the dollar sign has to be escaped with a double backslash: the shell turns \\ into a single backslash, and sed then sees \$, a literal dollar sign. To produce the escaped version in the output, we need some additional backslashes:
cat filenames.txt | sed "s/\\$/\\\\$/g" > escaped-filenames.txt
Yep, that's four backslashes in a row. This creates the required changes: a filename like bla$1$2.class would then change to bla\$1\$2.class.
This I can then insert into the full pipeline:
for i in $(diff -r old new 2>/dev/null | grep "Only in old" | cut -d "/" -f 3- | sed "s/: /\//g" | sed "s/\\$/\\\\$/g"); do echo "rm -f $i" >> REMOVEOLDFILES.sh; done
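For comparison, here is a sketch of the same escaping step with a single-quoted sed script. Inside single quotes the shell does not touch backslashes or dollar signs, so sed receives the script exactly as written and fewer backslashes are needed. The function name escape_dollars is made up for illustration.

```shell
# Hypothetical helper: prefix every literal "$" on stdin with a backslash.
# In the pattern, \$ is a literal dollar sign; in the replacement, \\ is one
# backslash and the following $ is an ordinary character.
escape_dollars() {
  sed 's/\$/\\$/g'
}

printf '%s\n' 'bla$1$2.class' | escape_dollars
```

This prints bla\$1\$2.class, the same result as the double-quoted version with four backslashes.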
Alternative to solve the background problem
chepner posted an alternative that solves the background problem by simply adding single quotes around the filenames in the output. This way, the $ signs are not read as variables by bash when executing the script, and the files are properly removed:
for i in $(diff -r old new 2>/dev/null | grep "Only in old" | cut -d "/" -f 3- | sed "s/: /\//g"); do echo "rm -f '$i'" >> REMOVEOLDFILES.sh; done
(note the changed echo "rm -f '$i'" in that line)

There are other problems with your script, but file names containing $ are not a problem if you properly quote the argument to rm in the resulting script.
echo "rm -f '$i'" >> REMOVEOLDFILES.sh
or using printf, which makes quoting a little nicer and is more portable:
printf "rm -f '%s'\n" "$i" >> REMOVEOLDFILES.sh
(Note that I'm addressing the real problem, not necessarily the question you asked.)

There is already a nice answer directly in the edited question that helped me a lot - thank you!
I just want to add a bit of curious behavior that I stumbled across: matching against a dollar sign at the end of lines (e.g. when modifying PS1 in your .bashrc file).
As a workaround, I match for additional whitespace.
$ DOLLAR_TERMINATED="123456 $"
$ echo "${DOLLAR_TERMINATED}" | sed -e "s/ \\$/END/"
123456END
$ echo "${DOLLAR_TERMINATED}" | sed -e "s/ \\$$/END/"
sed: -e expression #1, char 13: Invalid back reference
$ echo "${DOLLAR_TERMINATED}" | sed -e "s/ \\$\s*$/END/"
123456END
Explanation to the above, line by line:
Defining DOLLAR_TERMINATED - I want to replace the dollar sign at the end of DOLLAR_TERMINATED with "END"
It works if I don't check for the line ending
It won't work if I match for the line ending as well (adding one more $ on the left side)
It works if I additionally match for (non-present) whitespace
(My sed version is 4.2.2 from February 2016, bash is version 4.3.48(1)-release (x86_64-pc-linux-gnu), in case that makes any difference)
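A likely explanation for the error in the third command: inside double quotes the shell expands $$ to its own process ID before sed runs, so sed receives a backslash followed by digits and reads it as a back reference. A sketch of the same substitution with a single-quoted script, where the shell leaves $$ alone (the variable names are only for illustration):

```shell
dollar_terminated='123456 $'

# \$ is a literal dollar sign, and the final $ anchors the match to the end of the line.
result=$(printf '%s\n' "$dollar_terminated" | sed -e 's/ \$$/END/')
printf '%s\n' "$result"
```

With single quotes, the end-of-line anchor works without the extra whitespace match.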

Replace string with another string based on backreference with sed

I'm trying to convert a predefined string %c# where # can be some number with another string. The catch is that the length of the other string must be truncated to # number of characters.
Ideally these set of commands would work:
FORMAT="%c10"
LAST_COMMIT="5189e42b14797b1e36ffb7fc5657c7eea08f1c0f"
echo $FORMAT | sed "s/%c\([0-9]\+\)/${LAST_COMMIT:0:\1}/g"
but clearly there is a syntax error on the \1. You can replace it with a number to see what I'm trying to get as output.
I'm open to using some other program other than sed to achieve this but ideally it should be programs that are pretty much native to most linux installations.
Thanks!

This is my idea.
echo ${LAST_COMMIT} | head -c $(echo ${FORMAT} | sed -e 's/%c//')
sed extracts the number, and head -c prints that many leading characters.
EDIT1
This might be better.
echo ${LAST_COMMIT} | head -c $(echo ${FORMAT} | sed -e 's/%c\([0-9]\+\)/\1/')
EDIT2
I wrote a script because the one-liner is too hard to follow. Please try this.
$ cat sample.sh
#!/bin/bash
FORMAT="%b-%t-%c10-%c5"
LAST_COMMIT="5189e42b14797b1e36ffb7fc5657c7eea08f1c0f"
## List numbers
lengths=$(echo ${FORMAT} | sed -e "s/%[^c]//g" -e "s/-//g" -e "s/%c/ /g")
## Substitute each %cXX with the first XX characters of LAST_COMMIT
for n in ${lengths}
do
    to_str=${LAST_COMMIT:0:${n}}
    FORMAT=$(echo ${FORMAT} | sed "s/%c${n}/${to_str}/")
done
## Print result
echo ${FORMAT}
This is the result.
$ ./sample.sh
%b-%t-5189e42b14-5189e
The same thing as a one-line command (same contents, but long and hard to read):
for n in $(echo ${FORMAT} | sed -e "s/%[^c]//g" -e "s/-//g" -e "s/%c/ /g"); do to_str=${LAST_COMMIT:0:${n}}; FORMAT=$(echo ${FORMAT} | sed "s/%c${n}/${to_str}/"); done; echo ${FORMAT}

The value of $LAST_COMMIT gets interpolated before sed runs, so there is no backreference to refer back to yet. There is an /e extension in GNU sed which would support something like this, but I would simply use a slightly more capable tool.
perl -e '$fmt = shift; $fmt=~ s/%c(\d+)/%.$1s/g; printf("$fmt\n", @ARGV)' '%c10' "$LAST_COMMIT"
Of course, if you can let go of your own ad-hoc format string specifier, and switch to a printf-compatible format string altogether, just use the printf shell command straight off.

length=$(echo $FORMAT | sed "s/%c\([0-9]\+\)/\1/g")
echo "${LAST_COMMIT:0:$length}"
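If the format string can contain several %cN specifiers, one option is to walk through it with parameter expansion instead of sed. The sketch below is hedged: expand_format is a hypothetical name, every N is assumed to be at least 1, and other specifiers such as %b are left untouched.

```shell
# Hypothetical helper: replace every %cN in the format ($1) with the first
# N characters of the commit hash ($2). The body runs in a subshell, so its
# variables do not leak into the caller.
expand_format() (
  fmt=$1
  commit=$2
  out=
  while :; do
    case $fmt in
      *%c[0-9]*) ;;        # at least one %cN is left
      *) break ;;
    esac
    before=${fmt%%"%c"[0-9]*}    # text in front of the first %cN
    after=${fmt#"$before%c"}     # the digits and everything behind them
    n=${after%%[!0-9]*}          # the leading digits, i.e. N
    after=${after#"$n"}
    out=$out$before$(printf '%s' "$commit" | cut -c "1-$n")
    fmt=$after
  done
  printf '%s\n' "$out$fmt"
)

expand_format '%b-%t-%c10-%c5' 5189e42b14797b1e36ffb7fc5657c7eea08f1c0f
```

Because each specifier is consumed from left to right together with all of its digits, a format containing both %c1 and %c10 does not get mixed up.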
