Mirroring with regex in wget

I'm using wget and trying to mirror all 98 folders on a website. What would be the syntax to do "wget -mk http://example.com/folder[1-98]/"?
Thanks.

for i in $(seq 1 98); do echo "http://example.com/folder${i}/"; done | wget -mki -

wget does not support URL ranges in the format you've described. You're better off generating the range of links with bash or some other programming language, writing them to a text file, and then having wget read that text file with -i.
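For example, a minimal sketch of that approach (the URL and file name are just placeholders):
# Build the 98 URLs into a file, then have wget read it with -i
seq 1 98 | sed 's|.*|http://example.com/folder&/|' > urls.txt
wget -mk -i urls.txt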

You can use shell brace expansion to generate the range of numbers, e.g. wget -mk http://example.com/folder{1..98}/. Note that the expansion is done by the shell (bash), not by wget itself.

Related

Scanning APIs with ZAP Docker image - replacer with regex

I'm trying to use the API scanner Docker image as described here: https://www.zaproxy.org/blog/2017-06-19-scanning-apis-with-zap/ and I want to do some request replacement using a regexp. I'm using this command:
docker run -v $(pwd):/zap/wrk/:rw --network=host -t owasp/zap2docker-weekly zap-api-scan.py --hook=/zap/wrk/authentication-hooks.py -t docs/openapi.yaml -f openapi -w output/oppenapi.md -z "-configfile /zap/wrk/zapproxy.prop" -d
with "zapproxy.prop":
replacer.full_list(0).description=customerId
replacer.full_list(0).enabled=true
replacer.full_list(0).matchtype=REQ_HEADER_STR
replacer.full_list(0).matchstr=/api/customers/\d+
replacer.full_list(0).regex=true
replacer.full_list(0).replacement=/api/customers/1
and the replacement doesn't work for the URL I want to modify: GET /api/customers/10. The same rule used via the GUI works just fine.
I've also tried:
replacer.full_list(0).description=customerId
replacer.full_list(0).enabled=true
replacer.full_list(0).matchtype=REQ_HEADER_STR
replacer.full_list(0).matchstr=/api/customers/10
replacer.full_list(0).regex=false
replacer.full_list(0).replacement=/api/customers/1
and that literal version works fine.
Simon Bennetts suggested checking how the GUI saves those settings: https://www.zaproxy.org/faq/how-do-you-find-out-what-key-to-use-to-set-a-config-value-on-the-command-line/. As you can see, there aren't any escapes in matchstr.
Is there something that I need to do to pass this regex correctly?
Escaping was the issue. Apparently one level of backslash escaping is consumed when the prop file is read (it seems to be parsed like a Java properties file), so the backslash in \d has to be doubled:
replacer.full_list(0).description=clientId
replacer.full_list(0).enabled=true
replacer.full_list(0).matchtype=REQ_HEADER_STR
replacer.full_list(0).matchstr=/api/customers/\\d+
replacer.full_list(0).regex=true
replacer.full_list(0).replacement=/api/customers/2
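As an aside, if you want to see exactly what the GUI persisted (per the FAQ linked above), you can grep ZAP's saved configuration. A rough sketch, assuming a default ZAP home directory on Linux:
# Path is an assumption; the GUI stores its settings in config.xml under the ZAP home dir
grep -i matchstr ~/.ZAP/config.xml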

Using grep + regex on Linux

I am trying to check that the value of PASS_MAX_DAYS in the /etc/login.defs file is 90 or less, but the command below does not work. I am testing on a SUSE 12 server.
grep "^PASS_MAX_DAYS\s*([0-9]|[1-8][0-9]|90)" /etc/login.defs
Thanks for your support and time
It is better to test the number against your value than to test the string against a pattern (as already suggested in the comments). For example, like this:
awk '/^PASS_MAX_DAYS/ && $2<=90' /etc/login.defs
This way, you can easily modify your command if your limit changes to 30 or to 365 days. Also, I guess values like 090 are still valid for that configuration; the numeric comparison handles those, while a string pattern would miss them.
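For instance, a minimal variant with the limit factored out into a shell variable:
# Change "limit" instead of editing the awk script itself
limit=90
awk -v max="$limit" '/^PASS_MAX_DAYS/ && $2 <= max' /etc/login.defs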
grep, by default, doesn't understand extended regular expressions, so the alternation in your pattern needs grep -E:
grep -E "^PASS_MAX_DAYS\s*([0-9]|[1-8][0-9]|90)\s*$" /etc/login.defs
will give you SOME result.
That said, how many different entries for PASS_MAX_DAYS do you expect in that file?

How do I use the --accept-regex option for downloading a website with wget?

I'm trying to download an archive of my website (3dsforums.com) using wget, but there are millions of pages I don't want to download, so I'm trying to tell wget to only download pages that match certain URL patterns. Even so, I'm running into some roadblocks.
As an example, this is a URL I would like to download:
http://3dsforums.com/forumdisplay.php?f=46
...so I've tried using the --accept-regex option:
wget -mkEpnp --accept-regex "(forumdisplay\.php\?f=(\d+)$)" http://3dsforums.com
But it just downloads the home page of the website.
The only command that remotely works so far is the following:
wget -mkEpnp --accept-regex "(\w+\.php$)" http://3dsforums.com
This provides the following response:
Downloaded 9 files, 215K in 0.1s (1.72 MB/s)
Converting links in 3dsforums.com/faq.php.html... 16-19
Converting links in 3dsforums.com/index.html... 8-88
Converting links in 3dsforums.com/sendmessage.php.html... 14-15
Converting links in 3dsforums.com/register.php.html... 13-14
Converting links in 3dsforums.com/showgroups.php.html... 14-29
Converting links in 3dsforums.com/index.php.html... 16-80
Converting links in 3dsforums.com/calendar.php.html... 17-145
Converting links in 3dsforums.com/memberlist.php.html... 14-99
Converting links in 3dsforums.com/search.php.html... 15-16
Converted links in 9 files in 0.009 seconds.
Is there something wrong with my regular expressions? Or am I misunderstanding the use of the --accept-regex option? I've been trying all sorts of variations today but I'm not quite grasping what the actual problem is.
wget by default uses POSIX regular expressions, where the \d class is written [[:digit:]] and the \w class is written [[:word:]]; also, why all the grouping? If your wget is compiled with PCRE support, make your life easier and do it as:
wget -mkEpnp --regex-type pcre --accept-regex "forumdisplay\.php\?f=\d+$" http://3dsforums.com
but... that will not work because your forum software creates automatic session IDs (s=<session_id>) and injects them in all the links, so you need to account for those as well:
wget -mkEpnp --regex-type pcre --accept-regex "forumdisplay\.php\?(s=.*)?f=\d+(s=.*)?$" http://3dsforums.com
The only problem is that now your files will be saved with the session ID in their names, so you'll have to add another step when wget is finished: bulk-renaming all the files that have the session ID in their names. You could probably do it by piping the file list through sed, but I'll leave that to you :)
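For the record, a rough sketch of that cleanup step (assumes a GNU userland; the exact session-ID format is a guess):
# Strip "s=<session_id>&" from the saved file names once the mirror finishes
find 3dsforums.com -name '*s=*' | while IFS= read -r f; do
  mv -- "$f" "$(printf '%s\n' "$f" | sed -E 's/s=[^&]*&?//')"
done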
And if your wget doesn't support PCRE, you'll have to fall back to POSIX character classes, but let's hope it does...
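If it turns out not to, a POSIX-flavoured equivalent (untested) would simply swap \d for a bracket expression:
wget -mkEpnp --accept-regex "forumdisplay\.php\?(s=.*)?f=[0-9]+(s=.*)?$" http://3dsforums.com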

XPath to find files on Windows? XML parser to find files on Windows

So we have 1500 XHTML pages in, let's say, 100 subfolders of /myfolder. I want to find evil constellations like
<goodTag>
....
<evilTag/>
....
<evilTag/>
....
</goodTag>
In my current case, it is only allowed to have
<goodTag>
....
<evilTag/>
...
</goodTag>
and not two evilTags within a goodTag. This is just an example though. Sometimes I must search for something like
<outter>
....
<someTag someAttribute="iDoEvil" />
...
</outter>
I've been browsing for a while now and could not find a tool that would help me do this.
What freeware / open source solutions are available on Windows?
What are the XHTML files like? Basically they are web pages created for JSF. We use our own tags and keep making changes to them, and thus have to keep a good eye out for bad constellations that haven't been thought of.
I'm basically asking because I finally ended up doing it with regex, which makes people around here go nuts.
This is a bash solution. It finds all XML files in the current directory (recursively) and lists those which contain <someTag someAttribute="iDoEvil" />:
for i in `find . -name '*.xml'`
do
  # xmlstarlet exits with a non-zero status when the XPath matches nothing
  if xmlstarlet sel -H -t -m '//someTag[@someAttribute="iDoEvil"]' -v '@someAttribute' "$i" >/dev/null
  then
    echo "$i"
  fi
done
Note: I haven't tried writing an equivalent DOS batch script on Windows, but the idea is the same.
You can download xmlstarlet (Windows version) here.
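The same tool also covers the original "two evil tags" case; a sketch along the same lines, using the count() trick from the Java answer below:
# Print files where some goodTag contains more than one evilTag
for i in `find . -name '*.xml'`
do
  if [ "$(xmlstarlet sel -t -v 'count(//goodTag[count(.//evilTag) > 1])' "$i")" -gt 0 ]
  then
    echo "$i"
  fi
done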
If you're willing to write your own Java program, you could use a combination of Apache Commons IO and jOOX:
// Use Apache Commons IO to recurse into your file structure
// (note: extensions are passed without the leading dot):
for (File file : FileUtils.listFiles(yourDir, new String[] { "xml" }, true)) {
    // Use jOOX to parse the file and match the "bad" combination with XPath:
    if ($(file).xpath("//goodTag[count(.//evilTag) > 1]").size() > 0) {
        System.out.println("Match : " + file);
    }
}
Note, if you're not up for writing your own program, maybe SuperUser might be a better site for this question...

Linux cp with a regexp

I would like to copy some files in a directory, renaming the files but keeping their extension. Is this possible with a simple cp, using a regex?
For example :
cp ^myfile\.(.*) mydir/newname.$1
That way I could copy a file, renaming it but keeping its extension. Is there a way to get the matched elements of the regex and use them in the cp command?
If not, I'll write a Perl script I think, or if you have another way...
Thanks
Suppose you have myfile.a, myfile.b, myfile.c:
for i in myfile.*; do echo cp "$i" "${i/myfile./newname.}"; done
This creates (upon removal of the echo) newname.a, newname.b, newname.c. Note that ${i/pattern/replacement} is bash parameter expansion, not a true regex.
The shell doesn't understand general regexes; you'll have to outsource to auxiliary programs for that. The classical scripty way to solve your task would be something like
for a in myfile.* ; do
  b=$(echo "$a" | sed 's!^myfile!mydir/newname!')
  cp "$a" "$b"
done
Or have a perl script generate a list of commands that you then source into the shell.
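For example, a rough sketch of that variant (the file names and target directory are assumptions):
# Have perl print the cp commands, inspect copy.sh, then source it
perl -e 'for (glob("myfile.*")) { my ($ext) = /^myfile\.(.*)$/; print qq{cp "$_" "mydir/newname.$ext"\n}; }' > copy.sh
. copy.sh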
I really like the regex syntax of the rename perl script (by Robin Barker and Larry Wall), e.g.:
rename "s/OldFile/NewFile/" OldFile*
OldFile.c and OldFile.h are renamed to NewFile.c and NewFile.h, respectively.
I simply wanted the exact same thing with a copy command:
copy "s/OldFile/NewFile/" OldFile*
So I duplicated that script and changed the rename statement to a copy via File::Copy. Et voilà! A copy command with perl-regex syntax:
https://gist.github.com/jcward/0ead33bd79f2061c68728cc82582241f