txt file delete url to last "/" to get files - regex

I have a txt file containing one URL per row. Each URL looks like this:
://url/files.php?file=parent/children/file.pdf
://url/files.php?file=parent/children2/childrenofchildren2/file2.txt
......etc
I need help cutting everything before the last / in each row. This is what I tried in Notepad++ regex mode (it doesn't work):
^.+[/](.*)$
To get:
file.pdf
file2.txt
But I am open to any way of solving this.

Delete everything from the start of each line up to and including the last /:
sed 's/.*\///' file
or
sed 's|.*/||' file
Output:
file.pdf
file2.txt
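For readers without sed at hand, the same substitution can be sketched in Python. This is only an illustration of the idea, not part of the original answer; the function name last_segment and the sample list are hypothetical.

```python
import re

def last_segment(line: str) -> str:
    """Drop everything up to and including the last '/' (like sed 's|.*/||')."""
    # Greedy .* backtracks to the last slash; count=1 mirrors sed without the g flag.
    return re.sub(r".*/", "", line, count=1)

urls = [
    "://url/files.php?file=parent/children/file.pdf",
    "://url/files.php?file=parent/children2/childrenofchildren2/file2.txt",
]
for url in urls:
    print(last_segment(url))
```

A line that contains no / is returned unchanged, which matches what the sed command does.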

This solution may be more complicated than it needs to be, but it works!
A purely regex-based approach could be as follows:
(([^\/])*)((\n)|($))/g
Basically, it matches any run of characters other than / with (([^\/])*) (the \/ inside the class is an escaped forward slash, not a backslash) and then stops when it encounters either a new line \n or the end of the sequence $. The global flag /g is also set, to allow it to match more than one instance!
I hope this helped!

Related

How do I extract some particular words from each line?

The text file has many lines of this sort. I want to extract the words after /videos up to .mp4, plus the very last number (shown in bold), and output each filtered line in a separate file.
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/**S4KWZTyt-32313922.mp4**.m3u8?hdnts=exp=1592315851~acl=*/S4KWZTyt-32313922.mp4.m3u8~hmac=83f4674e6bf2576b070c716a3196cb6a30f35737827ee69c8cf7e0c57a196e51 **1**
Let's say, for example, that the text file content is:
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/JajSfbVN-32313922.mp4.m3u8?hdnts=exp=1592315891~acl=*/JajSfbVN-32313922.mp4.m3u8~hmac=d3ca7bd5b233a531cfe242d17d2ea0c0167b41b90fff6459e433700ffc969d69 19
https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/Qs3xZqcv-32313922.mp4.m3u8?hdnts=exp=1592315940~acl=*/Qs3xZqcv-32313922.mp4.m3u8~hmac=c30e2082bf748a6b4d1621c1d33a95319baa61798775e9da8856041951cf5233 20
The output should be
JajSfbVN-32313922.mp4 19
Qs3xZqcv-32313922.mp4 20
You may try the below regex:
.*\/videos\/(.*?mp4).*?(?<= )(\d+)
Explanation of the above regex:
.* - Matches everything before /videos/.
\/videos\/ - Matches /videos/ literally.
(.*?mp4) - The first capturing group; lazily matches everything up to and including the first mp4.
.*? - Lazily matches everything up to the final digits.
(?<= ) - A lookbehind asserting that the digits are preceded by a space.
(\d+) - The second capturing group; matches the number at the end, as required by you.
You can find the demo of the above regex in here.
Command line implementation in linux:
cat regea.txt | perl -ne 'print "$1 $2\n" while /.*\/videos\/(.*?mp4).*?(?<= )(\d+)/g;'> out.txt
You can find the sample implementation of the above command in here.
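As a cross-check of the explanation above, the same pattern can be tried with Python's re module. This is a minimal sketch, not the answerer's code; the names PATTERN and extract are hypothetical, and the sample line is taken from the question.

```python
import re

# Same pattern as in the answer; "/" needs no escaping in Python.
PATTERN = re.compile(r".*/videos/(.*?mp4).*?(?<= )(\d+)")

def extract(line: str):
    """Return (file name, trailing number), or None if the line does not match."""
    m = PATTERN.search(line)
    return (m.group(1), m.group(2)) if m else None

sample = (
    "https://videos-a.jwpsrv.com/content/conversions/7kHOkkQa/videos/"
    "JajSfbVN-32313922.mp4.m3u8?hdnts=exp=1592315891~acl=*/"
    "JajSfbVN-32313922.mp4.m3u8~hmac="
    "d3ca7bd5b233a531cfe242d17d2ea0c0167b41b90fff6459e433700ffc969d69 19"
)
print(extract(sample))  # ('JajSfbVN-32313922.mp4', '19')
```

The host name videos-a.jwpsrv.com does not interfere, because the pattern requires a slash on both sides of videos.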
The proposed regex is probably a better solution, but I'll leave a Python solution that writes each filtered line in a separate file. This script works if every line in the file is like that.
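# Assumes every line looks like the samples: a URL, one space, then a number
with open("my_file.txt", "r") as infile:
    for line in infile:
        num = line.split(" ")[1].strip()
        # "videos" also occurs in the host name, so the path is the third piece
        tail = line.split("videos")[2][1:]  # [1:] drops the leading "/"
        name, ext = tail.split(".")[0:2]
        # One output file per input line, named after the part before ".mp4"
        with open(name, "w") as out:
            out.write(name + "." + ext + " " + num)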

File name capturing from path

I could have sworn I have done this before, but... no go.
I am trying to copy the file name from each line of the sample data below to the beginning of the line. However, when I add parentheses to the expression to capture the file name, it is deleted. I have tried several variations.
Regex expression
[^\\/:*?<>]+$
The expression successfully captures the file names on each line.
Sample Data
c:\Dir1\dir2\Samplefile.txt
c:\Dir1\dir2\dir3\Sample file.txt
c:\Dir1\dir2\Samplefile
c:\Dir1\dir2\dir3\Sample file
c:\Dir1\wp_movfiles_20160911024934.ini
c:\Dir1\\dir2\wp_movfiles_20160911055222.ini
Desired results
Samplefile.txt c:\Dir1\dir2\Samplefile.txt
Sample file.txt c:\Dir1\dir2\dir3\Sample file.txt
Samplefile c:\Dir1\dir2\Samplefile
Sample file c:\Dir1\dir2\dir3\Sample file
wp_movfiles_20160911024934.ini c:\Dir1\wp_movfiles_20160911024934.ini
wp_movfiles_20160911055222.ini c:\Dir1\\dir2\wp_movfiles_20160911055222.ini
Any assistance is greatly appreciated. Thank you.
I think you can get away with using this regex:
.*\\(.*)$
This will greedily consume everything in the file path from left to right, until hitting the final backslash. Then it will stop, and capture everything which comes after that final backslash, which should be the file name.
Demo here:
Regex101
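The answer shows the capture but not the replacement. One way to get the desired output is to capture the whole line as well and emit the name first. Here is a minimal sketch in Python, assuming every line contains at least one backslash; the function name prepend_name is hypothetical.

```python
import re

def prepend_name(path: str) -> str:
    """Copy the text after the last backslash to the front of the line."""
    # Group 1 is the whole line, group 2 is what follows the final backslash.
    return re.sub(r"^(.*\\(.*))$", r"\2 \1", path)

for p in [r"c:\Dir1\dir2\Samplefile.txt", r"c:\Dir1\dir2\dir3\Sample file"]:
    print(prepend_name(p))
```

Lines without a backslash do not match and are returned unchanged.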

sed - match regex in specific position

I'm having some trouble creating a one liner or a simple script to edit some fixed length files using sed.
Supposing my file has lines in this format:
IPITTYTHEFOOBUTIDONOTPITTYTHEBAR
IPITTYTH BARBUTIDONOTPITTYTH3FOO
If each entire line is considered as a string, I want to match the substring that starts at position 10 and has length 3 against a regex. If it matches the regex, I want to add some other string at the end of that line.
Assuming the matching regex is B.R, and the string to append in the end of the line is NOT, I would want my file to turn into:
IPITTYTHEFOOBUTIDONOTPITTYTHEBAR
IPITTYTH BARBUTIDONOTPITTYTHEFOONOT
The lines in the files are bigger than the ones in this sample.
So far I have this:
sed -i '/B.R/ s/$/NOT/' file.name
The problem is that this ignores the position where the regex is matched, making the first line of the example a match as well:
IPITTYTHEFOOBUTIDONOTPITTYTHEBAR
IPITTYTH BARBUTIDONOTPITTYTH3FOO
I'm open to use awk as well.
Thanks in advance.
You are almost there. You just need to account for the characters that come before B.R. If B is at the 10th position, then exactly 9 characters must precede it:
sed -i '/^.\{9\}B.R/s/$/NOT/' file.name
Example:
$ sed '/^.\{9\}B.R/s/$/NOT/' file
IPITTYTHEFOOBUTIDONOTPITTYTHEBAR
IPITTYTH BARBUTIDONOTPITTYTHEFOONOT
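The same position check can also be written by slicing the fixed-width field out of the line before testing it. This is a minimal Python sketch of that alternative, not the sed approach itself; the function name append_if_match and its parameters are hypothetical.

```python
import re

def append_if_match(line: str, start: int = 10, length: int = 3,
                    pattern: str = "B.R", suffix: str = "NOT") -> str:
    """Append suffix when the field at a fixed position fully matches pattern."""
    field = line[start - 1:start - 1 + length]  # start is 1-based, as in the question
    return line + suffix if re.fullmatch(pattern, field) else line

for row in ["IPITTYTHEFOOBUTIDONOTPITTYTHEBAR", "IPITTYTH BARBUTIDONOTPITTYTH3FOO"]:
    print(append_if_match(row))
```

Only the second row gains the suffix, because the BAR at the end of the first row lies outside positions 10 to 12.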

sed only replacing last occurrence of match - need to match all

I would like to replace all { } on a certain line with [ ], but unfortunately I am only able to match the last occurrence of the regexp.
I have a config file which has structure as follows:
entry {
id 123456789
desc This is a description of {foo} and was added by {bar}
trigger 987654321
}
I have the following sed command, which is able to replace the last match 'bar' but not 'foo':
sed s'/\(desc.*\){\(.*\)}/\1\[\2\]/g' < filename
I anchor this search to the line containing 'desc' as I would hate for it to replace the delimiting braces of each 'entry' block.
For the life of me I am unable to figure out how to replace all of the occurrences.
Any help is appreciated - have been learning all day and unable to read any more tutorials for fear that my corneas might crack.
Thanks!
Try the following:
sed '/desc/ s/{\([^}]*\)}/[\1]/g' filename
The search and replace in the command above is only done on lines that match the regex /desc/. I don't think this is actually necessary, though: sed processes text one line at a time, and the braces of the 'entry' block sit on different lines, so the pattern cannot match them anyway. This means that you could probably simplify this to the following:
sed 's/{\([^}]*\)}/[\1]/g' filename
Instead of .* inside the capturing group, [^}]* is used, which matches everything except closing braces. That way you won't match from the first opening brace to the last closing one.
Also, you can just provide the file name as the final argument to sed instead of using input redirection.
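The difference between the greedy pattern from the question and the negated character class can be seen side by side in a short Python sketch. The variable names are hypothetical; the sample line comes from the question.

```python
import re

line = "desc This is a description of {foo} and was added by {bar}"

# The question's pattern: the greedy desc.* swallows {foo}, so only the
# last pair of braces is left for the rest of the pattern to rewrite.
greedy = re.sub(r"(desc.*)\{(.*)\}", r"\1[\2]", line)

# The answer's pattern: [^}]* cannot cross a closing brace, so every
# pair matches on its own and all of them are replaced.
negated = re.sub(r"\{([^}]*)\}", r"[\1]", line)

print(greedy)   # desc This is a description of {foo} and was added by [bar]
print(negated)  # desc This is a description of [foo] and was added by [bar]
```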

Regular Expression - Capture and Replace Select Sequences

Take the following file...
ABCD,1234,http://example.com/mpe.exthttp://example/xyz.ext
EFGH,5678,http://example.com/wer.exthttp://example/ljn.ext
Note that "ext" is a constant file extension throughout the file.
I am looking for an expression to turn that file into something like this...
ABCD,1234,http://example.com/mpe.ext
ABCD,1234,http://example/xyz.ext
EFGH,5678,http://example.com/wer.ext
EFGH,5678,http://example/ljn.ext
In a nutshell I need to capture everything up to the urls. Then I need to capture each URL and put them on their own line with the leading capture.
I am working with sed to do this and I cannot figure out how to make it work correctly. Any ideas?
If the number of URLs in each line is guaranteed to be two, you can use:
sed -r "s/([A-Z0-9,]{10})(.+\.ext)(.+\.ext)/\1\2\n\1\3/" < input
This does not require the first two fields to be a particular width or limit the set of (non-comma) characters between the commas. Instead, it keys on the commas themselves.
sed 's/\(\([^,]*,\)\{2\}\)\(.*\.ext\)\(http:.*\)/\1\3\n\1\4/' inputfile.txt
You could change the "2" to match any number of comma-delimited fields.
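If the number of URLs per line is not fixed at two, a single sed substitution becomes awkward. Here is a minimal Python sketch of a different technique (split off the leading fields, then find every URL), assuming each URL starts with http and ends with the constant extension; the function name split_urls and its parameters are hypothetical.

```python
import re

def split_urls(line: str, fields: int = 2, ext: str = "ext"):
    """Return one output line per URL, each prefixed with the leading fields."""
    parts = line.split(",", fields)           # leading fields + the URL blob
    prefix = ",".join(parts[:fields]) + ","
    # Lazy match: each URL runs from "http" to the next ".<ext>"
    urls = re.findall(r"http.*?\." + re.escape(ext), parts[fields])
    return [prefix + url for url in urls]

for out in split_urls("ABCD,1234,http://example.com/mpe.exthttp://example/xyz.ext"):
    print(out)
```

Like the second sed command, this keys on the commas instead of assuming a fixed field width.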
I have no sed available to me at the moment.
Wouldn't
sed -r 's/(....),(....),(.*\.ext)(http.*\.ext)/\1,\2,\3\n\1,\2,\4/g'
do the trick?
Edit: removed the lazy quantifier