Combine Bash Regex Expressions - regex

I have a server that has some pages written in LESS. I have a launch.sh script that essentially builds all the CSS files from LESS, puts them in a directory, and starts the server (written in Node.js).
Here is what the script looks like currently:
# Searches the CSS directory for LESS files
for file in views/less/*.less
do
FROM=$file
A=${file/.*/.css}
B=${A/less/css}
TO=${B/views/resources}
echo "$FROM -> $TO"
# Compiles each LESS file into a CSS file of the same name with minified output
lessc --clean-css "$FROM" "$TO"
done
Everything works fine, but I was wondering if I could condense the regex expressions, denoted as A and B. Essentially, the script takes the entire build path, let's say:
/views/less/style.less
and replaces less with css and views with resources. So the final path (after the conversion process) becomes:
/resources/css/style.css
Any help would be greatly appreciated!

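You can replace all occurrences of less in a variable by doubling the first slash, which handles the directory name and the extension in one step:
A=${file//less/css}
Note that this also rewrites less anywhere else in the path, so a file named useless.less would become usecss.css.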
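If the directory layout is fixed, the substitutions in the question can also be avoided by keeping only the file name and building the target path around it. This is a minimal sketch, not the only way to do it; css_target is a hypothetical helper name, and the resources/css prefix is taken from the example path in the question.

```shell
# Map views/less/NAME.less to resources/css/NAME.css.
# css_target is a hypothetical helper name used only for this sketch.
css_target() {
    local name=${1##*/}          # strip the directory part: style.less
    printf '%s\n' "resources/css/${name%.less}.css"
}

css_target views/less/style.less   # prints resources/css/style.css
```

Because only the trailing .less is removed, a file name that happens to contain less elsewhere is left alone.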

Related

Regex Assistance for replacing filepaths in markdown documents

I migrated my notes from Evernote to markdown files with yarle. Unfortunately, it created a lot of separate folders for the attachments (although I set it up for one folder only).
I moved all attachments to one folder, so the filepaths to the attachments in the markdown files need to be updated.
I think regex would be right for this, but I don't have any knowledge about regex and would be really thankful for help.
Filepaths are as follows: ![[./_attachmentsMove/Coordination_Patterns.resources/CoordinationPattern_Ipsi.MOV]]
All filepaths are identical up to ![[./_attachmentsMove/
The second folder varies e.g. Coordination_Patterns.resources/.
I want to delete everything but the filename.extension itself e.g. ![[CoordinationPattern_Ipsi.MOV]].
An example of the other filepaths:
![[./_attachmentsMove/Jonglieren_(Hände).resources/07 Jonglieren.MOV]]
(second folder changes, filename changes, I also have .png and .mov).
I use MassReplaceIt (app for mac) which allows me to replace expressions in documents with regex. If someone has a solution using the terminal/commandline, I'll try this as well of course :)
See whether this regex suffices:
(?<=!\[\[)[^\]]+/(?=[^\]/]+]])
Replace with empty string.
It should delete the part from the ![[ up to the last / before the next ]].
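For the terminal route the question asks about, a minimal sed sketch is below. sed has no lookbehind, so it captures the file name and rewrites the whole link instead. strip_attachment_dirs is a hypothetical helper name, and the expression assumes no ] appears inside a link.

```shell
# Hypothetical filter: reads markdown on stdin and writes it to stdout with
# every ![[dir/.../name.ext]] link reduced to ![[name.ext]].
strip_attachment_dirs() {
    sed -E 's#!\[\[[^]]*/([^]/]+)]]#![[\1]]#g'
}

printf '%s\n' '![[./_attachmentsMove/Coordination_Patterns.resources/CoordinationPattern_Ipsi.MOV]]' |
    strip_attachment_dirs
# prints ![[CoordinationPattern_Ipsi.MOV]]
```

Links that contain no / are left unchanged. To update a note, it is safest to redirect the output to a new file and replace the original only once the result looks right.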

rsync --exclude-from 'list' file not working

I am trying to use rsync to complete an unfinished transfer from a remote server to a local machine using
rsync -a user@domain.com:~/source/ /dest/
where /dest/ is the location of the partially completed transfer. However, due to bandwidth concerns I need to run rsync to a /tmp_dest/ on a different machine that does not have a copy of /dest/, from where I can then later move /tmp_dest/ to /dest/
The solution I have come up with thus far is to use rsync's --exclude-from option, using a file containing a complete list of files from /dest/.
The command would look something like this
rsync -a --exclude-from 'list.txt' user@domain.com:~/source/ /tmp_dest/
At this point I feel as though I have scoured everywhere for a solution and tried every variant I came across.
This included relative and absolute paths for the 'list.txt'
relative:
path 1/file 1
path 2/file 2
--or--
absolute:
/absolute/source/path 1/file 1
/absolute/source/path 2/file 2
I have tried the above with combinations of including - to explicitly exclude that line (where I have seen examples of people wanting to also + other files)
- /absolute/source/path 1/file 1
- /absolute/source/path 2/file 2
I have tried putting leading **/ in front of the file paths to rectify the relative path problem
**/path 1/file 1
**/path 2/file 2
I have also tried navigating to the directory containing 'list' and executing rsync from there, to avoid the issue where rsync looks for
/path/to/the/list/something1/to.exclude
/path/to/the/list/something2/to.exclude
/path/to/the/list/something3/to.exclude
and undoubtedly finding nothing
I have also ensured that the correct line breaks are being used in the 'list' file, i.e. LF (Unix) line breaks.
I have tried to create the 'list' with the following command
find . -type f | tee list.txt
this initially created a file looking something like this
./yyyy-mm-dd folder 1/sub folder [foo]/file.a
./(yyyy) folder 2 {foo2}/file.b
./folder, 3/sub-folder 3/file.c
As you can see, there are spaces and other characters in the file paths, but from my current understanding this shouldn't matter. Perhaps I am mistaken and need to escape characters with special meaning, in which case I may need help with that.
I then perform a replace on ./ in Notepad++ or another text editor that preserves the LF (Unix) line breaks to get the desired result
(e.g. as above, I've tried replacing ./ with nothing, with /absolute/path/for/source/ (noting the leading slash), or even with double wildcards to match any parent tree structure containing the files).
The only thing I feel that I haven't tried is escaping the spaces in the file names and paths, but I have read that this shouldn't be an issue.
Perhaps I am overlooking something and any help would be appreciated.
The rsync man page describes --exclude-from as follows:
--exclude-from=FILE read exclude patterns from FILE
Use the following command:
rsync -a --exclude-from=list.txt user@domain.com:~/source/ /tmp_dest/
It is also better to give the full path to the list.txt file.
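One more thing the sample list in the question suggests checking: as I read the rsync man page, [, * and ? are wildcard characters in exclude patterns, and a pattern starting with / is anchored at the root of the transfer. A minimal sketch that builds such a list from the existing copy follows. make_exclude_list is a hypothetical helper name, and the sketch assumes no file name contains a backslash or a newline.

```shell
# Hypothetical helper: print one anchored exclude pattern per regular file
# found under the directory given as $1.
make_exclude_list() {
    (cd "$1" && find . -type f) |
        sed -e 's|^\./|/|' -e 's/[[*?]/\\&/g'   # "./a" -> "/a", then escape [ * ?
}

# Usage (paths as in the question):
#   make_exclude_list /dest > list.txt
#   rsync -a --exclude-from=list.txt user@domain.com:~/source/ /tmp_dest/
```

With this, a path such as ./sub folder [foo]/file.a becomes /sub folder \[foo]/file.a, so the bracket is matched literally rather than as a character class. Spaces need no escaping because each line of the file is one pattern.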

Iterating over directory with specified path in Bash

pathToBins=$1
bins="${pathToBins}contigs.fa.metabat-bins-*"
for fileName in $bins
do
echo $fileName
done
My goal is to attach a path to my file name. I can iterate over a folder and get the file name when I don't attach the path. My challenge is that when I add the path, the pattern no longer works: echo $fileName prints "/home/erikrasmussen/Desktop/Script/realLargeMetaBatBinscontigs.fa.metabat-bins-*", with the '*' treated as a literal string. How can I get the path and also the full file name while iterating over a folder of files?
Although I don't really know how your files are arranged on your hard drive, a casual glance at "/home/erikrasmussen/Desktop/Script/realLargeMetaBatBinscontigs.fa.metabat-bins-*" suggests that it is missing a / before contigs. If that is the case, then you should change your definition of bins to:
bins="${pathToBins}/contigs.fa.metabat-bins-*"
However, it is much more robust to use bash arrays instead of relying on filenames to not include whitespace and metacharacters. So I would suggest:
bins=("${pathToBins}"/contigs.fa.metabat-bins-*)
for fileName in "${bins[@]}"
do
echo "$fileName"
done
Bash normally does not expand a pattern which doesn't match any file, so in that case you will see the original pattern. If you use the array formulation above, you could set the bash option nullglob, which will cause the unmatched pattern to vanish instead, leaving an empty array.
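The nullglob behaviour described above can be sketched as follows. This is a minimal illustration for Bash; list_bins is a hypothetical helper name, and the file name pattern is the one from the question.

```shell
# Print every bin file under the directory given as $1, one per line.
# list_bins is a hypothetical helper name, not part of the original script.
list_bins() {
    local pathToBins=${1%/}        # drop a trailing slash, if any
    local fileName
    shopt -s nullglob              # an unmatched pattern expands to nothing
    local bins=("$pathToBins"/contigs.fa.metabat-bins-*)
    shopt -u nullglob              # turn the option back off
    for fileName in "${bins[@]}"; do
        printf '%s\n' "$fileName"
    done
}
```

Calling list_bins on a directory with no matching files prints nothing, instead of echoing the unexpanded pattern.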

Applescript to extract the Digital Object Identifier (DOI) from a PDF file

I looked for an applescript to extract the DOI from a PDF file, but could not find it. There is enough information available on the actual format of the DOI (i.e. the regular expression), but how could I use this to get the identifier from the PDF file?
(It would be no problem if some external program were used, such as Hazel.)
If you're ok with using an app, I'd recommend Skim. Good AppleScript support. I'd probably structure it like this (especially if the document might be large):
set DOIFound to false
tell application "Skim"
set pp to pages of document 1
repeat with p in pp
set t to text of p
--look for DOI and set DOIFound to true
if DOIFound then exit repeat--if it's not found then use url?
end repeat
end tell
I'm assuming a DOI would always sit on a single page (not split across two). It looks like they are invariably (?) on the first page of an article, which would make this quick, even with a large doc.
[edit]
Another way would be to get the Xpdf OSX binaries from http://www.foolabs.com/xpdf/download.html and use pdftotext in the command line (just tested this; it works well) and parse the text using AppleScript. If you want to stay in AppleScript, you can do something like:
do shell script "path/to/pdftotext 'path/to/pdf/file.pdf'"
which would output a file in the same directory with a txt file extension -- you parse that for DOI.
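If the shell route is acceptable, the parsing step can be sketched with grep. pdftotext writes to standard output when the output file is given as -, and the pattern below is based on the one Crossref suggests for modern DOIs. extract_doi is a hypothetical helper name, and a full stop directly after a DOI would be picked up as part of the match.

```shell
# Hypothetical helper: print the first DOI-looking string found on stdin.
extract_doi() {
    grep -Eio '10\.[0-9]{4,9}/[-._;()/:A-Z0-9]+' | head -n 1
}

# Usage (the path is a placeholder):
#   pdftotext 'path/to/pdf/file.pdf' - | extract_doi
printf 'Some header\ndoi:10.1000/xyz123 more text\n' | extract_doi
# prints 10.1000/xyz123
```

The same pipeline could be called from AppleScript through do shell script, with the result returned as a string.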
Have you tried it with pdfgrep? It works really well on the command line:
pdfgrep -n --max-count 1 --include "*.pdf" "DOI"
I have no idea how to build an AppleScript, though, but I would be interested in one as well, so that if I drop a PDF into that folder it automatically extracts the DOI and renames the file with the DOI in the filename.

RegEx to rewrite folder structure of varying length and file name to string

This is the code I'm using, developed with the help of @anubhava. It rewrites a path generated by a CGI script so that requests for my jpg image files are redirected to another folder that contains watermarked image files in the same folder structure as the originals, but excludes files that begin with tn_ or AM (plus _category_image.jpg):
RewriteRule ^ImageFolio4_files/1/([^/]+)/((?!AM|tn_)[^.]+\.jpg)$ /ImageFolio4_files/cache/images/~$1~$2 [L,R=302,NC]
The original path of:
/ImageFolio4_files/1/Casual_Portraits/abc123_789-xyz.jpg
And the above RegEx works to properly generate this output:
/ImageFolio4_files/cache/images/~Casual_Portraits~abc123_789-xyz.jpg
My CHALLENGE: I need to accommodate a multi-folder structure up to three folders deep underneath the ImageFolio4_files/1/ structure. The current code doesn't accommodate that. I also need to exclude any files named _category_image.jpg, which occur at each of the folder levels beneath ImageFolio4_files/1/ (these files are unique small display icons that appear next to the category names).
I really have no idea how to accommodate the multi-folder structure, so your help would be appreciated.
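First, change
([^/]+)/ to (([^/]+)/)+
in your expression. Note that the added group shifts the capture numbers, and a repeated group only keeps its last repetition, so $1 and $2 in the substitution will need adjusting.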
Second, change
(?!AM|tn_) to (?!AM|tn_|_category_image\.jpg)
You can use the negative lookahead (?!) for the whole filename as well; it doesn't consume characters, it only checks whether the regex "AM|tn_|_category_image\.jpg" matches at the current position.