Replace exact part of text in a string (fstab) using sed - regex

I'm in the process of migrating some data between 2 servers. The data is held in the same folder structure on each server.
Once the data has been moved I want to update the fstab file on all of the affected Linux machines. I have a bash script that rsyncs the data between the servers and then logs on to each machine in a list and updates the fstab with the new IP address using sed.
sed "s/\(172.16.0.30\)\(.*\)\(${share}\)\(.*\)/172.16.0.35\2\3\4/"
This has worked absolutely fine in the past, however this time I'm migrating a folder which has a name very similar to a few others, let's say $share is 'home':
home
home-old
home-ancient
The problem I'm having is that this regex is picking up all of the shares with the text contained in $share and not just the one I want.
Is there a way to adjust the regex so that it only replaces the IP on the single line that I want? I've looked at \b but can't seem to get it to work. Unfortunately, regular expressions usually confuse me!

\b is a GNU extension, and it won't help here: it matches a word boundary, and both the space and - are non-word characters, so there is a boundary after "home" in home, home-old and home-ancient alike. One simple option is to require a space (or end of line) after $share, like:
sed "s/\(172.16.0.30\)\(.*\)\(${share}\)\( \(.*\)\|$\)/172.16.0.35\2\3\4/"
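For anyone who would rather check the idea outside of sed, here is a minimal Python sketch of the same rule: only touch the IP when the share name is followed by whitespace or the end of the line. The function name `update_fstab` and the sample mount lines are made up for illustration; they are not from the original script.

```python
import re


def update_fstab(text, share, old_ip="172.16.0.30", new_ip="172.16.0.35"):
    """Replace old_ip with new_ip only on lines that mention `share` exactly."""
    # The lookahead requires the share name to be followed by whitespace or
    # end of line, so "home" does not match "home-old" or "home-ancient".
    # re.escape makes the dots in the IP literal instead of "any character".
    pattern = re.compile(
        re.escape(old_ip) + r"(?=.*" + re.escape(share) + r"(?:\s|$))",
        re.MULTILINE,
    )
    return pattern.sub(new_ip, text)


if __name__ == "__main__":
    # Hypothetical fstab entries, for illustration only.
    sample = (
        "172.16.0.30:/export/home /mnt/home nfs defaults 0 0\n"
        "172.16.0.30:/export/home-old /mnt/home-old nfs defaults 0 0\n"
    )
    print(update_fstab(sample, "home"))
```

Because `.` does not match a newline by default, the lookahead never reaches into the next line, so each fstab entry is judged on its own.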

Related

Search in VSCode for the multiline contents of a set of XML tags, using a regular expression

I am using VSCode to do a global search of XML files. Within those files there are multiple instances of these XML tags: <translated></translated>. I need to find all occurrences of any hyphens - that exist anywhere between those tags, where the contents of those tags can be on multiple lines.
<translated>
Content is here
Could be on multiple lines
The meeting could take 3-4 hours
</translated>
In the above example, the phrase "3-4 hours" has a hyphen in it. I need a regex that works for VSCode which finds all incidences of hyphens which happen to be within a set of these XML tags.
Option 1 (using VS Code)
This only matches one dash per tag set at a time, not all dashes, because limiting the search to the inside of one set of tags means it can only do one pass at a time. I was going to delete this answer, but if it's the only answer given it may be better than nothing. The workaround is to refresh the search (button above the search box) and click Replace All over and over. If there are lots of dashes this would be annoying, but better than no answer.
I have been fiddling with Visual Studio Code and the following seems to work.
(<translated>(.|\n)*?)(-)((.|\n)*?<\/translated>)
Assuming you want to, for example, replace the dash, you can do so by putting groups 1 and 4 back around the new text:
$1 <yourTextHere> $4
Example (before/after screenshots not preserved): after the replace, only the 3-4 in the first section of the file(s) is affected and the 3 to 4 is not changed.
Option 2 / Update (using Brackets.io)
While I'm unsure of the cause of VSCode's failure to match across files, the following regex works in Brackets (Brackets.io) across multiple files:
-(?=[^<]*?<\/translated>)
You need to have all your files in a folder and open that folder, then search in the project (Find > Find in Files). The screenshot shows the matches found across all files. In the lower panel, for the selected file t2 copy.txt, it matches first on line 6 and then on line 16 and (correctly) does not match on line 10, because that hyphen is not contained in a translated tag set.
The reason -(?=[^<]*?<\/translated>) doesn't work in VSCode is that it does not EXPLICITLY contain a newline \n. Even though [^<] includes newlines, the \n needs to be actually written into the regex in order to trigger the multiline option. Why is this?
Primarily for performance reasons. See https://github.com/microsoft/vscode/issues/75265, which uses a similar regex; the issue makes for interesting reading.
So simply using this
-(?=[^<]*?\n*<\/translated>)
works in vscode!
-(?=[^<]*?\n<\/translated>) would work for you too unless you have single line blocks like:
<translated>Con-tent is he-re</translated>
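If the editor route gets tedious, the same lookahead can be run from a script. Below is a rough Python sketch; the function name `replace_hyphens` and the sample text are my own, and it assumes the `<translated>` blocks contain no nested tags (any `<` inside a block would stop the `[^<]*` scan). In Python, `[^<]` matches newlines without any extra flag, so the explicit `\n` needed by VSCode is not required here.

```python
import re

# A hyphen counts only if nothing but non-"<" characters (newlines included)
# separates it from the next closing </translated> tag.
HYPHEN_IN_TRANSLATED = re.compile(r"-(?=[^<]*</translated>)")


def replace_hyphens(xml_text, replacement):
    """Replace every hyphen that sits inside a <translated>...</translated> block."""
    return HYPHEN_IN_TRANSLATED.sub(replacement, xml_text)


if __name__ == "__main__":
    sample = (
        "<note>3-4 days</note>\n"
        "<translated>\n"
        "The meeting could take 3-4 hours\n"
        "</translated>\n"
    )
    print(replace_hyphens(sample, " to "))
```

Unlike the capture-group approach, a lookahead consumes only the hyphen itself, so all hyphens in a block are replaced in a single pass.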

Remove matching strings using regex

I have a semicolon-delimited list of name/value pairs like this:
make=mazda;model=cx-5;year=2016;moonroof=yes;radio=yes;navigation=no;color=gray;
I would like to remove the moonroof, radio, and navigation pairs. I can capture these pairs using a regex like this:
(radio|navigation|moonroof)=.*?(?:;|$)
Is there a way to remove the captured group(s) using regex alone, without writing code? Alternatively, is there a way to get the rest of the pairs excluding the captured groups?
If your data set is small, you can use an online tool such as https://regex101.com/. With Linux (or, I imagine, Windows Subsystem for Linux) you should be able to use nearly the same expression with sed. Note that sed's regex flavor has no lazy quantifiers, so .*? would run greedily to the end of the line and take the color pair with it; [^;]* stops at the next semicolon instead:
sed -ri 's/(radio|navigation|moonroof)=[^;]*(;|$)//g' <filename>
That sed command edits the file in place, so back up your data first.
Without sed/perl or similar to help you from a suitable command line, I'm sorry to say you need code, or rather the regexp engine that comes with it.
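If you do end up writing code, it can stay very small. Here is a minimal Python sketch covering both variants asked about: removing the unwanted pairs with a regex, and getting the remaining pairs by splitting and filtering. The function names `drop_pairs` and `keep_pairs` are hypothetical, and the sketch assumes values never contain a semicolon.

```python
import re

# Match an unwanted key only at the start of the record or right after ";",
# then its value up to (and including) the next ";" or the end of the string.
UNWANTED = re.compile(r"(?:(?<=;)|^)(?:radio|navigation|moonroof)=[^;]*(?:;|$)")


def drop_pairs(record):
    """Remove the unwanted name=value pairs using the regex alone."""
    return UNWANTED.sub("", record)


def keep_pairs(record, unwanted=("radio", "navigation", "moonroof")):
    """Alternative without a regex: split on ';' and filter by key."""
    pairs = [p for p in record.split(";") if p]
    kept = [p for p in pairs if p.split("=", 1)[0] not in unwanted]
    return "".join(p + ";" for p in kept)


if __name__ == "__main__":
    record = ("make=mazda;model=cx-5;year=2016;moonroof=yes;"
              "radio=yes;navigation=no;color=gray;")
    print(drop_pairs(record))  # make=mazda;model=cx-5;year=2016;color=gray;
    print(keep_pairs(record))  # make=mazda;model=cx-5;year=2016;color=gray;
```

Anchoring the key on `;` or the start of the string is an extra precaution so that a key such as `hdradio` would not be matched by accident.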
Hope that helps!

Geany regex to extract data inside and outside parenthesis separately

I have an incomplete XML file I am trying to convert to CSV to map to a spreadsheet. To create the header I need to extract the label before each = and separate them with a ,.
Inversely, I need to capture everything between the "" on all the lines to match up to the header.
Where I'm having trouble is there are some spaces in some of the data fields which is messing me up in creating anchors, and some fields have no data at all with just "". Here is a sample with both cases in which I was trying to create my header.
lvendor="EBL" lxref="1304112" linked="0" ltrnqty="" labeltype="ITEM W/DATE,VENDOR" taxcode="1" foodstamp="false" nonstock="false" detail="true" ars2="false"
The Geany regex I tried with is:
[=]["](\S+)?["][\s]
This works until I run into a space in the data field, but replacing (\S+)? with (.+)? gives me other problems. I'm just not sure how to anchor my regex properly, or if I need to use a capture group to get it done.
I'm not even positive if Geany is the right tool here. I'm on an Arch Linux box, so I'm open to any tools that are available to me.
You could do:
(\w+)(?==)|"([^"]*)"
This will save the attribute names in the first capturing group and their corresponding values in the second capturing group.
Since you are open to other tools, you can also produce the comma-separated values row in the terminal with sed, by replacing each attribute name (and the space before it) with a comma:
sed -r 's/\s?\S+=/,/g; s/^,//' file.xml
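If you want the header and the values in one go, a short script may be easier than chaining editor searches. The following Python sketch is only an illustration: the function name `attrs_to_csv` is made up, and it assumes every line carries the same attributes in the same order, so the header is taken from the first line.

```python
import csv
import io
import re

# name="value" pairs; the value may be empty or contain spaces and commas.
ATTR = re.compile(r'(\w+)="([^"]*)"')


def attrs_to_csv(lines):
    """Build CSV text: a header from the first line, then one row per line."""
    rows = [ATTR.findall(line) for line in lines]
    rows = [row for row in rows if row]  # skip lines without attributes
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([name for name, _ in rows[0]])
    for row in rows:
        writer.writerow([value for _, value in row])
    return buf.getvalue()


if __name__ == "__main__":
    line = ('lvendor="EBL" lxref="1304112" linked="0" ltrnqty="" '
            'labeltype="ITEM W/DATE,VENDOR" taxcode="1"')
    print(attrs_to_csv([line]))
```

Using the csv module means a value such as ITEM W/DATE,VENDOR is quoted in the output, so its embedded comma does not split the column.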

Find file names using find command and regex, functioning improperly

We have a Samba server that is backing up to an S3 bucket. It turns out that a large number of file names contain inappropriate characters, and the AWS CLI won't allow the transfer of those files. Using the "worst offender" I built a quick regex check, tested in Rubular against another file name, to try to generate a list of files that need to be fixed:
([中文网页我们的团队孙é¹â€“¦]+)
The command I'm running is:
find . -regextype awk -regex ".*/([中文网页我们的团队孙é¹â€“¦]+)"
This brings back a small list of files that contain the above string, in order, rather than files with individual characters from the list scattered throughout the name. This leads me to believe that either my regextype is incorrect or something is wrong with the formatting of the list of characters. I've tried types emacs and egrep, as they seem most similar to the regex I've used outside of a Unix environment, with no luck.
My test file name is: this-is-my€™s'-test-_ folder-name. which, according to my rubular tests, should be returned but isn't. Any help would be greatly appreciated.
find -regex has to match the whole path, so your regex .*/([中文网页我们的团队孙é¹â€“¦]+) only matches when everything after a slash consists entirely of the listed characters. Your test file name contains other characters as well, so it is not returned.
You might try something more like .*[中文网页我们的团队孙é¹â€“¦]+.* instead.
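The difference between the two patterns is essentially "match the whole path" versus "search anywhere in the path". Here is a small Python sketch of that distinction, using the character list from the question; the function name `paths_needing_rename` and the sample paths are only illustrative.

```python
import re

# Character list copied from the question.
BAD_CHARS = "中文网页我们的团队孙é¹â€“¦"
CHAR_CLASS = "[" + re.escape(BAD_CHARS) + "]"

# Like find -regex with the original pattern: the whole path must match.
ANCHORED = re.compile(".*/" + CHAR_CLASS + "+")
# Like the suggested fix: one listed character anywhere is enough.
CONTAINS_BAD = re.compile(CHAR_CLASS)


def paths_needing_rename(paths):
    """Return the paths that contain at least one of the listed characters."""
    return [p for p in paths if CONTAINS_BAD.search(p)]


if __name__ == "__main__":
    test_path = "./this-is-my€™s'-test-_ folder-name."
    print(ANCHORED.fullmatch(test_path) is not None)
    print(paths_needing_rename([test_path, "./plain-name.txt"]))
```

Note that searching the whole path also flags clean file names that live under a badly named directory, which matches what .*[...]+.* does in find.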

replace urls

I have a huge txt file, open in EditPad Pro, containing a list of URLs, some of which point to images in the root folder.
http://www.othersite.com/image01.jpg
http://www.mysite.com/image01.jpg
http://www.mysite.com/category/image01.jpg
How can I change only the ones that have images in the root folder, using a regexp?
http://www.othersite.com/image01.jpg
http://www.NEW_WEBSITE.com/image01.jpg
http://www.mysite.com/category/image01.jpg
I'm using the RegExr online app.
Search and replace (case insensitive, regular expression):
http://www\.mysite\.com/([^/]*\.(?:jpg|gif|png))
with:
http://www.NEW_WEBSITE.com/\1
EDIT
And yes, this will also re-base files such as http://www.mysite.com/.jpg, if any such files or directories exist. If you don't like that, just replace * with +, or with {X,} if your assumption is that an image file name needs at least X characters. But really, this is probably quite outside the scope of what lab72 is trying to achieve (i.e. this is not image file name validation).
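For a huge file it may be more comfortable to run the same search and replace from a script. A minimal Python sketch follows; the function name `rebase_root_images` is made up, and `[^/\s]+` is used instead of `[^/]*` so that a match can never run across a line break or accept an empty file name.

```python
import re

# Only images directly under the root of www.mysite.com: no further "/" is
# allowed between the host and the file name.
ROOT_IMAGE = re.compile(
    r"http://www\.mysite\.com/([^/\s]+\.(?:jpg|gif|png))",
    re.IGNORECASE,
)


def rebase_root_images(text, new_host="www.NEW_WEBSITE.com"):
    """Point root-level image URLs at new_host; leave all other URLs alone."""
    return ROOT_IMAGE.sub(lambda m: "http://" + new_host + "/" + m.group(1), text)


if __name__ == "__main__":
    urls = (
        "http://www.othersite.com/image01.jpg\n"
        "http://www.mysite.com/image01.jpg\n"
        "http://www.mysite.com/category/image01.jpg\n"
    )
    print(rebase_root_images(urls))
```

The replacement is built in a function rather than a template string, so nothing in the new host name can be misread as a backreference.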
url1.replace(/(https?:\/\/www\.)(\w*?)(\.com\/image\d*?\.(png|gif|jpg))/,
"$1newName$3");
Something like the above should work. The code is ActionScript (untested). Note that $2 matches the site's name, which we are replacing with newName.
Replace
http://www\.mysite\.com/image(.*)
with
http://www.newsite.com/image$1
That being said, you might also be interested in a decent text editor. That flash applet is really yucky. You can still use the same regexp, although you'll have to replace the dollar sign $ with a backslash \.