Remove a character from the middle of a string with regex - regex

I have no programming experience and thought this would be simple, but I have searched for days without luck. I am using a program to strip content from a web page. The program uses regex filters to choose which part of the stripped content is displayed. The stripped content can be any letters and is in the form USD/SEK. I want to display USDSEK, without the "/".
Thanks
To elaborate further: I am using a program called Data Toolbar for Chrome, which makes it easy to strip content from web pages. After it strips the content, it provides a regex filter to choose which part of the content is displayed. But I need to know the regex that removes the / from USD/SEK, so that just USDSEK is displayed. I've tried [A-Z.,]+ but that only displays USD. I need a regex that grabs only the first 3 and last 3 characters, or one that omits the / from the string.

Try adding parentheses around the groups which you wish to capture:
([a-zA-Z]{3})\/([a-zA-Z]{3})
or
([a-zA-Z]{3})\/((?1))
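Depending on the functionality of the program you are using, you can then reference these captured groups as $1 and $2 (or \1 and \2, depending on the regex flavor), so a replacement of $1$2 gives USDSEK. Note that the (?1) form in the second pattern is only supported by some flavors, such as PCRE.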
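For reference, here is a minimal Python sketch of the capture-group idea above. It only illustrates how the two groups are recombined; whether Data Toolbar exposes a replacement step like this is an assumption, and the function name strip_slash is made up for the example.

```python
import re

# Two capture groups around the slash; the slash itself is not captured.
PAIR = re.compile(r"([a-zA-Z]{3})/([a-zA-Z]{3})")

def strip_slash(text: str) -> str:
    # \1 and \2 re-insert the captured groups, which drops the "/".
    return PAIR.sub(r"\1\2", text)

print(strip_slash("USD/SEK"))
```

Python writes the group references as \1 and \2; tools that use the $1 $2 style would take $1$2 as the replacement instead.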

Related

regex: expression does not starts nor ends with certain character

I have a regex for YouTube links that I am using in a web app. One thing I didn't realize when building this editor is that people could add an a element with an href pointing to YouTube.
I am replacing the YouTube links with a custom component. The thing is that I don't want to substitute the links that are between "
Here is my regex:
youtubeRegex = /(?:https?:)?(?:\/\/)?(?:www\.)?(?:youtu\.be\/|youtube(?:\-nocookie)?\.(?:[A-Za-z]{2,4}|[A-Za-z]{2,3}\.[A-Za-z]{2})\/)(?:watch|embed\/|vi?\/)*(?:\?[\w=&]*vi?=)?([^#&\?\/]{11}).*?/g;
Right now the regex matches all of these examples:
https://www.youtube.com/watch?v=njpgevZ_MUc
"https://www.youtube.com/watch?v=njpgevZ_MUc
https://www.youtube.com/watch?v=njpgevZ_MUc"
"https://www.youtube.com/watch?v=njpgevZ_MUc" (this one should not be matched but it gets matched)
How do I edit my regex so it doesn't match the last one but still matches the rest?

Google Analytics Regex Code

I'm having trouble figuring out the last part of my regex code for Google Analytics. I want to be able to grab any URL from my site that fits the following pattern:
www.site.com/hotel/[any text]/rooms?[any text]
So the URLs will always begin with /hotel/ and will always contain /rooms? followed by any possible text string, with any possible text between "hotel/" and "/rooms?".
I have this much: ^/hotel/([^/])+/rooms([^\?])
But I'm not sure how to finish this so that it will only capture URLs that have text after the "?"
This works. You may want to tighten up the allowed text in the path parameter and query parameter.
^www.site.com/hotel/[^/]+/rooms\?.+$
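As a quick check of the pattern above, here is a small Python sketch. It is an illustration only: the dots in the host name are escaped here so that "." matches a literal dot, the sample URLs and the function name are invented, and whether your Google Analytics field includes the host name at all is something to verify in your own reports.

```python
import re

# The answer's pattern, with the dots escaped (a small tightening, not in the original).
HOTEL_ROOMS = re.compile(r"^www\.site\.com/hotel/[^/]+/rooms\?.+$")

def is_hotel_rooms_url(url: str) -> bool:
    # True only when there is at least one character after the "?".
    return HOTEL_ROOMS.match(url) is not None

# Hypothetical sample URLs.
for url in [
    "www.site.com/hotel/grand-plaza/rooms?date=2020-01-01",
    "www.site.com/hotel/grand-plaza/rooms?",
    "www.site.com/hotel/grand-plaza/rooms",
]:
    print(url, is_hotel_rooms_url(url))
```

The trailing .+ is what enforces "text after the ?"; replacing it with .* would also accept an empty query string.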

How to use a regular expression in notepad++ to change a url

I need some help with our migrated site's URLs. We moved our site from Joomla to WordPress, and in our posts we have over 20K internal links.
These links are structured like this:
www.mysite.nl/current-post-title/index.php?option=com_content&view=article&id=5259:related-post-title&catid=35:universum&Itemid=48
What we need is this:
www.mysite.nl/related-post-title
So basically we need to remove everything after www.mysite.nl/ up to the colon :, i.e. remove this: current-post-title/index.php?option=com_content&view=article&id=5259: (the colon itself must be removed too)
And then remove everything behind the first ampersand (including the ampersand itself) until the end of the string, i.e. remove &catid=35:universum&Itemid=48
Of course only url strings containing this index.php?option=com_content must be changed.
I have dumped the table in plain text and opened it in Notepad++ to do a search and replace with regular expression because the content that must be removed from these lines is different every time.
Can someone please help me with the right regular expression?
In the "Find what" box, enter:
(www.mysite.nl)\/.*index.php\?option=com[^:]+:([^&]+)&.*
In the "Replace with" box, enter:
\1/\2
Result
www.mysite.nl/related-post-title
Go inside-out, rather than outside-in, replace \/.+&id=\d+\:(.+?)&.+ with /$1. Also, paste a few into http://www.regexr.com/ and play around, although JavaScript and Notepad++ might have some differences in implemented Regex features, e.g. negative lookbehinds.
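To try the first find/replace pair outside Notepad++, here is a rough Python sketch. It is an approximation: Python's re module is not the engine Notepad++ uses, the dots are escaped here as a small tightening, and rewrite_link is a name made up for the example.

```python
import re

# Group 1 keeps the host; group 2 keeps the slug between the colon and the first "&".
FIND = re.compile(r"(www\.mysite\.nl)/.*index\.php\?option=com[^:]+:([^&]+)&.*")

def rewrite_link(url: str) -> str:
    # Equivalent of replacing with \1/\2 in Notepad++.
    return FIND.sub(r"\1/\2", url)

old = ("www.mysite.nl/current-post-title/index.php?option=com_content"
       "&view=article&id=5259:related-post-title&catid=35:universum&Itemid=48")
print(rewrite_link(old))
```

Both .* parts are greedy, so on a line that holds other text after the link they will consume that text as well. Narrowing them to stop at a quote or whitespace is one option, though that depends on how the links are quoted in your dump.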

regex to remove text between two strings in TextWrangler

I've searched quite hard for an answer to this.
Basically, what I'm trying to do is to remove certain fields in some of my exported vCards which I exported using the Mac's Contacts application via Automator.
I managed to remove those single-line fields such as Birthday and Social Network. However, there is one particular field which is taking up multiple lines which I assume is a base64-encoded version of the original image - the PHOTO field.
This is an example of the start of the field:
PHOTO;ENCODING=b;TYPE=JPEG:/9j/4AAQSkZJRgABAQAAAQABAAD/4gxYSUNDX1BST0ZJTEUA
The end varies, so I used the start of the next line as the end:
CATEGORIES
The closest I've got was PHOTO;ENCODING.*CATEGORIES
Unfortunately, it seems to select only the first line of the entire chunk.
Is there any way around this? I'm trying to do this in TextWrangler on my Mac.
Instead of .* you need:
(.+[\r\n]+).*
because . doesn't match linebreak chars.
The pattern in parentheses matches multiple lines consisting of char sequences ending with linebreaks.
With the help of a friend I tried in TextWrangler
ATTACH;ENCODING=BASE64([^\n]*\n )*[^\n]*\n
and it matches each attachment
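The behaviour described above can be reproduced in a short Python sketch. This is an illustration under assumptions: the vCard text is invented sample data, the continuation lines are assumed to start with a single space (as the "\n " in the ATTACH pattern suggests), and the pattern is that ATTACH pattern adapted to the PHOTO field rather than something tested in TextWrangler.

```python
import re

# Hypothetical vCard with a folded (multi-line) PHOTO field.
vcard = (
    "BEGIN:VCARD\n"
    "FN:Jane Doe\n"
    "PHOTO;ENCODING=b;TYPE=JPEG:/9j/4AAQSkZJRgABAQAAAQABAAD\n"
    " /4gxYSUNDX1BST0ZJTEUA\n"
    " AAAA\n"
    "CATEGORIES:Friends\n"
    "END:VCARD\n"
)

# "." does not cross line breaks by default, so this finds nothing.
single_line = re.search(r"PHOTO;ENCODING.*CATEGORIES", vcard)

# Consume the first line plus every following line that starts with a space.
PHOTO_FIELD = re.compile(r"PHOTO;ENCODING(?:[^\n]*\n )*[^\n]*\n")
cleaned = PHOTO_FIELD.sub("", vcard)

print(single_line)
print(cleaned)
```

Matching on the leading space of continuation lines means the pattern stops at the next field on its own, without naming CATEGORIES as the end marker.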

In Yahoo-Pipes, how to use regex when you can't see non-printable characters and html tags?

I keep having problems extracting data with regex: my result is not what I wanted because there might be newlines, spaces, HTML tags, etc. in the string. Is there any way to actually see what is in the string? The debugger seems to show only the real text. How do you deal with this?
If the content of the string is HTML, then the debugger gives you a choice of viewing "HTML" or "Source". Source should show you any HTML tags that are there.
However if your concern is white space, this may not be enough. Your only option is to "view source" on the original page.
The best course of action is to explicitly handle these possibilities in your regex. For example, if you think you might be getting white space in your target string, use the \s* pattern in the critical positions. That will match zero or more spaces, tabs, and new lines (you must also have the "s" option checked in the regex panel for new lines).
However, without specific examples of source text and the regex you are using - advice can only be generic.
What I do is use a regex tester (whichever uses the same regex engine that you are using) and I test my pattern on it. I've tried using text editors that display invisible characters but to me they only add to the confusion.
So I just go by trial and error. For instance, if a line ends in:
</a>
Then I'll try the following patterns on the regex tester until I find one that works:
</a>.
</a>..
</a>\s
</a>\s*
</a>\n
</a>\r
</a>\r\n
Etc.
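The same trial-and-error approach can be sketched outside Pipes, for example in Python. This is only an illustration: Python's regex engine is not the one Yahoo Pipes uses, and the sample string is invented. repr() is one way to see the invisible characters, and the loop tries each candidate pattern in turn.

```python
import re

# Hypothetical fragment whose line ends in a Windows-style line break.
sample = "<a href='#'>link</a>\r\nnext line"

# repr() shows the hidden characters as escape sequences such as \r and \n.
print(repr(sample))

candidates = [r"</a>.", r"</a>..", r"</a>\s", r"</a>\s*",
              r"</a>\n", r"</a>\r", r"</a>\r\n"]

# Record which candidate patterns find a match in the sample.
results = {p: bool(re.search(p, sample)) for p in candidates}
for pattern, matched in results.items():
    print(pattern, matched)
```

Seeing which candidates match and which do not narrows down what actually follows the closing tag.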