The URLEncodedFormat() function does something that I specifically want which is this:
Generates a URL-encoded string. For example, it replaces spaces with %20
Why does EncodeForURL() not do this? It replaces spaces with a plus sign ("+") instead, which is breaking my URLs.
How would I be able to get around this other than using URLEncodedFormat()?
I have a string in the format of "c:\replaceallslashes\directory1\subdirectory1\etc\etc\file.html" in a large number of files. All the backslashes in these strings need to be changed to forward slashes so that the path can become a URL. I want to change this via find & replace in a text or regex editor, but I don't want to accidentally replace any backslashes that occur elsewhere in the documents, outside these strings.
How should I construct the find and replace command?
Edit: just to be clear, I am looking for regex strings for both the "find" and "replace" fields. The answer below only gives the "find" command.
A very rudimentary Windows path regex would look something like:
[a-z]:\\(?:[a-z0-9_-]+\\?)*
https://regex101.com/r/g1hubs/1
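The issue is that Windows filenames are only restricted from using \/:*?"<>| so only a small set of characters tells you that something is definitely not a path. My example therefore assumes that your paths are alphanumeric and may or may not include underscores and dashes. As written, it also matches lowercase letters only (unless you enable a case-insensitive flag) and stops at the first dot, so it would not cover the ".html" in "file.html"; add . to the character class if you need that.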
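If running a short script is an option instead of an editor's find/replace fields, the "replace" half can be done by a callback that rewrites only the matched span. The following is a minimal Python sketch under that assumption; the function name to_forward_slashes is hypothetical, and the pattern is the one above with a dot added to the character class and case-insensitive matching turned on (both are my additions).

```python
import re

# Assumption: paths use only letters, digits, underscores, dashes and dots.
# This is the rudimentary pattern from the answer, plus "." and IGNORECASE.
WINDOWS_PATH = re.compile(r"[a-z]:\\(?:[a-z0-9_.-]+\\?)*", re.IGNORECASE)


def to_forward_slashes(text: str) -> str:
    """Swap backslashes for forward slashes, but only inside matched paths."""
    return WINDOWS_PATH.sub(lambda m: m.group(0).replace("\\", "/"), text)


sample = r'keep \n here, see "c:\replaceallslashes\directory1\file.html" ok'
print(to_forward_slashes(sample))
```

Backslashes that are not part of a drive-letter path (such as the \n in the sample) are left alone, because the substitution only ever touches text the pattern matched.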
I'm really bad at regex and still learning. I'm trying to set up my regex to match only the first URI below.
/test/guid/5824812d100afbc60ef09411
/test/guid/5824812d100afbc60ef09411/action/create
/test/guid/5824812d100afbc60ef09411/action/version/delete
I have my regex working for both the (/action/create) and (/action/version/delete).
I need the first to be its own individual URI. The guid after /guid changes, but in that URI nothing ever follows it.
These are working:
\/test\/guid\/\d.*\/action\/create
\/test\/guid\/\d.*\/action\/version\/delete
However, if I use the same convention to find the first URI, it matches all three. I need to match each of the three separately.
Help?
Anchors are your friend here. ^ matches the beginning of a line (or beginning of the full string, depending on your modifiers) and $ matches the end.
So all you need is something like this:
\/test\/guid\/[a-z0-9]+$
That should be good enough, since after the guid's string of alphanumeric characters you're expecting the string to either terminate or have a forward slash, but if your guid is of a known fixed length, it might be better to do something like:
\/test\/guid\/[a-z0-9]{24}$
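To see the effect of the anchors, here is a small Python sketch run against the three URIs from the question. It assumes the id is always 24 lowercase alphanumeric characters, as in the samples; the name GUID_ONLY is just for illustration. Python does not require escaping the forward slashes, so they are written plainly.

```python
import re

# Assumption: the id is exactly 24 lowercase alphanumerics, as in the samples.
# ^ and $ pin the match to the whole string, so longer URIs are rejected.
GUID_ONLY = re.compile(r"^/test/guid/[a-z0-9]{24}$")

uris = [
    "/test/guid/5824812d100afbc60ef09411",
    "/test/guid/5824812d100afbc60ef09411/action/create",
    "/test/guid/5824812d100afbc60ef09411/action/version/delete",
]

# Keep only the URIs that consist of the guid path and nothing after it.
matches = [uri for uri in uris if GUID_ONLY.search(uri)]
print(matches)
```

Only the first URI survives the filter, since the other two have extra path segments between the guid and the end of the string.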
I am trying to get through a lot of content and extract some data from it. To do that, I need to pick out the information between two sets of characters.
It looks like this
***some text*** li> ***data to capture*** </li ***more text***
What regex can I use to get everything that is enclosed between li> and </li ?
Basically it will be like this:
li>(.*?)(?:</li)
Depending on your language environment, certain characters may need to be escaped or the way of retrieving the matched string may differ. Typically you would need to escape / by prepending a backslash, resulting in this new version:
li>(.*?)(?:<\/li)
Here's a live demo:
https://regex101.com/r/zV4uN6/1
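As one concrete case of "the way of retrieving the matched string may differ", here is a minimal Python sketch. The sample text and the name BETWEEN_LI are made up for illustration, and the DOTALL flag is my assumption so that captured data may span line breaks. In Python the forward slash needs no escaping.

```python
import re

# Lazy capture of everything between "li>" and the next "</li".
# Assumption: DOTALL is wanted so the captured data may contain newlines.
BETWEEN_LI = re.compile(r"li>(.*?)</li", re.DOTALL)

text = "some text li> data to capture </li more text li>second</li end"

# findall returns the contents of the single capture group for each match.
captured = [item.strip() for item in BETWEEN_LI.findall(text)]
print(captured)
```

The non-capturing group around the closing marker in the original pattern is optional; a plain literal behaves the same way here.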
I'm looking at a JSON feed from Twitter and trying to make URLs clickable using a regular expression.
The problem is that there are URLs in the text with trailing commas. A comma can legally be part of a URL, but in this case they're just punctuation inserted by the user.
Is there any way around this? Am I missing something?
You are not missing anything; there is no foolproof way of determining the "intended" URL when it is embedded in plain text. Your best bet is to make an educated guess.
A common approach is to check whether the punctuation mark(s) in question are followed by whitespace or by the end of the string. If so, do not interpret them as part of the URL; otherwise, include them.
Keep in mind this problem isn't limited to commas or a single character (consider the ellipsis, ...).
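The whitespace-or-end check described above can be sketched in Python with a lookahead. This is only an illustration of the heuristic: the URL pattern is deliberately crude (http or https followed by non-space characters), the punctuation set is my own guess, and find_urls is a hypothetical helper name.

```python
import re

# Assumption: a URL is "http(s)://" followed by non-whitespace characters.
# The lazy \S+? stops as soon as what follows is only punctuation
# and then whitespace or the end of the string.
URL = re.compile(r"https?://\S+?(?=[.,;:!?]*(?:\s|$))")


def find_urls(text: str) -> list[str]:
    """Return URLs with trailing punctuation left out of the match."""
    return URL.findall(text)


print(find_urls("see http://a.com/x,y, and http://b.org..."))
```

A comma in the middle of a URL is kept because it is followed by more URL characters, while the trailing comma and the ellipsis are dropped.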
You could ignore the last character if it is punctuation (so that punctuation in the middle of a URL doesn't affect it).
For example, the regex could be something like:
`([a-z/A-Z0-9.,]*?)([.,]?)\s`
Warning: the first part of the regex doesn't cover every character a URL can contain, so you still need to fix that. But essentially, we have ([a-z/A-Z0-9.,]*?), which matches the main part of the URL. The * allows many characters, but we add ? so that it isn't greedy.
Then we use ([.,]?) to match a possible trailing punctuation mark, and \s to match a space or other whitespace.
The first subexpression is therefore the URL, and you can turn it into a link.
If you have access to the internet, you could try accessing the resource to see if it returns a 404 to decide whether the trailing punctuation is part of the URL or actual punctuation.
First off, I am using the built-in .NET regex engine (or so I was told). I match with the pattern A(.*?)B and then replace the match with nothing, which basically removes it. What I am doing is removing some unwanted text from the end of a URL I am scraping.
The problem is that my "B" marker is a quote character that needs to stay in the output. Is there a way to remove everything between A and B, but not A and B themselves? A and B still have to be used as the markers. I hope I explained this well enough.
In case I didn't, here is an example. The input is random words and spaces, with nothing usable as an indicator on either side of "example.com": sometimes there is a space, sometimes not, sometimes words or letters, and so on. I want example.com with its quotes, but everything on each side changes, including the spaces.
I need example.com including the quotes, so I can't just use "(.*?)": once I run the replace, the quotes I need to keep are removed as well.
To reword this: with A(.*?)B and a replace, I am essentially replacing what is between A and B with nothing, which is fine, but I want to keep A and B. I cannot rely on any characters or words before A or after B because they are random and change. For example, how would you remove example.com from "example.com" but keep the quotes, when everything before the quotes and inside the quotes keeps changing?
You can use lookaround assertions:
I don't know the exact syntax for the regex flavour you're using, but you can adapt this to your language.
Replace (?<=A).*?(?=B) with nothing.
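Here is how that replacement might look in Python, as a sketch rather than the .NET code the question asks about. The helper name strip_between and the sample strings are hypothetical. One caveat I am adding: Python's re module only accepts fixed-width lookbehinds, so the start marker here is treated as a literal string.

```python
import re


def strip_between(text: str, start: str, end: str) -> str:
    """Remove whatever sits between the markers, keeping the markers."""
    # Assumption: start and end are literal markers, hence re.escape.
    # The lookbehind and lookahead assert the markers without consuming them.
    pattern = re.compile(f"(?<={re.escape(start)}).*?(?={re.escape(end)})")
    return pattern.sub("", text)


print(strip_between("xxAjunkByy", "A", "B"))
print(strip_between('some words "example.com" more words', '"', '"'))
```

Because the markers are only asserted, never matched, they survive the replacement; with quotes as both markers, the second call leaves an empty pair of quotes behind.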