I have a partial solution to convert this
USERNAME=CONSTANT[myUserName]
PASSWORD=CONSTANT[mypwd]
to
"USERNAME":"myUserName",
"PASSWORD":"mypwd"
I see a similar solution here:
properties file to json. Basically, I want to allow zero or more spaces (1) anywhere before or after a key and (2) before and after the = sign:
USERNAME = CONSTANT[myUserName]
PASSWORD = CONSTANT[mypwd]
Find What: (^[^ \t]+)(\s.*=\s*CONSTANT\[)(.*[^\n])(\])
Replace: "$1":"$2",
"USERNAME":"myUserName",
"PASSWORD":"mypwd",
Also, I want to make sure this is applied line by line; sometimes it matches across multiple lines, which is wrong. I hope someone can find a solution that works in Eclipse on Windows.
Make sure to use ^ and $ in order to avoid your regex matching multiple lines. Try something like this:
^\s*(\w+)\s*?\=\s*?\w+\[(\w+)\]$
Replace with:
"$1":"$2",
Demo: https://regex101.com/r/mxF8lI/1/
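For readers who want to try the same idea outside Eclipse, here is a minimal Python sketch. It is an illustration only, not the Eclipse dialog itself: the function name properties_to_json_pairs is hypothetical, and the pattern uses [ \t]* instead of \s* so that the optional whitespace can never run across a line break.

```python
import re

# One "KEY = WRAPPER[value]" entry per line. [ \t]* (not \s*) keeps the
# optional whitespace on a single line, and re.MULTILINE makes ^ and $
# anchor at every line boundary.
LINE = re.compile(r'^[ \t]*(\w+)[ \t]*=[ \t]*\w+\[(\w+)\][ \t]*$', re.MULTILINE)

def properties_to_json_pairs(text):
    """Rewrite each matching line as "KEY":"value", (hypothetical helper)."""
    return LINE.sub(r'"\1":"\2",', text)

sample = "USERNAME=CONSTANT[myUserName]\nPASSWORD = CONSTANT[mypwd]"
converted = properties_to_json_pairs(sample)
print(converted)
```

Lines that do not match the pattern are left untouched, so the substitution can be run over a whole file.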
I'm using the following regex to find URLs in a text file:
/http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+/
It outputs the following:
http://rda.ucar.edu/datasets/ds117.0/.
http://rda.ucar.edu/datasets/ds111.1/.
http://www.discover-earth.org/index.html).
http://community.eosdis.nasa.gov/measures/).
Ideally they would print out this:
http://rda.ucar.edu/datasets/ds117.0/
http://rda.ucar.edu/datasets/ds111.1/
http://www.discover-earth.org/index.html
http://community.eosdis.nasa.gov/measures/
Any ideas on how I should tweak my regex?
Thank you in advance!
UPDATE - Example of the text would be:
this is a test http://rda.ucar.edu/datasets/ds117.0/. and I want this to be copied over http://rda.ucar.edu/datasets/ds111.1/. http://www.discover-earth.org/index.html). http://community.eosdis.nasa.gov/measures/).
This will trim the trailing characters ) and . from your output.
import re
regx= re.compile(r'(?m)[\.\)]+$')
print(regx.sub('', your_output))
And this regex seems to work for extracting the URLs from your original sample text.
https?:[\S]*\/(?:\w+(?:\.\w+)?)?
Demo (edited from https?:[\S]*\/)
Python script may be something like this
ss=""" this is a test http://rda.ucar.edu/datasets/ds117.0/. and I want this to be copied over http://rda.ucar.edu/datasets/ds111.1/. http://www.discover-earth.org/index.html). http://community.eosdis.nasa.gov/measures/). """
regx= re.compile(r'https?:[\S]*\/(?:\w+(?:\.\w+)?)?')
for m in regx.findall(ss):
    print(m)
So for the urls you have here:
https://regex101.com/r/uSlkcQ/4
Pattern explanation:
Protocols (e.g. https://)
^[A-Za-z]{3,9}:(?://)
Look for recurring .[-;:&=+\$,\w]+-class (www.sub.domain.com)
(?:[\-;:&=\+\$,\w]+\.?)+
Look for recurring /[\-;:&=\+\$,\w\.]+ (/some.path/to/somewhere)
(?:\/[\-;:&=\+\$,\w\.]+)+
Now, for your special case: ensure that the last character is not a dot or a parenthesis, using negative lookahead
(?!\.|\)).
The full pattern is then
^[A-Za-z]{3,9}:(?://)(?:[\-;:&=\+\$,\w]+\.?)+(?:\/[\-;:&=\+\$,\w\.]+)+(?!\.|\)).
There are a few things to improve or change in your existing regex to allow this to work:
http[s]? can be changed to https?. They're identical. No use putting s in its own character class
[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),] You can shorten this entire thing and combine character classes instead of using | between them. This not only improves performance, but also allows you to combine certain ranges into existing character class tokens. Simplifying this, we get [a-zA-Z0-9$-_#.&+!*\(\),]
We can go one step further: a-zA-Z0-9_ is the same as \w. So we can replace those in the character class to get [\w$-#.&+!*\(\),]
In the original regex we have $-_. This creates a range so it actually includes everything between $ and _ on the ASCII table. This will cause unwanted characters to be matched: $%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_. There are a few options to fix this:
[-\w$#.&+!*\(\),] Place - at the start of the character class
[\w$#.&+!*\(\),-] Place - at the end of the character class
[\w$\-#.&+!*\(\),] Escape - such that you have \- instead
You don't need to escape ( and ) in the character class: [\w$#.&+!*(),-]
[0-9a-fA-F][0-9a-fA-F] You don't need to specify [0-9a-fA-F] twice. Just use a quantifier like so: [0-9a-fA-F]{2}
(?:%[0-9a-fA-F][0-9a-fA-F]) The non-capture group isn't actually needed here, so we can drop it (it adds another step that the regex engine needs to perform, which is unnecessary)
So the result of just simplifying your existing regex is the following:
https?://(?:[$\w#.&+!*(),-]|%[0-9a-fA-F]{2})+
Now you'll notice it doesn't match / so we need to add that to the character class. Your regex was matching this originally because it has an improper range $-_.
https?://(?:[$\w#.&+!*(),/-]|%[0-9a-fA-F]{2})+
Unfortunately, even with this change, it'll still match ). at the end. That's because your regex isn't told to stop matching after /. Even implementing this will now cause it to not match file names like index.html. So a better solution is needed. If you give me a couple of days, I'm working on a fully functional RFC-compliant regex that matches URLs. I figured, in the meantime, I would at least explain why your regex isn't working as you'd expect it to.
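As a stopgap, the points above can be checked in Python. This is only a sketch: it first demonstrates the accidental $-_ range, then uses the simplified pattern and strips trailing . and ) with str.rstrip, which is a swapped-in workaround and not a regex fix. The function name extract_urls is hypothetical, and the stripping would also remove a ) that legitimately ends a URL.

```python
import re

# Inside a character class, "$-_" is a range from "$" (0x24) to "_" (0x5F),
# so it silently includes "/" (0x2F).
range_class = re.compile(r'[$-_]')
hyphen_last = re.compile(r'[$_-]')  # hyphen placed last, so it is a literal

slash_in_range = bool(range_class.fullmatch('/'))
slash_in_literal = bool(hyphen_last.fullmatch('/'))

# Simplified pattern from the answer, with "/" added explicitly.
URL = re.compile(r'https?://(?:[$\w#.&+!*(),/-]|%[0-9a-fA-F]{2})+')

def extract_urls(text):
    """Find URLs, then drop trailing '.' and ')' (hypothetical helper)."""
    return [u.rstrip('.)') for u in URL.findall(text)]

sample = ("this is a test http://rda.ucar.edu/datasets/ds117.0/. and "
          "http://www.discover-earth.org/index.html).")
urls = extract_urls(sample)
print(urls)
```

Doing the cleanup after matching keeps file names such as index.html intact, which a pattern that stops at the last / would not.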
Thanks all for the responses. A coworker ended up helping me with it. Here is the solution:
import re

# des holds the text to search
des_links = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', des)
for i in des_links:
    # Keep only what precedes the last "/" in each match
    tmps = "/".join(i.split('/')[0:-1])
    print(tmps)
I'm using an application called Firemon, which uses regex to pull text out of various fields. I'm unsure which regex flavor it uses; I can't find a reference to this in the documentation.
My raw text will always be in the following format:
CM: 12345
APP: App Name
BZU: Dept Name
REQ: First Last
JST: Text text text text.
CM will always be an integer, JST will be a sentence that may span multiple lines, and the other fields will be strings that consist of 1-2 words - and there's always a return after each section.
The application, Firemon, has me create a regex entry for each field. Something simple that looks for each prefix and then a return should work, because I return after each value. I've tried several variations, such as "BZU:\s*(.*)", but can't seem to find something that works.
EDIT: To be clear I'm trying to get the value after each prefix. Firemon has a section for each field. "APP" for example is a field. I need a regex example to find "APP:" and return the text after it. So something as simple as regex that identifies "APP:", and grabs everything after the : and before the return would probably work.
You can use (?=\w+ )(.*)
Positive lookahead will remove prefix and space character from match groups and you will in each match get text after space.
I am a little late to the game, but maybe this is still an issue.
In the more recent versions of FireMon, sample regexes are provided. For instance:
jst:\s*([^;]*?)\s*;
will match on:
jst:anything in here;
and result in
anything in here
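Since the regex flavor Firemon uses is not known from this thread, the following is only a Python sketch of the "prefix, then everything up to the return" idea from the question. The helper single_line_field and the sample text (including its second JST line) are made up for illustration, and the JST pattern assumes JST is always the last field, as in the format shown in the question.

```python
import re

raw = (
    "CM: 12345\n"
    "APP: App Name\n"
    "BZU: Dept Name\n"
    "REQ: First Last\n"
    "JST: Text text text text.\n"
    "More justification text.\n"
)

def single_line_field(prefix):
    """Pattern for 'PREFIX: value' on one line (hypothetical helper)."""
    # '.' does not match a newline here, so the group stops at the return.
    return re.compile(r'^' + re.escape(prefix) + r':[ \t]*(.*?)[ \t]*$', re.MULTILINE)

# JST may span lines: with re.DOTALL the group runs to the end of the text.
JST = re.compile(r'^JST:[ \t]*(.*)', re.MULTILINE | re.DOTALL)

fields = {p: single_line_field(p).search(raw).group(1) for p in ("CM", "APP", "BZU", "REQ")}
fields["JST"] = JST.search(raw).group(1).strip()
print(fields)
```

Each pattern keeps the value in capture group 1, which is what a per-field entry would need to return.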
I am trying to write a regex which will strip away the rest of the path after a particular folder name.
If Input is:
/Repository/Framework/PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces/IDemoReader.cs
Output should be:
/Repository/Framework/PITA/branches/ChangePack-6a7B6
Some constraints:
ChangePack- will be followed by the change pack id, which is a mix of digits and letters (a-z or A-Z) only, in any order. There is no limit on the length of the change pack id.
ChangePack- is a constant. It will always be there.
And the text before the ChangePack can also change. Like it can also be:
/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces
My regex-fu is bad. What I have come up with till now is:
^(.*?)\-6a7B6
I need to make this generic.
Any help will be much appreciated.
The regex below does the trick.
^(.*?ChangePack-[\w]+)
Input:
/Repository/Framework/PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces/IDemoReader.cs
/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces
Output:
/Repository/Framework/PITA/branches/ChangePack-6a7B6
/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6
Check out the live regex demo here.
^(.*?ChangePack-[a-zA-Z0-9]+)
Try this. Instead of replacing, grab the match via $1 or \1. See the demo.
https://regex101.com/r/iY3eK8/17
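A quick way to check this pattern against both sample paths is a small Python sketch like the one below. The function name trim_to_change_pack is hypothetical; it returns None when a path contains no ChangePack- folder.

```python
import re

# Lazily match everything up to and including "ChangePack-<id>", where the
# id is one or more ASCII letters or digits.
CHANGE_PACK_ROOT = re.compile(r'^(.*?ChangePack-[a-zA-Z0-9]+)')

def trim_to_change_pack(path):
    """Return the path up to the change pack folder, or None (hypothetical helper)."""
    match = CHANGE_PACK_ROOT.match(path)
    return match.group(1) if match else None

paths = [
    "/Repository/Framework/PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces/IDemoReader.cs",
    "/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces",
]
trimmed = [trim_to_change_pack(p) for p in paths]
print(trimmed)
```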
Will you always have '/Repository/Framework/PITA/branches/' at the beginning? If so, this will do the trick:
/Repository/Framework/PITA/branches/\w+-\w*
Instead of regex you could use split and join functions. Example in Python:
path = "/a/b/c/d/e"
folders = path.split("/")
newpath = "/".join(folders[:3]) #trims off everything from the third folder over
print(newpath) #prints "/a/b"
If you really want regex, try something like ^.*\/folder\/ where folder is the name of the directory you want to match.
I want to make a regex so I can do a "Search/Replace" over a JSON file with many objects. Every object has a key named "resource" containing a URL.
Take a look at these examples:
"resource":"http://www.img/qwer/123/image.jpg"
"resource":"io.nl.info/221/elephant.gif"
"resource":"simgur.com/icon.png"
I want to make a regex to replace the whole url with
a string like this: img/filename.format.
This way, the result would be:
"resource":"img/image.jpg"
"resource":"img/elephant.gif"
"resource":"img/icon.png"
I'm just starting with regular expressions and I'm
completely lost. I was thinking that one valid idea would
be to write something starting with this pattern "resource":"
and ending with the last five characters. But I don't even know how to try
that.
How could I write the regular expression?
Thanks in advance!
Try this:
Find: "resource":\s*"[^"]+?([^\/"]+)"
Replace: "resource":"img/\1"
Using [^"]+? ensures the match won't roll off the end of the current entry and gobble up too much input, and it's reluctant (with the added ?) so the group gets the whole image file name (instead of just the last character).
Edit:
I added optional whitespace after the key, which your pastebin has.
See a live demo of this regex with your pastebin.
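To see how the find/replace pair behaves on the three sample entries, here is a small Python sketch. It assumes the replacement ends with a closing quote (the pattern consumes the one in the input), and the function name shorten_resources is hypothetical.

```python
import re

# Reluctant [^"]+? consumes the URL up to its last "/", and the group
# captures the file name that follows it.
FIND = re.compile(r'"resource":\s*"[^"]+?([^/"]+)"')

def shorten_resources(text):
    """Rewrite every resource URL as img/<file name> (hypothetical helper)."""
    return FIND.sub(r'"resource":"img/\1"', text)

sample = "\n".join([
    '"resource":"http://www.img/qwer/123/image.jpg"',
    '"resource":"io.nl.info/221/elephant.gif"',
    '"resource": "simgur.com/icon.png"',
])
shortened = shorten_resources(sample)
print(shortened)
```

Note that the pattern expects at least one / in the value; a bare file name with no path would lose its first character.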
Regex
.*\/
Debuggex Demo
This will find the text you want to replace. Replace it with img/. If you want to match the whole text, you'll need the following regex:
("resource":").*\/
Debuggex Demo
Then replace with $1img/. This gives you group 1 followed by the img/ part.
Let me know if there are any questions
Note: since you have JSON, I personally would just parse it to an object, iterate over the objects, and change each resource independently, rather than looking for a magic bullet.
If your JSON is an array of objects containing resource field I would do it in 3 steps: convert to object, find resources and replace them, convert back to string (optional)
var tmp = JSON.parse('<your json>');
for (var i = 0; i < tmp.length; ++i) {
    for (var e in tmp[i]) {
        if (e === 'resource') {
            // Replace everything up to the last "/" with "img/"
            tmp[i][e] = tmp[i][e].replace(/^.*\//, 'img/');
        }
    }
}
tmp = JSON.stringify(tmp);
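The same three steps (parse, rewrite each resource, serialize) can be sketched in Python with the standard json module. This assumes the file is a JSON array of objects, as stated above; the function name localize_resources, its prefix parameter, and the sample data are hypothetical.

```python
import json

def localize_resources(json_text, prefix="img/"):
    """Parse, rewrite every "resource" value, serialize (hypothetical helper)."""
    objects = json.loads(json_text)
    for obj in objects:
        if "resource" in obj:
            # Keep only the part after the last "/", i.e. the file name.
            filename = obj["resource"].rsplit("/", 1)[-1]
            obj["resource"] = prefix + filename
    return json.dumps(objects)

sample = json.dumps([
    {"resource": "http://www.img/qwer/123/image.jpg"},
    {"resource": "io.nl.info/221/elephant.gif"},
    {"resource": "simgur.com/icon.png", "id": 3},
])
result = json.loads(localize_resources(sample))
print(result)
```

Working on parsed objects avoids any risk of a pattern touching other keys that happen to contain a URL.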
I made an article spinner that used regex to find words in this syntax:
{word1|word2}
And then split them up at the "|", but I need a way to make it support tier 2 brackets, such as:
{{word1|word2}|{word3|word4}}
What my code does when presented with such a line, is take "{{word1|word2}" and "{word3|word4}", and this is not as intended.
What I want is when presented with such a line, my code breaks it up as "{word1|word2}|{word3|word4}", so that I can use this with the original function and break it into the actual words.
I am using c#.
Here is pseudocode for what it might look like:
Check string for regex match to "{{word1|word2}|{word3|word4}}" pattern
If found, store each one as "{word1|word2}|{word3|word4}" in MatchCollection (mc1)
Split the word at the "|" but not the one inside the brackets, and select a random one (aka, "{word1|word2}" or "{word3|word4}")
Store the new results aka "{word1|word2}" and "{word3|word4}" in a new MatchCollection (mc2)
Now search the string again, this time looking for "{word1|word2}" only and ignore the double "{{" "}}"
Store these in mc2.
I can not split these up normally
Here is the regex I use to search for "{word1|word2}":
Regex regexObj = new Regex(@"\{.*?\}", RegexOptions.Singleline);
MatchCollection m = regexObj.Matches(originalText); //How I store them
Hopefully someone can help, thanks!
Edit: I solved this using a recursive method. I was building an article spinner btw.
That is not parsable using a regular expression; instead you have to use a recursive descent parser. Map it to JSON by replacing:
{ with [
} with ]
| with ,
wordX with "wordX" (regex \w+)
Then your input
{{word1|word2}|{word3|word4}}
becomes valid JSON
[["word1","word2"],["word3","word4"]]
and will map directly to PHP arrays when you call json_decode.
In C#, the same should be possible with JavaScriptSerializer.
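The mapping described above can be tried out in Python as a rough sketch. It only works when every option is a single \w+ word (no spaces or punctuation), and both function names are hypothetical.

```python
import json
import random
import re

def spintax_to_nested_lists(text):
    """Map {a|b} syntax to JSON arrays and parse it (hypothetical helper)."""
    # Quote every word first, then swap the delimiters for JSON ones.
    quoted = re.sub(r'\w+', lambda m: '"' + m.group(0) + '"', text)
    as_json = quoted.replace('{', '[').replace('}', ']').replace('|', ',')
    return json.loads(as_json)

def pick(node, rng=random):
    """Walk down the nested lists, choosing one branch at each level."""
    while isinstance(node, list):
        node = rng.choice(node)
    return node

tree = spintax_to_nested_lists("{{word1|word2}|{word3|word4}}")
print(tree)
print(pick(tree))
```

The JSON parser does the recursive work here, so no hand-written parser is needed for this restricted input.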
I'm really not completely sure WHAT you're asking for, but I'll give it a go:
If you want to get {word1|word2}|{word3|word4} out of any occurrence of {{word1|word2}|{word3|word4}} but not {word1|word2} or {word3|word4}, then use this:
@"\{(\{[^}]*\}\|\{[^}]*\})\}"
...which will match {{word1|word2}|{word3|word4}}, but with {word1|word2}|{word3|word4} in the first matching group.
I'm not sure if this will be helpful or even if it's along the right track, but I'll try to check back every once in a while for more questions or clarifications.
s = "{Spinning|Re-writing|Rotating|Content spinning|Rewriting|SEO Content Machine} is {fun|enjoyable|entertaining|exciting|enjoyment}! try it {for yourself|on your own|yourself|by yourself|for you} and {see how|observe how|observe} it {works|functions|operates|performs|is effective}."
print(spin(s))
If you want to use the [square|brackets|syntax] use this line in the process function:
'/\[(((?>[^\[\]]+)|(?R))*)\]/x',
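The spin function called above is not shown in this thread, so the following Python sketch is only one hypothetical way such a function could handle nested groups with the curly-brace syntax. It repeatedly resolves the innermost {a|b|c} groups until none are left, which covers any nesting depth without a recursive regex.

```python
import random
import re

# A group with no braces inside it, i.e. an innermost {option|option} group.
INNERMOST = re.compile(r'\{([^{}]*)\}')

def spin(text, rng=random):
    """Replace each group with one random option (hypothetical implementation)."""
    while True:
        text, replaced = INNERMOST.subn(
            lambda m: rng.choice(m.group(1).split('|')), text
        )
        if replaced == 0:
            # No groups left (or only unbalanced braces), so stop.
            return text

print(spin("{{word1|word2}|{word3|word4}}"))
print(spin("{Spinning|Rewriting} is {fun|enjoyable}!"))
```

Unlike the JSON mapping, this version accepts options containing spaces and punctuation, as in the sample sentence above.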