Group / split string into 2's set - regex

This is probably a dumb question, but it beats me.
The same thing works perfectly in Python, but not in AS3.
var s:String = "123456";
trace(s.split(/../gm));
Expecting this as array: ['12','34','56']
But instead I get: [,,]
I have experimented with various regex patterns, but none of them split the string into 2-character batches.
Any ideas or solutions?

You're using the split command, which means the string will be divided into an array of values, using the regular expression .. to match the delimiters. These delimiters are not included in the output, and since every pair of characters matches as a delimiter, only the empty strings between them remain.
I think you want to do something like s.match(/../g), which returns the matches themselves. See the String.match documentation for more information.
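The difference between splitting on a pattern and collecting its matches can be sketched in Python, where `re.split` and `re.findall` stand in for AS3's `split` and `match`. This is only an illustration of the distinction, not AS3 code:

```python
import re

s = "123456"

# split() treats every match of ".." as a delimiter and discards it,
# so only the empty strings around the three delimiters are left.
pieces_from_split = re.split(r"..", s)

# findall() returns the matches themselves, like s.match(/../g) in AS3.
pairs = re.findall(r"..", s)

print(pieces_from_split)
print(pairs)
```

The first call yields only empty strings, while the second yields the 2-character groups the question asks for.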

Related

Parsing links with regex

I can't figure out how to write a regular expression correctly. I have loaded some text, and the part that interests me is the links that end with .m3u or .m3u8. For example, if I give my program this input:
Input - player = new Player({"player-id":"1","autoplay":"false","fullscreen":"false","debug":"true","content-volume":"85","ad-volume":"30","ad-load-timeout":"15000","div-id":"videoPlayer","default-quality-index":0,"title":"\u0428\u043f\u0438\u043e\u043d, \u043a\u043e\u0442\u043e\u0440\u044b\u0439 \u043c\u0435\u043d\u044f \u043a\u0438\u043d\u0443\u043b ","poster":"https://test/four/v1/video-file1/00/00/00/00/00/00/00/10/22/11/102211-480p.mp4/thumb-33000.jpg","content":{"mp4":[],"dash":"https://test/four/v1/video-file1/00/00/00/00/00/00/00/10/22/11/102211-,480,p.mp4.urlset/manifest.mpd","hls":"https://test/four/v1/video-file1/00/00/00/00/00/00/00/10/22/11/102211-,480,p.mp4.urlset/master.m3u8"},"about":"false","key":"4eeeb77181526bedc1025586d43a70fa","btn-play-pause":"true","btn-stop":"true","btn-fullscreen":"true","btn-prev-next":"false","btn-share":"true","btn-vk-share":"true","btn-twitter-share":"true","btn-facebook-share":"true","btn-google-share":"true","btn-linkedin-share":"true","quality":"true","volume":"true","timer":"true","timeline":"true","iframe-version":"true","max-hls-buffer-size":"10","time-from-cookie":"true","set-prerolls":["https://test/j/v.php?id=645"],"max-prerolls-impressions":1});
By using regex the output should be -
https://test/four/v1/video-file1/00/00/00/00/00/00/00/10/22/11/102211-,480,p.mp4.urlset/master.m3u8
I have tried writing a regex for this, but it parses all links and not only the ones that I need. I only need the links that end with a specific tag.
Thank you in advance for your answer.

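I don't see why there are so many downvotes; maybe the question originally looked totally different.
Using regex only, my solution in ASP.NET would be to reverse the text first, then look for everything from "u3m" up to the next occurrence of "ptth". Because the lazy match now starts at the extension, it stops at the nearest link start; reversing each match again gives the link.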
Play with it: http://refiddle.com/nwvu
Regex for m3u8 OR m3u:
(8u3m.+?ptth)|(u3m.+?ptth)
ASP String reversal (from https://forums.asp.net/t/1841367.aspx?Reverse+String+in+asp+net):
string input = TextBox1.Text;
char[] inputarray = input.ToCharArray();
Array.Reverse(inputarray);
string output = new string(inputarray);
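The reverse-then-match idea above could be sketched in Python as follows. The function name `find_playlist_links` and the shortened sample URLs are made up for illustration; the pattern is the one from the answer:

```python
import re

# Pattern from the answer, applied to the reversed text: start at a reversed
# extension ("8u3m" or "u3m") and lazily run to the nearest "ptth".
REVERSED_LINK = re.compile(r"(8u3m.+?ptth)|(u3m.+?ptth)")

def find_playlist_links(text):
    """Return links ending in m3u/m3u8 by matching on the reversed text."""
    reversed_text = text[::-1]
    # Each match is itself reversed, so flip it back before returning it.
    return [m.group(0)[::-1] for m in REVERSED_LINK.finditer(reversed_text)]

# Hypothetical, shortened stand-in for the player configuration in the question.
sample = ('"poster":"https://test/a/thumb-33000.jpg",'
          '"dash":"https://test/a/manifest.mpd",'
          '"hls":"https://test/a/master.m3u8"')
links = find_playlist_links(sample)
print(links)
```

Note that this sketch treats any "m3u" in the text as the end of a link, and that several matches would come back in reverse document order.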

How can I use Regex to parse irregular CSV and not select certain characters

I have to handle a weird CSV format, and I have been running into problems. The string I have been able to work out thus far is
(?:\s*(?:\"([^\"]*)\"|([^,]+))\s*?)+?
My files are often broken and irregular, since we have to deal with OCR'd text which is usually not checked by our users. Therefore, we tend to end up with lots of weird things, like a single " within a field, or even a newline character (which is why I am using Regex instead of my previous readLine()-based solution). I've gotten it to parse almost everything correctly, except that it also captures [,] [,]. How can I get it to NOT select fields consisting of only a single comma? When I try to make it not select commas, it turns "156,000" into [156] and [000].
The test string I've been using is
"156,000","",""i","parts","dog"","","Monthly "running" totals"
The desired capture output is
[156,000],[],[i],[parts],[dog],[],[Monthly "running" totals]
I can do with or without the internal quotes, since I can always just strip them during processing.
Thank you all very much for your time.

Your CSV is indeed irregular and difficult to parse. I suggest you do two replacements on your data first.
// remove all invalid doubled "" (inside a verbatim string, each literal " is written as "")
input = Regex.Replace(input, @"(?<!,|^)""""(?=,|$)|(?<=,)""""(?!,|$)", "\"");
// now escape all inner " as \"
input = Regex.Replace(input, @"(?<!,|^)""(?!,|$)", "\\\"");
// at this stage you have proper CSV data, and I suggest using a good .NET CSV parser
// to parse your data and get individual values
Replacement 1 demo
Replacement 2 demo
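As a rough cross-check of the two-replacement idea, here is a Python sketch run against the test string from the question. Two things differ from the answer and are assumptions of this sketch: Python lookbehinds must be fixed-width, so `(?<!,|^)` is written as two lookbehinds, and inner quotes are escaped by doubling (the default convention of Python's `csv` module) rather than with a backslash. The name `repair_and_parse` is made up:

```python
import csv
import re

def repair_and_parse(line):
    """Repair one irregular CSV line, then parse it with the csv module."""
    # Step 1: collapse a stray doubled quote at a field edge ("dog"" or ""i")
    # into a single quote, leaving genuinely empty fields ("") alone.
    line = re.sub(r'(?<!,)(?<!^)""(?=,|$)|(?<=,)""(?!,|$)', '"', line)
    # Step 2: escape quotes inside a field by doubling them (assumption:
    # csv-module convention, not the backslash used in the answer).
    line = re.sub(r'(?<!,)(?<!^)"(?!,|$)', '""', line)
    return next(csv.reader([line]))

sample = '"156,000","",""i","parts","dog"","","Monthly "running" totals"'
fields = repair_and_parse(sample)
print(fields)
```

On this one sample the result matches the desired output from the question; other kinds of damage in OCR'd files may need further rules.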

How to replace characters in string Erlang?

I have this piece of code that gets a session ID, converts it to a string, and then creates a set in Redis with a key such as {{1401,873063,143916},<0.16443.0>}. I'm trying to replace the { characters in this session ID with the letter "a".
OldSessionID= io_lib:format("~p",[OldSession#session.sid]),
StringForOldSessionID = lists:flatten(OldSessionID),
ejabberd_redis:cmd([["SADD", StringForSessionID, StringForUserInfo]]);
I've tried this:
re:replace(N,"{","a",[global,{return,list}]).
Is this a good way of doing this? I read that regexp in Erlang is not an advised way of doing things.

Your solution works, and if you are comfortable with it, you should keep it.
Personally, I prefer a list comprehension: [case X of ${ -> $a; _ -> X end || X <- StringForOldSessionID ]. (just because I don't have to check the function documentation :o)
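For comparison, the same two styles (a regex replacement and a per-character comprehension) look like this in Python. This is only an analogy to the Erlang code above, with the key from the question as sample input:

```python
import re

session_key = "{{1401,873063,143916},<0.16443.0>}"

# Regex version, analogous to re:replace(N, "{", "a", [global, {return, list}]).
via_regex = re.sub(r"\{", "a", session_key)

# Comprehension version, analogous to the Erlang list comprehension:
# map "{" to "a" and keep every other character as it is.
via_comprehension = "".join("a" if ch == "{" else ch for ch in session_key)

print(via_regex)
print(via_comprehension)
```

Both produce the same string, so the choice is mostly a matter of readability.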

re:replace(N,"{","a",[global,{return,list}]).
Is this a good way of doing this? I read that regexp in Erlang is not
a advised way of doing things.
According to official documentation:
2.5 Myth: Strings are slow
Actually, string handling could be slow if done improperly. In Erlang, you'll have to think a little more about how the strings are used and choose an appropriate representation and use the re module instead of the obsolete regexp module if you are going to use regular expressions.
So either you use re for strings, or you leave the { behind by using pattern matching.
If, say, N is {{1401,873063,143916},<0.16443.0>}, then
{{A,B,C},Pid} = N
and you can then format A, B, C and Pid into a string.
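The destructuring idea can be illustrated in Python with tuple unpacking. The pid is represented here as a plain string and the underscore-separated key format is invented for the example; neither comes from the original code:

```python
# Hypothetical stand-in for the Erlang term {{1401,873063,143916},<0.16443.0>}.
session_id = ((1401, 873063, 143916), "<0.16443.0>")

# Unpack the nested structure instead of stringifying it and editing braces.
(a, b, c), pid = session_id

# Build the key from the parts; this separator is an arbitrary choice.
key = f"{a}_{b}_{c}_{pid}"
print(key)
```

Because the braces never enter the string, there is nothing left to replace.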

Since Erlang/OTP 20.0 you can use the string:replace function from the string module.
string:replace replaces SearchPattern in String with Replacement. The optional fourth argument indicates whether the leading (the default), the trailing or all occurrences of SearchPattern are to be replaced.
string:replace(Input, "{", "a", all).

Find/Replace string that doesn't contain quotes

I have inherited a rather large/ugly PHP codebase (the language is unimportant; this is a generic Vim question), where nothing is quoted properly (old PHP doesn't mind, but new PHP versions throw warnings).
I'd like to turn $something[somekey] into $something['somekey'], but only if the key is not already quoted and does not contain the character $.
I was trying to build a regular expression to quote the keys, but I just can't get it to cooperate.
This is what I have so far. It doesn't work, but maybe it will help explain my question better and show that I have actually tried.
:%s/\v\$(.{-})\[(['"$]@<!.{-})\]/$\1['\2']/
My goal is to have something like this:
$something[somekey] = $something['somekey']
$somethingelse[someotherthing] = $somethingelse['someotherthing']
$another['key'] = $another['key'] (is ignored)
$yetanother["keykey"] = $yetanother["keykey"] (is ignored)
$derp[$herp] = $derp[$herp] (is ignored)
$array[3] = $array[3] (is ignored)
These can appear anywhere in the text, even several times on the same line, and even touching each other like $something[key]$something[key2], which I would like to be replaced with $something['key']$something['key2'].
Another problem: there seem to be random JavaScript arrays in some files, which also use [] square brackets. So the regex needs to check that the match starts with $ and has text before the brackets.
I'm probably asking for the impossible, but any help on this would be great before I go insane editing each file manually, one by one.
EDIT: I forgot that keys can be numeric, and those shouldn't be quoted.

I tried the following, which processed everything from your question correctly:
:%s/\[\(\I\i*\)\]/['\1']/g
Or, with optional whitespace inside the brackets:
:%s/\[\s*\(\I\i*\)\s*\]/['\1']/g
And also checking for $identifier before the brackets:
:%s/\(\$\i\+\)\[\s*\(\I\i*\)\s*\]/\1['\2']/g
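To check the last substitution against the cases listed in the question, here is a rough Python equivalent. It assumes that Vim's `\I\i*` (an identifier that does not start with a digit) can be approximated by `[A-Za-z_]\w*` and `\$\i\+` by `\$\w+`; the exact meaning of `\i` in Vim depends on the 'isident' option. The names `UNQUOTED_KEY` and `quote_keys` are made up:

```python
import re

# Approximation of the Vim pattern \(\$\i\+\)\[\s*\(\I\i*\)\s*\]:
# a $variable, then a bracketed bare key that does not start with a digit.
UNQUOTED_KEY = re.compile(r"(\$\w+)\[\s*([A-Za-z_]\w*)\s*\]")

def quote_keys(source):
    """Quote bare array keys; leave quoted, numeric and $variable keys alone."""
    return UNQUOTED_KEY.sub(r"\1['\2']", source)

print(quote_keys("$something[key]$something[key2]"))
print(quote_keys("$array[3] $derp[$herp] $another['key'] items[index]"))
```

Requiring the leading `$variable` is what keeps JavaScript-style `items[index]` untouched.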

RegEx for VB.net

I have a txt file with this content:
$NETS
P3V3_AUX_LGATE; PQ6.8 PU37.2
U335_PIN1; R3328.1 U335.1
$END
It needs to be updated to this format and saved to another txt file:
$NETS
'P3V3_AUX_LGATE'; PQ6.8 PU37.2
'U335_PIN1'; R3328.1 U335.1
$END
NOTE: number of lines may go up to 10,000 lines
My current solution is to read the txt file line by line, detect the presence of the ";" and the newline character, and make the changes.
Right now I have a variable that holds ALL the lines. Is there another way, something like a replace via regex, to make the changes without looping through each line, so that I can readily print the result?
And a follow-up question: which one is more efficient?

Try
ResultString = Regex.Replace(SubjectString, "^([^;\r\n]+);", "'$1';", RegexOptions.Multiline)
on your multiline string.
This will find any string (length one or more) at the start of a line up until the first semicolon if there is one and replace it with its quoted equivalent.
It should be more efficient than looping through the string line by line as you're doing now, but if you're in doubt, you'd have to profile it.
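The same multiline replacement can be tried in Python, where `re.MULTILINE` plays the role of `RegexOptions.Multiline` and `\1` replaces `$1`. The function name `quote_net_names` is made up, and the sample is the file content from the question:

```python
import re

def quote_net_names(text):
    """Quote everything from the start of a line up to its first semicolon."""
    # ^ matches at the start of every line because of re.MULTILINE;
    # lines without a semicolon ($NETS, $END) are left untouched.
    return re.sub(r"^([^;\r\n]+);", r"'\1';", text, flags=re.MULTILINE)

sample = "$NETS\nP3V3_AUX_LGATE; PQ6.8 PU37.2\nU335_PIN1; R3328.1 U335.1\n$END\n"
result = quote_net_names(sample)
print(result)
```

One call handles the whole string, so no explicit loop over lines is needed.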

You could probably find all the matches using something like \w+; but I don't know how you'd do a replace on that using Regex.Replace to add the quotes while keeping the original match.
However, if you already have it in one variable, you don't have to read the file again. Either you could make your code find all the ;s and then find the previous newline for each, or you could use String.Split on newlines to split the variable you've already got into lines.
And if you want to get it back to one variable you can just use String.Join.
Personally I'd normally use the String.Split (and possibly the String.Join if needed) method, since I think that would make the code easy to read.

I would say yes! This can be done with regular expressions. Make sure you have the "multiline" option turned on, and craft your regular expression using some capture groups to ease the work.
I can however say this will NOT be the optimal approach. Since you mention the number of lines you could be processing, it seems smarter resource-wise to use a streaming approach instead of the in-memory approach.
Taking the regex approach (this took 15 minutes, so please don't think this is an optimal solution, just proof that it would work):
private static Regex matcher = new Regex(@"^\$NETS\r\n(?<entrytitle>.[^;]*);\s*(?<entryrest>.*)\r\n(?<entrytitle2>.[^;]*);\s*(?<entryrest2>.*)\r\n\$END\r\n", RegexOptions.Compiled | RegexOptions.Multiline);
static void Main(string[] args)
{
    string newString = matcher.Replace(ExampleFileContent, new MatchEvaluator(evaluator));
}
static string evaluator(Match m)
{
    return String.Format("$NETS\r\n'{0}'; {1}\r\n'{2}'; {3}\r\n$END\r\n",
        m.Groups["entrytitle"].Value,
        m.Groups["entryrest"].Value,
        m.Groups["entrytitle2"].Value,
        m.Groups["entryrest2"].Value);
}
Hope this helps,
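The streaming alternative mentioned above, as opposed to holding the whole file in memory, might look like this in Python. This is a sketch under assumptions: `quote_net_names_streaming` is a made-up name, and `io.StringIO` objects stand in for the input and output files:

```python
import io

def quote_net_names_streaming(src, dst):
    """Copy src to dst line by line, quoting the text before the first ';'."""
    for line in src:
        name, sep, rest = line.partition(";")
        # Lines without a semicolon ($NETS, $END) are copied unchanged.
        dst.write(f"'{name}';{rest}" if sep else line)

# In-memory stand-ins; real code would pass open file objects instead.
src = io.StringIO("$NETS\nP3V3_AUX_LGATE; PQ6.8 PU37.2\nU335_PIN1; R3328.1 U335.1\n$END\n")
dst = io.StringIO()
quote_net_names_streaming(src, dst)
print(dst.getvalue())
```

Only one line is held at a time, so memory use stays flat no matter how many lines the file has.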
