Adding a > to markup using regex - regex

really easy question
$config['markup'][] = array("/\[gt\](.+?)\[\/gt\]/s", "<span class=\"quote\">\$1</span>");
With that current code, [gt]foobar[/gt] will output foobar and apply the span and it works fine, however, I was wondering if
$config['markup'][] = array("/\[gt\](.+?)\[\/gt\]/s", "<span class=\"quote\">\>\$1</span>");
would make it so that [gt]foobar[/gt] would output >foobar
if that isn't the case, how could I achieve the desired output?

Related

Regex for XAML Formatting

I'm attempting to build a PowerShell CmdLet that can parse and cleanly reformat a chunk of XAML or any other markup language.
So far, I've had to build an assortment of CmdLet's so that I can get the correct information to put into this thing (for indentation, counts, items, child items, etc, so forth...)
What I'm attempting to do is to collect ALL of the properties and values in a set of XAML/HTML, etc, and then once I have the lengths of all those variables, I can then start to chunk them out and properly format them so that they all output down a straight line. It may not make a super amount of sense as I describe it? So, here's an example.
<Window xmlns = 'http://schemas.microsoft.com/winfx/2006/xaml/presentation'
xmlns:x = 'http://schemas.microsoft.com/winfx/2006/xaml'
Title = 'Window Title'
Height = '600'
MinHeight = '600'
Width = '800'
MinWidth = '800'
BorderBrush = 'Black'
ResizeMode = 'CanResize'
HorizontalAlignment = 'Center'
WindowStartupLocation = 'CenterScreen'>
The reason I am attempting to build this, is so that I can programmatically save the instructions to a smaller footprint. So, instead of... having fluctuating numbers for each line and item and the end result looking like this...
<Window xmlns='http://schemas.microsoft.com/winfx/2006/xaml/presentation'
xmlns:x = 'http://schemas.microsoft.com/winfx/2006/xaml' Title = 'Window Title' Height = '600'
MinHeight = '600' Width = '800' MinWidth = '800' BorderBrush = 'Black' ResizeMode = 'CanResize'
HorizontalAlignment = 'Center' WindowStartupLocation = 'CenterScreen'>
...I then have a set of instructions that can vectorize the content of the XAML, so that it has a pattern and less randomness. Sure, the line count might get expanded quite a bit, but there's no need to be concerned with that if all it is doing is expanding into RAM. Which is the point of it...
At any rate, the code that I am having trouble with is essentially a way to preserve the spacing between the quoted objects. I feel like I'm beating my head against a wall trying to get this to work correctly when I know it's a matter of Regex ...
I've posted the code I'm talking about via this link.
https://github.com/secure-digits-plus-llc/FightingEntropy/blob/master/Format-XAML.ps1
Lines 43-147
It is a script block, and testing with it requires a Xaml Here String.
Any suggestions would be appreciated. I'm not much of a Regex fan, I understand some basics to it but I'm not that great with it yet.
-MC
Found the answer I was looking for.
Not the most eloquent way to solve the issue I was having, but it works.
"(?<=\').+?(?=\')"
When the lines are split, and you want to preserve the spacing within the quotes, then you need something like this.
I was attempting to iterate through a do loop until the array/string contained (2) single quotes, but what was happening was... 'oh. I thought you wanted to match 'adbhjikvgrfe' with '21345rfs'.
No regex. Wasn't looking to match that. sigh.
Then it was taking the spacing out between the quotes.
sigh
I gotta say... anyone who truly writes good programming...? Well, I tip my hat off to you good sir/ma'am... because... it's a frustrating job. For certain.

Regex (re2 googlesheets) multiple values in multiline cell

Getting stuck on how to read and pretty up these values from a multiline cell via arrayformula.
Im using regex as preceding line can vary.
just formulas please, no custom code
The first column looks like a set of these:
```
[config]
name = the_name
texture = blah.dds
cost = 1000
[effect0]
value = 1000
type = ATTR_A
[effect1]
value = 8
type = ATTR_B
[feature0]
name = feature_blah
[components]
0 = comp_one,1
[resources]
res_one = 1
res_five = 1
res_four = 1
<br/>
Where to be useful elsewhere, at minimum it needs each [tag] set ([effect\d], [feature\d], ect) to be in one column each, for example the 'effects' column would look like:
ATTR_A:1000,ATTR_B:8
and so on.
Desired output can also be seen in the included spreadsheet
<br/>
<b>Here is the example spreadsheet:</b>
https://docs.google.com/spreadsheets/d/1arMaaT56S_STTvRr2OxCINTyF-VvZ95Pm3mljju8Cxw/edit?usp=sharing
**Current REGEXREPLACE**
Kinda works, finds each 'type' and 'value' great, just cant figure out how to extract just that from the rest, tried capture (and non-capturing) groups before and after but didnt work
=ARRAYFORMULA(REGEXREPLACE($A3:$A,"[\n.][effect\d][\n.](.)\n(.)","1:$1 2:$2"))
**Current SUBSTITUTE + REGEXEXTRACT + REGEXREPLACE**
A different approach entirely, also kinda works, longer form though and left with having to parse the values out of that string, where got stuck again. Idea was to use this to simplify, then regexreplace like above. Getting stuck removing content around the final matches though, and if can do that then above approach is fine too.
// First ran a substitute
=ARRAYFORMULA(SUBSTITUTE(SUBSTITUTE($A3:$A,char(10),";"),";;",char(10)))
// Then variation of this (gave up on single line 'effect/d' so broke it up to try and get it working)
=ARRAYFORMULA(IF(A3:A<>"",IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect0]);(.)$")&";;")&""&IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect1]);(.)$")&";;")&""&IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect2]);(.)$")&";;"),""))
// Then use regexreplace like above
=ARRAYFORMULA(REGEXREPLACE($B3:$B,"value = (.);type = (.);;","1:$1 2:$2"))
**--EDIT--**
Also, as my updated 'Desired Output' sheet shows (see timestamped comment below), bonus kudos if you can also extract just the values of matching 'type's to those extra columns (see spreadsheet).
All good if you cant though, just realized would need that too for lookups.
**--END OF EDIT--**
<br/>
Ive tried dozens of things, discarding each in turn, had a quick look in version history to grab out two promising attempts and shared them in separate sheets.
One of these also used SUBSTITUTE to simplify input column, im happy for a solution using either RAW or the SUBSTITUTE results.
<br/>
**Potentially Useful links:**
https://github.com/google/re2/wiki/Syntax
<br/>
<b>Just some more words:</b>
I also have looked at dozens of stackoverflow and google support pages, so tried both REGEXEXTRACT and REGEXREPLACE, both promising but missing that final tweak. And i tried dozens of tweaks already on both.
Any help would be great, and hopefully help others in future since examples with spreadsheets are great since every new REGEX seems to be a new adventure ;)
<br/>
P.S. if we can think of better title for OP, please say in comment or your answer :)
paste in B3:
=ARRAYFORMULA(SUBSTITUTE(TRIM(TRANSPOSE(QUERY(TRANSPOSE(
IF(C3:E<>"", C2:E2&":"&C3:E, )),,999^99))), " ", ", "))
paste in C3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&C2)))
paste in D3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&D2)))
paste in E3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&E2)))
paste in F3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[feature\d+\]\nname = (.*)")))
paste in G3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[components\]\n\d+ = (.*)")))
paste in H3:
=ARRAYFORMULA(IFNA(REGEXREPLACE(INDEX(SPLIT(REGEXEXTRACT(
REGEXREPLACE(A3:A, "\n", ", "), "\[resources\], (.*)"), "["),,1), ", , $", )))
spreadsheet demo
This was a fun exercise. :-)
Caveat first: I have added some "input data". Examples:
[feature1]
name = feature_active_spoiler2
[components]
0 = spoiler,1
1 = spoilerA, 2
So the output has "extra" output.
See the tab ADW's Solution.

Using Regex in Pig in hadoop

I have a CSV file containing user (tweetid, tweets, userid).
396124436476092416,"Think about the life you livin but don't think so hard it hurts Life is truly a gift, but at the same it is a curse",Obey_Jony09
396124436740317184,"“#BleacherReport: Halloween has given us this amazing Derrick Rose photo (via #amandakaschube, #ScottStrazzante) http://t.co/tM0wEugZR1” yes",Colten_stamkos
396124436845178880,"When's 12.4k gonna roll around",Matty_T_03
Now I need to write a Pig Query that returns all the tweets that include the word 'favorite', ordered by tweet id.
For this I have the following code:
A = load '/user/pig/tweets' as (line);
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'(.*)[,”:-](.*)[“,:-](.*)')) AS (tweetid:long,msg:chararray,userid:chararray);
C = filter B by msg matches '.*favorite.*';
D = order C by tweetid;
How does the regular expression work here in splitting the output in desired way?
I tried using REGEX_EXTRACT instead of REGEX_EXTRACT_ALL as I find that much more simpler, but couldn't get the code working except for extracting just the tweets:
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT(line,'[,”:-](.*)[“,:-]',1)) AS (msg:chararray);
the above alias gets me the tweets, but if I use REGEX_EXTRACT to get the tweet_id, I do not get the desired o/p: B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT(line,'(.*)[,”:-]',1)) AS (tweetid:long);
(396124554353197056,"Just saw #samantha0wen and #DakotaFears at the drake concert #waddup")
(396124554172432384,"#Yutika_Diwadkar I'm just so bright 😁")
(396124554609033216,"#TB23GMODE i don't know, i'm just saying, why you in GA though? that's where you from?")
(396124554805776385,"#MichaelThe_Lion me too 😒")
(396124552540852226,"Happy Halloween from us 2 #maddow & #Rev_AlSharpton :) http://t.co/uC35lDFQYn")
grunt>
Please help.
Can't comment, but from looking at this and testing it out, it looks like your quotes in the regex are different from those in the csv.
" in the csv
” in the regex code.
To get the tweetid try this:
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT(line,'.*(,")',1)) AS (tweetid:long);

Pig capture matching string with regex

I am trying to capture image url's from inside tweets.
REGISTER 'hdfs:///user/cloudera/elephant-bird-pig-4.1.jar';
REGISTER 'hdfs:///user/cloudera/elephant-bird-core-4.1.jar';
REGISTER 'hdfs:///user/cloudera/elephant-bird-hadoop-compat-4.1.jar';
--Load Json
loadJson = LOAD '/user/cloudera/tweetwall' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map []);
B = FOREACH loadJson GENERATE flatten(json#'tweets') as (m:map[]);
tweetText = FOREACH B GENERATE FLATTEN(m#'text') as (str:chararray);
intermediate date looks like this:
(#somenameontwitter your nan makes me laugh with some of the things she comes out with like http://somepics.com/my.jpg)
then I try to do the following to get only the image url back :
x = foreach tweetText generate REGEX_EXTRACT_ALL(str, '((http)(.*)(.jpg|.bmp|.png))');
dump x;
but that doesn't seem to work. I have also been trying with filter to no avail.
Even when trying the above with .* it returns empty results () or (())
I'm not good with regex and pretty new to Pig so it could be that I'm missing something simple here that I'm just not seeing.
update
example input data
{"tweets":[{"created_at":"Sat Nov 01 23:15:45 +0000 2014","id":5286804225,"id_str":"5286864225","text":"#Beace_ your nan makes me laugh with some of the things she comes out with blabla http://t.co/b7hjMWNg is an url, but not a valid one http://www.something.com/this.jpg should be a valid url","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":52812992878592,"in_reply_to_status_id_str":"522","in_reply_to_user_id":398098,"in_reply_to_user_id_str":"3","in_reply_to_screen_name":"Be_","user":{"id":425,"id_str":"42433395","name":"SAINS","screen_name":"sa3","location":"Lincoln","profile_location":null,"description":"","url":null,"entities":{"description":{"urls":[]}},"protected":false,"followers_count":92,"friends_count":526,"listed_count":0,"created_at":"Mon May 25 16:18:05 +0000 2009","favourites_count":6,"utc_offset":0,"time_zone":"London","geo_enabled":true,"verified":false,"statuses_count":19,"lang":"en","contributors_enabled":false,"is_translator":false,"is_translation_enabled":false,"profile_background_color":"EDECE9","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme3\/bg.gif","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme3\/bg.gif","profile_background_tile":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/52016\/DGDCj67z_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/526\/DGDCj67z_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/424395\/13743515","profile_link_color":"088253","profile_sidebar_border_color":"D3D2CF","profile_sidebar_fill_color":"E3E2DE","profile_text_color":"634047","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":false,"follow_request_sent":false,"notifications":false},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":1,"entities":{"hashtags":[],"symbols":[],"user_mentions":[{"screen_name":"e_","name":"\u2601\ufe0f effy","id":3998,"id_str":"398","indices":[0,15]}],"urls":[]},"favorited":false,"retweeted":false,"lang":"en"}]}
Try this and let me know if this works
x = foreach tweetText generate REGEX_EXTRACT(str,'.*(http://.*.[jpg|bmp|png])',1);
DUMP x;
I managed to get it working (though I doubt it is totally optimal)
x = foreach tweetText generate REGEX_EXTRACT(str,'(http://.*(.jpg|.bmp|.png))',1) as image;
filtered = FILTER x BY $0 is not null;
dump filtered;
so the initial problem was just the regex (and my lack of knowledge on the subject).
Thanks for the assistance sivasakthi jayaraman!

Stacking related lines together in notepad++

Hi so I'm trying to use find and replace in notepad++ with regular expression to do the following:
I have two set of lines
first set:
[c][eu][e]I37ANKCB[/e]
[c][eu][e]OIL8ZEPW[/e]
[c][eu][e]4OOEL75O[/e]
[c][eu][e]PPNW5FN4[/e]
[c][eu][e]E2BXCWUO[/e]
[c][eu][e]SD9UQNT8[/e]
[c][eu][e]E6BK6IGO[/e]
second set:
[u]7ubju2jvioks[u2]_261
[u]89j408tah1lz[u2]_262
[u]j673xnd49tq0[u2]_263
[u]dv73osmh1wzu[u2]_264
[u]twz3u4yiaeqr[u2]_265
[u]cuhtg6r71kud[u2]_266
[u]yts0ktvt9a3r[u2]_267
now I want to the second set to by places after each of the first set like this:
[c][eu][e]I37ANKCB[/e][u]7ubju2jvioks[u2]_261
[c][eu][e]OIL8ZEPW[/e][u]89j408tah1lz[u2]_262
[c][eu][e]4OOEL75O[/e][u]j673xnd49tq0[u2]_263
[c][eu][e]PPNW5FN4[/e][u]dv73osmh1wzu[u2]_264
[c][eu][e]E2BXCWUO[/e][u]twz3u4yiaeqr[u2]_265
[c][eu][e]SD9UQNT8[/e][u]cuhtg6r71kud[u2]_266
[c][eu][e]E6BK6IGO[/e][u]yts0ktvt9a3r[u2]_267
any suggestions?
You can mark the second block in column mode using ALT and the left mouse button. Then just copy paste it at the end of the first row.
No need/Not possible using regex.
I would solve this via a simple script written in Python or Ruby or something equally quick. This works, for example:
import os
path = os.path.dirname(__file__)
with open(os.path.join(path, 'file1')) as file1:
with open(os.path.join(path, 'file2')) as file2:
lines = zip(file1.readlines(), file2.readlines())
print ''.join([a.rstrip() + b for a, b in lines])
Running it gives the correct result:
> python join.py
[c][eu][e]I37ANKCB[/e][u]7ubju2jvioks[u2]_261
[c][eu][e]OIL8ZEPW[/e][u]89j408tah1lz[u2]_262
[c][eu][e]4OOEL75O[/e][u]j673xnd49tq0[u2]_263
[c][eu][e]PPNW5FN4[/e][u]dv73osmh1wzu[u2]_264
[c][eu][e]E2BXCWUO[/e][u]twz3u4yiaeqr[u2]_265
[c][eu][e]SD9UQNT8[/e][u]cuhtg6r71kud[u2]_266
[c][eu][e]E6BK6IGO[/e][u]yts0ktvt9a3r[u2]_267
Customize to suit your needs.