I'm trying to fix malformed XML with a regex in VS Code.
The problem is self-closing tags: when the project is built, these end up nested as children. So I just need a simple regex pattern that matches certain elements only (polygon, here) and ignores the others.
I'm using this regex pattern:
(?:\.[0-9]+\"\s)\/\>
I want to match: tags that end with a decimal point, one or more digits, and a closing double quote.
I want to replace: only what comes after that (the />).
Problem: it replaces the entire match, including the text inside the non-capturing group (?:...), instead of only the /> that follows it.
Example:
...
<stop offset="1" stop-color="#7ccb13" />
<stop offset="1" stop-color="#7ccb55" />
...
<polygon points="393.38 285.04 394.44 284.63 393.67 283.79 394.61 284.38 394.59 283.31 394.94 284.34 395.79 283.54 395.21 284.5 396.3 284.49 395.28 284.83 396.16 285.64 395.05 285.05 395.12 286.27 394.73 285.1 393.89 285.88 394.43 284.99 393.38 285.04" />
<polygon points="159.41 439.9 160.88 439.35 159.82 438.19 161.12 439.01 161.08 437.52 161.57 438.95 162.74 437.84 161.94 439.16 163.46 439.15 162.04 439.63 163.25 440.75 161.72 439.92 161.82 441.61 161.28 440 160.12 441.07 160.87 439.84 159.41 439.9" />
<polygon points="180.71 444.75 182.18 444.2 181.11 443.04 182.41 443.86 182.38 442.37 182.87 443.8 184.04 442.69 183.24 444.01 184.75 444 183.33 444.48 184.55 445.59 183.02 444.77 183.12 446.46 182.57 444.85 181.42 445.92 182.16 444.69 180.71 444.75" />
...
Desired output: explicit closing tags.
I think (<polygon.+)(\/>) should work, with $1></polygon> as the replacement.
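The original pattern replaces too much because a non-capturing group is still part of the overall match; only text held in a capture group and referenced in the replacement is kept. As a rough sketch of the capture-group approach, here is the same idea in Python, where the back-reference is written \1 instead of VS Code's $1. The name close_polygons is made up for illustration, the sample points are shortened, and the pattern assumes no attribute value contains a > character.

```python
import re

# Capture everything from "<polygon" up to (but not including) the "/>".
# [^>]*? keeps the match inside one tag; \s* drops the space before "/>".
SELF_CLOSING_POLYGON = re.compile(r'(<polygon\b[^>]*?)\s*/>')

def close_polygons(xml_text: str) -> str:
    """Rewrite <polygon ... /> as <polygon ...></polygon>; other tags are untouched."""
    return SELF_CLOSING_POLYGON.sub(r'\1></polygon>', xml_text)

sample = (
    '<stop offset="1" stop-color="#7ccb13" />\n'
    '<polygon points="393.38 285.04 394.44 284.63" />\n'
)
print(close_polygons(sample))
```

Using [^>]*? instead of .+ keeps the match from running across several tags if more than one ends up on the same line.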
I am trying to write a regex for a Fluent Bit parser and am not sure how to drop specific characters from a string.
<testsuite name="Activity moved" tests="1" errors="0" failures="0" skipped="0" time="151.109" timestamp="2022-09-05T16:22:53.184000">
The line above is the input, which I have as a string, and I want to extract multiple keys from it.
Expected output:
name: Activity moved
tests: 1
errors: 0
failures: 0
skipped: 0
timestamp: 2022-09-05T16:22:53.184000
How can I achieve this, please?
Try this:
str = "<testsuite name=\"Activity moved\" tests=\"1\" errors=\"0\" failures=\"0\" skipped=\"0\" time=\"151.109\" timestamp=\"2022-09-05T16:22:53.184000\">"
regexp = /(\w*)="(.*?)"/ # there's your regexp
str.scan(regexp).to_h # and this is how you make the requested hash
# => {"name"=>"Activity moved", "tests"=>"1", "errors"=>"0", "failures"=>"0", "skipped"=>"0", "time"=>"151.109", "timestamp"=>"2022-09-05T16:22:53.184000"}
Of course you can write your own parser, but maybe it's more convenient to use Nokogiri?
require 'nokogiri'
doc = Nokogiri::XML(File.open("your.file", &:read))
puts doc.at("testsuite").attributes.map { |name, value| "#{name}: #{value}" }
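For comparison, the scan-into-a-hash idea can be sketched in Python as well. This is only an illustration of the same regex; parse_attributes is a made-up helper name, and it assumes every attribute value is wrapped in double quotes.

```python
import re

line = ('<testsuite name="Activity moved" tests="1" errors="0" failures="0" '
        'skipped="0" time="151.109" timestamp="2022-09-05T16:22:53.184000">')

def parse_attributes(tag: str) -> dict:
    """Return a dict of attribute name -> value for every name="value" pair."""
    # findall yields one (name, value) tuple per match; dict() keeps their order.
    return dict(re.findall(r'(\w+)="(.*?)"', tag))

attrs = parse_attributes(line)
for key, value in attrs.items():
    print(f"{key}: {value}")
```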
Let's say I have a line of text like this:
アイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヲンヴガギグゲゴザジズゼゾダヂヅデドバビブベボパピプペポァィゥェォャュョッアイウエオカキクケコサシスセソタチツテトナアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヲンヴガギグゲゴザジズゼゾダヂヅデドバビブベボパピプペポァィゥェォャュョッアイウエオカキクケコサシスセソタチツテトナアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヲンヴガギグ
I want to verify whether the input is katakana, so I use this regex:
'/^[゠ ァ ア ィ イ ゥ ウ ェ エ ォ オ カ ガ キ ギ ク グ ケ ゲ コ ゴ サ ザ シ ジ ス ズ セ ゼ ソ ゾ タ ダ チ ヂ ッ ツ ヅ テ デ ト ド ナ ニ ヌ ネ ノ ハ バ パ ヒ ビ ピ フ ブ プ ヘ ベ ペ ホ ボ ポ マ ミ ム メ モ ャ ヤ ュ ユ ョ ヨ ラ リ ル レ ロ ヮ ワ ヰ ヱ ヲ ン ヴ ヵ ヶ ヷ ヸ ヹ ヺ ・ ー ヽ ヾ ヿ⦅ ⦆ 。 「 」 、 ・ ヲ ァ ィ ゥ ェ ォ ャ ュ ョ ッ ー ア イ ウ エ オ カ キ ク ケ コ サ シ ス セ ソ タ チ ツ テ ト ナ ニ ヌ ネ ノ ハ ヒ フ ヘ ホ マ ミ ム メ モ ヤ ユ ヨ ラ リ ル レ ロ ワ ン ゙]+$/'
Is there some way to compact that?
I know it's hard-coded. Before that I used ^[ァ-ヴーァ-ン゙゚]+$, but it does not work in a Laravel request rule.
Your regex ァ-ヴーァ-ン゙゚ is correct; you just need to add the /u modifier to make it work.
So the correct regex is:
/^[ァ-ヴーァ-ン゙゚]+$/u
Or, as an example in Laravel validation:
'name' => 'required|regex:/^[ァ-ヴーァ-ン゙゚]+$/u',
The /u modifier enables Unicode (UTF-8) support.
You can also use Unicode code points (in hexadecimal) as the regex range. An example for katakana is ([\u30a0-\u30ff]*), but in PHP's PCRE, \u must be written as \x{...}, like:
'name' => 'required|regex:/^[\x{30a0}-\x{30ff} ]+$/u',
Also, you can check this gist for other katakana and hiragana regex. Example:
Regex for matching full-width Katakana (zenkaku 全角)
([ァ-ン])
Regex for matching half-width Katakana (hankaku 半角)
([ｧ-ﾝﾞﾟ])
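The code-point ranges above can be tried outside Laravel too. Below is a minimal sketch in Python, where strings are Unicode by default, so no /u-style modifier is needed. The helper name is_katakana and the allow_half_width option are assumptions made for this example, and the half-width range U+FF66 to U+FF9F is taken from the Unicode half-width katakana block rather than from the answers above.

```python
import re

FULL_WIDTH = r'\u30a0-\u30ff'   # full-width katakana block, includes the long-vowel mark
HALF_WIDTH = r'\uff66-\uff9f'   # half-width katakana, includes the half-width voicing marks

def is_katakana(text: str, allow_half_width: bool = False) -> bool:
    """Return True if text is non-empty and consists only of katakana characters."""
    ranges = FULL_WIDTH + (HALF_WIDTH if allow_half_width else '')
    # fullmatch anchors at both ends, like ^...$ in the PHP pattern.
    return re.fullmatch(f'[{ranges}]+', text) is not None

print(is_katakana('カタカナ'))
print(is_katakana('ひらがな'))
```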
I have XML code in which some tags cause XML parse errors (Error #1090). The problem is attribute values that need to be quoted:
<div class=treeview>
Please help me write a regular expression that turns them into the following:
<div class="treeview">
This one will be correct:
var pattern:RegExp = /(\w+)(=)(\w+)/g;
trace('regexTest:', pString.replace(pattern, '$1$2"$3"'));
because there must be 3 groups: attribute_name, = (equals), and attribute_value.
Please try the following code:
var regExp:RegExp = /(class\=)(\w+)/g;
var sourceText:String = "<div class=treeview>";
var replacedText:String = sourceText.replace(regExp, '$1"$2"');
trace(replacedText);
In a nutshell, this RegExp means:
Find 2 groups: (class=) and (any-word-after-it).
Put quotes before and after group 2.
You should try the following regex:
regex = /(<div[^>]*class=)(\S+)([^>]*>)/g;
sourceString.replace(regex, '$1"$2"$3');
Try using a general purpose markup repair tool such as John Cowan's TagSoup. This is likely to be much more robust than anything you attempt yourself (for example, most of the suggested regular expressions don't even check that the keyword=value construct is within a start tag).
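To illustrate the point about checking that the keyword=value construct sits inside a start tag, here is a rough two-step sketch in Python: first find start tags, then quote unquoted values only within them. It is not a substitute for a real repair tool; quote_attributes is a made-up name, and the sketch will still misfire on edge cases such as name=value text inside an already quoted attribute or an unquoted value directly followed by />.

```python
import re

# A start tag: "<", a letter, then anything up to the next ">".
START_TAG = re.compile(r'<[A-Za-z][^<>]*>')
# Inside a tag: whitespace, an attribute name, "=", then an unquoted value.
UNQUOTED_ATTR = re.compile(r'''(\s[\w:-]+=)([^\s"'>]+)''')

def quote_attributes(markup: str) -> str:
    """Wrap unquoted attribute values in double quotes, but only inside start tags."""
    def fix_tag(match):
        return UNQUOTED_ATTR.sub(r'\1"\2"', match.group(0))
    return START_TAG.sub(fix_tag, markup)

print(quote_attributes('<div class=treeview>'))
```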
I am trying to use logback's replace feature to not have empty values printed in my MDC log pattern.
http://logback.qos.ch/manual/layouts.html#replace
I am trying to follow an example from here
http://logogin.blogspot.com/2013/04/logback-mdc-and-empty-values.html
Some background
90% of the time, my log pattern prints:
2014-08-28 11:30:27,014 emp:Peter org:IT Expense submitted
5% of the time, it prints:
2014-08-28 11:30:27,014 emp: org: Cleanup jobs.
This is because emp and org do not need to be supplied in the MDC in the latter case.
For these cases, I want emp: and org: to not be present at all in the log line.
Desired
2014-08-28 11:30:27,014 Cleanup jobs.
Possible solution with replace
Here is my variable and the appender I am using. The idea is that the mdcPattern will resolve to an empty string for no emp and org values.
<variable scope="context" name="mdcPattern" value="%replace( emp:%X{empName} org:%X{orgName} ) {'[a-z]+:( |$)', ''}"/>
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>%d ${mdcPattern} %thread %-5level %logger{25} - %msg%n</pattern>
</encoder>
</appender>
However the replace regex does not work. I see log lines as:
2014-08-28 11:30:27,014 emp: org: {'[a-z]+:( |$
My regex is a bit weak. I can't seem to understand why the replace pattern is appearing as is in my log line. Any help is greatly appreciated.
With help from the author of the original blog post, I've managed to get this to work.
He has provided another example on GitHub.
http://logogin.blogspot.com/2013/04/logback-mdc-and-empty-values.html
https://gist.github.com/logogin/ff44c254f655340b653c
I had extra spaces around the closing parenthesis of %replace(...) in the pattern, which I've removed.
Before: {orgName} ) {'[a-z]+
After: {orgName}){'[a-z]+
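The regex itself can be checked separately from the logback configuration. The sketch below applies the same pattern in Python to the text that the %replace group would produce; it assumes Python's and Java's regex engines treat this simple pattern the same way, and strip_empty_keys is a made-up helper name.

```python
import re

# A lowercase key followed by ":" and then either a space or the end of the text,
# i.e. a key whose MDC value is empty.
EMPTY_MDC_KEY = re.compile(r'[a-z]+:( |$)')

def strip_empty_keys(rendered: str) -> str:
    """Remove 'key:' entries that have no value after the colon."""
    return EMPTY_MDC_KEY.sub('', rendered)

print(repr(strip_empty_keys(' emp: org:')))
print(repr(strip_empty_keys(' emp:Peter org:IT')))
```

Keys that carry a value are left alone because the character after the colon is neither a space nor the end of the text.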
In my scenario the UAS receives two Via headers. Using [last_Via:], it replies with 183 and with 200 OK for the subsequent PRACK, but for 180 Ringing and the 200 OK to the original INVITE it needs those two Via headers. How do I store them in a variable so that I can use them here?
Approach that I googled:
<ereg regexp="[Vv][Ii][Aa][ ]*:[ ](.*)$" search_in="msg" check_it="true" assign_to="1"/>
$1 contains both Via headers, but also the rest of the message, including the SDP.
<nop>
<action>
<assignstr assign_to="1" value="[last_Via:]" />
</action>
</nop>
Otherwise, using your regex approach, you should be able to consume everything until the next CR LF characters with something like: "[^\r\n]*\r\n".
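The idea of stopping at the end of each line can be tried outside SIPp. Below is a minimal Python sketch, not SIPp syntax: the character class [^\r\n]* refuses to cross a CR or LF, so each Via value ends at its own line. The message text and host names are invented for illustration, and via_headers is a made-up helper name.

```python
import re

# "Via" in any letter case at the start of a line, optional blanks around ":",
# then everything up to (not including) the CR LF that ends the header line.
VIA_LINE = re.compile(r'^via[ \t]*:[ \t]*([^\r\n]*)', re.IGNORECASE | re.MULTILINE)

def via_headers(sip_message: str) -> list:
    """Return the value of every Via header line, in order of appearance."""
    return VIA_LINE.findall(sip_message)

# Hypothetical message used only to exercise the pattern.
message = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP proxy.example.com:5060;branch=z9hG4bK-1\r\n"
    "Via: SIP/2.0/UDP client.example.com:5060;branch=z9hG4bK-2\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
)
print(via_headers(message))
```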
Do this right before your <send>:
<nop>
<action>
<assignstr assign_to="lvia" value="[last_Via:]" />
<ereg regexp="[Vv][Ii][Aa]: (.*), (.*)" search_in="var" variable="lvia" assign_to="5,6,7"/>
<exec command="echo Via1: [$5], via2: [$6], via3: [$7]"/>
</action>
</nop>
Then use the values stored in variables 6 and 7.
This works for 2 Vias; you may need to adapt it if you need to handle more.