Sed regex returning incorrect results [duplicate]

This is returning incorrect results:
sed -regex -no print '/everythingbetween/,/thesephrases/p'
Cool. Now let me step through my sed command:
'/<literal string...="[collectionofcharacters(I also tried ^ at the beginning)]
.(anycharacter)+(greedy)?(lessgreedy, or until first instance of the following)
DBAppender(string literal)/,/ \/>(another string literal)/p'
My actual statement:
sed -rn '/<appender-ref ref="[ET].+?DBAppender/,/\/>/p'
Result:
<appender-ref ref="ErrorDBAppender" />
<appender-ref ref="TracingDBAppender" />
<appender-ref ref="ErrorDBAppender" />
<appender-ref ref="RollingFileAppender" />
<appender-ref ref="ErrorDBAppender" />
<appender-ref ref="RollingFileAppender" />
<appender-ref ref="ErrorDBAppender" />
<appender-ref ref="TracingDBAppender" />
<appender-ref ref="ErrorDBAppender" />
<appender-ref ref="TracingDBAppender" />
<appender-ref ref="ErrorDBAppender" />
Please, someone tell me why "Rolling" is showing up when I'm SPECIFYING that the character after the " MUST be either an E or a T, NOT AN R!?
Edit: Please, if you're going to tell me how to do this with another tool, explain first why it isn't working with sed.

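You are printing every line in a range of lines. The range begins at a line matching the pattern:
<appender-ref ref="[ET].+?DBAppender
and ends at a line matching the pattern:
/>
I think your intention was to match the /> on the same line. With an address range, however, once the first pattern has matched, sed starts looking for the end pattern on the next line. That next line is printed whatever else it contains, and since every line here includes />, it also closes the range. That is how the RollingFileAppender lines get through. To require both parts on one line, use a single pattern instead of a range. Note also that ERE has no lazy quantifiers, so .+? does not mean "as few characters as possible" here.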
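To see why the RollingFileAppender lines get printed, the range behaviour can be imitated outside sed. The following Python sketch is only an illustration: sed_range and single_line are hypothetical helper names, the input is a shortened version of the output shown in the question, and a plain .+ stands in for the .+? of the original command.

```python
import re

# Shortened sample input, taken from the lines shown in the question.
lines = [
    '<appender-ref ref="ErrorDBAppender" />',
    '<appender-ref ref="RollingFileAppender" />',
    '<appender-ref ref="TracingDBAppender" />',
]

START = re.compile(r'<appender-ref ref="[ET].+DBAppender')
END = re.compile(r'/>')


def sed_range(lines, start, end):
    """Imitate sed -n '/start/,/end/p' (hypothetical helper).

    The end pattern is only tested on lines AFTER the one that
    opened the range, so the following line is always printed.
    """
    printed, in_range = [], False
    for line in lines:
        if in_range:
            printed.append(line)
            if end.search(line):
                in_range = False
        elif start.search(line):
            printed.append(line)
            in_range = True
    return printed


def single_line(lines):
    """Print only lines where the whole tag matches on one line."""
    pattern = re.compile(r'<appender-ref ref="[ET][^"]*DBAppender"\s*/>')
    return [line for line in lines if pattern.search(line)]


range_result = sed_range(lines, START, END)
single_result = single_line(lines)
print(range_result)   # includes the RollingFileAppender line
print(single_result)  # only the Error/Tracing DBAppender lines
```

In sed itself, the single-pattern version would look something like sed -rn '/<appender-ref ref="[ET][^"]*DBAppender"[[:space:]]*\/>/p', with one address instead of two.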

Related

The problem of using regular expressions in the shell script

I have a regex that removes the activity element and its content. The regex is \s*<activity .*>(?:\s|\S)*<\/activity>. It works in Java, but not when I use it in a shell script. The shell command is as follows:
sed 's+\s*<activity .*>(?:\s|\S)*<\/activity>++g' AndroidManifest.xml
AndroidManifest.xml
<?xml version="1.0" encoding="utf-8"?>
<!-- GENERATED BY UNITY. REMOVE THIS COMMENT TO PREVENT OVERWRITING WHEN EXPORTING AGAIN-->
<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.unity3d.player" xmlns:tools="http://schemas.android.com/tools">
<application>
<activity android:name="com.unity3d.player.UnityPlayerActivity" android:theme="#style/UnityThemeSelector" android:screenOrientation="fullSensor" android:launchMode="singleTask" android:configChanges="mcc|mnc|locale|touchscreen|keyboard|keyboardHidden|navigation|orientation|screenLayout|uiMode|screenSize|smallestScreenSize|fontScale|layoutDirection|density" android:hardwareAccelerated="false">
<intent-filter>
<action android:name="android.intent.action.MAIN" />
<category android:name="android.intent.category.LAUNCHER" />
</intent-filter>
<meta-data android:name="unityplayer.UnityActivity" android:value="true" />
<meta-data android:name="android.notch_support" android:value="true" />
</activity>
<meta-data android:name="unity.splash-mode" android:value="0" />
<meta-data android:name="unity.splash-enable" android:value="True" />
<meta-data android:name="notch.config" android:value="portrait|landscape" />
<meta-data android:name="unity.build-id" android:value="07a923ed-bdbd-46ed-98bd-afef17a7904a" />
</application>
<uses-feature android:glEsVersion="0x00030000" />
<uses-feature android:name="android.hardware.vulkan.version" android:required="false" />
<uses-feature android:name="android.hardware.touchscreen" android:required="false" />
<uses-feature android:name="android.hardware.touchscreen.multitouch" android:required="false" />
<uses-feature android:name="android.hardware.touchscreen.multitouch.distinct" android:required="false" />
</manifest>
What should I do? Thanks.
Regular expression syntax falls roughly into three
variants: BRE, ERE and PCRE. The last one has the most features and
expressive power. Your regex is written in PCRE (for example, it uses the
non-capturing group (?:...)), while sed supports only BRE and ERE.
Another problem is that sed processes the input file line by line, so
it takes a trick to make a sed regex match across lines.
With sed, please try the following:
sed -E '
:l # define a label "l"
N # append the next line of input into the pattern space
$!b l # repeat until the last line
# then whole lines are stored in the pattern space
s+[[:blank:]]*<activity .*>.*<\/activity>++g
# perform the replace command over the pattern space
' AndroidManifest.xml
The -E option enables ERE.
The script first slurps the whole file into the pattern space, then performs the replacement.
BTW, if perl is an option for you, you can apply your regex as is:
perl -0777 -pe 's+\s*<activity .*>(?:\s|\S)*<\/activity>++g' AndroidManifest.xml
There is one caveat regarding the (?:\s|\S)* expression. The quantifier *
is greedy and matches as much as possible. If the XML file contains multiple <activity> .. </activity>
elements, everything from the first opening tag to the last closing tag is removed, including the
lines in between that should be kept. It is better to rewrite it with a lazy quantifier: (?:\s|\S)*? or,
more commonly, [\s\S]*?.
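The greedy versus lazy difference is easy to check on a small input. The sketch below uses Python's re module rather than perl or sed, and the two-activity manifest is a made-up, shortened stand-in for the real AndroidManifest.xml (the names A, B and keep.me are hypothetical).

```python
import re

# Hypothetical, shortened manifest with two <activity> elements
# and one line in between that must survive.
manifest = """<application>
  <activity android:name="A">
    <intent-filter />
  </activity>
  <meta-data android:name="keep.me" />
  <activity android:name="B">
  </activity>
</application>
"""

# Greedy: [\s\S]* runs to the LAST </activity> in the file.
greedy = re.sub(r'\s*<activity .*>[\s\S]*</activity>', '', manifest)

# Lazy: [\s\S]*? stops at the FIRST </activity> after each opening tag.
lazy = re.sub(r'\s*<activity .*>[\s\S]*?</activity>', '', manifest)

print(greedy)  # the meta-data line between the activities is gone
print(lazy)    # both activities removed, meta-data line kept
```

The only change between the two patterns is the ? after the *, which is what keeps the meta-data line between the two elements.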

Regex finds pattern [duplicate]

How can I use a regular expression to find the "c start" pattern when it occurs multiple times? In other words, it should match only when the pattern is found more than once. Below is an example XML fragment in which "c start" appears twice. What regex would find it?
<c start="11111" end="1111111" />
<c start="11111" end="222222222" />
</action>
<action
src="abc"
system="2222">
<param
name="trackID"
value="1"
valueType="data">
</param>
<param
name="trackName"
value="track"
valueType="data">
</param>
<c start="11111" end="1111111" />
<c start="11111" end="222222222" />
I may also have the following XML, in which the pattern does not occur more than once:
<c start="11111" end="1111111" />
</action>
<action
src="abc"
system="2222">
<param
name="trackID"
value="1"
valueType="data">
</param>
<param
name="trackName"
value="track"
valueType="data">
</param>
<c start="11111" end="1111111" />
Please try this:
<c\s+start=\"(?<start>[^\"]+)\"\s+end=\"(?<end>[^\"]+)\"\s+\/>
This expression has two named groups, start and end, which capture the values between the quotes.
If you don't need named groups, you can use <c\s+start=\"([^\"]+)\"\s+end=\"([^\"]+)\"\s+\/> instead.
You can test your regex here: https://regex101.com/
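The expression above matches a single <c ... /> element. To detect the "more than once" case from the question, one option is to wrap it in a group and require at least two repetitions. The Python sketch below assumes that "more than once" means two or more <c> elements directly after each other, separated only by whitespace, as in the first sample. That reading, and the helper name has_repeated_c, are assumptions and not part of the question.

```python
import re

# One <c start="..." end="..." /> element, mirroring the regex above.
C_TAG = r'<c\s+start="[^"]+"\s+end="[^"]+"\s+/>'

# Two or more such elements separated only by whitespace.
REPEATED = re.compile(rf'(?:{C_TAG}\s*){{2,}}')


def has_repeated_c(xml_text: str) -> bool:
    """Return True if at least two consecutive <c> elements occur."""
    return REPEATED.search(xml_text) is not None


twice = '''<c start="11111" end="1111111" />
<c start="11111" end="222222222" />
</action>'''

once = '''<c start="11111" end="1111111" />
</action>
<action
src="abc"
system="2222">
<c start="11111" end="1111111" />'''

print(has_repeated_c(twice))
print(has_repeated_c(once))
```

For anything beyond a quick check, an XML parser is usually a safer choice than a regex.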

AWS CloudWatch log and disable log on server itself with springboot

In my Spring Boot application, I configured logging to write to AWS CloudWatch, but the application also writes a log file on the server itself in the folder /var/log/. That log file is now larger than 19G.
How can I disable logging on the server itself and write logs only to CloudWatch?
The following is my current logback-spring.xml configuration. Any ideas would be appreciated. Thanks in advance.
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<include resource="org/springframework/boot/logging/logback/base.xml" />
<springProperty scope="context" name="ACTIVE_PROFILE" source="spring.profiles.active" />
<property name="clientPattern" value="payment" />
<logger name="org.springframework">
<level value="INFO" />
</logger>
<logger name="com.payment">
<level value="INFO" />
</logger>
<logger name="org.springframework.ws.client.MessageTracing.sent">
<level value="TRACE" />
</logger>
<logger name="org.springframework.ws.client.MessageTracing.received">
<level value="TRACE" />
</logger>
<logger name="org.springframework.ws.server.MessageTracing">
<level value="TRACE" />
</logger>
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<layout class="ch.qos.logback.classic.PatternLayout">
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [${HOSTNAME}:%thread] %-5level%replace([${clientPattern}] ){'\[\]\s',''}%logger{50}: %msg%n
</pattern>
</layout>
<filter class="ch.qos.logback.classic.filter.ThresholdFilter">
<level>TRACE</level>
</filter>
</appender>
<springProfile name="local,dev">
<root level="INFO">
<appender-ref ref="CONSOLE" />
</root>
</springProfile>
<springProfile name="prod,uat">
<timestamp key="date" datePattern="yyyy-MM-dd" />
<appender name="AWS_SYSTEM_LOGS" class="com.payment.hybrid.log.CloudWatchLogsAppender">
<filter class="ch.qos.logback.classic.filter.ThresholdFilter">
<level>TRACE</level>
</filter>
<layout>
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [${HOSTNAME}:%thread] %-5level%replace([${clientPattern}] ){'\[\]\s',''}%logger{50}:
%msg%n
</pattern>
</layout>
<logGroupName>${ACTIVE_PROFILE}-hybrid-batch</logGroupName>
<logStreamName>HybridBatchLog-${date}</logStreamName>
<logRegionName>app-northeast</logRegionName>
</appender>
<appender name="ASYNC_AWS_SYSTEM_LOGS" class="ch.qos.logback.classic.AsyncAppender">
<appender-ref ref="AWS_SYSTEM_LOGS" />
</appender>
<root level="INFO">
<appender-ref ref="ASYNC_AWS_SYSTEM_LOGS" />
<appender-ref ref="CONSOLE" />
</root>
</springProfile>
</configuration>
The most likely fix is to remove this line:
<appender-ref ref="CONSOLE" />
I say "most likely" because this appender just writes output to the console, which means that something else is redirecting that output to /var/log/whatever, probably the startup script for your application.
It's also possible that the included default file, org/springframework/boot/logging/logback/base.xml, is responsible, because that file defines a file appender. I don't know if the explicit <root> definition will completely override or simply update the included default, but unless you know you need the default I'd delete the <include> statement.
If you need to recover space from the existing logfile, you can truncate it:
sudo truncate -s 0 /var/log/WHATEVER
Deleting it is not the correct solution, because it won't actually be removed until the application explicitly closes it (which means restarting your server).
As one of the commenters suggested, you can use logrotate to prevent the on-disk file from getting too large.
But by far the most important thing you should do is read the Logback documentation.

Qt XML reader reads the same tag every time

I am trying to read an XML file. The reader reads it well until it reaches one specific tag (the closing tag of Categories), and afterwards it reads this tag infinitely many times.
This is the XML file:
<?xml version="1.0" encoding="utf-8"?>
<MovieMain MovieName="movie1" Version="1.29746.011215">
<FrameGroups FirstFrame="START" LastFrame="END">
<GroupFramesDescription>ALL MOVIE</GroupFramesDescription>
<frames Framenumber="1" >
<ObjectsGroup Name="1">
<LeftUpCorner X="30" Y="124" Z="0" />
<RightDownCorner X="53" Y="160" Z="0" />
<InfoAtt AttName="INDEX" AttInfo="1" />
<Categories>
<Category Name="computer" Probability="0.79" />
<Category Name="pen" Probability="0.7" />
<Category Name="desktop" Probability="0.1" />
<Category Name="mug" Probability="0.09" />
</categories>
</ObjectsGroup>
</frames>
</FrameGroups>
</MarkingChanges>
<ChangesList UserName="ooo" Date="12/3/2015" ChangesetIndex="1" />
</MarkingChanges>
</MovieMain>
And this is the function that I call to read the next element:
orXmlReader->readNextStartElement();
It gives me the next element every time until the closing tag of Categories, and then it reads that tag again and again (I tried a loop of 100 times...).
I hope you can help me as soon as you can.
Thanks.
The opening tag is <Categories> and the closing tag is </categories>. XML is case sensitive, so these two do not match. Can you try with </Categories> as the closing tag?
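The case mismatch can be confirmed with any conforming XML parser, not just Qt's. The sketch below uses Python's xml.etree.ElementTree on a cut-down version of the Categories element. The helper name is_well_formed is hypothetical, and this only shows that the document is rejected, not how QXmlStreamReader reports the error.

```python
import xml.etree.ElementTree as ET

# Cut-down fragments of the file from the question.
broken = '<Categories><Category Name="pen" Probability="0.7" /></categories>'
fixed = '<Categories><Category Name="pen" Probability="0.7" /></Categories>'


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(broken))  # closing tag differs in case
print(is_well_formed(fixed))   # closing tag matches the opening tag
```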

Regular Expression To find <jdoc:include type="component" />

I need to replace <jdoc:include type="component" /> in my docs with another string in PHP. Whitespace matters, because the expression may look like any of these, or more:
<jdoc:include type="component" />
<jdoc:include type="component" />
<jdoc:include type="component" xxxxx />
or more ...
<jdoc:include\s+type="component"\s+\/>
While playing with sed in bash, I did the following: I created a file named test.php with all your strings inside it:
<jdoc:include type="component" />
<jdoc:include type="component" />
<jdoc:include type="component" xxxxx />
Then I worked out a single regex that matches all cases and replaces them with <jdoc:include type="stackoverflow" />.
Enjoy:
sed -r 's/<jdoc:include\s+type=\"component\"\s+\w*\s*\/>/<jdoc:include type=\"stackoverflow\" \/>/' test.php
If you are wondering:
\s+ means match one or more whitespace characters
\w* means match zero or more word characters (0 - 9, A - Z, a - z and _)
This page is a great resource for regex.
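The same pattern can be tried outside sed. The following Python sketch mirrors the sed expression above with re.sub. The sample strings are reconstructed with varying whitespace as an assumption about what the question intended, and the name replace_component is hypothetical.

```python
import re

# Same pattern as in the sed command; the raw string needs no \" escapes.
PATTERN = re.compile(r'<jdoc:include\s+type="component"\s+\w*\s*/>')
REPLACEMENT = '<jdoc:include type="stackoverflow" />'


def replace_component(text: str) -> str:
    """Replace every matching jdoc:include tag in the text."""
    return PATTERN.sub(REPLACEMENT, text)


# Assumed whitespace variations of the tag from the question.
samples = [
    '<jdoc:include type="component" />',
    '<jdoc:include   type="component"    />',
    '<jdoc:include type="component" xxxxx />',
]

results = [replace_component(sample) for sample in samples]
for line in results:
    print(line)
```

The pattern uses only \s, \w and simple quantifiers, so it should carry over to PHP's preg_replace largely unchanged, although that has not been verified here.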