Running regex from a Windows batch file

I'm trying to run a regex from a batch file:
<zed>(.*?)<zed>
to find values I've stored in a file:
<process>34593845387<process>
<zed>M567<zed>
<encode>UTF16<encode>
I can do this from Java, but not from batch.

You'll probably have to use something like PowerShell or another tool, because what batch gives you out of the box isn't enough. Once you do, you'll probably want a regex like:
<zed>([^<]+)<
That way if later the format changes a bit from:
<zed>234<zed>
to
<zed>234< /zed>
or something, it will still work. It's happened to me before :)
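Here is a minimal Python sketch for checking the suggested pattern outside batch. Python is used only as a neutral test bed, not as a batch solution. It runs against the sample lines from the question plus the changed-format variant:

```python
import re

# Sample lines from the question, plus the "changed format" variant
lines = [
    "<process>34593845387<process>",
    "<zed>M567<zed>",
    "<encode>UTF16<encode>",
    "<zed>234< /zed>",
]

# [^<]+ stops at the next "<", so the exact form of the closing tag doesn't matter
pattern = re.compile(r"<zed>([^<]+)<")

values = []
for line in lines:
    m = pattern.search(line)
    if m:
        values.append(m.group(1))

print(values)  # ['M567', '234']
```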

Findstr can technically use regex, but its support is very limited (for example, there is no +, ? or alternation, and no grouping). It also can't capture; it only prints the matching lines.
If your data looks exactly like that (nothing to the left of <zed>), you can tokenize the string using < and > as delimiters and store the value of <zed> as the second token in the string.
for /F "tokens=1,2 delims=<>" %%A in (data.txt) do if "%%A"=="zed" set "zed_value=%%B"
You can then access the variable with %zed_value%. If you have multiple <zed> fields, the variable will contain the value of the last one.
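To see which value ends up in zed_value, the for /F loop can be modelled roughly in Python. This is a sketch only, under a few assumptions: zed_value() is a hypothetical helper, the lines of data.txt are assumed to already be in a list, and runs of < and > are treated as a single delimiter with empty leading tokens skipped, which is how for /F handles delims:

```python
import re

def zed_value(lines):
    """Roughly model the for /F loop: keep token 2 of every line whose token 1 is zed."""
    value = None
    for line in lines:
        # delims=<> : runs of < and > act as one separator; empty tokens are dropped
        tokens = [t for t in re.split(r"[<>]+", line) if t]
        if len(tokens) >= 2 and tokens[0] == "zed":
            value = tokens[1]  # a later <zed> line overwrites an earlier one
    return value

data = ["<process>34593845387<process>", "<zed>M567<zed>", "<encode>UTF16<encode>"]
print(zed_value(data))  # M567
print(zed_value(["<zed>first<zed>", "<zed>second<zed>"]))  # second
```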

Why not use cscript to get JavaScript (JScript) regex support?
type data.txt | cscript //nologo match.js "<zed>(.*)<zed>"
Where match.js is defined as:
if (WScript.Arguments.Count() !== 1) {
WScript.Echo("Syntax: match.js regex");
WScript.Quit(1);
}
var rx = new RegExp(WScript.Arguments(0), "i");
var matched = false;
while (!WScript.StdIn.AtEndOfStream) {
var str = WScript.StdIn.ReadLine();
if (str.match(rx)) {
WScript.Echo(str);
matched = true;
}
}
if (!matched) {
WScript.Quit(1);
}

Related

Extract JSON from a string using Flutter/Dart

Hello, I want to extract JSON from the input string below.
I have tried the following regex in Java and it works fine:
private static final Pattern shortcode_media = Pattern.compile("\"shortcode_media\":(\\{.+\\})");
I want the equivalent regex for Dart.
Input String
<script type="text/javascript">window.__initialDataLoaded(window._sharedData);</script><script type="text/javascript">window.__additionalDataLoaded('/p/B9fphP5gBeG/',{"graphql":{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}});</script><script type="text/javascript">
<script type="text/javascript">window.__initialDataLoaded(window._newData);</script><script type="text/javascript">window._newData('/p/B9fphP5gBeG/',{"graphql":{"post":{"__typename":"id","id":"2260708142683789190","new_code":"B9fphP5gBeG"}}});</script><script type="text/javascript">
(function(){
function normalizeError(err) {
var errorInfo = err.error || {};
var getConfigProp = function(propName, defaultValueIfNotTruthy) {
var propValue = window._sharedData && window._sharedData[propName];
return propValue ? propValue : defaultValueIfNotTruthy;
};
return {}
}
)
Expected json
{"graphql":{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}}
Note: There are multiple JSON strings in the input string; I need the JSON containing the shortcode_media tag.
Please try the following:
void main() {
String json = '''
{"graphql":
{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}},
"abc":{"def":"test"}
}
''';
RegExp regExp = new RegExp(
"\"shortcode_media\":(\\{.+\\})",
caseSensitive: false,
multiLine: false,
);
print(regExp.stringMatch(json).toString());
}
output
"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}
The corresponding Dart RegExp would be:
static final RegExp shortcodeMedia = RegExp(r'"shortcode_media":(\{.+\})');
It does not work, though. JSON is not a regular language, so you can't parse it using regular expressions.
The value of "shortcode_media" in your example JSON ends with several } characters. Because .+ is greedy, the match runs on to the last } on the line instead of stopping at the second }, which is the one matching the leading {. If your JSON text contains any further values after the shortcode_media entry on the same line, those might be included as well.
Stopping at the first } (with a non-greedy .+?) would be too short, because that } closes the nested "dimensions" object.
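Both behaviours can be reproduced with Python's re module, which treats .+ and .+? the same way here (. does not match a newline by default). The sketch below uses the test string from the Dart example above; parses() is a hypothetical helper:

```python
import json
import re

# The test string from the Dart example above
s = '''{"graphql":
{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}},
"abc":{"def":"test"}
}'''

def parses(text):
    """True if text is exactly one valid JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

greedy = re.search(r'"shortcode_media":(\{.+\})', s).group(1)  # runs to the last } on the line
lazy = re.search(r'"shortcode_media":(\{.+?\})', s).group(1)   # stops at the first }

print(greedy.endswith("1080}}}"), parses(greedy))  # True False: one } too many
print(lazy.endswith("1080}"), parses(lazy))        # True False: stops inside "dimensions"
```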
If someone reorders the JSON source code to the equivalent
"shortcode_media":{"dimensions":{"height":1326,"width":1080},"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG"}
(that is, putting the "dimensions" entry first), then you would only capture until the end of the dimensions block.
I would recommend either using a proper JSON parser, or at least improving the RegExp to be able to handle a single nested JSON object - since you seem to already know that it will happen.
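Here is a minimal sketch of the parser route, written in Python rather than Dart. At each {, it tries to decode one complete JSON value with the standard json module's JSONDecoder.raw_decode(), and it keeps the first object whose graphql entry contains shortcode_media. find_shortcode_json() is a hypothetical helper, and the page text is an abridged copy of the question's input:

```python
import json

# Abridged copy of the question's input (two script blocks)
page = (
    '<script type="text/javascript">window.__initialDataLoaded(window._sharedData);</script>'
    "<script type=\"text/javascript\">window.__additionalDataLoaded('/p/B9fphP5gBeG/',"
    '{"graphql":{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190",'
    '"shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}});</script>'
    "<script type=\"text/javascript\">window._newData('/p/B9fphP5gBeG/',"
    '{"graphql":{"post":{"__typename":"id","id":"2260708142683789190","new_code":"B9fphP5gBeG"}}});</script>'
)

def find_shortcode_json(text):
    """Return the first JSON object in text whose "graphql" entry has "shortcode_media"."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one complete JSON value starting at pos and ignores the rest
            obj, _end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            obj = None  # no valid JSON starts at this {
        graphql = obj.get("graphql") if isinstance(obj, dict) else None
        if isinstance(graphql, dict) and "shortcode_media" in graphql:
            return obj
        pos = text.find("{", pos + 1)
    return None

result = find_shortcode_json(page)
print(json.dumps(result, separators=(",", ":")))
```

The printed text should equal the expected JSON from the question. Brace counting is left entirely to the parser, so nesting depth and key order don't matter.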
Such a RegExp could be:
RegExp(r'"shortcode_media":(\{(?:[^{}]*(?:\{.*?\})?)*?\})')
This RegExp will capture the correct number of braces for the example code, but still won't work if there are more nested JSON objects. Only a real parser can handle the general case correctly.
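Python's re module accepts the one-level-nested pattern unchanged, so it can be tried out there. This sketch checks it against the Dart test string and against the reordered value shown earlier; for these two inputs, the captured text happens to be valid JSON:

```python
import json
import re

# Pattern from the answer: handles nested {...} blocks one level deep, not deeper
one_level = re.compile(r'"shortcode_media":(\{(?:[^{}]*(?:\{.*?\})?)*?\})')

dart_test = '''{"graphql":
{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}},
"abc":{"def":"test"}
}'''
reordered = ('"shortcode_media":{"dimensions":{"height":1326,"width":1080},'
             '"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG"}')

for text in (dart_test, reordered):
    value = one_level.search(text).group(1)
    # For these inputs the captured text is a complete JSON object
    print(json.loads(value)["dimensions"])
```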

Google App Script findText regex not working for new line character

I'm trying to locate/modify text in my Google Document where the text has been broken across a full line break. My regular expression below works when I manually find text in the Google document (CTRL+F) and then search via the regular expression dialog. What is baffling is why the exact same regex doesn't work in the code below on full line breaks, i.e. "\n" (note: the soft line "\v" breaks are ok).
The second approach finds the text, but I'm unable to do anything with it, as I need the element object in order to manipulate the text.
//Test document 1Q6v8ipqA81LoPtpk71NdqTaIEqMjki1KIJbrm0bILBg contains the following text:
//
//This Agreement shall not be assigned by either party without the prior\n
//written consent of the parties hereto
var doc = DocumentApp.openById('1Q6v8ipqA81LoPtpk71NdqTaIEqMjki1KIJbrm0bILBg');
//Method 1 - does NOT locate the text
var body = doc.getBody();
var pattern = "prior[\s]*written";
var foundElement = body.findText(pattern);
while (foundElement != null) {
var foundText = foundElement.getElement().asText();
var start = foundElement.getStartOffset();
var end = foundElement.getEndOffsetInclusive();
foundElement = body.findText(pattern, foundElement);
}
//Method 2 - locates the text, but I cannot acquire the element object
var body2 = doc.getBody().getText();
var pattern2 = /prior[\s]*written/;
while (m=pattern2.exec(body2))
{
Logger.log(m[0]);
}
If this were ever going to work, you would need the regex to be in s (single line) mode. Per https://developers.google.com/apps-script/reference/document/body#findtextsearchpattern,
A subset of the JavaScript regular expression features are not fully supported, such as capture groups and mode modifiers.
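So it looks like they have in fact chosen not to support multi-line matches in any way. Note also that in a JavaScript string literal "\s" is just "s", so the pattern in Method 1 is really prior[s]*written; it would have to be written "prior\\s*written" to contain \s at all. Even then, each paragraph is a separate element of the document, and findText() reports each match as a range inside a single Text element, so a match spanning a paragraph break does not appear to be reachable this way.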
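If you go with Method 2, one possible workaround is to map the match offsets in the joined text back to a paragraph index and offset. You would then fetch that paragraph (for example via body.getParagraphs()) and edit it there. Below is a Python sketch of just the offset arithmetic. It assumes the body text is the paragraphs joined with "\n", which is worth verifying against your document, and locate() is a hypothetical helper:

```python
import re
from bisect import bisect_right

def locate(paragraphs, pattern):
    """Yield (start_par, start_off, end_par, end_off_inclusive) for each match
    of pattern in the paragraphs joined with newlines."""
    text = "\n".join(paragraphs)
    starts = []  # starts[i] = offset in text where paragraph i begins
    pos = 0
    for p in paragraphs:
        starts.append(pos)
        pos += len(p) + 1  # +1 for the newline separator

    def to_par(offset):
        i = bisect_right(starts, offset) - 1
        return i, offset - starts[i]

    for m in re.finditer(pattern, text):
        start_par, start_off = to_par(m.start())
        end_par, end_off = to_par(m.end() - 1)  # inclusive end, like getEndOffsetInclusive()
        yield start_par, start_off, end_par, end_off

paragraphs = [
    "This Agreement shall not be assigned by either party without the prior",
    "written consent of the parties hereto",
]
print(list(locate(paragraphs, r"prior\s*written")))  # [(0, 65, 1, 6)]
```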

Apps Script findText() for Google Docs

I'm applying RegEx search to a Google Document text with some markdown code block ticks (```). Running the code below on my doc is returning a null result.
var codeBlockRegEx = '`{3}((?:.*?\s?)*?)`{3}'; // RegEx to find (lazily) all text between triple tick marks (/`/`/`), inclusive of whitespace such as carriage returns, tabs, newlines, etc.
var reWithCodeBlock = body.findText(codeBlockRegEx); // reWithCodeBlock evaluates to 'null'
I suspect that there's some element of regex in my code that is not supported by RE2, but the documentation has not shed light on this. Any ideas?
I received null as well. I was able to get the code below to work with three backtick (`) characters surrounding the word test within a paragraph.
I did find this information:
findText method of objects of class Text in Apps Script, extending Google Docs. Documentation says “A subset of the JavaScript regular expression features are not fully supported, such as capture groups and mode modifiers.” In particular, it does not support lookarounds.
function findXtext() {
var body = DocumentApp.getActiveDocument().getBody();
var foundElement = body.findText("`{3}(test)`{3}");
while (foundElement != null) {
// Get the text object from the element
var foundText = foundElement.getElement().asText();
// Where in the element is the found text?
var start = foundElement.getStartOffset();
var end = foundElement.getEndOffsetInclusive();
// Set Bold
foundText.setBold(start, end, true);
// Change the background color to yellow
foundText.setBackgroundColor(start, end, "#FCFC00");
// Find the next match
foundElement = body.findText("`{3}(test)`{3}", foundElement);
}
}
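The offsets in the loop above follow findText()'s convention: getEndOffsetInclusive() points at the last matched character, not one past it. Below, the same bookkeeping is sketched in Python on a single paragraph's text. It uses the lazy pattern `{3}(.*?)`{3}, so any code span is found, not just test. Python's re is not RE2, so this only illustrates the offsets, and find_code_spans() is a hypothetical helper:

```python
import re

def find_code_spans(paragraph_text):
    """Return (start, end_inclusive) offsets of each ```...``` span in one paragraph."""
    spans = []
    for m in re.finditer(r"`{3}(.*?)`{3}", paragraph_text):
        # m.end() is one past the match, so subtract 1 for an inclusive end offset
        spans.append((m.start(), m.end() - 1))
    return spans

text = "Run ```test``` and then ```more```."
for start, end in find_code_spans(text):
    print(start, end, text[start:end + 1])
```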

Matching and storing part of a string in a variable using JScript.NET

I am fiddling with a script for Fiddler, which uses JScript.NET. I have a string of the format:
{"params":{"key1":"somevalue","key2":"someothervalue","key3":"whatevervalue", ...
I want to match and show "key2":"someothervalue" where someothervalue could be any value but the key is static.
Using good old sed and bash I can replace the part I am looking for with:
$ a='{"params":{"key1":"somevalue","key2":"someothervalue","key3":"whatevervalue", ...'
$ echo $a | sed -r 's/"key2":"[^"]+"/replaced/g'
{"params":{"key1":"somevalue",replaced,"key3":"whatevervalue", ...
Now, instead of replacing it, I want to extract that part into a variable using JScript.NET. How can that be done?
The most graceful way is to use a JSON parser. My personal preference is to import IE's JSON parser using the htmlfile COM object.
import System;
var str:String = '{"params":{"key1":"foo","key2":"bar","key3":"baz"}}',
htmlfile = new ActiveXObject('htmlfile');
// force htmlfile COM object into IE9 compatibility
htmlfile.IHTMLDocument2_write('<meta http-equiv="x-ua-compatible" content="IE=9" />');
// clone JSON object and methods into familiar syntax
var JSON = htmlfile.parentWindow.JSON,
// deserialize your JSON-formatted string
obj = JSON.parse(str);
// access JSON values as members of a hierarchical object
Console.WriteLine("params.key2 = " + obj.params.key2);
// beautify the JSON
Console.WriteLine(JSON.stringify(obj, null, '\t'));
Compiling, linking, and running results in the following console output:
params.key2 = bar
{
"params": {
"key1": "foo",
"key2": "bar",
"key3": "baz"
}
}
Alternatively, there are also at least a couple of .NET namespaces which provide methods to serialize objects into a JSON string, and to deserialize a JSON string into objects. Can't say I'm a fan, though. The ECMAScript notation of JSON.parse() and JSON.stringify() are certainly a lot easier and profoundly less alien than whatever neckbeard madness is going on at Microsoft.
And while I certainly don't recommend scraping JSON (or any other hierarchical markup if it can be helped) as complicated text, JScript.NET will handle a lot of familiar Javascript methods and objects, including regex objects and regex replacements on strings.
sed syntax:
echo $a | sed -r 's/("key2"):"[^"]*"/\1:"replaced"/g'
JScript.NET syntax:
print(a.replace(/("key2"):"[^"]*"/, '$1:"replaced"'));
JScript.NET, just like JScript and JavaScript, also allows for calling a lambda function for the replacement.
print(
a.replace(
/"(key2)":"([^"]*)"/,
// $0 = full match; $1 = (key2); $2 = ([^"]*)
function($0, $1, $2):String {
var replace:String = $2.toUpperCase();
// the returned string is used literally ($1 is not expanded), so concatenate the group
return '"' + $1 + '":"' + replace + '"';
}
)
);
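For comparison, Python's re.sub() also accepts a function as the replacement. Its return value is inserted literally, so the group values are concatenated in rather than written as \1. Here is a minimal sketch on a shortened version of the question's string, with the ... tail dropped:

```python
import re

a = '{"params":{"key1":"somevalue","key2":"someothervalue","key3":"whatevervalue"}}'

def upper_key2(m):
    # m.group(1) = key2, m.group(2) = the value; the returned string is used as-is
    return '"' + m.group(1) + '":"' + m.group(2).upper() + '"'

result = re.sub(r'"(key2)":"([^"]*)"', upper_key2, a)
print(result)
```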
... Or to extract the value of key2 using the RegExp object's exec() method:
var extracted:String = /"key2":"([^"]*)"/.exec(a)[1];
print(extracted);
Just be careful with that, though: exec() returns null when there is no match, so reading element [1] of its result will throw an exception. You might want to guard it with if (/"key2":/.test(a)) or wrap it in a try...catch. Or better yet, just do what I said earlier and deserialize your JSON into an object.
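Here is the same guard as a Python sketch: check the match object before reading group 1, rather than indexing into the result unconditionally. extract_key2() is a hypothetical helper name:

```python
import re

def extract_key2(s):
    """Return the value of "key2", or None if it isn't present."""
    m = re.search(r'"key2":"([^"]*)"', s)
    return m.group(1) if m else None  # re.search returns None when nothing matches

print(extract_key2('{"params":{"key1":"foo","key2":"bar","key3":"baz"}}'))  # bar
print(extract_key2('{"params":{"key1":"foo"}}'))  # None
```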

Pattern doesn't remove special characters which are by themselves on a website

I am taking user input in the form of a URL, parsing the page, and printing the other pages that the website links to. The package that I am using is:
LWP::Simple
I take the URL from the command line with $ARGV[0] and store it in a variable.
Then I make another variable and call get on the variable where I stored the website.
Then I make an array variable and apply the regex
/\shref="?([^\s>"]+)/gi;
to the variable that holds the result of get. Then I use a foreach loop on the array to print out the results.
However, while it does print links, it also ends up printing standalone special characters such as / and # when there is nothing after them.
So if there is something like /blabalbla, it prints that, but if there are just standalone special characters such as /, \, or #, it prints those too. Is there any way I can modify the regex so that special characters that aren't followed by anything else are not printed? I'm new to Perl and not very good at regex.
I can't help you with your specific problem without further information, but in the meantime I suggest that you look at HTML::LinkExtor, which was written for this purpose.
Here's some example code and its output. It lists only <a> elements that have an href attribute.
use strict;
use warnings;
use 5.010;
use LWP;
use HTML::LinkExtor;
my $ua = LWP::UserAgent->new;
my $resp = $ua->get('http://www.bbc.co.uk/');
my $extor = HTML::LinkExtor->new(undef, $resp->base);
$extor->parse($resp->decoded_content);
for my $link ($extor->links) {
my ($tag, %attr) = @$link;
next unless $tag eq 'a' and $attr{href};
say $attr{href};
}
output
http://m.bbc.co.uk
http://www.bbc.co.uk/
http://www.bbc.co.uk/#h4discoveryzone
http://www.bbc.co.uk/accessibility/
https://ssl.bbc.co.uk/id/status
http://www.bbc.co.uk/news/
http://www.bbc.com/news/
http://www.bbc.co.uk/sport/
http://www.bbc.co.uk/weather/
http://shop.bbc.com/
http://www.bbc.com/earth/
http://www.bbc.com/travel/
http://www.bbc.com/capital/
http://www.bbc.co.uk/iplayer/
http://www.bbc.com/culture/
http://www.bbc.com/autos/
http://www.bbc.com/future/
http://www.bbc.co.uk/tv/
http://www.bbc.co.uk/radio/
http://www.bbc.co.uk/cbbc/
http://www.bbc.co.uk/cbeebies/
http://www.bbc.co.uk/arts/
http://www.bbc.co.uk/ww1/
http://www.bbc.co.uk/food/
http://www.bbc.co.uk/history/
http://www.bbc.co.uk/learning/
http://www.bbc.co.uk/music/
http://www.bbc.co.uk/science/
http://www.bbc.co.uk/nature/
http://www.bbc.com/earth/
http://www.bbc.co.uk/local/
http://www.bbc.co.uk/travel/
http://www.bbc.co.uk/a-z/
http://www.bbc.co.uk/#orb-footer
http://search.bbc.co.uk/search
http://www.bbc.co.uk/privacy/cookies/managing/cookie-settings.html
http://www.bbc.co.uk/locator/default/desktop/en-GB?ptrt=%2F
http://www.bbc.co.uk/#
http://www.bbc.co.uk/#
http://www.bbc.co.uk/weather/2643743?day=0
http://www.bbc.co.uk/weather/2643743?day=0
http://www.bbc.co.uk/weather/2643743?day=1
http://www.bbc.co.uk/weather/2643743?day=1
http://www.bbc.co.uk/weather/2643743?day=2
http://www.bbc.co.uk/weather/2643743?day=2
http://www.bbc.co.uk/locator/default/desktop/en-GB?ptrt=%2F
http://www.bbc.co.uk/weather/2643743
http://www.bbc.co.uk/news/science-environment-30311816
http://www.bbc.co.uk/news/science-environment-30311822
http://www.bbc.co.uk/news/science-environment-30311818
http://www.bbc.co.uk/news/magazine-30282261
http://www.bbc.co.uk/news/science-environment-30311816
http://www.bbc.co.uk/news/uk-politics-30291460
http://www.bbc.co.uk/news/
http://www.bbc.co.uk/news/uk-england-kent-30319549
http://www.bbc.co.uk/news/world-europe-30306106
http://www.bbc.co.uk/news/world-europe-30306992
http://www.bbc.co.uk/news/uk-30306145
http://www.bbc.co.uk/news/local/
http://www.bbc.co.uk/news/england/london/
http://www.bbc.co.uk/news/uk-england-london-30308694
http://www.bbc.co.uk/news/uk-england-london-30315650
http://www.bbc.co.uk/news/uk-england-london-30321504
http://www.bbc.co.uk/sport/live/football/29959148
http://www.bbc.co.uk/sport/0/
http://www.bbc.co.uk/sport/live/snooker/29618359
http://www.bbc.co.uk/sport/football/30204433
http://www.bbc.co.uk/sport/cricket/30308980
http://www.bbc.co.uk/sport/football/30204434
http://www.bbc.co.uk/sport/0/football/
http://www.bbc.co.uk/sport/football/30204459
http://www.bbc.co.uk/sport/football/30204511
http://www.bbc.co.uk/sport/football/28647040
http://www.bbc.co.uk/?dzf=sport
http://www.bbc.co.uk/?dzf=entertainment
http://www.bbc.co.uk/?dzf=bbcnow
http://www.bbc.co.uk/?dzf=entertainment
http://www.bbc.co.uk/?dzf=news
http://www.bbc.co.uk/?dzf=lifestyle
http://www.bbc.co.uk/?dzf=knowledge
http://www.bbc.co.uk/?dzf=sport
http://www.bbc.co.uk/news/
http://www.bbc.com/news/
http://www.bbc.co.uk/sport/
http://www.bbc.co.uk/weather/
http://shop.bbc.com/
http://www.bbc.com/earth/
http://www.bbc.com/travel/
http://www.bbc.com/capital/
http://www.bbc.co.uk/iplayer/
http://www.bbc.com/culture/
http://www.bbc.com/autos/
http://www.bbc.com/future/
http://www.bbc.co.uk/tv/
http://www.bbc.co.uk/radio/
http://www.bbc.co.uk/cbbc/
http://www.bbc.co.uk/cbeebies/
http://www.bbc.co.uk/arts/
http://www.bbc.co.uk/ww1/
http://www.bbc.co.uk/food/
http://www.bbc.co.uk/history/
http://www.bbc.co.uk/learning/
http://www.bbc.co.uk/music/
http://www.bbc.co.uk/science/
http://www.bbc.co.uk/nature/
http://www.bbc.com/earth/
http://www.bbc.co.uk/local/
http://www.bbc.co.uk/travel/
http://www.bbc.co.uk/a-z/
http://www.bbc.co.uk/
http://www.bbc.co.uk/terms/
http://www.bbc.co.uk/aboutthebbc/
http://www.bbc.co.uk/privacy/
http://www.bbc.co.uk/privacy/cookies/about
http://www.bbc.co.uk/accessibility/
http://www.bbc.co.uk/guidance/
http://www.bbc.co.uk/contact/
http://www.bbc.co.uk/bbctrust/
http://www.bbc.co.uk/complaints/
http://www.bbc.co.uk/help/web/links/
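The same idea can be sketched with Python's standard html.parser and urllib.parse: keep only <a> elements that have an href, and make each href absolute against the page's base URL. This is an illustration in a different language, not a port of HTML::LinkExtor, and urljoin's resolution rules may differ from LinkExtor's in edge cases. LinkCollector is a hypothetical class, and the sample HTML is made up for the example:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect href values of <a> tags, resolved against a base URL."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip <a> elements without a (non-empty) href
            self.links.append(urljoin(self.base, href))

sample = ('<a href="/news/">News</a> <a name="top"></a> <img src="/logo.png">'
          '<a href="/weather/2643743?day=0">Weather</a> <a href="http://shop.bbc.com/">Shop</a>')
collector = LinkCollector("http://www.bbc.co.uk/")
collector.feed(sample)
print(collector.links)
```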