Autohotkey extract text using regex - regex

I am just learning regex with AutoHotkey, but I can't figure out how to extract a specific string and save it to a variable.
Line of text I am searching:
T NW CO NORWALK HUB NW 201-DS3-WLFRCTAICM5-NRWLCT02K16 [DS3 LEC] -1 -1 PSTN
I am trying to save only NW 201-DS3-WLFRCTAICM5-NRWLCT02K16 [DS3 LEC].
Here is my regex code:
NW\D\d.*DS3.*\]
But how do I store that as a variable in autohotkey?
I have tried RegexMatch but that only shows the position. I am doing something wrong.

You can pass a third argument that will hold the match:
RegExMatch(str,"NW\D\d.*DS3.*\]",matches)
Then matches will contain the whole match. (The matches[0] syntax only applies when the result is a match object, as in AutoHotkey v2.)
If you use capturing groups inside the pattern, their values are stored in numbered variables built from that name. If you use "NW\D(\d.*DS3.*)\]" against "NW 5xxx DS3 yyy]", matches will hold the whole string and matches1 will hold 5xxx DS3 yyy.
See AHK RegExMatch docs:
FoundPos := RegExMatch(Haystack, NeedleRegEx [, UnquotedOutputVar = "", StartingPosition = 1])
UnquotedOutputVar
Mode 1 (default): OutputVar is the unquoted name of a variable in which to store the part of Haystack that matched the entire pattern. If the pattern is not found (that is, if the function returns 0), this variable and all array elements below are made blank.
If any capturing subpatterns are present inside NeedleRegEx, their matches are stored in a pseudo-array whose base name is OutputVar. For example, if the variable's name is Match, the substring that matches the first subpattern would be stored in Match1, the second would be stored in Match2, and so on. The exception to this is named subpatterns: they are stored by name instead of number. For example, the substring that matches the named subpattern "(?P<Year>\d{4})" would be stored in MatchYear. If a particular subpattern does not match anything (or if the function returns zero), the corresponding variable is made blank.
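For comparison, the difference between the whole match and a capturing group can be sketched in Python's re module. This is only an illustration of the pattern on the line from the question, not AutoHotkey code; the variable names are made up for the example.

```python
import re

line = "T NW CO NORWALK HUB NW 201-DS3-WLFRCTAICM5-NRWLCT02K16 [DS3 LEC] -1 -1 PSTN"

# Same pattern as in the answer, with one capturing group added.
m = re.search(r"NW\D(\d.*DS3.*)\]", line)

# group(0) is the text matched by the entire pattern,
# group(1) is the text matched by the first (...) group.
whole_match = m.group(0) if m else ""
first_group = m.group(1) if m else ""

print(whole_match)  # NW 201-DS3-WLFRCTAICM5-NRWLCT02K16 [DS3 LEC]
print(first_group)  # 201-DS3-WLFRCTAICM5-NRWLCT02K16 [DS3 LEC
```

Note that the first "NW" in the line ("NW CO") is skipped because it is not followed by a digit.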

; If you want to delete ALL ....
Only(ByRef C)
{
    /*
    RegExReplace
    https://autohotkey.com/docs/commands/RegExReplace.htm
    */
    ; NW 201-DS3-WLFRCTAICM5-NRWLCT02K16 [DS3 LEC]
    C:=RegExReplace(C, "NW\s[\w-]+\s\[[\w\s]+\]","",ReplacementCount,-1)
    if (ReplacementCount = 0)
        return C
    else
        return Only(C)
} ; Only(ByRef C)
string:="Line of text I am searching: T NW CO NORWALK HUB NW 201-DS3-WLFRCTAICM5-NRWLCT02K16 [DS3 LEC] -1 -1 PSTN"
Result:=Only(string)
MsgBox, % Result
MsgBox, % Only(string)

Related

Get each value after specific words with Regex

I have the string below and I am trying to get every value after Id and DisplayName. I tried to write a lookbehind, but I could not get it to work: it only grabs the first value, while I want to grab all of them.
This was my code to grab the value after DisplayName:
(?<=\bDisplayName\\\"\=\>\\\")(\w+)
When I tried it, it grabbed the first value, but only if it was alphabetic, while most of my text is a mix of Japanese kanji, katakana, hiragana and special characters such as ・.
"{\"Ancestor\"=>{\"Ancestor\"=>{\"Ancestor\"=>{\"Ancestor\"=>{\"ContextFreeName\"=>\"本\", \"DisplayName\"=>\"本\", \"Id\"=>\"465392\"}, \"ContextFreeName\"=>\"本\", \"DisplayName\"=>\"ジャンル別\", \"Id\"=>\"465610\"}, \"ContextFreeName\"=>\"ビジネス・経済\", \"DisplayName\"=>\"ビジネス・経済\", \"Id\"=>\"466282\"}, \"ContextFreeName\"=>\"経営学・キャリア・MBA\", \"DisplayName\"=>\"経営学・キャリア・MBA\", \"Id\"=>\"492076\"}, \"ContextFreeName\"=>\"経営学・キャリア・MBAの起業・開業\", \"DisplayName\"=>\"起業・開業\", \"Id\"=>\"492058\", \"IsRoot\"=>false}"
What I want to achieve from the above string is the following:
Grab each string after DisplayName
ex.
"DisplayName"=>"本" grab 本
"DisplayName"=>"経営学・キャリア・MBA" grab 経営学・キャリア・MBA
Grab each integer after Id
ex.
"Id"=>"465392" grab 465392
"Id"=>"492058" grab 492058
Is it possible to do this in Regex or should I look for something else than Regex?
You can use capturing groups like in
"DisplayName"=>"([^"]*)"
"Id"=>"(\d+)
Details:
"DisplayName"=>"([^"]*)" - "DisplayName"=>" is matched first, then zero or more chars other than " are captured into Group 1, and the closing " is matched.
"Id"=>"(\d+) - "Id"=>" is matched first, then one or more digits are captured into Group 1.
See the Python demo:
import re
s = "{\"Ancestor\"=>{\"Ancestor\"=>{\"Ancestor\"=>{\"Ancestor\"=>{\"ContextFreeName\"=>\"本\", \"DisplayName\"=>\"本\", \"Id\"=>\"465392\"}, \"ContextFreeName\"=>\"本\", \"DisplayName\"=>\"ジャンル別\", \"Id\"=>\"465610\"}, \"ContextFreeName\"=>\"ビジネス・経済\", \"DisplayName\"=>\"ビジネス・経済\", \"Id\"=>\"466282\"}, \"ContextFreeName\"=>\"経営学・キャリア・MBA\", \"DisplayName\"=>\"経営学・キャリア・MBA\", \"Id\"=>\"492076\"}, \"ContextFreeName\"=>\"経営学・キャリア・MBAの起業・開業\", \"DisplayName\"=>\"起業・開業\", \"Id\"=>\"492058\", \"IsRoot\"=>false}"
print(re.findall(r'"DisplayName"=>"([^"]*)"', s))
# => ['本', 'ジャンル別', 'ビジネス・経済', '経営学・キャリア・MBA', '起業・開業']
print(re.findall(r'"Id"=>"(\d+)', s))
# => ['465392', '465610', '466282', '492076', '492058']
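If each name needs to stay paired with its id, one possible variation is a single pattern with two capturing groups. This is a sketch that assumes every "DisplayName" entry is immediately followed by `, "Id"=>"..."`, as in the sample; the string below is a shortened version of the one from the question.

```python
import re

# Shortened sample (assumption: "Id" always directly follows "DisplayName").
s = ('{"ContextFreeName"=>"本", "DisplayName"=>"本", "Id"=>"465392"}, '
     '"ContextFreeName"=>"本", "DisplayName"=>"ジャンル別", "Id"=>"465610"}')

# With two groups, findall returns one (name, id) tuple per match.
pairs = re.findall(r'"DisplayName"=>"([^"]*)", "Id"=>"(\d+)"', s)
print(pairs)
# => [('本', '465392'), ('ジャンル別', '465610')]
```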

filter the text file using multiple regex in ruby

I have a text file containing the text below, which I need to filter based on some conditions.
CODE=0xea00e60c
CODE=0xea00e60d
OUTPUT="HW Address: 91183010\n,HWType:00000030\n"
CODE=0xea00e60e
CODE=0xea01ff00
If the line starts with CODE, extract everything after 0x (e.g. ea00e60c from the 1st line) and write it to the xyz file.
If the line starts with OUTPUT, extract everything inside the double quotes and write it to the xyz file. The order of the extracted text in the xyz file should match the order of the input lines.
def filter_logs(filename)
  postcode = "postcode_logs"
  File.open(filename, 'r').each do |line|
    result = (line.scan(/"(.*?)"/)) || (line.split("x")[1])
    File.open(postcode, 'a') do |selected_line|
      selected_line.puts(result)
    end
  end
end
filename and postcode are files that are already defined.
The code runs without errors, but it does not produce the expected output.
**Expected output**
ea00e60c
ea00e60d
HW Address: 91183010\n,HWType:00000030\n
ea00e60e
ea01ff00
**current output**
HW Address: 91183010\n,HWType:00000030\n
This fails because #scan always returns a truthy value: if nothing is found, it returns an empty array, and an empty array is truthy in Ruby, so the right-hand side of || is never evaluated. Simply getting the first result should be good enough (returning nil for empty arrays):
result = line.scan(/"(.*?)"/).first || line.split("x")[1]
Although you could also use other techniques like:
result = line[/\ACODE=0x(\h*)/, 1]
result ||= line[/\AOUTPUT="([^"]*)"/, 1]
Both patterns match from the start of the string: the first matches CODE=0x followed by zero or more hexadecimal characters (\h*), captured in group 1; the second matches OUTPUT=" followed by zero or more non-quote characters ([^"]*), captured in group 1, followed by a closing ".
Check out the regular expression documentation for Ruby if anything is unclear about the regex. Check out the documentation of the square bracket accessor of String if anything is unclear about the square bracket method usage.
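The same "try one anchored pattern, then fall back to the other" logic can be sketched in Python for comparison. This is an illustration, not the Ruby answer itself: the sample lines are hard-coded instead of read from a file, and the names CODE_RE, OUTPUT_RE and extract are made up for the example.

```python
import re

# Sample lines from the question; "\\n" keeps the backslash-n literal,
# as it appears in the text file.
lines = [
    'CODE=0xea00e60c',
    'CODE=0xea00e60d',
    'OUTPUT="HW Address: 91183010\\n,HWType:00000030\\n"',
    'CODE=0xea00e60e',
    'CODE=0xea01ff00',
]

CODE_RE = re.compile(r'CODE=0x([0-9a-fA-F]*)')
OUTPUT_RE = re.compile(r'OUTPUT="([^"]*)"')

def extract(line):
    """Return the captured text for a CODE or OUTPUT line, else None."""
    # re.match anchors at the start of the string and returns None on
    # failure, so `or` falls through to the second pattern.
    m = CODE_RE.match(line) or OUTPUT_RE.match(line)
    return m.group(1) if m else None

# Keep the input order and drop lines that match neither pattern.
results = [r for r in map(extract, lines) if r is not None]
for r in results:
    print(r)
```

Unlike Ruby's #scan, a failed match here is None rather than an empty (truthy) array, which is why the `or` fallback works directly.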

Find group of strings starting and ending by a character using regular expression

I have a string, and I want to extract, using regular expressions, groups of characters that are between the character : and the other character /.
typically, here is a string example I'm getting:
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
so I want to retrieve 45.72643,4.91203 and also hereanotherdata,
as they are both between the characters : and /.
I tried this syntax on a simpler string where the pattern occurs only once,
[tt]=regexp(str,':(\w.*)/','match')
tt = ':45.72643,4.91203/'
but it works only if the pattern occurs once. If I use it on a string containing the pattern multiple times, I get the whole string between the first : and the last /.
How can I specify that the pattern will occur multiple times, and how can I retrieve each occurrence?
Use lookaround and a lazy quantifier:
regexp(str, '(?<=:).+?(?=/)', 'match')
Example (Matlab R2016b):
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = regexp(str, '(?<=:).+?(?=/)', 'match')
result =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'
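The difference between the greedy pattern from the question and the lazy lookaround pattern can also be checked outside MATLAB. The following is a small Python sketch using the same two patterns on the same string; the variable names are illustrative only.

```python
import re

s = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'

# Greedy .* runs to the LAST '/', so everything in between is one match.
greedy = re.findall(r':(\w.*)/', s)
print(greedy)  # ['45.72643,4.91203/Rou:hereanotherdata']

# Lazy .+? stops at the FIRST '/' after each ':'; the lookarounds keep
# the delimiters out of the result.
lazy = re.findall(r'(?<=:).+?(?=/)', s)
print(lazy)    # ['45.72643,4.91203', 'hereanotherdata']
```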
In most languages this is hard to do with a single regexp. Ultimately you'll only ever get back the one string, and you want to get back multiple strings.
I've never used Matlab, so it may be possible in that language, but based on other languages, this is how I'd approach it...
I can't give you the exact code, but a search indicates that in Matlab there is a function called strsplit, example...
C = strsplit(data,':')
That should break your original string up into an array of strings, using ":" as the break point. You can then ignore the first array element (as it contains the text before the first ":"), loop over the rest of the array, and use a regexp to extract everything that comes before a "/".
So for instance...
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
Breaks down into an array with parts...
1 - 'abcd'
2 - '45.72643,4.91203/Rou'
3 - 'hereanotherdata/defgh'
Then ignore 1, and extract everything before the "/" in 2 and 3.
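The split-then-trim steps described above can be sketched as follows. This is written in Python rather than MATLAB, purely to illustrate the logic; it assumes that a piece without any "/" should be skipped.

```python
s = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'

# Split at ':' -> ['abcd', '45.72643,4.91203/Rou', 'hereanotherdata/defgh']
pieces = s.split(':')

# Ignore the first piece (text before the first ':'), then keep only the
# part of each remaining piece that comes before its first '/'.
result = [p.split('/', 1)[0] for p in pieces[1:] if '/' in p]
print(result)  # ['45.72643,4.91203', 'hereanotherdata']
```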
As John Mawer and Adriaan mentioned, strsplit is a good place to start with. You can use it for both ':' and '/', but then you will not be able to determine where each of them started. If you do it with strsplit twice, you can know where the ':' starts :
A='abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
B=cellfun(@(x) strsplit(x,'/'),strsplit(A,':'),'uniformoutput',0);
Now each cell of B holds one piece of A (split at ':'), itself split at '/'; the pieces that contained a '/' have more than one element. You can extract the result by selecting the cells of B that have more than one element and taking the first element of each:
C=cellfun(@(x) x{1},B(cellfun('length',B)>1),'uniformoutput',0)
C =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'
Starting in R2016b you can use extractBetween:
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = extractBetween(str,':','/')
result =
2×1 cell array
{'45.72643,4.91203'}
{'hereanotherdata' }
If all your text elements have the same number of delimiters this can be vectorized too.

postgresql: How to concatenate two regexp_matches()

I'm trying to extract both ints and chars from names such as 123A America, 234B Britania.
I only want the number and the attached letter (i.e. 123A).
I'm using regexp_matches(name, '(\d+)(\D)') and it results as:
{123,A},
{456,B}
I thought of using concatenation, getting the first and the second element of the array with two separate function calls:
(regexp_matches(name, '(\d+)(\D)' )) [1] || (regexp_matches(name, '(\d+)(\D)' )) [2]
But it generates an error:
ERROR: functions and operators can take at most one set argument
How can I get the two element as one string?
You don't have to get the two items you're searching for as different sets, just get them as a single set. Remove the )( between \d+ and \D and that will return a set containing the entire string you're looking for.
That gives:
regexp_matches('123A America, 234B Britania', '(\d+\D)' )
This will only find the first match. To get all matching substrings, use the g flag:
regexp_matches('123A America, 234B Britania', '(\d+\D)', 'g')
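The effect of merging the two capture groups into one can be illustrated with Python's re module. This is only an analogy to the PostgreSQL calls above, not PostgreSQL code: re.search stands in for the call without the g flag, and re.findall for the call with it.

```python
import re

names = '123A America, 234B Britania'

# Two groups: each match comes back as a pair that still has to be joined.
two_groups = re.findall(r'(\d+)(\D)', names)
print(two_groups)  # [('123', 'A'), ('234', 'B')]

# One group: each match is already the combined string.
one_group = re.findall(r'(\d+\D)', names)
print(one_group)   # ['123A', '234B']

# Only the first match, comparable to omitting the g flag.
first_only = re.search(r'(\d+\D)', names).group(1)
print(first_only)  # 123A
```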
Good answer by @Scott S. However, if you can't achieve what you need within one capture group, the solution is to write a function, assign the regexp result to a variable, and then use it.
CREATE OR REPLACE FUNCTION do_something(_input character varying)
  RETURNS character varying AS
$BODY$
DECLARE
  matches text[];
BEGIN
  matches := regexp_matches(_input, '^([0-9]{1,}_[^_]{1,})_[a-z]{1,}(.*)$','i');
  return substring(matches[1], 0, 24)||matches[2];
END
$BODY$
LANGUAGE plpgsql;

Replace using RegEx outside of text markers

I have the following sample text and I want to replace '[core].' with something else, but only when it is not between the text markers ' (that is, not inside an SQL string):
PRINT 'The result of [core].[dbo].[FunctionX]' + [core].[dbo].[FunctionX] + '.'
EXECUTE [core].[dbo].[FunctionX]
The result should be:
PRINT 'The result of [core].[dbo].[FunctionX]' + [extended].[dbo].[FunctionX] + '.'
EXECUTE [extended].[dbo].[FunctionX]
I hope someone can understand this. Can this be solved by a regular expression?
With RegLove
Kevin
Not in a single step, and not in an ordinary text editor. If your SQL is syntactically valid, you can do something like this:
First, you remove every string from the SQL and replace with placeholders. Then you do your replace of [core] with something else. Then you restore the text in the placeholders from step one:
1. Replace all occurrences of '(?:''|[^'])+' with 'n', where n is an index number (the number of the match). Store each match in an array at index n. This removes all SQL strings from the input and exchanges them for harmless placeholders without invalidating the SQL itself.
2. Do your replace of [core]. No regex required, normal search-and-replace is enough here.
3. Iterate over the array, replacing each placeholder 'n' with the array item stored at index n. Now you have restored the original strings.
The regex, explained:
' # a single quote
(?: # begin non-capturing group
''|[^'] # either two single quotes, or anything but a single quote
)+ # end group, repeat at least once
' # a single quote
In JavaScript this would look something like this:
var sql = 'your long SQL code';
var str = [];
// step 1 - remove everything that looks like an SQL string
var newSql = sql.replace(/'(?:''|[^'])+'/g, function(m) {
  str.push(m);
  return "'"+(str.length-1)+"'";
});
// step 2 - actual replacement (a regex with the g flag replaces every occurrence)
newSql = newSql.replace(/\[core\]/g, "[new-core]");
// step 3 - restore all original strings
for (var i=0; i<str.length; i++){
  newSql = newSql.replace("'"+i+"'", str[i]);
}
// done.
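For comparison, the same three-step placeholder technique can be sketched in Python. The function name replace_outside_strings and its parameters are hypothetical, chosen for this example. It reuses the string regex from the answer, so it assumes syntactically valid SQL with no empty string literals, and it restores all placeholders in a single pass instead of a loop.

```python
import re

def replace_outside_strings(sql, old, new):
    """Replace `old` with `new` everywhere except inside '...' SQL strings."""
    saved = []

    def stash(m):
        # Step 1: remember the string literal, leave a numbered placeholder.
        saved.append(m.group(0))
        return "'%d'" % (len(saved) - 1)

    masked = re.sub(r"'(?:''|[^'])+'", stash, sql)

    # Step 2: plain search-and-replace on the masked text.
    masked = masked.replace(old, new)

    # Step 3: every remaining 'n' is a placeholder; restore them in one pass.
    return re.sub(r"'(\d+)'", lambda m: saved[int(m.group(1))], masked)

sql = ("PRINT 'The result of [core].[dbo].[FunctionX]' + [core].[dbo].[FunctionX] + '.'\n"
       "EXECUTE [core].[dbo].[FunctionX]")
print(replace_outside_strings(sql, "[core]", "[extended]"))
```

Using a function as the replacement in step 3 means the restored text is inserted literally, without any special handling of backslashes or group references.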
Here is a solution (javascript):
str.replace(/('[^']*'.*)*\[core\]/g, "$1[extended]");