How to remove some strings in EmEditor (regex) - regex

I want to remove some strings from a file and keep only the parts I need, using EmEditor.
The lines in the file look like this:
{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"user_flags\":2143,\"id\":702212125,\"access_hash\":\"914250561826\",\"first_name\":\"david\",\"last_name\":\"jones\",\"username\":\"david_d192\",\"phone\":\"051863329875\",\"status\":{\"_\":\"userStatusRecently\"}}","phone":"051863329875","version":"3","type":"unknown","token":"1556189892619764206","p_id":702212125,"username":"david_d192","type":"redis","user_flags":2143,"host":"win",from":"contacts"}
{"index": {"_type": "_doc", "_id": "36GG54F"}}
{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"user_flags\":2143,\"id\":702212125,\"access_hash\":\"914250561826\",\"first_name\":\"david\",\"last_name\":\"jones\",\"username\":\"david_d192\",\"phone\":\"051863329875\",\"status\":{\"_\":\"userStatusRecently\"}}","phone":"051863329875","version":"3","type":"unknown","token":"1556189892619764206","p_id":702212125,"username":"david_d192","type":"redis","user_flags":2143,"host":"win",from":"contacts"}
{"index": {"_type": "_doc", "_id": "36GG54F"}}
{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"user_flags\":2143,\"id\":702212125,\"access_hash\":\"914250561826\",\"first_name\":\"david\",\"last_name\":\"jones\",\"phone\":\"051863329875\",\"status\":{\"_\":\"userStatusRecently\"}}","phone":"051863329875","version":"3","type":"unknown","token":"1556189892619764206","p_id":702212125,"type":"redis","user_flags":2143,"host":"win",from":"contacts"}
{"index": {"_type": "_doc", "_id": "36GG54F"}}
I want to keep id, first_name, last_name, phone, and username (if it exists) from every line, like this:
id:702212125 first_name:david last_name:jones phone:051863329875 username:david_d192,
id:702212125 first_name:david last_name:jones phone:051863329875 username:david_d192,
id:702212125 first_name:david last_name:jones phone:051863329875,
How can I do this?
Thanks.

JSON parsing is the best way to do this (https://linuxconfig.org/how-to-parse-data-from-json-into-python). But you can make life harder and use regex (shown here in the PCRE (PHP) flavor):
Get all IDs:
(?<=id\":\s\")(\w+)(?=\")
See example:
https://regex101.com/r/g5vfEd/1
Get all first names:
(?<=first_name\\\":\\\")(\w)+(?=\\)
See example:
https://regex101.com/r/g5vfEd/2
Get all last names:
(?<=last_name\\\":\\\")(\w)+(?=\\)
See example:
https://regex101.com/r/g5vfEd/3
Get all phone numbers:
(?<=phone\\\":\\\")(\w)+(?=\\)
See example:
https://regex101.com/r/g5vfEd/4
Get all user names if they exist:
(?<=username\\\":\\\")(\w)+(?=\\)
See example:
https://regex101.com/r/g5vfEd/5
Complete pattern to match everything:
id\\?\":\s?\"?(\w+),?[\\\"].*first_name\\\":\\"(\w+).*last_name\\\":\\\"(\w+).*phone\":\"(\d+).*(?=username)?\":\"(\w+).*
Returns 3 matches, each with the following 5 groups (here match 1 is shown):
Group 1. 85-94 702212125
Group 2. 145-150 david
Group 3. 169-174 jones
Group 4. 285-297 051863329875
Group 5. 454-462 contacts
See link: https://regex101.com/r/g5vfEd/6

Since you've tagged this regex and EmEditor, you can try the following.
EmEditor version 19.1 onwards supports regex named groups like this:
(?<id>expression)
and named backreference by using this form:
\k<id>
So the steps are:
Open Find and Replace (Ctrl-H). Tick "Match Case" and select "Regular Expressions".
Find:
\\"id\\"[\\":]*(?<id>[^\\":,]*).*?\\"first_name\\"[\\":]*(?<first_name>[^\\":,]*).*?\\"last_name\\"[\\":]*(?<last_name>[^\\":,]*).*?\\"phone\\"[\\":]*(?<phone>[^\\":,]*)(.*?"username"[\\":]*(?<username>[^\\":,]*))?
Replace with:
id:\k<id>\tfirst_name:\k<first_name>\tlast_name:\k<last_name>\tphone:\k<phone>\tusername:\k<username>
Click the down arrow next to the Extract button and select "To New Document".
Click the Extract button to output to a new tab-delimited file.
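For anyone who prefers a script to the editor, the same extraction can be sketched in Python. This is only a sketch under assumptions: the function name extract_fields and the FIELD pattern are made up for illustration, and the pattern only looks at the escaped keys inside the "message" string (the ones written as \"id\": in the raw line), so the unescaped copies later in the line are ignored.

```python
import re

# Assumed pattern: an escaped key such as \"first_name\": followed by an
# optionally quoted value. Only the five wanted keys are listed.
FIELD = re.compile(
    r'\\"(id|first_name|last_name|phone|username)\\":\\?"?([^\\",}]+)'
)

# Output order requested in the question.
ORDER = ["id", "first_name", "last_name", "phone", "username"]


def extract_fields(line):
    """Return 'key:value' pairs for the wanted keys found in one raw line."""
    found = {}
    for key, value in FIELD.findall(line):
        # Keep the first occurrence of each key.
        found.setdefault(key, value)
    return " ".join(f"{k}:{found[k]}" for k in ORDER if k in found)
```

Lines that contain none of the keys, such as the {"index": ...} lines, come back as an empty string and can be skipped when writing the output.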


Filter with REGEXMATCH in Google Sheets to match text contained in cells

Right now I have this data and I'm trying to filter it by the text contained in cells C3, C4, etc.
I have no problem filtering with REGEXMATCH against a single cell.
But I'm unable to use REGEXMATCH against two or more cells. It seems I can't make the pipe work between cells, as I get a parse error. I tried adding in "C3|C4" too.
I can only get the wanted output by hardcoding the text, which isn't what I'm looking for. I'm hoping for some tips on how to regex-match the text in more than one cell, so that it matches the text in cells C3 (Apple) and C4 (Pear) and shows the wanted output.
You need to use TEXTJOIN for a dynamic list in column C:
=IF(TEXTJOIN( , 1, C3:C)<>"", FILTER(A2:A, REGEXMATCH(LOWER(A2:A),
TEXTJOIN("|", 1, LOWER(C3:C)))), "no input")
You may use
=IF(C3<>"", FILTER(A2:A,REGEXMATCH(A2:A, TEXTJOIN("|", TRUE, C3:C4) )), "no input")
Or, you may go a step further and match Apple or Pear as whole words using \b word boundaries and a grouping construct around the alternatives:
=IF(C3<>"", FILTER(A2:A,REGEXMATCH(A2:A, "\b(?:" & TEXTJOIN("|", TRUE, C3:C4) & ")\b")), "no input")
And if you need to make the search case insensitive, just append (?i) at the start:
=IF(C3<>"", FILTER(A2:A,REGEXMATCH(A2:A, "(?i)\b(?:" & TEXTJOIN("|", TRUE, C3:C4) & ")\b")), "no input")
See what the TEXTJOIN documentation says:
Combines the text from multiple strings and/or arrays, with a specifiable delimiter separating the different texts.
So, when you pass TRUE as the second argument, you do not have to worry if the range contains empty cells, and the regex won't be ruined by extraneous |||.
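The same idea (join the non-empty terms with |, optionally wrap them in \b(?:...)\b, optionally ignore case) can be sketched outside Sheets in Python. The names build_filter and filter_rows are hypothetical. One deliberate difference from the formulas above: the terms are passed through re.escape, so they are treated as literal text rather than as regex fragments.

```python
import re


def build_filter(terms, whole_words=True, ignore_case=True):
    """Build one alternation regex from a list of terms, skipping blanks."""
    cleaned = [t.strip() for t in terms if t and t.strip()]
    if not cleaned:
        return None  # mirrors the "no input" branch of the formula
    body = "|".join(re.escape(t) for t in cleaned)
    pattern = rf"\b(?:{body})\b" if whole_words else body
    return re.compile(pattern, re.IGNORECASE if ignore_case else 0)


def filter_rows(rows, terms):
    """Keep only the rows that contain at least one of the terms."""
    rx = build_filter(terms)
    if rx is None:
        return []
    return [row for row in rows if rx.search(row)]
```

Skipping blank terms plays the role of TRUE in TEXTJOIN: an empty alternative would match every row.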

Find number and replace + 1

I have a large file with a list of objects that have an incrementing page number, i.e.
[
{page: 1},
{page: 2},
{page: 3}
]
I can find each instance of page: # with page: (\d) in vscode's ctrl+f finder. How would I replace each of these numbers with # + 1?
It can be done rather easily in VS Code using one of Emmet's built-in commands:
Emmet: Increment by 1
Use your regex to find all the page: \d+ in your file.
Ctrl-Shift-L to select all those occurrences.
Trigger the Emmet: Increment by 1 command.
It's not possible to perform arithmetic with regex. I use LINQPad to run these kinds of small scripts. An example of how I would do it is in the C# program below.
void Main()
{
    var basePath = @"C:\";
    // Get all files with extension .cs in the directory and all its subdirectories.
    foreach (var filePath in Directory.GetFiles(basePath, "*.cs", SearchOption.AllDirectories))
    {
        // Read the content of the file.
        var fileContent = File.ReadAllText(filePath);
        // Replace the content by using a named capture group.
        // The named capture group allows one to work with only a part of the regex match.
        var replacedContent = Regex.Replace(fileContent, @"page: (?<number>[0-9]+)", match => $"page: {int.Parse(match.Groups["number"].Value) + 1}");
        // Write the replaced content back to the file.
        File.WriteAllText(filePath, replacedContent);
    }
}
I also took the liberty of changing your regex to the one below.
page: (?<number>[0-9]+)
page: matches "page: " literally.
(?<number> is the start of a named capture group called number. We can then use this group during replacement.
[0-9] matches a single digit between 0 and 9. This is more specific than \d, which also matches other Unicode digit characters.
The + makes it match one or more digits, allowing for the number 10 and onwards.
) is the end of the named capture group.
You could do that in Ruby as follows.
FileIn = "in"
FileOut = "out"
First let's construct a sample file (containing 37 characters).
File.write FileIn, "[\n{page: 1},\n{page: 2},\n{page: 33}\n]\n"
#=> 37
We may now read the input file FileIn, convert it and write it to a new file FileOut.
File.write(FileOut, File.read(FileIn).
gsub(/\{page: (\d+)\}/) { "{page: #{$1.next}}" })
Let's look at what's been written.
puts File.read(FileOut)
[
{page: 2},
{page: 3},
{page: 34}
]
I've gulped the entire file, made the changes in memory and spit out the modified file. If the original file were large this could be easily modified to read from and write to the files line-by-line.
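A line-by-line variant of the same replacement might look like the Python sketch below. It is an illustration rather than a translation of the Ruby above: the names bump and bump_file are hypothetical, and it matches page: followed by digits without requiring the surrounding braces.

```python
import re

PAGE = re.compile(r"page: (\d+)")


def bump(line):
    """Add 1 to every 'page: N' number in a single line of text."""
    return PAGE.sub(lambda m: f"page: {int(m.group(1)) + 1}", line)


def bump_file(src, dst):
    """Stream src to dst one line at a time, so large files are not held in memory."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(bump(line))
```

The arithmetic happens in the replacement callback, not in the regex itself, which is the same division of labour as in the C# and Ruby answers.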
Adding another answer as it is significantly different from the other. I wrote an extension, Find and Transform, which makes it easy to do math in a find-and-replace within a file.
In this case with this keybinding (in your keybindings.json file):
{
  "key": "alt+r",        // whatever keybinding you want
  "command": "findInCurrentFile",
  "args": {
    "find": "page: (\\d+)",
    "replace": "page: $${ return $1 + 1 }$$",
    "isRegex": true
  }
}
[That could also be a setting in your settings.json file if you wish with slightly different syntax of course.]
The $${ return $1 + 1 }$$ represents a javascript operation. Here 1 will be added to capture group 1 from the find regex.
Within the $${ ... }$$ almost any javascript operation can be inserted. There are many examples in the repo.

How to split a string in db2?

I have some URLs in my cas_fnd_dwd_det table:
casi_imp_urls cas_code
----------------------------------- -----------
www.casiac.net/fnds/CASI/qnxp.pdf
www.casiac.net/fnds/casi/as.pdf
www.casiac.net/fnds/casi/vindq.pdf
www.casiac.net/fnds/CASI/mnip.pdf
How do I copy the letters between the last '/' and '.pdf' to another column?
Expected outcome:
casi_imp_urls                       cas_code
----------------------------------- -----------
www.casiac.net/fnds/CASI/qnxp.pdf   qnxp
www.casiac.net/fnds/casi/as.pdf     as
www.casiac.net/fnds/casi/vindq.pdf  vindq
www.casiac.net/fnds/CASI/mnip.pdf   mnip
The URL prefixes below are static:
www.casiac.net/fnds/CASI/
www.casiac.net/fnds/casi/
Please advise: how do I select the codes between the last '/' and '.pdf'?
I would recommend taking a look at REGEXP_SUBSTR, which lets you apply a regular expression. Db2 has string processing functions, but the regex function may be the easiest solution. See the SO question on regex and URI parts for different ways of writing the expression. The following returns the last slash, the file name, and the extension:
SELECT REGEXP_SUBSTR('http://fobar.com/one/two/abc.pdf','\/(\w)*.pdf' ,1,1)
FROM sysibm.sysdummy1
/abc.pdf
The following uses REGEXP_REPLACE; the pattern is from this SO question, with the pdf file extension added. It splits the string into three parts: everything up to the last slash (a non-capturing group), then the file name (group 1), then the ".pdf" (group 2). The '$1' returns group 1 (group 0 is the whole match).
SELECT REGEXP_REPLACE('http://fobar.com/one/two/abc.pdf','(?:.+\/)(.+)(.pdf)','$1' ,1,1)
FROM sysibm.sysdummy1
abc
You could apply LENGTH and SUBSTR to extract the relevant part or try to build that into the regex.
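To check the last-slash-to-.pdf logic outside the database, here is a small Python sketch. The function name cas_code is borrowed from the column name in the question, and the pattern is an assumption of mine, not the Db2 one: it anchors on the end of the string, escapes the dot, and ignores case so that ".PDF" also matches.

```python
import re

# Everything after the last '/' and before a trailing ".pdf" (any case).
CODE = re.compile(r"([^/]+)\.pdf$", re.IGNORECASE)


def cas_code(url):
    """Return the file name without the .pdf extension, or None if absent."""
    match = CODE.search(url)
    return match.group(1) if match else None
```

Because [^/] cannot cross a slash, the captured group always starts after the last '/' in the URL.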
For Db2 versions older than 11.1: I'm not sure whether it works on 9.5, but it should work from 9.7 onwards.
Try this as is.
with cas_fnd_dwd_det (casi_imp_urls) as (values
'www.casiac.net/fnds/CASI/qnxp.pdf'
, 'www.casiac.net/fnds/casi/as.pdf'
, 'www.casiac.net/fnds/casi/vindq.pdf'
, 'www.casiac.net/fnds/CASI/mnip.PDF'
)
select
casi_imp_urls
, xmlcast(xmlquery('fn:replace($s, ".*/(.*)\.pdf", "$1", "i")' passing casi_imp_urls as "s") as varchar(50)) cas_code
from cas_fnd_dwd_det

Remove leading 0 in String with letters and digits

I have a comma-separated file where I need to change the first column by removing the leading zeroes in the string. The text file is as below:
ABC-0001,ab,0001
ABC-0010,bc,0010
I need to get the data as below:
ABC-1,ab,0001
ABC-10,bc,0010
I can do a command-line replace, which I tried as below:
sed 's/ABC-0*[1-9]/ABC-[1-9]/g' file
I ended up getting output:
ABC-[1-9],ab,0001
ABC-[1-9]0,bc,0010
Can you please tell me what I am missing here?
Alternatively, I also tried to apply formatting in the SQL that generates this file, as below:
select regexp_replace(key,'((0+)|1-9|0+)','(1-9|0+)') from file where key in ('ABC-0001','ABC-0010')
which gives output as
ABC-(1-9|0+)1
ABC-(1-9|0+)1(1-9|0+)
Help with either solution would be much appreciated!
Try this:
sed -E 's/ABC-0*([1-9])/ABC-\1/g' file
                -------     --
                   |        |
     capturing group        |
                     captured group
To do it in the query using Oracle, where the key value with the zeroes you want to remove is in a column called "key" in a table called "file", would look like this:
select regexp_replace(key, '(-)(0+)(.*)', '\1\3')
from file;
You need to capture the dash because it is "consumed" by the regex when it is matched. It is followed by a second group of one or more 0's, and then by a third group holding the rest of the field. Replacing with captured groups 1 and 3 leaves out the 0's (if any) in between.
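The capture-and-backreference idea used in both answers can be tried out in Python as a rough sketch. The name strip_key_zeros is hypothetical. The lookahead (?=\d) is my own addition and is not in either answer: it keeps one digit when the number is all zeroes. The pattern also assumes the key is letters followed by a dash at the start of the line.

```python
import re

# Group 1 is the letter prefix with its dash; the zeroes after it are dropped.
LEADING_ZEROS = re.compile(r"^([A-Za-z]+-)0+(?=\d)")


def strip_key_zeros(line):
    """Remove leading zeroes from the number in the first column only."""
    return LEADING_ZEROS.sub(r"\1", line, count=1)
```

Anchoring with ^ is what keeps the third column (0001, 0010) untouched.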

REGEXP_EXTRACT () every word except ‘,’ in a field

I’d like to select the countries, without the ‘,’, from a data field which looks like this:
Japan,Singapore,Italy,France
My code looks like this: REGEXP_EXTRACT(country,'([^,]*)'). It works, but only the first country is selected. How can I write it to select them all?
I slightly changed the regex to ([^,]+) so that a country name has at least one character. Using * creates empty matches, so that only every other match contains a country name.
The /g flag at the end is important: it makes the regex match globally.
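The difference between * and + under global matching can be seen in a short Python sketch, where re.findall plays the role of the /g flag. This illustrates regex behaviour in general, not the Data Studio function itself, and the variable names are made up.

```python
import re

countries = "Japan,Singapore,Italy,France"

# With *, the pattern is also allowed to match the empty string,
# so empty matches show up between the country names.
star_matches = re.findall(r"[^,]*", countries)

# With +, every match must contain at least one non-comma character.
plus_matches = re.findall(r"[^,]+", countries)
```

Here plus_matches holds exactly the four country names, while star_matches holds the same names with empty strings mixed in.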
If you are looking to extract all the characters except , then this can be achieved using either of the REGEXP_REPLACE calculated fields below:
1) Replace , with (space)
REGEXP_REPLACE(country, ",", " ")
2) Remove ,
REGEXP_REPLACE(country, ",", "")