Why does this substitution also remove a whitespace character? - regex

I am writing a script to extract and convert SQL statements from a file. I need to convert the SQL unloaded from a Gupta SQLBase database into SQL that SQL Server can understand.
One task is to replace keywords that are not allowed as column names with a compatible name.
In the following code, $commands is an array ref that contains SQL statements. (There is actually more code here, but I left it out because it shouldn't be relevant.)
my @KeyWords = ("LEFT", "RIGHT", "PERCENT", "FILE", "PRINT", "CROSS", "PLAN", "TOP", "END", "FILE", "Default", "CHECK", "TEXT");
foreach $cmd (@$commands) {
    foreach my $kw (@KeyWords) {
        $cmd =~ s/\b$kw\b[^(]/_$kw/gi;
    }
    push @$converted, $cmd;
}
This works fine for most statements but in the following command "DEFAULT" gets replaced with "_DEFAULT instead of "_DEFAULT". So the second quotation mark is lost.
CREATE TABLE SYSADM.SUBTYPE ( ID_SUBTYPE INTEGER NOT NULL,
ID_TYPE INTEGER NOT NULL,
TYPE VARCHAR(1),
BEZEICH VARCHAR(60),
NUM_COLOR INTEGER,
NUM_TXTCOLOR INTEGER,
"DEFAULT" SMALLINT,
GENER_ARBA SMALLINT,
PROJEKTPLANUNG SMALLINT)
Is there a way to modify the regular expression/substitution so it will not remove the second quotation mark? Or is there another way?

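[^(] matches any single character that is not an opening parenthesis. Because that character is part of the match, it is replaced along with the keyword; in your example it is the closing quotation mark.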
You want to use a negative zero-width lookahead assertion instead:
s/\b$kw\b(?!\()/_$kw/gi;
(Alternatively: (?![(]))
You can also add the replaced character back to the string:
s/\b$kw\b([^(])/_$kw$1/gi;
But note that this will not work in all cases. In particular, if there is nothing after the keyword, this pattern will not match, whereas the zero-width assertion will.
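The difference between the consuming character class and the lookahead can be reproduced outside Perl. Below is a minimal sketch in Python, whose re module supports the same \b and (?!...) constructs; the function names are made up for this illustration and are not part of the original script.

```python
import re

def rename_consuming(sql, kw):
    # [^(] is part of the match, so the character after the keyword is lost.
    pattern = r"\b%s\b[^(]" % re.escape(kw)
    return re.sub(pattern, "_" + kw, sql, flags=re.IGNORECASE)

def rename_lookahead(sql, kw):
    # (?!\() only asserts that no "(" follows; it consumes nothing.
    pattern = r"\b%s\b(?!\()" % re.escape(kw)
    return re.sub(pattern, "_" + kw, sql, flags=re.IGNORECASE)

line = '"DEFAULT" SMALLINT,'
print(rename_consuming(line, "DEFAULT"))  # "_DEFAULT SMALLINT,
print(rename_lookahead(line, "DEFAULT"))  # "_DEFAULT" SMALLINT,
print(rename_consuming("LEFT", "LEFT"))   # LEFT (nothing follows, so no match)
print(rename_lookahead("LEFT", "LEFT"))   # _LEFT
```

The last two calls show the end-of-string case mentioned above: the consuming pattern needs one more character to match, while the lookahead does not.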

Related

Sublime Workflow for replacing quotes

I use the text editor Sublime Text 3 to edit code, and very often I'll have a string literal wrapped in double quotes that I want to change to single quotes, or vice versa. Right now I scroll to each quotation mark and replace it with the one I want. Is there a faster workflow for this? Say, highlighting the word or a hotkey or something? I would find it super useful.
If you have a large number of such strings in a file and you want to convert all of them at once, you could use a regex find/replace operation to find and replace them all. You would use Find > Replace... or Find > Find in files... to search for a matching regex that captures the text in the quotes.
For example you could use \"([^"\n]*)\" as a search term and '\1' as the replacement text to swap all double quoted strings for single quotes.
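To see what that search/replace pair does before running it on a whole file, here is a small sketch of the same pattern in Python. The function name is hypothetical, and Python's re.sub stands in for Sublime's Replace panel, so treat it as an approximation of the editor's behavior rather than a description of it.

```python
import re

def to_single_quotes(text):
    # "([^"\n]*)" captures everything between a pair of double quotes
    # on one line; the replacement wraps the capture in single quotes.
    return re.sub(r'"([^"\n]*)"', r"'\1'", text)

print(to_single_quotes('name = "Jane"'))  # name = 'Jane'
```

A string that already contains a single quote (for example "it's") would come out as 'it's', which is no longer a valid literal in most languages, so the results are worth reviewing.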
You can't bind something like that to a key directly because Find/Replace can't be used in a Macro, but you could use the RegReplace package to do this if you want to go that route.
You can potentially speed up the workflow that you're currently using by taking advantage of multiple cursors, if you're not already doing that.
You could, for example, select the first quote, then press Ctrl+D (Cmd+D on macOS) to select the other one. Now that you have two cursors, press Backspace to delete both quotes and press the new quote character to insert the new ones.
This can't be macro-ized and bound to a key because the find_under_expand command can't be used in a macro, though.
For a full key press solution, as far as I'm aware you would need a plugin of some sort to do this for you. One such example appears to be ChangeQuotes, although I've never personally used it.
It's also possible to write your own small plugin such as the following:
import sublime
import sublime_plugin


class SwapQuotesCommand(sublime_plugin.TextCommand):
    pairs = ["'", '"']

    def run(self, edit):
        self.view.run_command("expand_selection", {"to": "scope"})
        for sel in self.view.sel():
            self.toggle(edit, sel)

    def toggle(self, edit, region):
        begin = self.view.substr(region.begin())
        end = self.view.substr(region.end() - 1)
        if begin == end and begin in self.pairs:
            index = self.pairs.index(begin) + 1
            new = self.pairs[index % len(self.pairs)]
            for point in (region.begin(), region.end() - 1):
                self.view.replace(edit, sublime.Region(point, point+1), new)
This expands the selection in all of the cursors out by the current scope, and then if both ends of the selection are a matching quote, the quote in use is swapped.
In use, you would use a key binding such as the following, which includes a context to make the key only trigger while the cursor is inside of a string so that it doesn't mess up your selection in cases where it definitely won't work.
{
    "keys": ["ctrl+shift+'"], "command": "swap_quotes",
    "context": [
        { "key": "selector", "operator": "equal", "operand": "string.quoted", "match_all": true }
    ]
},

Perl regex to handle and preserve single and multiple words into a variable

I am writing a perl script to read the full name of a member and save it to variables firstname and lastname like below:
my ($firstname, $lastname) = $member =~ m/^(\w+.*?) +(\w+)$/;
my $member_name = $firstname.' '.$lastname;
The value for $member comes from an upstream service which would be like for example "Jane Doe"
Now the code above cannot handle the case where the service sends a $member value like "Jane". The regex fails on a single word. I need it to handle both multiple and single words. I cannot add new code, so I am looking to extend the existing regex so that the change is minimal and it handles both cases.
This is what I am testing on the command line, but so far no luck:
perl -e 'my ($firstname, $lastname) = "Jane Doe" =~ m/^(\w+.*?) +(\w+)$/|m/^(\w+)$/; print "$firstname\n$lastname";'
When I substitute "Jane Doe" with "Jane", nothing prints. I want to keep the code in this format, though: if the value has multiple words it should print them both, otherwise just the single word.
Your help will be greatly appreciated.
There is a syntax error in your Perl code. You terminated the pattern too early.
#  /                 /  /       /
#                    V
  m/^(\w+.*?) +(\w+)$/|m/^(\w+)$/
This will lead to the | being interpreted as a bitwise OR. Since there's another m// after it, the | will take the return values of both m// operations and do its magic. The second m// will just match against the topic $_.
What you actually want is to merge both patterns.
my ($firstname, $lastname) = "Jane Doe" =~ m/^(?:(\w+.*?) +)?(\w+)$/;
You need to make the first name optional with a non-capturing group (?:), followed by a ? zero-or-one quantifier.
You cannot have three capture groups, like you probably intended, because the third one would go to $3, and not $1.
However, the above solution uses the last name, which you then assign to the $firstname variable. Your full name pattern allows for names with any characters in them, like Jean-Luc Picard. But if you pass in just Jean-Luc, the match will fail. So if you want only the first name, you should use the correct pattern to make it consistent.
A simple way of doing that is to make the last name optional instead.
my ($firstname, $lastname) = "Jane" =~ m/^(\w+.*?)(?: +(\w+))?$/;
Remember that this will set $lastname to undef, which doesn't matter so much in your command line example, but in a proper program with strict and warnings (which you of course have turned on, right?) it will complain if $lastname is used as a string while it's undef.
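For readers who want to check how the optional last-name group behaves on different inputs, here is a rough sketch of the same pattern in Python. The split_name helper is invented for this example; Python returns None where Perl would give undef.

```python
import re

# Same pattern as above: the last name is wrapped in an optional
# non-capturing group, so a single word still matches.
NAME = re.compile(r"^(\w+.*?)(?: +(\w+))?$")

def split_name(member):
    m = NAME.match(member)
    if m is None:
        return None, None
    return m.group(1), m.group(2)

print(split_name("Jane Doe"))         # ('Jane', 'Doe')
print(split_name("Jane"))             # ('Jane', None)
print(split_name("Jean-Luc Picard"))  # ('Jean-Luc', 'Picard')
print(split_name("Jean-Luc"))         # ('Jean-Luc', None)
```

The None in the second position is the case to guard against before concatenating the two parts into a full name.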
I suggest you read this article about names.

Replace pair of % in oracle

I have these texts in an Oracle table (as 2 records):
"Sample text with replace parameter %1%"
"You reached 90% of your limit"
I need to replace %1% with specific text from an input parameter in an Oracle function. In fact, there can be more than one replacement parameter; I also have a record with "Replace this %12% with real value".
I have programmed this functionality:
IF poc > 0 THEN
    FOR i in 1 .. poc LOOP
        p := get_param(mString => mbody);
        mbody := replace(mbody,
                         '%' || p || '%', parameters(to_number(p, '99')));
    END LOOP;
END IF;
But in this case I have a problem with text number 2. The code tries to replace "90%" as well, and then I get this error:
ORA-06502: PL/SQL: numeric or value error: NULL index table key value
Is it possible to avoid trying to replace "90%"? Many thanks for any advice.
Best regards
PS: Oracle version: 10g (OCI Version: 10.2)
Regular expressions can work here. Try the following and build them into your script.
SELECT REGEXP_REPLACE( 'Sample text with replace parameter %1%',
'\%[0-9]+\%',
'db_size' )
FROM DUAL
and
SELECT REGEXP_REPLACE( 'Sample text with replace parameter 1%',
'\%[0-9]+\%',
'db_size' )
FROM DUAL
The pattern is pretty simple; it looks for a '%' followed by one or more digits followed by a '%'.
The only issue here will be if you have more than one replacement to make in each string and each replacement is different. In that case you will need to loop over the string, replacing the next parameter on each pass. To do this, add the position and occurrence parameters to REGEXP_REPLACE after the replacement string (position counts from 1, and an occurrence of 1 replaces only the first match), e.g.
REGEXP_REPLACE( 'Sample text with replace parameter %88888888888%','\%[0-9]+\%','db_size',1,1 )
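The logic of "replace each numbered placeholder with its own value, and leave a lone 90% alone" can be sketched in a few lines of Python. This is only an illustration of the idea, not PL/SQL: the fill_placeholders function and the params dictionary are assumptions made for the example and stand in for the question's parameters collection.

```python
import re

def fill_placeholders(text, params):
    # params is a hypothetical dict: placeholder number -> replacement text.
    def lookup(match):
        key = int(match.group(1))
        # Unknown placeholders are left as they are instead of raising.
        return params.get(key, match.group(0))
    # Only a % followed by digits and a closing % counts as a placeholder.
    return re.sub(r"%([0-9]+)%", lookup, text)

params = {1: "db_size", 12: "real value"}
print(fill_placeholders("Sample text with replace parameter %1%", params))
print(fill_placeholders("You reached 90% of your limit", params))
```

The second call returns the text unchanged, because "90%" has no leading % and therefore never matches the pattern.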
You are getting the error at parameters(to_number(p, '99')). Can you please check the value of p?
Also, if p = 90, then REPLACE will not try to replace "90%"; it will replace "%90%". How did you determine that it's trying to replace "90%"?

OpenRefine custom text faceting

I have a column of names like:
Quaglia, Pietro Paolo
Bernard, of Clairvaux, Saint, or
.E., Calvin F.
Swingle, M Abate, Agostino, Assereto
Abati, Antonio
10-NA)\u, Ferraro, Giuseppe, ed, Biblioteca comunale ariostea. Mss. (Esteri
I want to make a Custom text facet in OpenRefine that marks as "true" the names with one comma and as "false" all the others, so that I can work with the latter (".E., Calvin F." is not a problem, I'll work with that later).
I'm trying using "Custom text facet" and this expression:
if(value.match(/([^,]+),([^,]+)/), "true", "false")
But the result is all false. What's the wrong part?
The expression you are using:
if(value.match(/([^,]+),([^,]+)/), "true", "false")
will always evaluate to false because the output of the 'match' function is either an array or null. When evaluated by 'if', neither an array nor null evaluates to true.
You can wrap the match function in 'isNonBlank' or similar to get a boolean true/false, which would then make the 'if' function work as you want. However, once you have a boolean result the 'if' becomes redundant, as its only job is to turn the boolean true/false into the string "true" or "false", which makes no difference to the values shown in the custom text facet.
So:
isNonBlank(value.match(/([^,]+),([^,]+)/))
should give you the desired result using match.
Instead of using 'match' you could use 'split' to split the string into an array using the comma as a split character. The length of the resulting array is the number of commas plus one (i.e. number of commas = length - 1).
So your custom text facet expression becomes:
value.split(",").length()==2
This will give you true/false
If you want to break down the data based on the number of commas that appear, you could leave off the '==2' to get a facet which just gives you the length of the resulting array.
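The two checks described above (a whole-string match and a split count) can be compared side by side. The sketch below uses Python rather than GREL, so the function names are invented and Python's str.split and re.fullmatch only approximate the OpenRefine functions; it is meant to show the reasoning, not OpenRefine's exact behavior.

```python
import re

ONE_COMMA = re.compile(r"[^,]+,[^,]+")

def one_comma_by_split(value):
    # Splitting on "," yields (number of commas + 1) parts.
    return len(value.split(",")) == 2

def one_comma_by_regex(value):
    # The whole string must be: non-commas, one comma, non-commas.
    return ONE_COMMA.fullmatch(value) is not None

names = [
    "Quaglia, Pietro Paolo",
    "Bernard, of Clairvaux, Saint, or",
    ".E., Calvin F.",
    "Swingle, M Abate, Agostino, Assereto",
    "Abati, Antonio",
]
for name in names:
    print(one_comma_by_split(name), one_comma_by_regex(name), name)
```

In this sketch the two functions agree on all five sample names; they differ only for values that start or end with a comma, where the split count is still 2 but the regex requires text on both sides.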
I would go with a lookahead assertion to check that exactly one "," occurs between the beginning and the end of the line.
^(?=[^\,]+,[^\,]+$).*
https://regex101.com/r/iG4hX6/2

Is there a database that can store regex as values?

I am looking for a database that can store regular expressions as values, e.g. something like this:
{:name => "Tim", :count => 3, :expression => /t+/},
{:name => "Rob", :count => 4, :expression => /a\d+/},
{:name => "Fil", :count => 1, :expression => /tt/},
{:name => "Marc", :count => 1, :expression => /bb/}
So I could return rows/documents based on whether the query matches the expression or not (e.g. "FIND rows WHERE "tt" =~ :expression") and get the Tim and Fil rows as the result. Most databases can do the exact opposite (check whether a text field matches a regex query), but unfortunately neither Mongo nor Postgres seems to be able to do what I need.
P.S. Or perhaps I am wrong and there are some extensions for postgres or mongo that allow me to store regex?
MongoDB will allow you to store actual regular expressions (i.e. not a string representing a regular expression), as shown below:
> db.mycoll.insertOne({myregex: /aa/})
{
"acknowledged" : true,
"insertedId" : ObjectId("5826414249bf0898c1059b38")
}
> db.mycoll.insertOne({myregex: /a+/})
{
"acknowledged" : true,
"insertedId" : ObjectId("5826414949bf0898c1059b39")
}
> db.mycoll.find()
{ "_id" : ObjectId("5826414249bf0898c1059b38"), "myregex" : /aa/ }
{ "_id" : ObjectId("5826414949bf0898c1059b39"), "myregex" : /a+/ }
You can use this to then query for rows with a regex that matches a query, as follows:
> db.mycoll.find(function() { return this.myregex.test('a'); } )
{ "_id" : ObjectId("5826414949bf0898c1059b39"), "myregex" : /a+/ }
Here we search for rows where the string 'a' is matched by the myregex field, resulting in the second document, with regex /a+/, being returned.
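Independently of any particular database, the "reverse" lookup amounts to testing the input against every stored pattern. Here is a minimal in-memory sketch in Python using the rows from the question; the rows_matching helper is made up for the illustration and no database is involved.

```python
import re

# The sample rows from the question, with each regex stored as a string.
rows = [
    {"name": "Tim", "count": 3, "expression": r"t+"},
    {"name": "Rob", "count": 4, "expression": r"a\d+"},
    {"name": "Fil", "count": 1, "expression": r"tt"},
    {"name": "Marc", "count": 1, "expression": r"bb"},
]

def rows_matching(text, rows):
    # Every stored pattern has to be tried against the input in turn.
    return [row["name"] for row in rows if re.search(row["expression"], text)]

print(rows_matching("tt", rows))  # ['Tim', 'Fil']
```

Because each pattern must be evaluated separately, the cost grows with the number of stored expressions.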
Oracle database can do that.
Example query: WHERE REGEXP_LIKE(first_name, '^Ste(v|ph)en$')
You want to select a regexp from a column. See the SQL Fiddle example below.
SQL Fiddle
Choose Oracle database.
In schema window execute the following:
CREATE TABLE regexp (name VARCHAR2(20), count NUMBER, regexp VARCHAR2(50));
INSERT INTO regexp VALUES ('Tim', 3, 't+');
INSERT INTO regexp VALUES ('Rob', 4, 'a\d+');
INSERT INTO regexp VALUES ('Fil', 1, 'tt');
INSERT INTO regexp VALUES ('Marc', 1, 'bb');
COMMIT;
Execute an SQL statement, e.g. (as you mentioned in your question):
SELECT * FROM regexp WHERE REGEXP_LIKE('tt', regexp);
Yields:
NAME COUNT REGEXP
Tim 3 t+
Fil 1 tt
Reference here.
Excerpt:
Oracle Database implements regular expression support with a set of
Oracle Database SQL functions and conditions that enable you to search
and manipulate string data. You can use these functions in any
environment that supports Oracle Database SQL. You can use these
functions on a text literal, bind variable, or any column that holds
character data such as CHAR, NCHAR, CLOB, NCLOB, NVARCHAR2, and
VARCHAR2 (but not LONG).
And some more info to consider:
A string literal in a REGEXP function or condition conforms to the
rules of SQL text literals. By default, regular expressions must be
enclosed in single quotes. If your regular expression includes the
single quote character, then enter two single quotation marks to
represent one single quotation mark within the expression. This
technique ensures that the entire expression is interpreted by the SQL
function and improves the readability of your code. You can also use
the q-quote syntax to define your own character to terminate a text
literal. For example, you could delimit your regular expression with
the pound sign (#) and then use a single quote within the expression.
Note: If your expression comes from a column or a bind variable, then
the same rules for quoting do not apply.
Note there is no column type named RegEx, you would need to save the string as is, in a textual column.
Also you can use RegEx in constraint checking and when you project columns.
SQL Server (and probably some other SQL databases) supports this out of the box, though as has been noted before, this can only be executed by the database as a table scan -- something to keep in mind if you have large numbers of regexes. You just reverse the usual order of the LIKE operator:
create table demo.query
(
    id int identity not null,
    regex nvarchar(max),
    primary key(id)
);
insert into demo.query (regex) values ('aa%');
select * from demo.query where 'aaaa' like regex;
Looks a little funny, but it's perfectly valid.
Adding to Ely's answer, thought of letting you all know that MySQL also supports this.
In http://sqlfiddle.com/, I tested with MySQL 5.6
Build schema:
CREATE TABLE rule (name VARCHAR(20), tot INT, exp VARCHAR(50));
INSERT INTO rule VALUES ('Tim', 3, 't+');
INSERT INTO rule VALUES ('Rob', 4, 'a\d+');
INSERT INTO rule VALUES ('Fil', 1, 'tt');
INSERT INTO rule VALUES ('Jack', 1, '^tt$');
INSERT INTO rule VALUES ('Marc', 1, 'bb');
COMMIT;
Test:
select * from rule where 'ttt' RLIKE exp ;
Expected: rows for Tim and Fil