Find/Replace string that doesn't contain quotes - regex

I have inherited a rather large/ugly php codebase (language is unimportant, this is a generic vim question) , where nothing is quoted properly (old php doesn't mind, but new php versions throw warnings).
I'd like to turn $something[somekey] into $something['somekey'], only if its not already quoted or contain the character $
I was trying to build a regular expression to quote the keys, but just cant seem to be able get it to cooperate.
This is what i have so far, which doesn't work but maybe will help explain my question better. And to show that i have actually tried.
:%s/\v\$(.{-})\[(['"$]#<!.{-})\]/$\1['\2']/
My goal is to have something like this:
$something[somekey] = $something['somekey']
$somethingelse[someotherthing] = $something['someotherthing']
$another['key'] = $another['key'] (is ignored)
$yetanother["keykey"] = $yetanother["keykey"] (is ignored)
$derp[$herp] = $derp[$herp] (is ignored)
$array[3] = $array[3] (is ignored)
These can appear anywhere in text, even multiple on the same line, and even touching each other like $something[key]$something[key2], which i would like to be replaced with $something['key']$something['key2']
Another problem, there seems to be random javascript arrays in some files.. which have [] square brackets. So the regex needs to check to see if it starts with $ and text before the brackets.
Im probably asking for the impossible, but any help on this would be great before i go insane editing each file one by one manually.
EDIT: forgot that keys can be numeric, and shouldn't be quoted.

I tried the following, which processed everything from your question correctly:
:%s/\[\(\I\i*\)\]/['\1']/g
Or, with optional white spaces inside the parens:
:%s/\[\s*\(\I\i*\)\s*\]/['\1']/g
And also checking for $identifier before the parens:
:%s/\(\$\i\+\)\[\s*\(\I\i*\)\s*\]/\1['\2']/g

Related

Regex-Match while ignoring a char from Searchword

I am using an Engineering Program which lets me Code formulas in order to filter out specific lines in a database. I am trying to look for a certain line in the database which contains e.g. "concrete" as a property.
In the Code I can use regular expressions.
The regex I was using so far looked like this:
".*(concrete).*";
so if the line in the database contains concrete, I will get the wanted result.
Now the Problem is: i would like to switch the word concrete with a variable, so that it Looks like this:
".*(#VARIABLE1).*";
(the Syntax with the # works in the program btw.)
the Problem is: if i set the variable as concrete, the program automatically switches it for 'concrete' . Obviously, the word concrete cant be found anymore, since the searchterm now contains the two ' Symbols in the beginning and i the end.
Is there a way to ignore those two characters using the Right regex?
what I want it to do is the following:
If a line in the database contains "25cm concrete in Grey"
I should get a match from the regex.
with the searchterm ".*(concrete).*"; it works, with the variable ".*(#VARIABLE1).*"; it doesnt.
EDIT:
the whole "Formula" in the program Looks like that:
if(Match(QTO(Typ:="Attribut{FloorsLayer_02_MaterialName}");".*(#V_QUALITY).*" ;"regex") ;QTO(Typ:="Attribut{Fläche}");0)
I want the if-condition to be true, when the match inside is true.
the whole QTO function is just the programs Syntax to use a certain Attribute into the match-function, the middle part is my Problem. I really don't know the programming language or anything,I'm new to this. hope it helps!
Thats more of a hack than a real solution and i'm not sure if it even works:
if you use the regex
.*(#VARIABLE1)?).*
and the string ?concrete(
this will result in a regex looking like this:
.*('?concrete(')?).*
which makes the additional characters optional.
This uses the following assumtption:
the string (#VARIABLE1) gets replaced by the ('<content of VARIABLE1>')

Scanning a language with non-delimited strings with nested tokens

I want to create a lexer/parser for a language that has non-delimited strings.
Which part of the language is a string is defined by the command preceding it.
For example it has statements that look like this:
pause 5
alert Hello world[CRLF] this contains 'pause' once (1)
Alert in this instance can end with any string, including keywords and numbers.
Further complicating things, the text can contain tags like [CRLF] that I want to separate too.
Ideally I'd want this to be broken up into:
[PAUSE][INT 5]
[ALERT][STR "Hello world"][CRLF][STR " this contains 'pause' once (1)"]
I'm currently using flex but from what I've gathered this kind of thing isn't possible with flex.
How can I achieve what I want here?
(Since one of your tags is "regex", I'll suggest a non-flex approach.)
From the example, it seems like you could just:
match each line against ^(\w+) (.+) to obtain command and arguments-text, and then
get individual arguments by splitting the arguments-text on (\[\w+\]) (assuming your regex library's split function can return both the splitter-strings and the split-strings).
It's possible your actual situation is more complex and something like flex makes more sense, but I'm not really seeing it so far.

Highlighting Javascript Inline Block Comments in Vim

Background
I use JScript (Microsoft's ECMAScript implementation) for a lot of Windows system administration needs. This means I use a lot of ActiveX (Automated COM) objects. The methods of these objects often expect Number or Boolean arguments. For example:
var fso = new ActiveXObject("Scripting.FileSystemObject");
var a = fso.CreateTextFile("c:\\testfile.txt", true);
a.WriteLine("This is a test.");
a.Close();
(CreateTextFile Method on MSDN)
On the second line you see that the second argument is one that I'm talking about. A Boolean of "true" doesn't really describe how the method's behavior will change. This isn't a problem for me, but my automation-shy coworkers are easily spooked. Not knowing what an argument does spooks them. Unfortunately a long list of constants (not real constants, of course, since current JScript versions don't support them) will also spook them. So I've taken to documenting some of these basic function calls with inline block comments. The second line in the above example would be written as such:
var a = fso.CreateTextFile("c:\\testfile.txt", /*overwrite*/ true, /*unicode*/ false);
That ends up with a small syntax highlighting dilemma for me, though. I like my comments highlighted vibrantly; both block and line comments. These tiny inline block comments mean little to me, personally, however. I'd like to highlight those particular comments in a more muted fashion (light gray on white, for example). Which brings me to my dilemma.
Dilemma
I'd like to override the default syntax highlighting for block comments when both the beginning and end marks are on the same line. Ideally this is done solely in my vimrc file, and not in a superseding personal copy of the javascript.vim syntax. My initial attempt is pathetic:
hi inlineComment guifg=#bbbbbb
match inlineComment "\/\*.*\*\/"
Straight away you can see the first problem with this regular expression pattern is that it's a greedy search. It's going to match from the first "/*" to the last "*/" on the line, meaning everything between two inline block comments will get this highlight style as well. I can fix that, but I'm really not sure how to deal with my second concern.
Comments can't be defined inside of String literals in ECMAScript. So this syntax highlighting will override String highlighting as well. I've never had a problem with this in system administration scripts, but it does often bite me when I'm examining the source of many javascript libraries intended for browsers (less.js for example).
What regex pattern, syntax definition, or other solution would the amazing StackOverflow community recommend to restore my vimrc zen?
I'm not sure, but from your description it sounds like you don't need a new syntax definition. Vim syntax files usually let you override a particular syntax item with your own choice of highlighting. In this case, the item you want is called javaScriptComment, so a command like this will set its highlighting:-
hi javaScriptComment guifg=#bbbbbb
but you have to do this in your .vimrc file (or somewhere that's sourced from there), so it's evaluated before the syntax file. The syntax file uses the highlight default command, so the syntax file's choice of highlighting only affects syntax items with no highlighting set. See :help :hi-default for more details on that. BTW, it only works on Vim 5.8 and later.
The above command will change all inline /* */ comments, and leave // line comments with their default setting, because line comments are a different syntax item (javaScriptLineComment). You can find the names of all these groups by looking at the javascript.vim file. (The easiest way to do this is :e $VIMRUNTIME/syntax/javascript.vim .)
If you only want to change some inline comments, it's a little more complicated, but still easy to see what to do by looking at javascript.vim . If you do that, you can see that block comments are defined like this:-
syn region javaScriptComment start="/\*" end="\*/" contains=#Spell,javaScriptCommentTodo
See that you can use separate regexes for begin and end markers: you don't need to worry about matching the stuff in between with non-greedy quantifiers, or anything like that. To have a syntax item that works similarly but only on one line, try adding the oneline option (:h :syn-oneline for more details):-
syn region myOnelineComment start="/\*" end="\*/" oneline
I've removed the two contains groups because (1) if you're only using it for parameter names, you probably don't want spell-checking turned on inside these comments, and (2) contained sections that aren't oneline override the oneline in the container region, so you would still match all TODO comments with this region.
You can define this new kind of comment region in your .vimrc, and set the highlighting how you like: it looks like you already know how to do that, so I won't go into more details on that. I haven't tried out this particular example, so you may still need a bit of fiddling to make it work. Give it a try and let me know how it goes.
Why don't you simply add a comment line above the call?
I think that
// fso.CreateTextFile(filename:String, overwrite:Boolean, unicode:Boolean)
var a = fso.CreateTextFile("c:\\testfile.txt", true, false);
is a lot more readable and informative than
var a = fso.CreateTextFile("c:\\testfile.txt", /*overwrite*/ true, /*unicode*/ false);

Use cases for regular expression find/replace

I recently discussed editors with a co-worker. He uses one of the less popular editors and I use another (I won't say which ones since it's not relevant and I want to avoid an editor flame war). I was saying that I didn't like his editor as much because it doesn't let you do find/replace with regular expressions.
He said he's never wanted to do that, which was surprising since it's something I find myself doing all the time. However, off the top of my head I wasn't able to come up with more than one or two examples. Can anyone here offer some examples of times when they've found regex find/replace useful in their editor? Here's what I've been able to come up with since then as examples of things that I've actually had to do:
Strip the beginning of a line off of every line in a file that looks like:
Line 25634 :
Line 632157 :
Taking a few dozen files with a standard header which is slightly different for each file and stripping the first 19 lines from all of them all at once.
Piping the result of a MySQL select statement into a text file, then removing all of the formatting junk and reformatting it as a Python dictionary for use in a simple script.
In a CSV file with no escaped commas, replace the first character of the 8th column of each row with a capital A.
Given a bunch of GDB stack traces with lines like
#3 0x080a6d61 in _mvl_set_req_done (req=0x82624a4, result=27158) at ../../mvl/src/mvl_serv.c:850
strip out everything from each line except the function names.
Does anyone else have any real-life examples? The next time this comes up, I'd like to be more prepared to list good examples of why this feature is useful.
Just last week, I used regex find/replace to convert a CSV file to an XML file.
Simple enough to do really, just chop up each field (luckily it didn't have any escaped commas) and push it back out with the appropriate tags in place of the commas.
Regex make it easy to replace whole words using word boundaries.
(\b\w+\b)
So you can replace unwanted words in your file without disturbing words like Scunthorpe
Yesterday I took a create table statement I made for an Oracle table and converted the fields to setString() method calls using JDBC and PreparedStatements. The table's field names were mapped to my class properties, so regex search and replace was the perfect fit.
Create Table text:
...
field_1 VARCHAR2(100) NULL,
field_2 VARCHAR2(10) NULL,
field_3 NUMBER(8) NULL,
field_4 VARCHAR2(100) NULL,
....
My Regex Search:
/([a-z_])+ .*?,?/
My Replacement:
pstmt.setString(1, \1);
The result:
...
pstmt.setString(1, field_1);
pstmt.setString(1, field_2);
pstmt.setString(1, field_3);
pstmt.setString(1, field_4);
....
I then went through and manually set the position int for each call and changed the method to setInt() (and others) where necessary, but that worked handy for me. I actually used it three or four times for similar field to method call conversions.
I like to use regexps to reformat lists of items like this:
int item1
double item2
to
public void item1(int item1){
}
public void item2(double item2){
}
This can be a big time saver.
I use it all the time when someone sends me a list of patient visit numbers in a column (say 100-200) and I need them in a '0000000444','000000004445' format. works wonders for me!
I also use it to pull out email addresses in an email. I send out group emails often and all the bounced returns come back in one email. So, I regex to pull them all out and then drop them into a string var to remove from the database.
I even wrote a little dialog prog to apply regex to my clipboard. It grabs the contents applies the regex and then loads it back into the clipboard.
One thing I use it for in web development all the time is stripping some text of its HTML tags. This might need to be done to sanitize user input for security, or for displaying a preview of a news article. For example, if you have an article with lots of HTML tags for formatting, you can't just do LEFT(article_text,100) + '...' (plus a "read more" link) and render that on a page at the risk of breaking the page by splitting apart an HTML tag.
Also, I've had to strip img tags in database records that link to images that no longer exist. And let's not forget web form validation. If you want to make a user has entered a correct email address (syntactically speaking) into a web form this is about the only way of checking it thoroughly.
I've just pasted a long character sequence into a string literal, and now I want to break it up into a concatenation of shorter string literals so it doesn't wrap. I also want it to be readable, so I want to break only after spaces. I select the whole string (minus the quotation marks) and do an in-selection-only replace-all with this regex:
/.{20,60} /
...and this replacement:
/$0"¶ + "/
...where the pilcrow is an actual newline, and the number of spaces varies from one incident to the next. Result:
String s = "I recently discussed editors with a co-worker. He uses one "
+ "of the less popular editors and I use another (I won't say "
+ "which ones since it's not relevant and I want to avoid an "
+ "editor flame war). I was saying that I didn't like his "
+ "editor as much because it doesn't let you do find/replace "
+ "with regular expressions.";
The first thing I do with any editor is try to figure out it's Regex oddities. I use it all the time. Nothing really crazy, but it's handy when you've got to copy/paste stuff between different types of text - SQL <-> PHP is the one I do most often - and you don't want to fart around making the same change 500 times.
Regex is very handy any time I am trying to replace a value that spans multiple lines. Or when I want to replace a value with something that contains a line break.
I also like that you can match things in a regular expression and not replace the full match using the $# syntax to output the portion of the match you want to maintain.
I agree with you on points 3, 4, and 5 but not necessarily points 1 and 2.
In some cases 1 and 2 are easier to achieve using a anonymous keyboard macro.
By this I mean doing the following:
Position the cursor on the first line
Start a keyboard macro recording
Modify the first line
Position the cursor on the next line
Stop record.
Now all that is needed to modify the next line is to repeat the macro.
I could live with out support for regex but could not live without anonymous keyboard macros.

Why is this regular expression faster?

I'm writing a Telnet client of sorts in C# and part of what I have to parse are ANSI/VT100 escape sequences, specifically, just those used for colour and formatting (detailed here).
One method I have is one to find all the codes and remove them, so I can render the text without any formatting if needed:
public static string StripStringFormating(string formattedString)
{
if (rTest.IsMatch(formattedString))
return rTest.Replace(formattedString, string.Empty);
else
return formattedString;
}
I'm new to regular expressions and I was suggested to use this:
static Regex rText = new Regex(#"\e\[[\d;]+m", RegexOptions.Compiled);
However, this failed if the escape code was incomplete due to an error on the server. So then this was suggested, but my friend warned it might be slower (this one also matches another condition (z) that I might come across later):
static Regex rTest =
new Regex(#"(\e(\[([\d;]*[mz]?))?)?", RegexOptions.Compiled);
This not only worked, but was in fact faster to and reduced the impact on my text rendering. Can someone explain to a regexp newbie, why? :)
Do you really want to do run the regexp twice? Without having checked (bad me) I would have thought that this would work well:
public static string StripStringFormating(string formattedString)
{
return rTest.Replace(formattedString, string.Empty);
}
If it does, you should see it run ~twice as fast...
The reason why #1 is slower is that [\d;]+ is a greedy quantifier. Using +? or *? is going to do lazy quantifing. See MSDN - Quantifiers for more info.
You may want to try:
"(\e\[(\d{1,2};)*?[mz]?)?"
That may be faster for you.
I'm not sure if this will help with what you are working on, but long ago I wrote a regular expression to parse ANSI graphic files.
(?s)(?:\e\[(?:(\d+);?)*([A-Za-z])(.*?))(?=\e\[|\z)
It will return each code and the text associated with it.
Input string:
<ESC>[1;32mThis is bright green.<ESC>[0m This is the default color.
Results:
[ [1, 32], m, This is bright green.]
[0, m, This is the default color.]
Without doing detailed analysis, I'd guess that it's faster because of the question marks. These allow the regular expression to be "lazy," and stop as soon as they have enough to match, rather than checking if the rest of the input matches.
I'm not entirely happy with this answer though, because this mostly applies to question marks after * or +. If I were more familiar with the input, it might make more sense to me.
(Also, for the code formatting, you can select all of your code and press Ctrl+K to have it add the four spaces required.)