C / C++: embedding and passing code to an interpreter [duplicate]

This question already has answers here:
String literals C++?
(4 answers)
Closed 9 years ago.
I have an interpreter for another language, written in C. I have to pass code (about 200 lines) written in that other language to this interpreter, and this is where the problem occurs:
char* command_line[] = {
"",
"-e",
"print \"Hello from C!\\n\";"
};
This code is parsed by:
(...)
parser(my_p, NULL, 3, command_line, (char **)NULL);
(...)
In the code above I use an array, but even simple code has to be escaped with \ before characters such as " and \. How can I avoid this problem and comfortably pass more than 200 lines of multi-line code?

If you are using C++11, you can use raw string literals.
R"(print "Hello from C!\n";)"
Or you can simply put all the data into an external file and read it at startup. There is no need to escape any data there.
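A minimal sketch of the raw-string approach, assuming the embedded script is Perl-like (the question does not name the language). Everything between R"perl( and )perl" is taken literally, so quotes, backslashes and line breaks need no escaping; the custom delimiter perl lets the script itself contain the sequence )" without ending the literal. The array mirrors command_line from the question; the interpreter call itself is left out.

```cpp
// Multi-line script embedded with a C++11 raw string literal. Nothing
// between the delimiters is escaped: quotes, backslashes, pipes and
// newlines are kept exactly as written. The custom delimiter "perl" lets
// the script contain the sequence )" (see the last script line) without
// ending the literal. The script language is an assumption.
static char script[] = R"perl(
print "Hello from C!\n";
print join("|", "a", "b"), "\n";
print "(done)";
)perl";

// argv-style array like the one in the question. Non-const char arrays are
// used because the parser takes char** and string literals are const in C++11.
static char program_name[] = "";
static char eval_flag[] = "-e";
char* command_line[] = {program_name, eval_flag, script};
```

The array can then be passed exactly as in the question, e.g. parser(my_p, NULL, 3, command_line, (char **)NULL);. Note that the literal starts with a newline, because the line break after R"perl( is part of the string.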

You could use the C preprocessor to do the stringification for you:
#define STRINGIFY(...) #__VA_ARGS__
#define STRINGIFY_NL(...) #__VA_ARGS__ "\n"
char* command_line[] = {
"",
"-e",
STRINGIFY(print "Hello from C!\n";), //< one element of the array
//< another element of the array
//< with embedded NL
STRINGIFY_NL(print "Hello from C!") //< no comma and the next line is glued
STRINGIFY ("for a second time";), //< end of other string
};
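The only restrictions are that any parentheses inside the argument to STRINGIFY have to balance, and that you have to place the macro on each line that you want to escape. Also note that stringification collapses each run of whitespace between tokens into a single space, so indentation and spacing inside a line are not preserved exactly.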

Unfortunately C has no support for raw string literals or similar constructs, so if you want to write the interpreted code inside your C program you will have to be careful to escape quotes and backslashes, as you have stated.
The alternative is to write the code into a text file and treat it as an external resource file. You can read the resource file from inside your code into a string and then pass that to parser().
The ease of this depends on the platform you are using. Windows has good support for resource files and for embedding them into .exe files. I am sure it is possible with gcc too, but I haven't done it before. A bit vague, I'm afraid, but I hope it helps.
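A minimal sketch of the external-file approach using only standard C++ (rather than platform-specific resource files): read the whole script into a std::string, then build the argv-style array the parser expects. The helper names read_script and make_argv are hypothetical, and so is any file name you pass in.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read a whole text file into a string. The bytes
// are copied as-is, so the script needs no C escaping at all.
std::string read_script(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// Hypothetical helper: expose the strings in `storage` as a char* array,
// e.g. {"", "-e", script} for a parser taking (int argc, char** argv).
// `storage` must outlive every use of the returned pointers.
std::vector<char*> make_argv(std::vector<std::string>& storage) {
    std::vector<char*> argv;
    for (std::string& s : storage) {
        argv.push_back(&s[0]);  // contiguous and NUL-terminated since C++11
    }
    return argv;
}
```

For example, std::vector<std::string> args{"", "-e", read_script("script.txt")}; followed by std::vector<char*> argv = make_argv(args); gives argv.data() and argv.size() to hand to the parser.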

You can use a script to take your text input, as a file, and stringify it (escaping backslashes, double quotes and newlines):
#!/usr/bin/env python
import sys
def main(argv=None):
    if argv is None:
        argv = sys.argv
    if len(argv) < 2:
        sys.stderr.write("Usage: stringify.py input.txt [... input.txt]\n")
        return 1
    for inname in argv[1:]:
        try:
            with open(inname, "r") as infile:
                for line in infile:
                    # Escape backslashes first, then double quotes, then the newline
                    line = (line.replace('\\', '\\\\')
                                .replace('"', '\\"')
                                .replace('\n', '\\n'))
                    sys.stdout.write('"{0}"\n'.format(line))
        except IOError as msg:
            sys.stderr.write("exception {0}\n".format(msg))
            return 2
    return 0
if __name__ == "__main__":
    sys.exit(main())

Related

python, pyparsing, stopOn and repeating structures

The time has come to brush up on my pyparsing skills.
Given a file containing repetitive structures:
space_missions
Main Objects:
/Projects/antares_III
/Projects/apollo
ground_missions
Main Objects:
/Projects/Barbarossa
/Projects/Desert_Eagle
and my chopped-down Python 2.7 script:
import pyparsing as pp
def last_occurance_of( expr):
    return expr + ~pp.FollowedBy( expr)
ppKeyName = pp.Word( pp.alphanums)
ppObjectLabel = pp.Literal("Main Objects") + pp.FollowedBy(':')
ppObjectRegex = pp.Regex(r'\/Projects\/\w+')
ppTag = pp.Group( ppKeyName.setResultsName('keyy') + pp.Suppress( ppObjectLabel) + pp.ZeroOrMore( ppObjectRegex, stopOn=last_occurance_of( ppObjectRegex)).setResultsName('objects') )
ppTags = pp.OneOrMore( ppTag)
with open( fn) as fp:
    slurp = fp.read()
results = ppTags.parseString( slurp)
I'd like to get results to return
[['space_missions', ['/Projects/antares_III', '/Projects/apollo']],
 ['ground_missions', ['/Projects/Barbarossa', '/Projects/Desert_Eagle']]]
So what am I missing here? I realize I'm lucky in that the strings that make up the lists all have the same beginning, which gives last_occurance_of() something to lock on to, but what does one do in the more general case, where the strings have nothing to differentiate them from tag strings?
Still-Searching Steve
Three things to fix in your parser:
The key names in your input include '_'s, but your definition of ppKeyName (pp.Word(pp.alphanums)) does not allow them.
ppObjectLabel will parse "Main Objects" followed by a ':', but the ':' itself never gets parsed anywhere. It's easiest to just add it to ppObjectLabel instead of using pp.FollowedBy.
last_occurance_of is unnecessary; the repetition of ppObjectRegex will not be confused by the next tag's ppKeyName.

C++: How can I ignore a comma (,) inside a CSV char*?

I have searched a lot about this on SO, and solutions like putting "" around the part that contains the comma give errors. Moreover, I am using C++ :)
const char *msg = "1,2, Hello , how are you ";
char msg2[30];
strcpy_s(msg2, msg);
char * pch;
pch = strtok(msg2, ",");
while (pch != NULL)
{
cout << pch << endl;
pch = strtok(NULL, ",");
}
Output I want:
1
2
Hello , how are you
Output it is producing:
1
2
Hello
how are you
I have tried putting "" around Hello , how are you, but it did not help.
CSV files contain comma-separated values. If you want a comma inside a value, you have to surround that value with quotes.
Your example in CSV, as you need your output, should be:
msg = "1,2, \"Hello , how are you \"";
so the value Hello , how are you is surrounded with quotes.
This is standard CSV; it has nothing to do with the behaviour of the strtok function.
strtok simply searches for the delimiter characters you pass to it (in this case the ,) without considering anything else, so it ignores the ".
To make it work as you want, you would have to tokenize on both the , and the ", and use the previously found token to decide whether a , starts a new value or is inside quotes.
NOTE also that if you want to fully conform to the CSV specification, you should consider that quotes may also be escaped, so that a value can contain a quote character. See this answer for an example:
Properly escape a double quote in CSV
NOTE 2: Just for completeness, here is the CSV specification (RFC-4180): https://www.rfc-editor.org/rfc/rfc4180
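Here is a minimal sketch of the quote-aware approach, written as a small character-by-character state machine instead of strtok (a swapped-in technique, not a strtok-based solution). A comma inside double quotes is treated as an ordinary character, and "" inside a quoted field is read as one literal quote, which is the escaping rule RFC 4180 describes. The name split_csv_line is hypothetical; the sketch handles a single line only and keeps whitespace around fields unchanged.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split one CSV line into fields. A comma inside double quotes does not
// end a field, and "" inside a quoted field stands for a literal quote.
// Assumptions: one line only (no embedded newlines); whitespace is kept.
std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    bool in_quotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (in_quotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    field += '"';  // escaped quote ""
                    ++i;
                } else {
                    in_quotes = false;  // closing quote
                }
            } else {
                field += c;  // commas are ordinary characters here
            }
        } else if (c == '"') {
            in_quotes = true;  // opening quote; the quote itself is dropped
        } else if (c == ',') {
            fields.push_back(field);  // an unquoted comma ends the field
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);  // last field (possibly empty)
    return fields;
}
```

With the line 1,2, "Hello , how are you " it yields 1, 2 and " Hello , how are you "; the leading space comes from the space after the second comma, so trim it if you do not want it.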

Extracting data using regular expressions: Python

The basic outline of this problem is to read the file, look for integers using re.findall() with the regular expression [0-9]+, convert the extracted strings to integers and sum them.
I am having trouble appending to the list. With my code below, only the first (index 0) number of a line is appended. Please help me. Thank you.
import re
hand = open ('a.txt')
lst = list()
for line in hand:
    line = line.rstrip()
    stuff = re.findall('[0-9]+', line)
    if len(stuff)!= 1 : continue
    num = int (stuff[0])
    lst.append(num)
print sum(lst)
import re
ls = []
text = open('C:/Users/pvkpu/Desktop/py4e/file1.txt')
for line in text:
    line = line.rstrip()
    l = re.findall('[0-9]+', line)
    if len(l) == 0:
        continue
    ls += l
for i in range(len(ls)):
    ls[i] = int(ls[i])
print(sum(ls))
Great, thank you for including the whole txt file! Your main problem was the if len(stuff)... line, which skipped a line when stuff had zero elements and also when it had 2, 3 and so on, so you were only keeping stuff lists of length 1. I put comments in the code, but please ask if anything is unclear.
import re
hand = open('a.txt')
str_num_lst = list()
for line in hand:
    line = line.rstrip()
    stuff = re.findall('[0-9]+', line)
    # If we didn't find anything on this line then continue
    if len(stuff) == 0: continue
    # if len(stuff) != 1: continue  # <-- This line was wrong as it skips lists with more than 1 element
    # If we did find something, stuff will be a list of strings
    # (e.g. stuff = ['9607', '4292', '4498'] or stuff = ['4563'])
    # For now let's just add this list onto our str_num_lst
    # without worrying about converting to int.
    # We use '+=' instead of 'append' since both stuff and str_num_lst are lists
    str_num_lst += stuff
# Print out str_num_lst to check that everything's OK
print(str_num_lst)
# Get an overall sum by looping over the string numbers in str_num_lst
# We can convert to int inside the loop
overall_sum = 0
for str_num in str_num_lst:
    overall_sum += int(str_num)
# Print sum
print('Overall sum is:')
print(overall_sum)
EDIT:
You are right, reading in the entire file as one string is a good solution, and it's not difficult to do. Check out this post. Here is what the code could look like:
import re
hand = open('a.txt')
all_lines = hand.read()  # Reads in all lines as one long string
all_str_nums_as_one_line = re.findall('[0-9]+', all_lines)
hand.close()  # <-- can close the file now since we've read it in
# Go through all the matches to get a total
tot = 0
for str_num in all_str_nums_as_one_line:
    tot += int(str_num)
print('Overall sum is:', tot)  # editing to add ()

Python lstrip Sometimes Removes Extra Character [duplicate]

This question already has answers here:
Python string.strip stripping too many characters [duplicate]
(3 answers)
Closed 7 years ago.
I have Python 2.7 code that operates on a list of files. In part of the code I strip away the directory information. Today I was surprised to find that the code didn't work correctly when a file name begins with "s". This sample code demonstrates the problem:
import os
TEST_RESULTS_DIR = ".." + os.sep + "Test Results"
filename = TEST_RESULTS_DIR + os.sep + "p_file.txt"
stripped_filename = filename.lstrip(TEST_RESULTS_DIR + os.sep)
print ("%s : %s") % (filename, stripped_filename)
filename = TEST_RESULTS_DIR + os.sep + "s_file.txt"
stripped_filename = filename.lstrip(TEST_RESULTS_DIR + os.sep)
print ("%s : %s") % (filename, stripped_filename)
When I run this code, I get this:
..\Test Results\p_file.txt : p_file.txt
..\Test Results\s_file.txt : _file.txt
Does anyone understand why?
lstrip doesn't remove a prefix string from the beginning of another string; it strips every leading character that appears anywhere in the string argument, stopping at the first character that doesn't.
For example:
"aaabbbc".lstrip("ba") = "c"
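Your directory name contains an s, so the s of s_file.txt gets stripped too; you would see the same result if the file name started with a u or an e (or any other character of the directory string). To drop the directory part, use os.path.basename(filename) instead.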

How to parse a command line with regular expressions?

I want to split a command-line-like string into single string parameters. What would the regular expression for this look like? The problem is that the parameters can be quoted. For example:
"param 1" param2 "param 3"
should result in:
param 1, param2, param 3
You should not use regular expressions for this. Write a parser instead, or use one provided by your language.
I don't see why I get downvoted for this. This is how it could be done in Python:
>>> import shlex
>>> shlex.split('"param 1" param2 "param 3"')
['param 1', 'param2', 'param 3']
>>> shlex.split('"param 1" param2 "param 3')
Traceback (most recent call last):
[...]
ValueError: No closing quotation
>>> shlex.split('"param 1" param2 "param 3\\""')
['param 1', 'param2', 'param 3"']
Now tell me that racking your brain over how a regex will solve this problem is ever worth the hassle.
I tend to use regexlib for this kind of problem. If you go to http://regexlib.com/ and search for "command line", you'll find three results that look like they are trying to solve this or similar problems; that should be a good start.
This may work:
http://regexlib.com/Search.aspx?k=command+line&c=-1&m=-1&ps=20
("[^"]+"|[^\s"]+)
What I use (C++):
#include <iostream>
#include <iterator>
#include <string>
#include <regex>
void foo()
{
    std::string strArg = " \"par 1\" par2 par3 \"par 4\"";
    std::regex word_regex( "(\"[^\"]+\"|[^\\s\"]+)" );
    auto words_begin =
        std::sregex_iterator(strArg.begin(), strArg.end(), word_regex);
    auto words_end = std::sregex_iterator();
    for (std::sregex_iterator i = words_begin; i != words_end; ++i)
    {
        std::smatch match = *i;
        std::string match_str = match.str();
        std::cout << match_str << '\n';
    }
}
Output:
"par 1"
par2
par3
"par 4"
Without regard to implementation language, your regex might look something like this:
("[^"]*"|[^"\s]+)(\s+|$)
The first part "[^"]*" looks for a quoted string that doesn't contain embedded quotes, and the second part [^"\s]+ looks for a sequence of characters that are neither quotes nor whitespace. The \s+ matches a separating sequence of spaces, and $ matches the end of the string.
Regex: /[\/-]?((\w+)(?:[=:]("[^"]+"|[^\s"]+))?)(?:\s+|$)/g
Sample: /P1="Long value" /P2=3 /P3=short PwithoutSwitch1=any PwithoutSwitch2
This regex parses a parameter list built by the following rules:
Parameters are separated by spaces (one or more).
A parameter can contain a switch symbol (/ or -).
A parameter consists of a name and a value separated by = or :.
The name can be any set of alphanumerics and underscores.
The value can be absent.
If the value exists it can be any set of characters other than double quotes, but if it contains a space the value must be quoted.
This regex has three groups:
the first group contains the whole parameter without the switch symbol,
the second group contains the name only,
the third group contains the value only (if it exists).
For sample above:
Whole match: /P1="Long value"
Group#1: P1="Long value",
Group#2: P1,
Group#3: "Long value".
Whole match: /P2=3
Group#1: P2=3,
Group#2: P2,
Group#3: 3.
Whole match: /P3=short
Group#1: P3=short,
Group#2: P3,
Group#3: short.
Whole match: PwithoutSwitch1=any
Group#1: PwithoutSwitch1=any,
Group#2: PwithoutSwitch1,
Group#3: any.
Whole match: PwithoutSwitch2
Group#1: PwithoutSwitch2,
Group#2: PwithoutSwitch2,
Group#3: absent.
Most languages have other functions (either built-in or provided by a standard library) which will parse command lines far more easily than building your own regex, plus you know they'll do it accurately out of the box. If you edit your post to identify the language that you're using, I'm sure someone here will be able to point you at the one used in that language.
Regexes are very powerful tools and useful for a wide range of things, but there are also many problems for which they are not the best solution. This is one of them.
This splits an exe from its parameters, stripping the quotes from the exe; it assumes clean data:
^(?:"([^"]+(?="))|([^\s]+))["]{0,1} +(.+)$
You will get three match groups, two of which are filled at a time:
The exe if it was wrapped in quotes
The exe if it was not wrapped in quotes
The clump of parameters
Examples:
"C:\WINDOWS\system32\cmd.exe" /c echo this
Match 1: C:\WINDOWS\system32\cmd.exe
Match 2: $null
Match 3: /c echo this
C:\WINDOWS\system32\cmd.exe /c echo this
Match 1: $null
Match 2: C:\WINDOWS\system32\cmd.exe
Match 3: /c echo this
"C:\Program Files\foo\bar.exe" /run
Match 1: C:\Program Files\foo\bar.exe
Match 2: $null
Match 3: /run
Thoughts:
I'm pretty sure that you would need to create a loop to capture a possibly unlimited number of parameters.
This regex could easily be looped onto its third group until the match fails, i.e. until there are no more params.
If it's just the quotes you are worried about, then just write a simple loop that copies the input character by character into a string, skipping the quotes.
Alternatively, if you are using some string manipulation library, you can use it to remove all quotes and then concatenate the pieces.
There's a Python answer, so we shall have a Ruby answer as well :)
require 'shellwords'
Shellwords.shellsplit '"param 1" param2 "param 3"'
#=> ["param 1", "param2", "param 3"]
# or:
'"param 1" param2 "param 3"'.shellsplit
Though this answer is not regex-specific, it handles Python command-line argument parsing:
dash and double-dash flags
int/float conversion based on an SO answer
import sys
def parse_cmd_args():
    _sys_args = sys.argv
    _parts = {}
    _key = "script"
    _parts[_key] = [_sys_args.pop(0)]
    for _part in _sys_args:
        # Parse numeric values: floats and integers (thousands separators removed)
        if _part.replace("-", "1", 1).replace(".", "1").replace(",", "").isdigit():
            _number = _part.replace(",", "")
            _part = float(_number) if "." in _number else int(_number)
            _parts[_key].append(_part)
        elif "=" in _part:
            _part = _part.split("=", 1)
            _parts[_part[0].strip("-")] = _part[1].strip().split(",")
        elif _part.startswith("-"):
            _key = _part.strip("-")
            _parts[_key] = []
        else:
            _parts[_key].extend(_part.split(","))
    return _parts
Something like:
"(?:(?<=")([^"]+)"\s*)|\s*([^"\s]+)
or a simpler one:
"([^"]+)"|\s*([^"\s]+)
(just for the sake of finding a regexp ;) )
Apply it several times: group 1 gives you a quoted parameter without its double quotes and group 2 an unquoted parameter, so whichever of the two took part in the match holds the parameter.
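As a sketch of applying the simpler pattern repeatedly, here it is with C++11 std::regex and std::sregex_iterator, taking whichever of the two groups participated in each match. The function name split_params is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Split a command-line-like string with the pattern
//   "([^"]+)"|\s*([^"\s]+)
// Group 1 captures a quoted parameter without its quotes; group 2 captures
// an unquoted one. Exactly one of the two groups takes part in each match.
std::vector<std::string> split_params(const std::string& line) {
    // Custom raw-string delimiter "re", because the pattern contains )"
    static const std::regex param_regex(R"re("([^"]+)"|\s*([^"\s]+))re");
    std::vector<std::string> params;
    for (std::sregex_iterator it(line.begin(), line.end(), param_regex), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        params.push_back(m[1].matched ? m[1].str() : m[2].str());
    }
    return params;
}
```

For the question's example "param 1" param2 "param 3" this gives param 1, param2 and param 3, with the quotes removed.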
If you are looking to parse the command and the parameters I use the following (with ^$ matching at line breaks aka multiline):
(?<cmd>^"[^"]*"|\S*) *(?<prm>.*)?
In case you want to use it in your C# code, here it is properly escaped:
try {
    Regex RegexObj = new Regex("(?<cmd>^\\\"[^\\\"]*\\\"|\\S*) *(?<prm>.*)?");
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
It will parse the following and know what is the command versus the parameters:
"c:\program files\myapp\app.exe" p1 p2 "p3 with space"
app.exe p1 p2 "p3 with space"
app.exe
Here's a solution in Perl:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
sub parse_arguments {
    my $text = shift;
    my $i = 0;
    my @args;
    while ($text ne '') {
        $text =~ s{^\s*(['"]?)}{};     # look for (and remove) leading quote
        my $delimiter = ($1 || ' ');   # use space if not quoted
        if ($text =~ s{^(([^$delimiter\\]|\\.|\\$)+)($delimiter|$)}{}) {
            $args[$i++] = $1;          # acquired an argument; save it
        }
    }
    return @args;
}
my $line = <<'EOS';
"param 1" param\ 2 "pa\"ram' '3" 'pa\'ram" "4'
EOS
say "ARG: $_" for parse_arguments($line);
Output:
ARG: param 1
ARG: param\ 2
ARG: pa\"ram' '3
ARG: pa\'ram" "4
Note the following:
Arguments can be quoted with either " or ' (with the "other" quote type treated as a regular character for that argument).
Spaces and quotes in arguments can be escaped with \.
The solution can be adapted to other languages. The basic approach is to (1) determine the delimiter character for the next string, (2) extract the next argument up to an unescaped occurrence of that delimiter or to the end-of-string, then (3) repeat until empty.
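A sketch of the same three steps ported to C++ (an adaptation, not a line-by-line translation of the Perl code). It keeps each backslash escape verbatim in the result, as in the ARG: param\ 2 output line; unlike the Perl version, any whitespace (not just a space) ends an unquoted argument, and an empty quoted argument "" yields an empty string. The name parse_arguments is reused here for illustration only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a command-line-like string into arguments:
//   (1) pick the delimiter: a leading ' or ", otherwise whitespace;
//   (2) read up to an unescaped delimiter or the end of the string;
//   (3) repeat until the input is used up.
// Backslash escapes (a backslash plus the next character) are kept verbatim.
std::vector<std::string> parse_arguments(const std::string& text) {
    auto is_space = [](char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    };
    std::vector<std::string> args;
    std::size_t i = 0;
    const std::size_t n = text.size();
    while (true) {
        while (i < n && is_space(text[i])) ++i;  // skip separators
        if (i == n) break;                       // nothing left
        // Step 1: choose the delimiter for this argument.
        char delim = ' ';
        if (text[i] == '"' || text[i] == '\'') delim = text[i++];
        // Step 2: collect characters up to an unescaped delimiter.
        std::string arg;
        while (i < n) {
            char c = text[i];
            if (c == '\\' && i + 1 < n) {
                arg += c;            // keep the backslash
                arg += text[i + 1];  // and the escaped character
                i += 2;
            } else if (delim == ' ' ? is_space(c) : c == delim) {
                ++i;  // consume the delimiter and stop
                break;
            } else {
                arg += c;
                ++i;
            }
        }
        args.push_back(arg);  // Step 3: loop round for the next argument
    }
    return args;
}
```

Fed the same input line as the Perl example, it returns four arguments: param 1, param\ 2, pa\"ram' '3 and pa\'ram" "4.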
\s*("[^"]+"|[^\s"]+)
That's it.
(Reading your question again just before posting, I note you say command-line-LIKE string, so this information may not be useful to you, but as I have written it I will post it anyway. Please disregard it if I have misunderstood your question.)
If you clarify your question I will try to help, but from the general comments you have made I would say don't do that :-). You are asking for a regexp to split a series of parameters into an array. Instead of doing this yourself, I would strongly suggest you consider using getopt; there are versions of this library for most programming languages. Getopt will do what you are asking and scales to much more sophisticated argument processing should you require that in the future.
If you let me know what language you are using, I will try to post a sample for you.
Here are some of the home pages:
http://www.codeplex.com/getopt
(.NET)
http://www.urbanophile.com/arenn/hacking/download.html
(java)
A sample (from the Java page above):
Getopt g = new Getopt("testprog", argv, "ab:c::d");
//
int c;
String arg;
while ((c = g.getopt()) != -1)
{
    switch(c)
    {
        case 'a':
        case 'd':
            System.out.print("You picked " + (char)c + "\n");
            break;
        //
        case 'b':
        case 'c':
            arg = g.getOptarg();
            System.out.print("You picked " + (char)c +
                             " with an argument of " +
                             ((arg != null) ? arg : "null") + "\n");
            break;
        //
        case '?':
            break; // getopt() already printed an error
        //
        default:
            System.out.print("getopt() returned " + c + "\n");
    }
}
}