Splitting a char array by delimiter, then saving the result?


I need to be able to parse the following two strings in my program:
cat myfile || sort
more myfile || grep DeKalb
The string is being saved in char buffer[1024]. What I need to end up with is a pointer to a char array for the left side, and a pointer to a char array for the right side so that I can use these to call the following for each side:
int execvp(const char *file, char *const argv[]);
Does anyone have ideas about how I can get the right arguments for execvp if the two strings above are saved in a character buffer, char buffer[1024];?
I need char *left to hold the first word of the left side, then char *const leftArgv[] to hold both words on the left side. Then I need the same for the right side. I have been messing around with strtok for about two hours now and I am hitting a wall. Does anyone have any ideas?

I recommend learning more about regular expressions. To solve your problem painlessly, you could use the Boost.Regex library, which provides a powerful regular expression engine. The solution would be just a few lines of code, but I encourage you to do it yourself; that would be a good exercise. If you still have problems, come back with some results and clearly state where you got stuck.
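A minimal sketch of the regular-expression approach, using std::regex from the C++11 standard library in place of Boost.Regex (the two interfaces are very similar). The function name split_pipe_regex is made up for this example; it returns the words of each side of "||" in two vectors.

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits "cmd args || cmd args" into the words left and right of "||".
// Returns false if there is no "||" or one side has no words.
bool split_pipe_regex(const std::string& line,
                      std::vector<std::string>& left,
                      std::vector<std::string>& right) {
    // Group 1: shortest text before "||"; group 2: everything after it.
    // [\s\S] is used instead of '.' so a trailing newline still matches.
    static const std::regex pipe_re(R"(([\s\S]*?)\|\|([\s\S]*))");
    std::smatch m;
    if (!std::regex_match(line, m, pipe_re)) return false;

    static const std::regex word_re(R"(\S+)");  // one word = run of non-blanks
    auto words = [](const std::string& side, std::vector<std::string>& out) {
        out.clear();
        for (std::sregex_iterator it(side.begin(), side.end(), word_re), end;
             it != end; ++it)
            out.push_back(it->str());
    };
    words(m[1].str(), left);
    words(m[2].str(), right);
    return !left.empty() && !right.empty();
}
```

To call execvp from these vectors you would still need to build a NULL-terminated array of char* pointing at each string's characters.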

You could use std::getline(stream, stringToReadInto, delimiter).
I personally use my own function, with some additional features baked in, which looks like this:
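A minimal sketch of the std::getline suggestion, assuming the text is wrapped in a std::istringstream (split_getline is a name made up for this example). std::getline only accepts a single-character delimiter, so splitting on '|' leaves an empty piece between the two bars of "||" and keeps the surrounding spaces:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits `text` at every occurrence of the single character `delim`.
std::vector<std::string> split_getline(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::istringstream stream(text);
    std::string part;
    while (std::getline(stream, part, delim))  // stops once nothing is left
        parts.push_back(part);
    return parts;
}
```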
StringList Seperate(const std::string &str, char divider, SeperationFlags seperationFlags, CharValidatorFunc whitespaceFunc)
{
return Seperate(str, CV_IS(divider), seperationFlags, whitespaceFunc);
}
StringList Seperate(const std::string &str, CharValidatorFunc isDividerFunc, SeperationFlags seperationFlags, CharValidatorFunc whitespaceFunc)
{
bool keepEmptySegments = (seperationFlags & String::KeepEmptySegments);
bool keepWhitespacePadding = (seperationFlags & String::KeepWhitespacePadding);
StringList stringList;
size_t startOfSegment = 0;
for(size_t pos = 0; pos < str.size(); pos++)
{
if(isDividerFunc(str[pos]))
{
//Grab the past segment.
std::string segment = str.substr(startOfSegment, (pos - startOfSegment));
if(!keepWhitespacePadding)
{
segment = String::RemovePadding(segment);
}
if(keepEmptySegments || !segment.empty())
{
stringList.push_back(segment);
}
//If we aren't keeping empty segments, speedily check for multiple separators in a row.
if(!keepEmptySegments)
{
//Keep looping until we don't find a divider.
do
{
//Increment and mark this as the (potential) beginning of a new segment.
startOfSegment = ++pos;
//Check if we've reached the end of the string.
if(pos >= str.size())
{
break;
}
}
while(isDividerFunc(str[pos]));
}
else
{
//Mark the beginning of a new segment.
startOfSegment = (pos + 1);
}
}
}
//The final segment.
std::string lastSegment = str.substr(startOfSegment, (str.size() - startOfSegment));
if(keepEmptySegments || !lastSegment.empty())
{
stringList.push_back(lastSegment);
}
return stringList;
}
Here 'StringList' is a typedef of std::vector<std::string>, and CharValidatorFunc is a function pointer (actually a std::function, to allow functor and lambda support) for a function taking one char and returning a bool. It can be used like so:
StringList results = String::Seperate(" Meow meow , Green, \t\t\nblue\n \n, Kitties!", ',' /* delimiter */, DefaultFlags, is_whitespace);
And would return the results:
{"Meow meow", "Green", "blue", "Kitties!"}
This preserves the internal whitespace of 'Meow meow', removes the spaces, tabs, and newlines surrounding each segment, and splits on commas.
(CV_IS is a functor object for matching a specific char or a specific collection of chars taken as a string-literal. I also have CV_AND and CV_OR for combining char validator functions)
For a string literal, I'd just wrap it in a std::string and pass that to the function, unless extreme performance is required. Breaking on delimiters is fairly easy to roll yourself; the above function is just customized to my projects' typical usage and requirements, but feel free to modify it and claim it for yourself.
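The function above depends on helpers that are not shown (CV_IS, String::RemovePadding, the flag constants). Here is a stripped-down, self-contained sketch of only its default behaviour: split on one character, trim whitespace with std::isspace, and drop empty segments. The names separate and trim are illustrative, not the author's API.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

using StringList = std::vector<std::string>;

// Removes leading and trailing whitespace (as judged by std::isspace).
static std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Splits on `divider`, trims each segment and skips segments left empty.
StringList separate(const std::string& str, char divider) {
    StringList out;
    std::size_t start = 0;
    for (std::size_t pos = 0; pos <= str.size(); ++pos) {
        if (pos == str.size() || str[pos] == divider) {  // end of a segment
            std::string seg = trim(str.substr(start, pos - start));
            if (!seg.empty()) out.push_back(seg);
            start = pos + 1;
        }
    }
    return out;
}
```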

In case this gives anyone else grief, this is how I solved the problem:
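#include <cstring> //strtok
//variables for the input and arguments
char *command[2];
char *ptr;
char *LeftArg[3];
char *RightArg[3];
char buf[1024]; //input buffer
int number; //index into the arrays
//parse left and right of the ||
number = 0;
command[0] = strtok(buf, "||"); //"||" is a set of delimiter chars, so strtok splits at every '|'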
//split left and right
while((ptr=strtok(NULL, "||")) != NULL)
{
number++;
command[number]=ptr;
}
//parse the spaces out of the left side
number = 0;
LeftArg[0] = strtok(command[0], " ");
//split the arguments
while((ptr=strtok(NULL, " ")) != NULL)
{
number++;
LeftArg[number]=ptr;
}
//put null at the end of the array
number++;
LeftArg[number] = NULL;
//parse the spaces out of the right side
number = 0;
RightArg[0] = strtok(command[1], " ");
//split the arguments
while((ptr=strtok(NULL, " ")) != NULL)
{
number++;
RightArg[number]=ptr;
}
//put null at the end of the array
number++;
RightArg[number] = NULL;
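The strtok(buf, "||") call above works because strtok treats its second argument as a set of delimiter characters, so it really splits at every single '|'; the fixed-size arrays also overflow if a side has more than two words. A variant that searches for the literal two-character "||" with strstr and guards the array bounds might look like this (parse_pipe is a hypothetical name):

```cpp
#include <cstddef>
#include <cstring>

// Splits `buf` (modified in place) at the first "||" and tokenizes each side
// on spaces, tabs and newlines into NULL-terminated argv arrays.
// `max_args` is the array size, including the slot for the final NULL.
// Returns false if there is no "||", a side is empty, or a side has too many words.
bool parse_pipe(char *buf, char *left[], char *right[], std::size_t max_args) {
    char *bar = std::strstr(buf, "||");
    if (bar == nullptr) return false;
    *bar = '\0';                    // terminate the left side
    char *sides[2] = {buf, bar + 2};  // right side starts after "||"
    char **outs[2] = {left, right};

    for (int s = 0; s < 2; ++s) {
        std::size_t n = 0;
        for (char *tok = std::strtok(sides[s], " \t\n"); tok != nullptr;
             tok = std::strtok(nullptr, " \t\n")) {
            if (n + 1 >= max_args) return false;  // keep room for NULL
            outs[s][n++] = tok;
        }
        outs[s][n] = nullptr;       // execvp needs a NULL terminator
        if (n == 0) return false;   // empty command on this side
    }
    return true;
}
```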
Once the piping is set up, you can pass LeftArg and RightArg to execvp:
execvp(LeftArg[0], LeftArg);//execute left side of the command
Then pipe to the right side of the command and do
execvp(RightArg[0], RightArg);//execute right side of command
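execvp replaces the calling process, so each side normally runs in its own child created with fork, with a pipe connecting the left child's stdout to the right child's stdin via dup2. A minimal POSIX-only sketch of that plumbing (run_pipeline is a hypothetical helper, and error handling is kept to a minimum):

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

// Runs `left | right`. Both arrays must be NULL-terminated argv lists,
// like LeftArg and RightArg. Returns the right command's exit status, or -1.
int run_pipeline(char *const left[], char *const right[]) {
    int fds[2];  // fds[0] = read end, fds[1] = write end
    if (pipe(fds) == -1) { std::perror("pipe"); return -1; }

    pid_t lpid = fork();
    if (lpid == 0) {                   // child 1: stdout -> pipe
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(left[0], left);
        std::perror("execvp");         // reached only if exec failed
        _exit(127);
    }

    pid_t rpid = (lpid == -1) ? -1 : fork();
    if (rpid == 0) {                   // child 2: pipe -> stdin
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        std::perror("execvp");
        _exit(127);
    }

    // The parent must close both ends, otherwise the reader never sees EOF.
    close(fds[0]);
    close(fds[1]);
    if (lpid > 0) waitpid(lpid, nullptr, 0);
    if (rpid <= 0) return -1;          // one of the forks failed
    int status = 0;
    waitpid(rpid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```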

Related

How to pad char array with empty spaces on left and right hand side of the text

I am fairly new to C++, so for some people the answer to my question might seem quite obvious.
What I want to achieve is a method that returns the given char array padded with spaces before and after it to reach a certain length, so that in the end the given char array sits in the middle of another, bigger char array.
Let's say we have a char array with HelloWorld!
I want the method to return me a new char array with the length specified beforehand and the given char array "positioned" in the middle of returning char array.
char ch[] = "HelloWorld";
char ret[20]; // let's say we want the resulting char array to be 20 chars long
// expected result in ret: "    HelloWorld     " (4 spaces, HelloWorld, 5 spaces, then '\0')
If the number of blanks is odd, I would like the text to be offset one space to the left of the middle.
I would also like to avoid memory-consuming strings or any methods that are not in the standard library; keep it as plain as possible.
What would be the best way to tackle this issue? Thanks!

There are mainly two ways of doing this: either with plain char arrays, as you would in C, or with the standard std::string type (or similar types), which is the usual choice in C++, although there are exceptions.
I'm providing you one example for each.
First, using arrays: you need to include the <cstring> header for the C string functions (std::strlen, std::memcpy). Keep in mind that a char array holding a string always ends with the null terminator character '\0' (ASCII code 0), which counts toward its size, so an array of DIM chars can store at most DIM - 1 characters. Here is the code with comments.
constexpr int DIM = 20;
char ch[] = "HelloWorld";
char ret[DIM] = "";
auto len_ch = std::strlen(ch); // length of ch without '\0' char
auto n_blanks = DIM - len_ch - 1; // number of blank chars needed
auto half_n_blanks = n_blanks / 2; // half of that
// fill in from begin and end of ret with blanks
for (auto i = 0u; i < half_n_blanks; i++)
ret[i] = ret[DIM - i - 2] = ' ';
// copy ch content into ret starting from half_n_blanks position
std::memcpy(
ret + half_n_blanks, // start inserting from here
ch, // string we need to copy
len_ch); // length of ch (its '\0' is not copied; ret[DIM - 1] is already '\0')
// if odd, after ch copied chars
// there will be a space left to insert a blank in
if (n_blanks % 2 == 1)
*(ret + half_n_blanks + len_ch) = ' ';
I chose to first insert the blanks at both the beginning and the end of the string and then copy the content of ch.
The second approach is far easier (to code and to understand). A std::string (defined in the <string> header) grows as needed; its maximum length is given by max_size(), which is implementation-defined but huge, so in practice you don't have to worry about a std::string's maximum length. (std::string::npos, the largest value of std::string::size_type, is not a length limit but the special "not found" / "until the end" value.)
std::string ch = "HelloWorld", ret;
auto ret_max_length = 20;
auto n_blanks = ret_max_length - ch.size();
// insert blanks at the beginning
ret.append(n_blanks / 2, ' ');
// append ch
ret += ch;
// insert blanks after ch
// if odd, simply add 1 to the number of blanks
ret.append(n_blanks / 2 + n_blanks % 2, ' ');
The approach I took here is different, as you can see.
Notice that, because of '\0', the results of these two methods are NOT the same: the char array version holds 19 visible characters, the std::string version 20. If you want the same behaviour, either add 1 to DIM or subtract 1 from ret_max_length.
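Both snippets assume ch fits in the result. A minimal sketch of the std::string version wrapped in a function that also handles longer input, where width - text.size() would otherwise wrap around because the sizes are unsigned (center is a hypothetical name):

```cpp
#include <cstddef>
#include <string>

// Returns `text` centred in a field of `width` characters; an odd blank
// goes on the right, as in the std::string snippet. Text that is already
// `width` long or longer is returned unchanged.
std::string center(const std::string& text, std::size_t width) {
    if (text.size() >= width) return text;
    std::size_t blanks = width - text.size();
    std::string out(blanks / 2, ' ');   // left padding
    out += text;
    out.append(blanks - blanks / 2, ' ');  // right padding (gets the odd one)
    return out;
}
```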

Assuming that we know the size s of the array ret, and knowing that every C string ends with '\0', we first find the length l of the input char array ch.
int l = 0;
int i;
for(i=0; ch[i]!='\0'; i++){
l++;
}
Then we compute how many spaces we need on either side. If total_spaces is even, both sides get the same number of spaces. Otherwise we can choose which side gets the extra space; here it goes to the right side, so the text sits one position left of the middle.
int total_spaces = s-l-1; // subtract 1 to leave room for the '\0' character
int spaces_right = 0, spaces_left = 0;
if((total_spaces%2) == 0){
spaces_left = total_spaces/2;
spaces_right = total_spaces/2;
}
else{
spaces_left = total_spaces/2;
spaces_right = (total_spaces/2)+1;
}
Then first add spaces_left blanks, then the input array ch, and then spaces_right blanks to ret.
i=0;
while(spaces_left > 0){
ret[i] = ' ';
spaces_left--;
i++;
} // add spaces
ret[i] = '\0';
strcat(ret, ch); // concatenate ch to ret
i += l; // move past the characters of ch, otherwise the loop below overwrites them
while(spaces_right){
ret[i] = ' ';
spaces_right--;
i++;
}
ret[i] = '\0';
Make sure to include <cstring> to use strcat().
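The same idea as a reusable, self-contained function, sketched with std::memset and std::memcpy instead of the character loops and strcat (center_into is a hypothetical name). Unlike the fragment above, it refuses input that does not fit:

```cpp
#include <cstddef>
#include <cstring>

// Writes `ch` centred into `ret`, a buffer of `size` bytes including the
// '\0'. An odd blank goes on the right. Returns false if `ch` does not fit.
bool center_into(char *ret, std::size_t size, const char *ch) {
    std::size_t len = std::strlen(ch);
    if (size == 0 || len > size - 1) return false;
    std::size_t total = size - 1 - len;  // blanks needed
    std::size_t left = total / 2;
    std::size_t right = total - left;
    std::memset(ret, ' ', left);                 // leading blanks
    std::memcpy(ret + left, ch, len);            // the text itself
    std::memset(ret + left + len, ' ', right);   // trailing blanks
    ret[left + len + right] = '\0';              // i.e. ret[size - 1]
    return true;
}
```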

Combine mysql_real_escape_string with sprintf

Maybe I just didn't search well enough.
Anyway, here's the situation.
To prevent SQL injection, I need to use mysql_real_escape_string; however, this function is awfully clunky and requires a lot of extra code. I'd like to hide it behind what is essentially the sprintf function.
The idea: Whenever sprintf encounters a %s, it would run mysql_real_escape_string on the corresponding va_arg and then add it to the target string.
Example:
doQuery("SELECT * FROM `table` WHERE name LIKE '%%%s%%';", input);
Assuming input is a string like Tom's diner, the complete query would look like:
SELECT * FROM `table` WHERE name LIKE '%Tom\'s diner%';
I've found a fairly elegant way to achieve what I want, however there's a security risk connected with it and I'm wondering if there isn't a better way.
Here's what I'm trying:
void doQuery(const char *Format, ...) {
char sQuery[1024], tQuery[1024], *pQuery = sQuery, *pTemp = tQuery;
va_list val;
strcpy(sQuery, Format);
while((pQuery = strchr(pQuery, '\'')) != NULL) *pQuery = 1;
va_start(val, Format);
vsnprintf(tQuery, sizeof tQuery, sQuery, val);
va_end(val);
pQuery = sQuery;
do {
if(*pTemp == 1) {
char *pSearch = strchr(pTemp + 1, 1); //Search for the closing placeholder after the opening one.
if(!pSearch) return; //Error, missing second placeholder
else {
*pQuery++ = '\'';
mysql_real_escape_string(sql, pQuery, pTemp + 1, pSearch - pTemp - 1); //Escape only the text between the placeholders.
pQuery += strlen(pQuery);
*pQuery++ = '\'';
pTemp = pSearch;
}
} else *pQuery++ = *pTemp;
} while(*pTemp++);
//Execute query, return result, etc.
}
This function was written from memory; I'm not 100% sure about its correctness, but I think you get the idea. Anyway, the obvious security risk lies with the placeholder 1. If an attacker got the idea of putting said 1 (the numeric value, not the character '1') into the input string, he'd automatically have an attack point, i.e. a non-escaped apostrophe.
Now, does anyone have an idea how I could fix this problem and still get the behavior I want, preferably without allocating an extra buffer for every string I want to send to the database? I'd also like to avoid overriding the entire sprintf function, if at all possible.
Thank you very much.

After pondering the problem a little longer, I believe I have found a rather simple answer which will serve the purpose well.
I simply need to count the apostrophes I replace with the placeholder and then, while parsing the formatted string, count the placeholders I find. If I find the placeholder more often than I counted during the first pass, I know that one of the arguments contains an illegal character, so the query is invalid and must not be passed to the database.
Edit:
WAY late, but I think now (as I stumbled over the SAME problem once more) I found a good way. Clunky, but workable.
bool SQL::vQuery(const char *Format, va_list val) {
bool Ret = true;
char *Owned = NULL; //Heap buffer that Format points to after expansion; freed at the end.
if(strchr(Format, '%') != NULL) { //Is there any expanding to be done here?
va_list Args; //A va_list may only be traversed once, so every pass works on a fresh copy.
va_copy(Args, val);
int32_t ReqLen = vsnprintf(NULL, 0, Format, Args) + 1; //Determine the required buffer length.
va_end(Args);
if(ReqLen < 2) Ret = false; //Length query successful?
else {
char *Exp = new char[ReqLen]; //Evaluation requires a sufficiently large buffer.
va_copy(Args, val);
vsnprintf(Exp, ReqLen, Format, Args); //Expand the string into the first buffer.
va_end(Args);
if(strchr(Format, '\'') == NULL) Format = Owned = Exp; //No apostrophes found in the format(!) string? No escaping necessary.
else if(strchr(Exp, 1)) { //Illegal character detected. Abort.
Ret = false;
delete[] Exp;
} else {
char *pExp = Exp,
*Query = new char[ReqLen * 2], //Reserves enough space for escaping.
*pQuery = Query;
strcpy(Query, Format); //Copy the format string to the (modifiable) Query buffer.
while((pQuery = strchr(pQuery, '\'')) != NULL)
*pQuery = 1; //Replace the character with the control character.
va_copy(Args, val);
vsnprintf(Exp, ReqLen, Query, Args); //Expand the whole thing AGAIN, this time with the substitutions.
va_end(Args);
pQuery = Query; //And rewind the pointer.
while(char *pEnd = strchr(pExp, 1)) { //Look for the text-delimiter.
*pEnd = 0; //Terminate the string at this point.
strcpy(pQuery, pExp); //Copy the unmodified string to the final buffer.
pQuery += pEnd - pExp; //And advance the pointer to the new end.
pExp = ++pEnd; //Beginning of the 'To be escaped' string.
if((pEnd = strchr(pExp, 1)) != NULL) { //And what about the end?
*pEnd = 0; //Terminate the string at this point.
*pQuery++ = '\'';
pQuery += mysql_real_escape_string(pSQL, pQuery, pExp, pEnd - pExp);
*pQuery++ = '\'';
pExp = ++pEnd; //End of the 'To be escaped' string.
} else Ret = false; //Malformed query string.
}
strcpy(pQuery, pExp); //No more '? Just copy the rest.
Format = Owned = Query; //And please use the Query-Buffer instead of the raw Format.
delete[] Exp; //Get rid of the expansion buffer.
}
}
}
if(Ret) {
if(result) { mysql_free_result(result); result = NULL; } //Free a previously stored result, if any.
Ret = (mysql_query(pSQL, Format) == 0); //mysql_query() returns 0 on success.
if(Ret) result = mysql_store_result(pSQL); //NULL for statements without a result set.
columns = (result) ? mysql_num_fields(result) : 0;
row = NULL;
}
delete[] Owned; //Query completed -> Dispose of buffer (deleting NULL is a no-op).
return Ret;
}
This monstrosity performs the following steps:
1. Determine whether there is any formatting to be done at all. (No use wasting cycles on expanding if there aren't any arguments in it.)
2. Determine the required length (including the terminating 0) and check for errors.
3. Reserve sufficient room for one full expansion and expand.
4. Check whether any strings to be escaped are expected (marked by apostrophes in the format string).
5. Check whether the control character 1 appears anywhere in the expansion. If so, this is an attempted attack and can be rejected right there.
6. Replace every apostrophe in the format string with the control character of your choice (one that must never be passed as a parameter).
7. Expand the query again with the altered format string.
8. Look for the control characters as delimiters of the strings to escape, and copy the whole thing into the final buffer, which is then passed to MySQL.
9. Free the dynamic memory once it is no longer needed.
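The marker technique from these steps, combined with the counting idea from the first answer, can be tried out without a database. A minimal sketch, assuming a HYPOTHETICAL escape_stub in place of mysql_real_escape_string (the real function needs a connection and is character-set aware; the stub only backslash-escapes quotes and backslashes and is NOT a safe replacement). As in the original, only text placed between apostrophes in the format string gets escaped.

```cpp
#include <algorithm>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// HYPOTHETICAL stand-in for mysql_real_escape_string(), for this demo only.
static std::string escape_stub(const std::string &in) {
    std::string out;
    for (char c : in) {
        if (c == '\'' || c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// printf-style formatting in which the expanded text between each pair of
// apostrophes of the FORMAT string is escaped. Fails if an argument added
// extra marker bytes (0x01) or the apostrophes are unbalanced.
static bool format_escaped(std::string &out, const char *fmt, ...) {
    const char kMarker = '\x01';
    std::string marked(fmt);
    std::replace(marked.begin(), marked.end(), '\'', kMarker);

    va_list args, copy;
    va_start(args, fmt);
    va_copy(copy, args);  // the first pass only measures the length
    int len = std::vsnprintf(nullptr, 0, marked.c_str(), copy);
    va_end(copy);
    if (len < 0) { va_end(args); return false; }
    std::vector<char> buf(static_cast<std::size_t>(len) + 1);
    std::vsnprintf(buf.data(), buf.size(), marked.c_str(), args);
    va_end(args);
    std::string expanded(buf.data(), static_cast<std::size_t>(len));

    // Every marker must come from the format string, never from an argument.
    auto fmt_markers = std::count(marked.begin(), marked.end(), kMarker);
    auto exp_markers = std::count(expanded.begin(), expanded.end(), kMarker);
    if (exp_markers != fmt_markers || fmt_markers % 2 != 0) return false;

    // Markers alternate: plain text, quoted text, plain text, ...
    out.clear();
    bool inside = false;
    std::size_t start = 0;
    for (std::size_t pos = 0; pos <= expanded.size(); ++pos) {
        if (pos == expanded.size() || expanded[pos] == kMarker) {
            std::string seg = expanded.substr(start, pos - start);
            out += inside ? escape_stub(seg) : seg;
            if (pos < expanded.size()) out += '\'';  // restore the apostrophe
            inside = !inside;
            start = pos + 1;
        }
    }
    return true;
}
```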
I think I'm finally satisfied with this solution...and it only took me 1.5 years to figure it out. :D
I hope this helps someone else as well.

std::string vs char* performance, best technique to delete parts from the beginning

Well, I have to process a large chunk of text, analysing it linearly from beginning to end, and I wonder which is the better approach: using char* or std::string.
With char* I can move the pointer to a position further along in the string, e.g.:
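//EDIT later: the text actually lives in malloc'ed memory
char *orig = (char *)malloc(strlen("text to analyse") + 1);
strcpy(orig, "text to analyse");
char *str = orig; //a pointer can be advanced, an array (char str[]) cannot
//process
str += processed_chars; //quite fast
//process again
// later: free(orig);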
but with std::string I might have to use std::string::erase, which creates a copy, or moves bytes, or something (I don't know the actual implementation):
string str = "text to analyse";
//process
str.erase(0, processed_chars); //erase modifies str in place, no assignment needed
Or is there a way to alter the std::string's hidden pointer?
EDIT: as Sylvain Defresne requested, here is more code:
class tag {
public:
std::vector<tag> sublings; //child tags (needs <vector>; <cstring> for strstr)
tag(char ** pch) {
*pch = strstr(*pch,"<");
if(*pch == NULL) return; //test what strstr returned, not pch itself
char *orig = *pch+1;
*pch = strstr(*pch,">");
if(*pch == NULL) return;
*pch+=sizeof(char); //moving behind the >
//process inner tag data
if(*(*pch-2)!='/'){// not selfclose
while (*pch != NULL && !(**pch == '<' && *(*pch+1) == '/')){ //search for closing tag
sublings.push_back(tag(pch)); //store the child by value instead of leaking a new'ed copy
}
if(*pch == NULL) return;
*pch = strstr(*pch,">");
if(*pch == NULL) return;
*pch+=sizeof(char); //moving behind the >
//add check if the closing tag is matching
}
}
};
I use it for recursive parsing of XML-like notation:
char str[] ="<father><kid /></fatherr>";
char * pch = str;
tag *root = new tag(&pch);
This code is ugly as hell; I am just starting with low-level pointer arithmetic and such (I have used visual components until now), so don't judge too hard.

With std::string, you would probably use std::string::iterator. Your code would be:
std::string str = "text to analyse";
std::string::iterator iter = str.begin();
// process
iter += processed_chars;

Anything you can do with a char*, you can do with an std::string::iterator.
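If C++17 is available, std::string_view comes closest to "altering the hidden pointer": it is only a pointer and a length over existing characters, and remove_prefix advances it in constant time without copying or moving anything. A minimal sketch (skip is a hypothetical helper); note that the viewed std::string must outlive the view:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Returns a view of `text` without its first `processed_chars` characters.
// Nothing is copied; the view still points into the original buffer.
std::string_view skip(std::string_view text, std::size_t processed_chars) {
    if (processed_chars > text.size()) processed_chars = text.size();
    text.remove_prefix(processed_chars);
    return text;
}
```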

You can use std::string::iterator.
std::string is not necessary for such a task (but classes such as std::string are very useful in other situations).

Find a char or substring with specifying start pos

I couldn't find a function which would let me specify the start pos for beginning a char or substring search.
I have, for example:
const char *c = "S1S2*S3*S4";
I'd like to search for 'S3' by finding the first '*' asterisk, then the second asterisk following it, and finally getting the substring 'S3' enclosed by those asterisks.

The string class has a large find family of functions that take an index as a second argument. Repeated applications of find('*', index) should get you what you need.
std::string s(c);
std::string::size_type star1 = s.find('*');
std::string::size_type star2 = s.find('*', star1 + 1);
std::string between = s.substr(star1 + 1, star2 - star1 - 1); // "S3"

One solution would be to find the location of the first asterisk, then the location of the second asterisk. Then use those positions as the start and end locations to search for S3.

Use
char *strchr( const char *str, int ch );
See the strchr reference (header <cstring>) for details.

#include <stdexcept>
#include <string>
std::string between_asterisks( const std::string& s ) {
std::string::size_type ast1 = s.find('*');
if (ast1 == std::string::npos) {
throw std::runtime_error("no first '*' found");
}
std::string::size_type sub_start = ast1+1;
std::string::size_type ast2 = s.find('*', sub_start);
if (ast2 == std::string::npos) {
throw std::runtime_error("no second '*' found");
}
return s.substr(sub_start, ast2-sub_start);
}

You can use strchr(). Save the returned pointer and pass the position just after it (ptr + 1) to the next call; since the returned pointer points at the occurrence you found, the next search then starts right after it.

Well, one possibility, if you are to use C-style char* arrays for strings, is to use strchr to search for the occurrences of the asterisks, e.g. (with NO error checking, mind):
char c []= "S1S2*S3*S4";
char* first = strchr(c,'*');
if (first) {
char* start = ++first;
char* nextast = strchr(start,'*');
char* s3str = new char[nextast-start+1];
strncpy(s3str,start,nextast-start);
s3str[nextast-start] = '\0';
// ... use s3str ...
delete[] s3str;
}
But it would be easier to use the C++ string class to do this.

Splitting the string at Enter key

I'm getting text from an edit box and I want to get each name separated by the Enter key, like the character string below, where the names are separated by NUL characters.
const char *names = "Name1\0Name2\0Name3\0Name4\0Name5";
while(*names)
{
names += strlen(names)+1;
}
How would you do the same for the Enter key (i.e. names separated by \r\n)? Can you do that without using the std::string class?

Use strstr:
while (*names)
{
const char *next = strstr(names, "\r\n");
if (next != NULL)
{
// If you want to use the key, the length is
size_t len = next - names;
// do something with a string here. The string is not 0 terminated
// so you need to use only 'len' bytes. How you do this depends on
// your need.
// Have names point to the first character after the \r\n
names = next + 2;
}
else
{
// do something with name here. This version is 0 terminated
// so it's easy to use
// Have names point to the terminating \0
names += strlen(names);
}
}
One thing to note is that this code also fixes an error in your code. Your string is terminated by a single \0, so the last iteration will have names point to the first byte after your string. To fix your existing code, you need to change the value of names to:
// The algorithm needs two \0's at the end (one so the final
// strlen will work and the second so that the while loop will
// terminate). Add one explicitly and allow the compiler to
// add a second one.
const char *names = "Name1\0Name2\0Name3\0Name4\0Name5\0";
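Since the question asks for a solution without std::string, here is a sketch that splits the buffer in place with strstr, overwriting each '\r' with '\0' so that every name becomes its own C string (split_crlf is a hypothetical name). The buffer must be writable, so text from a string literal has to be copied into a char array first.

```cpp
#include <cstddef>
#include <cstring>

// Splits `buf` in place at every "\r\n" and stores a pointer to each
// non-empty name in `out`. Returns the number of names (at most `max`).
std::size_t split_crlf(char *buf, char *out[], std::size_t max) {
    std::size_t n = 0;
    char *name = buf;
    while (n < max) {
        char *next = std::strstr(name, "\r\n");
        if (next != nullptr) *next = '\0';   // terminate this name at '\r'
        if (*name != '\0') out[n++] = name;  // skip empty lines
        if (next == nullptr) break;          // that was the last name
        name = next + 2;                     // first char after "\r\n"
    }
    return n;
}
```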

If you want to start and finish with a C string, it's not really C++.
This is a job for strsep.
#include <string.h> // strsep is a BSD/glibc extension, not standard C or C++
void split_string( char *multiline ) {
do strsep( &multiline, "\r\n" );
while ( multiline );
}
Each call to strsep zeroes out either a \r or a \n. Since they only appear together as \r\n, every other call returns a name and the calls in between return empty tokens. If you wanted, you could build an array of char*s by recording multiline as it advances, or the return value of strsep:
#include <string.h>
#include <vector>
std::vector< char* > split_string( char *multiline ) {
std::vector< char* > args;
do {
char *arg = strsep( &multiline, "\r\n" );
if ( * arg != 0 ) {
args.push_back( arg );
}
} while ( multiline );
return args;
}
This second example is at least not specific to Windows.

Here's a pure pointer solution
const char * names = "name1\r\nName2\r\nName3";
const char * plast = names;
while (*names)
{
if (names[0] == '\r' && names[1] == '\n')
{
if (plast < names)
{
size_t cch = names - plast;
// plast points to a name of length cch, not null terminated.
// to extract the name use
// strncpy(pout, plast, cch);
// pout[cch] = '\0';
}
plast = names+2;
}
++names;
}
// plast now points to the start of the last name, it is null terminated.
// extract it with
// strcpy(pout, plast);

Since this has the C++ tag, the easiest way would probably be to use the C++ standard library, especially strings and string streams. Why do you want to avoid std::string when you're doing C++?
std::istringstream iss(names);
std::string line;
while( std::getline(iss,line) )
process(line); // do process(line.c_str()) instead if you need to
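One detail with this approach: std::getline splits on '\n' only, so with "\r\n" input every line except the last keeps a trailing '\r'. A minimal sketch that strips it and collects the lines (split_lines is a hypothetical name):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits `names` into lines and removes the '\r' left over from "\r\n".
std::vector<std::string> split_lines(const std::string& names) {
    std::vector<std::string> lines;
    std::istringstream iss(names);
    std::string line;
    while (std::getline(iss, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```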
