Substring of char* to std::string - c++

I have an array of chars and I need to extract subsets of this array and store them in std::strings. I am trying to split the array into lines, based on finding the \n character. What is the best way to approach this?
int size = 4096;
char* buffer = new char[size];
// ...Array gets filled
std::string line;
// Find the chars up to the next newline, and store them in "line"
ProcessLine(line);
Probably need some kind of interface like this:
std::string line = GetSubstring(char* src, int begin, int end);

I'd create the std::string as the first step, as splitting the result will be far easier.
int size = 4096;
char* buffer = new char[size];
// ... Array gets filled
// make sure it's null-terminated
std::string lines(buffer);
// Tokenize on '\n' and process individually
std::istringstream split(lines);
for (std::string line; std::getline(split, line, '\n'); ) {
ProcessLine(line);
}

You can use the std::string(const char *s, size_t n) constructor to build a std::string from the substring of a C string. The pointer you pass in can be to the middle of the C string; it doesn't need to be to the very first character.
If you need more than that, please update your question to detail exactly where your stumbling block is.
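As a minimal sketch of how that constructor could be used for the line-splitting task in the question: the helper name SplitLines and its size parameter are assumptions made for illustration, not part of the original code. It takes an explicit length, so the buffer does not need to be null-terminated.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split the first `size` bytes of `buffer` into lines.
// Each line is built with the std::string(const char*, size_t) constructor,
// pointing into the middle of the buffer; the '\n' itself is not copied.
std::vector<std::string> SplitLines(const char* buffer, std::size_t size)
{
    std::vector<std::string> lines;
    const char* const end = buffer + size;
    const char* head = buffer;
    while (head != end) {
        const char* nl = std::find(head, end, '\n');  // next newline, or end
        lines.push_back(std::string(head, static_cast<std::size_t>(nl - head)));
        head = (nl == end) ? end : nl + 1;            // step over the newline
    }
    return lines;
}
```

Each element of the returned vector can then be passed to ProcessLine. A trailing newline does not produce an extra empty line, but an empty line in the middle of the buffer is kept.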

I didn't realize you only wanted to process each line one at a time, but just in case you need all the lines at once, you can also do this:
std::vector<std::string> lines;
char *s = buffer;
char *head = s;
while (*s) {
if (*s == '\n') { // Line break found
*s = '\0'; // Change it to a null character
lines.push_back(head); // Add this line to our vector
head = ++s;
} else s++;
}
lines.push_back(head); // Add the last line
std::vector<std::string>::iterator it;
for (it = lines.begin(); it != lines.end(); it++) {
// You can process each line here if you want
ProcessLine(*it);
}
// Or you can process all the lines in a separate function:
ProcessLines(lines);
// Cleanup
lines.erase(lines.begin(), lines.end());
I've modified the buffer in place, and the vector.push_back() method generates std::string objects from each of the resulting C substrings automatically.

Your best bet (best meaning easiest) is to use strtok and convert the tokens to std::string via the constructor. Note that plain strtok is not reentrant; for that you need strtok_r, which is POSIX rather than standard C++. strtok also treats consecutive delimiters as one, so empty lines are skipped.
void ProcessTextBlock(char* str)
{
    char* tok = strtok(str, "\n");
    while (tok != NULL)
    {
        ProcessLine(std::string(tok));
        tok = strtok(NULL, "\n"); // pass NULL to continue with the same string
    }
}

You can turn a substring of char* to std::string with a std::string's constructor:
template< class InputIterator >
basic_string( InputIterator first, InputIterator last, const Allocator& alloc = Allocator() );
Just do something like:
const char *cstr = "abcd";
std::string str(cstr + 1, cstr + 3);
In that case str would be "bc".

Related

Init and use C++ std::string as array of char

I want to use a std::string for dynamic string handling. Data is appended to the string repeatedly, and sometimes I want to set the character at index i. I don't know in advance how many characters will be added to the string. I want something like a dynamic collection in .NET.
When I allocate a std::string in C++
std::string s;
and try to set element at index i for it:
s[0] = 'a';
It throws an error related to memory.
A stupid way is to initialize it with existing data and replace the characters later:
std::string s = generate1000chars();
s[2] = 'c';
Is there a way to initialize a string so that I can manipulate the character at index i, like a char array?
You can use std::string::resize() to resize it and fill it in later:
std::string s;
s.resize(1000);
//later..
s[2] = 'c';
Maybe you can try a wrapper function like
std::string& SetChar(std::string& str, char ch, size_t index)
{
if (str.length() <= index)
{
str.resize(index + 1);
}
str[index] = ch;
return str;
}
So that the string can auto extend if needed.
(Modified to a better signature)

C++; Putting characters into a C-Style string

I've been reading a book for self study (http://www.amazon.com/gp/product/0321992784) and I'm on chapter 17 doing the exercises. One of them I solved, but I'm not satisfied and would like some help. Thank you in advance.
The Exercise: Write a program that reads characters from cin into an array that you allocate on the free store. Read individual characters until an exclamation mark (!) is entered. Do not use std::string. Do not worry about memory exhaustion.
What I did:
char* append(const char* str, char ch); // Add a character to the string and return a duplicate
char* loadCstr(); // Read characters from cin into an array of characters
int main()
{
char* str{ loadCstr() };
std::cout << str << '\n';
return 0;
}
I made 2 functions, 1 to create a new string with a size 1 larger than the old and add a character at the end.
char* append(const char* str, char ch)
/*
Create a new string with a size 1 greater than the old
insert old string into new
add character into new string
*/
{
char* newstr{ nullptr };
int i{ 0 };
if (str)
newstr = new char [ sizeof(str) + 2 ];
else
newstr = new char [ 2 ];
if(str)
while (str [ i ] != '\0')
newstr [ i ] = str [ i++ ]; // Put character into new string, then increment the index
newstr [ i++ ] = ch; // Add character and increment the index
newstr [ i ] = '\0'; // Trailing 0
return newstr;
}
This is the function for the exercise using the append function I created, It works, but from what I understand each time I call append, there is a memory leak because I create a new character array and didn't delete the old.
char* loadCstr()
/*
get a character from cin, append it to str until !
*/
{
char* str{ nullptr };
for (char ch; std::cin >> ch && ch != '!';)
str = append(str, ch);
return str;
}
I tried adding another pointer to hold the old array and delete it after making a new one, but after about 6 calls in this loop I get a runtime error that I think tells me I'm deleting something I shouldn't? which is where I got confused.
This is the old one that doesn't work beyond 6 characters:
char* loadCstr()
/*
get a character from cin, append it to str until !
*/
{
char* str{ nullptr };
for (char ch; std::cin >> ch && ch != '!';) {
char* temp{ append(str, ch) };
if (str)
delete str;
str = temp;
}
return str;
}
So I want to know how I can fix this function so there are no memory leaks. Thank you again. (Also please note, I do know these functions already exist and using std::string handles all the free store stuff for me, I just want to understand it and this is a learning exercise.)
You have to use the standard C function std::strlen instead of the sizeof operator, because in your function sizeof(str) returns the size of the pointer, not the length of the string.
You also need to delete the previously allocated array.
The function can look like this:
char* append(const char* str, char ch)
/*
Create a new string with a size 1 greater than the old
insert old string into new
add character into new string
*/
{
size_t n = 0;
if ( str ) n = std::strlen( str );
char *newstr = new char[ n + 2 ];
for ( size_t i = 0; i < n; i++ ) newstr[i] = str[i];
delete [] str;
newstr[n] = ch;
newstr[n+1] = '\0';
return newstr;
}
And in the function loadCstr it can be called like
str = append( str, ch );
Instead of the loop that copies the string, you could also use the standard algorithm std::copy.
Is the point to learn about memory management, or about how string operations work internally?
For the second (learning about string operations), you should use std::unique_ptr<char[]> which will automatically free the attached array when the pointer dies. You'll still need to calculate string length, copy between strings, append -- all the things you are doing now. But std::unique_ptr<char[]> will handle the deallocation.
For the first case, you're better off writing an RAII class (a custom version of std::unique_ptr<T>) and learning how to free memory in a destructor than scattering delete [] statements all over your code. Writing delete [] everywhere is actually a bad habit; learning it will move your ability to program C++ backwards.

How to extract words out of a string and store them in different array in c++

How to split a string and store the words in a separate array without using strtok or istringstream and find the greatest word?? I am only a beginner so I should accomplish this using basic functions in string.h like strlen, strcpy etc. only. Is it possible to do so?? I've tried to do this and I am posting what I have done. Please correct my mistakes.
#include<iostream.h>
#include<stdio.h>
#include<string.h>
void count(char n[])
{
char a[50], b[50];
for(int i=0; n[i]!= '\0'; i++)
{
static int j=0;
for(j=0;n[j]!=' ';j++)
{
a[j]=n[j];
}
static int x=0;
if(strlen(a)>x)
{
strcpy(b,a);
x=strlen(a);
}
}
cout<<"Greatest word is:"<<b;
}
int main( int, char** )
{
char n[100];
gets(n);
count(n);
}
The code in your example looks like it's written in C. Functions like strlen and strcpy originate in C (although they are also part of the C++ standard library for compatibility via the header cstring).
You should start learning C++ using the Standard Library and things will get much easier. Things like splitting strings and finding the greatest element can be done using a few lines of code if you use the functions in the standard library, e.g:
// The text
std::string text = "foo bar foobar";
// Wrap text in stream.
std::istringstream iss{text};
// Read tokens from stream into vector (split at whitespace).
std::vector<std::string> words{std::istream_iterator<std::string>{iss}, std::istream_iterator<std::string>{}};
// Get the greatest word.
auto greatestWord = *std::max_element(std::begin(words), std::end(words), [] (const std::string& lhs, const std::string& rhs) { return lhs.size() < rhs.size(); });
Edit:
If you really want to dig down into the nitty-gritty parts using only functions from std::string, here's how you can split the text into words (I leave finding the greatest word to you, which shouldn't be too hard):
// Use vector to store words.
std::vector<std::string> words;
std::string text = "foo bar foobar";
std::string::size_type beg = 0, end;
do {
end = text.find(' ', beg);
if (end == std::string::npos) {
end = text.size();
}
words.emplace_back(text.substr(beg, end - beg));
beg = end + 1;
} while (beg < text.size());
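For completeness, one possible way to finish the exercise with the same find/substr loop is sketched below. The function name LongestWord is an assumption made for illustration. It keeps only the longest word seen so far instead of storing every word, and on a tie it returns the first of the longest words.

```cpp
#include <string>

// Hypothetical helper: return the longest space-separated word in `text`,
// using only std::string member functions.
std::string LongestWord(const std::string& text)
{
    std::string best;
    std::string::size_type beg = 0;
    while (beg < text.size()) {
        std::string::size_type end = text.find(' ', beg);
        if (end == std::string::npos) {
            end = text.size();                    // last word runs to the end
        }
        if (end - beg > best.size()) {
            best = text.substr(beg, end - beg);   // strictly longer: keep it
        }
        beg = end + 1;                            // continue after the space
    }
    return best;
}
```

For the sample text "foo bar foobar" this returns "foobar"; for an empty string it returns an empty string.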
I would write two functions. The first one skips blank characters for example
const char * SkipSpaces( const char *p )
{
while ( *p == ' ' || *p == '\t' ) ++p;
return ( p );
}
And the second one copies non blank characters
const char * CopyWord( char *s1, const char *s2 )
{
while ( *s2 != ' ' && *s2 != '\t' && *s2 != '\0' ) *s1++ = *s2++;
*s1 = '\0';
return ( s2 );
}
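A sketch of how these two functions might be combined to find the longest word using only strlen and strcpy, as the question asks. The driver LongestWordCStr and the 50-character word buffer are assumptions made for illustration. CopyWord does no bounds checking, so the sketch is only safe when every word is shorter than the buffer.

```cpp
#include <cstring>

const char* SkipSpaces(const char* p)
{
    while (*p == ' ' || *p == '\t') ++p;
    return p;
}

const char* CopyWord(char* s1, const char* s2)
{
    while (*s2 != ' ' && *s2 != '\t' && *s2 != '\0') *s1++ = *s2++;
    *s1 = '\0';
    return s2;
}

// Hypothetical driver: copies the longest word of `text` into `longest`.
// Assumes every word is shorter than 50 characters and that `longest`
// has room for at least 50 characters.
void LongestWordCStr(const char* text, char* longest)
{
    char word[50];
    longest[0] = '\0';
    const char* p = SkipSpaces(text);
    while (*p != '\0') {
        p = CopyWord(word, p);                   // word is null-terminated
        if (std::strlen(word) > std::strlen(longest))
            std::strcpy(longest, word);
        p = SkipSpaces(p);                       // move to the next word
    }
}
```

Because CopyWord writes the terminator after every word, there is no need to clear the word buffer between words.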
Try reading each word into a small array (obviously no word is longer than 35 characters). You can find a word by looking for two successive spaces. Pass that array to strlen(), then compare: if the previous word was longer, drop the new word; otherwise keep the new word.
After each word, do not forget to reset the word array with '\0' (the null character), or this will happen:
Say the 1st word in the array was 'happen' and the 2nd was 'to'. If you don't reset the array, then after the 1st catch it holds:
happen
and after the 2nd catch:
toppen
Try this. Here ctr will be the number of elements in the array(or vector) of individual words of the sentence. You can split the sentence from whatever letter you want by changing function call in main.
#include<iostream>
#include<string>
#include<vector>
using namespace std;
void split(const string& s, char ch){
    vector<string> vec;
    string tempStr;
    int ctr{};
    for(string::size_type i{}; i<=s.length(); i++){
        // A delimiter or the end of the input finishes the current word.
        if(i==s.length() || s[i]==ch){
            vec.push_back(tempStr);
            ctr++;
            tempStr.clear();
            continue;
        }
        tempStr += s[i]; // the delimiter itself is never stored
    }
    for(const string& S: vec)
        cout<<S<<endl;
}
int main(){
string s;
getline(cin, s);
split(s, ' ');
return 0;
}

How can I tokenize a c-style string from a file without making a copy?

Let's say I have a constant c-style string say
const char* msg = "fred,jim,345,7665";
I'd like to tokenize this and read out the individual fields but for performance reasons I don't want to make a copy. How can I do this?
Obviously strtok takes a non-constant pointer, and boost::tokenizer is an option, but I am unsure what it is doing behind the scenes.
Inevitably you will require some copy of the string, even if it is a substring being copied.
If you have a strtok_r function, you can use that, but it will still require a mutable string to do its work. Beware, however, as not all systems provide the function (e.g. Windows), which is why I've provided an implementation here. It works by requiring an additional parameter: a pointer to a C string to save the address of the next match. This allows for it to be more reentrant (thread-safe) in theory. However, you'll still be mutating the value. You could modify it to suit your needs if you like, perhaps copying N bytes into a destination buffer and null-terminating that buffer to avoid the need to modify the source string.
#include <string.h>

/*
  Usage:
    char *tok;
    char *savep;
    tok = mystrtok_r (somestr, ",", &savep);
    while (NULL != tok)
      {
        // Do something with `tok'.
        tok = mystrtok_r (NULL, ",", &savep);
      }
*/
char *
mystrtok_r (char *str, const char *delims, char **nextp)
{
  if (str == NULL)
    str = *nextp;
  str += strspn (str, delims);          /* skip leading delimiters */
  if (*str == 0)
    {
      *nextp = str;                     /* nothing left; stay on the terminator */
      return NULL;
    }
  *nextp = str + strcspn (str, delims);
  if (**nextp != 0)
    {
      **nextp = 0;                      /* terminate the token in place */
      ++*nextp;                         /* resume after the delimiter */
    }
  return str;
}
It depends on how you're going to use it.
If you want to get the next token, and then the next (like an iteration over the string), then you only really need to copy the current token into memory.
long strtok2( char *strDest, const char *strSrc, const char cTok, long lOffset, long lMax)
{
    if(lMax > 0)
    {
        strSrc += lOffset;
        char * start = strDest;
        // Copy until the delimiter, the terminator, or the buffer limit.
        while(--lMax && *strSrc != cTok && *strSrc != '\0')
            *strDest++ = *strSrc++;
        *strDest = 0;
        return strDest - start; //the length of the token
    }
    return 0;
}
I snagged a simple strcpy from http://vijayinterviewquestions.blogspot.com.au/2007/07/implement-strcpy-function.html
const char* msg = "fred,jim,345,7665";
char buffer[20];
long offset = 0;
long length;
// Note: the loop stops at the first empty token as well as at the end of msg.
while((length = strtok2(buffer, msg, ',', offset, 20)) > 0)
{
    cout << buffer << '\n';
    offset += length;
    if(msg[offset] == ',') ++offset; //step over the delimiter, never past the terminator
}
Well, without a little more detail it's hard to know exactly what you want. I'll guess you are parsing delimited items where consecutive delimiters should be treated as zero length tokens (which is usually correct for comma separated elements). I'm also assuming a blank line counts as a single zero length token. This is how I'd approach it:
const char *token_begin = msg;
int length;
for(;;)
{
length = 0;
while(!isDelimiter(token_begin[length])) //< must include \0 as delimiter
++length;
//..do something here with token. token is at: token_begin[0..length)
if ( token_begin[length] != 0 )
token_begin = &token_begin[length+1]; //skip beyond non-null delimiter
else
break; //token null terminated. exit
}
If you are going to store the tokens somewhere, then a copy will be necessary in any case, and strtok does this nicely by reusing the string and placing null terminators inside it.
The only other option I see to avoid copying is a lexer that scans the string and produces tokens through a state machine, storing the partial results in a buffer. But every token would still have to be stored, at least as a null-terminated string, so you are not really saving anything.
Here is my proposal. My code is procedural and uses global variables such as position (I know global variables are bad practice, but this is only to give you the idea); you can replace them with data members if you want an object-oriented design.
const int MAX = 1000; // must be greater than the maximum token length
int position, messageLength;
char token[MAX];
bool hasNext()
{
return position < messageLength;
}
char* next(const char* message)
{
int i = 0;
while (position < messageLength && message[position] != ',') {
token[i++] = message[position];
position++;
}
position++; // ',' found
token[i] = '\0';
return token;
}
int main(int argc, char **argv)
{
const char* msg = "fred,jim,345,7665";
position = 0;
messageLength = strlen(msg);
while (hasNext())
cout << next(msg) << endl;
return EXIT_SUCCESS;
}

Malformed output when converting string to char* in C++

I've got a function that splits up a string into various sections and then parses them, but when converting a string to char* I get a malformed output.
int parseJob(char * buffer)
{ // Parse raw data, should return individual jobs
const char* p;
int rows = 0;
for (p = strtok( buffer, "~" ); p; p = strtok( NULL, "~" )) {
string jobR(p);
char* job = &jobR[0];
parseJobParameters(job); // At this point, the data is still in good condition
}
return (1);
}
int parseJobParameters(char * buffer)
{ // Parse raw data, should return individual job parameters
const char* p;
int rows = 0;
for (p = strtok( buffer, "|" ); p; p = strtok( NULL, "|" )) { cout<<p; } // At this point, the data is malformed.
return (1);
}
I don't know what happens between the first function calling the second one, but it malforms the data.
As you can see from the code example given, the same method to convert string to char* is used and it works fine.
I'm using Visual Studio 2012/C++, any guidance and code examples will be greatly appreciated.
The "physical" reason your code does not work has nothing to do with std::string or C++. It wouldn't work in pure C either. strtok is a function that stores its intermediate parsing state in a global (static) variable. This immediately means that you cannot use strtok to parse more than one string at a time. Starting the second parse session before finishing the first overwrites the internal data stored by the first parse session, ruining it beyond repair. In other words, strtok parse sessions must not overlap. In your code they do overlap.
Also, in C++03 the idea of using std::string with strtok directly is doomed from the start. The internal sequence stored in std::string is not guaranteed to be null-terminated. This means that generally &jobR[0] is not a C-string. It can't be used with strtok. To convert a std::string to a C-string you have to use c_str(). But C-string returned by c_str() is non-modifiable.
Since C++11 the characters are required to be stored contiguously, and c_str() must return a pointer p such that p + i == &operator[](i) for every i in [0, size()]. So in C++11 &jobR[0] does point to a null-terminated sequence that can be modified in place, as long as the terminator itself is not overwritten. The pointers returned by c_str() and data() are still const (a non-const data() overload was only added in C++17).
You cannot use strtok() to parse multiple strings at the same time, like you are doing. The first call to parseJobParameters() in the first loop iteration of parseJob() will alter the internal buffer that strtok() points to, thus the second loop iteration of parseJob() will not be processing the original data anymore. You need to rewrite your code to not use nested calls to strtok() anymore, eg:
#include <iostream>
#include <string>
#include <vector>
int parseJobParameters(const std::string &buffer);
void split(const std::string &s, const char *delims, std::vector<std::string> &vec)
{
    // Uses s.find_first_of() and s.substr() instead of strtok(), so there is
    // no hidden state. Like strtok(), runs of delimiters yield no empty tokens.
    std::string::size_type beg = s.find_first_not_of(delims);
    while (beg != std::string::npos)
    {
        std::string::size_type end = s.find_first_of(delims, beg);
        vec.push_back(s.substr(beg, end - beg)); // end == npos takes the rest
        beg = s.find_first_not_of(delims, end);
    }
}
int parseJob(char * buffer)
{
    std::vector<std::string> jobs;
    split(buffer, "~", jobs);
    for (std::vector<std::string>::iterator i = jobs.begin(); i != jobs.end(); ++i)
    {
        parseJobParameters(*i);
    }
    return (1);
}
int parseJobParameters(const std::string &buffer)
{
    std::vector<std::string> params;
    split(buffer, "|", params);
    for (std::vector<std::string>::iterator i = params.begin(); i != params.end(); ++i)
    {
        std::cout << *i;
    }
    return (1);
}
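Whilst char* job = &jobR[0]; will give you the address of the first character in the string, the standard only guarantees that this is a valid null-terminated C-style string from C++11 onwards. jobR.c_str() always gives you one, but it returns a const char*, so you should copy the characters into a writable buffer (for example a std::vector<char>) before handing them to strtok, which modifies its input.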
I'm fairly sure that will solve your problem, but there could of course be something wrong with the way you read the buffer that is passed to parseJob in as well.
Edit: of course, you are also calling strtok from a function that uses strtok. Inside strtok looks a bit like this:
char *strtok(char *str, char *separators)
{
static char *last;
char *found = NULL;
if (!str) str = last;
... do searching for needle, set found to beginning of non-separators ...
if (found)
{
*str = 0; // mark end of string.
}
last = str;
return found;
}
Since "last" gets overwritten when you call parseJobParameters, you can't use strtok(NULL, ... ) when you get back to parseJob.
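One way to avoid the shared "last" pointer altogether is to let each parse session carry its own position. The class below is only a sketch under that assumption: the name Tokenizer and its interface are hypothetical and not taken from any library. It follows strtok's token rules (runs of delimiters yield no empty tokens) but works on a private copy of the text.

```cpp
#include <string>

// Hypothetical tokenizer: the parse position is a member, not a hidden
// static, so several sessions can be active at the same time.
class Tokenizer {
public:
    Tokenizer(const std::string& text, const std::string& delims)
        : text_(text), delims_(delims), pos_(0) {}

    // Stores the next token in `out`; returns false when none are left.
    bool next(std::string& out)
    {
        std::string::size_type beg = text_.find_first_not_of(delims_, pos_);
        if (beg == std::string::npos)
            return false;                       // only delimiters remain
        std::string::size_type end = text_.find_first_of(delims_, beg);
        if (end == std::string::npos)
            end = text_.size();                 // last token runs to the end
        out = text_.substr(beg, end - beg);
        pos_ = end;                             // resume here on the next call
        return true;
    }

private:
    std::string text_;
    std::string delims_;
    std::string::size_type pos_;
};
```

With this, parseJob could create one Tokenizer for "~" and, inside its loop, a second Tokenizer for "|" on each job string; the inner session no longer disturbs the outer one.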