I'm trying to translate this C++ code and I can't work out what "char linebuf[1000]" is. Can some kind soul translate this to Python or explain what linebuf is? Thanks! :) Taken from http://www.scintilla.org/ScintillaUsage.html
if (ch == '\r' || ch == '\n') {
char linebuf[1000];
int curLine = GetCurrentLineNumber();
int lineLength = SendEditor(SCI_LINELENGTH, curLine);
//Platform::DebugPrintf("[CR] %d len = %d\n", curLine, lineLength);
if (curLine > 0 && lineLength <= 2) {
int prevLineLength = SendEditor(SCI_LINELENGTH, curLine - 1);
if (prevLineLength < sizeof(linebuf)) {
WORD buflen = sizeof(linebuf);
memcpy(linebuf, &buflen, sizeof(buflen));
SendEditor(EM_GETLINE, curLine - 1,
reinterpret_cast<LPARAM>(static_cast<char *>(linebuf)));
linebuf[prevLineLength] = '\0';
for (int pos = 0; linebuf[pos]; pos++) {
if (linebuf[pos] != ' ' && linebuf[pos] != '\t')
linebuf[pos] = '\0';
}
SendEditor(EM_REPLACESEL, 0, reinterpret_cast<LPARAM>(static_cast<char *>(linebuf)));
}
}
}
It is a buffer for a line of input text, of type char[1000], i.e. an array of 1000 char elements (which are actually bytes, because C++ is based upon C, which in turn predates the whole idea of character encodings).
If we really wanted a literal translation of the algorithm, the closest fit in Python is probably something like array.array('B', [0]*1000). However, this initializes the Python array, whereas the C++ array is uninitialized. There is really no way to skip that initialization in Python; the C++ declaration just reserves space without paying any attention to what's already in that chunk of memory.
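To make the role of linebuf more concrete, here is a rough C++ sketch of what the loop in the question ends up computing: the leading whitespace of the previous line, which is then inserted on the new line (auto-indent). It uses std::string instead of a fixed char buffer, the function name leading_whitespace is hypothetical, and no Scintilla calls are involved.

```cpp
#include <string>

// Returns the run of spaces and tabs at the start of `line`.
// The original loop overwrites every non-blank character with '\0',
// so the resulting C string ends at the first non-blank character.
std::string leading_whitespace(const std::string& line) {
    std::size_t pos = 0;
    while (pos < line.size() && (line[pos] == ' ' || line[pos] == '\t')) {
        ++pos;
    }
    return line.substr(0, pos);
}
```

For a previous line such as "  \tfoo bar" this returns the two spaces and the tab, which is the text the original code passes to EM_REPLACESEL.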
I was asked this question in a tech test.
They asked how to change ' ' to '_' in a string.
I think they didn't want the common answer, like this (I'm fairly sure of that):
void replaceChar(char originalStr[], size_t strLength, char originalChar, char newChar)
{
for(size_t i = 0 ; i < strLength ; i++)
{
if(originalStr[i] == originalChar)
{
originalStr[i] = newChar ;
}
}
}
So I answered like this: use a WORD. (Actually I didn't write code; they just wanted an explanation of how to do it.)
My idea is to compare each 8 bytes of the string (on a 64-bit OS) with an 8-byte mask.
If they are equal, replace all 8 bytes at once.
When the CPU reads data smaller than a WORD, it has to do an extra operation to clear the remaining bits.
That is slow, so I tried to use a WORD when comparing chars.
void replaceChar(char originalStr[], size_t strLength, char originalChar, char newChar)
{
size_t mask = 0;
size_t replaced = 0;
for(size_t i = 0 ; i < sizeof(size_t) ; i++)
{
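// shift by whole bytes so that every byte of the word holds the character
mask |= (size_t)(unsigned char)originalChar << (i * 8);
replaced |= (size_t)(unsigned char)newChar << (i * 8);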
}
for(size_t i = 0 ; i < strLength ; i++)
{
// if 8 byte data equal with 8 byte data filled with originalChar
// replace 8 byte data with 8 byte data filled with newChar
if(i % sizeof(size_t) == 0 &&
strLength - i > sizeof(size_t) &&
*(size_t*)(originalStr + i) == mask)
{
*(size_t*)(originalStr + i) = replaced;
i += sizeof(size_t);
continue;
}
if(originalStr[i] == originalChar)
{
originalStr[i] = newChar ;
}
}
}
Is there any faster way?
Do not try to optimize code when you do not know where its bottleneck is. Try to write clear, readable code.
This function declaration and definition
void replaceChar(char originalStr[], size_t strLength, char originalChar, char newChar)
{
for(size_t i = 0 ; i < strLength ; i++)
{
if(originalStr[i] == originalChar)
{
originalStr[i] = newChar ;
}
}
}
does not make sense because it duplicates the behavior of the standard algorithm std::replace.
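For comparison, a minimal sketch of the same job done with std::replace from <algorithm>. The wrapper name replace_char is hypothetical and only there to make the call easy to try; std::replace itself works on any iterator range, including a plain char array (std::replace(str, str + len, ' ', '_')).

```cpp
#include <algorithm>
#include <string>

// Returns a copy of `s` in which every `from` character is replaced by `to`.
std::string replace_char(std::string s, char from, char to) {
    std::replace(s.begin(), s.end(), from, to);
    return s;
}
```

Calling replace_char("Hello C strings!", ' ', '_') yields "Hello_C_strings!".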
Moreover, for such a simple, general-purpose function your identifier names are too long.
If you need to write a similar function specifically for C strings, it can look, for example, like the one in the demonstration program below.
#include <iostream>
#include <cstring>
char * replaceChar( char s[], char from, char to )
{
for ( char *p = s; ( p = strchr( p, from ) ) != nullptr; ++p )
{
*p = to;
}
return s;
}
int main()
{
char s[] = "Hello C strings!";
std::cout << replaceChar( s, ' ', '_' ) << '\n';
return 0;
}
The program output is
Hello_C_strings!
As for your second function, it is unreadable. Using the continue statement in the body of a for loop makes its logic difficult to follow.
Because a character array is not necessarily aligned to sizeof(size_t), the function is not as fast as you think.
If you need a very optimized function, then you should write it directly in assembler.
The first thing in the road to being fast is being correct. The problem with the original proposal is that sizeof(s) should be a cached value of strlen(s). Then the obvious problem is that this approach scans the string twice -- first to find the terminating character and then the character to be replaced.
This should be addressed by a data structure with a known length, or by a data structure with enough guaranteed excess data that multiple bytes can be processed at once without undefined behaviour.
Once this is solved (the OP has been edited to fix this), the problem with the proposed approach of scanning 8 bytes of data for ALL the bytes being the same is that the general case does not have 8 successive matching characters, but maybe only 7. In all those cases one would need to scan the same area twice (on top of scanning for the string's terminating character).
If the string length is not known, the best thing is to use a low level method:
while (*ptr != 0) {
if (*ptr == search_char) {
*ptr = replace_char;
}
++ptr;
}
If the string length is known, it's best to use the library algorithm std::replace, or its low-level counterpart:
for (std::size_t i = 0; i < size; ++i) {
if (str[i] == search_char) {
str[i] = replace_char;
}
}
Any decent compiler is able to autovectorize this, although the compiler might generate a larger variety of kernels than intended (one kernel for small sizes, one for intermediate and one to process in chunks of 32 or 64 bytes).
Recently I implemented a custom function for trimming std::strings that removes whitespace characters from the beginning and end.
I tested the functionality and it works according to my unit tests, but when I run the tests under valgrind, I get the following output:
==4486== Conditional jump or move depends on uninitialised value(s)
==4486== at 0x415DDA: is_ws_char(char) (parse.cpp:22)
==4486== by 0x415BC6: parse::trim(std::string&) (parse.cpp:34)
My input test string was
string s(" a");
I do not see what the problem is here.
The code looks like this:
inline bool is_ws_char(const char c) { // this is line 22 in my code
return (c == ' ' || c == '\n' || c == '\t' || c == '\r');
}
void parse::trim(std::string& str) {
size_t len = str.size();
size_t i = 0;
for (; i < len; ++i)
if (!is_ws_char(str[i]))
break;
const size_t start = i;
for (i = len - 1; i >= 0; --i)
if (!is_ws_char(str[i])) // this is line 34 in my code
break;
const size_t end = i;
str = str.substr(start, end - start + 1);
}
Does anybody have an idea what the problem is here?
I briefly thought that it's just a valgrind oddity, but that seems rather unlikely.
Thanks in advance!
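This loop is invalid:
for (i = len - 1; i >= 0; --i)
The condition is always true: i is an unsigned integer (size_t), so i >= 0 holds for every value, and decrementing it past 0 wraps around to the largest size_t value instead of going negative. If the string contains only whitespace, the loop never stops by itself and str[i] ends up reading outside the string.
Also, when str.size() is equal to zero, len - 1 is equal to std::string::npos.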
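One way to sidestep the unsigned countdown entirely is to let std::string do the searching. The following is a minimal sketch, not the asker's parse::trim: it is written as a free function named trim for illustration and assumes the same four whitespace characters as is_ws_char.

```cpp
#include <string>

// Removes leading and trailing ' ', '\n', '\t' and '\r' from `str` in place.
void trim(std::string& str) {
    const char* ws = " \n\t\r";
    const std::size_t start = str.find_first_not_of(ws);
    if (start == std::string::npos) {
        // Empty or all-whitespace input: nothing is left after trimming.
        str.clear();
        return;
    }
    // A non-whitespace character exists, so this search cannot fail.
    const std::size_t end = str.find_last_not_of(ws);
    str = str.substr(start, end - start + 1);
}
```

Because both searches return std::string::npos instead of running an index below zero, the empty and all-whitespace cases need no special loop handling.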
I have been implementing a factory for a component based game engine recently. I am deserializing objects by reading in from a file what component they need and what to initialize them with. It works except for when I try to read in a property longer than 15 characters. At 15 characters, it reads it in perfectly, anything longer and I get "ε■ε■ε■ε■ε■ε■ε■ε■ε" as output.
I am using std::string to store these lines of text.
Example:
JunkComponent2 test "1234567890123456" test2 "123456789012345"
With this the value of test becomes garbage, while test2 stays perfectly intact.
Any ideas what might be going on?
char line[1024];
while (file.getline(line, 1024))
{
std::vector<std::string> words;
std::string word;
int j = 0;
for (unsigned i = 0; line[i] != '\0' && i < 1024; ++i)
{
if (line[i] == ' ' && j > 0 && line[i - 1] != '\\')
{
words.push_back(word);
j = 0;
word = "";
}
else
{
++j;
word += line[i];
}
}
words.push_back(word);
// std::cout << (*Parts)["JunkComponent"]->GetName() << std::endl;
Component* c = (*Parts)[words[0]]->clone(words);
object->AddComponent(words[0], c);
for (std::list<Member*>::iterator it = members.begin(); it != members.end(); ++it)
{
for (unsigned i = 0; i < words.size(); ++i)
{
if ((*it)->GetName() == words[i])
{
if (words[i + 1][0] == '\"')
{
std::vector<char> chars;
chars.push_back('\"');
chars.push_back('\\');
for (unsigned int n = 0; n < chars.size(); ++n)
{
words[i + 1].erase(std::remove(words[i + 1].begin(), words[i + 1].end(), chars[n]), words[i + 1].end());
}
Container((*it)->GetMeta(), GET_MEMBER(data.GetData(), (*it)->GetOffset()), (*it)->GetName()).SetValue<std::string>(words[i + 1]);
}
else
{
Container((*it)->GetMeta(), GET_MEMBER(data.GetData(), (*it)->GetOffset()), (*it)->GetName()).SetValue<int>(std::stoi(words[i + 1]));
}
++i;
break;
}
}
}
}
GET_MEMBER Macro expands to:
#define GET_MEMBER(P, OFFSET) ((void *)(((char *)(P)) + (OFFSET)))
SetValue Function: (data is a void*)
template <typename T>
void SetValue(T data_)
{
memcpy(data, &data_, sizeof(T));
}
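I'll take a stab, having just eyed your code. GET_MEMBER is really nasty and I think that's where your problem is. It treats the member as raw bytes, as if a std::string were just a char buffer, which it is not; SetValue then memcpy's the bytes of a std::string object into that location, and std::string is not trivially copyable. Why does your code work with strings of 15 characters or fewer? More than likely because std::string on most popular implementations contains a special case for short strings: it keeps an internal buffer of length 16 (the last element being \0) to avoid dynamic memory allocation. When the string is longer than 15 characters, this buffer is not used and the text lives in heap memory that the object only points to, so the raw bytes you copied no longer contain the text, and the pointer inside them refers to memory that is freed when the temporary string is destroyed. The correct way to access the characters is through the string's own interface, such as operator[].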
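As a rough illustration of the alternative, here is a sketch of a setter that assigns through a typed pointer instead of copying raw bytes. The name set_value is hypothetical and is not the asker's Container::SetValue; the sketch assumes that the destination really holds a live, already constructed object of type T, which the surrounding reflection code would have to guarantee.

```cpp
#include <string>

// Hypothetical replacement for a memcpy-based setter.
// Assumes `dest` points to a constructed object of type T, so that T's own
// copy assignment runs and (for std::string) the text is properly copied.
template <typename T>
void set_value(void* dest, const T& value) {
    *static_cast<T*>(dest) = value;
}
```

With this approach a 16-character string is copied by std::string's assignment operator, so the destination owns its own heap buffer and does not depend on the lifetime of the argument.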
The string input format is like this
str1 str2
I don't know the number of characters in the input beforehand, so I need to store the 2 strings and get their lengths.
Using C-style strings, I tried to make use of the scanf library function, but was unsuccessful in getting the length. This is what I have:
// M W are arrays of char with size 25000
while (T--)
{
memset(M,'0',25000);memset(W,'0',25000);
scanf("%s",M);
scanf("%s",W);
i = 0;m = 0;w = 0;
while (M[i] != '0')
{
++m; ++i; // incrementing till array reaches '0'
}
i = 0;
while (W[i] != '0')
{
++w; ++i;
}
cout << m << w;
}
This is not efficient, mainly because of the memset calls.
Note:
I'd be better off using std::string, but because of the 25000-character input and the memory constraints of cin I switched to this. If there is an efficient way to read the strings, that would be good.
Aside from the answers already given, I think your code is slightly wrong:
memset(M,'0',25000);memset(W,'0',25000);
Do you really mean to fill the string with the character zero (value 48 or 0x30 [assuming ASCII, before some pedant downvotes my answer and points out that there are other encodings]), or with a NUL (the character with value zero)? The latter is 0, not '0'.
scanf("%s",M);
scanf("%s",W);
i = 0;m = 0;w = 0;
while (M[i] != '0')
{
++m; ++i; // incrementing till array reaches '0'
}
If you are looking for the end of the string, you should be using 0, not '0' (as per above).
Of course, scanf will put a 0 at the end of the string for you, so there's no need to fill the whole string with 0 [or '0'].
And strlen is an existing function that gives the length of a C-style string; it will most likely use a cleverer algorithm than checking each character and incrementing two variables, making it faster [for long strings at least].
You do not need memset when using scanf; scanf adds the terminating '\0' to the string.
Also, strlen is a simpler way to determine a string's length:
scanf("%s %s", M, W); // provided that M and W contain enough space to store the string
m = strlen(M); // don't forget #include <string.h>
w = strlen(W);
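The "enough space" caveat can be enforced in the format string itself by giving %s a maximum field width. The sketch below assumes the 25000-char buffers from the question; the helper name parse_pair is hypothetical, and it uses sscanf on an in-memory line only so that it can be tried without stdin (the same format string works with scanf).

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: reads two whitespace-separated tokens from `input`
// into `m` and `w`, each assumed to be a buffer of at least 25000 chars.
// Returns false if fewer than two tokens were found.
bool parse_pair(const char* input, char* m, char* w,
                std::size_t& m_len, std::size_t& w_len) {
    // The width 24999 leaves room for the terminating '\0' that %s appends.
    if (std::sscanf(input, "%24999s %24999s", m, w) != 2) {
        return false;
    }
    m_len = std::strlen(m);
    w_len = std::strlen(w);
    return true;
}
```

No memset is involved: each call overwrites the buffers and terminates them, so the lengths come straight from strlen.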
A C-style strlen without memset may look like this (named my_strlen here so that it does not clash with the standard strlen):
#include <iostream>
using namespace std;
unsigned my_strlen(const char *str) {
const char *p = str;
unsigned len = 0;
// walk the string until the terminating '\0'
while (*p != '\0') {
len++;
++p;
}
return len;
}
int main() {
cout << my_strlen("C-style string");
return 0;
}
It prints 14.
So I have a program that makes char* stuff lowercase. It does it by iterating through and manipulating the ASCII values. Now I know there's probably some library for this in C++, but that's not the point - I'm a student trying to get a grasp on char*s and stuff :).
Here's my code:
#include <iostream>
using namespace std;
char* tolower(char* src);
int main (int argc, char * const argv[])
{
char* hello = "Hello, World!\n";
cout << tolower(hello);
return 0;
}
char* tolower(char* src)
{
int ascii;
for (int n = 0; n <= strlen(src); n++)
{
ascii = int(src[n]);
if (ascii >= 65 && ascii <= 90)
{
src[n] = char(ascii+32);
}
}
return src;
}
( this is not for an assignment ;) )
It builds fine, but when I run it I get a "The Debugger has exited due to signal 10" and Xcode points me to the line: "src[n] = char(ascii+32);"
Thanks!
Mark
Yowsers!
Your "Hello, World!" string is what is called a string literal, which means its memory is part of the program and cannot be written to.
You are performing what is called an "in-place" transform, i.e. instead of writing the lowercase version to a new buffer you are writing to the original location. Because that location is a literal and cannot be written to, you get a crash.
Try this:
char hello[32];
strcpy(hello, "Hello, World!\n");
Also, in your for loop you should use <, not <=. strlen returns the length of the string excluding its null terminator, and array indices are zero-based.
As Andrew noted, the "Hello, World!\n" in your code is a read-only literal. You can either use strcpy to make a modifiable copy, or else try this:
char hello[] = "Hello, World!\n";
This automatically allocates an array on the stack big enough to hold a copy of the literal string and a trailing '\0', and copies the literal into the array.
Also, you can just leave ascii as a char, and use character literals instead of having to know what the numeric value of 'A' is:
char ascii;
for (int n = 0; n < strlen(src); n++)
{
ascii = src[n];
if (ascii >= 'A' && ascii <= 'Z')
{
src[n] = ascii - 'A' + 'a';
}
}
While you're at it, why bother with ascii at all? Just use src[n]:
for (int n = 0; n < strlen(src); n++)
{
if (src[n] >= 'A' && src[n] <= 'Z')
{
src[n] -= 'A' - 'a';
}
}
And then, you can take advantage of the fact that in order to determine the length of a C string you have to iterate through it anyway, and just combine both together:
for (char *n = src; *n != 0; n++)
if (*n >= 'A' && *n <= 'Z')
*n -= 'A' - 'a';