Today I ran into a problem with serialization in MQL4.
I have a method which I imported from a DLL:
In MQL4:
void insertQuery( int id,
string tableName,
double &values[4],
long &times[3],
int volume
);
In DLL:
__declspec(dllexport) void __stdcall insertQuery( int id,
wchar_t *tableName,
double *values,
long *times,
int volume
);
I tested it with these function calls in MQL4:
string a = "bla";
double arr[4] = { 1.1, 1.3, 0.2, 0.9 };
long A[3] = { 19991208, 19991308, 19992208 };
int volume = 1;
insertQuery( idDB, a, arr, A, volume );
Inside this method I collect these values into a file.
C++ :
stringstream stream;
stream << " '";
for (int i = 0; i < 2; ++i) {
stream << times[i] << "' , '";
}
stream << times[2] << ", ";
for (int i = 0; i < 4; ++i) {
stream << values[i] << ", ";
}
stream << volume;
wstring table(tableName);
query.append("INSERT INTO ");
query.append(table.begin(), table.end());
query.append(" VALUES (");
query.append(stream.str());
query.append(" )");
std::ofstream out("C:\\Users\\alex\\Desktop\\text.txt");
out << query;
out.close();
But in the output file I receive this record:
INSERT INTO bla VALUES ( '19991208' , '0' , '19991308, 1.1, 1.3, 0.2, 0.9, 1 )
So my question is: why do I lose one long value from the array when I receive my record in the DLL?
I tried a lot of ways to solve this problem (I transferred two and three long values, etc.) and I always lose the second long value during serialization. Why?
The problem is caused by the fact that in MQL4 a long is 8 bytes, while a long in C++ (on Windows/MSVC) is 4 bytes.
What you want is a long long in your C++ function signature.
Or you could pass the values as strings, then convert them to the appropriate type within your C++ code.
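A small sketch of what goes wrong (assuming a little-endian x86 build, as MetaTrader uses). The helper below imitates what a DLL declaring the parameter as a 4-byte `long` effectively does: it indexes the caller's 8-byte elements in 4-byte steps, which is exactly where the stray `'0'` in the output file comes from:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Imitates a DLL whose parameter type is MSVC's 4-byte long, reading an
// array of MQL4's 8-byte longs: element [i] is taken in 4-byte steps,
// so [1] lands on the (zero) high half of the first 8-byte value.
inline int32_t as_32bit_long(const int64_t* times, int i) {
    int32_t halves[6];
    std::memcpy(halves, times, 3 * sizeof(int64_t));
    return halves[i];
}
```

With the question's test data, `as_32bit_long(times, 0)` still looks right by accident (the low half of the first value), while `as_32bit_long(times, 1)` is 0, matching the record in the output file; declaring the parameter as `long long`/`int64_t` reads whole elements.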
Well, be careful: New-MQL4.56789 is not a C-compatible language.
The first thing to test is to avoid passing an MQL4 string into the DLL calling interface where a C-language string is really expected.
Since old-MQL4 was silently re-defined into the still-WIP-creeping New-MQL4 syntax, the MQL4 string is not a string, but a struct.
Root-cause [ isolation ]:
Having swallowed the shock about the string/struct trouble, first try, if you can, to test the MQL4/DLL interactions without passing any string, to prove that all the other parameters, passed by value and by reference, make their way into the hands of the DLL function as you expect.
If this works as you wish, proceed to the next step:
How, then, to pass the data where a string representation is expected?
Let me share a dirty hack I used for passing data where the DLL expects strings:
#import "mql4TOOL.dll"
...
int mql4TOOL_msg_init_data ( int &msg[],
uchar &data[],
int size
);
...
#import
...
int tool_msg_init_data ( int &msg[], string data, int size ) {
    uchar dataChar[];
    StringToCharArray( data, dataChar );
return ( mql4TOOL_msg_init_data ( msg, dataChar, size ) );
}
Yes, dirty, but it has worked for years and saved us many tens of man-years of re-engineering on a maintained code base with heavy dependence on the MQL4/DLL interfacing in massively distributed heterogeneous computing systems.
The last resort:
If all efforts went in vain, go low-level: pass a uchar[] as needed, assemble some serialized representation in MQL4, and parse it on the opposite end before processing the intended functionality.
Ugly?
Yes, it might look like that, but it keeps you focused on core functionality and isolates you from any next shift of paradigm, if not only strings cease to be strings et al.
Related
I'm working on a project that involves a large JSON file, basically a multidimensional array dumped in JSON form, but the overall size would be larger than the amount of memory I have. If I load it in as a string and then parse the string, that will consume all of the memory.
Are there any methods to limit the memory consumption, such as only retrieving data between specific indices? Could I implement that using solely the Nlohmann json library/the standard libraries?
RapidJSON and others can do it. Here's an example program using RapidJSON's "SAX" (streaming) API: https://github.com/Tencent/rapidjson/blob/master/example/simplereader/simplereader.cpp
This way, you'll get an event (callback) for each element encountered during parsing. The memory consumption of the parsing itself will be quite small.
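The streaming idea itself can be sketched with nothing but the standard library: pull one value at a time out of the stream instead of materialising the whole document. This toy reader handles only a flat array of numbers (it is not a general JSON parser; use RapidJSON's SAX API or similar for real input):

```cpp
#include <istream>
#include <sstream>
#include <vector>

// Toy streaming sketch: walks "[1, 2.5, 3]" one number at a time, so the
// working memory stays constant however long the array is. The vector is
// kept only so the sketch is testable; real code would process and
// discard each value in the loop body.
std::vector<double> readNumbers(std::istream& in) {
    std::vector<double> out;
    char c;
    in >> c;                   // consume '['
    double value;
    while (in >> value) {
        out.push_back(value);
        in >> c;               // consume ',' or ']'
        if (c == ']') break;
    }
    return out;
}
```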
Could you please specify the context of your question:
What programming language you are using (NodeJS, Vanilla JavaScript, Java, React)
What environment your code is running (Monolithic app on a server, AWS Lambda, Serverless)
Computing large JSON files can consume a lot of memory resources on a server and perhaps make your app crash.
I have experienced first-hand that manipulating large JSON files on my local computer with 8 GB of RAM is not a problem using a NodeJS script to compute the large JSON payloads. However, trying to run those large JSON payloads in an application running on a server gave me problems too.
I hope this helps.
Using DAW JSON Link, https://github.com/beached/daw_json_link , you can create an iterator pair/range and iterate over the JSON array 1 record at a time. The library also has routines for working with JSONL, which is common in large datasets.
For opening the file, I would use something like mmap/VirtualAlloc to handle that for us. The examples in the library use this via the daw::filesystem::memory_mapped_file_t type, which abstracts the file mapping.
With that, the memory mapped file allows the OS to page the data in/out as needed, and the iterator like interface keeps the memory requirement to that of one array element at a time.
The following demonstrates this, using a simple record:
struct Point {
double x;
double y;
};
The program to do this looks like:
#include <cassert>
#include <daw/daw_memory_mapped_file.h>
#include <daw/json/daw_json_iterator.h>
#include <daw/json/daw_json_link.h>
#include <iostream>
struct Point {
double x;
double y;
};
namespace daw::json {
template<>
struct json_data_contract<Point> {
using type =
json_member_list<json_number<"x">, json_number<"y">>;
};
}
int main( int argc, char** argv ) {
assert( argc >= 2 ); // need the JSON file path as an argument
auto json_doc = daw::filesystem::memory_mapped_file_t<char>( argv[1] );
assert( json_doc.size( ) > 2 );
auto json_range = daw::json::json_array_range<Point>( json_doc );
auto sum_x = 0.0;
auto sum_y = 0.0;
auto count = 0ULL;
for( Point p: json_range ) {
sum_x += p.x;
sum_y += p.y;
++count;
}
sum_x /= static_cast<double>( count );
sum_y /= static_cast<double>( count );
std::cout << "Centre Point (" << sum_x << ", " << sum_y << ")\n";
}
https://jsonlink.godbolt.org/z/xoxEd1z6G
Here is my code for a MetaTraderWrapper.dll:
#define MT4_EXPFUNC __declspec(dllexport)
MT4_EXPFUNC void __stdcall PopMessageString(wchar_t *message)
{
auto result = L"Hello world !";
int n = wcslen( result );
wcscpy_s( message, n + 1, result );
}
On the MQL4-Caller side this Script is used:
#property strict
#import "MetaTraderWrapper.dll"
int PopMessageString( string & );
#import
//
void OnStart(){
string message;
if ( StringInit( message, 64 ) ){
PopMessageString( message );
int n = StringLen( message );
MessageBox( message );
}
}
This works when the message has been properly initialized with the StringInit() function and enough memory has been allocated.
What I need to do is allocate the message variable not in the MQL4 script, but within the DLL.
In a C++ function, it should be something like this:
MT4_EXPFUNC void __stdcall PopMessageString(wchar_t *message)
{
auto result = L"Hello world !";
int n = wcslen( result );
// allocate here, but does not work
message = new wchar_t[n + 1]; // <--------- DOES NOT WORK
//
wcscpy_s( message, n + 1, result );
}
What can I do?
Get acquainted with the Wild Worlds of MQL4. Step 1: forget about a string being a string (it has been a struct... since 2014).
Internal representation of the string type is a structure of 12 bytes long:
#pragma pack(push,1)
struct MqlString
{
int size; // 32-bit integer, contains size of the buffer, allocated for the string.
LPWSTR buffer; // 32-bit address of the buffer, containing the string.
int reserved; // 32-bit integer, reserved.
};
#pragma pack(pop,1)
So,
having headbanged into this one sunny Sunday afternoon, as the platform underwent a LiveUpdate and suddenly all DLL-call interfaces using a string stopped working, it took a long time to absorb the costs of such a "swift" engineering surprise.
You can re-use the found solution:
use an array of bytes, uchar[], and convert the bytes of the returned content appropriately into a string on the MQL4 side with the service functions StringToCharArray() resp. CharArrayToString()
The DLL .mqh header file may also add these tricks and keep these conversions "hidden" from the MQL4 code:
#import <aDLL-file> // "RAW"-DLL-call-interfaces
...
// Messages:
int DLL_message_init( int &msg[] );
int DLL_message_init_size ( int &msg[], int size );
int DLL_message_init_data ( int &msg[], uchar &data[], int size );
...
#import
// ------------------------------------------------------ // "SOFT"-wrappers
...
int MQL4_message_init_data ( int &msg[], string data, int size ) {
    uchar dataChar[];
    StringToCharArray( data, dataChar );
return ( DLL_message_init_data ( msg, dataChar, size ) );
}
Always be pretty careful with appropriate deallocations, so as not to cause memory leaks.
Always be pretty cautious when a new LiveUpdate changes the code base and introduces a new compiler + new documentation. Re-read the whole documentation, as many life-saving details enter the help file only after some later update, and many details are hidden or reflected indirectly in chapters that do not promise such information at first glance -- so become as ready as D'Artagnan or a red-scarfed pioneer -- you never know where the next hit comes from :)
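The reason the question's `message = new wchar_t[n + 1]` appears to do nothing is that the pointer parameter is passed by value, so reassigning it only changes the DLL's local copy; the caller never sees the new buffer. A portable sketch of the safer caller-allocates pattern (names are hypothetical, not an MQL4 API; the real export would add `__declspec(dllexport) __stdcall`):

```cpp
#include <cstring>
#include <cwchar>

// Caller-allocates pattern: the DLL never hands out memory it allocated;
// it copies into a buffer the caller owns, and reports how much it wrote
// (or -1 if the buffer is too small).
int pop_message(wchar_t* buf, int maxLen) {
    const wchar_t* msg = L"Hello world !";
    int n = (int)std::wcslen(msg);
    if (n + 1 > maxLen)
        return -1;                                  // caller's buffer too small
    std::memcpy(buf, msg, (n + 1) * sizeof(wchar_t)); // copy incl. terminator
    return n;                                       // characters written
}
```

This matches the working MQL4 script above: StringInit() allocates on the caller's side, and the DLL only fills the buffer.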
I have a string whose length is 1600, and I know that it contains 200 doubles. When I print out the string I get the following: Y���Vz'#��y'#��!U�}'#�-...
I would like to convert this string to a vector containing the 200 doubles.
Here is the code I tried (blobString is a string 1600 characters long):
string first_eight = blobString.substr(0, sizeof(double)); // the first 8 bytes of the string, which should represent the first double
double double_value1;
memcpy(&double_value1, &first_eight, sizeof(double)); // First thing I tried
double* double_value2 = (double*)first_eight.c_str(); // Second thing I tried
cout << double_value1 << endl;
cout << double_value2 << endl;
This outputs the following:
6.95285e-310
0x7ffd9b93e320
--- Edit: solution ---
The second method works; all I had to do was look at where double_value2 was pointing.
cout << *double_value2 << endl;
Here's an example that might get you closer to what you need. Bear in mind that unless the numbers in your blob are in the exact format that your particular C++ compiler expects, this isn't going to work like you expect. In my example I'm building up the buffer of doubles myself.
Let's start with our array of doubles.
double doubles[] = { 0.1, 5.0, 0.7, 8.6 };
Now I'll build an std::string that should look like your blob. Notice that I can't simply initialize a string with a (char *) that points to the base of my list of doubles, as it will stop when it hits the first zero byte!
std::string double_buf_str;
double_buf_str.append((char *)doubles, 4 * sizeof(double));
// A quick sanity check, should be 32
std::cout << "Length of double_buf_str "
<< double_buf_str.length()
<< std::endl;
Now I'll reinterpret the c_str() pointer as a (double *) and iterate through the four doubles.
for (auto i = 0; i < 4; i++) {
std::cout << ((double*)double_buf_str.c_str())[i] << std::endl;
}
Depending on your circumstances you might consider using a std::vector<uint8_t> for your blob, instead of an std::string. C++11 gives you a data() function that would be the equivalent of c_str() here. Turning your blob directly into a vector of doubles would give you something even easier to work with--but to get there you'd potentially have to get dirty with a resize followed by a memcpy directly into the internal array.
I'll give you an example for completeness. Note that this is of course not how you would normally initialize a vector of doubles...I'm imagining that my double_blob is just a pointer to a blob containing a known number of doubles in the correct format.
const int count = 200; // 200 doubles incoming
std::vector<double> double_vec;
double_vec.resize(count);
memcpy(double_vec.data(), double_blob, sizeof(double) * count);
for (double& d : double_vec) {
std::cout << d << std::endl;
}
@Mooing Duck brought up the great point that the result of c_str() is not necessarily aligned to an appropriate boundary, which is another good reason not to use std::string as a general-purpose blob (or at least don't interpret the internals until they are copied somewhere that guarantees a valid alignment for the type you are interested in). The impact of trying to read a double from a non-aligned location in memory will vary depending on the architecture, giving you a portability concern. On x86-based machines there will only be a performance impact AFAIK, as the CPU will read across alignment boundaries and assemble the double correctly (you can test this on an x86 machine by writing and then reading back a double from successive locations in a buffer with an increasing 1-byte offset; it'll just work). On other architectures you'll get a fault.
The std::vector<double> solution will not suffer from this issue due to guarantees about the alignment of newed memory built into the standard.
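Where copying the whole blob into a vector isn't convenient, a per-element memcpy is the alignment-safe way to read a double from an arbitrary byte position; compilers typically turn it into a plain load on platforms where that is allowed:

```cpp
#include <cstring>

// Reads a double from an arbitrarily-aligned position in a byte buffer.
// Unlike casting the pointer and dereferencing, this is well-defined on
// every architecture (no misaligned-access fault, no aliasing UB).
double read_double(const char* p) {
    double d;
    std::memcpy(&d, p, sizeof d);
    return d;
}
```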
Pursuing a job, I was asked to solve a problem on HackerRank.com: write a function that accepts a string, counts the characters in it, and returns the most common character found. I wrote my solution, fixed the typos, and it works with my test cases and theirs, except that it fails "Test 7". Because it's an interview deal, HackerRank doesn't tell me the failure details, just that it failed.
I have used far too much time trying to figure out why. I've triple-checked for off-by-one errors, and I wrote the code for 8-bit chars but tried accepting 16-bit values, without changing the result. Here's my code. I cannot give the error, just that there is one.
Could it be multi-byte characters?
How can I create a testcase with a 2 byte or 3 byte character?
I put in some display-dump code, and what comes out is exactly what you'd expect. I have the Mac Xcode IDE on my desktop; any suggestions are welcome!
/*
* Complete the function below.
*/
char func(string theString) {
// I wonder what I'm doing wrong. 256 doesn't work any better here.
const int CHARS_RECOGED = 65536; // ie 0...65535 - even this isn't good enough to fix test case 7.
unsigned long alphaHisto[CHARS_RECOGED];
for (int count = 0; count < CHARS_RECOGED; count++ ) {
alphaHisto[ count ] = 0;
} // for int count...
cout << "size: " << theString.size() << endl;
for (int count = 0; count < theString.size(); count++) {
// unsigned char uChar = theString.at(count); // .at() better protected than [] - and this works no differently...
unsigned int uChar = std::char_traits<char>::to_int_type(theString.at(count)); // .at() better protected than []
alphaHisto[ uChar ]++;
} // for count...
unsigned char mostCommon = -1;
unsigned long totalMostCommon = 0;
for (int count = 0; count < CHARS_RECOGED; count++ ) {
if (alphaHisto[ count ] > totalMostCommon){
mostCommon = count;
totalMostCommon = alphaHisto[ count ];
} // if alphahisto
} // for int count...
for (int count = 0; count < CHARS_RECOGED; count++ ) {
if (alphaHisto[ count ] > 0){
cout << (char)count << " " << count << " " << alphaHisto[ count ] << endl;
} // if alphaHisto...
} // for int count...
return (char) mostCommon;
}
// Please provide additional test cases:
// Input Return
// thequickbrownfoxjumpsoverthelazydog e
// the quick brown fox jumps over the lazy dog " "
// theQuickbrownFoxjumpsoverthelazydog e
// the Quick BroWn Fox JuMpS OVER THe lazy dog " "
// the_Quick_BroWn_Fox.JuMpS.OVER..THe.LAZY.DOG "."
If the test is anything to take seriously, the charset should be specified. Without it, it's probably safe to assume that one byte is one char. As a side note, to support charsets with multibyte chars, exchanging 256 with 65536 is far from enough; but even without multibyte chars, you could exchange 256 with 1 << CHAR_BIT, because a "byte" may have more than 8 bits.
I'm seeing a more important problem with
unsigned int uChar = std::char_traits<char>::to_int_type(theString.at(count));
First, it's unnecessarily complex:
unsigned int uChar = theString.at(count);
should be enough.
Now remember that std::string::at returns a char, and your variable is an unsigned int. Whether a plain char is signed or unsigned (i.e. whether it behaves as signed char or unsigned char) depends on the compiler. Char values between 0 and 127 will be saved without changes in the target variable, but that's only half of the value range: if char is unsigned, 128-255 will work fine too, but signed chars, i.e. values between -128 and -1, won't map to unsigned 128-255 if the target variable is bigger than the char. With a 4-byte integer you'll get some huge values which aren't valid indices for your array => problem. Solution: use unsigned char, not int.
unsigned char uChar = theString.at(count);
Another thing:
for (int count = 0; count < theString.size(); count++)
theString.size() returns a size_t, which may have a different size and/or signedness compared to int; with huge string lengths there could be problems because of that. Accordingly, the char-counting variables could be size_t too, instead of unsigned long...
And the least likely problem source: if this runs on a machine without two's-complement arithmetic, it'll probably fail spectacularly (although I didn't think it through in detail).
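Putting those fixes together (unsigned char indexing, size_t loop counters), a sketch of the corrected counting function:

```cpp
#include <array>
#include <string>

// Corrected sketch: converting each byte through unsigned char keeps every
// value in 0..255, so negative signed chars can no longer produce invalid
// array indices, and size_t matches what std::string::size() returns.
char mostCommonChar(const std::string& s) {
    std::array<unsigned long, 256> histo{};          // zero-initialized
    for (std::size_t i = 0; i < s.size(); ++i)
        ++histo[static_cast<unsigned char>(s[i])];
    std::size_t best = 0;
    for (std::size_t c = 1; c < histo.size(); ++c)
        if (histo[c] > histo[best]) best = c;        // first max wins ties
    return static_cast<char>(best);
}
```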
I have a function I've written to convert from a 64-bit integer to a base 62 string. Originally, I achieved this like so:
char* charset = " 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
int charsetLength = strlen(charset);
std::string integerToKey(unsigned long long input)
{
unsigned long long num = input;
string key = "";
while(num)
{
key += charset[num % charsetLength];
num /= charsetLength;
}
return key;
}
However, this was too slow.
I improved the speed by providing an option to generate a lookup table. The table is about 62^4 strings in size, and is generated like so:
// Create the integer to key conversion lookup table
int lookupChars;
if(lookupDisabled)
lookupChars = 1;
else
largeLookup ? lookupChars = 4 : lookupChars = 2;
lookupSize = pow(charsetLength, lookupChars);
integerToKeyLookup = new char*[lookupSize];
for(unsigned long i = 0; i < lookupSize; i++)
{
unsigned long num = i;
int j = 0;
integerToKeyLookup[i] = new char[lookupChars + 1]; // +1 for the null terminator
while(num)
{
integerToKeyLookup[i][j] = charset[num % charsetLength];
num /= charsetLength;
j++;
}
// Null terminate the string
integerToKeyLookup[i][j] = '\0';
}
The actual conversion then looks like this:
std::string integerToKey(unsigned long long input)
{
unsigned long long num = input;
string key = "";
while(num)
{
key += integerToKeyLookup[num % lookupSize];
num /= lookupSize;
}
return key;
}
This improved speed by a large margin, but I still believe it can be improved. Memory usage on a 32-bit system is around 300 MB, and more than 400 MB on a 64-bit system. It seems like I should be able to reduce memory and/or improve speed, but I'm not sure how.
If anyone could help me figure out how this table could be further optimized, I'd greatly appreciate it.
Using some kind of string builder rather than repeated concatenation into 'key' would provide a significant speed boost.
You may want to reserve memory in advance for your string key. This may get you a decent performance gain, as well as a gain in memory utilization. Whenever you call the append operator on std::string, it may double the size of the internal buffer if it has to reallocate. This means each string may be taking up significantly more memory than is necessary to store the characters. You can avoid this by reserving memory for the string in advance.
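A sketch of what that reservation looks like on the original function (11 is the most base-62 digits a 64-bit value can need, since ceil(64 / log2(62)) = 11; note the digits still come out least-significant first, as in the original):

```cpp
#include <string>

const char* charset62 =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Same loop as the original, but with the buffer reserved up front so the
// repeated += never has to reallocate or copy the string's contents.
std::string integerToKey(unsigned long long num) {
    std::string key;
    key.reserve(11);   // max base-62 digits of a 64-bit value
    while (num) {
        key += charset62[num % 62];
        num /= 62;
    }
    return key;
}
```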
I agree with Rob Walker - you're concentrating on improving performance in the wrong area. The string is the slowest part.
I timed the code (your original is broken, btw): your original (when fixed) took 44982140 cycles for 100000 lookups, while the following code takes about 13113670.
const char* charset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
#define CHARSET_LENGTH 62
// maximum size = 11 chars
void integerToKey(char result[13], unsigned long long input)
{
char* p = result;
while(input > 0)
{
*p++ = charset[input % CHARSET_LENGTH];
input /= CHARSET_LENGTH;
}
// null termination
*p = '\0';
// need to reverse the output
char* o = result;
while(o < --p)
swap(*o++, *p);
}
This is almost a textbook case of how not to do this. Concatenating strings in a loop is a bad idea, both because appending isn't particularly fast, and because you're constantly allocating memory.
Note: your question states that you're converting to base-62, but the code seems to have 63 symbols. Which are you trying to do?
Given a 64-bit integer, you can calculate that you won't need any more than 11 digits in the result, so using a static 12 character buffer will certainly help improve your speed. On the other hand, it's likely that your C++ library has a long-long equivalent to ultoa, which will be pretty optimal.
Edit: Here's something I whipped up. It allows you to specify any desired base as well:
std::string ullToString(unsigned long long v, int base = 64) {
assert(base < 65);
assert(base > 1);
static const char digits[]="0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/";
const int max_length=65;
static char buffer[max_length];
buffer[max_length-1]=0;
char *d = buffer + max_length-1;
do {
d--;
int remainder = v % base;
v /= base;
*d = digits[remainder];
} while(v>0);
return d;
}
This only creates one std::string object, and doesn't move memory around unnecessarily. It currently doesn't zero-pad the output, but it's trivial to change it to do that to however many digits of output you want.
You don't need to copy input into num, because you pass it by value. You can also compute the length of charset at compile time; there's no need to compute it at run time.
But these are very minor performance issues. I think the most significant gain comes from avoiding the string concatenation in the loop. When you construct the key string, reserve the length of your result string up front so that there is only one allocation; then when you concatenate into the string in the loop, it will not re-allocate.
You can make things even slightly more efficient if you take the target string as a reference parameter or even as two iterators like the standard algorithms do. But that is arguably a step too far.
By the way, what if the value passed in for input is zero? You won't even enter the loop; shouldn't key then be "0"?
I see the value passed in for input can't be negative, but just so we note: the C remainder operator isn't a modulo operator.
Why not just use a base64 library? Is it really important that 63 equals '11' and not a longer string?
size_t base64_encode(char* outbuffer, size_t maxoutbuflen, const char* inbuffer, size_t inbuflen);
std::string integerToKey(unsigned long long input) {
char buffer[14];
size_t len = base64_encode(buffer, sizeof buffer, (const char*)&input, sizeof input);
return std::string(buffer, len);
}
Yes, every string will end with an equals sign. If you don't want that, strip it off. (Just remember to add it back if you need to decode the number.)
Of course, my real question is why you are turning a fixed-width 8-byte value into a variable-length string instead of using it directly as your "key".
Footnote: I'm well aware of the endian issues with this. He didn't say what the key will be used for and so I assume it isn't being used in network communications between machines of disparate endian-ness.
If you could add two more symbols so that it is converting to base-64, your modulus and division operations would turn into a bit mask and shift. Much faster than a division.
If all you need is a short string key, converting to base-64 numbers would speed up things a lot, since div/mod 64 is very cheap (shift/mask).
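A sketch of that shift-and-mask version: in base 64, `% 64` becomes `& 63` and `/ 64` becomes `>> 6`, and a 64-bit value needs at most ceil(64 / 6) = 11 digits (digit set is an assumption here; any 64-symbol alphabet works):

```cpp
#include <string>

static const char digits64[] =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/";

// Base-64 key of a 64-bit value, most-significant digit first.
// The mask and shift replace the much slower % and / of base 62.
std::string toBase64Key(unsigned long long v) {
    char buf[11];                 // 11 digits suffice for 64 bits
    char* p = buf + sizeof buf;   // fill right-to-left
    do {
        *--p = digits64[v & 63];  // v % 64
        v >>= 6;                  // v / 64
    } while (v);
    return std::string(p, buf + sizeof buf);
}
```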