Read/write large object from postgres using pqxx - c++

The main pqxx API works with columns as text. So how do you access binary data in large objects (LOBs) using the pqxx library?

There are a couple of ways. The first is to translate the data to/from bytea and work through the common pqxx API. If you already know how to work with bytea, this is probably your way. Here is an example of inserting a string as a LOB in plain SQL, no C++ code:
select lo_from_bytea(0, 'this is a test'::bytea);
...
select encode(lo_get(190850), 'escape'); -- here 190850 is the oid for lob created by the first line.
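For completeness, here is a rough sketch of the same bytea round trip driven from C++ through the regular pqxx API (the connection string is a placeholder, and exec1/exec_params1 assume pqxx 6 or newer):
#include <pqxx/pqxx>
#include <iostream>

int main()
{
    pqxx::connection conn("postgresql://localhost/mydb");  // placeholder URL
    pqxx::work tx(conn);

    // Create a large object from a bytea literal, exactly as in the SQL above.
    auto created = tx.exec1("select lo_from_bytea(0, 'this is a test'::bytea)");
    auto oid = created[0].as<pqxx::oid>();

    // Read the LOB back through lo_get(), decoding the bytea as text.
    auto fetched = tx.exec_params1("select encode(lo_get($1), 'escape')", oid);
    std::cout << fetched[0].as<std::string>() << "\n";

    tx.commit();
}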
The other option is to use the iostream API provided by the pqxx library. There are not many examples of how to use it, so here we go:
// write lob
auto conn = std::make_shared<pqxx::connection>(url);
auto tran = std::make_shared<pqxx::work>(*conn);
auto stream = std::make_shared<pqxx::olostream>(*tran, oid);
stream->write(data, size);
stream->flush();
stream.reset();
tran->commit();
// read lob (use an open transaction; the one above was already committed)
auto in_stream = std::make_shared<pqxx::ilostream>(*tran, oid);
...
ssize_t get_chunk(std::shared_ptr<pqxx::ilostream> stream, char *buf, size_t max_len)
{
    size_t len = 0;
    while (!stream->eof() && len < max_len && stream->get(buf[len])) {
        len++;
    }
    return (len > 0 || !stream->eof()) ? len : -1;
}
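A rough usage sketch, assuming in_stream is the ilostream created above:
// Drain the whole large object into a std::string, chunk by chunk.
std::string payload;
char buf[4096];
ssize_t n;
while ((n = get_chunk(in_stream, buf, sizeof(buf))) > 0)
    payload.append(buf, static_cast<size_t>(n));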
Note: there is a bug in pqxx::ilostream: you can get truncated data if a 0xff byte in the data happens to land on the inner buffer boundary, where it is mistakenly treated as the EOF character. The bug was fixed in February 2020, but the fix has not yet made it into all distributions.

Related

QTcpSocket and TCP padding

Good day.
I am sending a custom protocol for logging via TCP which looks like this:
Timestamp (uint32_t -> 4 bytes)
Length of message (uint8_t -> 1 byte)
Message (char -> Length of message)
The Timestamp is converted to BigEndian for the transport and everything goes out correctly, except for one little detail: Padding
The Timestamp is sent on its own, however instead of just sending the timestamp (4 bytes) my application (using BSD sockets under Ubuntu) automatically appends two bytes of padding to the message.
Wireshark recognizes this correctly and marks the two extraneous bytes as padding, however the QTcpSocket (Qt 5.8, mingw 5.3.0) apparently assumes that the two extra bytes are actually payload, which obviously messes up my protocol.
Is there any way for me to 'teach' QTcpSocket to ignore the padding (like it should) or any way to get rid of the padding?
I'd like to avoid the whole 'create a sufficiently large buffer and preassemble the entire packet in it so it is sent out in one go' method if possible.
Thank you very much.
Because it was asked, the code used to send the data is:
return
C->sendInt(entry.TS) &&
C->send(&entry.LogLen, 1) &&
C->send(&entry.LogMsg, entry.LogLen);
where sendInt is declared as (Src being the parameter):
Src = htonl(Src);
return send(&Src, 4);
where 'send' is declared as (Source and Len being the parameters):
char *Src = (char *)Source;
while (Len) {
    int BCount = ::send(Sock, Src, Len, 0);
    if (BCount < 1) return false;
    Src += BCount;
    Len -= BCount;
}
return true;
::send is the standard BSD send function.
Reading is done via QTcpSocket:
uint32_t timestamp;
if (Sock.read((char *)&timestamp, sizeof(timestamp)) > 0)
{
    uint8_t logLen;
    char message[256];
    if (Sock.read((char *)&logLen, sizeof(logLen)) > 0 &&
        logLen > 0 &&
        Sock.read(message, logLen) == logLen
        ) addToLog(qFromBigEndian(timestamp), message);
}
Sock is the QTcpSocket instance, already connected to the host and addToLog is the processing function.
Also to be noted, the sending side needs to run on an embedded system, so using QTcpServer is therefore not an option.
Your read logic appears to be incorrect. You have...
uint32_t timestamp;
if (Sock.read((char *)&timestamp, sizeof(timestamp)) > 0)
{
    uint8_t logLen;
    char message[256];
    if (Sock.read((char *)&logLen, sizeof(logLen)) > 0 &&
        logLen > 0 &&
        Sock.read(message, logLen) == logLen
        ) addToLog(qFromBigEndian(timestamp), message);
}
From the documentation for QTcpSocket::read(data, MaxSize) it...
Reads at most maxSize bytes from the device into data, and returns the
number of bytes read
What if one of your calls to Sock.read reads partial data? You essentially discard that data rather than buffering it for reuse next time.
Assuming you have a suitably scoped QByteArray...
QByteArray data;
your reading logic should be more along the lines of...
/*
* Append all available data to `data'.
*/
data.append(Sock.readAll());
/*
* Now repeatedly read/trim messages from data until
* we have no further complete messages.
*/
while (contains_complete_log_message(data)) {
auto message = read_message_from_data(data);
data = data.right(data.size() - message.size());
}
/*
* At this point `data' may be non-empty but doesn't
* contain enough data for a complete message.
*/
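The two helpers above are placeholders; for this particular 4-byte timestamp + 1-byte length + payload framing, a minimal sketch could look like the following (addToLog is the question's function, everything else is illustrative):
#include <QByteArray>
#include <QtEndian>

void addToLog(quint32 timestamp, const char *message);  // assumed signature of the question's logger

// A complete frame is: 4-byte big-endian timestamp, 1-byte length, `length` bytes of text.
bool contains_complete_log_message(const QByteArray &data)
{
    if (data.size() < 5)
        return false;
    const quint8 logLen = static_cast<quint8>(data[4]);
    return data.size() >= 5 + logLen;
}

// Consumes one frame from the front of `data`, hands it to addToLog and
// returns the consumed bytes so the caller can trim them.
QByteArray read_message_from_data(const QByteArray &data)
{
    const quint8 logLen = static_cast<quint8>(data[4]);
    const QByteArray frame = data.left(5 + logLen);

    const quint32 timestamp = qFromBigEndian<quint32>(
        reinterpret_cast<const uchar *>(frame.constData()));
    const QByteArray text = frame.mid(5, logLen);   // constData() is null-terminated
    addToLog(timestamp, text.constData());

    return frame;
}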
If the length of the padding is always fixed then just add socket->read(2); to ignore the 2 bytes.
On the other hand it might be just the tip of the iceberg. What are you using to read and write?
You should not invoke send three times but only once. For conversion into BigEndian you might use the Qt functions and write everything into a single buffer and only call send once. It is not what you want, but I assume it is what you'll need to do, and it should be easy, as you already know the size of your message. You also will not need to leave the Qt world for sending the messages.
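For illustration, a minimal sketch of that single-buffer variant on the sending side (the LogEntry layout is assumed from the fields used in the question; C->send is the question's helper):
#include <QtEndian>
#include <cstring>

struct LogEntry {          // assumed shape, based on entry.TS / entry.LogLen / entry.LogMsg
    quint32 TS;
    quint8  LogLen;
    char    LogMsg[255];
};

// Assemble the whole frame into one buffer so a single send() call ships it,
// and the stack never emits the 4-byte timestamp as its own short segment.
int buildPacket(const LogEntry &entry, char *packet /* at least 260 bytes */)
{
    qToBigEndian<quint32>(entry.TS, reinterpret_cast<uchar *>(packet));  // 4 bytes, big-endian
    packet[4] = static_cast<char>(entry.LogLen);                         // 1-byte length
    std::memcpy(packet + 5, entry.LogMsg, entry.LogLen);                 // payload
    return 5 + entry.LogLen;
}
With that, the original three calls collapse to a single C->send(packet, buildPacket(entry, packet)).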

What is the scheme for reading big XML data using "memory mapped files"?

I have a big XML file (an OSM map data file to parse). The initial code to process it is like this:
FILE* file = fopen(fileName.c_str(), "r");
size_t BUF_SIZE = 10 * 1024 * 1024;
char* buf = new char[BUF_SIZE];
string contents;
while (!feof(file))
{
    size_t ret = fread(buf, 1, BUF_SIZE, file); // number of bytes actually read
    contents.append(buf, ret);
}
size_t pos = 0;
while (true)
{
    pos = contents.find('<', pos);
    if (pos == string::npos) break;
    // Case: found new node.
    if (contents.substr(pos, 5) == "<node")
    {
        // do something
    }
    // Case: found new way.
    else if (contents.substr(pos, 4) == "<way")
    {
        // do something
    }
    ++pos; // advance past this '<' before the next search
}
Then people here tell me I should use a memory mapped file to process such a "big data file"; details are here:
how to read to huge file into buffer,
I mean, when the file has a fixed size and is not very large, I can load it into memory in one go, append the content to a string object, and then apply find() and other string methods to extract the node content of the XML file. (The code at the beginning of my question uses this method, and I have tested that it produces the right result.) But if the file is very large, how do I apply those methods (without using an XML library such as libxml)?
In short: for a small XML file I can load the whole content into a std::string and use find() and substr() to get the information I want. When the XML file is very large, I need to use a memory mapped file to cope with it. Since I then cannot append the whole content to a std::string, how can I parse the file without using an existing XML library?
I hope I have expressed my question clearly.
If you're using std::string members to get the data you need, you're almost certainly not parsing the XML in the traditional sense of parsing XML. (That is, you're very probably not making any use of XML's hierarchical structure. Although you are extracting data from XML, "parsing XML" means something much more specific to most people.)
That said, the C equivalents of the std::string members you seem to be OK with, such as memcmp and the GNU extension memmem, just take pointers and lengths. Read their documentation and use them in place of their std::string-member equivalents.
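As a concrete illustration, here is a minimal memory-mapped sketch for Linux that scans for the same <node / <way tags with memchr/memcmp instead of std::string members (error handling is kept to a bare minimum, and the file name is illustrative):
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main()
{
    int fd = open("map.osm", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);
    size_t size = static_cast<size_t>(st.st_size);

    // Map the file read-only; the kernel pages it in on demand, so this works
    // even when the file is far too large to copy into a std::string.
    void *mapping = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mapping == MAP_FAILED) { perror("mmap"); return 1; }

    const char *p = static_cast<const char *>(mapping);
    const char *end = p + size;
    size_t nodes = 0, ways = 0;

    // Same scan as the std::string version, but with pointer/length functions.
    while ((p = static_cast<const char *>(memchr(p, '<', end - p))) != nullptr) {
        if (end - p >= 5 && memcmp(p, "<node", 5) == 0) ++nodes;
        else if (end - p >= 4 && memcmp(p, "<way", 4) == 0) ++ways;
        ++p;
    }

    std::printf("nodes: %zu, ways: %zu\n", nodes, ways);
    munmap(mapping, size);
    close(fd);
}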

C++ XOR encryption

After reading several white papers on cryptography and runtime PE crypters, I decided to write my own. It's very simple and only for educational purposes.
Here is the GitHub repo: https://github.com/Jyang772/XOR_Crypter
I have two questions.
First, why do I have to keep changing my file permissions to start every outputted .exe (the file created by Builder.exe, not the compiler)? It creates a file that is Shared; I have to right-click it and select 'Share with Nobody'. Does this have something to do with file access and security rights? I am using CreateFile() and ReadFile() to read and write the input and output files.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx
Second, I can't seem to get XOR encryption to work. It seems pretty straightforward for what I have done. The byte sizes are the same. While I was investigating, I had the Builder and the Stub each output a file with the file data unencrypted. They are the same. Then I tried with the data encrypted. There is no doubt the data is encrypted with the cipher, however it shows up blank when it is decrypted by the stub later on. I'm confused.
Here is my XOR implementation:
fs = byte size
Rsize = byte size
Should be the same.
Builder:
char cipher[] ="penguin";
for (int i = 0; i < fs; i++)
{
FB[i] ^= cipher[i % strlen(cipher)]; // Simple Xor chiper
}
Stub:
char cipher[] = "penguin";
for (int i = 0; i < Rsize; i++)
{
RData[i] ^= cipher[i % strlen(cipher)];
}
If I were to comment out the encryption function in the Builder and Stub, the crypted file runs fine. Uhh, except with the permissions error.
I'm also trying to include an options menu where the user can select the encryption method used. Perhaps I might have done something wrong there? The Builder.exe adds one byte containing the user's choice to the end of the FB buffer. Stub.exe reads that and determines which encryption method was used to decrypt the data.
First off, with XOR "encryption", your "encrypt" and "decrypt" functions should be the same:
void xor_crypt(const char *key, int key_len, char *data, int data_len)
{
    for (int i = 0; i < data_len; i++)
        data[i] ^= key[ i % key_len ];
}
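A quick round trip shows the symmetry (the key is the one from the question):
char buf[] = "attack at dawn";
xor_crypt("penguin", 7, buf, sizeof(buf) - 1);  // "encrypt"
xor_crypt("penguin", 7, buf, sizeof(buf) - 1);  // "decrypt": buf holds the original text again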
You should be able to use this same function in both the "XOR Crypter" program as well as your "Stub" program.
It's not very C++ in style; ordinarily you'd use std::string or std::vector. For example:
void xor_crypt(const std::string &key, std::vector<char>& data)
{
    for (size_t i = 0; i != data.size(); i++)
        data[i] ^= key[ i % key.size() ];
}
Then in the program that calls this, you'd declare:
std::string key = "penguin";
and you'd read your file in like so:
std::vector<char> file_data; // With your current program, make this a global.
fs = GetFileSize(efile, NULL);
file_data.resize(fs); // set vector length equal to file size
// Note: Replace &( file_data[0] ) with file_data.data() if you have C++11 support
ReadFile(efile, (LPVOID)( &( file_data[0] )), fs, &bt, NULL);
if (fs != bt) {
    // error reading file: report it here.
}
Then you would simply encrypt with xor_crypt( key, file_data );. To write the XOR-crypted data to your resource, I believe you'd call your existing function with:
// replace &( file_data[0] ) with file_data.data() if C++11
WriteToResources(output, 1, (BYTE *)&( file_data[0] ), file_data.size() );
I suspect the real issue is with the Windows APIs you're using. Does LoadResource give you mutable data, or are you required to copy it? I don't know the Windows API, but I wouldn't be surprised if LoadResource gives you a read-only copy.
If you do need to make your own copy in order to modify the resource, then in your "Stub" program recovering the XOR-crypted resource should look something like this:
std::vector<char> RData;
void Resource(int id)
{
    size_t Rsize;
    HRSRC hResource = FindResource(NULL, MAKEINTRESOURCE(1), RT_RCDATA);
    HGLOBAL temp = LoadResource(NULL, hResource);
    Rsize = SizeofResource(NULL, hResource);
    RData.resize(Rsize);
    // LockResource yields a pointer to the resource bytes that we can copy from.
    memcpy( (void*)&(RData[0]), LockResource(temp), Rsize ); // replace &RData[0] with RData.data() if C++11
}
and the decryption in your "Stub" should just be xor_crypt( key, RData );.
I have one last thought. The biggest bug I see in your "Stub" program is this line:
switch (RData[strlen(RData)-1])
Once you've XOR-crypted your data, some of the bytes will become zero. The strlen() function will not return the index of the last byte in your RData as a result. And, there's a different, more subtle error: This returns the last byte of the string, not the last byte of the resource. I can't really see how this line was ever correct; rather, I suspect your program was working when encryption was disabled in spite of itself, by falling through to the default of the switch-case.
If you really intend to distinguish between different types of data based on the last byte of the resource payload, then you really should just use the size returned by the Windows API to find that byte.
If you switch to using vector<char> as I suggest above, then you can find that with RData.back(). Otherwise, if you continue using char *, then that byte would be RData[RSize - 1].
Depending on your content data, you write the char option either into the allocated memory pointed to by FB or past it (buffer overrun) in "C++ Builder/main.cpp" when calling strcat(FB, choice).
Fix: allocate enough space in FB for the data plus the option char. As you are dealing with binary data, you should not use string functions (e.g. strcat).
FB = new char[fs + 1];
memcpy(FB +fs, option, 1); // copy the option at end

How to extract values from boost::python::list to C++

I am creating .so files in Linux so that I can import them in Python scripts and start using them. I need to pass data from the Python to the C++ layer so that I can use it there. I am not able to extract the values despite referring to many posts. I have given the reference code below.
u8 => unsigned char
#include "cp2p_layer.h"
#include <boost/python.hpp>
using namespace boost::python;
BOOST_PYTHON_MODULE(cp2p_hal)
{
class_<SCSICommandsB>("SCSICommandsB")
.def("Write10", &SCSICommandsB::Write10)
;
}
The following code is from cp2p_layer.cpp. I can get the length of the list, but the data is always blank:
u16 SCSICommandsB::Write10 (u8 lun, u8 config, u32 LBA, u16 transferLen, u8 control, u8 groupNo, boost::python::list pythonList)
{
    u16 listLen;
    u8* pDataBuf = new u8[transferLen];
    listLen = boost::python::len(pythonList);
    if( listLen != transferLen)
    {
        cout<<"\nwarning: The write10 cdb has transfer length "<<transferLen<<"that doesnt match with data buffer size "<<listLen<<"\n";
    }
    for(int i = 0; i < listLen; i++)
    {
        pDataBuf[i] = boost::python::extract<u8>( (pythonList)[i] );
        cout<<boost::python::extract<u8>( (pythonList)[i] )<<"-";
        //cout<<pDataBuf[i]<<".";
    }
    cout<<"\n";
    cout<<"info: inside write10 boost len:"<<listLen<<"\n";
    return oScsi.Write10 (lun, config, LBA, transferLen, control, groupNo, pDataBuf);
}
When I execute the Python script as
#!/usr/bin/python
import cp2p_hal
scsiCmds = cp2p_hal.SCSICommandsB()
plist = [0,1,2,3,4,5,6,7,8,9]
print len(plist)
scsiCmds.Write10(0,0,0,10,0,0,plist)
The output comes as
10
-------- -
info: inside write10 boost len:10
Any help is much appreciated. I also have questions regarding how to read the data from the C++ layer once we have executed the read command. I will create a new post once I get this done. Thanks in advance.
The problem is only in your printing of the values. A u8 in C++ is an unsigned char, and cout will output the corresponding ASCII character. Your characters (0-9) are unprintable, except for ASCII 9, which happens to be a tab; that explains the space before the final hyphen in your output.
How to fix it? Cast to an int before outputting:
cout << static_cast<int>(boost::python::extract<u8>(pythonList[i])) << "-";
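Applied to the loop from the question, the fix might look like this (variable names as posted):
for (int i = 0; i < listLen; i++)
{
    u8 value = boost::python::extract<u8>(pythonList[i]);
    pDataBuf[i] = value;
    // Cast to int so cout prints the numeric value rather than an ASCII character.
    cout << static_cast<int>(value) << "-";
}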

How to convert a number to a bytearray in big endian order

I am trying to uncompress some data created in VB6 using the zlib API.
I have read this is possible with the qUncompress function:
http://doc.trolltech.com/4.4/qbytearray.html#qUncompress
I have read the data in from QDataStream via readRawBytes into a char array, which I then converted to a QByteArray for decompression. I have the compressed length and the expected decompressed length, but am not getting anything back from qUncompress.
However I need to prepend the expected decompressed length in big endian format. Has anybody done this and have an example?
I haven't used VB6 in ages, so I hope this is approximately correct. I think that VB6 used () for array indexing. If I got anything wrong, please let me know.
Looking at the qUncompress docs, you should have put your data in your QByteArray starting at byte 5 (I'm going to assume that you left the array index base set to 1 for this example).
Let's say the array is named qArr, and the expected uncompressed size is Size.
In a "big-endian" representation, the first byte is at the first address.
qArr(1) = Int(Size / (256 * 256 * 256))
qArr(2) = 255 And Int(Size / (256 * 256))
qArr(3) = 255 And Int(Size / 256)
qArr(4) = 255 And Int(Size)
Does that make sense?
If you needed little endian, you could just reverse the order of the indexes (qArr(4) - qArr(1)) and leave the calculations the same.
This is how I convert arbitrary data from one format to another.
Private Type LongByte
    H1 As Byte
    H2 As Byte
    L1 As Byte
    L2 As Byte
End Type

Private Type LongType
    L As Long
End Type

Function SwapEndian(ByVal LongData As Long) As Long
    Dim TempL As LongType
    Dim TempLB As LongByte
    Dim TempVar As Long

    TempL.L = LongData
    LSet TempLB = TempL

    'Swap is a subroutine I wrote to swap two variables
    Swap TempLB.H1, TempLB.L2
    Swap TempLB.H2, TempLB.L1

    LSet TempL = TempLB
    TempVar = TempL.L
    SwapEndian = TempVar
End Function
If you are dealing with FileIO then you can use the Byte fields of TempLB
The trick is using LSet, an obscure command of VB6.
If you are using .NET then the process is much easier. Here the trick is using a MemoryStream to retrieve and set the individual bytes. You could also do the math for Int16/Int32/Int64, but if you are dealing with floating point data, using LSet or the MemoryStream is much clearer and easier to debug.
If you are using Framework version 1.1 or later then you have the BitConverter class, which uses arrays of bytes.
Private Structure Int32Byte
    Public H1 As Byte
    Public H2 As Byte
    Public L1 As Byte
    Public L2 As Byte

    Public Function Convert() As Integer
        Dim M As New MemoryStream()
        Dim bR As IO.BinaryReader
        Dim bW As New IO.BinaryWriter(M)
        Swap(H1, L2)
        Swap(H2, L1)
        bW.Write(H1)
        bW.Write(H2)
        bW.Write(L1)
        bW.Write(L2)
        M.Seek(0, SeekOrigin.Begin)
        bR = New IO.BinaryReader(M)
        Convert = bR.ReadInt32()
    End Function
End Structure
It looks like you want a chunk of C code that uncompresses some zlib-compressed data.
In that case, is it possible for you to actually use zlib and just feed the zlib data to it? The zlib homepage: http://www.zlib.net/.
If I got it wrong, could you be specific about which language should be used for uncompressing the data and why zlib would not be a choice?
//int length;
byte[] bigEndianBytes = BitConverter.GetBytes(IPAddress.HostToNetworkOrder(length));
Conversely:
//byte[] bigEndianBytes;
int length = IPAddress.NetworkToHostOrder(BitConverter.ToInt32(bigEndianBytes, 0));
It wasn't clear to me from your question whether you want to prepend the length in VB so that it is suitable for direct use by qUncompress or whether you wanted to use the VB produced data as it is now and prepend the length in C++ before calling qUncompress.
Mike G has posted a VB solution. If you want to do it in C++ then you have two choices, either add the length at the start of the QByteArray or call zlib's uncompress directly. In both cases the Qt source for qCompress and qUncompress (corelib/tools/qbytearray.cpp) are a good reference.
This is how qCompress adds the length (nbytes) to bazip, the compressed data:
bazip[0] = (nbytes & 0xff000000) >> 24;
bazip[1] = (nbytes & 0x00ff0000) >> 16;
bazip[2] = (nbytes & 0x0000ff00) >> 8;
bazip[3] = (nbytes & 0x000000ff);
where bazip is the result QByteArray
Alternatively, if you want to call uncompress directly instead of using the qUncompress wrapper, the call it uses is
baunzip.resize(len);
res = ::uncompress((uchar*)baunzip.data(), &len,
(uchar*)data+4, nbytes-4);
where baunzip is a QByteArray. In your case you would drop the +4 and -4 since your data does not have the length prepended to it.
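Putting the first choice together, a minimal sketch of prepending the length yourself and then calling qUncompress (compressed and expectedLen are illustrative names for the raw zlib data and the known uncompressed size):
QByteArray withHeader;
withHeader.resize(4);
withHeader[0] = char((expectedLen & 0xff000000) >> 24);
withHeader[1] = char((expectedLen & 0x00ff0000) >> 16);
withHeader[2] = char((expectedLen & 0x0000ff00) >> 8);
withHeader[3] = char( expectedLen & 0x000000ff);
withHeader.append(compressed);

QByteArray uncompressed = qUncompress(withHeader);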
Thank you for all your help; it was useful.
I got the code working with:
char slideStr[currentCompressedLen];
int slideByteRead = in.readRawData(slideStr, currentCompressedLen);
QByteArray aInCompBytes = QByteArray(slideStr, slideByteRead);

// Prepend the expected uncompressed length as four big-endian bytes, as qUncompress expects.
aInCompBytesPlusLen = aInCompBytes;
aInCompBytesPlusLen.prepend(char( currentUnCompressedLen        & 0xff));
aInCompBytesPlusLen.prepend(char((currentUnCompressedLen >>  8) & 0xff));
aInCompBytesPlusLen.prepend(char((currentUnCompressedLen >> 16) & 0xff));
aInCompBytesPlusLen.prepend(char((currentUnCompressedLen >> 24) & 0xff));

aInUnCompBytes = qUncompress(aInCompBytesPlusLen);