C/C++ HDF5 Read string attribute - c++

A colleague of mine used labview to write an ASCII string as an attribute in an HDF5 file. I can see that the attribute exist, and read it, but I can't print it.
The attribute is, as shown in HDF Viewer:
Date = 2015\07\09
So "Date" is its name.
I'm trying to read the attribute with this code
hsize_t sz = H5Aget_storage_size(dateAttribHandler);
std::cout<<sz<<std::endl; //prints 16
hid_t atype = H5Aget_type(dateAttribHandler);
std::cout<<atype<<std::endl; //prints 50331867
std::cout<<H5Aread(dateAttribHandler,atype,(void*)date)<<std::endl; //prints 0
std::cout<<date<<std::endl; //prints messy characters!
//even with an std::string
std::string s(date);
std::cout<<s<<std::endl; //also prints a mess
Why is this happening? How can I get this string as a const char* or std::string?
I tried also using the type atype = H5Tcopy (H5T_C_S1);, and that didn't work too...
EDIT:
Here I provide a full, self-contained program as it was requested:
#include <string>
#include <iostream>
#include <fstream>
#include <hdf5/serial/hdf5.h>
#include <hdf5/serial/hdf5_hl.h>
std::size_t GetFileSize(const std::string &filename)
{
std::ifstream file(filename.c_str(), std::ios::binary | std::ios::ate);
return file.tellg();
}
int ReadBinFileToString(const std::string &filename, std::string &data)
{
std::fstream fileObject(filename.c_str(),std::ios::in | std::ios::binary);
if(!fileObject.good())
{
return 1;
}
size_t filesize = GetFileSize(filename);
data.resize(filesize);
fileObject.read(&data.front(),filesize);
fileObject.close();
return 0;
}
int main(int argc, char *argv[])
{
std::string filename("../Example.hdf5");
std::string fileData;
std::cout<<"Success read file into memory: "<<
ReadBinFileToString(filename.c_str(),fileData)<<std::endl;
hid_t handle;
hid_t magFieldsDSHandle;
hid_t dateAttribHandler;
htri_t dateAtribExists;
handle = H5LTopen_file_image((void*)fileData.c_str(),fileData.size(),H5LT_FILE_IMAGE_DONT_COPY | H5LT_FILE_IMAGE_DONT_RELEASE);
magFieldsDSHandle = H5Dopen(handle,"MagneticFields",H5P_DEFAULT);
dateAtribExists = H5Aexists(magFieldsDSHandle,"Date");
if(dateAtribExists)
{
dateAttribHandler = H5Aopen(magFieldsDSHandle,"Date",H5P_DEFAULT);
}
std::cout<<"Reading file done."<<std::endl;
std::cout<<"Open handler: "<<handle<<std::endl;
std::cout<<"DS handler: "<<magFieldsDSHandle<<std::endl;
std::cout<<"Attributes exists: "<<dateAtribExists<<std::endl;
hsize_t sz = H5Aget_storage_size(dateAttribHandler);
std::cout<<sz<<std::endl;
char* date = new char[sz+1];
std::cout<<"mem bef: "<<date<<std::endl;
hid_t atype = H5Aget_type(dateAttribHandler);
std::cout<<atype<<std::endl;
std::cout<<H5Aread(dateAttribHandler,atype,(void*)date)<<std::endl;
fprintf(stderr, "Attribute string read was '%s'\n", date);
date[sz] = '\0';
std::string s(date);
std::cout<<"mem aft: "<<date<<std::endl;
std::cout<<s<<std::endl;
H5Dclose(magFieldsDSHandle);
H5Fclose(handle);
return 0;
}
Printed output of this program:
Success read file into memory: 0
Reading file done.
Open handler: 16777216
DS handler: 83886080
Attributes exists: 1
16
mem bef:
50331867
0
Attribute string read was '�P7'
mem aft: �P7
�P7
Press <RETURN> to close this window...
Thanks.

It turned out that H5Aread has to be called with a reference of the char pointer... so pointer of a pointer:
H5Aread(dateAttribHandler,atype,&date);
Keep in mind that one doesn't have to reserve memory for that. The library will reserve memory, and then you can free it with H5free_memory(date).
This worked fine.
EDIT:
I learned that this is the case only when the string to be read has variable length. If the string has a fixed length, then one has to manually reserve memory with size length+1 and even manually set the last char to null (to get a null-terminated string. There is a function in the hdf5 library that checks whether a string is fixed in length.

I discovered that if you do not allocate date and pass the &date to H5Aread, then it works. (I use the C++ and python APIs, so I do not know the C api very well.) Specifically change:
char* date = 0;
// std::cout<<"mem bef: "<<date<<std::endl;
std::cout << H5Aread(dateAttribHandler, atype, &date) << std::endl;
And you should see 2015\07\09 printed.
You may want to consider using the C++ API. Using the C++ API, your example becomes:
std::string filename("c:/temp/Example.hdf5");
H5::H5File file(filename, H5F_ACC_RDONLY);
H5::DataSet ds_mag = file.openDataSet("MagneticFields");
if (ds_mag.attrExists("Date"))
{
H5::Attribute attr_date = ds_mag.openAttribute("Date");
H5::StrType stype = attr_date.getStrType();
std::string date_str;
attr_date.read(stype, date_str);
std::cout << "date_str= <" << date_str << ">" << std::endl;
}

As a simpler alternative to existing APIs, your use-case could be solved as follows in C using HDFql:
// declare variable 'value'
char *value;
// register variable 'value' for subsequent use (by HDFql)
hdfql_variable_register(&value);
// read 'Date' (from 'MagneticFields') and populate variable 'value' with it
hdfql_execute("SELECT FROM Example.hdf5 MagneticFields/Date INTO MEMORY 0");
// display value stored in variable 'value'
printf("Date=%s\n", value);
FYI, besides C, the code above can be used in C++, Python, Java, C#, Fortran or R with minimal changes.

Related

base64 encoding removing carriage return from dos header

I have been trying to encode the binary data of an application as base64 (specifically boosts base64), but I have run into an issue where the carriage return after the dos header is not being encoded correctly.
it should look like this:
This program cannot be run in DOS mode.[CR]
[CR][LF]
but instead its outputting like this:
This program cannot be run in DOS mode.[CR][LF]
it seems this first carriage return is being skipped, which then causes the DOS header to be invalid when attempting to run the program.
the code for the base64 algorithm I am using can be found at: https://www.boost.org/doc/libs/1_66_0/boost/beast/core/detail/base64.hpp
Thanks so much!
void load_file(const char* filename, char** file_out, size_t& size_out)
{
FILE* file;
fopen_s(&file, filename, "r");
if (!file)
return false;
fseek(file, 0, SEEK_END);
size = ftell(file);
rewind(file);
*out = new char[size];
fread(*out, size, 1, file);
fclose(file);
}
void some_func()
{
char* file_in;
size_t file_in_size;
load_file("filename.bin", &file_in, file_in_size);
auto encoded_size = base64::encoded_size(file_in_size);
auto file_encoded = new char[encoded_size];
memset(0, file_encoded, encoded_size);
base64::encode(file_encoded, file_in, file_in_size);
std::ofstream orig("orig.bin", std::ios_base::binary);
for (int i = 0; i < file_in_size; i++)
{
auto c = file_in[i];
orig << c; // DOS header contains a NULL as the 3rd char, don't allow it to be null terminated early, may cause ending nulls but does not affect binary files.
}
orig.close();
std::ofstream encoded("encoded.txt"); //pass this output through a base64 to file website.
encoded << file_encoded; // for loop not required, does not contain nulls (besides ending null) will contain trailing encoded nulls.
encoded.close();
auto decoded_size = base64::decoded_size(encoded_size);
auto file_decoded = new char[decoded_size];
memset(0, file_decoded, decoded_size); // again trailing nulls but it doesn't matter for binary file operation. just wasted disk space.
base64::decode(file_decoded, file_encoded, encoded_size);
std::ofstream decoded("decoded.bin", std::ios_base::binary);
for (int i = 0; i < decoded_size; i++)
{
auto c = file_decoded[i];
decoded << c;
}
decoded.close();
free(file_in);
free(file_encoded);
free(file_decoded);
}
The above code will show that the file reading does not remove the carriage return, while the encoding of the file into base64 does.
Okay thanks for adding the code!
I tried it, and indeed there was "strangeness", even after I simplified the code (mostly to make it C++, instead of C).
So what do you do? You look at the documentation for the functions. That seems complicated since, after all, detail::base64 is, by definition, not part of public API, and "undocumented".
However, you can still read the comments at the functions involved, and they are pretty clear:
/** Encode a series of octets as a padded, base64 string.
The resulting string will not be null terminated.
#par Requires
The memory pointed to by `out` points to valid memory
of at least `encoded_size(len)` bytes.
#return The number of characters written to `out`. This
will exclude any null termination.
*/
std::size_t
encode(void* dest, void const* src, std::size_t len)
And
/** Decode a padded base64 string into a series of octets.
#par Requires
The memory pointed to by `out` points to valid memory
of at least `decoded_size(len)` bytes.
#return The number of octets written to `out`, and
the number of characters read from the input string,
expressed as a pair.
*/
std::pair<std::size_t, std::size_t>
decode(void* dest, char const* src, std::size_t len)
Conclusion: What Is Wrong?
Nothing about "dos headers" or "carriage returns". Perhaps maybe something about "rb" in fopen (what's the differences between r and rb in fopen), but why even use that:
template <typename Out> Out load_file(std::string const& filename, Out out) {
std::ifstream ifs(filename, std::ios::binary); // or "rb" on your fopen
ifs.exceptions(std::ios::failbit |
std::ios::badbit); // we prefer exceptions
return std::copy(std::istreambuf_iterator<char>(ifs), {}, out);
}
The real issue is: your code ignored all return values from encode/decode.
The encoded_size and decoded_size values are estimations that will give you enough space to store the result, but you have to correct it to the actual size after performing the encoding/decoding.
Here's my fixed and simplified example. Notice how the md5sums checkout:
Live On Coliru
#include <boost/beast/core/detail/base64.hpp>
#include <fstream>
#include <iostream>
#include <vector>
namespace base64 = boost::beast::detail::base64;
template <typename Out> Out load_file(std::string const& filename, Out out) {
std::ifstream ifs(filename, std::ios::binary); // or "rb" on your fopen
ifs.exceptions(std::ios::failbit |
std::ios::badbit); // we prefer exceptions
return std::copy(std::istreambuf_iterator<char>(ifs), {}, out);
}
int main() {
std::vector<char> input;
load_file("filename.bin", back_inserter(input));
// allocate "enough" space, using an upperbound prediction:
std::string encoded(base64::encoded_size(input.size()), '\0');
// encode returns the **actual** encoded_size:
auto encoded_size = base64::encode(encoded.data(), input.data(), input.size());
encoded.resize(encoded_size); // so adjust the size
std::ofstream("orig.bin", std::ios::binary)
.write(input.data(), input.size());
std::ofstream("encoded.txt") << encoded;
// allocate "enough" space, using an upperbound prediction:
std::vector<char> decoded(base64::decoded_size(encoded_size), 0);
auto [decoded_size, // decode returns the **actual** decoded_size
processed] // (as well as number of encoded bytes processed)
= base64::decode(decoded.data(), encoded.data(), encoded.size());
decoded.resize(decoded_size); // so adjust the size
std::ofstream("decoded.bin", std::ios::binary)
.write(decoded.data(), decoded.size());
}
Prints. When run on "itself" using
g++ -std=c++20 -O2 -Wall -pedantic -pthread main.cpp -o filename.bin && ./filename.bin
md5sum filename.bin orig.bin decoded.bin
base64 -d < encoded.txt | md5sum
It prints
d4c96726eb621374fa1b7f0fa92025bf filename.bin
d4c96726eb621374fa1b7f0fa92025bf orig.bin
d4c96726eb621374fa1b7f0fa92025bf decoded.bin
d4c96726eb621374fa1b7f0fa92025bf -

Is it poosible to open the same file using wfstream and fstream

Actually i have a requirement wherein i need to open the same file using wfstream file instance at one part of the code and open it using fstream instance at the other part of the code. I need to access a file where the username is of type std::wstring and password is of type std::string. how do i get the values of both the variables in the same part of the code?
Like you can see below i need to get the values for username and password from the file and assign it to variables.
type conversion cannot be done. Please do not give that solution.
......file.txt.......
username-amritha
password-rajeevan
the code is written as follows:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
int main()
{
std::string y;
unsigned int l;
std::wstring username;
std::wstring x=L"username";
std::wstring q;
std::string password;
std::string a="password";
std::cout<<"enter the username:";
std::wcin>>username;
std::cout<<"enter the password:";
std::cin>>password;
std::wfstream fpp("/home/aricent/Documents/testing.txt",std::ios::in | std::ios::out );
std::getline(fpp,q);
if (q.find(x, 0) != std::string::npos) {
std::wstring z=q.substr(q.find(L"-") + 1) ;
std::wcout<<"the username is:"<<z;
fpp.seekg( 0, std::ios::beg );
fpp<<q.replace(x.length()+1, z.length(), username);
}
fpp.close();
std::fstream fp("/home/aricent/Documents/testing.txt",std::ios::in | std::ios::out );
std::getline(fp,y);
if (y.find(a, 0) != std::string::npos)
{
unsigned int len=x.length()+1;
unsigned int leng=username.length();
l=len+leng;
fp.seekg(l+1);
std::string b=y.substr(y.find("-") + 1) ;
fp<<y.replace(a.length()+1, b.length(), password);
}
fp.close();
}
It's not recommended to open multiple streams to a same file simultaneously. On the other hand, if you don't write to the file, but only read (and thus, would be using ifstream and wifstream), that's probably safe.
Alternatively, you can simply open a wfstream, read the username, close the stream, open a fstream and read the password.
If you have the choice, avoid mixed encoding files entirely.
You should not try to open the same file with two descriptors. Even if it worked (read only mode for example), both descriptors would not be synchronised, so you would read first characters on one, and next same characters on second.
So IMHO, you should stick to one single solution. My advice is to use a character stream to process the file, and use a codecvt to convert from a narrow string to a wide wstring when you need it.
An example conversion function could be (ref: cplusplus.com: codecvt::in):
std::wstring wconv(const std::string& str, const std::locale mylocale) {
// define a codecvt facet for the locale
typedef std::codecvt<wchar_t,char,std::mbstate_t> facet_type;
const facet_type& myfacet = std::use_facet<facet_type>(mylocale);
// define a mbstate to use in codecvt::in
std::mbstate_t mystate = std::mbstate_t();
size_t l = str.length();
const char * ix = str.data(), *next; // narrow character pointers
wchar_t *wc, *wnext; // wide character pointers
// use a wide char array of same length than the narrow char array to convert
wc = new wchar_t[str.length() + 1];
// conversion call
facet_type::result result = myfacet.in(mystate, ix, ix + l,
next, wc, wc + l, wnext);
// should test for error conditions
*wnext = 0; // ensure the wide char array is properly null terminated
std::wstring wstr(wc); // store it in a wstring
delete[] wc; // destroy the char array
return wstr;
}
This code should test for abnormal conditions, and use try catch to be immune to exceptions but it is left as exercise for the reader :-)
A variant of the above using codecvt::out could be used to convert from wide string to narrow string.
In above code, I would use (assuming nconv is the function using codecvt::out to convert from wide string to narrow string):
...
#include <locale>
...
std::cin>>password;
std::locale mylocale;
std::fstream fp("/home/aricent/Documents/testing.txt",std::ios::in | std::ios::out );
std::getline(fp,y);
q = wconv(y, mylocale);
...
fp<<nconv(q.replace(x.length()+1, z.length(), username));
}
std::getline(fp, y);
...

how to read a particular string from a buffer

i have a buffer
char buffer[size];
which i am using to store the file contents of a stream(suppose pStream here)
HRESULT hr = pStream->Read(buffer, size, &cbRead );
now i have all the contents of this stream in buffer which is of size(suppose size here). now i know that i have two strings
"<!doctortype html" and ".html>"
which are present somewhere (we don't their loctions) inside the stored contents of this buffer and i want to store just the contents of the buffer from the location
"<!doctortype html" to another string ".html>"
in to another buffer2[SizeWeDontKnow] yet.
How to do that ??? (actually contents from these two location are the contents of a html file and i want to store the contents of only html file present in this buffer). any ideas how to do that ??
You can use strnstr function to find the right position in your buffer. After you've found the starting and ending tag, you can extract the text inbetween using strncpy, or use it in place if the performance is an issue.
You can calculate needed size from the positions of the tags and the length of the first tag nLength = nPosEnd - nPosStart - nStartTagLength
Look for HTML parsers for C/C++.
Another way is to have a char pointer from the start of the buffer and then check each char there after. See if it follows your requirement.
If that's the only operation which operates on HTML code in your app, then you could use the solution I provided below (you can also test it online - here). However, if you are going to do some more complicated parsing, then I suggest using some external library.
#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;
int main()
{
const char* beforePrefix = "asdfasdfasdfasdf";
const char* prefix = "<!doctortype html";
const char* suffix = ".html>";
const char* postSuffix = "asdasdasd";
unsigned size = 1024;
char buf[size];
sprintf(buf, "%s%sTHE STRING YOU WANT TO GET%s%s", beforePrefix, prefix, suffix, postSuffix);
cout << "Before: " << buf << endl;
const char* firstOccurenceOfPrefixPtr = strstr(buf, prefix);
const char* firstOccurenceOfSuffixPtr = strstr(buf, suffix);
if (firstOccurenceOfPrefixPtr && firstOccurenceOfSuffixPtr)
{
unsigned textLen = (unsigned)(firstOccurenceOfSuffixPtr - firstOccurenceOfPrefixPtr - strlen(prefix));
char newBuf[size];
strncpy(newBuf, firstOccurenceOfPrefixPtr + strlen(prefix), textLen);
newBuf[textLen] = 0;
cout << "After: " << newBuf << endl;
}
return 0;
}
EDIT
I get it now :). You should use strstr to find the first occurence of the prefix then. I edited the code above, and updated the link.
Are you limited to C, or can you use C++?
In the C library reference there are plenty of useful ways of tokenising strings and comparing for matches (string.h):
http://www.cplusplus.com/reference/cstring/
Using C++ I would do the following (using buffer and size variables from your code):
// copy char array to std::string
std::string text(buffer, buffer + size);
// define what we're looking for
std::string begin_text("<!doctortype html");
std::string end_text(".html>");
// find the start and end of the text we need to extract
size_t begin_pos = text.find(begin_text) + begin_text.length();
size_t end_pos = text.find(end_text);
// create a substring from the positions
std::string extract = text.substr(begin_pos,end_pos);
// test that we got the extract
std::cout << extract << std::endl;
If you need C string compatibility you can use:
char* tmp = extract.c_str();

How to serialize to char* using Google Protocol Buffers?

I want to serialize my protocol buffer to a char*. Is this possible? I know one can serialize to file as per:
fstream output("/home/eamorr/test.bin", ios::out | ios::trunc | ios::binary);
if (!address_book.SerializeToOstream(&output)) {
cerr << "Failed to write address book." << endl;
return -1;
}
But I'd like to serialize to a C-style char* for transmission across a network.
How to do this? Please bear in mind that I'm very new to C++.
That's easy:
size_t size = address_book.ByteSizeLong();
void *buffer = malloc(size);
address_book.SerializeToArray(buffer, size);
Check documentation of MessageLite class also, it's parent class of Message and it contains useful methods.
You can serailze the output to a ostringstream and use stream.str() to get the string and then access the c-string with string.c_str().
std::ostringstream stream;
address_book.SerializeToOstream(&stream);
string text = stream.str();
char* ctext = text.c_str();
Don't forget to include sstream for std::ostringstream.
You can use ByteSizeLong() to get the number of bytes the message will occupy and then SerializeToArray() to populate an array with the encoded message.
Solution with a smart pointer for the array:
size_t size = address_book.ByteSizeLong();
std::unique_ptr<char[]> serialized(new char[size]);
address_book.SerializeToArray(&serialized[0], static_cast<int>(size));
Still one more line of code to take care of the fact that the serialized data can contain zero's.
std::ostringstream stream;
address_book.SerializeToOstream(&stream);
string text = stream.str();
char* ctext = text.c_str(); // ptr to serialized data buffer
// strlen(ctext) would wrongly stop at the 1st 0 it encounters.
int data_len = text.size(); // length of serialized data

LZ77 - algorithm - resolution

I was reading about this algorithm... And I coded a class to compress, I have not coded the decompressing class yet...
What do you think about the code?
I think i've got a problema... My codification is : "position | length", but I believe this method will keep me on problems when ill decompressing, because I wont know if the numbers of positions and length are 2, 3, 4 digits... :S
some advice will be accepted... :D
Any suggestions will be accepted.
Main file:
#include <iostream>
#include "Compressor.h"
int main() {
Compressor c( "/home/facu/text.txt", 3);
std::cout << c.get_TEXT_FILE() << std::endl;
std::cout << c.get_TEXT_ENCONDED() << std::endl;
c.save_file_encoded();
return 0;
}
header file :
#ifndef _Compressor_H_
#define _Compressor_H_
#include <utility>
#include <string>
typedef unsigned int T_UI;
class Compressor
{
public:
//Constructor
Compressor( const std::string &PATH, const T_UI minbytes = 3 );
/** GET BUFFERS **/
std::string get_TEXT_FILE() const;
std::string get_TEXT_ENCONDED() const;
/** END GET BUFFERS **/
void save_file_encoded();
private:
/** BUFFERS **/
std::string TEXT_FILE; // contains the text from an archive
std::string TEXT_ENCODED; // contains the text encoded
std::string W_buffer; // contains the string to analyze
std::string W_inspection; // contains the string where will search matches
/** END BUFFERS **/
T_UI size_of_minbytes;
T_UI size_w_insp; // The size of window inspection
T_UI actual_byte;
std::pair< T_UI, T_UI> v_codes; // Values to code text
// Utilitaries functions
void change_size_insp(){ size_w_insp = TEXT_FILE.length() ; }
bool inspection_empty() const;
std::string convert_pair() const;
// Encode algorythm
void lz77_encode();
};
#endif
implementation file :
#include <iostream>
#include <fstream>
using std::ifstream;
using std::ofstream;
#include <string>
#include <cstdlib>
#include <sstream>
#include "Compressor.h"
Compressor::Compressor(const std::string& PATH, const T_UI minbytes)
{
std::string buffer = "";
TEXT_FILE = "";
ifstream input_text( PATH.c_str(), std::ios::in );
if( !input_text )
{
std::cerr << "Can't open the text file";
std::exit( 1 );
}
while( !input_text.eof() )
{
std::getline( input_text, buffer );
TEXT_FILE += buffer;
TEXT_FILE += "\n";
buffer.clear();
}
input_text.close();
change_size_insp();
size_of_minbytes = minbytes;
TEXT_ENCODED = "";
W_buffer = "";
W_inspection = "";
v_codes.first = 0;
v_codes.second = 0;
actual_byte = 0;
lz77_encode();
}
std::string Compressor::get_TEXT_FILE() const
{
return TEXT_FILE;
}
std::string Compressor::get_TEXT_ENCONDED() const
{
return TEXT_ENCODED;
}
bool Compressor::inspection_empty() const
{
return ( size_w_insp != 0 );
}
std::string Compressor::convert_pair() const
{
std::stringstream out;
out << v_codes.first;
out << "|";
out << v_codes.second;
return out.str();
}
void Compressor::save_file_encoded()
{
std::string path("/home/facu/encoded.txt");
ofstream out_txt( path.c_str(),std::ios::out );
out_txt << TEXT_ENCODED << "\n";
out_txt.close();
}
void Compressor::lz77_encode()
{
while( inspection_empty() )
{
W_buffer = TEXT_FILE.substr( actual_byte, 1);
if( W_inspection.find( W_buffer ) == W_inspection.npos )
{
// Cant find any byte from buffer
TEXT_ENCODED += W_buffer;
W_inspection += W_buffer;
W_buffer.clear();
++actual_byte;
--size_w_insp;
}
else
{
// We founded any byte from buffer in inspection
v_codes.first = W_inspection.find( W_buffer );
v_codes.second = 1;
while( W_inspection.find( W_buffer ) != W_inspection.npos )
{
++actual_byte;
--size_w_insp;
v_codes.second++;
W_inspection += TEXT_FILE[actual_byte - 1];
W_buffer += TEXT_FILE[actual_byte];
}
++actual_byte;
--size_w_insp;
if( v_codes.second > size_of_minbytes )
TEXT_ENCODED += convert_pair();
else
TEXT_ENCODED += W_buffer;
W_buffer.clear();
}
}
}
Thank you!
Im coding the descompressing class :)
I generally recommend writing the decompressor first, and then writing the compressor to match it.
I recommend getting a compressor and corresponding decompressor working with a fixed-size copy items first, and only afterwards -- if necessary -- tweak them to produce/consume variable-size copy items.
Many LZ77-like algorithms use a fixed size in the compressed file to represent both the position and length;
often one hex digit for length and 3 hex digits for position, a total of 2 bytes.
The "|" between the position and the copy-length is unnecessary.
If you are really trying to implement the original LZ77 algorithm,
your compression algorithm needs to always emit the fixed-length copy-length (even when it is zero), the fixed-length position (when the length is zero, you might as well stick zero here also), and a fixed-length literal value.
Some LZ77-like file formats are divided into "items" that are either a fixed-length copy-length,position pair, or else one or more literal values.
If you go that route, the compressor must somehow first tell the decompressor whether the upcoming item represents literal value(s) or a copy-length, position pair.
One of many ways to do this is to reserving a special "0" position value that, rather than indicating some position in the output decompressed stream like all other position values, instead indicates the next few literal values in the input compressed file.
Nearly all LZ77-like algorithms store an "offset" backwards from the current location in the plaintext, rather than a "position" forwards from the beginning of the plaintext.
For example, "1" represents the most recently-decoded plaintext byte, not the first-decoded plaintext byte.
How is it possible to for the decoder to tell where one integer ends, and the next one begins, when the compressed file contains a series of integers?
There are 3 popular answers:
Use a fixed-length code, where you've set in stone at compile-time how long each integer will be. (simplest)
Use a variable-length code, and reserve a special symbol like "|" to indicate end-of-code.
Use a variable-length prefix code.
other approaches, such as range coding. (most complicated)
https://en.wikibooks.org/wiki/Data_Compression
Jacob Ziv and Abraham Lempel; A Universal Algorithm for Sequential Data Compression, IEEE Transactions on Information Theory, 23(3), pp.337-343, May 1977.
The simple lz77 code
"Header-only c++ library of compression algorithms that does not have any external dependencies"
https://github.com/wkoroy/compressholib