Extract geometric objects (lines, circles,...) from a pdf using PDFMM - c++

I have a PDF containing several geometric objects (mostly lines) in different sizes and colors. I want to extract them in the following form, e.g. for lines:
(startx, starty)
(endx, endy)
width
color
Optionally a "z" position determining which object is drawn first. My language of choice is C++, and I thought about PoDoFo, or rather PDFMM, as it should be more accessible. However, I am totally lost as to how to access this information...
I found the following reference:
PDF parsing in C++ (PoDoFo)
However, I was not able to make the PdfTokenizer work. Tokenizer.TryReadNextToken needs an InputStreamDevice object, and I do not know how to get one.
For example: I create a single page with just one line in pdfmm. And now I want to extract this information:
#include <pdfmm/pdfmm.h>
int main()
{
try {
PdfMemDocument document;
document.Load("test.pdf");
PdfPage* page = document.GetPages().CreatePage(PdfPage::CreateStandardPageSize(PdfPageSize::A4));
// Draw single line
PdfPainter painter;
painter.SetCanvas(page);
painter.GetGraphicsState().SetLineWidth(10);
painter.DrawLine(0, 0, page->GetRect().GetWidth(), page->GetRect().GetHeight());
painter.FinishDrawing();
// Loop over all token of page
PdfTokenizer token(true);
char* stoken = nullptr;
PdfVariant var;
PdfContentType type;
while (token.TryReadNextToken( ???? ,stoken,type)) {
}
}
catch (PdfError& err)
{
err.PrintErrorMsg();
return (int)err.GetError();
}
}
If anybody could push me in the right direction, that would be awesome! And if somebody knows good documentation about the structure of a PDF and/or a good tutorial for pdfmm / PoDoFo, that would also be highly appreciated...

showing the full content of ImageType in DCMTK

I'm trying to read a number of Siemens DICOM images with DCMTK, some of which are mosaic images. I'm looking for a quick way to find those.
What I can see with mosaic images is that this is specified in the ImageType tag, e.g.
$ dcmdump ${im0} | grep ImageType
(0008,0008) CS [ORIGINAL\PRIMARY\ASL\NONE\ND\NORM\MOSAIC] # 40, 7 ImageType
Most of the tags are easily read with findAndGetOFString() (or similar for floats etc), but if I do
tmpdata->findAndGetOFString(DCM_ImageType, tmpstring);
std::cout << "image type: " << tmpstring << "\n";
for DcmDataset* tmpdata and OFString tmpstring, then the content of tmpstring is only ORIGINAL so the rest of the value is never printed.
In dcmdump it is printed, but there the value of DCM_ImageType never seems to be stored in a string, which I do need it to be.
Would there be a similar command to findAndGetOFString() for 'code strings'? Maybe I'm missing something obvious!
Image Type (0008,0008) is a multi-valued attribute, i.e. it may contain several values separated by the backslash character. Note that "officially" the backslash is not part of the attribute's value; it is a delimiter between the values. That is what you have here. So in DICOM terms there is not "one value" but several.
The DCMTK API allows you to handle this (of course).
findAndGetOFString() has a third parameter (the value index, which defaults to 0) to define which of the multiple values you want to obtain. That is why you only get ORIGINAL.
The behavior that you probably expect is what findAndGetOFStringArray() does: it returns all values in one string, separated by backslashes.
As an alternative, you could obtain the "Value Multiplicity" first and then loop through the individual values like this:
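DcmElement* element = nullptr;
// findAndGetElement() returns an OFCondition and hands back the element via the second argument
if (tmpdata->findAndGetElement(DCM_ImageType, element).good() && element != nullptr)
{
    const unsigned long numberOfValues = element->getVM();
    for (unsigned long index = 0; index < numberOfValues; index++)
    {
        OFString valueAtIndex;
        element->getOFString(valueAtIndex, index);
        /// ... your concatenation goes here...
    }
}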
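If you fetch the whole value with findAndGetOFStringArray(), you end up with one backslash-separated string, so finding the mosaic images comes down to splitting that string. Here is a minimal sketch using only the standard library. The helper names splitMultiValue and isMosaic are made up for this example (they are not DCMTK functions), and it assumes you have copied the OFString content into a std::string first, e.g. via c_str().

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of DCMTK): split a multi-valued DICOM string
// such as ORIGINAL\PRIMARY\MOSAIC at the backslash delimiter.
std::vector<std::string> splitMultiValue(const std::string& value)
{
    std::vector<std::string> parts;
    std::istringstream stream(value);
    std::string part;
    while (std::getline(stream, part, '\\'))
        parts.push_back(part);
    return parts;
}

// Hypothetical helper: true if one of the values is exactly "MOSAIC".
bool isMosaic(const std::string& imageType)
{
    const std::vector<std::string> parts = splitMultiValue(imageType);
    return std::find(parts.begin(), parts.end(), std::string("MOSAIC")) != parts.end();
}
```

Comparing whole values instead of searching for the substring "MOSAIC" avoids false hits on values that merely contain that text.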

I am trying to understand these 2 blocks of code, could someone help please?

I am trying to understand some code and I was hoping someone could give me a basic overview of what this code means. The first bit of code I don't understand is this:
// loading file path
static string resourceRoot;
#define RESOURCE_PATH(p) (char*)((resourceRoot+"/"+string(p)).c_str())
The second bit of code I am trying to fully understand is this:
void Draw::loadMeshFromFile(cMesh* mesh,string name)
{
bool fileload;
resourceRoot = m_Path.toStdString();
string str1 = "Head/";
string str2 = ".3ds";
fileload = mesh->loadFromFile(str1+name+str2);
if(!fileload)
{
printf("Error - 3D Model failed to load correctly.\n");
return;
}
Any help would be truly appreciated. I am trying to strengthen my programming, so I'd be glad for any response. Thanks
The first part is a macro that helps you build the path of a specified resource, relative to some root resource path that you can set in your code.
You basically would use it like this:
static string resourceRoot;
#define RESOURCE_PATH(p) (resourceRoot+"/"+string(p))
void load(cMesh* mesh)
{
// Probably set somewhere else
resourceRoot = "/opt/resources";
mesh->loadFromFile(RESOURCE_PATH("relarespath/myresource").c_str());
}
or
void load(cMesh* mesh)
{
// Probably set somewhere else
resourceRoot = "/opt/resources";
string myresourcename = RESOURCE_PATH("relarespath/myresource");
mesh->loadFromFile(myresourcename.c_str());
}
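I removed the (char*) cast and the .c_str() call from the macro because the original hands out a pointer into a temporary string. That pointer is only valid until the end of the full expression, so storing it and using it later is dangerous.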
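If you are free to change the code, a plain function does the same job as the macro and is easier to reason about. This is only a sketch: the name resourcePath and the root value "/opt/resources" are made up for the example.

```cpp
#include <string>

// Root directory for resources; normally set once at start-up
// (the value here is just an example).
static std::string resourceRoot = "/opt/resources";

// Function replacement for the RESOURCE_PATH macro (hypothetical name).
// It returns the string by value, so the caller owns the buffer that
// c_str() later points into.
inline std::string resourcePath(const std::string& relative)
{
    return resourceRoot + "/" + relative;
}
```

You would keep the result in a named std::string, for example const std::string path = resourcePath("Head/model.3ds");, and path.c_str() then stays valid for as long as path is alive.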

How to have c++ webscraper scrape through html until it hits a float/int

I am trying to use a C++ scraper in my UI to sift through WSJ stock information and get some balance sheet info back. Right now it searches for specific text in the page source, e.g. "P/E Ratio", and I manually counted how many characters lie between that text and the actual number on the website.
Here is the code:
// P/E Ratio
size_t indexpeRatio = html.find("P/E Ratio ") + 116;
string s_peRatio = html.substr(indexpeRatio, 5);
peRatio = stod(s_peRatio);
After doing that manually, the code simply stores the number and I output it to my UI. My issue is that the number of characters in between sometimes changes depending on which company I choose to evaluate. I am wondering if there is a way to use the .find() function to find "P/E Ratio" and then read the next float/int.
As of right now, my UI sometimes outputs parts of the HTML because I have to use a fixed number of characters, for example when I give it a smaller company to evaluate.
Do you all have any recommendations I can use to fix this issue? Thank you in advance!
You could iterate through the characters.
Say you have string html:
#include <cctype>
#include <string>
using namespace std;

// Returns the first number (digits with an optional '.') that appears in text.
double find_ratios(const string& text) {
    string output;
    bool inNumber = false;
    for (char c : text) {
        if (isdigit(static_cast<unsigned char>(c)) || (inNumber && c == '.')) {
            inNumber = true;
            output += c;
        } else if (inNumber) {
            break;  // first character after the number
        }
    }
    return stod(output);  // throws std::invalid_argument if no number was found
}

int main() {
    double peratio = 0.0;
    string html;
    /*
    This is where you do your HTML scraping logic
    */
    size_t indexpeRatio = html.find("P/E Ratio");
    if (indexpeRatio != string::npos)  // only parse if the label was found
        peratio = find_ratios(html.substr(indexpeRatio));
}
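As an alternative to walking the characters by hand, std::regex can pick out the first number after the label. This is a rough sketch, not a robust HTML parser: the function name numberAfterLabel is made up, and it assumes there are no other digits (for example inside tag attributes) between the label and the value.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true and stores the number in `value`
// if a number follows `label` somewhere in `html`.
bool numberAfterLabel(const std::string& html, const std::string& label, double& value)
{
    const std::size_t pos = html.find(label);
    if (pos == std::string::npos)
        return false;  // label not present

    // Only look at the text after the label.
    const std::string tail = html.substr(pos + label.size());

    // One or more digits, optionally followed by a decimal part.
    static const std::regex number(R"(\d+(\.\d+)?)");
    std::smatch match;
    if (!std::regex_search(tail, match, number))
        return false;  // no number after the label

    value = std::stod(match.str());
    return true;
}
```

Returning a bool instead of throwing lets the UI show something like "n/a" when a company's page has no such value.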

MFC SDI rich edit 2.0 control bolding words

How would I go about formatting text in a rich edit 2.0 control? Right now I just have a simple little MFC program with a single view and one rich edit 2.0 control. It's currently empty, but I want to insert some text into it.
The control itself is labeled as StringToChange2 and the member within my class is m_StringToChange2.
TCHAR INIValue2[256] = _T("Here is some random text!");
SetDlgItemText(StringToChange2, INIValue2);
So as it stands now, when I run my program it inserts the text into my control. How can I make a single word of the string bold?
For example, I just want it to say: "Here is some random text!"
As it stands now, I can make the whole control bold, but I don't want the whole thing to be bold, just one word.
This link has a very similar question to what I am asking, but there are two things wrong with it. First, almost all the comments tell him to use an HTML control, which I don't want to turn to yet. Second, the one person who did respond posted such a long snippet of code that I don't understand what's happening. The very last answer recommends he use WordPad since it uses RTF?
I tried to insert RTF code into my INIValue2 but it won't take it. Unless I'm using it wrong, which could well be the case.
I've been stalking MSDN and reading the functions, but my level of expertise with MFC and the rich edit control is very limited. If someone could post a small example of this, that would help; it doesn't even have to relate to my question, just something I could use as a base.
Edit1: It's not that my INIValue2 doesn't take it; it's that when it appears in my single view, it shows everything, including all the RTF code and header.
You have to format the text using the EM_SETCHARFORMAT message. In MFC, you can use CRichEditCtrl::SetSelectionCharFormat.
First, declare a CRichEditCtrl member in your dialog or window class:
CRichEditCtrl m_richedit;
In OnInitDialog put
m_richedit.SubclassDlgItem(IDC_RICHEDIT21, this);
Apply the CHARFORMAT as follows:
CHARFORMAT cf = { sizeof(cf) };
cf.dwEffects = CFE_BOLD; // CFE_* constants are effect values, CFM_* constants are mask bits
cf.dwMask = CFM_BOLD;
m_richedit.SetSel(0,2);
m_richedit.SetSelectionCharFormat(cf);
You can use helper functions to make this easier. For example, see this post.
To assign RTF text directly, you must use EM_STREAMIN. MFC only wraps this as CRichEditCtrl::StreamIn, which needs a callback, so you have to write your own helper function:
DWORD __stdcall callback_rtf_settext(DWORD_PTR dwCookie, LPBYTE pbBuff, LONG cb, LONG *pcb)
{
CString *psBuffer = (CString*)dwCookie;
if (cb > psBuffer->GetLength())
cb = psBuffer->GetLength();
for (int i = 0; i < cb; i++)
*(pbBuff + i) = (BYTE)psBuffer->GetAt(i);
*pcb = cb;
*psBuffer = psBuffer->Mid(cb);
return 0;
}
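bool setrtf(CRichEditCtrl &edit, const CString &s)
{
    // The callback consumes its buffer, so stream from a local copy
    // instead of modifying the const argument.
    CString buffer = s;
    EDITSTREAM es;
    edit.SetSel(0, -1);
    edit.Clear();
    memset(&es, 0, sizeof(es));
    es.dwCookie = (DWORD_PTR)&buffer;
    es.pfnCallback = callback_rtf_settext;
    edit.StreamIn(SF_RTF, es);
    return es.dwError == 0;
}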
Usage:
setrtf(m_richedit, L"\\rtf data...");
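To make the RTF route a bit more concrete, here is a small sketch of how the RTF text for one bold word could be built. It uses only the standard library; the helper name makeBoldRtf is made up, picking "random" as the bold word is just an assumption for the example, and the helper assumes the text contains none of the RTF special characters \, { or }.

```cpp
#include <string>

// Hypothetical helper: build a minimal RTF document in which `bold` is
// rendered bold between the plain text `before` and `after`.
// \b switches bold on and \b0 switches it off; the single space after a
// control word is a delimiter and is not displayed.
std::string makeBoldRtf(const std::string& before,
                        const std::string& bold,
                        const std::string& after)
{
    return "{\\rtf1\\ansi " + before + "\\b " + bold + "\\b0 " + after + "}";
}
```

For example, makeBoldRtf("Here is some ", "random", " text!") gives {\rtf1\ansi Here is some \b random\b0  text!}, which you could then convert to a CString and stream into the control.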

Iterating through the pixels of a .bmp in C++

NOTE: I changed the title from .png to .bmp due to a comment suggesting bitmaps instead.
I'm making a simple 2D grid-based CMD game, and I want to make .png levels and turn them into level data for my game.
So basically all I want to know is: how would I iterate through the pixels of a .bmp to parse it into level data?
This is how I did it with a .txt
int x = 0;
int y = 0;
std::ifstream file(filename);
std::string str;
while (std::getline(file, str))
{
x++;
for (char& c : str) {
y++;
updateTile(coordinate(x), coordinate(y), c);
}
}
I couldn't find any helpful threads so I posted this new one, hope I'm not breaking any rules
I don't know if you still want to read PNG files, but if you do, check this decoder:
http://lodev.org/lodepng/
It loads a PNG file into a vector in which every 4 chars (bytes) make up one pixel (RGBA format). So by reading 4 chars at a time, you get one pixel.
I haven't used it before, but it looks easy to use.
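Once the pixels are in such a vector, iterating over them could look roughly like the sketch below. It does not call the decoder itself; it just assumes a buffer with 4 bytes per pixel stored row by row, plus the image width and height. The function name pixelsToTiles and the colour-to-tile mapping (black becomes '#', everything else '.') are made up for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Convert an RGBA pixel buffer (4 bytes per pixel, row by row) into rows of
// tile characters. The mapping used here is an arbitrary example.
std::vector<std::string> pixelsToTiles(const std::vector<unsigned char>& image,
                                       unsigned width, unsigned height)
{
    std::vector<std::string> rows(height, std::string(width, '.'));
    for (unsigned y = 0; y < height; ++y) {
        for (unsigned x = 0; x < width; ++x) {
            // Index of the first byte (red) of pixel (x, y).
            const std::size_t i = 4 * (static_cast<std::size_t>(y) * width + x);
            const unsigned char r = image[i];
            const unsigned char g = image[i + 1];
            const unsigned char b = image[i + 2];
            // image[i + 3] is the alpha channel, which is ignored here.
            if (r == 0 && g == 0 && b == 0)
                rows[y][x] = '#';  // black pixel becomes a wall tile
        }
    }
    return rows;
}
```

Each returned string is one row of the level, so the result can be fed to the same kind of loop that reads the .txt levels.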