The HPROF parser encountered a violation of the HPROF specification (MAT)

When I try to open a valid heap dump in MAT, I get the error below. How can I resolve this?
Error text:
The HPROF parser encountered a violation of the HPROF specification that it could not safely handle. This could be due to file truncation or a bug in the JVM.
Please consider filing a bug at eclipse.org. To continue parsing the dump anyway, you can use -DhprofStrictnessWarning=true or set the strictness mode under Preferences > HPROF Parser > Parser Strictness. See the inner exception for details.
(Possibly) Invalid HPROF file: Expected to read another 707,569,392 bytes, but only 69,932,894 bytes are available.

Change Preferences > Memory Analyzer > HPROF Parser > Parser Strictness from Strict to Warning.
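If you prefer not to change the preference, the error message itself points at the equivalent system property. For the standalone MAT distribution, JVM options of this kind normally go into MemoryAnalyzer.ini after the -vmargs line (a sketch; the exact file location depends on your platform, and any existing -vmargs entries should be kept):

    -vmargs
    -DhprofStrictnessWarning=true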

Related

Should JSON lexer be greedy?

I am building a toy JSON parser in C++, just for the learning experience.
While building the lexer, I came across a dilemma: should the lexer be greedy? If so, where is this defined? I could not find any directive in either JSON or ECMA-404.
In particular, while trying to tokenize the following (invalid number):
0.x123
Should my lexer try to parse it as the invalid number "0.x123" (greedy behavior), or as the invalid number "0.x" followed by the valid number "123" (ultimately still an invalid sequence of tokens)?
Also, while tokenizing strings, should it be the lexer's responsibility to check whether the string is valid (for instance, that a backslash is only followed by the allowable escape characters), or should I check this constraint in a separate semantic analysis step? I guess this is more of an architectural preference, but I am curious about your opinions.
Invalid is invalid. If you can't parse it, bail at the earliest opportunity and raise an error.
There's no need to be greedy here because you'll just waste time processing data that has zero impact on the situation.
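As a concrete illustration of bailing at the earliest opportunity, here is a minimal C++ sketch of a number lexer (names are illustrative; leading-zero and exponent rules are omitted for brevity) that fails on the first offending character instead of greedily consuming "0.x123" as one bad token:

    #include <cctype>
    #include <cstddef>
    #include <stdexcept>
    #include <string>

    // Lex one JSON-style number starting at `pos`; stop and report an
    // error at the first character that violates the grammar.
    std::string lexNumber(const std::string& input, std::size_t& pos) {
        const std::size_t start = pos;
        if (pos < input.size() && input[pos] == '-') ++pos;   // optional sign
        if (pos >= input.size() || !std::isdigit((unsigned char)input[pos]))
            throw std::runtime_error("digit expected at position " + std::to_string(pos));
        while (pos < input.size() && std::isdigit((unsigned char)input[pos])) ++pos;
        if (pos < input.size() && input[pos] == '.') {        // fraction part
            ++pos;
            // For "0.x123" we bail right here, at the 'x' -- no greedy scan.
            if (pos >= input.size() || !std::isdigit((unsigned char)input[pos]))
                throw std::runtime_error("digit expected after '.' at position " + std::to_string(pos));
            while (pos < input.size() && std::isdigit((unsigned char)input[pos])) ++pos;
        }
        return input.substr(start, pos - start);
    }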

Letting exceptions throw for reasons of processing speed?

I am not sure; this may be more of an opinion-based question.
Let's say I have a very long list of strings (>100 million) that need to be parsed. 0.01% of these strings contain illegal Unicode characters (such as ASCII control characters).
When it comes to processing speed, would it be considered bad practice to apply a regex (to replace or remove the illegal characters) only when an exception is thrown? (e.g., 'Exception occurred: hexadecimal value 0x02 is an invalid character')
I could compare the speed of both options, but I am more concerned about program stability, readability, and adaptability of the code.
I am using VB.Net.
Thanks for answers!
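The question is about VB.Net, but the pattern being asked about is language-neutral. A hypothetical C++ sketch of the idea (all names here are made up for illustration): try the strict parse first, and only pay for sanitizing when an exception signals a bad string. Whether this actually beats sanitizing every string up front is exactly the kind of thing you would still have to measure.

    #include <stdexcept>
    #include <string>

    // Stand-in for the real strict parser: rejects control characters.
    std::string parseStrict(const std::string& s) {
        for (unsigned char c : s)
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r')
                throw std::invalid_argument("invalid character");
        return s;  // pretend real parsing happens here
    }

    // Stand-in for the regex-based cleanup from the question.
    std::string stripIllegal(const std::string& s) {
        std::string out;
        for (unsigned char c : s)
            if (c >= 0x20 || c == '\t' || c == '\n' || c == '\r')
                out += static_cast<char>(c);
        return out;
    }

    // Fast path for the ~99.99% of clean strings; the slow cleanup-and-retry
    // path only runs when an exception is actually thrown.
    std::string parseLenient(const std::string& s) {
        try {
            return parseStrict(s);
        } catch (const std::invalid_argument&) {
            return parseStrict(stripIllegal(s));
        }
    }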

Compiler run-time error reporting with location of error

I'm writing a compiler in C++ (Ubuntu 12.04 with gcc). So far, cumulative error/warning reporting with fairly precise line and column numbers of the error/warning location works fine.
My project goals include simply learning how to do this, so I'm adding a preprocessing stage (in a first step doing only minimal things like string concatenation, comment removal, etc.) that produces a temporary file. It isn't strictly necessary at this point, since I could concatenate strings in my lexer while parsing and the lexer already handles comments fine, but I'd like to understand how to handle it as efficiently and elegantly as I can.
Compile-time errors are not hard:
(1) do error check (-> report compile-time errors)
(2) if no errors, preprocess -> tmp file
(3) run parser, etc., on tmp file (which is compile-time error free)
However, I also report run-time errors with line numbers (e.g., for out-of-bounds checks on arrays whose bounds are integer expressions). Since the error checks are added to the byte code of my IR only when parsing the tmp file, and this file can differ significantly from the source file (in particular once we start allowing header files to be pasted in), how on earth can you reasonably report a helpful error location? Is there a standard trick for how gcc, say, handles this? The kind of bound check mentioned of course doesn't happen in C; but run-time error reporting applies to, say, dynamic resolution of pointers in a class hierarchy in C++, and gcc gets the line numbers just fine.
You can record line-number information in the temporary file produced by your preprocessor, in the same way as the Line Control directives of cpp (the C preprocessor):
The C preprocessor informs the C compiler of the location in your source code where each token came from. Presently, this is just the file name and line number.
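Concretely, cpp emits #line directives into its output, and the compiler proper updates its file/line counters whenever it encounters one instead of tokenizing it. A made-up temporary file for your own language could look like this (the file names and toy syntax are purely illustrative):

    #line 1 "main.src"
    x = compute()          // later phases report this as main.src, line 1
    #line 1 "util.hdr"
    def helper(n) = n + 1  // pasted from the header: util.hdr, line 1
    #line 3 "main.src"
    y = helper(x)          // back in the original file: main.src, line 3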

How do C/C++ compilers know which line an error is on

There is probably a very obvious answer to this, but I was wondering how the compiler knows which line of code my error is on. In some cases it even knows the column.
The only way I can think to do this is to tokenize the input string into a 2D array. This would store [lines][tokens].
C/C++ could be tokenized into one long 1D array, which would probably be more efficient. I am wondering what the usual parsing method is that keeps line information.
Actually, most of this is covered in the Dragon Book.
Compilers do lexing/parsing, i.e., transforming the source code into a tree representation.
When doing so, each keyword, variable, etc. is associated with a line and column number.
However, during parsing the exact origin of a failure might get lost and the information might be off.
This is the first step on the long, complicated path towards "Engineering a Compiler" or compiler theory in general.
The short answer to that is: there's a module called "front-end" that usually takes care of many phases:
Scanning
Parsing
IR generator
IR optimizer ...
The structure isn't fixed, so each compiler will have its own set of modules, but more or less these are the steps involved in front-end processing:
Scanning - maps character streams into words or tokens (also ignoring whitespace/comments)
Parsing - this is where syntax and (some) semantic analysis take place and where syntax errors are reported
To sum this up for you: the compiler knows the location of your error because when something doesn't fit into a structure called the "abstract syntax tree" (i.e., it cannot be constructed), or doesn't follow any of the syntax-directed translation rules, well.. there's something wrong, and the compiler indicates the location where things went off the rails. If the grammar error is on just one word/token, then even a precise column location can be returned, since nothing matched a terminal symbol: a basic token like the if keyword in C/C++.
If you want to know more about this topic my suggestion is to start with the classic academic approach of the "Compiler Book" or "Dragon Book" and then, later on, possibly study an open-source front-end like Clang
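"Associated with a line and column number" usually just means the scanner stamps every token with its coordinates as it produces them, so every later phase can point back at the source. A minimal C++ sketch of the idea (names are illustrative, not from any particular compiler):

    #include <cstddef>
    #include <string>

    // Every token carries its source coordinates, so the parser, type
    // checker, or even run-time-check generation can report exact origins.
    struct Token {
        std::string lexeme;
        int line;    // 1-based line where the token starts
        int column;  // 1-based column where the token starts
    };

    // The scanner tracks coordinates while consuming characters.
    struct Scanner {
        std::string src;
        std::size_t pos = 0;
        int line = 1, column = 1;

        bool atEnd() const { return pos >= src.size(); }

        char advance() {                   // caller checks atEnd() first
            char c = src[pos++];
            if (c == '\n') { ++line; column = 1; }
            else           { ++column; }
            return c;
        }
    };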

Errors when iostringstream::write is called

On this website, the description of the iostringstream::write function says that:
In case of error, the badbit flag is set
What could those errors be?
The obvious error when writing to a stringstream would be if the underlying string buffer failed to allocate memory to hold the data being written. Also note, however, that the link you've given is to ostream::write, which could fail for other reasons (e.g., writing to a pipe that's been closed, or to a file on a disk that's full, or a write that would exceed the user's quota).
Aside #1: there's no such thing as an iostringstream -- there's istringstream and ostringstream. The one that combines both is just stringstream.
Aside #2: cplusplus.com isn't particularly highly respected. Some other sites (e.g., cppreference.com) seem to be more dependable/accurate, at least as a general rule (though I feel obliged to point out that I don't use any of the above much myself, so my comments on them aren't anywhere close to the last word).
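If you want to observe the flag, the usual pattern is to test the stream state after the write; a small sketch with ostringstream (where, as noted above, a failed allocation is about the only realistic trigger):

    #include <iostream>
    #include <sstream>

    int main() {
        std::ostringstream out;
        const char data[] = "hello";
        out.write(data, sizeof data - 1);   // sizeof includes the '\0'
        if (out.bad()) {                    // badbit: unrecoverable error
            std::cerr << "write failed (e.g., allocation failure)\n";
        } else {
            std::cout << out.str() << '\n'; // prints: hello
        }
    }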