InfluxDB - string compression

Does InfluxDB have dictionary-style compression for strings that repeat frequently? I found a mention here: https://www.influxdata.com/blog/new-storage-engine-time-structured-merge-tree/ but I was unable to find out whether the feature has already been implemented.

Looks like it was added to the design docs here, but it appears to be more of a proposal:
https://github.com/influxdata/influxdb/commit/06016882abc6f1a9d992fa0b153b7bbc50bfb500
I only see snappy string compression referenced in the source (although it could be somewhere else):
https://github.com/influxdata/influxdb/blob/master/tsdb/tsm1/string.go

Related

How to parse the Java comments of a Groovy file to HTML format?

I have a set of .groovy files (Java). All of these files have the same comment format.
I developed a tool with which I'm able to read those files and apply a regex to collect all the comments into a list. (Finally, I just have to copy and paste these comments into an .html file.)
I would like to know whether this is a correct practice for generating an HTML page from the comments (a kind of documentation). If not, what would you recommend?
I have read about Doxygen and Javadoc, but I'm not sure about using them (whether they can really be useful in my case, since the comments are already written).
Suggestions for a library to easily generate an HTML web page, or any other advice, are welcome.
Any help is appreciated.
There exists Groovydoc, which is roughly the equivalent of Javadoc, just for Groovy.
As your setup is not that (you already have comments, probably not in Groovydoc format, and you have half the tooling), there are still multiple ways open to you. Since you already extract the documentation from the Groovy sources, if I were you I would do minimal post-formatting, if necessary, and output the documentation as Markdown (e.g., GitHub Markdown) or AsciiDoc (e.g., Asciidoctor). Then you can use any preferred tool to convert the post-formatted documentation into HTML.
To answer the question "How to parse the Java comments": you shouldn't. If possible, especially in a new project, stick with the standard tooling; in the case of Groovy that's Groovydoc. The normal (non-Javadoc/Groovydoc-style) comments themselves you should never need to extract from the source code. They should be so context-specific that, without the corresponding code, they are useless anyway.

Append to compressed file using zlib

Looking around, I have found this question asked before, but not with great answers. Sorry if this is a Stack Overflow duplicate!
My goal is to have a zlib-compressed file that I append to from C/C++ at different intervals (such as a log file). Due to buffer size constraints, I was hoping to avoid having to keep the entire file in memory when appending new items.
Mark Adler's answer was very close to what I needed, but because I was already entrenched in the zlib library and on an embedded device with limited resources, I was/am stuck.
I ended up simply appending a delimiter to each section of compressed data (e.g. ##delimiter##). Once the finished file is ready to be read, a different application seeks out these delimiters, builds an array of the compressed sections, and decompresses each section individually.
I am still marking Adler's answer as correct, as it was useful info that will be of more help to other programmers.
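A minimal sketch of that delimiter approach in C, assuming each record is small enough to be compressed in one call with zlib's compress(); the delimiter string, file name, and buffer size here are illustrative only, and a real implementation would also have to handle the (unlikely) case of the delimiter bytes appearing inside the compressed data:

#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define DELIM "##delimiter##"   /* marker between independently compressed sections */

/* Compress one record and append it, plus a delimiter, to the log file.
   Only the current record is held in memory, never the whole file. */
static int append_record(const char *path, const unsigned char *data, uLong len)
{
    unsigned char cbuf[4096];              /* assumes individual records stay small */
    uLongf clen = sizeof(cbuf);

    if (compressBound(len) > sizeof(cbuf))
        return -1;
    if (compress(cbuf, &clen, data, len) != Z_OK)
        return -1;

    FILE *f = fopen(path, "ab");           /* append-only open */
    if (f == NULL)
        return -1;
    fwrite(cbuf, 1, clen, f);
    fwrite(DELIM, 1, strlen(DELIM), f);
    fclose(f);
    return 0;
}

The reading application then splits the file on DELIM and calls uncompress() on each section.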
It sounds like you are trying to keep something like a compressed log, appending small amounts of data each time. For that you can look at gzlog.h and gzlog.c for an example of how to do this.
You can also look at gzappend, which appends data to a gzip file.
These are all easily adaptable to a zlib stream.
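From memory, the gzlog interface in zlib's examples/ directory looks roughly like the sketch below (gzlog_open takes a path prefix and maintains the .gz file plus a few auxiliary files); check gzlog.h for the actual signatures and semantics before relying on this:

#include <string.h>
#include "gzlog.h"   /* from zlib's examples/ directory */

/* Append one message to a gzlog-managed compressed log.
   Signatures recalled from memory -- verify against gzlog.h. */
int log_message(const char *msg)
{
    gzlog *log = gzlog_open("applog");     /* path prefix; gzlog manages applog.gz etc. */
    if (log == NULL)
        return -1;
    gzlog_write(log, (void *)msg, strlen(msg));
    gzlog_close(log);
    return 0;
}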

CDH4.2.0 Unable to set HBase Compression

Since we updated our installation from CDH 4.1.2 to CDH 4.2.0, we're no longer able to create new tables with compression enabled.
We were using SNAPPY compression successfully before.
Now when we try to execute a create statement like:
create 'tableWithCompression', {NAME => 't1', COMPRESSION => 'SNAPPY'}
an error occurs:
ERROR: Compression SNAPPY is not supported. Use one of LZ4 SNAPPY LZO GZ NONE
We realized that other compression algorithms aren't found either; e.g., the same problem occurs with 'GZ':
ERROR: Compression GZ is not supported. Use one of LZ4 SNAPPY LZO GZ NONE
We've added
"export HBASE_LIBRARY_PATH=/usr/lib/hadoop/lib/native/"
to hbase-env.sh.
Unfortunately this did not fix our problem.
What else can we try?
I'm getting the same thing. This seems to be a bug in the admin.rb script.
The code in question is this:
if arg.include?(org.apache.hadoop.hbase.HColumnDescriptor::COMPRESSION)
  compression = arg[org.apache.hadoop.hbase.HColumnDescriptor::COMPRESSION].upcase
  unless org.apache.hadoop.hbase.io.hfile.Compression::Algorithm.constants.include?(compression)
    raise(ArgumentError, "Compression #{compression} is not supported. Use one of " + org.apache.hadoop.hbase.io.hfile.Compression::Algorithm.constants.join(" "))
  else
    family.setCompressionType(org.apache.hadoop.hbase.io.hfile.Compression::Algorithm.valueOf(compression))
  end
end
Some "p" statements later, I know that. compression is "SNAPPY", and org.apache.hadoop.hbase.io.hfile.Compression::Algorithm.constants is [:LZ4, :SNAPPY, :LZO, :GZ, :NONE].
See the diffrence? We're comparing strings and symbols. The quick fix is to change the line that sets compression to the following:
compression = arg[org.apache.hadoop.hbase.HColumnDescriptor::COMPRESSION].upcase.to_sym
I guess this has to do with there being a ton of different JRuby variants and configurations. I suppose in some the constants are strings, and in others, symbols. A more permanent fix would be to call to_sym on both sides of the comparison.

Is there an open-source implementation of the Dictionary Huffman compression algorithm?

I'm working on a library for Mobipocket-format ebook files, and I have LZ77-style PalmDoc decompression and compression working. However, PalmDoc compression is only one of the two types of text compression currently used on ebooks in the wild; the other is Dictionary Huffman, aka huffcdic.
I've found a couple of implementations of the huffcdic decoding algorithm, but I'd like to be able to compress to the same format, and so far I haven't been able to find any examples of how to do that. Has someone else already figured this out and published the code?
I have been trying to use http://bazaar.launchpad.net/~kovid/calibre/trunk/view/head:/src/calibre/ebooks/compression/palmdoc.c, but the compression doesn't produce identical results, and there are 3-4 discrepancies. Also read one related thread: LZ77 compression of palmdoc.

Google Bot information?

Does anyone know any more details about Google's web crawler (aka Googlebot)? I was curious about what it was written in (I've made a few crawlers myself and am about to make another) and whether it parses images and such. I'm assuming it does somewhere along the line, because the images on images.google.com are all resized. It also wouldn't surprise me if it was all written in Python and they used their own libraries for almost everything, including HTML/image/PDF parsing. Maybe they don't, though. Maybe it's all written in C/C++. Thanks in advance.
You can find a bit about how Googlebot works here:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=158587
For example, the "Fetch as Googlebot" tool lets you see a page as Googlebot sees it.
The crawler is very likely written in C or C++; at least BackRub's crawler was written in one of those.
Be aware that the crawler only takes a snapshot of the page, then stores it in a temporary database for later processing. The indexing and other attached algorithms will extract the data, for example the image references.
Officially allowed languages at Google, I think, are Python/C++/Java.
The bot likely uses all 3 for different tasks.