RDF4J v3.0.0: Export graph without inferred triples in it

In version 3.0.0, inferred triples are added to the actual named graph (not to the default graph as before). Is it somehow possible to export/fetch a graph without the inferred triples?
Thanks a lot.

Yes, this is possible.
1. Via the Java API
You can retrieve the statements from the named graph http://example.org/graph1 in several ways. Here are two alternatives:
IRI graph1 = valueFactory.createIRI("http://example.org/graph1");
try (RepositoryConnection conn = repository.getConnection()) {
    // option 1: getStatements of everything in a named graph, setting
    // includeInferred to false
    RepositoryResult<Statement> result = conn.getStatements(null, null, null, false, graph1);

    // option 2: using export with an RDFHandler (export never includes inferred triples)
    RDFHandler collector = new StatementCollector();
    conn.export(collector, graph1);
}
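For example, to serialize that named graph straight to a file, option 2 can be combined with a Rio writer. A minimal sketch, reusing the repository and graph IRI from above (the file name and format are just for illustration; Rio, RDFWriter and RDFFormat come from org.eclipse.rdf4j.rio):
try (RepositoryConnection conn = repository.getConnection();
     OutputStream out = new FileOutputStream("graph1.ttl")) {
    RDFWriter writer = Rio.createWriter(RDFFormat.TURTLE, out);
    conn.export(writer, graph1); // export never includes inferred statements
}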
2. Via the RDF4J Workbench
The simplest way to do this in the workbench is to use a SPARQL CONSTRUCT query:
CONSTRUCT FROM <http://example.org/graph1> WHERE { ?s ?p ?o }
Make sure that the "include inferred statements" option is unchecked before hitting 'Execute'. Then on the query result screen, pick your preferred download format, and hit 'Download'.
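The same query can also be run programmatically with inference switched off; a minimal sketch, reusing the connection from the Java API example above (GraphQuery and QueryLanguage come from org.eclipse.rdf4j.query):
GraphQuery query = conn.prepareGraphQuery(QueryLanguage.SPARQL,
    "CONSTRUCT FROM <http://example.org/graph1> WHERE { ?s ?p ?o }");
query.setIncludeInferred(false); // same effect as unchecking the option in the workbench
query.evaluate(Rio.createWriter(RDFFormat.TURTLE, System.out)); // stream the result as Turtle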

Related

Use Arrow schema with parquet StreamWriter

I am attempting to use the C++ StreamWriter class provided by Apache Arrow.
The only example of using StreamWriter uses the low-level Parquet API, i.e.
parquet::schema::NodeVector fields;
fields.push_back(parquet::schema::PrimitiveNode::Make(
    "string_field", parquet::Repetition::OPTIONAL, parquet::Type::BYTE_ARRAY,
    parquet::ConvertedType::UTF8));
fields.push_back(parquet::schema::PrimitiveNode::Make(
    "char_field", parquet::Repetition::REQUIRED, parquet::Type::FIXED_LEN_BYTE_ARRAY,
    parquet::ConvertedType::NONE, 1));
auto node = std::static_pointer_cast<parquet::schema::GroupNode>(
    parquet::schema::GroupNode::Make("schema", parquet::Repetition::REQUIRED, fields));
The end result is a std::shared_ptr<parquet::schema::GroupNode> that can then be passed into StreamWriter.
Is it possible to construct and use "high-level" Arrow schemas with StreamWriter? They are supported when using the WriteTable function (non-streaming) but I've found no examples of its use with the streaming API.
I could, of course, resort to using the low-level API, but it is extremely verbose when creating large and complicated schemas, and I would prefer (but do not need) to use the high-level Arrow schema mechanisms.
For example,
std::shared_ptr<arrow::io::FileOutputStream> outfile_;
PARQUET_ASSIGN_OR_THROW(outfile_, arrow::io::FileOutputStream::Open("test.parquet"));
// construct an arrow schema
auto schema = arrow::schema({arrow::field("field1", arrow::int64()),
                             arrow::field("field2", arrow::float64()),
                             arrow::field("field3", arrow::float64())});
// build the writer properties
parquet::WriterProperties::Builder builder;
auto properties = builder.build();
// my current best attempt at converting the Arrow schema to a Parquet schema
std::shared_ptr<parquet::SchemaDescriptor> parquet_schema;
parquet::arrow::ToParquetSchema(schema.get(), *properties, &parquet_schema); // parquet_schema is now populated
// and now I try and build the writer - this fails
auto writer = parquet::ParquetFileWriter::Open(outfile_, parquet_schema->group_node(), properties);
The last line fails because parquet_schema->group_node() (which is the only way I'm aware of to get access to the GroupNode for the schema) returns a const GroupNode*, whereas ParquetFileWriter::Open() needs a std::shared_ptr<GroupNode>.
I'm not sure casting away the constness of the returned group node and forcing it into the ::Open() call is an officially supported (or correct) usage of StreamWriter.
Is what I want to do possible?
It seems that you need to use the low-level API for StreamWriter.
A very tricky way:
auto writer = parquet::ParquetFileWriter::Open(
    outfile_,
    std::shared_ptr<parquet::schema::GroupNode>(
        const_cast<parquet::schema::GroupNode *>(parquet_schema->group_node())),
    properties);
You may need to convert the schema manually; the source code in cpp/src/parquet/arrow/schema.cc may help you:
PARQUET_EXPORT
::arrow::Status ToParquetSchema(const ::arrow::Schema* arrow_schema,
                                const WriterProperties& properties,
                                std::shared_ptr<SchemaDescriptor>* out);
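If that works for you, the resulting file writer can then be handed to StreamWriter in the usual way. A minimal, uncompiled sketch, assuming the three-field example schema above converts to columns that StreamWriter's operator<< overloads accept (the values are just for illustration):
#include "parquet/stream_writer.h"

parquet::StreamWriter os{std::move(writer)};
for (int i = 0; i < 10; ++i) {
    os << static_cast<int64_t>(i)  // field1 (int64)
       << 1.5 * i                  // field2 (double)
       << 2.5 * i                  // field3 (double)
       << parquet::EndRow;
}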

How to chain planners from JDBC Adapter SchemaFactory?

I extended the JDBC adapter and used a custom schema factory in a model.json configuration with 1 original schema and 2 derived schemas to add rules, and that worked: the rules got executed on the original schema during planning, but their end result didn't get chosen as the best option by the Volcano planner because it's too expensive. The rules transform the RelNode to execute on the 2 derived schemas. More details below and in the code.
1) Can I tell Volcano planner to ignore 1 out of 3 schemas that I passed through custom JDBC SchemaFactory?
I want the parser to work on that 1 original schema, but for the planner to never suggest an optimal (cheapest) plan in that schema (only other 2 derived schemas). 1 original schema is always mapped 1-to-1 with other 2 derived schemas, so the RelNode that my rule returns is always semantically equivalent, just more expensive (security reasons).
2) If that can't work, how can I call HepPlanner instead of the default Volcano planner from the SchemaFactory that is set in model.json, since that's my starting point?
You can find my entire code on GitHub, I made it publicly available so that everyone can have a better starting point with Calcite than I did.
Here is the link: https://github.com/igrgurina/multicloud_rewriter
Calcite library is amazing, but it's really hard to get into because it lacks examples and tutorials for common tasks.
Ideally, I would have HepPlanner execute my rules that transform them to semantically equivalent expressions that use 2 derived schemas instead of 1 original schema (I have a rule that does that), and then have Volcano planner optimize that using only 2 derived schemas, without having an idea that 1 original schema exists, due to security reasons.
I haven't found any reasonable examples that demonstrate how to do that so any help would be appreciated (please don't post links to Druid example, or Apache Calcite docs website, I went through them a thousand times).
I've managed to make this work by using Hook.PROGRAM and prepending my custom program that executes my rules before all others.
Since Hook is marked as for testing and debugging only in Calcite library, I would say this is not how it's supposed to be done, but I have nothing better at the moment.
Here is a short summary with code sample:
public static class MultiCloudHookManager {
    private static final Program PROGRAM = new MultiCloudProgram();

    private static Hook.Closeable globalProgramClosable;

    public static void addHook() {
        if (globalProgramClosable == null) {
            globalProgramClosable = Hook.PROGRAM.add(program());
        }
    }

    private static Consumer<Holder<Program>> program() {
        return prepend(PROGRAM);
    }

    // this doesn't have to be in the separate program
    private static Consumer<Holder<Program>> prepend(Program program) {
        return (holder) -> {
            if (holder == null) {
                throw new IllegalStateException("No program holder");
            }
            Program chain = holder.get();
            if (chain == null) {
                chain = Programs.standard();
            }
            holder.set(Programs.sequence(program, chain));
        };
    }
}
The MultiCloudHookManager is then used in the SchemaFactory, where you simply call the MultiCloudHookManager.addHook() method. In this case, MultiCloudHookManager.PROGRAM is set to a MultiCloudProgram, which simply executes a set of rules in HepPlanner.
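For illustration, registering the hook from a custom schema factory could look roughly like this (a sketch only; the class name and the delegation to JdbcSchema.create are assumptions, not the actual repository code):
public class MultiCloudSchemaFactory implements org.apache.calcite.schema.SchemaFactory {
    @Override
    public org.apache.calcite.schema.Schema create(
            org.apache.calcite.schema.SchemaPlus parentSchema,
            String name,
            java.util.Map<String, Object> operand) {
        // prepend the MultiCloudProgram before any planning starts
        MultiCloudHookManager.addHook();
        return org.apache.calcite.adapter.jdbc.JdbcSchema.create(parentSchema, name, operand);
    }
}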
For full details, refer to the source code in the GitHub repository.
This hack solution is inspired by another library.

Connecting nodes of different GraphDefs

From Python, I have a frozen graph.pb that I'm currently using in a C++ environment. The data for the input tensor is currently preprocessed on the CPU, but I would like to do this step in another GraphDef to run it on the GPU. However, I can't seem to find a way to connect nodes between two GraphDefs.
Let's assume my frozen graph has an input/placeholder named mid that I'd like to connect with the preprocessing steps below:
tf::GraphDef create_graph_extension() {
    tf::Scope root = tf::Scope::NewRootScope();
    auto a = tf::ops::Const(root.WithOpName("in"), {(float) 23.0, (float) 31.0});
    auto b = tf::ops::Identity(root.WithOpName("mid"), a);
    tf::GraphDef graph;
    TF_CHECK_OK(root.ToGraphDef(&graph));
    return graph;
}
I usually use session->Extend() to run multiple graphs in the same session, always making sure their node names are unique. With non-unique node names, which I hoped to connect, I get an error:
Failed to install graph:
Invalid argument: GraphDef argument to Extend includes node 'mid', which
was created by a previous call to Create or Extend in this session.
P.S. It seems like it is possible in Python at least (link).
You can achieve what you're looking for using the same idea that was suggested for Python - import one GraphDef into another and remap inputs.
In case you do use the C API (which has stability guarantees), you'd want to look at:
TF_GraphImportGraphDef (which is parallel to the tf.import_graph_def call in Python), and
TF_ImportGraphDefOptionsAddInputMapping which serves the same purpose as the input_map argument in Python.
These are implemented on top of the C++ ImportGraphDef function, which you might be able to use directly instead (though that doesn't yet seem to be part of the exported C++ API).
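A rough, uncompiled sketch of that with the C API, assuming graph is a TF_Graph* that already contains the preprocessing ops and frozen_graph_def is a TF_Buffer* holding the frozen graph.pb (the names are illustrative):
#include "tensorflow/c/c_api.h"

TF_Status* status = TF_NewStatus();
TF_ImportGraphDefOptions* opts = TF_NewImportGraphDefOptions();
TF_ImportGraphDefOptionsSetPrefix(opts, "frozen");  // avoid name clashes with existing nodes
// feed the frozen graph's "mid" placeholder from the preprocessing output "mid:0"
TF_Output preprocessed = {TF_GraphOperationByName(graph, "mid"), 0};
TF_ImportGraphDefOptionsAddInputMapping(opts, "mid", 0, preprocessed);
TF_GraphImportGraphDef(graph, frozen_graph_def, opts, status);
if (TF_GetCode(status) != TF_OK) {
    // handle the error reported by TF_Message(status)
}
TF_DeleteImportGraphDefOptions(opts);
TF_DeleteStatus(status);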
Hope that helps.

Sitecore: Glass Mapper Code First

Is it possible to automatically generate Sitecore templates just by coding models? I'm using Sitecore 8.0 and I saw the Glass Mapper Code First approach, but I can't find more information about that.
Not sure why there isn't much info about it, but you can definitely model/code first! I do it a lot using the attribute configuration approach, like so:
[SitecoreType(true, "{generated guid}")]
public class ExampleModel
{
    [SitecoreField("{generated guid}", SitecoreFieldType.SingleLineText)]
    public virtual string Title { get; set; }
}
Now, how this works: the 'true' value for the first parameter of SitecoreType indicates that it may be used for code first. There is a GlassCodeFirstDataprovider which has an Initialize method, executed in Sitecore's Initialize pipeline. This method collects all configurations marked for code first and creates them in the SQL data provider. The sections and fields are stored in memory. It also takes inheritance into account (base templates).
I think you first need to uncomment some code in the GlassMapperScCustom class you get when you install the project via NuGet. The PostLoad method contains the few lines that execute the Initialize method of each CodeFirstDataprovider.
var dbs = global::Sitecore.Configuration.Factory.GetDatabases();
foreach (var db in dbs)
{
    var provider = db.GetDataProviders().FirstOrDefault(x => x is GlassDataProvider) as GlassDataProvider;
    if (provider != null)
    {
        using (new SecurityDisabler())
        {
            provider.Initialise(db);
        }
    }
}
Furthermore, I would advise using code first in development only. You can create packages or serialize the templates as usual and deploy them to other environments, so you don't need the data provider (and its potential risks) there.
You can. But it's not going to be Glass related.
Code first is exactly what Sitecore.PathFinder is looking to achieve. There's not a lot of info publicly available on this yet however.
Get started here: https://github.com/JakobChristensen/Sitecore.Pathfinder

CRecordset::snapshot doesn't work in VS2012 anymore - what's the alternative?

Apparently, in VS2012, SQL_CUR_USE_ODBC is deprecated. [Update: it appears that the cursors library has been removed from VS2012 entirely].
MFC's CDatabase doesn't use it anymore (whereas it was the default for VS2010 and earlier versions of MFC), but instead uses SQL_CUR_USE_DRIVER.
Unfortunately, SQL_CUR_USE_DRIVER doesn't work properly with the Jet ODBC driver (we're interacting with an Access database). The driver initially claims to support positional operations (but not positional updates), but when an attempt is made to actually query the database, all concurrency models fail until the MFC library drops down to read-only interaction with the database (which is not going to fly for us).
Questions
Is this MS's latest attempt to force devs to move away from Jet based datasources and migrate to SQL Express (or the like)?
Is there another modality that we should be using to interact with our Access databases through VS 2012 versions of MFC/ODBC?(1)
See also:
http://social.msdn.microsoft.com/Forums/kk/vcmfcatl/thread/acd84294-c2b5-4016-b4d9-8953f337f30c
Update: Looking at the various options, it seems that the cursor library has been removed from VS2012's ODBC library. Combined with the fact that Jet doesn't correctly support positional updates(2), it makes "snapshot" mode unusable. It does appear to support "dynaset" as long as the underlying tables have a primary key. Unkeyed tables are incompatible with "dynaset" mode(3). So - I can stick with VS 2010, or I can change my tables to include an autonumber or something similar in order to ensure a pkey is available so I can use dynaset mode for the recordsets.
(1) e.g. should we be using a different open type for CRecordset? We currently use CRecordset::snapshot. But I've never really understood the various modes: snapshot, dynamic, dynaset. A quick round of "try each" has failed to get a working updatable interface to our Access database...
(2) it claims to when queried initially, but then returns errors for all concurrency modes that it previously claimed to support
(3) "dynamic" is also out, since Jet doesn't support it at all (from what I can tell from my tests).
I found a solution that appears to work. I overrode OpenEx exactly the way VS 2012 implements it, because we need it to call the derived version of AllocConnect, which is not virtual in the parent. I also overrode AllocConnect as mentioned. In the derived version of CDatabase, try the following code:
MyCDatabase.h
BOOL OpenEx(LPCTSTR lpszConnectString, DWORD dwOptions = 0);
void AllocConnect(DWORD dwOptions);
MyCDatabase.cpp
BOOL MyCDatabase::OpenEx(LPCTSTR lpszConnectString, DWORD dwOptions)
{
    ENSURE_VALID(this);
    ENSURE_ARG(lpszConnectString == NULL || AfxIsValidString(lpszConnectString));
    ENSURE_ARG(!(dwOptions & noOdbcDialog && dwOptions & forceOdbcDialog));

    // Exclusive access not supported.
    ASSERT(!(dwOptions & openExclusive));

    m_bUpdatable = !(dwOptions & openReadOnly);

    TRY
    {
        m_strConnect = lpszConnectString;

        DATA_BLOB connectBlob;
        connectBlob.pbData = (BYTE *)(LPCTSTR)m_strConnect;
        connectBlob.cbData = (DWORD)(AtlStrLen(m_strConnect) + 1) * sizeof(TCHAR);
        if (CryptProtectData(&connectBlob, NULL, NULL, NULL, NULL, 0, &m_blobConnect))
        {
            SecureZeroMemory((BYTE *)(LPCTSTR)m_strConnect, m_strConnect.GetLength() * sizeof(TCHAR));
            m_strConnect.Empty();
        }

        // Allocate the HDBC and make connection
        AllocConnect(dwOptions);
        if (!CDatabase::Connect(dwOptions))
            return FALSE;

        // Verify support for required functionality and cache info
        VerifyConnect();
        GetConnectInfo();
    }
    CATCH_ALL(e)
    {
        Free();
        THROW_LAST();
    }
    END_CATCH_ALL

    return TRUE;
}

void MyCDatabase::AllocConnect(DWORD dwOptions)
{
    CDatabase::AllocConnect(dwOptions);

    dwOptions = dwOptions | CDatabase::useCursorLib;
    // Turn on cursor lib support
    if (dwOptions & useCursorLib)
    {
        ::SQLSetConnectAttr(m_hdbc, SQL_ATTR_ODBC_CURSORS, (SQLPOINTER)SQL_CUR_USE_ODBC, 0);
        // With cursor library added records immediately in result set
        m_bIncRecordCountOnAdd = TRUE;
    }
}
Please note that you do not want to pass useCursorLib to OpenEx at first; you need to set it in the hacked version of AllocConnect.
Also note that this is just a hack, but it appears to work. Please test all your code to be sure it behaves as expected; so far it works OK for me.
If anyone else runs into this issue, here's what seems to be the answer:
For ODBC to an Access database, connect using CDatabase mydb; mydb.OpenEx(.., 0), so that you ask the system not to load the cursor library.
Then for the recordsets, use dynaset CMyRecordset myrs; myrs.Open(CRecordset::dynaset, ...).
Finally, you must make sure that your tables have a primary key in order to use dynasets (keysets).
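A minimal sketch of that combination (the DSN and SQL text are illustrative only):
CDatabase db;
db.OpenEx(_T("DSN=MyAccessDb;"), 0);  // 0: don't load the cursor library, not read-only

CMyRecordset rs(&db);
rs.Open(CRecordset::dynaset, _T("SELECT * FROM MyTable"));  // dynaset needs a primary key on the table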
Derive CDatabase and override OpenEx. In your derived class CMyDatabase, replace the call to AllocConnect with a call to MyAllocConnect. Obviously, your MyAllocConnect function should call SQLSetConnectOption with the desired parameter:
// Turn on cursor lib support
if (dwOptions & useCursorLib)
{
    AFX_SQL_SYNC(::SQLSetConnectOption(m_hdbc, SQL_ODBC_CURSORS, SQL_CUR_USE_ODBC));
    // With cursor library added records immediately in result set
    m_bIncRecordCountOnAdd = TRUE;
}
Then use your CMyDatabase class instead of CDatabase for all your queries.