Custom cc_toolchain used in a Bazel rule

I've been trying to write a Bazel rule that wraps compilation of RISC-V source files (plus some other steps), but I've been having trouble getting a CcToolchainInfo provider.
In order to provide config info, I have a working rule that looks like this:
rv_cc_toolchain_config = rule(
implementation = _impl,
attrs = {},
provides = [CcToolchainConfigInfo],
)
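(The _impl body isn't shown in the question; presumably it returns a CcToolchainConfigInfo via cc_common.create_cc_toolchain_config_info, roughly like the sketch below, where every field value is a placeholder rather than the author's actual config.)
def _impl(ctx):
    # Minimal sketch; all values below are placeholders.
    return cc_common.create_cc_toolchain_config_info(
        ctx = ctx,
        toolchain_identifier = "rv-toolchain",
        host_system_name = "local",
        target_system_name = "riscv32-unknown-elf",
        target_cpu = "riscv32",
        target_libc = "unknown",
        compiler = "riscv-gcc",
        abi_version = "unknown",
        abi_libc_version = "unknown",
    )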
I have the following in toolchains/BUILD:
load(":cc_toolchain_config.bzl", "rv_cc_toolchain_config")
package(default_visibility = ['//visibility:public'])
rv_cc_toolchain_config(name="rv_toolchain_cfg")
cc_toolchain(
name='rv_toolchain',
toolchain_identifier='rv-toolchain',
toolchain_config=':rv_toolchain_cfg',
all_files=':nofile',
strip_files=':nofile',
objcopy_files=':nofile',
dwp_files=':nofile',
compiler_files=':nofile',
linker_files=':nofile',
)
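(Not shown in the question: the :nofile labels must resolve to real targets. An empty filegroup, like the :empty one used in the fix further down, is the usual way to satisfy these attributes:)
filegroup(name = "nofile")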
This seems to all work fine; I then have my custom rule to compile with riscv:
def _compile_impl(ctx):
deps = []
cc_toolchain = find_cpp_toolchain(ctx)
print(ctx.attr._cc_toolchain)
compilation_contexts = [dep[CcInfo].compilation_context for dep in deps]
print(type(cc_toolchain))
feature_configuration = cc_common.configure_features( #fails here
ctx = ctx,
cc_toolchain = cc_toolchain,
requested_features = ctx.features, #currently does nothing
unsupported_features = ctx.disabled_features,
)
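# (Editor's sketch, not part of the original question: once configure_features
# succeeds, the implementation would typically continue with cc_common.compile
# and return a CcInfo built from the resulting compilation context, e.g.
#   compilation_context, compilation_outputs = cc_common.compile(
#       name = ctx.label.name,
#       actions = ctx.actions,
#       feature_configuration = feature_configuration,
#       cc_toolchain = cc_toolchain,
#       srcs = ctx.files.srcs,
#       public_hdrs = ctx.files.hdrs,
#       compilation_contexts = compilation_contexts,
#   ))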
rv_compile = rule(
_compile_impl,
output_to_genfiles = True,
attrs = {
"srcs": attr.label_list(
doc = "List of source files",
mandatory = False,
allow_files = [".cc", ".cpp", ".h", ".c"],
),
"hdrs": attr.label_list(
doc = "List of header files",
allow_files = [".h"],
),
"_cc_toolchain": attr.label(
#default = Label("@bazel_tools//tools/cpp:current_cc_toolchain"),
default = Label("//toolchains:rv_toolchain")
),
},
provides = [
DefaultInfo,
CcInfo,
],
toolchains = [
"#bazel_tools//tools/cpp:toolchain_type",
],
fragments = ["cpp"]
)
This is where I fail when trying to configure the toolchain: cc_toolchain is of type ToolchainInfo, not the required CcToolchainInfo. Does anyone have any insight on how to get CcToolchainInfo within a rule? Or is there a better way of doing this? The documentation seems to go dark on this.

Oops -- I figured this out after trawling through GitHub. It turns out that directly referencing cc_toolchain is incorrect; CcToolchainInfo is provided via cc_toolchain_suite.
Updating toolchains/BUILD to look something like this:
load(":cc_toolchain_config.bzl", "rv_cc_toolchain_config")
package(default_visibility = ['//visibility:public'])
rv_cc_toolchain_config(name="rv_toolchain_cfg")
filegroup(name = 'empty')
cc_toolchain(
name='rv_toolchain',
toolchain_identifier='sanity-toolchain',
toolchain_config=':rv_toolchain_cfg',
all_files=':empty',
strip_files=':empty',
objcopy_files=':empty',
dwp_files=':empty',
compiler_files=':empty',
linker_files=':empty',
)
cc_toolchain_suite(
name='rv',
toolchains={
'darwin': ':rv_toolchain', #use whatever OS you need here...
}
)
and the rv compile rule to something like
rv_compile = rule(
_compile_impl,
output_to_genfiles = True,
attrs = {
"srcs": attr.label_list(
doc = "List of source files",
mandatory = False,
allow_files = [".cc", ".cpp", ".h", ".c"],
),
"hdrs": attr.label_list(
doc = "List of header files",
allow_files = [".h"],
),
"_cc_toolchain": attr.label(
#default = Label("@bazel_tools//tools/cpp:current_cc_toolchain"),
default = Label("//toolchains:rv")
),
},
provides = [
DefaultInfo,
CcInfo,
],
toolchains = [
"#bazel_tools//tools/cpp:toolchain_type",
],
fragments = ["cpp"]
)
Works like a charm :) Anyone reading this should also enable the experimental Starlark C++ APIs. If anyone knows how to make cc_toolchain_suite cpu-agnostic, I'd love to hear it. Cheers.
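On the cpu-agnostic question, one workaround (an untested sketch, not a confirmed answer) is to map every cpu value you expect to build on to the same underlying toolchain:
cc_toolchain_suite(
    name = 'rv',
    toolchains = {
        'darwin': ':rv_toolchain',
        'k8': ':rv_toolchain',  # linux x86_64
        'x64_windows': ':rv_toolchain',
    },
)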


How to use Dagster with Great Expectations?

The issue
I'm trying out Great Expectations with Dagster, as per this guide.
My pipeline seems to execute correctly until it reaches this block:
expectation = dagster_ge.ge_validation_op_factory(
name='ge_validation_op',
datasource_name='dev.data-pipeline-data-storage.data_pipelines.raw_data.sirene_update',
suite_name='suite.data_pipelines.raw_data.sirene_update',
)
if expectation["success"]:
print("Success")
Trying to call expectation["success"] results in:
TypeError: 'SolidDefinition' object is not subscriptable
When I go inside the code of ge_validation_op_factory, there is a _ge_validation_fn that should yield ExpectationResult, but somehow it gets converted into a SolidDefinition...
Dagster version = 0.15.9;
Great Expectations version = 0.15.44
Code to reproduce the error
In my code I am interacting with an S3 bucket, so it would be a bit tedious to re-create a minimal example, but here it is anyway:
In gx_postprocessing.py:
import json
import boto3
import dagster_ge
from dagster import (
op,
graph,
Field,
String,
OpExecutionContext,
)
from typing import List, Dict
@op(
config_schema={
"bucket": Field(
String,
description="s3 bucket name",
),
"path_in_s3": Field(
String,
description="Prefix representing the path to data",
),
"technical_date": Field(
String,
description="date string to fetch data",
),
"file_name": Field(
String,
description="file name that contains the data",
),
}
)
def read_in_json_datafile_from_s3(context: OpExecutionContext):
bucket = context.op_config["bucket"]
path_in_s3 = context.op_config["path_in_s3"]
technical_date = context.op_config["technical_date"]
file_name = context.op_config["file_name"]
object = f"{path_in_s3}/" f"technical_date={technical_date}/" f"{file_name}"
s3 = boto3.resource("s3")
content_object = s3.Object(bucket, object)
file_content = content_object.get()["Body"].read().decode("utf-8")
json_content = json.loads(file_content)
return json_content
@op
def process_example_dq(data: List[Dict]):
return len(data)
@op
def postprocess_example_dq(numrows, expectation):
if expectation["success"]:
return numrows
else:
raise ValueError
@op
def validate_example_dq(context: OpExecutionContext):
expectation = dagster_ge.ge_validation_op_factory(
name='ge_validation_op',
datasource_name='my_bucket.data_pipelines.raw_data.example_update',
suite_name='suite.data_pipelines.raw_data.example_update',
)
return expectation
@graph(
config={
"read_in_json_datafile_from_s3": {
"config": {
"bucket": "my_bucket",
"path_in_s3": "my_path",
"technical_date": "2023-01-24",
"file_name": "myfile_20230124.json",
}
},
},
)
def example_update_evaluation():
output_dict = read_in_json_datafile_from_s3()
nb_items = process_example_dq(data=output_dict)
expectation = validate_example_dq()
postprocess_example_dq(
numrows=nb_items,
expectation=expectation,
)
Do not forget to add great_expectations_poc_pipeline to your __init__.py where the pipelines=[..] are listed.
In this example, dagster_ge.ge_validation_op_factory(...) is returning an OpDefinition, which is the same type of thing as (for example) process_example_dq, and should be composed in the graph definition the same way, rather than invoked within another op.
So instead, you'd want to have something like:
validate_example_dq = dagster_ge.ge_validation_op_factory(
name='ge_validation_op',
datasource_name='my_bucket.data_pipelines.raw_data.example_update',
suite_name='suite.data_pipelines.raw_data.example_update',
)
Then use that op inside your graph definition the same way you currently are (i.e. expectation = validate_example_dq())
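Putting it together, a sketch of the corrected module (note: dagster-ge's factory-built op declares a dataset input, so the data to validate is passed to it inside the graph; if your version of the op takes no inputs, call it with no arguments as described above):
validate_example_dq = dagster_ge.ge_validation_op_factory(
    name="ge_validation_op",
    datasource_name="my_bucket.data_pipelines.raw_data.example_update",
    suite_name="suite.data_pipelines.raw_data.example_update",
)

@graph  # config omitted for brevity; same as in the question
def example_update_evaluation():
    output_dict = read_in_json_datafile_from_s3()
    nb_items = process_example_dq(data=output_dict)
    # The factory-built op is composed like any other op, not invoked inside one.
    expectation = validate_example_dq(output_dict)
    postprocess_example_dq(numrows=nb_items, expectation=expectation)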

shinyWidgets::pickerInput not working as expected when used in conjunction with semantic.dashboard

I wanted to inquire about possible solutions for the following issues encountered when using shinyWidgets::pickerInput in combination with semantic.dashboard:
the visual output in the UI, created using pickerInput, is not a dropdown menu
when clicking on the visual output, the entire list of options (passed as input to pickerInput) shows up in the UI and cannot be closed
Here is the code used to create this dashboard
if(interactive()){
ui <- semantic.dashboard::dashboardPage(
header = semantic.dashboard::dashboardHeader(
color = "blue",
title = "Dashboard Test",
inverted = TRUE
),
sidebar = semantic.dashboard::dashboardSidebar(
size = "thin",
color = "teal",
semantic.dashboard::sidebarMenu(
semantic.dashboard::menuItem(
tabName = "tabID_main",
"Main Tab"),
semantic.dashboard::menuItem(
tabName = "tabID_extra",
"Extra Tab")
)
),
body = semantic.dashboard::dashboardBody(
semantic.dashboard::tabItems(
selected = 1,
semantic.dashboard::tabItem(
tabName = "tabID_main",
semantic.dashboard::box(
shiny::h1("Main body"),
title = "A b c",
width = 16,
color = "orange",
shinyWidgets::pickerInput(
inputId = "ID_One",
choices = c("Value One","Value Two","Value Three"),
label = shiny::h5("Value to select"),
selected = "Value Two",
width = "fit",
inline = TRUE),
shiny::verbatimTextOutput(outputId = "res_One")
)
),
semantic.dashboard::tabItem(
tabName = "tabID_extra",
shiny::h1("extra")
)
)
)
)
server <- function(input, output, session) {
output$res_One <- shiny::renderPrint(input$ID_One)
}
shiny::shinyApp(ui, server)
}
I am using:
R version 3.6.3 64-bit on a Windows computer
R packages as of checkpoint date 2021-05-15
shinyWidgets version 0.6.0
semantic.dashboard version 0.2.0
The solution was given on the semantic.dashboard GitHub page by Osenan.
I would like to acknowledge it and include it here for convenience.
Essentially, the reason for the problem is a clash between Bootstrap and Semantic (Fomantic) UI.
semantic.dashboard uses shiny.semantic components, which in turn suppress the Bootstrap CSS and JS libraries. Since shinyWidgets::pickerInput needs Bootstrap to work, a solution is to manually load the Bootstrap JS and CSS libraries.
This code can be added right at the top of semantic.dashboard::dashboardBody as follows:
body = semantic.dashboard::dashboardBody(
tags$head(
tags$link(rel = "stylesheet", type = "text/css", id = "bootstrapCSS",
href = "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css"),
tags$script(src = "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js")
),
# ... rest of the dashboardBody content as in the original UI code
)

Power BI Iterative API Loop

I am attempting (and can successfully do so) to connect to an API and loop through several iterations of the API call in order to grab the next_page value, put it in a list, and then call the next page from that list.
Unfortunately, when this is published to the PBI service I am unable to refresh there and indeed 'Data Source Settings' tells me I have a 'hand-authored query'.
I have attempted to follow Chris Webb's blog post around the usage of query parameters and relative paths, but if I use this I just get a constant loop of the first page that's hit.
The Start Epoch Time is a helper to ensure I only grab data less than 3 months old.
let
iterations = 10000, // maximum number of iterations
url = "https://www.zopim.com/api/v2/" & "incremental/" & "chats?fields=chats(*)" & "&start_time=" & Number.ToText( StartEpochTime ),
FnGetOnePage =
(url) as record =>
let
Source1 = Json.Document(Web.Contents(url, [Headers=[Authorization="Bearer MY AUTHORIZATION KEY"]])),
data = try Source1[chats] otherwise null, // get the data of the first page
next = try Source1[next_page] otherwise null, // check whether there is another page
res = [Data=data, Next=next]
in
res,
GeneratedList =
List.Generate(
()=>[i=0, res = FnGetOnePage(url)],
each [i]<iterations and [res][Data]<>null,
each [i=[i]+1, res = FnGetOnePage([res][Next])],
each [res][Data])
in
GeneratedList
Lookups
If Source1 exists but [chats] may not, you can simplify
= try Source1[chats] otherwise null
to
= Source1[chats]?
Plus, you don't lose non-lookup errors: try ... otherwise null swallows every error, while the ? operator only handles the missing-field case. (See the M spec section on operators for details.)
Chris Webb Method
It should be something closer to this:
let
    Headers = [
        Accept = "application/json"
    ],
    BaseUrl = "https://www.zopim.com", // very important
    Options = [
        RelativePath = "api/v2/incremental/chats",
        Headers = Headers,
        Query = [
            fields = "chats(*)",
            start_time = Number.ToText( StartEpochTime )
        ]
    ],
    Response = Web.Contents(BaseUrl, Options),
    Result = Json.Document(Response) // skip if it's not JSON
in
    Result
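To keep the original pagination loop refreshable, every request must go through the same static BaseUrl, which means splitting each next_page URL into a relative path before feeding it back in. A sketch of a replacement FnGetOnePage binding for the original query, under that assumption (field names taken from the question; adjust the host prefix to whatever next_page actually returns):
FnGetOnePage = (relPath as text) as record =>
let
    Source = Json.Document(
        Web.Contents(
            "https://www.zopim.com",
            [ RelativePath = relPath, Headers = [ Accept = "application/json" ] ]
        )
    ),
    data = try Source[chats] otherwise null,
    next = try Source[next_page] otherwise null,
    // Strip the scheme and host so the next call also goes through RelativePath.
    nextRel = if next <> null
        then Text.AfterDelimiter( next, "https://www.zopim.com/" )
        else null
in
    [ Data = data, Next = nextRel ]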
Here's an example of a reusable Web.Contents helper function:
let
/*
from: <https://github.com/ninmonkey/Ninmonkey.PowerQueryLib/blob/master/source/WebRequest_Simple.pq>
Wrapper for Web.Contents returns response metadata
for options, see: <https://learn.microsoft.com/en-us/powerquery-m/web-contents#__toc360793395>
Details on preventing "Refresh Errors", using 'Query' and 'RelativePath':
- Not using Query and RelativePath causes refresh errors:
<https://blog.crossjoin.co.uk/2016/08/23/web-contents-m-functions-and-dataset-refresh-errors-in-power-bi/>
- You can opt-in to Skip-Test:
<https://blog.crossjoin.co.uk/2019/04/25/skip-test-connection-power-bi-refresh-failures/>
- Debugging and tracing the HTTP requests
<https://blog.crossjoin.co.uk/2019/11/17/troubleshooting-web-service-refresh-problems-in-power-bi-with-the-power-query-diagnostics-feature/>
update:
- MaybeErrResponse: Quick example of parsing an error result.
- Raw text is returned, this is useful when there's an error
- now response[json] does not throw, when the data isn't json to begin with (false errors)
*/
WebRequest_Simple
= (
base_url as text,
optional relative_path as nullable text,
optional options as nullable record
)
as record =>
let
headers = options[Headers]?, //or: ?? [ Accept = "application/json" ],
merged_options = [
Query = options[Query]?,
RelativePath = relative_path,
ManualStatusHandling = options[ManualStatusHandling]? ?? { 400, 404, 406 },
Headers = headers
],
bytes = Web.Contents(base_url, merged_options),
response = Binary.Buffer(bytes),
response_metadata = Value.Metadata( bytes ),
status_code = response_metadata[Response.Status]?,
response_text = Text.Combine( Lines.FromBinary(response,null,null, TextEncoding.Utf8), "" ),
json = Json.Document(response),
IsJsonX = not (try json)[HasError],
Final = [
request_url = response_metadata[Content.Uri](),
response_text = response_text,
status_code = status_code,
metadata = response_metadata,
IsJson = IsJsonX,
response = response,
json = if IsJsonX then json else null
]
in
Final,
tests = {
WebRequest_Simple("https://httpbin.org", "json"), // expect: json
WebRequest_Simple("https://www.google.com"), // expect: html
WebRequest_Simple("https://httpbin.org", "/headers"),
WebRequest_Simple("https://httpbin.org", "/status/codes/406"), // exect 404
WebRequest_Simple("https://httpbin.org", "/status/406"), // exect 406
WebRequest_Simple("https://httpbin.org", "/get", [ Text = "Hello World"])
},
FinalResults = Table.FromRecords(tests,
type table[
status_code = Int64.Type, request_url = text,
metadata = record,
response_text = text,
IsJson = logical, json = any,
response = binary
],
MissingField.Error
)
in
FinalResults

Generated header not found

I'm trying to use Bazel to build a C++ project that uses FlatBuffers.
But my map_schema_generated.h, generated with flatc, is not found.
My tree:
|
|_ data
| |_ maps
| |_ BUILD
| |_ map_schema.fbs
|
|_ src
| |_ map
| |_ BUILD
| |_ map.hpp
| |_ map.cpp
|
|_ tools
| |_ BUILD
| |_ generate_fbs.bzl
|
|_ WORKSPACE
tools/generate_fbs.bzl:
def _impl(ctx):
output = ctx.outputs.out
input = ctx.files.srcs
print("generating", output.basename)
ctx.action(
use_default_shell_env = True,
outputs = [output],
inputs = input,
progress_message="Generating %s with %s" % (output.path, input[0].path),
command="flatc -o %s --cpp %s" % (output.dirname, input[0].path)
)
generate_fbs = rule(
implementation=_impl,
output_to_genfiles = True,
attrs={
"srcs": attr.label_list(allow_files=True, allow_empty=False),
"out": attr.output()
},
)
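(Side note: ctx.action has since been deprecated; in current Bazel the same action would be declared with ctx.actions.run_shell, something like this sketch:)
ctx.actions.run_shell(
    outputs = [output],
    inputs = input,
    command = "flatc -o %s --cpp %s" % (output.dirname, input[0].path),
    progress_message = "Generating %s with %s" % (output.path, input[0].path),
    use_default_shell_env = True,
)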
data/maps/BUILD:
load("//tools:generate_fbs.bzl", "generate_fbs")
generate_fbs(
name = "schema",
srcs = ["map_schema.fbs"],
out = "map_schema_generated.h",
visibility = ["//visibility:public"]
)
src/map/BUILD:
cc_library(
name = "map",
srcs = [
"//data/maps:map_schema_generated.h",
"map.hpp",
"map.cpp"
]
)
src/map/map.cpp has #include "map_schema_generated.h".
The command line I use to build is: bazel build //src/map.
If I run find in bazel-*, I get:
bazel-genfiles/data/maps/map_schema_generated.h
bazel-out/k8-fastbuild/genfiles/data/maps/map_schema_generated.h
bazel-my-workspace-name/bazel-out/k8-fastbuild/genfiles/data/maps/map_schema_generated.h
And if I cat these files, I can see that they were generated correctly.
All the information that I found is about TensorFlow, and it is not really helpful.
The problem is that your cc_library doesn't actually recognize your generated header as requiring any special action (like adding an -I flag for the location it's in). It gets generated and lives in the build tree, but not anywhere the compiler (preprocessor) would look for it while working on map.cpp. (Run the build with -s for a bit more insight into what happened and how.)
Now about how to address this: there might be a better way, but the following appears to work. I guess this functionality could also be rolled into the generate_fbs rule.
In data/maps/BUILD, I've added a "header-only" library as follows:
cc_library(
name = "map_schema_hdr",
hdrs = [":map_schema_generated.h"],
include_prefix = ".", # to manipulate -I of dependencies
visibility = ["//visibility:public"]
)
In src/map/BUILD, I would then use this header-only library as a dependency of map:
cc_library(
name = "map",
srcs = [
"map.cpp"
"map.hpp"
],
deps = [
"//data/maps:map_schema_hdr",
]
)
To play a bit more with the idea of having a single rule (macro) for convenience, I've made the following changes:
tools/generate_fbs.bzl now reads:
def _impl(ctx):
output = ctx.outputs.out
input = ctx.files.srcs
print("generating", output.basename)
ctx.action(
use_default_shell_env = True,
outputs = [output],
inputs = input,
progress_message="Generating %s with %s" % (output.path, input[0].path),
command="/bin/cp %s %s" % (input[0].path, output.path)
)
_generate_fbs = rule(
implementation=_impl,
output_to_genfiles = True,
attrs={
"srcs": attr.label_list(allow_files=True, allow_empty=False),
"out": attr.output()
},
)
def generate_fbs(name, srcs, out):
_generate_fbs(
name = "_%s" % name,
srcs = srcs,
out = out
)
native.cc_library(
name = name,
hdrs = [out],
include_prefix = ".",
visibility = ["//visibility:public"],
)
With that, I could have data/maps/BUILD:
load("//tools:generate_fbs.bzl", "generate_fbs")
generate_fbs(
name = "schema",
srcs = ["map_schema.fbs"],
out = "map_schema_generated.h",
)
And src/map/BUILD contains:
cc_library(
name = "map",
srcs = [
"map.cpp",
"map.hpp",
],
deps = [
"//data/maps:schema",
]
)
And bazel build //src/map builds bazel-bin/src/map/libmap.a and bazel-bin/src/map/libmap.so.
Instead of #include "map_schema_generated.h" in src/map/map.cpp, I could have written #include "data/maps/map_schema_generated.h".
I think that is the cleanest way to make it work.

Bazel: how to glob headers into one include path

In Buck, one might write:
exported_headers = subdir_glob([
("lib/source", "video/**/*.h"),
("lib/source", "audio/**/*.h"),
],
excludes = [
"lib/source/video/codecs/*.h",
],
prefix = "MediaLib/")
This line would make those headers available under MediaLib/. What would be the equivalent in Bazel?
I ended up writing a rule to do this. It provides something similar to the output of a filegroup, and could be combined with cc_library in a macro (see the sketch after the rule below).
def _impl_flat_hdr_dir(ctx):
path = ctx.attr.include_path
d = ctx.actions.declare_directory(path)
dests = [ctx.actions.declare_file(path + "/" + h.basename)
for h in ctx.files.hdrs]
cmd = """
mkdir -p {path};
cp {hdrs} {path}/.
""".format(path=d.path, hdrs=" ".join([h.path for h in ctx.files.hdrs]))
ctx.actions.run_shell(
command = cmd,
inputs = ctx.files.hdrs,
outputs = dests + [d],
progress_message = "doing stuff!!!"
)
return struct(
files = depset(dests)
)
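# (Note: on current Bazel, the legacy struct return above would instead be
#  return [DefaultInfo(files = depset(dests))].)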
flat_hdr_dir = rule(
_impl_flat_hdr_dir,
attrs = {
"hdrs": attr.label_list(allow_files = True),
"include_path": attr.string(mandatory = True),
},
output_to_genfiles = True,
)
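For example, a hypothetical macro combining the rule with cc_library (all names here are illustrative, not part of the tested rule above):
def flat_cc_library(name, hdrs, include_path, **kwargs):
    flat_hdr_dir(
        name = name + "_flat_hdrs",
        hdrs = hdrs,
        include_path = include_path,
    )
    native.cc_library(
        name = name,
        hdrs = [":" + name + "_flat_hdrs"],
        **kwargs
    )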
I did not test it, but going by the documentation it should be similar to:
cc_library(
    name = "foo",
    hdrs = glob(
        [
            "video/**/*.h",
            "audio/**/*.h",
        ],
        exclude = ["video/codecs/*.h"],  # glob patterns are package-relative
    ),
    include_prefix = "MediaLib",
)
https://docs.bazel.build/versions/master/be/c-cpp.html#cc_library.include_prefix
https://docs.bazel.build/versions/master/be/functions.html#glob