I am about to embark on a project using Apache Hadoop/Hive which will involve a collection of Hive query scripts to produce data feeds for various downstream applications. These scripts seem like ideal candidates for some unit testing - they represent the fulfillment of an API contract between my data store and client applications, and as such, it's trivial to write what the expected results should be for a given set of starting data. My issue is how to run these tests.
If I were working with SQL queries, I could use something like SQLite or Derby to quickly bring up test databases, load test data and run a collection of query tests against them. Unfortunately, I am unaware of any such tools for Hive. At the moment, my best thought is to have the test framework bring up a local Hadoop instance and run Hive against that, but I've never done that before and I'm not sure it will work or be the right path.
Also, I'm not interested in a pedantic discussion about whether what I am doing is unit testing or integration testing - I just need to be able to prove my code works.
Hive has a special standalone mode, designed specifically for testing purposes. In this mode it can run without Hadoop. I think it is exactly what you need.
Here is a link to the documentation:
http://wiki.apache.org/hadoop/Hive/HiveServer
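As a rough sketch of the kind of test this enables (not taken from the linked page), assuming a reasonably recent Hive where HiveServer2 supports embedded mode via a host-less jdbc:hive2:// URL and the Hive JDBC driver is on the test classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class EmbeddedHiveSmokeTest {
    public static void main(String[] args) throws Exception {
        // An empty jdbc:hive2:// URL starts Hive in embedded mode inside this JVM,
        // so no separate Hadoop cluster or HiveServer process is needed.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://", "", "");
             Statement stmt = conn.createStatement()) {

            // Table and data are purely illustrative
            stmt.execute("CREATE TABLE IF NOT EXISTS feed_test (id INT, name STRING)");
            stmt.execute("INSERT INTO feed_test VALUES (1, 'alpha')");

            try (ResultSet rs = stmt.executeQuery("SELECT name FROM feed_test WHERE id = 1")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1)); // expected: alpha
                }
            }
        }
    }
}
```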
I'm working as part of a team to support a big data and analytics platform, and we also have this kind of issue.
We've been searching for a while and we found two pretty promising tools:
https://github.com/klarna/HiveRunner
https://github.com/bobfreitas/HadoopMiniCluster
HiveRunner is a framework built on top of JUnit to test Hive queries. It starts a standalone HiveServer with an in-memory HSQL database as the metastore. With it you can stub tables, views, mock samples, etc.
It has some limitations on supported Hive versions, but I definitely recommend it.
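A minimal test with HiveRunner looks roughly like the sketch below; the database, table and query are purely illustrative, and the exact annotations and builder methods may differ between HiveRunner versions:

```java
import com.klarna.hiverunner.HiveShell;
import com.klarna.hiverunner.StandaloneHiveRunner;
import com.klarna.hiverunner.annotations.HiveSQL;
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;

import java.util.List;

@RunWith(StandaloneHiveRunner.class)
public class FeedQueryTest {

    // No setup scripts here; they could also be loaded from files on the classpath
    @HiveSQL(files = {})
    private HiveShell shell;

    @Test
    public void producesExpectedFeed() {
        // Set up a tiny source table with known data
        shell.execute("CREATE DATABASE source_db");
        shell.execute("CREATE TABLE source_db.orders (id INT, amount DOUBLE)");
        shell.insertInto("source_db", "orders")
             .withColumns("id", "amount")
             .addRow(1, 10.0)
             .addRow(2, 20.0)
             .commit();

        // Run the query under test and compare against the expected result
        List<String> result = shell.executeQuery("SELECT SUM(amount) FROM source_db.orders");
        Assert.assertEquals("30.0", result.get(0));
    }
}
```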
Hope it helps you =)
You may also want to consider the following blog post which describes automating unit testing using a custom utility class and ant: http://dev.bizo.com/2011/04/hive-unit-testing.html
I know this is an old thread, but just in case someone comes across it: I have followed up on the whole minicluster and Hive testing question, and found that things have changed with MR2 and YARN, but in a good way. I have put together an article and a GitHub repo to give some help with it:
http://www.lopakalogic.com/articles/hadoop-articles/hive-testing/
Hope it helps!
Let's say I want to write a unit test for the example shown here:
https://github.com/confluentinc/kafka-streams-examples/blob/5.1.2-post/src/main/java/io/confluent/examples/streams/WikipediaFeedAvroLambdaExample.java
I tried the following methods, both of which did not work out for me:
1) Use TopologyTestDriver.
This class is pretty useful as long as schema registry is not involved.
I tried making use of MockSchemaRegistryClient but it didn't work out (a stripped-down sketch of what I tried is shown below).
And even if it did work out, it would require that I create my own serializers, which kind of defeats the purpose of Schema Registry.
2) Use EmbeddedSingleNodeKafkaCluster defined in the same project.
https://github.com/confluentinc/kafka-streams-examples/blob/5.1.2-post/src/test/java/io/confluent/examples/streams/kafka/EmbeddedSingleNodeKafkaCluster.java
Now this class is really handy and seems to provide an embedded Kafka cluster and Schema Registry. But it does not seem to be available in any published artifact. Consequently I tried copying the class, but ran into further import issues.
Unable to download this particular artifact: io.confluent:kafka-schema-registry-client:5.0.0:tests
Has anyone been able to make progress with the above options? Or found a completely different solution?
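Here is that stripped-down sketch of the TopologyTestDriver attempt, simplified to plain String serdes so that no Schema Registry is involved; the topic names and the pass-through topology are just placeholders for the real one:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.test.ConsumerRecordFactory;
import org.apache.kafka.streams.test.OutputVerifier;

public class TopologySketchTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topology-test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted by the test driver
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Trivial pass-through topology standing in for the real one under test
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");

        try (TopologyTestDriver driver = new TopologyTestDriver(builder.build(), props)) {
            ConsumerRecordFactory<String, String> factory =
                new ConsumerRecordFactory<>("input-topic", new StringSerializer(), new StringSerializer());
            driver.pipeInput(factory.create("input-topic", "key", "value"));

            OutputVerifier.compareKeyValue(
                driver.readOutput("output-topic", new StringDeserializer(), new StringDeserializer()),
                "key", "value");
        }
    }
}
```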
To do this, I ended up writing a small test library based on Testcontainers: https://github.com/vspiliop/embedded-kafka-cluster. It starts a fully configurable Docker-based Kafka cluster (broker, ZooKeeper and Confluent Schema Registry) as part of your tests. Check out the example unit and Cucumber tests.
Jenkins has a nice plugin for test results.
Test Results Analyzer Plugin (plugin url).
How can I attach a similar dashboard (with test results) to GitLab?
Here is what my googling turned up:
Gitlab has "pages" feature but it depends on artifact life and only the last one is shown.
https://docs.gitlab.com/ee/ci/junit_test_reports.html
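For what it's worth, the JUnit report integration described there is enabled with a small addition to .gitlab-ci.yml; roughly like this (the job name, build command and report path are placeholders):

```yaml
# .gitlab-ci.yml
test:
  stage: test
  script:
    - mvn test
  artifacts:
    reports:
      junit: target/surefire-reports/TEST-*.xml
```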
On the other hand, there is an open and long-running issue about this requirement for GitLab, besides lots of closed ones.
https://gitlab.com/gitlab-org/gitlab-ce/issues/34102
Here is GitLab's own statement on the comparison page:
https://about.gitlab.com/comparison/gitlab-vs-jenkins.html
"Many languages use frameworks that automatically run tests on your code and create a report: one example is the JUnit format that is common to different tools. GitLab supports browsing artifacts and you can download reports, but we’re still working on a proper way to integrate them directly into the product."
I found a project that is a complete and fully supported solution. I just started with it and so far it is impressive. You just upload your reports to it, and there is a lot of plugin support.
http://reportportal.io/
I would like to know how it is possible to perform unit tests on ETL jobs developed in Talend.
My ETLs perform file reading, file generation, and connections to an SAP system (reading/writing IDocs).
Are there any tools for this? Or does it take developing a small Java test framework?
Yes, Mohcine, Talend introduced test case automation in version 6, which is part of its overall Continuous Integration framework. You right-click on a component in a job and select "Create Test Case". It will create a skeleton test case job. You can extend this test case job to perform a variety of tests, including DB connectivity and results. It will take some time to learn the tool to make it useful, but it is worth the effort. Also, this feature may only be available in the subscription version of Talend; I am not sure if it's available in Open Studio.
Here is an example: the diagram is a very simple job that loads a file into a DB table.
Here is the test case I created by first generating the skeleton, then modifying it for my specific purposes.
Here is the assert where I match the number of rows read from the file with the number of rows inserted into the db table.
For further info check out this tutorial.
Let me begin by saying I'm a ColdFusion newbie.
I'm trying to research whether it's possible to do the following and what would be the best approach to achieve it.
Whenever a developer checks code into SVN, I would like to get all the new changes/files and do an automatic build to check whether the code can be deployed successfully to the production server. I guess there are two parts to it: first, syntax checking, and second, integration testing (whether the functionality is working as expected). For the latter part, some unit testing tools would have to be used.
Can someone comment on their experience doing something similar for ColdFusion?
Sorry for being a bit vague... I know it's a very open-ended question, but any feedback would be appreciated.
Thanks
There's a project called "Cloudy With A Chance of Tests" that purports to do what you require. In particular it brings together a number of other CFML code analysis projects (VarScope & QueryParam) to check code, as well as unit testing. I am not currently using it myself but did have a look at it some time ago (more than 12 months) and it appeared to be quite good.
https://github.com/mhenke/Cloudy-With-A-Chance-Of-Tests
Personally I run MXUnit tests in Jenkins using the instructions from the MXUnit site - available here:
http://wiki.mxunit.org/display/default/Continuous+Integration+--+Running+tests+with+Jenkins
Essentially this is set up as an ant task in Jenkins, which executes the MXUnit tests and reports back the results.
We're not doing fully continuous integration, but we have a process which automates some of the drudgery of our builds:
replace the site's application.cf(m|c) with one that tells users that the app is being deployed (we had QA staff raising defects that were due to re-deployments)
read a database manifest XML which lists all SQL scripts that make up the current release. We concatenate the scripts into a single upgrade script, suitable for shipping
execute the SQL script against the server's DB, noting any errors. The concatenation process also adds a line of SQL after each imported script that writes to a runlog table, so we can see what ran, how long it took and which build it was associated with (a rough sketch of this concatenation step appears after this list). If you're looking to replicate this step, take a look at Liquibase
deploy the latest code
make an http call to a ?reset=true type URL to tell the app to re-initialize
execute any tests
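As a rough illustration of the concatenation step mentioned above (not our actual build code; the file, table and column names are made up, and GETDATE() assumes SQL Server):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class UpgradeScriptBuilder {
    public static void main(String[] args) throws IOException {
        // In the real process this list comes from the database manifest XML
        List<String> scripts = Arrays.asList("001_create_tables.sql", "002_add_indexes.sql");
        String buildNumber = "1234";
        StringBuilder upgrade = new StringBuilder();

        for (String script : scripts) {
            Path path = Paths.get("sql", script);
            upgrade.append("-- ").append(script).append(System.lineSeparator());
            upgrade.append(new String(Files.readAllBytes(path), StandardCharsets.UTF_8))
                   .append(System.lineSeparator());
            // After each imported script, record what ran and for which build
            upgrade.append("INSERT INTO runlog (script_name, build_number, ran_at) VALUES ('")
                   .append(script).append("', '").append(buildNumber).append("', GETDATE());")
                   .append(System.lineSeparator());
        }

        Files.write(Paths.get("upgrade.sql"),
                upgrade.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```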
The build is requested manually through the build servers we have, but you click a button, make tea and it's done.
We've just extended the above to cope with multiple servers in a cluster and it ticks along nicely. I think the above suggestion of using the Jenkins SVN plugin to automate the process sounds like the way to go.
My company uses a lot of different web services on a daily basis. I find that I repeat the same steps over and over again every day.
For example, when I start a new project, I perform the following actions:
Create a new client & project in Liquid Planner.
Create a new client in Freshbooks
Create a project in Github or Codebasehq
Add the developers who are going to be working on this project to Codebasehq or Github
Create tasks in Ticketing system on Codebasehq and tasks in Liquid Planner
This is just when starting new projects. When I have to track tasks, it gets even trickier because I have to monitor tasks in 2 different systems.
So my question is, is there a tool that I can use to create a web service that will automate some of these interactions? Ideally, it would be something that would allow me to graphically work with the web service API and produce an executable that I can run on a server.
I don't want to build it from scratch. I know, I can do it with Python or RoR, but I don't want to get that low level.
I would like to add my sources and pass data around from one service to another. What could I use? Any suggestions?
Progress DataXtend Semantic Integrator lets you build WebServices through an Eclipse based GUI.
It is a commercial product, and I happen to work for the company that makes it. In some respects I think it might be overkill for you, as it's really an enterprise-level data mapping tool for mapping disparate data sources (web services, databases, XML files, COBOL) to a common model, as opposed to a simple web services builder, and it doesn't really support your GitHub bits any more than normal Eclipse plugins would.
That said, I do believe there are Mantis plugins for github to do task tracking, and I know there's a git plugin for Eclipse that works really well (jgit).
Couldn't you simply use Selenium to execute some of these tasks for you? Basically, as long as you can do something from the browser, Selenium will be able to do it too. Selenium comes with a language called "Selenese", so you can even use it to programmatically create an "API" for your tasks.
I know this is a different approach to what you're originally looking for, but I've been using Selenium for a number of tasks, and found it's even good for executing Ant tasks or unit tests.
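For example, a task like "create a project in the tracker" can be scripted with Selenium WebDriver in Java roughly like this; the URL and element names are placeholders, and you'd need the matching browser driver installed:

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class CreateProjectTask {
    public static void main(String[] args) {
        // Requires the Firefox driver binary to be available on the PATH
        WebDriver driver = new FirefoxDriver();
        try {
            // Placeholder URL and form field names; replace with the real service
            driver.get("https://tracker.example.com/projects/new");
            driver.findElement(By.name("project_name")).sendKeys("New Client Project");
            driver.findElement(By.name("submit")).click();
        } finally {
            driver.quit();
        }
    }
}
```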
Hope this helps you
What about Apache Camel?
Camel lets you create the Enterprise Integration Patterns to implement routing and mediation rules in either a Java based Domain Specific Language (or Fluent API), via Spring based Xml Configuration files or via the Scala DSL. This means you get smart completion of routing rules in your IDE whether in your Java, Scala or XML editor.
Apache Camel uses URIs so that it can easily work directly with any kind of Transport or messaging model such as HTTP, ActiveMQ, JMS, JBI, SCA, MINA or CXF Bus API together with working with pluggable Data Format options.
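As a rough sketch (not from the Camel documentation above), a minimal route that polls one HTTP service and pushes the result to another could look like this, assuming camel-core and camel-http are on the classpath; the endpoint URLs are placeholders:

```java
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class ProjectSetupRoutes {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Every 60 seconds, call one service and forward the response to another
                from("timer:newProjectCheck?period=60000")
                    .to("http://liquidplanner.example/api/projects")  // placeholder endpoint
                    .log("Response from first service: ${body}")
                    .to("http://codebasehq.example/api/tickets");     // placeholder endpoint
            }
        });
        context.start();
        Thread.sleep(120000); // keep the JVM alive long enough for the timer to fire a few times
        context.stop();
    }
}
```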