Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add automated testing for zipkin-gcp #130

Closed
4 tasks done
saturnism opened this issue Jul 30, 2019 · 45 comments
Closed
4 tasks done

Add automated testing for zipkin-gcp #130

saturnism opened this issue Jul 30, 2019 · 45 comments
Assignees

Comments

@saturnism
Copy link
Collaborator

saturnism commented Jul 30, 2019

Add automated testing for zipkin-gcp against a stackdriver project.

@codefromthecrypt
Copy link
Member

Then you can emulate the existing zipkin-example-foo tests by sending a generated trace https://github.com/openzipkin/zipkin-js-example/blob/master/.circleci/config.yml

then you can use the rest api of stackdriver to check, or just check /metrics instead if that's too hard

@codefromthecrypt
Copy link
Member

cc @openzipkin/devops-tooling

@meltsufin
Copy link
Collaborator

Are we using Circle CI or Travis for testing this project? I only see Travis config and it skips test.
I can add a service account key to either for enabling some integration tests against Stackdriver.
Also, which tests would you want to run?

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Aug 2, 2019 via email

@meltsufin
Copy link
Collaborator

So, I understood you correctly, we should use Travis -- sure comfortable with that -- and test by basically assembling a server and sending a trace through it and verifying it. We can then figure out when we want the test to run. I think we should keep the test in this repo, since it is where the code that we're testing is maintained.
That sounds good.

@saturnism saturnism assigned saturnism and unassigned saturnism Aug 2, 2019
@saturnism
Copy link
Collaborator Author

@elefeint created a test project and service account that we can use to testing. i'll take a quick stab at this!

@saturnism saturnism self-assigned this Aug 20, 2019
@saturnism
Copy link
Collaborator Author

@adriancole I noticed that the travis ci is configured to only work on a release tag, and tests are skipped. there is a similar setup for zipkin-aws.

Secondly, all the unit tests works w/o a real stackdriver backend (that's nice). i don't think we want to change that behavior.

I feel we can add new integration tests in src/it for relevant modules, and only run on verify and/or w/ a flag.

Should we enable it in the master branch? or, enable tests and integration tests on release tag?

@saturnism
Copy link
Collaborator Author

oops, i'm mistaken; the ci works against master, and the tests are triggered from publish.sh

@saturnism
Copy link
Collaborator Author

For some reason, tests in sender-stackdriver are not run.

https://travis-ci.org/openzipkin/zipkin-gcp/jobs/575119300#L1640

[INFO] Results:
[INFO] 
[INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] 
[INFO] --- maven-jar-plugin:3.1.2:jar (default-jar) @ zipkin-sender-stackdriver ---
[INFO] Building jar: /home/travis/build/openzipkin/zipkin-gcp/sender-stackdriver/target/zipkin-sender-stackdriver-0.13.5-SNAPSHOT.jar
[INFO] 

@elefeint
Copy link
Contributor

I was wondering about that. I think it's because JUnit5 is on the classpath, but the imports are not Jupiter imports, so no tests are found.


[DEBUG]    io.zipkin.zipkin2:zipkin-tests:jar:2.16.1:test                                            
[DEBUG]       org.junit.jupiter:junit-jupiter:jar:5.5.1:test                                         
[DEBUG]          org.junit.jupiter:junit-jupiter-api:jar:5.5.1:test                                  
[DEBUG]             org.apiguardian:apiguardian-api:jar:1.1.0:test                                   
[DEBUG]             org.opentest4j:opentest4j:jar:1.2.0:test                                         
[DEBUG]             org.junit.platform:junit-platform-commons:jar:1.5.1:test                         
[DEBUG]          org.junit.jupiter:junit-jupiter-params:jar:5.5.1:test                               
[DEBUG]          org.junit.jupiter:junit-jupiter-engine:jar:5.5.1:test                               
[DEBUG]             org.junit.platform:junit-platform-engine:jar:1.5.1:test  

@elefeint
Copy link
Contributor

Yeah. propagation tests run fine because the project does NOT import io.zipkin.zipkin2:zipkin-tests.

@saturnism
Copy link
Collaborator Author

saturnism commented Aug 23, 2019

@elefeint and I were able to make all the tests run as-is with Junit 4 annotation by adding
org.junit.vintage:junit-vintage-engine dependency.

Also found out that there are existing IT* integration tests that are just not ran unless if surefire is configured to run them. I'll follow the same convention rather than having a separate src/it directory.

Will send in a few PRs:

  1. To add the vintage dependency just so all the tests will run.
  2. It may make sense to add vintage dependency to zipkin-tests instead in the long run
  3. Update failsafe configuration to run integration tests for the verify phase
  4. Add more integration tests that sends data to Stackdriver

wdyt, @adriancole ?

@codefromthecrypt
Copy link
Member

I think the vintage engine being in zipkin-tests is a very good idea as not running tests by accident is quite nasty. I'm sure @anuraaga agrees. We didn't intend to stop those..

@codefromthecrypt
Copy link
Member

the stuff you mention makes sense btw. yeah for example, zipkin-reporter has failsafe properly configured, I guess we don't here

@codefromthecrypt
Copy link
Member

openzipkin/zipkin#2778

codefromthecrypt pushed a commit to openzipkin/zipkin that referenced this issue Aug 24, 2019
This prevents us from interfering with JUnit 4 runtimes.

See openzipkin/zipkin-gcp#130
codefromthecrypt pushed a commit to openzipkin/zipkin that referenced this issue Aug 24, 2019
@saturnism
Copy link
Collaborator Author

Alright! we are almost there. After this service account, will see how we do this for docker-zipkin-gcp and/or another full server test.

@saturnism
Copy link
Collaborator Author

Trying to add integration test to stackdriver storage as well. I can't seem to find how auth token was added to the call. Any ideas?

Secondly, for docker-zipkin-gcp, what's the build/release process like for that one? E.g., is it automatically triggered from the release from this repo, or is there a separate process? I think adding integration tests directly to docker-zipkin-gcp makes sense as part of the release step.

@saturnism
Copy link
Collaborator Author

Trying to add integration test to stackdriver storage as well. I can't seem to find how auth token was added to the call. Any ideas?

Ah, I found it through the autoconfiguration: https://github.com/openzipkin/zipkin-gcp/blob/master/autoconfigure/storage-stackdriver/src/main/java/zipkin/autoconfigure/storage/stackdriver/ZipkinStackdriverStorageAutoConfiguration.java#L127

@saturnism
Copy link
Collaborator Author

For Zipkin GCP stackdriver module:

wget https://jitpack.io/com/github/openzipkin/zipkin-gcp/zipkin-autoconfigure-storage-stackdriver/master-SNAPSHOT/zipkin-autoconfigure-storage-stackdriver-master-SNAPSHOT-module.jar

@saturnism
Copy link
Collaborator Author

wget wget https://jitpack.io/com/github/openzipkin/zipkin/zipkin-server/master-SNAPSHOT/zipkin-server-master-SNAPSHOT-exec.jar
unzip zipkin-server-master-SNAPSHOT-exec.jar -d zipkin
wget https://jitpack.io/com/github/openzipkin/zipkin-gcp/zipkin-autoconfigure-storage-stackdriver/master-SNAPSHOT/zipkin-autoconfigure-storage-stackdriver-master-SNAPSHOT-module.jar
unzip zipkin-autoconfigure-storage-stackdriver-master-SNAPSHOT-module.jar -d stackdriver

STORAGE_TYPE=stackdriver \
STACKDRIVER_PROJECT_ID=... \
GOOGLE_APPLICATION_CREDENTIALS=... \
java -Dloader.path=stackdriver -Dspring.profiles.active=stackdriver -cp zipkin/ org.springframework.boot.loader.PropertiesLauncher

To validate:

curl -v localhost:9411/actuator/health
*   Trying ::1...
* TCP_NODELAY set
* Connection failed
* connect to ::1 port 9411 failed: Connection refused
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connection failed
* connect to 127.0.0.1 port 9411 failed: Connection refused
* Failed to connect to localhost port 9411: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 9411: Connection refused
rayt-macbookpro2:kubernetes rayt$ curl -v localhost:9411/actuator/health
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9411 (#0)
> GET /actuator/health HTTP/1.1
> Host: localhost:9411
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-type: application/vnd.spring-boot.actuator.v2+json
< content-length: 347
< 
* Connection #0 to host localhost left intact
{"status":"DOWN","details":{"zipkin":{"status":"DOWN","details":{"StackdriverStorage{wise-coyote-82123}":{"status":"DOWN","details":{"error":"com.linecorp.armeria.common.grpc.protocol.ArmeriaStatusException: Requested entity was not found."}}}},"diskSpace":{"status":"UP","details":{"total":499963170816,"free":62758592512,"threshold":10485760}}}}

Actuator health endpoint returns 200 OK even though status is DOWN. :/ I guess we'll have to parse output w/ jq and look for UP

curl --silent localhost:9411/actuator/health| jq -r .details.zipkin.status

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Sep 6, 2019 via email

@anuraaga
Copy link
Collaborator

anuraaga commented Sep 6, 2019

I agree jitpack is fine too but any reason not to use the jfrog snapshots?

https://oss.jfrog.org/artifactory/oss-snapshot-local/

@saturnism
Copy link
Collaborator Author

i'm ok w either repo. but, to @anuraaga 's point in openzipkin-attic/docker-zipkin#86 (comment) - when should this run? I'm thinking only on master commits (and new tags?), since PRs can't get access to credentials anyways.

@anuraaga
Copy link
Collaborator

anuraaga commented Sep 6, 2019

Ah was thinking PR at first but the credentials are definitely an issue - master commits SGTM

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Sep 6, 2019 via email

abesto pushed a commit to abesto/zipkin that referenced this issue Sep 10, 2019
@saturnism
Copy link
Collaborator Author

From PR comment @anuraaga :

I think a similar server test could be done by using maven project and adding the snapshot repo and dependency on the exec jar. Then you could just call ZipkinServer.main. A lot of the logic here like the massive wait-for-it would be simpler in Java.

What do you think of that approach?

I considered staying w/ the JVM and explored using a JUnit test, or as a IntegrationTest. There were a number of issues that needs to be resolved/thought through:

  • zipkin-gcp-parent brings in logback, it doesn't work well w/ Zipkin Server. When running this in a test scope - slf4j fails fast and nothing is started.
  • if we don't inherit the parent, then we have to specify all the versions. How do we keep versions up-to-date?
  • Similarly, if we don't use Jitpack, depending on snapshot still requires a version, such as 2.16.3-SNAPSHOT. What if 2.16.3 is released? We need a consistent way to update that as well.
  • Finally if we run this under test scope, we may be testing the wrong thing as test scope brings in more dependencies than the runtime. I'm not sure what they are going to be.

I feel it's best to test this the way it's meant to be assembled and run. wait-for-it is a simple script that we don't need to maintain. It's straight forward outside of that and tests the actual runtime conditions (outside of a container).

I'm also thinking - this test only tests against the latest of Zipkin Server. However, if Zipkin server produced a breaking change, and we never really committed any code here - then we'll still miss it.

The only time we are certain of which versions we want to put together is when we produce the Docker container. Would it make sense to remove this test here, and simply to the container test?

Or, it might make more sense to have a completely separate repository w/ only tests of different zipkin server + autoconfigurations.

@codefromthecrypt
Copy link
Member

cc @openzipkin/core

Let's all please never suggest or support zipkin integrations via dependencies. Following that, there's zero point in testing via dependencies except in the core server tree itself.

Those doing gitter support know we are plagued with ad-hoc questions about custom servers via dependencies, sometimes spamming us on multiple channels about custom servers. This happens to the point of us wanting to refactor our entire README strategy. cc @jorgheymans who is trying to help sort this part.

Refresh of the general sort of problems

A. people will get mixed signals, seeing us test an integration eventhough we actively don't support it.
B. It hides real integration problems
C. it doesn't follow our README here or in any similar repos like zipkin-aws

Anyway please please please don't contribute to our support burden. Test integrated things as documented and supported, don't setup false tests that look like they pass but don't, or setup tests that act as an accidental gateway for people to start asking us to support custom servers.

@codefromthecrypt
Copy link
Member

PS to clarify what's desired imho vs what's not. I like almost exactly what Ray did on the PR because it is simple to understand, with least logic. I really would not like to move testing to another repo. There's a glitch in the PR but otherwise it is right on.

We have so many problems to deal with, above all we need our integration tests to not become new problems or introduce complex repository dependencies that no-one knows how to do.

This follows the history since 2015 where @abesto has always made our shell scripts act as if humans type the commands as sometimes we need to (I certainly do). This has helped avoid disaster multiple occasions and also increases the quickness of resolving problems. For example, if we move this sort of "README testing" into java, it would only increase the complexity, limit people who would want to help to java people, and also not actually test the README, or things that act like it such as our docker build. If nothing else, the history of this repo should teach us this lesson.. the multitude of integration problems with boringssl etc. Let's please KISS

@anuraaga
Copy link
Collaborator

All the reasons make sense to me to stick with the current approach. And yeah I think even better would be to just run the test when building the docker container. I think we can get the current one in for now and keep or remove it later if there's something better. I'll look into the docker test on Wednesday

@codefromthecrypt
Copy link
Member

.. I think even better would be to just run the test when building the docker container.

that's a great idea! the container is built on change to master now, so yeah similar to how we run tests before uploading snapshots, this looks like the right thing to do before master tag

@saturnism
Copy link
Collaborator Author

saturnism commented Sep 17, 2019

#150 (comment)

So, I think the autoconfiguration fails against Zipkin master because the master was updated to use Armeria 0.91. I think it introduced non-backward compatible change.

This is validated by changing zipkin-gcp's armeria.version to 0.91.0, and the unit tests starts to fail too:

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   StackdriverSpanConsumerTest.accept » NoSuchMethod zipkin2.storage.stackdriver....
[ERROR]   StackdriverSpanConsumerTest.verifyCheckReturnsFailureWhenServiceFailsForUnknownReason » NoSuchMethod
[ERROR]   StackdriverSpanConsumerTest.verifyCheckReturnsFailureWhenServiceFailsWithKnownGrpcFailure » NoSuchMethod
[ERROR]   StackdriverSpanConsumerTest.verifyCheckReturnsOkWhenExpectedValidationFailure » NoSuchMethod
[ERROR]   StackdriverSpanConsumerTest.verifyCheckReturnsOkWhenServiceSucceeds » NoSuchMethod
[INFO] 
[ERROR] Tests run: 6, Failures: 0, Errors: 5, Skipped: 0

I guess it was useful to know that Zipkin GCP currently doesn't work w/ Zipkin master. But since we don't know what version combination will be assembled by the container, we should defer the test when the combination is known, i.e., when we are building the container.

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Sep 17, 2019 via email

@saturnism
Copy link
Collaborator Author

should we abandon the server test here and supersede in openzipkin-attic/docker-zipkin#86 ?

@codefromthecrypt
Copy link
Member

you mean only test integration via docker? (ex no separate test integrating with java ...)

@saturnism
Copy link
Collaborator Author

@adriancole ah just got back from vacation. i did mean only docker. but i see that you also found testing here to be useful. thnx for updating the PR!

@saturnism
Copy link
Collaborator Author

finally got some downtime and cycle to finish this up w/ the docker hub test hook discussed in openzipkin-attic/docker-zipkin#86

@saturnism
Copy link
Collaborator Author

docker hub seems to be much harder to test/validate my changes... log entries are not streamed... no log output after 10min :(

@saturnism
Copy link
Collaborator Author

@anuraaga everything is merged; but i don't think we have autotest turned on. Want to trigger the test manually and make sure everything works?

@anuraaga
Copy link
Collaborator

@saturnism Nope don't think there was anything wrong with the configuration - looked at the webhook logs in GitHub and don't see anything corresponding to the master commit. Guess GitHub might have had a random blurp so I pushed another mostly no-op commit which is building now - https://cloud.docker.com/u/openzipkin/repository/registry-1.docker.io/openzipkin/zipkin-gcp/builds/2df22c28-c7eb-4155-92ee-bebe135a99c9

@saturnism
Copy link
Collaborator Author

Oh good - thanks! Great to see this working. I feel we finished all the tests :D Can finally close now.

@anuraaga
Copy link
Collaborator

anuraaga commented Nov 5, 2019

@saturnism We had one last question on whether we keep the current Travis Jitpack-based server integration test or not now that we have the docker container one. I personally lean towards removing it since the Docker test is very similar without relying on yet another artifact store. What do you think?

@saturnism
Copy link
Collaborator Author

i thought @adriancole wanted to keep it as part of the CI for new code, so that it's caught earlier than container creation, against the master branch of the server. Also, the check on GitHub is slightly more stringent (i.e., it validates the payload response).

The code is already written. Unless there is significant overhead, it doesn't hurt to keep. I'm agnostic in the end though.

@anuraaga
Copy link
Collaborator

anuraaga commented Nov 6, 2019

I think that was before baking the master builds of the container - now that docker always builds and tests and sends emails on breakage I guess it's enough. But true that it's already code that's been written, my main concern is using Jitpack. Let's leave it and if it ever flakes even once delete it then :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants