Possible to consider a lightweight version? #38

Closed
cknowles opened this issue Apr 14, 2017 · 23 comments

@cknowles

We've just been setting this app up and it's been pretty smooth going so far; we really appreciate how easy it has been to connect up. I have a proposal and am interested in feedback on it.

Currently the memory usage at rest is circa 300MB per replica without pumping any traces to it, the Docker image size sits at 681MB, and boot time is around 30 seconds with moderate resource limits assigned in Kubernetes. We haven't been running it long enough to determine memory usage at peak. Since those figures do not compare favourably with other cluster-wide services we run, I wondered if you would consider a lightweight version? Perhaps working on it together? The sorts of things we are comparing against are Kubernetes internals, fluentd and datadog.

Obviously creating a new version could be quite an effort, so if there are any recommendations to bring the memory footprint and boot times down with the current app, I'm happy to look into those instead. For the image size, the latest docker image is 3 months old; would publishing a fresh one reduce the size at all?

✗ docker images | grep stackdriver
gcr.io/stackdriver-trace-docker/zipkin-collector   latest   11755ce530fb   3 months ago   681 MB

If you would consider a lightweight version, I was thinking that a Go app based on a similar base image to Kubernetes contrib services or some of the k8s system services, such as kube-dns, would be a good starting point. Most of those are based on alpine or busybox with the occasional debian, same as this project.

@codefromthecrypt
Member

#36 is an attempt to use the same layers that we use upstream (also alpine). I just noticed a build bug where we are over 100MiB there and will look into it. We used to be under 100.

@codefromthecrypt
Member

Actually zipkin v1.22 is under 100MiB; according to Docker Hub it is 94 MB. I'd expect that if we implemented #36, we would be at or slightly under 100MiB image size.

The resident memory size is tunable via java arguments and can be lower, too.
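
For example, a minimal sketch of capping the heap when running the image directly. JAVA_OPTS is the variable referenced later in this thread; the image name and values here are illustrative, not tuned recommendations:

# illustrative heap cap; adjust the values to your workload
docker run -d -p 9411:9411 \
  -e JAVA_OPTS="-Xms64m -Xmx128m -XX:MaxMetaspaceSize=64m" \
  openzipkin/zipkin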

@cknowles
Author

That'd be great, need any help? Could be worth adding something like this to the build to detect large changes.

Do you have any specific recommendations on Java memory sizes and which garbage collector to use, or would we need to investigate further? I was thinking about submitting a Kubernetes Helm Chart for this project since I already have one internally, so it would be good to set decent defaults (or set them in the Dockerfile here).

@codefromthecrypt
Member

yes! if you could help add a step to https://github.com/openzipkin/docker-zipkin to check on max image size (for the main zipkin image), that would be fantastic. In fact, we have no pull request checker at the moment, so help wanted in general there.
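
A hypothetical shape for that check, as a shell step in the build (the 110MiB budget and the image tag are placeholders):

# fail the build if the freshly built image exceeds a size budget
MAX_BYTES=$((110 * 1024 * 1024))
SIZE=$(docker image inspect --format '{{.Size}}' openzipkin/zipkin:test)
if [ "$SIZE" -gt "$MAX_BYTES" ]; then
  echo "image is $SIZE bytes, over the $MAX_BYTES budget" >&2
  exit 1
fi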

https://github.com/fabric8io/kubernetes-zipkin is what folks generally use, though I'm not sure if the images there are the same as here. https://github.com/openzipkin/zipkin-kubernetes didn't take off, but there were plans to pull generic stuff into there as well.

This repo is mostly Java code reorg, so unless you're keen on that, you can let one of us move the ball.

@cknowles
Author

Sure, will look into the image size checks soon and thanks for the links. I'll leave the project specifics to you until I have some more experience running this project.

@rochdev

rochdev commented Nov 6, 2017

Any progress on this? I think a Go rewrite would probably be the way to go for this kind of small service. I would expect ~20-30MB of RAM for a collector, as opposed to the 600MB it currently uses on our test cluster.

@codefromthecrypt
Member

codefromthecrypt commented Nov 7, 2017 via email

@codefromthecrypt
Member

codefromthecrypt commented Nov 7, 2017 via email

@codefromthecrypt
Member

PS here's the existing issue on a host collector (which wouldn't have the same modularity constraints as a normal collector, since a normal one has ingress from multiple sources such as kafka).

You might be happy to notice I suggested Go :P
openzipkin/zipkin#1778

@codefromthecrypt
Member

added an issue about overall concerns. Let's bear in mind that this repository will be retargeted so that it composes more easily, for example allowing ingress via pubsub etc. It is easy to map an HTTP request in to a request out, but usually there's more going on than that.

@cknowles
Author

cknowles commented Nov 7, 2017

For now we've restricted it to 300MB as below and are running several replicas to deal with the load. We don't necessarily need to reduce that, but we do wonder what the best-fit GC policies are and how to determine the appropriate limits. The setup I have inside the k8s Deployment is:

env:
  - name: JVM_OPTS
    value: "-Xms64m -Xmx128m -XX:MaxMetaspaceSize=64m"
resources:
  limits:
    cpu: 500m # CPU bursts high on container start but then settles down
    memory: 300Mi
  requests:
    cpu: 50m
    memory: 300Mi
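
One thing I plan to experiment with for GC is selecting G1 explicitly with a pause target. These are standard HotSpot flags, but the values are guesses on my part rather than a vetted recommendation:

# same heap bounds as above plus an explicit G1 setup (illustrative)
JVM_OPTS="-Xms64m -Xmx128m -XX:MaxMetaspaceSize=64m \
  -XX:+UseG1GC -XX:MaxGCPauseMillis=200"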

@codefromthecrypt
Member

codefromthecrypt commented Nov 7, 2017 via email

@cknowles
Author

cknowles commented Nov 7, 2017

Yeah, I've had it set up like that for a few weeks already and it's running stably at the current load, so I'm happy to continue with it for now until there are further recommendations.

@rochdev

rochdev commented Nov 7, 2017

@adriancole Didn't mean to come across as flaming or to undermine the effort that has been put into this library. Thanks a lot for the suggestions :) I will try it at 300MB and follow the issue you created to see if we can get it down even more.

Our system is quite small and has generally low load, which is why I am trying to save memory on system services. For example, a cluster can sometimes consist of only 2 nodes with 4-8GB of RAM, so 600MB (x2 replicas) for a single service can be a bit much, especially considering most of our production services run on far less than that.

@codefromthecrypt
Member

I put some time into the server, which seems to do a lot better than before. Sorry it did kind of suck earlier... we weren't watching closely enough: openzipkin/zipkin#1806 (comment)

@codefromthecrypt
Member

zipkin-gcp is now a 10MiB layer on the normal zipkin image (which includes support for kafka and rabbitmq as well as the other stuff):

020fdae15c75: Downloading [=================================>                 ]  6.985MB/10.32MB

https://github.com/openzipkin/docker-zipkin-gcp
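
To sanity-check the layer sizes locally, something like this works (the tag is illustrative):

docker history --format '{{.Size}}\t{{.CreatedBy}}' openzipkin/zipkin-gcp:latest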

@cknowles
Author

Nice!

Are there any recommendations for memory and CPU constraints? I've tried updating to the openzipkin/zipkin-gcp:0.1 image and could not gain much stability even with minimal traffic. I was trying JAVA_OPTS of -Xms64m -Xmx384m -XX:MaxMetaspaceSize=128m with a k8s memory request/limit of 512Mi and a CPU request/limit of 200m (guaranteed QoS).

@codefromthecrypt
Member

codefromthecrypt commented Feb 27, 2018 via email

@cknowles
Author

@adriancole sure, makes sense, thanks. I'll try a few of those toggles and work out which combinations work.

@cknowles
Author

@adriancole I got around to playing with various options on openzipkin/zipkin-gcp:0.6. With COLLECTOR_SAMPLE_RATE set to zero and STORAGE_TYPE set to mem, the baseline consumption appears to be around 1.1GB per container, at least when it's not restricted in any way. Is that the same as you've experienced?

So far, from running the collector across a set of clusters, restricting the memory tends to make it OOM occasionally or, worse, not boot and go into crash loop backoff. I'm not sure of the best way to follow up on this; essentially I'd just like to find an acceptable Kubernetes+JVM boxed memory config, even if it means allocating a 1GB minimum to these pods for now. Happy to try more things out or provide more concrete stats where needed.

@codefromthecrypt
Member

codefromthecrypt commented Jun 19, 2018 via email

@cknowles
Author

@adriancole fully understood, thanks for the help so far! No need for you to spend more time on it; I'll do some more digging myself. What I'd love to do at the end of this is send a PR adding some guidance to the docs about memory constraints. I'm familiar with tweaking the JVM opts; the part I'm less familiar with is zipkin, but what you mentioned in your reply regarding the max spans is likely enough for me to go on.
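
Concretely, what I'll try next, assuming the standard zipkin server env vars apply to this image (MEM_MAX_SPANS caps how many spans the in-memory store retains, and all values here are guesses):

# illustrative; assumes the standard zipkin env vars apply to this image
docker run -d -p 9411:9411 \
  -e STORAGE_TYPE=mem \
  -e MEM_MAX_SPANS=100000 \
  -e COLLECTOR_SAMPLE_RATE=0.0 \
  -e JAVA_OPTS="-Xms512m -Xmx512m" \
  openzipkin/zipkin-gcp:0.6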

@codefromthecrypt
Member

codefromthecrypt commented Jun 19, 2018 via email
