Possible to consider a lightweight version? #38
#36 is an attempt to use the same layers that we use upstream (also alpine). I just noticed a build bug where we are over 100MiB there and will look into it. We used to be under 100. |
actually zipkin v1.22 is under 100MiB; according to Docker Hub it is 94MB. I'd expect that if we implemented #36, we would be slightly under or at a 100MiB image size. The resident memory size is tunable via java arguments and can be lower, too. |
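(As a rough illustration of that tunability, heap-related flags can be passed to the container. The docker-compose-style sketch below is only an assumption about the image's entrypoint: the exact environment variable it honors, e.g. JAVA_OPTS, and the flag values are illustrative, not a recommendation.)

```yaml
# Hypothetical sketch: bound the JVM heap so resident memory stays small.
# Assumes the zipkin image reads a JAVA_OPTS-style variable; check the image docs.
version: "2"
services:
  zipkin:
    image: openzipkin/zipkin
    environment:
      # -Xms/-Xmx bound the heap; metaspace is capped separately.
      JAVA_OPTS: "-Xms64m -Xmx128m -XX:MaxMetaspaceSize=64m"
    ports:
      - "9411:9411"
```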
That'd be great, need any help? Could be worth adding something like this to the build to detect large changes. Do you have any specific recommendations on Java memory sizes and on which garbage collector to use? Or we'd need to investigate further? I was thinking about submitting a Kubernetes Helm Chart for this project since I already have one internally so it would be good to set decent defaults (or set them in the Dockerfile here). |
yes! if you could help add a step to https://github.com/openzipkin/docker-zipkin to check on max image size (for the main zipkin image), that would be fantastic. In fact, we have no pull request checker at the moment, so help is wanted in general there. https://github.com/fabric8io/kubernetes-zipkin is generally used by folks, though I'm not sure if the images are the same there as here. https://github.com/openzipkin/zipkin-kubernetes didn't take off, but there were plans to pull generic stuff into there as well. Here is mostly around java code reorg, so unless you are keen on that you can let one of us move the ball. |
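(A maximum-image-size check could be as simple as comparing the built image's size against a budget. The sketch below is only an illustration: the Travis-style CI layout, the build path, and the 100MiB threshold are assumptions, not part of the project's actual build.)

```yaml
# Hypothetical CI step: fail the build if the zipkin image exceeds a size budget.
script:
  - docker build -t openzipkin/zipkin:ci zipkin/   # build path is illustrative
  - |
    # docker image inspect reports the image size in bytes; 104857600 bytes = 100MiB.
    size=$(docker image inspect -f '{{.Size}}' openzipkin/zipkin:ci)
    if [ "$size" -gt 104857600 ]; then
      echo "Image is ${size} bytes, over the 100MiB budget"
      exit 1
    fi
```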
Sure, will look into the image size checks soon and thanks for the links. I'll leave the project specifics to you until I have some more experience running this project. |
Any advancement on this? I think a Go rewrite would probably be the way to go for this kind of small service. I would expect ~20-30MB of RAM for a collector as opposed to the 600MB it currently uses on our test cluster. |
You can feel free to write your own service, but maybe adjusting mem
arguments so that it doesn't allocate 600MB is a more sensible start
|
I have written daemons in java that start with 32m, e.g. the denominator proxy,
while at netflix. What is your deployment strategy such that hundreds of MB
are a problem for your architecture? Are you running one per host? If so,
maybe agent is a better way to classify the issue.
Meanwhile, sticking to technical constraints vs flaming java is a better way
to get something flexible. This doesn't preclude a go option; let's just try
to stick with what the problems are.
Hope this makes sense. Regardless, I will try running the server with less
than a hundred megs of RAM, and hopefully you can meanwhile help elaborate
the background behind your low mem constraints.
|
ps here's the existing issue on a host collector (which wouldn't have the same modularity constraints as a normal collector, which would have ingress from multiple sources such as kafka). You might be happy to notice I suggested go :P |
added an issue about overall concerns. Let's bear in mind that this repository will be retargeted such that it can compose more easily, for example allowing ingress via pubsub etc. It is easy to map an http request in to a request out, but usually there's more going on than that. |
For now we've restricted to 300MB as below and are running several replicas to deal with the load. We don't necessarily need to reduce that, but do wonder what the best-fit GC policies are and how to determine the appropriate limits. The setup I have inside the k8s Deployment is:

```yaml
env:
  - name: JVM_OPTS
    value: "-Xms64m -Xmx128m -XX:MaxMetaspaceSize=64m"
resources:
  limits:
    cpu: 500m # CPU bursts high on container start but then settles down
    memory: 300Mi
  requests:
    cpu: 50m
    memory: 300Mi
```
|
probably best to keep things as-is if you can swing it for a week or two.
Once we restructure the project you can use our tools like standard
dashboards to watch things. That way, we can try varying a thing or two
(and it would apply to all)
https://github.com/openzipkin/docker-zipkin#prometheus
but more specifically... openzipkin/zipkin#1779
on GC parameters
For example, there are alternative http listeners out there. Being part
of the normal modular server build will allow us to run experiments beyond
GC parameters.
|
Yeah, I've had it set up like that for a few weeks already and it's running stably at the current load, so I'm happy to continue with it for now until there are further recommendations. |
@adriancole Didn't mean to come across as flaming or to undermine the effort that has been put into this library. Thanks a lot for the suggestions :) I will try it at 300MB and follow the issue you have created to see if we can get it down even more. Our system is quite small and has generally low load, which is why I am trying to save memory on system services. For example, a cluster can sometimes consist of only 2 nodes with 4-8GB of RAM, so 600MB (x2 replicas) for a single service can be a bit much, especially considering most of our production services run well below that. |
I put some time into the server, which seems to do a lot better than before. Sorry it did kind of suck earlier; we weren't watching closely enough: openzipkin/zipkin#1806 (comment) |
zipkin-gcp is now a 10MiB layer on the normal zipkin image (which includes support for kafka and rabbitmq as well as the other stuff)
|
Nice! Are there any recommendations for memory and CPU constraints? I've tried updating to the |
this image is effectively the same as before, just in an easier to control
and affect way. One experiment you can do is to not use the "stackdriver"
storage type, or to set COLLECTOR_SAMPLE_RATE=0, to get a baseline of what's
possible with basic http gearing in place. You can also look at the
prometheus setup if that helps.
So, the idea is to separate the server (which we can do now) from the grpc
egress (stackdriver) part. Might make something easier to solve.
If it ends up server related (e.g. turning stackdriver off doesn't
impact the QoS), then whatever changes could help anyone using the server,
and affect
https://github.com/openzipkin/zipkin/issues
or
https://github.com/openzipkin/docker-zipkin/issues
make sense?
|
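(To make that baseline experiment concrete, the toggles could be expressed as environment variables on the Deployment. A minimal sketch: COLLECTOR_SAMPLE_RATE=0 comes from the comment above, while the STORAGE_TYPE variable and the value "mem" are assumptions about the standard server configuration; the JVM flags mirror the ones posted earlier in this thread.)

```yaml
# Hypothetical baseline: in-memory storage and no sampling, to isolate the cost
# of the basic http path from the stackdriver egress.
env:
  - name: STORAGE_TYPE
    value: "mem"              # skip the stackdriver storage type for the baseline
  - name: COLLECTOR_SAMPLE_RATE
    value: "0"                # drop spans at the collector, per the suggestion above
  - name: JVM_OPTS
    value: "-Xms64m -Xmx128m -XX:MaxMetaspaceSize=64m"
```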
@adriancole sure, makes sense, thanks. I'll try a few of those toggles and work out which combinations work. |
@adriancole I got around to playing with various options. So far, from running the collector across a set of clusters, restricting the memory tends to make it OOM occasionally or, worse, not boot and go into a crash loop backoff. Not sure of the best way to follow up on this; essentially I'd just like to find an acceptable Kubernetes+JVM boxed memory config, even if it means allocating 1GB minimum to these pods for now. Happy to try more things out or provide more concrete stats where needed. |
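(For what it's worth, a boxed 1GB configuration along those lines might look like the sketch below. It only extrapolates from the settings posted earlier in this thread; the specific heap and metaspace values are illustrative assumptions, not recommendations.)

```yaml
# Hypothetical "roomier" boxed config: 1Gi for the pod, with headroom left between
# the heap ceiling and the container limit for metaspace, threads and off-heap use.
env:
  - name: JVM_OPTS
    value: "-Xms256m -Xmx512m -XX:MaxMetaspaceSize=128m"
resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 50m
    memory: 1Gi
```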
the zipkin-gcp image does not restrict or tune memory in any way, so
it goes to defaults.
Severely restricting memory surely could cause things not to boot. The
in-memory provider is for test purposes, it is not tuned in a way to
reduce the amount of memory allocated to store things. The data
structures in use would be bloated compared to what a normal in-memory
cache database would do. In other words, while I wouldn't expect a
leak, I also wouldn't expect a 1-1 relationship between size of json
in and heap size. This is somewhat explained in the notes written
about this by @joel-airspring who implemented the memory bounding:
```
# Maximum number of spans to keep in memory. When exceeded, oldest
# traces (and their spans) will be purged.
# A safe estimate is 1K of memory per span (each span with 2
# annotations + 1 binary annotation), plus
# 100 MB for a safety buffer. You'll need to verify in your own environment.
# Experimentally, it works with: max-spans of 500000 with JRE argument -Xmx600m.
zipkin.storage.mem.max-spans: 500000
```
(500000 is the default.)
So, the only knob you can change is the maximum number of spans to
retain. However, the count is only indirectly related to size, as spans could
be large or small depending on what your apps put into a span.
It might not seem obvious, but I'm literally the only person hired to
work on the entire ecosystem. The questions you ask about are somewhat
routine java + containers, questions that are highly specific to the
container sizes etc, but might be answerable if focus were given.
There's an infinitely larger population of folks who can troubleshoot
heap size and answer nuanced questions about this better than I can personally.
As this is a test setup, frankly I can't spend any more time on this.
There are a lot more important issues that affect production setups
(what's recommended for production certainly isn't mem storage), but that
doesn't mean others can't help you.
Maybe @alicefr who is testing a new JRE image, @joel-airspring who
implemented this first, or someone from google can help answer some of
how memory ends up working on docker/kubernetes etc, specifically the
tuning of JVM container heap sizes and how that shows up in OS-level
tools that report memory.
|
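(As a concrete illustration of turning that knob, the max-spans property quoted above can be lowered from its default. The sketch below assumes it can be passed as a system property through the JVM options; the 100000 figure is only an example derived from the 1K-per-span rule of thumb.)

```yaml
# Hypothetical sketch: retain fewer spans so the in-memory store needs less heap.
# Roughly 100000 spans * ~1K each = ~100MB, plus a safety buffer, per the notes above.
env:
  - name: JVM_OPTS
    value: "-Xmx256m -Dzipkin.storage.mem.max-spans=100000"
```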
@adriancole fully understood, and thanks for the help you've given so far! No need for you to spend more time on it; I'll do some more digging myself. What I'd love to be able to do at the end of this is send a PR to add some guidance to the docs about memory constraints. I'm familiar with tweaking the JVM opts; the part I'm less familiar with is zipkin, but what you've mentioned in your reply regarding the max spans is likely enough for me to go on. |
a PR with guidance would be great. It would likely be best in the
openzipkin/zipkin repo and could be linked here or elsewhere (if it's about the
in-mem thing). The only other breadcrumb that's easy to give is this one,
which is the issue that led to the in-memory bounding thing being merged:
openzipkin/zipkin#1631
Best of luck, and thanks for the offer to help document your efforts.
|
We've just been setting this app up and it's been pretty good going so far; I really appreciate how easy this has been to connect up. I have a proposal and am interested in getting some feedback on it.
Currently the memory usage at rest is circa 300MB per replica without pumping any traces to it, the Docker image size sits at 681MB, and boot time is around 30 seconds when we assigned moderate resource limits in Kubernetes. We haven't been running it for long enough to determine memory usage during peak. Since those figures do not compare favourably with other cluster-wide services we are running, I wondered if you would consider a lightweight version? Perhaps working on it together? The sorts of things we are comparing to are Kubernetes internals, fluentd and datadog.
Obviously creating a new version could be quite an effort, so if there are any recommendations to bring the memory footprint and boot times down with the current app, I'm happy to look into those instead. For the image size, the latest docker image is 3 months old; would that be reduced at all by publishing the latest?
If you would consider a lightweight version, I was thinking that a Go app based on a similar base image to Kubernetes contrib services or some of the k8s system services, such as kube-dns, would be a good starting point. Most of those are based on alpine or busybox with the occasional debian, same as this project.