Possible to consider a lightweight version? #38
#36 is an attempt to use the same layers that we use upstream (also alpine). I just noticed a build bug where we are over 100MiB there and will look into it. We used to be under 100. |
actually zipkin v1.22 is under 100MiB; according to Docker Hub it is 94MB. I'd expect that if we implemented #36, we would be slightly under or at a 100MiB image size. The resident memory size is tunable via java arguments and can be lower, too. |
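(As a rough illustration of that tunability, heap-related flags can be passed to the container. The docker-compose-style sketch below is only an assumption about the image's entrypoint: the exact environment variable it honors, e.g. JAVA_OPTS, and the flag values are illustrative, not a recommendation.)

```yaml
# Hypothetical sketch: bound the JVM heap so resident memory stays small.
# Assumes the zipkin image reads a JAVA_OPTS-style variable; check the image docs.
version: "2"
services:
  zipkin:
    image: openzipkin/zipkin
    environment:
      # -Xms/-Xmx bound the heap; metaspace is capped separately.
      JAVA_OPTS: "-Xms64m -Xmx128m -XX:MaxMetaspaceSize=64m"
    ports:
      - "9411:9411"
```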
That'd be great, need any help? Could be worth adding something like this to the build to detect large changes. Do you have any specific recommendations on Java memory sizes and on which garbage collector to use? Or we'd need to investigate further? I was thinking about submitting a Kubernetes Helm Chart for this project since I already have one internally so it would be good to set decent defaults (or set them in the Dockerfile here). |
yes! if you could help add a step to https://github.com/openzipkin/docker-zipkin to check on max image size (for the main zipkin image), that would be fantastic. In fact, we have no pull request checker at the moment, so help is wanted in general there. https://github.com/fabric8io/kubernetes-zipkin is generally used by folks, though I'm not sure if the images are the same there as here. https://github.com/openzipkin/zipkin-kubernetes didn't take off, but there were plans to pull generic stuff into there as well. Here is mostly around java code reorg, so unless you are keen on that you can let one of us move the ball. |
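(A maximum-image-size check could be as simple as comparing the built image's size against a budget. The sketch below is only an illustration: the Travis-style CI layout, the build path, and the 100MiB threshold are assumptions, not part of the project's actual build.)

```yaml
# Hypothetical CI step: fail the build if the zipkin image exceeds a size budget.
script:
  - docker build -t openzipkin/zipkin:ci zipkin/   # build path is illustrative
  - |
    # docker image inspect reports the image size in bytes; 104857600 bytes = 100MiB.
    size=$(docker image inspect -f '{{.Size}}' openzipkin/zipkin:ci)
    if [ "$size" -gt 104857600 ]; then
      echo "Image is ${size} bytes, over the 100MiB budget"
      exit 1
    fi
```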
Sure, will look into the image size checks soon and thanks for the links. I'll leave the project specifics to you until I have some more experience running this project. |
Any advancement on this? I think a Go rewrite would probably be the way to go for this kind of small service. I would expect ~20-30MB of RAM for a collector as opposed to the 600MB it currently uses on our test cluster. |
You can feel free to write your own service, but maybe adjusting mem
arguments so that it doesn't allocate 600MB is a more sensible start
|
I have written daemons in java that start with 32m, e.g. the denominator proxy,
while at netflix. What is your deployment strategy such that hundreds of MB
are a problem for your architecture? Are you running one per host? If so,
maybe agent is a better way to classify the issue.
Meanwhile, sticking to technical constraints vs flaming java is a better way
to get something flexible. This doesn't preclude a go option; let's just try
to stick with what the problems are.
Hope this makes sense. Regardless, I will try running the server with less
than a hundred megs of RAM, and hopefully you can meanwhile help elaborate
the background behind your low mem constraints.
|
ps here's the existing issue on a host collector (which wouldn't have the same modularity constraints as a normal collector, which would have ingress from multiple sources such as kafka). You might be happy to notice I suggested go :P |
added an issue about overall concerns. Let's bear in mind that this repository will be retargeted such that it can compose more easily, for example allowing ingress via pubsub etc. It is easy to map an http request in to a request out, but usually there's more going on than that. |
For now we've restricted to 300MB as below and are running several replicas to deal with the load. We don't necessarily need to reduce that, but do wonder what the best-fit GC policies are and how to determine the appropriate limits. The setup I have inside the k8s Deployment is:

```yaml
env:
  - name: JVM_OPTS
    value: "-Xms64m -Xmx128m -XX:MaxMetaspaceSize=64m"
resources:
  limits:
    cpu: 500m # CPU bursts high on container start but then settles down
    memory: 300Mi
  requests:
    cpu: 50m
    memory: 300Mi
```
|
probably best to keep things as-is if you can swing it for a week or two.
Once we restructure the project you can use our tools like standard
dashboards to watch things. That way, we can try varying a thing or two
(and it would apply to all)
https://github.com/openzipkin/docker-zipkin#prometheus
but more specifically... openzipkin/zipkin#1779
on GC parameters
For example, there are alternative http listeners out there. Being part
of the normal modular server build will allow us to run experiments beyond
GC parameters.
|
Yeah, I've had it set up like that for a few weeks already and it's running stably at the current load, so I'm happy to continue with it for now until there are further recommendations. |
@adriancole Didn't mean to come across as flaming or to undermine the effort that has been put into this library. Thanks a lot for the suggestions :) I will try it at 300MB and follow the issue you have created to see if we can get it down even more. Our system is quite small and has generally low load, which is why I am trying to save memory on system services. For example, a cluster can sometimes consist of only 2 nodes with 4-8GB of RAM, so 600MB (x2 replicas) for a single service can be a bit much, especially considering most of our production services run well below that. |
I put some time into the server, which seems to do a lot better than before. Sorry it did kind of suck earlier; we weren't watching closely enough: openzipkin/zipkin#1806 (comment) |
zipkin-gcp is now a 10MiB layer on the normal zipkin image (which includes support for kafka and rabbitmq as well as the other stuff)
|
Nice! Are there any recommendations for memory and CPU constraints? I've tried updating to the |
this image is effectively the same as before, just in an easier to control
and affect way. One experiment you can do is to not use the "stackdriver"
storage type, or to set COLLECTOR_SAMPLE_RATE=0, to get a baseline of what's
possible with basic http gearing in place. You can also look at the
prometheus setup if that helps.
So, the idea is to separate the server (which we can do now) from the grpc
egress (stackdriver) part. Might make something easier to solve.
If it ends up server related (e.g. turning stackdriver off doesn't
impact the QoS), then whatever changes could help anyone using the server,
and affect
https://github.com/openzipkin/zipkin/issues
or
https://github.com/openzipkin/docker-zipkin/issues
make sense?
|
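(To make that baseline experiment concrete, the toggles could be expressed as environment variables on the Deployment. A minimal sketch: COLLECTOR_SAMPLE_RATE=0 comes from the comment above, while the STORAGE_TYPE variable and the value "mem" are assumptions about the standard server configuration; the JVM flags mirror the ones posted earlier in this thread.)

```yaml
# Hypothetical baseline: in-memory storage and no sampling, to isolate the cost
# of the basic http path from the stackdriver egress.
env:
  - name: STORAGE_TYPE
    value: "mem"              # skip the stackdriver storage type for the baseline
  - name: COLLECTOR_SAMPLE_RATE
    value: "0"                # drop spans at the collector, per the suggestion above
  - name: JVM_OPTS
    value: "-Xms64m -Xmx128m -XX:MaxMetaspaceSize=64m"
```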
@adriancole sure, makes sense, thanks. I'll try a few of those toggles and work out which combinations work. |
@adriancole I got around to playing with various options. So far, from running the collector across a set of clusters, restricting the memory tends to make it OOM occasionally or, worse, not boot and go into a crash loop backoff. Not sure of the best way to follow up on this; essentially I'd just like to find an acceptable Kubernetes+JVM boxed memory config, even if it means allocating 1GB minimum to these pods for now. Happy to try more things out or provide more concrete stats where needed. |
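(For what it's worth, a boxed 1GB configuration along those lines might look like the sketch below. It only extrapolates from the settings posted earlier in this thread; the specific heap and metaspace values are illustrative assumptions, not recommendations.)

```yaml
# Hypothetical "roomier" boxed config: 1Gi for the pod, with headroom left between
# the heap ceiling and the container limit for metaspace, threads and off-heap use.
env:
  - name: JVM_OPTS
    value: "-Xms256m -Xmx512m -XX:MaxMetaspaceSize=128m"
resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 50m
    memory: 1Gi
```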
the zipkin-gcp image does not restrict or tune memory in any way, so
it goes to defaults.
Severely restricting memory surely could cause things not to boot. The
in-memory provider is for test purposes, it is not tuned in a way to
reduce the amount of memory allocated to store things. The data
structures in use would be bloated compared to what a normal in-memory
cache database would do. In other words, while I wouldn't expect a
leak, I also wouldn't expect a 1-1 relationship between size of json
in and heap size. This is somewhat explained in the notes written
about this by @joel-airspring who implemented the memory bounding:
```
# Maximum number of spans to keep in memory. When exceeded, oldest
# traces (and their spans) will be purged.
# A safe estimate is 1K of memory per span (each span with 2
# annotations + 1 binary annotation), plus
# 100 MB for a safety buffer. You'll need to verify in your own environment.
# Experimentally, it works with: max-spans of 500000 with JRE argument -Xmx600m.
zipkin.storage.mem.max-spans: 500000
```
(500000 is the default.)
So, the only knob you can change is the maximum number of spans to
retain. However, the count is only indirectly related to size, as spans could
be large or small depending on what your apps put into a span.
It might not seem obvious, but I'm literally the only person hired to
work on the entire ecosystem. The questions you ask about are somewhat
routine java + containers, questions that are highly specific to the
container sizes etc, but might be answerable if focus were given.
There's an infinitely larger population of folks who can troubleshoot
heap size and answer nuanced questions about this better than I can personally.
As this is a test setup, frankly I can't spend any more time on this.
There are a lot more important issues that affect production setups
(what's recommended for production certainly isn't mem storage), but that
doesn't mean others can't help you.
Maybe @alicefr who is testing a new JRE image, @joel-airspring who
implemented this first, or someone from google can help answer some of
how memory ends up working on docker/kubernetes etc, specifically the
tuning of JVM container heap sizes and how that shows up in OS-level
tools that report memory.
|
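(As a concrete illustration of turning that knob, the max-spans property quoted above can be lowered from its default. The sketch below assumes it can be passed as a system property through the JVM options; the 100000 figure is only an example derived from the 1K-per-span rule of thumb.)

```yaml
# Hypothetical sketch: retain fewer spans so the in-memory store needs less heap.
# Roughly 100000 spans * ~1K each = ~100MB, plus a safety buffer, per the notes above.
env:
  - name: JVM_OPTS
    value: "-Xmx256m -Dzipkin.storage.mem.max-spans=100000"
```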
@adriancole fully understood, and thanks for the help you've given so far! No need for you to spend more time on it; I'll do some more digging myself. What I'd love to be able to do at the end of this is send a PR to add some guidance to the docs about memory constraints. I'm familiar with tweaking the JVM opts; the part I'm less familiar with is zipkin, but what you've mentioned in your reply regarding the max spans is likely enough for me to go on. |
a PR with guidance would be great. It would likely be best in the
openzipkin/zipkin repo and could be linked here or elsewhere (if it's about the
in-mem thing). The only other breadcrumb that's easy to give is this one,
which is the issue that led to the in-memory bounding thing being merged:
openzipkin/zipkin#1631
Best of luck, and thanks for the offer to help document your efforts.
|
We've just been setting this app up and it's been pretty good going so far; I really appreciate how easy this has been to connect up. I have a proposal and am interested in getting some feedback on it.
Currently the memory usage at rest is circa 300MB per replica without pumping any traces to it, the Docker image size sits at 681MB, and boot time is around 30 seconds when we assigned moderate resource limits in Kubernetes. We haven't been running it for long enough to determine memory usage during peak. Since those figures do not compare favourably with other cluster-wide services we are running, I wondered if you would consider a lightweight version? Perhaps working on it together? The sorts of things we are comparing to are Kubernetes internals, fluentd and datadog.
Obviously creating a new version could be quite an effort, so if there are any recommendations to bring the memory footprint and boot times down with the current app, I'm happy to look into those instead. For the image size, the latest docker image is 3 months old; would that be reduced at all by publishing the latest?
If you would consider a lightweight version, I was thinking that a Go app based on a similar base image to Kubernetes contrib services or some of the k8s system services, such as kube-dns, would be a good starting point. Most of those are based on alpine or busybox with the occasional debian, same as this project.