v1.91.0 DB Crash #55

Open
evanreichard opened this issue Dec 15, 2023 · 17 comments
@evanreichard

evanreichard commented Dec 15, 2023

Postgres Pod Crash:

chmod: changing permissions of '/var/run/postgresql': Operation not permitted

PostgreSQL Database directory appears to contain a database; Skipping initialization

postgres: could not access the server configuration file "/bitnami/postgresql/data/postgresql.conf": No such file or directory

Chart Values:

    env:
      DB_PASSWORD:
        valueFrom:
          secretKeyRef:
            name: postgres-secrets
            key: password
    image:
      tag: v1.91.0
    immich:
      persistence:
        library:
          existingClaim: va-unraid-photos-rw
    postgresql:
      enabled: true
      auth:
        existingSecret: postgres-secrets
    redis:
      enabled: true

The postgres statefulset is appropriately configured with the following image:

        image: docker.io/tensorchord/pgvecto-rs:pg14-v0.1.11

Chart version: immich-0.3.0

@inglemr

inglemr commented Dec 15, 2023

I ran into this as well. For what it is worth, in the meantime, I was able to resolve this by creating the missing config files "/bitnami/postgresql/data/postgresql.conf" and "/bitnami/postgresql/data/pg_hba.conf". I simply took default config files, and that allowed the database to come back online.

@evanreichard
Author

evanreichard commented Dec 16, 2023

Thanks @inglemr

Inevitably I needed the following:

postgresql.conf

listen_addresses = '*'

pg_hba.conf

local    all             all                                     trust
host     all             all        10.0.0.0/8                   md5
host     all             all        127.0.0.1/32                 trust
host     all             all        ::1/128                      trust

I tried using the default in the bitnami images:

# Export Default Configuration
docker run -it --entrypoint /bin/bash bitnami/postgresql:14.10.0 -c "cat /opt/bitnami/postgresql/conf/postgresql.conf" > postgresql.conf
docker run -it --entrypoint /bin/bash bitnami/postgresql:14.10.0 -c "cat /opt/bitnami/postgresql/conf/pg_hba.conf" > pg_hba.conf

And while postgres came online, the other services couldn't connect to it. So I modified the above to include the changes I first mentioned.

If that's all you do, you'll need to also ensure a conf.d directory exists (referenced by the default postgresql.conf file):

kubectl exec -n immich immich-helmrelease-postgresql-0 -- mkdir /bitnami/postgresql/data/conf.d

And to copy the local file to the pod:

kubectl -n immich cp postgresql.conf immich-helmrelease-postgresql-0:/bitnami/postgresql/data
kubectl -n immich cp pg_hba.conf immich-helmrelease-postgresql-0:/bitnami/postgresql/data

@ViktorBarzin

Same for me. Described solution also worked. However immich-server still expects TYPESENSE_API_KEY and crash loops....

/usr/src/app/node_modules/@nestjs/config/dist/config.module.js:78
                throw new Error(`Config validation error: ${error.message}`);
                ^

Error: Config validation error: "TYPESENSE_API_KEY" is required
    at ConfigModule.forRoot (/usr/src/app/node_modules/@nestjs/config/dist/config.module.js:78:23)
    at Object.<anonymous> (/usr/src/app/dist/infra/infra.module.js:49:27)
    at Module._compile (node:internal/modules/cjs/loader:1241:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1295:10)
    at Module.load (node:internal/modules/cjs/loader:1091:32)
    at Module._load (node:internal/modules/cjs/loader:938:12)
    at Module.require (node:internal/modules/cjs/loader:1115:19)
    at require (node:internal/modules/helpers:130:18)
    at Object.<anonymous> (/usr/src/app/dist/infra/index.js:19:14)
    at Module._compile (node:internal/modules/cjs/loader:1241:14)

Node.js v20.8.1

@frankprimarily

This seems to be caused by taking a Docker image based on postgres/postgres and injecting it into a chart dependency that expects a bitnami image.

@dbeltman

Quoting @evanreichard:

pg_hba.conf

local    all             all                                     trust
host     all             all        10.0.0.0/8                   md5
host     all             all        127.0.0.1/32                 trust
host     all             all        ::1/128                      trust

I had to modify this a little to match my pod CIDR, and use "password" instead of "md5", since the server kept complaining about no pg_hba.conf entry for "no encryption".
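A minimal sketch of that adjustment, in case it helps: a small shell function that prints a pg_hba.conf with the same four rules for a given pod CIDR and auth method. The function name `render_pg_hba` and the example CIDR are made up for illustration; look up your own cluster's pod CIDR before using anything like this.

```shell
#!/bin/sh
# Sketch only: print a pg_hba.conf with the same four rules as above.
# render_pg_hba is a hypothetical helper, not part of the chart or of postgres.
render_pg_hba() {
    pod_cidr="$1"             # e.g. 10.42.0.0/16 (assumed; use your cluster's pod CIDR)
    auth_method="${2:-md5}"   # pg_hba auth method for pod traffic, e.g. md5 or password
    printf 'local    all    all                    trust\n'
    printf 'host     all    all    %s    %s\n' "$pod_cidr" "$auth_method"
    printf 'host     all    all    127.0.0.1/32    trust\n'
    printf 'host     all    all    ::1/128         trust\n'
}
```

Usage would be something like `render_pg_hba 10.42.0.0/16 password > pg_hba.conf`, followed by the `kubectl cp` step from earlier in the thread. As far as I know, the "no encryption" part of that log message refers to the connection not using TLS, so a `host` rule whose CIDR actually matches the client address is probably what matters most.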

@alexandresoro

alexandresoro commented Dec 19, 2023

This one caught me off guard too.
This is indeed because the bitnami images do a lot of initialization at startup.

The config paths are passed at postgres startup here: https://github.com/bitnami/containers/blob/7bc9cc3e18c0c08c8019d3f6c8dd6bc0f926051e/bitnami/postgresql/16/debian-11/rootfs/opt/bitnami/scripts/postgresql/run.sh#L19
And the config is built dynamically at startup from https://github.com/bitnami/containers/blob/7bc9cc3e18c0c08c8019d3f6c8dd6bc0f926051e/bitnami/postgresql/16/debian-11/rootfs/opt/bitnami/scripts/libpostgresql.sh#L170
So yes, copying the content from /opt/bitnami/postgresql/conf to /bitnami/postgresql/data does the trick.
If your instance is running and the volume is mounted, I guess you can simply copy-paste that.
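Roughly what I mean by copying, as a sketch: a small helper that copies the two config files only if they are missing and creates the conf.d directory mentioned earlier in the thread. The function name `seed_pg_conf` is hypothetical, and it assumes you run it somewhere that has both the bitnami conf directory and the data volume available.

```shell
#!/bin/sh
# Sketch only: seed_pg_conf SRC_DIR DATA_DIR
# Copies postgresql.conf and pg_hba.conf from SRC_DIR into DATA_DIR when they
# are missing there, and makes sure DATA_DIR/conf.d exists.
seed_pg_conf() {
    src="$1"
    dst="$2"
    for f in postgresql.conf pg_hba.conf; do
        # Never overwrite a config file that is already in place.
        [ -e "$dst/$f" ] || cp "$src/$f" "$dst/$f" || return 1
    done
    mkdir -p "$dst/conf.d"
}
```

With the paths from this thread that would be `seed_pg_conf /opt/bitnami/postgresql/conf /bitnami/postgresql/data`. You would likely still need the listen_addresses and pg_hba.conf changes described above for other pods to connect.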

Also, from what I see, it seems that the default UID is different: 1001 for the bitnami image, and 999 for the default postgres one.
You might wish to adjust that with something like:

postgresql:
  image:
    repository: tensorchord/pgvecto-rs
    tag: pg16-v0.1.11

  primary:
    podSecurityContext:
      fsGroup: 999

    containerSecurityContext:
      runAsUser: 999

  volumePermissions:
    enabled: true # By setting this, you make sure that the existing data gets chown'd to UID 999

@frankprimarily

As this is caused by replacing the used image, this might only be an issue for a migration to the 1.91+ immich release.
Has anyone tried to deploy from scratch (empty pvc)? If this is working, we probably can close this issue - or we implement some kind of migration script.

@LordGaav

LordGaav commented Dec 21, 2023

Switching from a Bitnami to a normal PostgreSQL image is a really bad idea: every user who updates this Helm chart is going to run into this exact issue. This caught me off guard because there was no breaking-change warning about it, only one about Immich itself, which mentions docker-compose but not Helm, and the section about 'normal' Postgres images does not mention it either.

I'm looking into what needs to be done to fix this for myself, because this also involves a PostgreSQL update from 11 to 14.

Edit: I ended up rolling back the Helm chart to 0.2.0 and the immich tag to v1.90.2. As far as I can see, there is no easy way to install the pgvecto.rs extension in the Bitnami image without also running PostgreSQL as root.

@SoarinFerret

Quick repo I threw together with a bitnami-compatible build of pgvecto.rs: https://github.com/SoarinFerret/bitnami-postgres-pgvecto-rs

This does NOT run PostgreSQL as root, so it should address your concerns @LordGaav

If you update your values.yaml to the following:

postgresql:
  image:
    registry: ghcr.io
    repository: soarinferret/bitnami-postgres-pgvecto-rs
    tag: pg14.5-v0.1.11

you can update to the latest release. Hope this helps someone. I think long term my plan will be to stop running this in k8s, so I don't intend to keep the repo updated.

Here is the dockerfile for anyone who may want to build themselves:

ARG PGVECTORS_TAG=pg14-v0.1.11-amd64
ARG BITNAMI_TAG=14.5.0-debian-11-r6
FROM scratch as nothing
FROM tensorchord/pgvecto-rs-binary:${PGVECTORS_TAG} as binary

FROM docker.io/bitnami/postgresql:${BITNAMI_TAG}
COPY --from=binary /pgvecto-rs-binary-release.deb /tmp/vectors.deb
USER root
RUN apt-get install -y /tmp/vectors.deb && rm -f /tmp/vectors.deb && \
     mv /usr/lib/postgresql/*/lib/vectors.so /opt/bitnami/postgresql/lib/ && \
     mv /usr/share/postgresql/*/extension/vectors* /opt/bitnami/postgresql/share/extension/
USER 1001
ENV POSTGRESQL_EXTRA_FLAGS="-c shared_preload_libraries=vectors.so"

@LordGaav

LordGaav commented Jan 1, 2024

Can confirm your image works @SoarinFerret , thanks. I only had to do CREATE EXTENSION vectors on the immich database.
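For anyone wondering how to run that statement, here is a hedged sketch using `kubectl exec` and `psql`. The namespace, pod name, database name and the `postgres` user are assumptions (the pod name is borrowed from earlier in this thread), and authentication such as PGPASSWORD is left out. By default it only prints the command; set DRY_RUN=0 to actually execute it.

```shell
#!/bin/sh
# Sketch only. All names below are assumptions; adjust them to your release.
NS="${NS:-immich}"
PG_POD="${PG_POD:-immich-helmrelease-postgresql-0}"
DB_NAME="${DB_NAME:-immich}"
DRY_RUN="${DRY_RUN:-1}"

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

create_vectors_extension() {
    # IF NOT EXISTS makes the statement safe to repeat.
    run kubectl exec -n "$NS" "$PG_POD" -- \
        psql -U postgres -d "$DB_NAME" -c 'CREATE EXTENSION IF NOT EXISTS vectors;'
}
```

Call `create_vectors_extension` once the postgres pod is up with the new image.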

@alexbarcelo

alexbarcelo commented Jan 1, 2024

I had some long hours of frustration with strange permissions errors, version mismatches and some issues with @SoarinFerret's fix (don't get me wrong, thanks a lot for providing the image; I am not sure why it was not working for me, as it worked for other people).

My more drastic fix was to perform a backup & restore:

  • Start with a pg_dump-style backup (I have a daily one from k8up.io, but you can do one manually)
  • Update the chart to 0.3.x
  • Stop immich-microservices and immich-server (very important! if you don't stop them they will migrate and dirty your database during the next steps... been there, done that, started again).
  • Stop the postgres StatefulSet
  • Clean up the PVC of the StatefulSet (there, it's dangerous to go alone, take this)
  • Start the postgres (will be a pristine installation)
  • Load the previous backup
  • Start all services
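
The steps above could be sketched roughly like this. Every resource name (namespace, StatefulSet, pod, PVC, Deployments) and the `postgres` user are assumptions, so check your own with `kubectl get all,pvc -n <namespace>`. Authentication is left out, and /tmp in the pod is assumed to be writable. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run sketch of the backup & restore steps. All names are ASSUMPTIONS.
NS="${NS:-immich}"
PG_STS="${PG_STS:-immich-postgresql}"                # postgres StatefulSet
PG_POD="${PG_POD:-immich-postgresql-0}"              # its pod
PG_PVC="${PG_PVC:-data-immich-postgresql-0}"         # its data PVC
APPS="${APPS:-immich-server immich-microservices}"   # Deployments to stop
DRY_RUN="${DRY_RUN:-1}"

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

backup() {
    # Logical dump taken BEFORE the chart upgrade, then copied off the pod.
    run kubectl exec -n "$NS" "$PG_POD" -- pg_dumpall -U postgres -f /tmp/backup.sql
    run kubectl cp -n "$NS" "$PG_POD:/tmp/backup.sql" ./backup.sql
}

reset_and_restore() {
    # Run AFTER upgrading the chart to 0.3.x.
    for app in $APPS; do
        run kubectl scale -n "$NS" deployment "$app" --replicas=0
    done
    run kubectl scale -n "$NS" statefulset "$PG_STS" --replicas=0
    run kubectl delete -n "$NS" pvc "$PG_PVC"   # destroys the old data directory
    run kubectl scale -n "$NS" statefulset "$PG_STS" --replicas=1
    run kubectl rollout status -n "$NS" statefulset "$PG_STS"
    run kubectl cp -n "$NS" ./backup.sql "$PG_POD:/tmp/backup.sql"
    run kubectl exec -n "$NS" "$PG_POD" -- psql -U postgres -f /tmp/backup.sql
    for app in $APPS; do
        run kubectl scale -n "$NS" deployment "$app" --replicas=1
    done
}
```

Call `backup` while the old release is still running, verify that backup.sql is complete, upgrade the chart, then call `reset_and_restore`. Restoring a full dump into a fresh cluster may print errors about roles that already exist.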

I hope someone benefits from this. If you already have backups in place, this whole procedure can be done in under 10 minutes. Otherwise, you may need some back&forth (maybe some rolling back and trying again). Good luck!

EDIT: DISCLAIMER: These steps can break your installation and/or lose data. Make sure you know what you are doing and make a backup before even trying. If you are not sure of the procedure, dry-run it in a playground environment first, and understand all the steps before attempting them.

@Nepoxx

Nepoxx commented Jan 2, 2024

Quoting @evanreichard's steps above (kubectl exec to create conf.d, then kubectl cp to copy postgresql.conf and pg_hba.conf into the pod):

How can you run those if the container is in a crash loop?

@evanreichard
Author

@Nepoxx hacky, not sure if there's a better way, but I edit the StatefulSet container definition with:

command: ["sleep"]
args: ["infinity"]

Then do my changes, then remove the edit.
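
If you'd rather not hand-edit, the same override can be sketched with `kubectl patch`. The namespace and StatefulSet name are assumptions, and the patch assumes postgres is the first container in the pod spec. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. NS and PG_STS are assumed names; adjust them to your release.
NS="${NS:-immich}"
PG_STS="${PG_STS:-immich-helmrelease-postgresql}"
DRY_RUN="${DRY_RUN:-1}"

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Replace the container entrypoint so the pod stays up without starting postgres.
pause_postgres() {
    run kubectl patch statefulset "$PG_STS" -n "$NS" --type=json -p \
        '[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["sleep"]},{"op":"add","path":"/spec/template/spec/containers/0/args","value":["infinity"]}]'
}

# Remove the override again once the config files are in place.
resume_postgres() {
    run kubectl patch statefulset "$PG_STS" -n "$NS" --type=json -p \
        '[{"op":"remove","path":"/spec/template/spec/containers/0/command"},{"op":"remove","path":"/spec/template/spec/containers/0/args"}]'
}
```

Two caveats: you may need to delete the crash-looping pod by hand so it gets recreated with the patched spec, and if the container originally had its own `args`, `resume_postgres` drops them, so re-applying the chart is probably the safer way to restore the original definition.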

@PixelJonas
Contributor

Ufff .... there are so many things to unpack here.

First of all, @ALL who are currently contributing to this thread: THANK YOU VERY MUCH 🙏

I finally found some time to upgrade my cluster apps and I knew that upgrading immich will be "a thing" which is why I did not do it for a while. Your comments and solutions were tremendous in helping me.

Honestly, I don't like the inclusion of the database for immich in this chart. The Bitnami sub-charts have no way of upgrading between major versions, so we inherit the breaking changes that come with them. With #61 we'd start building our own database image, which would make us responsible for keeping it updated and maintained, and again we'd still have breaking changes between major versions.

I did a LOT of manual steps to get the database for immich working, which included finding the correct pgvecto.rs image to use 🙈

The pgvecto.rs extension version is 0.2.0 instead of 0.1.11.
Please run 'DROP EXTENSION IF EXISTS vectors' and switch to 0.1.11, such as with the docker image 'tensorchord/pgvecto-rs:pg16-v0.1.11'.

This suggests that the main project endorses a specific database image, and I'd like us to use the same one and/or make a collaborative effort to create an image on the main account of this project (rather than adding an image to this Helm chart).

I'd also like us to condense the "manual migration" from the bitnami chart to this new version in a CHANGELOG and introduce it as a breaking change to this chart.

What are your thoughts about that? @bo0tzz you have any input?

@bo0tzz
Member

bo0tzz commented Feb 13, 2024

Sorry about the breaking change and the silence afterwards folks! Unfortunately the way we have the helm chart releases set up right now means there is very little testing, and also no easy way to communicate (breaking) changes. On top of that, I got lazy and did too little manual testing when making this breaking release.

Like Jonas mentions, I'm also not a fan of including the database in this chart. On the other hand, I can see the argument for being able to deploy Immich easily without needing to set up several (postgres, redis) external dependencies manually.

I'm not sure what the best way forward is, but if possible I would like to avoid needing to maintain a(nother) database image. The upstream pgvecto.rs image is perfectly good other than not playing nice with the bitnami postgres chart (for which I blame bitnami). Since we're running on kubernetes, there's also the option of database operators with for example https://github.com/tensorchord/cloudnative-pgvecto.rs (for cloudnative-pg) or https://github.com/chkpwd/cdpgvecto.rs (for crunchydata PGO).

Finally, the release process of this chart needs to be improved, as currently it just releases on every merge to main. Ideally releases should happen through GitHub Releases instead, and include proper changelogs and such. Any help with that would be very welcome.

@martimors
Contributor

Doesn't the chart help configure postgresql with the geodata-plugin etc.? I think for that reason it's convenient to include it, although I would personally not mind running my own bitnami-postgres alongside immich.

@bo0tzz
Member

bo0tzz commented Oct 4, 2024

If any of you have feedback on the proposal in #129, I'd love to hear it.
