-
Notifications
You must be signed in to change notification settings - Fork 11
16 - db encryption at rest #98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Rafalkufel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
| APP_AUTHOR: str = "Code for Poznań" | ||
|
|
||
| DB_FILE_NAME: str = "alinka.sqlite" | ||
| DB_FILE_NAME: str = "alinka.db" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is sqlite preferred extension for some reason?
Both pysqlcipher3 (sqlcipher3 is for of pysqlcipher3) and SQLAlchemy documentation suggest db extension:
https://github.com/rigglemania/pysqlcipher3/blob/master/README.rst
https://docs.sqlalchemy.org/en/20/dialects/sqlite.html#module-sqlalchemy.dialects.sqlite.pysqlcipher
On top of that, I wanted to avoid issues for people who already have alinka.sqlite. For persistence (I guess?), we keep it in volume. That file has been created with a different driver and is unencrypted. It won't work with SQLCipher. And errors and stacktrace are not very helpful, to say I least (I spent most of last weekend struggling to find the root cause)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This explanation is ok for me. I'd try to avoid to change it back and forth from db to sqlite without any justification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@magul WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here are few thoughts I regarding this topic:
- sqlcipher3 is nt in fact a fork (I guess this is what you meant) of pysqlcipher3 - as they don't share any commit history, but I agree they are libraries that basically do the same thing (as far as I understand how sqlcipher works)
- I recall I moved from
.dbtosqlite(here: Use platformdirs package to determine location of DB and generated do… #39) - at that time (and TBH still) it just explicitly mark, what kind of DB it is, but I don't have strong feelings right now regarding whether we should go with.sqliteor.db.
Also, regarding:
On top of that, I wanted to avoid issues for people who already have alinka.sqlite. For persistence (I guess?), we keep it in volume. That file has been created with a different driver and is unencrypted. It won't work with SQLCipher.
We do need to have database persistency, this is kind of a reason, why we have .volumes/ mounted in docker-compose.yml, but here are few details worth mentioning:
.volumes/were added for development purpose only, this is only exposed via volumes in Docker, and neither Windows users, nor Debian/Ubuntu ones will use docker to run that app- we're lucky, that none of our actual users run this application, that way we can change filename of database, but as soon as we release first version (and we plan to do that in first half of June), we will need to figure out how to be able to migrate our data (and I don't only mean Alembic migrations here) - even in this pull request you add
DB_PASSPHRASE: str = "should_be_changed", in longer approach we need to be able to support changing that password, and this change cannot cause our users to start new db, we need to copy their data, I would advice all of us, to start thinking how to build such migrations framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.volumes/ were added for development purpose only, this is only exposed via volumes in Docker, and neither Windows users, nor Debian/Ubuntu ones will use docker to run that app
and that is what I meant. Maybe it would be obvious to you why the build is failing, but it wasn't to me, and I am not 100% sure that it would be to other developers.
Aren't Pydantic settings loaded from environmental variables? I added DB_PASSPHRASE: str = "should_be_changed" just to highlight the fact that we should treat it as a secret.
Maybe the better solution would be to have a dev version of .env file without defaults in alinka/config/configuration.py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm kind of confused regarding
and that is what I meant. Maybe it would be obvious to you why the build is failing, but it wasn't to me, and I am not 100% sure that it would be to other developers.
I've thought we talk here about .sqlite vs .db extension. I can assume, that it may be missed that Windows builder is failing, I've even thought about this today, thinking that instead of workflow_dispatch, we should build DEB and EXE packages on every pull_request. WDYT? I've also thought about doing some introduction to our Github Actions on next hacknight.
Assuming, we're talking here about failure Windows package failure - is there anything unclear right now (assuming you've already seen Gtihub Action I'd send in one of comments)?
Aren't Pydantic settings loaded from environmental variables? I added DB_PASSPHRASE: str = "should_be_changed" just to highlight the fact that we should treat it as a secret.
Maybe the better solution would be to have a dev version of .env file without defaults in alinka/config/configuration.py?
☝🏿 this is also confusing - I may be wrong, but as we're building desktop application here targeting mostly Windows-based, non-tech-savy endusers, I don't believe they will be able and willing to setup environment variables, on top of that I have stong feeling, that having encrypted DB using key that's readily available to read on the same machine is not safe. I would assume (and this is the reason I called this pull request as PoC), that in long run we would like enduser to type that passphrase everytime she/he starts Alinka.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately sqlcipher3-binary contains only binaries for Linux, and as 95%+ of our users use Windows, we cannot break build process for this class of operating system.
Our pipeline is getting:
RuntimeError
Unable to find installation candidates for sqlcipher3-binary (0.5.4)
at ~\.local\venv\lib\site-packages\poetry\installation\chooser.py:86 in choose_for
82|
83| links.append(link)
84|
85| if not links:
> 86| raise RuntimeError(f"Unable to find installation candidates for {package}")
87|
88| # Get the best link
89| chosen = max(links, key=lambda link: self._sort_key(package, link))
90|
Cannot install sqlcipher3-binary.
(see: https://github.com/magul/alinka-pyside/actions/runs/15241885651/job/42862975041)
I guess, we should use source package and build our binaries ourselves.
If you want to do that, then Github Action pipeline is available in here: https://github.com/CodeForPoznan/alinka-pyside/blob/develop/.github/workflows/_build-win.yml, to test that (after backmerging current develop) you can just manually trigger that pipline by going to: https://github.com/CodeForPoznan/alinka-pyside/actions/workflows/build-develop.yml (or to your fork, that your branch should be developed) and just click Run workflow and choose branch, that you want to run this workflow on.
After succesfull build, Windows installer should be available (zipped) as workflow artifact at the bottom of this run Summary page
If you also need Windows VM to test - I can provide one (I have both Win10 and Win11 build in VMware Workstation Pro).
Let me know, how I can help with that!
| APP_AUTHOR: str = "Code for Poznań" | ||
|
|
||
| DB_FILE_NAME: str = "alinka.sqlite" | ||
| DB_FILE_NAME: str = "alinka.db" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here are few thoughts I regarding this topic:
- sqlcipher3 is nt in fact a fork (I guess this is what you meant) of pysqlcipher3 - as they don't share any commit history, but I agree they are libraries that basically do the same thing (as far as I understand how sqlcipher works)
- I recall I moved from
.dbtosqlite(here: Use platformdirs package to determine location of DB and generated do… #39) - at that time (and TBH still) it just explicitly mark, what kind of DB it is, but I don't have strong feelings right now regarding whether we should go with.sqliteor.db.
Also, regarding:
On top of that, I wanted to avoid issues for people who already have alinka.sqlite. For persistence (I guess?), we keep it in volume. That file has been created with a different driver and is unencrypted. It won't work with SQLCipher.
We do need to have database persistency, this is kind of a reason, why we have .volumes/ mounted in docker-compose.yml, but here are few details worth mentioning:
.volumes/were added for development purpose only, this is only exposed via volumes in Docker, and neither Windows users, nor Debian/Ubuntu ones will use docker to run that app- we're lucky, that none of our actual users run this application, that way we can change filename of database, but as soon as we release first version (and we plan to do that in first half of June), we will need to figure out how to be able to migrate our data (and I don't only mean Alembic migrations here) - even in this pull request you add
DB_PASSPHRASE: str = "should_be_changed", in longer approach we need to be able to support changing that password, and this change cannot cause our users to start new db, we need to copy their data, I would advice all of us, to start thinking how to build such migrations framework.
| DOCUMENTS_PATH: str = os.path.join(platformdirs.user_documents_dir(), APP_NAME) | ||
|
|
||
| RSPO_DOMAIN: str = "https://rspo.gov.pl/" | ||
| DB_PASSPHRASE: str = "should_be_changed" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as long as that's rather PoC, this is fine, but in long run we need to:
- allow our users to establish custom password
- make our database support re-encrypting with other cypher
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some brief research, and it looks, that changing passphrase should not trigger full DB re-encryption, so DB migration "bullet dodged" here, but still in long run we should have a shared understanding, how to migrate DB between different versions of Alinka.
| transactional_ddl=True, | ||
| transaction_per_migration=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any explanation for adding this parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that it would fail at the start without it (at least for me). I found somewhere the issue that some other project has with Alembic and SQLCipher, and they managed to resolve it with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you, by any chance, have a link to that issue. I'm genuinely interested in understanding technical details, of what's going on here.
|
One small think. Will we be able to view/edit db using some 3rd party application (of course with password)? I think it'd good to be able to edit db while testing migration, etc. |
Yes, that's not so well supported (standard SQLite driver will not work here), but I was able to access this database using DBeaver with custom JDBC driver. so all driver parametes should be set for that particular version. |
|
Po przejęciu PR przez @magul i @jacekkalbarczyk wyszło na to, że nie wiemy do końca jak się za to zabrać. Na razie temat jest zaparkowany z tej przyczyny. @stanislawK jeśli masz coś do dodania poza powyższym daj znać :) |


fixes #16
I am adding
sqlcipher3-binarypackage, which contains all the necessary dependencies (mainly sqlcipher3 itself) for linux. It is working for our docker build, but I am not that familiar with our build process for windows. It is possible that we will need to install additional dependencies manually, or build against amalgamation.