
Database schema and data migrations #25

Open
tjementum opened this issue Apr 22, 2023 · 7 comments
Labels
Roadmap This is a big roadmap item

Comments

@tjementum
Member

tjementum commented Apr 22, 2023


Problem

In .NET Entity Framework, there can be several database schema migrations. However, between schema migrations, there may be a need for data migrations done in code. Example:

  1. Schema migration: Add new FirstName and LastName columns to the Users table that allow null
  2. Data migration that populates the new columns from the existing Users.DisplayName column using C# code
  3. Schema migration: Remove the null check and delete the old DisplayName column
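Step 2 above could be sketched in C# roughly as follows. This is a hypothetical illustration: the AppDbContext, the Users DbSet, and the assumption that DisplayName holds "FirstName LastName" are all made up for the example, and a real data migration would need to handle edge cases such as empty or single-word display names.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public static class PopulateNameColumns
{
    public static async Task RunAsync(AppDbContext dbContext)
    {
        // Only touch rows that have not been processed yet,
        // which keeps the migration idempotent if it is rerun.
        var users = await dbContext.Users
            .Where(u => u.FirstName == null)
            .ToListAsync();

        foreach (var user in users)
        {
            // Naive split: everything before the first space becomes
            // FirstName, the rest becomes LastName.
            var parts = user.DisplayName.Split(' ', 2);
            user.FirstName = parts[0];
            user.LastName = parts.Length > 1 ? parts[1] : "";
        }

        await dbContext.SaveChangesAsync();
    }
}
```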

This is just a simple example. The solution should handle many complex migrations running in a specific order when a system has not been updated for some time (e.g., Schema, Data, Schema, Schema, Data, Schema, Data).

How can I create a good migration strategy that runs as part of the Application startup?

Solution 1

Use SQL .dbproj and DacPac: Power your DevOps Development with SQL Server Containers

Solution 2

Managing database migrations and code transformations in a sequential and reliable manner is indeed an important aspect of maintaining a robust application, especially in a complex domain-driven design project. Your problem is about orchestrating these operations in a way that ensures database integrity and data consistency.

For .NET and Entity Framework, you have a tool called EF Migrations which can handle database schema changes. However, EF Migrations doesn't inherently support code migrations that transform data between schema migrations.

Here is an approach you might consider:

  1. Separate Schema Migrations and Data Migrations: Separate schema migrations (handled by EF Migrations) and data migrations (custom scripts for data transformations).
  2. Sequential Naming Convention: Adopt a naming convention for your migrations that includes a sequence number (e.g., 001_AddColumns, 002_PopulateColumns, 003_RemoveNullCheck). This will ensure that migrations are executed in the correct order.
  3. Custom Migration Runner: Implement a custom migration runner that executes at application startup. This runner should:
     • Run the EF Migrations up to the point where a data migration is needed.
     • Run the corresponding data migration scripts.
     • Continue with the rest of the EF Migrations.

     This could be achieved by using the DbContext.Database.Migrate() method to apply the EF Migrations and a custom method to execute the data migrations. The data migration scripts could be SQL scripts or C# scripts, depending on the complexity of the data transformations.

  4. Idempotent Migrations: Make sure your migrations are idempotent - that is, they can be run multiple times without causing issues. This is important, as migrations might fail and need to be rerun.
  5. Test Your Migrations: Always test your migrations, ideally in an environment that mirrors your production environment.
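A minimal sketch of such a custom migration runner could look like this. It assumes EF Core; the IDataMigration contract, AppDbContext, and the "run after schema migration X" convention are hypothetical names invented for the example. EF Core's IMigrator service can apply schema migrations one at a time, so data migrations can be interleaved at the right points.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Infrastructure;
using Microsoft.EntityFrameworkCore.Migrations;

// Hypothetical contract: each data migration declares which schema
// migration it must run immediately after.
public interface IDataMigration
{
    string RunAfterSchemaMigration { get; }
    Task RunAsync(AppDbContext dbContext);
}

public sealed class MigrationRunner(AppDbContext dbContext, IEnumerable<IDataMigration> dataMigrations)
{
    public async Task RunAsync()
    {
        var migrator = dbContext.GetService<IMigrator>();

        // Apply pending schema migrations one at a time, in order,
        // running any dependent data migrations between them.
        foreach (var schemaMigration in await dbContext.Database.GetPendingMigrationsAsync())
        {
            await migrator.MigrateAsync(schemaMigration);

            foreach (var dataMigration in dataMigrations.Where(d => d.RunAfterSchemaMigration == schemaMigration))
            {
                await dataMigration.RunAsync(dbContext);
            }
        }
    }
}
```

This is only a sketch; a production runner would also need distributed locking (to avoid the race conditions discussed in Solution 3 below) and transaction handling around each step.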

This solution allows you to manage both schema migrations and data transformations in a sequential manner. It also makes sure the order of operations is preserved when you deploy to different environments.

Remember that handling migrations this way could potentially cause longer downtimes during deployment, especially for large data migrations. So, you need to plan your deployments carefully.

Solution 3

A more naïve approach would be to run migrations as part of the application startup. If not done carefully, this might cause race conditions. One way to avoid race conditions is to use "init containers":

Generally available: Init containers in Azure Container Apps
Published date: August 16, 2023
The init containers feature in Azure Container Apps is now generally available. Init containers are specialized containers that run to completion before application containers are started in a replica, and they can contain utilities or setup scripts not present in your container app image. Init containers are useful for performing initialization logic such as setting up accounts, running setup scripts, and configuring databases.

To learn more, visit: https://aka.ms/aca/init-containers

@tjementum tjementum converted this from a draft issue Apr 22, 2023
@tjementum tjementum added this to the 🐣 PlatformPlatform Alpha milestone Apr 22, 2023
@tjementum tjementum added the Enhancement New feature or request label Apr 22, 2023
@tjementum tjementum moved this from Someday maybe to Later in Roadmap Apr 22, 2023
@tjementum tjementum added Roadmap This is a big roadmap item and removed Enhancement New feature or request labels Apr 24, 2023
@tjementum tjementum changed the title Database migration Automate database migrations Apr 24, 2023
@tjementum tjementum changed the title Automate database migrations Database migrations Apr 24, 2023
@tjementum tjementum changed the title Database migrations Database and code migrations Jun 29, 2023
@tjementum tjementum changed the title Database and code migrations Database schema and data migrations Aug 1, 2023
@toutas

toutas commented Dec 21, 2023

DbUp is a great alternative solution to the EF migrations problem (schema, data, schema), since the migration scripts are executed in order and are only applied once (tracked in the SchemaVersions table). Integrating a migration run at startup or in a build pipeline is as simple as:

// scriptName identifies an embedded migration script; by convention here,
// the connection string name is encoded in the script name.
var internalScriptName = scriptName;
var db = scriptName.Split('-')[3];
var connectionString = ConfigurationManager.ConnectionStrings[db];

var upgrader = DeployChanges.To
    .PostgresqlDatabase(connectionString.ConnectionString)
    .WithExecutionTimeout(TimeSpan.FromMinutes(executionTimeoutMinutes))
    .WithScriptsEmbeddedInAssembly(Assembly.GetExecutingAssembly(), s => s == internalScriptName)
    .LogToConsole()
    .Build();

var result = upgrader.PerformUpgrade();
if (!result.Successful)
{
    throw new InvalidOperationException("Database migration failed", result.Error);
}

Would this be a viable alternative solution?

@tjementum
Member Author

tjementum commented Dec 21, 2023

Thanks for the feedback/suggestion.

I've used DbUp extensively over the past 4 years. It is certainly an alternative, but it can be confusing to understand how your DB schema looks when you have made hundreds of migrations. It also is not trivial to work with when you do Disaster Recovery, as you need full control over which version your schema was in. And your backups might be different (unless you do full database restore when doing Disaster Recovery, where the migration history table is in sync).

Like EF migrations, it requires a deep understanding by everyone using these migrations, but most people think they are trivial because they only test the happy path. I've seen people go back and change old scripts because it was too slow to run on staging, then another slightly different version was deployed to production, essentially making the active schema ever so slightly different between staging and production (in terms of indexes). A bomb waiting to explode.

Personally, I like DacPac as it's declarative, and you have tooling to modify your database schema. Basically, you have a .dbproj that defines how you want your database to look, and the tooling generates the diff, which will ensure that no drifting happens. The scripts generated can do complex transactional migrations. However, this approach has limited support for "schema, code, schema" migrations. But so does DbUp and EF Migrations.

I'm starting to lean towards doing database migrations as part of CI/CD. It will make it clear for everyone that all database migrations must be at least one version backward compatible (so you can roll back, and so rolling deployments do not cause problems). Also, database migrations are much more likely to cause production problems, in the same way as infrastructure changes (Bicep, Terraform). So, they need more governance than normal code deployments.

We are still trying to make up our mind around this. Until we find something noticeably better, we'll stick with EF migrations.

@Trubador

Trubador commented Feb 5, 2024

I would like to contribute on this topic, as I have done a lot of research on SQL schemas. I would like to discuss and get feedback, in particular on modular migrations that are reliable for different types of B2B/B2C SaaS systems.

@tjementum
Member Author

Hi Casper,

Cool. Would you have time for a short introduction (online)?

As written above, I have some ideas on how to build this with DacPac in CI/CD workflows. But it's not completely set in stone.

We are currently using EF Core migrations, which are fine for now, but they are run as part of startup, which makes them slow, fragile, and susceptible to race conditions. If a migration fails, there is an incident.

I've used DacPac before, but not in CI/CD, so we will have to do research to ensure it is reliable in both continuous deployment scenarios, and in disaster recovery (where you might restore a database that is several versions old).

A core part of our principles is that "application code" is deployed with continuous deployment (push on green), compared to continuous delivery (manual approval). See https://www.atlassian.com/continuous-delivery/principles/continuous-integration-vs-delivery-vs-deployment:

[Screenshot: CleanShot 2024-02-05 at 15 38 14 — diagram comparing continuous integration, continuous delivery, and continuous deployment]

Database code is infrastructure and is deployed with continuous delivery (manual approval). So, I fear the scenario where a developer makes a pull-request with both database changes and code changes. In our setup, the application code will always hit production before the database changes. Somehow, we need to protect ourselves against this.

If you have experience with or ideas on how to set this up in a robust, clean, and developer-friendly way, I would love to talk. To be honest, I have not seen anyone solve this problem in a good way :)

@Trubador

Trubador commented Feb 6, 2024

> Hi Casper,
>
> Cool. Would you have time for a short introduction (online)? […]

Hey Thomas,

I think I can find some time for an online meeting about this topic as I think it is quite important and I would like to understand your points of view on the matter a bit better.

If you could connect with me via my GitHub profile, for example on LinkedIn, I think that would be the easiest way to schedule a meeting 😀.

tjementum pushed a commit that referenced this issue Dec 16, 2024
### Summary & Motivation
Move the endpoints from the Core project to the Calendar-Assistant API
project because `IEndpoints` are only picked up by Dependency Injection
when they reside in the API project. This adjustment ensures that the
Dependency Injection system properly registers and manages these
endpoints.

### Atomic Changes
- Move endpoints to Calendar-Assistant API project

### Checklist

- [x] I have added a Label to the pull-request
- [x] I have added tests, and done manual regression tests
- [x] I have updated the documentation, if necessary

---------

Co-authored-by: Marco van Kimmenade <[email protected]>
@ayuksekkaya

@tjementum I want to comment on this from my perspective. I am using this template for my project, and personally, I am thinking of changing the database to PostgreSQL, instead of MsSql or Azure SQL, before going live. And I think that adding DACPAC might limit this template somewhat, since it's a starter template.

I think there would be many people who would want to use Postgres instead of MsSql due to licensing costs etc. And, currently, with EF Core and Azure that's easy enough to do. However, with DACPAC what would be the alternative if you didn't want to use SQL Server?

DbUp works with both. However, I am not sure how it will work with the code-first approach. I think you still need to run EF Core migrations and keep those files even when using DbUp. But I know that you said you don't like EF Core migrations, so maybe you won't like that approach.

@tjementum
Member Author

Hi Ali,

Since PlatformPlatform is open source, you can stick to Entity Framework migrations, but when pulling changes from PlatformPlatform, you will run into problems no matter what if you change to Postgres.

I'm very clear that PlatformPlatform is an opinionated platform, which provides the benefit of making choices that do not need to follow the lowest common denominator. I also receive questions about supporting AWS, Kubernetes, Azure Functions, etc., but this flexibility comes with constraints and tradeoffs.

Personally, I see a database as just tables and columns, so I don't see the benefits of using Postgres, MongoDB, etc. over SQL Server, as the database should have no logic.

On Azure, SQL Server is, in my opinion, the better database choice in many ways. Stability, security, integration with other Azure services (like Microsoft Fabric, Purview, Power BI, etc.), documentation, and general support are all advantages. I'm not saying Postgres is a bad database, but like .NET, Microsoft SQL is a first-class citizen in Azure. Not to mention that you do not risk the Redis/FluentAssertions problem, where open-source projects change licensing to stop cloud providers from monetizing their platform.

As for Entity Framework migrations, I really am not a fan, and they always get in the way (merge conflicts, static code analysis warnings, search and replace when refactoring, problems when changing between branches, bad workflow). I know this is what others use, and similar approaches are used on other platforms like Prisma or TypeScript. Also, I think it's because most engineers have not seen what a true declarative solution to a database schema looks like.

I'm not 100% settled on DACPAC, as there might be unknown problems. I've never used DACPAC in CI/CD, but if it plays out like I hope, this will be a much better choice.
