Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IGNITE-23079 Make Raft storages destruction durable #4987

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

Phillippko
Copy link
Contributor

Copy link
Contributor

@rpuch rpuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice first attempt to approach the issue, but it seems to be more complicated than I thought before.

  1. It would be much better to use a single intent per group, not 2 per group
  2. Log storage factories should not be aware about destroying something on start (at least, at seems that we could avoid this). Current approach duplicates this logic in the log storage factories

How about trying to pull the knowledge about 'to which log storage factory to route the destruction request on start' higher (to raft server, for example?). We could introduce GroupStoragesDestructionIntent with one method like toIntentIdBytes(). For CMG and MS there would be one implementation (a class with just groupId in it); for partition it would be another implementation (with groupId and isVolatile flag in it). We would then register resolvers with raft server; the resolvers would resolve intent ID bytes to GroupStorageDestructionIntents and to the corresponding destroyers. Raft server would just iterate the destruction event storage, obtain destroyers and call them.

@@ -113,6 +114,9 @@
public class JraftServerImpl implements RaftServer {
private static final IgniteLogger LOG = Loggers.forClass(JraftServerImpl.class);

/** Prefix to save destroy storage intents in {@link DestroyStorageIntentStorage}. */
private static final String DESTROY_PREFIX = "jraftServer";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private static final String DESTROY_PREFIX = "jraftServer";
private static final String META_AND_SNAPSHOT_DESTROY_INTENT = "metaAndSnapshot";

Because it is about destroying Raft meta and snapshot storages

@@ -573,15 +584,18 @@ public boolean stopRaftNodes(ReplicationGroupId groupId) {

@Override
public void destroyRaftNodeStorages(RaftNodeId nodeId, RaftGroupOptions groupOptions) {
// TODO: IGNITE-23079 - improve on what we do if it was not possible to destroy any of the storages.
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both intents should be saved in the very beginning, and this should be done atomically, in one operation. Please use something like vault.putAll() under the hood

return new String(key, offset, key.length - offset, UTF_8);
}

private static ByteArray prefixByFactoryName(String factoryName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it about factory name? It seems to be an abstract prefix/identifier (as it now accepts not just factory names, but also server data path constant)

@Phillippko Phillippko marked this pull request as ready for review January 17, 2025 08:04
import org.apache.ignite.internal.raft.server.RaftGroupOptions;
import org.apache.ignite.internal.replicator.ReplicationGroupId;

/** Persists and retrieves intent to complete storages destruction on node start. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** Persists and retrieves intent to complete storages destruction on node start. */
/** Persists and retrieves intents to destroy Raft group storages. */

Comment on lines 28 to 32
/** Add configurer for CMG or metastorage raft storages. */
void addGroupOptionsConfigurer(ReplicationGroupId groupId, RaftGroupOptionsConfigurer groupOptionsConfigurer);

/** Add configurer for partitions raft storages. */
void addPartitionGroupOptionsConfigurer(RaftGroupOptionsConfigurer partitionRaftConfigurer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods should not be on the interface. Ideally, the 3 configurers should be passed via constructor. If it's not possible, then just the implementation based on Vault should have some way to pass the configurers.

void addPartitionGroupOptionsConfigurer(RaftGroupOptionsConfigurer partitionRaftConfigurer);

/** Save intent to destroy raft storages. */
void saveDestroyStorageIntent(RaftNodeId nodeId, RaftGroupOptions groupOptions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From groupOptions, the implementation of this method only uses isVolatile flag, so it seems that the whole object should not be passed in

void removeDestroyStorageIntent(String nodeId);

/** Returns group options needed to destroy raft storages, mapped by node id represented by String. */
Map<String, RaftGroupOptions> readGroupOptionsByNodeIdForDestruction();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it happen that the number of groups to destroy is huge? Whould it make sense to have a streaming API here?

import org.apache.ignite.internal.vault.VaultEntry;
import org.apache.ignite.internal.vault.VaultManager;

/** Uses VaultManager to destroy raft group storages durably, using vault to store destruction intents. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** Uses VaultManager to destroy raft group storages durably, using vault to store destruction intents. */
/** Uses VaultManager to store destruction intents. */


private static final String PARTITION_GROUP_NAME = "partition";

private static final int RAFT_GROUPS = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not groups but something like spaces


@Override
public void saveDestroyStorageIntent(RaftNodeId nodeId, RaftGroupOptions groupOptions) {
String configurerName = nodeId.groupId() instanceof PartitionGroupId ? PARTITION_GROUP_NAME : nodeId.groupId().toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems too much for this class to know a difference between partitions and not partitions; it should not know about partitions at all. Probably the method could accept an object that would serialize the 'intent' (or just build the intent object)


String nodeId = raftNodeIdFromKey(next.key().bytes());

// todo add serializer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolved TODO

// todo add serializer
DestroyStorageIntent intent = ByteUtils.fromBytes(next.value());

RaftGroupOptions groupOptions = intent.isVolatile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that we build group options just to conform to the 'group options configurer' abstraction, we are making a loop.. But mybe we should change this later (probably with revisiting the 'group options configurer' abstraction itself)

Copy link
Contributor

@rpuch rpuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

@@ -21,6 +21,9 @@
* A {@link ReplicationGroupId} which corresponds to partition of a partitioned object.
*/
public interface PartitionGroupId extends ReplicationGroupId {
/** Used for durable destruction purposes. */
String PARTITION_GROUP_NAME = "partition";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that 'partition group ID' should know anything about destruction. Can we declare it elsewhere?

new NoOpFailureManager()
new NoOpFailureManager(),
new NoopGroupStoragesDestructionIntents(),
new GroupStoragesContextResolver(Objects::toString, Map.of(), Map.of())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to hide this in a no-arg constructor (test-only) to avoid copying this over and over?

verify(logStorageFactories.get(0), times(0)).destroyLogStorage(anyString());
}

private void destroyStorageFails(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private void destroyStorageFails(
private void attemptDestroyStoragesAndExpectFailure(


restartServer();

assertFalse(Files.exists(nodeDataPath));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's assert that this directory exists before an attempt to destroy storages - just to make sure we are looking at the correct directory

Comment on lines +229 to +239
Map<String, LogStorageFactory> logStorageFactoryByGroupName = Map.of(
PARTITION_GROUP_NAME, partitionsLogStorageFactory,
CmgGroupId.INSTANCE.toString(), partitionsLogStorageFactory,
MetastorageGroupId.INSTANCE.toString(), partitionsLogStorageFactory
);

Map<String, Path> serverDataPathByGroupName = Map.of(
PARTITION_GROUP_NAME, workingDir.basePath(),
CmgGroupId.INSTANCE.toString(), workingDir.basePath(),
MetastorageGroupId.INSTANCE.toString(), workingDir.basePath()
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this 'partitions, metastorage, cmg' specifics in a test that is about Raft and nothing more?

// This destroys both meta storage and snapshots storage as they are stored under serverDataPath.
IgniteUtils.deleteIfExists(serverDataPath);
// This destroys both meta storage and snapshots storage as they are stored under nodeDataPath.
IgniteUtils.deleteIfExists(getServerDataPath(context.serverDataPath(), nodeId));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might oversee an exception here. We probably need to do the following:

  1. If the directory doesn't exist, we do nothing
  2. Otherwise, we try to delete it
  3. If an exception happens, we rethrow it

import org.jetbrains.annotations.Nullable;

/** Contains {@link DestroyStorageIntent}, server data path and {@link LogStorageFactory} for storage destruction. */
public class DestroyStorageContext {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public class DestroyStorageContext {
public class StoragesDestructionContext {

import org.apache.ignite.internal.tostring.S;

/** Intent to destroy raft node's group storages. */
public class DestroyStorageIntent {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public class DestroyStorageIntent {
public class StoragesDestructionIntent {

}

private static byte[] toStorageBytes(DestroyStorageIntent intent) {
try (IgniteUnsafeDataOutput out = new IgniteUnsafeDataOutput(INITIAL_BUFFER_CAPACITY)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use a versioned serializer

@@ -1186,6 +1200,26 @@ public class IgniteImpl implements Ignite {
publicCatalog = new PublicApiThreadingIgniteCatalog(new IgniteCatalogSqlImpl(sql, distributedTblMgr), asyncContinuationExecutor);
}

private GroupStoragesContextResolver groupStoragesContextResolver() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private GroupStoragesContextResolver groupStoragesContextResolver() {
private GroupStoragesContextResolver createGroupStoragesContextResolver() {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants