Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core][format] op format factory mv to cache . #4813

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

ranxianglei
Copy link
Contributor

Purpose

from #4497

better impl for #4782

Linked issue: close #xxx

Tests

API and Format

Documentation

@leaves12138
Copy link
Contributor

I tihink FileFormatFactory is a special interface, we don't need to change. Many third-party file formats depend on this service loader. We could just distinguish it from Factory.

@ranxianglei
Copy link
Contributor Author

ranxianglei commented Jan 2, 2025

The significance of unified management is that many methods do not need to be implemented multiple times.
For example, this PR #4818 implements the factory overide mechanism.

Third-party file formats just rename org.apache.paimon.format.FileFormatFactory to org.apache.paimon.factories.Factory then works.

@leaves12138

@leaves12138
Copy link
Contributor

leaves12138 commented Jan 2, 2025

That's not beckoning enough to make this change. If you think FormatFactoryUtils is too redundant, we can re-use more code.

We can change FactoryUtil to

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.paimon.factories;

import org.apache.paimon.shade.caffeine2.com.github.benmanes.caffeine.cache.Cache;
import org.apache.paimon.shade.caffeine2.com.github.benmanes.caffeine.cache.Caffeine;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.ServiceLoader;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Utility for working with {@link Factory}s. */
public class FactoryUtil {

    private static final Logger LOG = LoggerFactory.getLogger(FactoryUtil.class);

    private static final Cache<ClassLoader, List<Factory>> FACTORIES =
            Caffeine.newBuilder().softValues().maximumSize(100).executor(Runnable::run).build();

    public static <T extends Factory> T discoverFactory(
            ClassLoader classLoader, Class<T> factoryClass, String identifier) {
        final List<T> foundFactories =
                getFactories(classLoader).stream()
                        .filter(f -> factoryClass.isAssignableFrom(f.getClass()))
                        .map(factoryClass::cast)
                        .collect(Collectors.toList());

        if (foundFactories.isEmpty()) {
            throw new FactoryException(
                    String.format(
                            "Could not find any factories that implement '%s' in the classpath.",
                            factoryClass.getName()));
        }

        return matchFactory(foundFactories, Factory::identifier, identifier);
    }

    /** Discovers a factory using the given factory base class and identifier. */
    @SuppressWarnings("unchecked")
    public static <T> T matchFactory(
            List<T> factories, Function<T, String> identifierFunction, String identifier) {

        final List<T> matchingFactories =
                factories.stream()
                        .filter(f -> identifierFunction.apply(f).equals(identifier))
                        .collect(Collectors.toList());

        if (matchingFactories.isEmpty()) {
            throw new FactoryException(
                    String.format(
                            "Could not find any factory for identifier '%s' in the classpath.\n\n"
                                    + "Available factory identifiers are:\n\n"
                                    + "%s",
                            identifier,
                            factories.stream()
                                    .map(identifierFunction)
                                    .collect(Collectors.joining("\n"))));
        }
        if (matchingFactories.size() > 1) {
            throw new FactoryException(
                    String.format(
                            "Multiple factories for identifier '%s' in the classpath.\n\n"
                                    + "Ambiguous factory classes are:\n\n"
                                    + "%s",
                            identifier,
                            matchingFactories.stream()
                                    .map(f -> f.getClass().getName())
                                    .sorted()
                                    .collect(Collectors.joining("\n"))));
        }

        return (T) matchingFactories.get(0);
    }

    public static <T extends Factory> List<String> discoverIdentifiers(
            ClassLoader classLoader, Class<T> factoryClass) {
        final List<Factory> factories = getFactories(classLoader);

        return factories.stream()
                .filter(f -> factoryClass.isAssignableFrom(f.getClass()))
                .map(Factory::identifier)
                .collect(Collectors.toList());
    }

    private static List<Factory> getFactories(ClassLoader classLoader) {
        return FACTORIES.get(classLoader, s -> discoverFactories(classLoader, Factory.class));
    }

    /**
     * Discover factories.
     *
     * @param classLoader the class loader
     * @param klass the klass
     * @param <T> the type of the factory
     * @return the list of factories
     */
    public static <T> List<T> discoverFactories(ClassLoader classLoader, Class<T> klass) {
        final Iterator<T> serviceLoaderIterator = ServiceLoader.load(klass, classLoader).iterator();

        final List<T> loadResults = new ArrayList<>();
        while (true) {
            try {
                // error handling should also be applied to the hasNext() call because service
                // loading might cause problems here as well
                if (!serviceLoaderIterator.hasNext()) {
                    break;
                }

                loadResults.add(serviceLoaderIterator.next());
            } catch (Throwable t) {
                if (t instanceof NoClassDefFoundError) {
                    LOG.debug(
                            "NoClassDefFoundError when loading a {}. This is expected when trying to load factory but no implementation is loaded.",
                            Factory.class.getCanonicalName(),
                            t);
                } else {
                    throw new RuntimeException(
                            "Unexpected error when trying to load service provider.", t);
                }
            }
        }

        return loadResults;
    }
}

change FormatFactoryUtil to :

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.paimon.factories;

import org.apache.paimon.format.FileFormatFactory;

import org.apache.paimon.shade.caffeine2.com.github.benmanes.caffeine.cache.Cache;
import org.apache.paimon.shade.caffeine2.com.github.benmanes.caffeine.cache.Caffeine;

import java.util.List;

import static org.apache.paimon.factories.FactoryUtil.discoverFactories;

/** Utility for working with {@link FileFormatFactory}s. */
public class FormatFactoryUtil {

    private static final Cache<ClassLoader, List<FileFormatFactory>> FACTORIES =
            Caffeine.newBuilder().softValues().maximumSize(100).executor(Runnable::run).build();

    /** Discovers a file format factory. */
    @SuppressWarnings("unchecked")
    public static <T extends FileFormatFactory> T discoverFactory(
            ClassLoader classLoader, String identifier) {
        final List<FileFormatFactory> foundFactories = getFactories(classLoader);

        return (T)
                FactoryUtil.matchFactory(foundFactories, FileFormatFactory::identifier, identifier);
    }

    private static List<FileFormatFactory> getFactories(ClassLoader classLoader) {
        return FACTORIES.get(
                classLoader, s -> discoverFactories(classLoader, FileFormatFactory.class));
    }
}

@leaves12138
Copy link
Contributor

Can you fix this as what I mentioned? Still regard FileFormatFactory as an independent interface.

@leaves12138
Copy link
Contributor

The significance of unified management is that many methods do not need to be implemented multiple times. For example, this PR #4818 implements the factory overide mechanism.

Third-party file formats just rename org.apache.paimon.format.FileFormatFactory to org.apache.paimon.factories.Factory then works.

@leaves12138

That need to handle files in META-INFO/services also

@ranxianglei
Copy link
Contributor Author

Sure, I am writing @leaves12138

@ranxianglei ranxianglei changed the title [core][format] op format factory mv to cache . WIP:[core][format] op format factory mv to cache . Jan 2, 2025
@ranxianglei ranxianglei changed the title WIP:[core][format] op format factory mv to cache . [core][format] op format factory mv to cache . Jan 2, 2025
@ranxianglei
Copy link
Contributor Author

done.Compatible with existing third-party factory implementations .

@leaves12138
Copy link
Contributor

done.Compatible with existing third-party factory implementations .

We don't need to combine META-INF/services/org.apache.paimon.format.FileFormatFactory and META-INF/services/org.apache.paimon.factories.Factory. FileFormatFactory could be regarded as a special.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants