[Enhancement]: Refactor: Invasive Changes Required for Adding New Data Types #3134

@jac0626

Is there an existing issue for this?

  • I have searched the existing issues

What would you like to be added?

Problem

Adding a new data type requires modifying 5+ files with scattered elif branches:

  • types.py - HybridExtraList._extract_lazy_fields() (7+ elif branches)
  • entity_helper.py - entity_to_field_data() (10+ elif branches)
  • search_result.py - Add to type set
  • embedding_list.py - Add to dtype_map
  • bulk_writer/constants.py - Add to multiple dictionaries

Every new type forces edits to existing functions instead of additions alongside them. This violates the Open/Closed Principle and makes the codebase hard to maintain.

Current Code Pattern

# Repeated in multiple files
elif field_data.type == DataType.FLOAT16_VECTOR:
    dim = field_data.vectors.dim
    bytes_per_vector = dim * 2
    start_pos = index * bytes_per_vector
    # ... 10+ lines of similar code
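
To make the byte arithmetic in this branch concrete, here is a minimal, self-contained sketch. It assumes the vectors arrive as one flat bytes buffer with 2 bytes per float16 element. The helper name slice_float16_vector and the sample buffer are hypothetical and are not taken from the pymilvus code.

```python
import struct


def slice_float16_vector(raw: bytes, dim: int, index: int) -> bytes:
    """Return the raw bytes of the index-th float16 vector in a flat buffer.

    Hypothetical helper: mirrors the offset arithmetic in the elif branch.
    """
    bytes_per_vector = dim * 2  # float16 uses 2 bytes per element
    start_pos = index * bytes_per_vector
    end_pos = start_pos + bytes_per_vector
    if index < 0 or end_pos > len(raw):
        raise IndexError(f"vector {index} is out of range for this buffer")
    return raw[start_pos:end_pos]


# Two 3-dimensional vectors packed back to back as little-endian float16.
raw = struct.pack("<6e", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
row1 = slice_float16_vector(raw, dim=3, index=1)
values = struct.unpack("<3e", row1)
```

Other vector types would repeat the same slicing with a different bytes-per-element factor, which is why the logic ends up copied across branches.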

Proposed Solution

Implement a Type Handler Registry using the Strategy pattern:

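from abc import ABC, abstractmethod


class TypeHandler(ABC):
    """Per-type strategy: one subclass per data type."""

    @abstractmethod
    def extract_from_field_data(self, field_data, index, row_data):
        """Read the value at `index` from field_data into row_data."""

    @abstractmethod
    def pack_to_field_data(self, entity_values, field_data, field_info):
        """Write entity_values into field_data for this type."""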

# Usage: Replace all elif branches with
handler = get_type_handler(field_data.type)
handler.extract_from_field_data(field_data, index, row_data)
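
The proposal does not spell out the registry itself, so the following is only one possible sketch. Everything beyond TypeHandler, get_type_handler, and the two method names is an assumption: the register_type_handler decorator, the stand-in DataType enum, and the dict-shaped field_data are hypothetical and chosen so the example runs without pymilvus.

```python
from abc import ABC, abstractmethod
from enum import Enum, auto


class DataType(Enum):
    """Stand-in enum for illustration only, not the real pymilvus DataType."""
    INT64 = auto()
    FLOAT16_VECTOR = auto()


class TypeHandler(ABC):
    @abstractmethod
    def extract_from_field_data(self, field_data, index, row_data):
        """Read the value at `index` from field_data into row_data."""

    @abstractmethod
    def pack_to_field_data(self, entity_values, field_data, field_info):
        """Write entity_values into field_data."""


_HANDLERS = {}  # maps DataType -> handler instance


def register_type_handler(data_type):
    """Class decorator (hypothetical): the '1 line registration' per type."""
    def decorator(cls):
        _HANDLERS[data_type] = cls()
        return cls
    return decorator


def get_type_handler(data_type):
    try:
        return _HANDLERS[data_type]
    except KeyError:
        raise TypeError(f"no handler registered for {data_type!r}") from None


@register_type_handler(DataType.FLOAT16_VECTOR)
class Float16VectorHandler(TypeHandler):
    # Assumes field_data is a dict with "name", "dim" and flat "data" bytes.
    def extract_from_field_data(self, field_data, index, row_data):
        bytes_per_vector = field_data["dim"] * 2
        start = index * bytes_per_vector
        row_data[field_data["name"]] = field_data["data"][start:start + bytes_per_vector]

    def pack_to_field_data(self, entity_values, field_data, field_info):
        field_data["data"] = b"".join(entity_values)


field_data = {"name": "vec", "dim": 2, "data": bytes(range(8))}
row = {}
get_type_handler(DataType.FLOAT16_VECTOR).extract_from_field_data(field_data, 1, row)
```

Raising on an unregistered type keeps a missing handler loud rather than silently skipping the field. Whether that is the right failure mode for this codebase is a design choice the issue leaves open.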

Benefits

  • Before: Modify 5+ files to add a type
  • After: Create 1 handler class + 1 line registration
  • Better testability, extensibility, and maintainability: each handler can be unit-tested in isolation, and new types no longer touch existing code paths

Migration

  1. Create TypeHandler base class and registry
  2. Migrate existing types incrementally
  3. Replace elif branches with registry calls
  4. Remove old code
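
One way the incremental steps could coexist with the old code is a dispatcher that tries the registry first and falls back to the legacy branches for types not yet migrated. This is a sketch under assumptions: the string type names, the _MIGRATED dict, and legacy_extract are hypothetical stand-ins and not names from the codebase.

```python
from typing import Callable, Dict

Extractor = Callable[[dict, int, dict], None]

# Types that already have a registry handler (grows during step 2).
_MIGRATED: Dict[str, Extractor] = {}


def legacy_extract(type_name, field_data, index, row_data):
    """Stand-in for the existing elif chain; step 4 deletes it."""
    if type_name == "INT64":
        row_data[field_data["name"]] = field_data["data"][index]
    elif type_name == "VARCHAR":
        row_data[field_data["name"]] = field_data["data"][index]
    else:
        raise TypeError(f"unsupported type: {type_name}")


def extract(type_name, field_data, index, row_data):
    """Step 3: call sites use this instead of branching on the type."""
    handler = _MIGRATED.get(type_name)
    if handler is not None:
        handler(field_data, index, row_data)
        return "registry"
    legacy_extract(type_name, field_data, index, row_data)
    return "legacy"


def extract_int64(field_data, index, row_data):
    row_data[field_data["name"]] = int(field_data["data"][index])


# Migrating one type is a single registration; other types are untouched.
_MIGRATED["INT64"] = extract_int64
```

The returned path name is only there so tests can confirm which branch ran. Once every type is registered, the fallback and legacy_extract can be removed together.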

Why is this needed?

No response

Anything else?

No response
