feat(tools): Add LangExtract tool integration for structured extraction #4549

Closed
its-animay wants to merge 2 commits into google:main from its-animay:feat/langextract-tool-integration

@its-animay

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

2. Or, if no issue exists, describe the change:

N/A — see linked issue above.

Problem:
ADK lacks a native tool integration for LangExtract, Google's library for extracting structured information from unstructured text using LLMs. Users must manually wrap lx.extract() in custom tool classes.

Solution:
Add LangExtractTool(BaseTool) as a first-class tool adapter following the same patterns as LangchainTool and CrewaiTool.

Changes:

  • New: src/google/adk/tools/langextract_tool.py - LangExtractTool class extending BaseTool with a custom function declaration, asyncio.to_thread() execution, from_config() support, and a companion LangExtractToolConfig
  • New: tests/unittests/tools/test_langextract_tool.py - 8 unit tests covering initialization, schema, execution, error handling, and config
  • Modified: pyproject.toml - Added langextract>=0.1.0 to the test and extensions optional dependencies

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_default_initialization PASSED
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_custom_initialization PASSED
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_get_declaration PASSED
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_run_async PASSED
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_missing_text PASSED
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_missing_prompt_description PASSED
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_extraction_error PASSED
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_config PASSED
========================= 8 passed in 1.33s =========================

Existing tests also pass with no regressions (test_langchain_tool.py, test_base_tool.py).
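
As an illustration of the style of test listed above, here is a minimal sketch using only the standard library. It is not the PR's test code: FakeExtractor and run_tool are hypothetical stand-ins for the langextract module and the tool's run_async(), and the error-dict return shape is an assumption.

```python
import asyncio
from unittest import mock


class FakeExtractor:
    """Hypothetical stand-in for a module exposing a synchronous extract()."""

    @staticmethod
    def extract(text, prompt_description):
        raise RuntimeError("should be patched in tests")


async def run_tool(args):
    """Hypothetical stand-in for the tool's run_async()."""
    try:
        # Run the blocking call in a worker thread.
        result = await asyncio.to_thread(
            FakeExtractor.extract, args["text"], args["prompt_description"]
        )
    except Exception as e:  # assumed behaviour: report errors as a dict
        return {"error": f"Extraction failed: {e}"}
    return {"extractions": result}


def test_run_async_returns_extractions():
    with mock.patch.object(
        FakeExtractor, "extract", return_value=["John"]
    ) as fake:
        out = asyncio.run(
            run_tool({"text": "John is here.", "prompt_description": "people"})
        )
    fake.assert_called_once_with("John is here.", "people")
    assert out == {"extractions": ["John"]}


def test_extraction_error_is_reported():
    with mock.patch.object(
        FakeExtractor, "extract", side_effect=ValueError("boom")
    ):
        out = asyncio.run(
            run_tool({"text": "John is here.", "prompt_description": "people"})
        )
    assert out == {"error": "Extraction failed: boom"}
```

Patching the synchronous function keeps such tests offline and fast, since no model call is made.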

Manual End-to-End (E2E) Tests:

Usage example:

import langextract as lx
from google.adk.tools.langextract_tool import LangExtractTool

tool = LangExtractTool(
    name='extract_entities',
    description='Extract named entities from text.',
    examples=[lx.data.ExampleData(
        text="John is a software engineer at Google.",
        extractions=[lx.data.Extraction(
            extraction_class="person",
            extraction_text="John",
            attributes={"role": "software engineer", "company": "Google"},
        )],
    )],
)

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

  • Follows the exact same adapter pattern as LangchainTool (langchain_tool.py) and CrewaiTool (crewai_tool.py)
  • Uses BaseTool (not FunctionTool) because LangExtract exposes an API function rather than tool objects
  • Uses asyncio.to_thread() for non-blocking execution of the synchronous lx.extract()
  • Dual-path _get_declaration() supporting both parameters_json_schema and types.Schema (matching BaseRetrievalTool pattern)
  • Code formatted with pyink and isort per project conventions
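
The asyncio.to_thread() point above can be sketched as follows. This is a minimal, standard-library illustration rather than the PR's implementation: blocking_extract is a hypothetical stand-in for the synchronous lx.extract(), and its return shape is invented for the example.

```python
import asyncio
import time


def blocking_extract(text: str, prompt_description: str) -> dict:
    """Hypothetical stand-in for a synchronous call such as lx.extract()."""
    time.sleep(0.05)  # simulate blocking network or model latency
    return {"text": text, "prompt": prompt_description, "extractions": []}


async def run_extraction(text: str, prompt_description: str) -> dict:
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so the event loop stays free to serve other coroutines meanwhile.
    return await asyncio.to_thread(
        blocking_extract, text, prompt_description=prompt_description
    )


async def main() -> list:
    # The two extractions overlap instead of running back to back.
    return await asyncio.gather(
        run_extraction("John works at Google.", "Extract people."),
        run_extraction("Ada wrote programs.", "Extract people."),
    )


results = asyncio.run(main())
```

asyncio.to_thread() requires Python 3.9 or later and forwards both positional and keyword arguments to the wrapped function.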

Add LangExtractTool as a first-class tool adapter in ADK, enabling agents
to extract structured information from unstructured text using Google's
LangExtract library. Follows the same integration pattern as LangchainTool
and CrewaiTool.
@google-cla

google-cla bot commented Feb 19, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist
Contributor

Summary of Changes

Hello @its-animay, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the ADK by providing a first-class tool integration for Google's langextract library. Previously, users had to manually wrap langextract for structured data extraction; this change introduces a dedicated LangExtractTool adapter, mirroring existing tool patterns. This allows LLM agents to seamlessly leverage langextract's capabilities for extracting structured information from text, simplifying development and improving the overall utility of the ADK for advanced text processing tasks.

Highlights

  • New LangExtractTool Integration: Introduced a LangExtractTool class to natively integrate Google's langextract library, enabling LLM agents to perform structured information extraction from unstructured text within the ADK framework.
  • Asynchronous Execution: Implemented asynchronous execution for the synchronous lx.extract() function using asyncio.to_thread() to prevent blocking the event loop.
  • Declarative Configuration: Added LangExtractToolConfig to support declarative configuration of the LangExtractTool via YAML files, aligning with existing tool patterns.
  • Dependency Management: Updated pyproject.toml to include langextract as an optional dependency for both testing and extensions.
  • Comprehensive Unit Tests: Included a new suite of unit tests covering LangExtractTool initialization, schema declaration, asynchronous execution, error handling, and configuration validation.

Changelog
  • pyproject.toml
    • Added langextract>=0.1.0 to the test optional dependencies.
    • Added langextract>=0.1.0 to the extensions optional dependencies.
  • src/google/adk/tools/langextract_tool.py
    • Added LangExtractTool class, extending BaseTool for structured information extraction.
    • Implemented _get_declaration for tool schema definition, supporting both parameters_json_schema and types.Schema.
    • Implemented run_async to execute lx.extract() asynchronously using asyncio.to_thread().
    • Added from_config class method for loading tool configurations.
    • Added LangExtractToolConfig for defining tool parameters via configuration files.
  • tests/unittests/tools/test_langextract_tool.py
    • Added unit tests for LangExtractTool default and custom initialization.
    • Added tests for _get_declaration to verify schema correctness.
    • Added asynchronous tests for run_async to confirm lx.extract() invocation and result parsing.
    • Included tests for handling missing text and prompt_description arguments.
    • Added tests for error handling during extraction.
    • Added a test for LangExtractToolConfig validation.
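
To make the schema and missing-argument items above concrete, here is a rough sketch of a JSON-schema parameter declaration and the matching validation. Only the argument names text and prompt_description come from this PR; the schema wording, the validate_args helper, and the error format are assumptions for illustration.

```python
from typing import Optional

# Assumed parameter schema: the argument names appear in the PR's tests,
# but the types and descriptions here are illustrative guesses.
PARAMETERS_JSON_SCHEMA = {
    "type": "object",
    "properties": {
        "text": {"type": "string", "description": "Text to extract from."},
        "prompt_description": {
            "type": "string",
            "description": "What to extract.",
        },
    },
    "required": ["text", "prompt_description"],
}


def validate_args(args: dict) -> Optional[dict]:
    """Returns an error dict for the first missing or empty required argument.

    Returns None when every required argument is present. Hypothetical helper.
    """
    for name in PARAMETERS_JSON_SCHEMA["required"]:
        if not args.get(name):
            return {"error": f"'{name}' is required."}
    return None
```

Returning an error dict rather than raising lets the calling model see what went wrong and retry with corrected arguments, which is one plausible reason for the dedicated missing-argument tests.
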
Activity
  • Unit tests were added or updated and passed locally.
  • Existing tests passed with no regressions.
  • Manual end-to-end tests were performed using a provided usage example.
  • The author performed a self-review and commented the code.
  • The code was formatted with pyink and isort.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@adk-bot added the "tools" label ([Component] This issue is related to tools) on Feb 19, 2026
@adk-bot
Collaborator

adk-bot commented Feb 19, 2026

Response from ADK Triaging Agent

Hello @its-animay, thank you for your contribution!

It looks like the Contributor License Agreement (CLA) is not signed yet. Please sign the CLA so that we can proceed with the review process. You can find more information in the "Details" link of the cla/google check at the bottom of the PR.

Thanks!

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new LangExtractTool for structured data extraction, along with its configuration and comprehensive unit tests. The implementation is clean and follows existing patterns in the codebase. I've provided a couple of suggestions to improve maintainability and test coverage. Overall, this is a great addition.

- Simplify from_config() using dict unpacking instead of passing each
  arg manually, improving maintainability
- Parameterize _get_declaration test to cover both JSON schema and
  types.Schema code paths via temporary_feature_override
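
The dict-unpacking simplification mentioned in this commit can be sketched like this. ToolConfig and Tool are hypothetical stand-ins for LangExtractToolConfig and LangExtractTool, and the model_id field is invented for illustration; a dataclass is used here to keep the sketch dependency-free.

```python
from dataclasses import asdict, dataclass


@dataclass
class ToolConfig:
    """Hypothetical stand-in for LangExtractToolConfig."""

    name: str = "langextract"
    description: str = "Extract structured information from text."
    model_id: str = "example-model"  # illustrative field, not from the PR


class Tool:
    """Hypothetical stand-in for LangExtractTool."""

    def __init__(self, name: str, description: str, model_id: str):
        self.name = name
        self.description = description
        self.model_id = model_id

    @classmethod
    def from_config(cls, config: ToolConfig) -> "Tool":
        # Unpacking forwards every config field, so a new field is added
        # to the config and the constructor only, not repeated here.
        return cls(**asdict(config))


tool = Tool.from_config(ToolConfig(name="extract_entities"))
```

If the real config class is a pydantic model, the analogous call would be cls(**config.model_dump()).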
@ryanaiagent ryanaiagent self-assigned this Feb 19, 2026
@ryanaiagent added the "community repo" label ([Community] FRs/issues well suited for google/adk-python-community repository) on Feb 19, 2026
@ryanaiagent
Collaborator

Hi @its-animay, thank you for your contribution! We appreciate you taking the time to submit this pull request.
Closing this PR here because it belongs in the adk-python-community repo.
We highly recommend releasing the feature as a standalone package, which we will then share through https://google.github.io/adk-docs/integrations/

Labels

community repo: [Community] FRs/issues well suited for google/adk-python-community repository
tools: [Component] This issue is related to tools

Development

Successfully merging this pull request may close these issues.

feat: Add LangExtract tool integration for structured information extraction
