feat(tools): Add LangExtract tool for structured information extraction #92

Open
its-animay wants to merge 1 commit into google:main from its-animay:feat/langextract-tool

@its-animay
Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

Problem:
The ADK community package lacks a native tool integration for LangExtract, Google's library for extracting structured information from unstructured text using LLMs. Users currently have to wrap `lx.extract()` in custom tool classes by hand.

Solution:
Add `LangExtractTool(BaseTool)` to the `google.adk_community.tools` module. This was originally submitted to adk-python#4549 and redirected here by the maintainers.

Changes:

  • New: `src/google/adk_community/tools/langextract_tool.py` adds `LangExtractTool`, which extends `BaseTool` with a custom function declaration and `asyncio.to_thread()` execution, plus a companion `LangExtractToolConfig`
  • New: `src/google/adk_community/tools/__init__.py` for module exports
  • New: `tests/unittests/tools/test_langextract_tool.py` with 8 unit tests
  • Modified: `pyproject.toml` adds a `langextract` optional dependency group and includes it in the test extras
  • Modified: `src/google/adk_community/__init__.py` exposes the `tools` module
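
The shape of such a tool can be sketched without either library installed. The block below is a minimal illustration, not the code in this PR: `BaseTool` here is a hypothetical stand-in for the ADK base class, `fake_extract` stands in for `lx.extract()`, and the argument names `text` and `prompt_description` are assumptions inferred from the test names. The real ADK `run_async` also receives a tool context, which is omitted here.

```python
import asyncio
from typing import Any, Callable


class BaseTool:
    """Hypothetical stand-in for ADK's BaseTool (name and description only)."""

    def __init__(self, *, name: str, description: str) -> None:
        self.name = name
        self.description = description


def fake_extract(text: str, prompt_description: str) -> list[dict[str, str]]:
    """Stand-in for a synchronous extractor such as lx.extract()."""
    return [{"extraction_class": "word", "extraction_text": w} for w in text.split()]


class ExtractToolSketch(BaseTool):
    """Wraps a blocking extractor behind an async tool interface."""

    def __init__(
        self,
        *,
        name: str,
        description: str,
        extractor: Callable[..., Any] = fake_extract,
    ) -> None:
        super().__init__(name=name, description=description)
        self._extractor = extractor

    async def run_async(self, *, args: dict[str, Any]) -> dict[str, Any]:
        # Validate the arguments supplied by the model before doing any work.
        text = args.get("text")
        if not text:
            return {"error": "'text' is required."}
        prompt = args.get("prompt_description")
        if not prompt:
            return {"error": "'prompt_description' is required."}
        try:
            # Run the blocking call in a worker thread so the event loop stays free.
            result = await asyncio.to_thread(self._extractor, text, prompt)
        except Exception as exc:
            # Report failures to the agent as data instead of raising.
            return {"error": f"Extraction failed: {exc}"}
        return {"extractions": result}


tool = ExtractToolSketch(name="extract_words", description="Split text into words.")
ok = asyncio.run(
    tool.run_async(args={"text": "John works", "prompt_description": "words"})
)
missing = asyncio.run(tool.run_async(args={"prompt_description": "words"}))
```

Returning an error dictionary rather than raising is one possible design choice; it lets the calling agent see the failure and decide how to recover.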

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_default_initialization PASSED
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_custom_initialization PASSED
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_get_declaration PASSED
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_run_async PASSED
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_missing_text PASSED
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_missing_prompt_description PASSED
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_extraction_error PASSED
tests/unittests/tools/test_langextract_tool.py::test_langextract_tool_config_build PASSED
========================= 8 passed in 5.03s =========================
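
Tests like `test_langextract_tool_missing_text` and `test_langextract_tool_extraction_error` can run without network access if the extractor is replaced by a mock. The following is a rough sketch of that approach, not the tests in this PR: `run_tool` is a hypothetical, simplified tool body, and the error messages are made up for illustration.

```python
import asyncio
from unittest import mock


async def run_tool(extractor, args):
    """Hypothetical tool body: validate args, then call the extractor off-thread."""
    if not args.get("text"):
        return {"error": "missing text"}
    try:
        return {"extractions": await asyncio.to_thread(extractor, args["text"])}
    except Exception as exc:
        return {"error": str(exc)}


def test_missing_text():
    extractor = mock.Mock()
    result = asyncio.run(run_tool(extractor, {}))
    assert "error" in result
    # Validation should short-circuit before the extractor is touched.
    extractor.assert_not_called()


def test_extraction_error():
    # side_effect makes the mock raise when it is called.
    extractor = mock.Mock(side_effect=RuntimeError("model unavailable"))
    result = asyncio.run(run_tool(extractor, {"text": "hello"}))
    assert result == {"error": "model unavailable"}
    extractor.assert_called_once_with("hello")


test_missing_text()
test_extraction_error()
```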

Manual End-to-End (E2E) Tests:

from google.adk_community.tools import LangExtractTool
import langextract as lx

tool = LangExtractTool(
    name='extract_entities',
    description='Extract named entities from text.',
    examples=[
        lx.data.ExampleData(
            text='John is a software engineer at Google.',
            extractions=[
                lx.data.Extraction(
                    extraction_class='person',
                    extraction_text='John',
                    attributes={'role': 'software engineer', 'company': 'Google'},
                )
            ],
        )
    ],
)

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

  • Uses `asyncio.to_thread()` since `lx.extract()` is synchronous and potentially long-running
  • Optional install: `pip install google-adk-community[langextract]`
  • LangExtract repo: https://github.com/google/langextract
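
To illustrate why the thread offload matters, here is a small self-contained sketch. `slow_extract` is a hypothetical stand-in for a blocking call such as `lx.extract()`, and the heartbeat task only exists to show that the event loop keeps running while the worker thread is busy.

```python
import asyncio
import time


def slow_extract(text: str) -> str:
    """Stand-in for a blocking call such as lx.extract()."""
    time.sleep(0.2)
    return text.upper()


async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    """Record a timestamp every 20 ms until asked to stop."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main() -> tuple[str, int]:
    ticks: list[float] = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))
    # The blocking call runs in a worker thread; the heartbeat keeps ticking.
    result = await asyncio.to_thread(slow_extract, "john")
    stop.set()
    await beat
    return result, len(ticks)


result, tick_count = asyncio.run(main())
```

If `slow_extract` were called directly inside `main()`, the heartbeat would record at most one tick before the loop stalled for the duration of the call.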

@gemini-code-assist
Summary of Changes

Hello @its-animay, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates Google's LangExtract library into the ADK community tools, providing a LangExtractTool that enables LLM agents to perform structured information extraction directly. This significantly enhances the framework's ability to process unstructured text by offering a native, configurable, and asynchronous solution for data extraction, thereby improving developer experience and agent capabilities.

Highlights

  • New LangExtract Tool: Introduced LangExtractTool and LangExtractToolConfig to natively integrate Google's LangExtract library for structured information extraction within the ADK community tools.
  • Enhanced LLM Agent Capabilities: The new tool allows LLM agents to extract structured data (entities, attributes, relationships) from unstructured text, streamlining a previously manual process.
  • Dependency Management: Updated pyproject.toml to include langextract as an optional dependency and for testing purposes, ensuring proper installation and environment setup.
  • Module Structure and Exports: Modified src/google/adk_community/__init__.py to expose the new tools module, and created src/google/adk_community/tools/__init__.py to export the new tool classes.
  • Comprehensive Unit Testing: Added a dedicated test file with 8 unit tests covering the initialization, declaration, asynchronous execution, error handling, and configuration building of the LangExtractTool.

Changelog

  • pyproject.toml
    • Added langextract to the [project.optional-dependencies] section.
    • Included langextract as a dependency for the test optional dependency group.
  • src/google/adk_community/__init__.py
    • Imported the new tools module.
    • Exposed the tools module for package-level access.
  • src/google/adk_community/tools/__init__.py
    • Created a new __init__.py file to define the tools module.
    • Exported LangExtractTool and LangExtractToolConfig to make them accessible from the tools module.
  • src/google/adk_community/tools/langextract_tool.py
    • Added the LangExtractTool class, inheriting from BaseTool, to wrap the langextract library.
    • Implemented _get_declaration to define the tool's function signature for LLM agents.
    • Implemented run_async to execute lx.extract() in a separate thread using asyncio.to_thread() to prevent blocking.
    • Included LangExtractToolConfig for declarative configuration of the tool.
  • tests/unittests/tools/test_langextract_tool.py
    • Added a new test file containing 8 unit tests for LangExtractTool and LangExtractToolConfig.
    • Tests cover initialization, declaration, asynchronous execution, error handling, and configuration building.
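
As a rough illustration of what a function declaration conveys to the model, the sketch below uses a plain dictionary in a JSON-Schema-like shape. This is an assumption for illustration only: the actual tool presumably returns a declaration object from the ADK stack, and the parameter names `text` and `prompt_description` are inferred from the test names in this PR. `build_declaration` and `missing_args` are hypothetical helpers.

```python
from typing import Any


def build_declaration(name: str, description: str) -> dict[str, Any]:
    """Describe the tool's callable signature in a JSON-Schema-like dict."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "Unstructured text to extract from.",
                },
                "prompt_description": {
                    "type": "string",
                    "description": "Instructions describing what to extract.",
                },
            },
            "required": ["text", "prompt_description"],
        },
    }


def missing_args(declaration: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return the required parameters that are absent or empty in args."""
    required = declaration["parameters"]["required"]
    return [key for key in required if not args.get(key)]


decl = build_declaration("extract_entities", "Extract named entities from text.")
```

Keeping the list of required parameters in the declaration lets the same data drive both what the model sees and the argument validation at call time.
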

Activity

  • The author confirmed that unit tests were added or updated for the changes.
  • All unit tests passed locally as verified by the author.
  • The author performed a self-review of the code.
  • Code comments were added, particularly in complex areas.
  • Manual end-to-end tests were conducted and an example was provided in the PR description.
  • The pull request addresses issue #91 ("feat: Add LangExtract tool for structured information extraction") and was redirected here from a previous submission, adk-python#4549.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a LangExtractTool to integrate the langextract library for structured information extraction. The implementation is well-structured, including a tool class, a configuration class, and comprehensive unit tests. The use of asyncio.to_thread for the synchronous lx.extract call is correct and important for an async application. My main feedback is a suggestion to use Python's dataclasses for the configuration class to reduce boilerplate code, which is a common practice for such data-holding classes.
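
The dataclass suggestion can be sketched as follows. The class name, field names, default values, and the `build_kwargs` helper are all hypothetical and are not taken from the PR; the point is only that `@dataclass` generates `__init__`, `__repr__`, and `__eq__` for a data-holding config class.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExtractToolConfigSketch:
    """Hypothetical config holder; field names are illustrative only."""

    name: str = "langextract"
    description: str = "Extract structured information from text."
    # default_factory avoids sharing one mutable list across instances.
    examples: list[Any] = field(default_factory=list)

    def build_kwargs(self) -> dict[str, Any]:
        """Collect the keyword arguments a tool constructor might accept."""
        return {
            "name": self.name,
            "description": self.description,
            "examples": list(self.examples),
        }


default_cfg = ExtractToolConfigSketch()
custom_cfg = ExtractToolConfigSketch(name="extract_entities", examples=["sample"])
```

Two configs with the same field values compare equal without any hand-written `__eq__`, which is most of the boilerplate the review comment is referring to.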

Add LangExtractTool to the community tools module, enabling ADK agents
to extract structured data (entities, attributes, relationships) from
unstructured text using Google's LangExtract library.

- New: src/google/adk_community/tools/langextract_tool.py
- New: src/google/adk_community/tools/__init__.py
- New: tests/unittests/tools/test_langextract_tool.py
- Updated: pyproject.toml with langextract optional dependency
- Updated: adk_community __init__.py to expose tools module

LangExtractToolConfig uses @dataclass for concise, idiomatic config.

its-animay force-pushed the feat/langextract-tool branch from d1874a4 to 6c77a0c on February 19, 2026 21:21

Development

Successfully merging this pull request may close these issues.

feat: Add LangExtract tool for structured information extraction
