|
1 | | -# OpenDeepSearch |
| 1 | +# OpenDeepSearch ππ |
2 | 2 |
|
3 | | -## Core Components |
| 3 | +OpenDeepSearch is a lightweight yet powerful search tool designed for seamless integration with AI agents. It enables deep web search and retrieval, optimized for use with Hugging Face's **[SmolAgents](https://github.com/huggingface/smolagents)** ecosystem. |
4 | 4 |
|
5 | | -### 1. Web Scraping |
6 | | -The WebScraper class supports multiple extraction strategies: |
7 | | -- LLM-based extraction |
8 | | -- CSS selectors |
9 | | -- XPath queries |
10 | | -- No-extraction (raw content) |
11 | | -- Cosine similarity-based extraction |
| 5 | +## Features β¨ |
12 | 6 |
|
13 | | -Reference: |
| 7 | +- **Semantic Search** π§ : Leverages Infinity Embedding API for high-quality search results. |
| 8 | +- **Two Modes of Operation** β‘: |
| 9 | + - **Default Mode**: Quick and efficient search with minimal latency. |
| 10 | + - **Pro Mode (Deep Search)**: More in-depth and accurate results at the cost of additional processing time. |
| 11 | +- **Optimized for AI Agents** π€: Works seamlessly with **SmolAgents** like `CodeAgent`. |
| 12 | +- **Fast and Lightweight** β‘: Designed for speed and efficiency with minimal setup. |
| 13 | +- **Extensible** π: Easily configurable to work with different models and APIs. |
14 | 14 |
|
15 | | -### 2. Content Processing |
16 | | -Includes advanced content processing features: |
17 | | -- Chunk-based text splitting |
18 | | -- Quality content filtering |
19 | | -- Educational value assessment |
20 | | -- Smart paragraph processing |
| 15 | +## Installation π¦ |
21 | 16 |
|
22 | | -Reference: |
| 17 | +To install OpenDeepSearch, run: |
23 | 18 |
|
24 | | -### 3. Semantic Search |
25 | | -Implements semantic search capabilities: |
26 | | -- Document reranking |
27 | | -- Query-document similarity scoring |
28 | | -- Customizable normalization methods |
| 19 | +```bash |
| 20 | +pip install opendeepsearch |
| 21 | +``` |
29 | 22 |
|
30 | | -Reference: |
| 23 | +## Usage ποΈ |
31 | 24 |
|
32 | | -## Environment Variables |
| 25 | +You can use OpenDeepSearch independently or integrate it with **SmolAgents** for enhanced reasoning and code generation capabilities. |
33 | 26 |
|
34 | | -Required environment variables: |
35 | | -- `SERPER_API_KEY`: For web search functionality |
36 | | -- `OPENROUTER_API_KEY`: For LLM-based extraction |
| 27 | +### Using OpenDeepSearch Standalone π |
37 | 28 |
|
38 | | -## Dependencies |
| 29 | +```python |
| 30 | +from opendeepsearch import OpenDeepSearchTool |
| 31 | +import os |
39 | 32 |
|
40 | | -Key dependencies include: |
| 33 | +search_agent = OpenDeepSearchTool(model_name="openrouter/google/gemini-2.0-flash-001", pro_mode=True) # Set pro_mode for deep search |
| 34 | +query = "Fastest land animal?" |
| 35 | +result = search_agent.search(query) |
| 36 | +print(result) |
| 37 | +``` |
41 | 38 |
|
42 | | -Full dependencies list can be found in the pyproject.toml file. |
| 39 | +### Integrating with SmolAgents & LiteLLM π€βοΈ |
43 | 40 |
|
44 | | -## Contributing |
| 41 | +```python |
| 42 | +from opendeepsearch import OpenDeepSearchTool |
| 43 | +from smolagents import CodeAgent, LiteLLMModel |
| 44 | +import os |
45 | 45 |
|
46 | | -Contributions are welcome! Please feel free to submit a Pull Request. |
| 46 | +search_agent = OpenDeepSearchTool(model_name="openrouter/google/gemini-2.0-flash-001", pro_mode=True) |
| 47 | +model = LiteLLMModel( |
| 48 | + "openrouter/google/gemini-2.0-flash-001", |
| 49 | + temperature=0.2, |
| 50 | + api_key=os.environ["OPENROUTER_API_KEY"] |
| 51 | +) |
47 | 52 |
|
48 | | -## Acknowledgments |
| 53 | +code_agent = CodeAgent(tools=[search_agent], model=model) |
| 54 | +query = "How long would a cheetah at full speed take to run the length of Pont Alexandre III?" |
| 55 | +result = code_agent.run(query) |
| 56 | + |
| 57 | +print(result) |
| 58 | +``` |
| 59 | + |
| 60 | +## LiteLLM Setup & Usage π₯ |
| 61 | + |
| 62 | +[LiteLLM](https://github.com/BerriAI/litellm) is a lightweight and efficient wrapper that enables seamless integration with multiple LLM APIs. OpenDeepSearch leverages LiteLLM, meaning you can use **any LLM from any provider** that LiteLLM supports. This includes OpenAI, Anthropic, Cohere, and others. **OpenRouter** is a great example of a provider that gives access to multiple models through a single API. |
| 63 | + |
| 64 | +### Installing LiteLLM |
| 65 | + |
| 66 | +[LiteLLM](https://github.com/BerriAI/litellm) is a lightweight and efficient wrapper that enables seamless integration with multiple LLM APIs. OpenDeepSearch leverages LiteLLM for model inference. |
| 67 | + |
| 68 | +### Installing LiteLLM |
| 69 | + |
| 70 | +To install LiteLLM, run: |
| 71 | + |
| 72 | +```bash |
| 73 | +pip install litellm |
| 74 | +``` |
| 75 | + |
| 76 | +### Using LiteLLM with OpenDeepSearch |
| 77 | + |
| 78 | +You need to set up your API key in your environment variables before using LiteLLM: |
| 79 | + |
| 80 | +```bash |
| 81 | +export OPENROUTER_API_KEY='your-api-key-here' |
| 82 | +``` |
| 83 | + |
| 84 | +Then, you can use it as shown in the SmolAgents integration example above. |
| 85 | + |
| 86 | +## Configuration βοΈ |
| 87 | + |
| 88 | +You can configure OpenDeepSearch with environment variables or parameters: |
| 89 | + |
| 90 | +- `OPENROUTER_API_KEY`: API key for accessing OpenRouter models. |
| 91 | +- `MODEL_NAME`: Model used for search (default: `openrouter/google/gemini-2.0-flash-001`). |
| 92 | +- `PRO_MODE`: Set to `True` to enable deep search for more accurate results. |
| 93 | + |
| 94 | +## Acknowledgments π‘ |
| 95 | + |
| 96 | +OpenDeepSearch is built on the shoulders of great open-source projects: |
| 97 | + |
| 98 | +- **[Crawl4AI](https://github.com/crawl4ai)** π·οΈ β Provides data crawling support. |
| 99 | +- **[Infinity Embedding API](https://infinity.ai)** π β Powers semantic search capabilities. |
| 100 | +- **LiteLLM** π₯ β Used for efficient AI model integration. |
| 101 | +- **Various Open-Source Libraries** π β Enhancing search and retrieval functionalities. |
| 102 | + |
| 103 | +## License π |
| 104 | + |
| 105 | +This project is licensed under the MIT License. See [LICENSE](LICENSE) for details. |
| 106 | + |
| 107 | +## Contributing π€ |
| 108 | + |
| 109 | +We welcome contributions! If you'd like to improve OpenDeepSearch, please: |
| 110 | + |
| 111 | +1. Fork the repository. |
| 112 | +2. Create a new branch (`feature-xyz`). |
| 113 | +3. Submit a pull request. |
| 114 | + |
| 115 | +For major changes, open an issue to discuss your ideas first. |
| 116 | + |
| 117 | +## Contact π¬ |
| 118 | + |
| 119 | +For questions or collaborations, open an issue or reach out to the maintainers. |
49 | 120 |
|
50 | | -- Built with [Crawl4AI](https://github.com/crawl4ai) |
51 | | -- Uses [Infinity Embedding API](https://infinity.ai) for semantic search |
52 | | -- Powered by various open-source libraries and tools |
|
0 commit comments