The podcast search engine allows users to find relevant podcast episodes that discuss topics they are interested in. The search engine presents short podcast clips that contain the topics the user is looking for. The search engine allows users to directly listen to the clips in Spotify.
Users can either enter the exact terms they are looking for or, when OpenAI Query Optimization is enabled, make a request in natural language.
Users can choose between 30-second, 2-minute, and 5-minute podcast clips.
Report with our main findings, detailed implementation and experiments
By clicking the Settings icon ⚙️ in the search bar, the user can choose between the following settings:
- Show scores: Shows the tf-idf score of each podcast clip in the UI.
- Use OpenAI Query Optimization: Rewrites the query by calling the OpenAI API to improve it. This lets users phrase requests in natural language, e.g. "Recommend me something about Kendrick". It can increase the query time because the OpenAI API needs to be called.
- Number of search results: The number of relevant podcast clips to show, up to 50. The clips are grouped by episode, so fewer episodes than clips may be visible.
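The interaction between the "Number of search results" cap and the per-episode grouping can be illustrated with a small Python sketch. It assumes each search hit is a dict with hypothetical `episode_id`, `start_time`, and `score` keys; the engine's actual result format may differ.

```python
MAX_RESULTS = 50  # upper limit of the "Number of search results" setting


def group_clips_by_episode(hits, num_results):
    """Keep the top `num_results` clips and group them by episode.

    `hits` is assumed to be a list of dicts with the hypothetical keys
    "episode_id", "start_time" and "score".
    """
    num_results = max(0, min(num_results, MAX_RESULTS))
    # Rank clips by score (highest first) and keep only the requested number.
    top = sorted(hits, key=lambda h: h["score"], reverse=True)[:num_results]
    # Group in rank order: each episode is listed where its best clip ranks.
    groups = {}
    for hit in top:
        groups.setdefault(hit["episode_id"], []).append(hit)
    return groups


hits = [
    {"episode_id": "ep1", "start_time": 0, "score": 3.2},
    {"episode_id": "ep2", "start_time": 30, "score": 2.9},
    {"episode_id": "ep1", "start_time": 120, "score": 2.5},
]
groups = group_clips_by_episode(hits, 3)
print(len(groups))  # → 2 (three clips, but only two episodes)
```

Grouping after truncation is what makes the visible episode count smaller than the requested clip count.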
When OpenAI Query Optimization is turned off, the search engine performs a boolean search and supports the following types of search:
- Intersection Search: The default; a plain query matches clips that contain all of its terms.
- Phrase Search: Add quotes to the start and end of your query, e.g. `"flat earth"`, `"climate change"`.
- Wildcard Search: `*` matches any sequence of characters of any length, and `?` matches exactly one character, e.g. `flat eart*`. Wildcard queries default to a union search.
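One way to map these three search modes onto the Elasticsearch query DSL is sketched below in Python. It only builds the query dictionary (no cluster is contacted) and assumes a hypothetical text field named `transcript`; the real searcher may detect modes and name fields differently.

```python
FIELD = "transcript"  # hypothetical name of the indexed transcript field


def build_boolean_query(user_query: str) -> dict:
    """Translate a raw query string into an Elasticsearch query DSL dict."""
    q = user_query.strip()
    # Phrase search: the whole query is wrapped in double quotes.
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        return {"match_phrase": {FIELD: q[1:-1]}}
    terms = q.split()
    # Wildcard search: some term contains * or ?. Terms are OR-ed (union).
    # Wildcard queries are not analyzed, so terms are lowercased here to
    # match an index built with a lowercasing analyzer (an assumption).
    if any("*" in t or "?" in t for t in terms):
        clauses = [{"wildcard": {FIELD: {"value": t.lower()}}} for t in terms]
        return {"bool": {"should": clauses, "minimum_should_match": 1}}
    # Intersection search (default): every term must occur in the clip.
    return {"match": {FIELD: {"query": q, "operator": "and"}}}


print(build_boolean_query("climate change"))
```

Setting `"operator": "and"` on a `match` query is what turns the default OR semantics of Elasticsearch into an intersection.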
To set up and use our indexing script with Elasticsearch, follow these steps:
- Prerequisites:
  - Ensure Python and `pip` (Python's package installer) are installed on your computer.
- Install Required Libraries:
  - Install the necessary Python libraries by running `pip install -r requirements.txt` from the `indexer` directory. This command installs the libraries listed in the `requirements.txt` file, including `elasticsearch`, `python-dotenv`, `os-sys`, and `hashlib`.
- Set Up Elasticsearch:
  - Create an Elastic Cloud account.
  - Once your account is set up, create a project and retrieve your Cloud ID, endpoint, and API key.
  - Remember to open up the privileges of your API key so that it can access your indices.
- Configure Environment Variables:
  - Save your Elastic Cloud instance ID `CLOUD_ID`, the Elasticsearch endpoint `CLOUD_ENDPOINT`, and `API_KEY` in a local `.env` file.
- Prepare Data:
  - Place your data files under `data/podcast-transcripts` in preparation for indexing.
- Configure Indexing Parameters:
  - Modify the variables in the `indexer.py` script before running it. Set `allow_overlap` to determine whether podcast snippets can overlap, adjust `document_size` to control the snippet length in seconds, and specify `index_name` to name the Elasticsearch index where your data will be stored. These settings let you customize indexing to your needs.
- Run the Indexer Script:
  - Execute the script by running `python indexer.py` from the `indexer` directory to start the indexing process.
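To illustrate what the `document_size` and `allow_overlap` parameters of the indexing step control, here is a minimal Python sketch of time-based snippet splitting. It is not the code in `indexer.py`: it assumes a transcript is a list of `(start_time_in_seconds, word)` tuples, and it assumes that overlapping windows start every `document_size / 2` seconds. The use of `hashlib` for deterministic document IDs is likewise a hypothetical illustration.

```python
import hashlib


def make_snippets(episode_id, words, document_size=30, allow_overlap=False):
    """Split one transcript into time-based snippets ready for indexing.

    `words` is assumed to be a list of (start_time_in_seconds, word) tuples
    sorted by time. With allow_overlap=True, this sketch starts a new window
    every document_size / 2 seconds (an assumed stride), so windows overlap.
    """
    if not words:
        return []
    stride = document_size / 2 if allow_overlap else document_size
    last_time = words[-1][0]
    snippets = []
    start = 0.0
    while start <= last_time:
        end = start + document_size
        text = " ".join(w for t, w in words if start <= t < end)
        if text:
            # Deterministic ID: re-running the indexer overwrites a snippet
            # instead of adding a duplicate document.
            doc_id = hashlib.sha1(f"{episode_id}:{start}".encode()).hexdigest()
            snippets.append({"_id": doc_id, "episode_id": episode_id,
                             "start_time": start, "end_time": end, "text": text})
        start += stride
    return snippets


words = [(0.0, "welcome"), (12.5, "to"), (31.0, "the"), (44.0, "show")]
print([s["text"] for s in make_snippets("ep1", words)])  # → ['welcome to', 'the show']
```

With `allow_overlap=True` the same transcript yields three snippets instead of two. Each dict could be given an `_index` key (the `index_name`) and passed to `elasticsearch.helpers.bulk` for indexing.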
The front-end was developed with React and JavaScript.
Requirements: Make sure Node.js and npm are installed on your machine.
To start the UI locally, run:
```bash
cd client     # go to client directory
npm install   # install all required node modules
npm start     # start the React frontend
```
The front-end will be available at `localhost:3000`.
The middle-ware was developed with Python and Flask.
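Requirements: Make sure Python, pip, and Flask are installed locally.
Make sure the following Python modules can be imported: `elasticsearch`, `dotenv`, `flask`, `flask_cors`, `json`, `requests`, `openai`, `langchain`. Apart from `json`, which is part of the Python standard library, these are installed with pip (the pip package names for `dotenv` and `flask_cors` are `python-dotenv` and `flask-cors`).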
The search engine uses the Spotify Web API to retrieve additional information about the podcast episodes and to get the show images. To use the Spotify API, create a Spotify developer application and get the app credentials. Add your `SPOTIFY_CLIENT_ID` and `SPOTIFY_CLIENT_SECRET` to your local `.env` file.
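With these credentials, an app obtains an access token from Spotify through the Client Credentials flow. The Python sketch below only builds that token request with the standard library so the required headers and body are visible; the middleware may use `requests` or a Spotify client library instead, and the function name is hypothetical.

```python
import base64
import os
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a Client Credentials token request."""
    # Spotify expects HTTP Basic auth with base64("client_id:client_secret").
    credentials = f"{client_id}:{client_secret}".encode()
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode(),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(TOKEN_URL, data=body, headers=headers, method="POST")


# The values are read from the environment, e.g. after python-dotenv's
# load_dotenv() has copied the .env file into os.environ.
req = build_token_request(os.environ.get("SPOTIFY_CLIENT_ID", ""),
                          os.environ.get("SPOTIFY_CLIENT_SECRET", ""))
```

Sending the request with `urllib.request.urlopen(req)` returns JSON with an `access_token`, which is then passed as `Authorization: Bearer <token>` to endpoints such as `https://api.spotify.com/v1/episodes/{id}`.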
To start the middle-ware locally, run:
```bash
cd app                       # go to app directory
flask --app searcher run     # start the middleware
```
The middle-ware will be available at http://127.0.0.1:5000.
The OpenAI Query Optimization is built on LangChain and currently uses the gpt-3.5-turbo model.
- Environment: Make sure `langchain-cli` is installed locally.
- Config: Add your `OPENAI_API_KEY` to a `.env` file. In the `chain.py` file, add the names of your indices to `INCLUDE_INDICES`.
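The core idea of the query optimization, turning a natural-language request into a keyword query, can be sketched in Python with the model call abstracted away. This is not the LangChain chain in `chain.py`: the prompt wording and `optimize_query` are hypothetical, and `llm` stands for any function that sends a prompt to a model such as gpt-3.5-turbo and returns its text, which lets the sketch run without an API key.

```python
from typing import Callable

# Hypothetical prompt; the project's actual prompt lives in its LangChain chain.
PROMPT = (
    "Rewrite the following request into a short keyword query for a podcast "
    "transcript search engine. Return only the query.\n"
    "Request: {request}"
)


def optimize_query(request: str, llm: Callable[[str], str]) -> str:
    """Turn a natural-language request into a keyword query via an LLM."""
    answer = llm(PROMPT.format(request=request))
    # Keep only the first non-empty line of the model output.
    lines = answer.strip().splitlines()
    keywords = lines[0].strip() if lines else ""
    # Fall back to the original request if the model returns nothing usable.
    return keywords or request


# A stand-in model so the example runs offline.
fake_llm = lambda prompt: "Kendrick Lamar\n"
print(optimize_query("Recommend me something about Kendrick", fake_llm))  # → Kendrick Lamar
```

Falling back to the raw request keeps search working when the model returns an empty answer, at the cost of searching the unoptimized wording.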
You can turn on OpenAI Query Optimization in the settings of the search engine.
NOTICE: The program needs to query Elastic Cloud automatically, so make sure the privileges of the Cloud API key are set to open.