The main goal of this project is to populate the Kabardian Poetry Collection with poems in Kabardian language - a potentially vulnerable* language spoken in Northwest Caucasus which even ChatGPT struggles to recognise:
The poems collection site presents each poem together with the poem's vocabulary - words that the custom Python script could find in the kbd.wiktionary.org/wiki/ dictionary and extract their translation.
The insert_data.py script performs the following steps to extract, transform and load data into the local database:
- it extracts each poem's content from up to ten txt files saved in the poems_contents folder;
- it checks if the poem is already in the database; the poem will not be added if this is the case;
- it then splits each poem into separate words and checks if these words can be found in the online dictionary;
- if a word was found, it checks if this word is already in the database and adds it to the words pool if not;
- finally, the script allocates the found words and their translation to the corresponding poem which will then appear in the Vocabulary section on each poem page.
Once the data is extracted and loaded to the local database, the contents of its tables will be moved to the corresponding tables in the PostgreSQL live database using copy_to_aws.py.
* classified by the UNESCO Atlas of the World's Languages in Danger.
- Poems added to the txt files. (As Kabardian is a low-resource language, it is not very easy to find many poems in this language online and usually it is enough to process ten poems at a time.)
- Run copy_to_aws.py which will first run the insert_data.py to identify the vocabulary for each selected poem, to scrape words' translations from an online dictionary and add them together with the poems to a local Postgres database. It will then use pg_dump to copy the contents of the tables from the local database to the corresponding tables on the live database.
- Run dbt models in the postgres_transformation project to update the model.
Dependency Management With Python Poetry
Sources: Real Python, Poetry Docs
To activate the virtual environment, create a nested shell with poetry shell. This will activate the environment and start a new shell session within it.
After running this command, you'll be inside the virtual environment, and any Python commands or scripts you run will use the dependencies installed in this environment.
poetry shell
To deactivate the virtual environment and exit this new shell type exit.
exit
To deactivate the virtual environment without leaving the shell use deactivate.
deactivate
If you don’t want to activate the entire shell but just want to run commands within the Poetry-managed virtual environment, you can prefix your commands with poetry run.
poetry run python insert_data.py
Or to run a dbt command:
poetry run dbt run
poetry update
DBT Installation With Python Poetry
poetry add dbt-core dbt-postgres
Verify installation:
poetry shell
dbt --version
dbt init postgres_transformation
cd postgres_transformation
dbt requires a profiles.yml file to connect to your PostgreSQL database. The file is usually located in ~/.dbt/. Configure the file:
kabardian_poems:
outputs:
dev:
dbname: your_db_name
host: your_host
pass: your_password
port: 5432
schema: "{{ env_var('DBT_SCHEMA', 'your_dbt_schema') }}"
threads: 4
type: postgres
user: your_user
target: dev
+materialized: table #view
poetry run dbt run
Retrieve recent web server logs for your Heroku app via CLI
- Log in to Heroku via terminal
heroku login
- This command will open a browser window prompting you to enter your Heroku credentials (email and password).
- Once logged in, you’ll be redirected back to the CLI, where you should see a message indicating you’re logged in.
- Retrieve recent logs with
--num
flag
- You can adjust 1500 to any number up to Heroku's current maximum.
heroku logs --app your-app-name --num 1500
- Save logs to a file
heroku logs --app your-app-name --num 1500 > recent-logs.txt