Releases: digitalmethodsinitiative/4cat
v1.46 Autumn Additions
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or see error messages in the log file when upgrading via the web interface.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
- Added support for extensions, modular additions to 4CAT that can be put in the /extensions/ folder in the 4CAT root (#451)
- Added a processor to download 4CAT datasets as a Zip file, and updated the 'Import dataset' data source to allow loading these zip files as new datasets (#452)
- Added a data source for Threads, to allow importing Threads data via Zeeschuimer (a68f5d6)
- Added a processor for LLM-powered text coding via the DMI Service Manager (693960f)
- Added an option to the Telegram data source to crawl based on mentions and links in addition to forwarded messages (8f2193c)
- Added razdel as a tokeniser to the Tokenise processor for tokenising Russian text (0b74569)
- Added an option to the 'Word trees' processor to allow selecting which column(s) to read text from (e4c0099)
- Added more stopwords corpora to the Tokeniser and allow using multiple at the same time - by default the one for the chosen text language is used (b9a327a)
- Added more 'auto-fill' options when importing CSV files (empty values, or the current date and time) (9bd9da5)
- Added a warning to the 'Media upload' data source when trying to upload too many files at once (ffcb6a4, e4f982b, e304649)
- Added more indicative dataset status updates when running DMI Service Manager-powered processors (eb76937)
- Added support for previewing HTML datasets in the web interface (203314e)
- Added configuration settings to toggle display of Anonymisation controls on the 'Create dataset' page (0945d8c)
- Added configuration setting to toggle display of the 'you can install 4CAT yourself' message in the login form (cd356f7)
- Added a feed of the official 4CAT BlueSky account to the 4CAT 'Home' page (3d94b66)
- Added a delay to the worker that cleans up expired and orphaned datasets to wait 7 days before actually deleting an orphaned dataset (bfaf23b)
- Fix a crash in the 'Image category wall' processor (ebf39d8)
- Fix a crash in the 'Google Vision API' processor when running it on an empty dataset (fb09162)
- Fix a crash in the 'Video hashes' processor when running it on a dataset with no .metadata.json file (d41fa34)
- Fix a crash in the 'Download images' processor when trying to download images from a malformed URL (579ff64)
- Fix a crash in the 'Download videos' processor when trying to extract video URLs from a non-text data field (e9b5232)
- Fix a crash in the 'Hatebase' processor (4ba872b)
- Fix a rare race condition when running 4CAT via Docker (#396)
- Fix an issue in the front-end where an incomplete list of available processors was shown in some situations (4323946)
- Fix an issue in the Telegram data source where it would indicate that the 'app' needs updating to log in (d2a787e, 346150b)
- Fix an issue in the Telegram data source where crawl depth parameters would not be interpreted correctly (1c0bf5e, #444)
- Fix an issue in the Telegram data source where some post attributes were not read correctly (2c8c860, 959710a, c67a046)
- Fixed an issue where the link to a newly created dataset on the 'Create dataset' page would not always work (b542ded)
- Fixed an issue where configuration tags with no associated users could get deleted (d6064be)
- Fix an issue in the LinkedIn data source where image URLs would not always be parsed correctly (c27fbbe)
- Fix an issue in the Douyin data source where stream URLs would not always be parsed correctly (d769be4)
- Remove Spacy-powered text analysis processors (48c20c2)
- Remove the Parler data source (ee7f434)
- Update dependencies (#450, a269f96, d2a787e)
Full Changelog: v1.45...v1.46
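As an illustration of what using multiple stopword corpora at the same time can mean in practice, here is a minimal sketch. It is not the Tokenise processor's actual code; the function name `remove_stopwords` and the example word lists are assumptions made for this example only.

```python
def remove_stopwords(tokens, *stopword_lists):
    """Drop tokens that occur in any of the given stopword lists.

    Hypothetical helper for illustration; not part of 4CAT.
    """
    # Merge all provided corpora into a single set for fast lookups
    stopwords = set().union(*stopword_lists)
    # Compare case-insensitively, but keep the original token casing
    return [token for token in tokens if token.lower() not in stopwords]


english = {"the", "a"}   # tiny stand-in for an English stopword corpus
dutch = {"de", "een"}    # tiny stand-in for a Dutch stopword corpus
filtered = remove_stopwords(["The", "cat", "de", "kat"], english, dutch)
```

Merging the lists into one set keeps the filtering step the same regardless of how many corpora are selected.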
v1.45 Summer 2024 Special Edition
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or see error messages in the log file when upgrading via the web interface.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
- Added a 'media upload' data source that allows uploading media for processing with various image/sound/video processors (#419)
- Added a 'Visualise images with text captions' processor that generates an image wall including captions for each image (e7e636b)
- Updated dependencies for video hash processor (aad94f3)
- Updated the 'Help' link in the footer, and the page it links to, to give better information on how to get help with 4CAT (acf5de0)
- Updated the in-page preview for datasets to more accurately make hyperlinks clickable (8d4f99b)
- Updated the Telegram data source to optionally allow one to crawl channels (e8714b6)
- Updated the 'Count values' processor with an option to differentiate between missing and blank values (f2145bd)
- Updated the item mapping for X/Twitter data to include URLs for the author profile picture and banner in the CSV output (bcb9140)
- Fixed a crash in the 'Download images' processor when setting the amount of images to download to 0 (e0c55a8)
- Fixed an issue with upgrading a 4CAT running in a Docker container where pip could not properly run to update Python dependencies (2aaa972)
- Fixed various bugs with the 'Visualise images by category' processor
- Fixed a bug in the 4chan data import helper script when processing posts from threads of which the OP had been deleted (d67cf44)
- Fixed a bug where the wrong worker would be used when converting Google Vision or Clarifai output to CSV (fd3ac23)
- Fixed a bug in the tokeniser where it could crash when selecting 'other' as a language (f4f8e66)
- Fixed a bug where a job for the orphaned file cleanup worker would not always be properly added to the queue (1b9965d)
- Fixed a bug in the 'visualise images by category' processor where setting the max images to 0 would not properly remove the image limit (3580fc9)
Full Changelog: v1.44...v1.45
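To make the distinction between missing and blank values in the 'Count values' update concrete, here is a minimal sketch of one way such counting could work. It is not the processor's actual implementation; the function name, the `distinguish_missing` parameter and the `<missing>` / `<blank>` labels are all assumptions for this example.

```python
from collections import Counter

MISSING = "<missing>"  # hypothetical label for absent or null values
BLANK = "<blank>"      # hypothetical label for empty strings


def count_values(items, column, distinguish_missing=True):
    """Count values in a column, optionally separating missing from blank."""
    counts = Counter()
    for item in items:
        value = item.get(column)
        if value is None:
            # The column is absent from the item, or explicitly null
            label = MISSING if distinguish_missing else BLANK
        elif str(value).strip() == "":
            # The column is present but contains no text
            label = BLANK
        else:
            label = str(value)
        counts[label] += 1
    return counts


items = [{"lang": "en"}, {"lang": ""}, {}, {"lang": None}, {"lang": "en"}]
separate = count_values(items, "lang")
merged = count_values(items, "lang", distinguish_missing=False)
```

With the option switched off, missing and blank values end up in the same bucket.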
v1.44 Dependency hotfix
While deploying the previous 4CAT release (1.43) an issue surfaced where the 'Count values' processor could not be loaded due to a dependency issue in a third-party library. We have updated 4CAT's dependency list to resolve this. Otherwise this release is identical to the previous one, save for this one additional feature:
- Added a progress bar to the list of active workers on the control panel's front page for workers where progress information is available.
Full Changelog: v1.43...v1.44
v1.43 A small update with some fixes and new features
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.
When updating a Docker-based 4CAT, the front-end interface may fail to restart properly, marked by an error message like 'Error upgrading front-end container' in the restart log. In this case please run an upgrade via Docker Desktop or the command line as indicated on this page.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
- Fix a crash in the 'Fetch URL Metadata' processor (9057798, 7eab746)
- Fix an issue with uploading CSV files when Unix timestamps were formatted as floats rather than integers (27a568e)
- Fix a crash with metadata handling in the image-to-text processor (51e58dd)
- Fix a potential crash in the LinkedIn data mapping code (e0e0668, ef9dd48)
- Fix a potential crash in the Telegram image downloader that could be triggered if a download timed out (5727ff7)
- Fix a crash in video processors when processing Telegram data (060f2cd, 661c42c)
- Update Douyin data parser to handle new data format (289aa34, 2d2bbb9)
- Update TikTok data parser to properly handle all imported data (d756162)
- Update Instagram data parser to properly handle all imported data (807ab77)
- Update video processors to be compatible with ffmpeg versions before 5.1 (1b51d22)
- Update dependencies (5b9b23f)
- Update the video scene frame extractor to be much more efficient (572d03f)
- Update order of shutting down workers when stopping the 4CAT backend to ensure the internal API remains available for as long as possible (4182c43)
- Update error handling for processors interfacing with the DMI Service Manager (baacc86)
- Update Twitter-related code and text to reflect its name change to 'X' (ab34c41)
- Add support for automatic pseudonymisation when importing data from Zeeschuimer (8b66ae7, c973750)
- Add Gab data source (#401, 9b662e9)
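The CSV upload fix for Unix timestamps formatted as floats can be illustrated with a short sketch. This is an assumption about how such values might be normalised, not the code from the commit; the function name `parse_unix_timestamp` is hypothetical.

```python
from datetime import datetime, timezone


def parse_unix_timestamp(value):
    """Parse a Unix timestamp given as an integer or float string.

    Hypothetical helper for illustration; not part of 4CAT.
    """
    # float() accepts both "1700000000" and "1700000000.0"
    seconds = float(value)
    # Discard the fractional part and interpret the result as UTC
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc)


from_integer = parse_unix_timestamp("1700000000")
from_float = parse_unix_timestamp("1700000000.0")
```

Going through `float()` first means both notations end up as the same point in time.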
v1.42 Fixing of bugs and imports
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.
When updating a Docker-based 4CAT, the front-end interface may fail to restart properly, marked by an error message like 'Error upgrading front-end container' in the restart log. In this case please run an upgrade via Docker Desktop or the command line as indicated on this page.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
- Fix a bug in the restart procedure that could result in the front-end container failing to restart and upgrade when running 4CAT via Docker (765f29e)
- Fix a bug that could result in a processor crash when trying to filter datasets for a string on columns containing numeric values (537d764)
- Fix a bug that could result in a worker crash when importing CrowdTangle-formatted CSV files (91c3da1)
- Fix an issue with mapping Twitter data that could result in a crash (43c6ed6)
- Added the possibility to create notifications for all users with a certain tag in the Control Panel (c43e76d)
- Added a data source for importing TikTok comment data from Zeeschuimer (50a4434)
- Updated the default 4CAT configuration to enable the import of Gab and TikTok Comment data (342a403)
- Updated Douyin item mapping to properly process items not assigned to a specific room (6918bae, 1fd78b2)
v1.41 April does what it will
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or if you encounter issues when upgrading via the web UI.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
Processors
- Fixed an issue with the image downloading processor where it would not properly follow links to images for 4chan datasets (e5f1f70)
- Fixed an issue with the TF-IDF processor where results would be off if fewer results than the requested top n results were available (44848a8)
- Fixed a rare crash that could occur when a processor would encounter a FileNotFound exception while a Slack webhook was configured for logging (131a0ec, #422)
- Updated dataset filters to give filtered datasets a more context-sensitive name, based on the original's name as well as the filter type (3ef3e5e)
- Updated the PixPlot processor to allow for a longer run time (2582538)
- Added a dedicated processor for downloading Telegram videos, replacing the generic one for datasets from that data source (94c814b, 3f15410)
- Added 'emoji' count option to 'Count values' processor, to count how often emoji occur in a dataset (bb50fc9)
- Added 'Fetch URL metadata' processor, to fetch details about URLs mentioned in a dataset (a0baae1)
- Added options to the Telegram image downloader to fetch link preview thumbnails (8a7da53)
Data sources
- Fixed an issue with Telegram datasets that made items not have unique IDs in certain situations (a8b36dc)
- Fixed an issue when mapping Instagram datasets where a crash could occur if the 'full_name' of a user could not be determined (fa3be93)
- Fixed an issue with the Telegram data source where the 'max messages to fetch' setting would not be parsed correctly (d749237)
- Updated the warnings that imported datasets show to the user, about items that could not be imported properly or for which some data was missing, to be more accurate (db05ae5, #418)
- Added columns with reactions, link details and number of forwards to Telegram dataset CSV exports (e653e3d, e4a9344)
- Added support for image galleries to the Douyin data source (876f4a4)
- Added a 4CAT setting to control the amount of entities that can be fetched at a time via the Telegram dataset (cd2e74d)
Web interface
- Fixed an issue where the UI would not prompt for confirmation when deleting a configuration tag (39f2ec4)
- Fixed an issue where deleting a 'user:' tag would delete all 'user:' tags (9b4981d)
- Fixed an issue where the colour of the favicon would revert to pink in certain situations (073587e)
- Fixed an issue where the 'Request access' link would be visible on the login page even if requesting access was disabled (28d733d, 1f2cb77)
- Fixed an issue where the control panel could be unresponsive when 4CAT's data folder was very large; disk usage is now calculated every few hours in the background (c8ad90b)
- Fixed an issue where the configuration tag priority order could be edited via the Settings page; use the Configuration Tags page instead (ae1c00f)
- Updated the user filter on the user list page of the control panel to be case-insensitive (940bac7)
- Updated the layout of the control panel's Settings page to make it easier to navigate (d36254a)
- Updated the 'Share' dialog on dataset pages to allow comma-separated multiple item entry (6d8cb06)
- Updated some processors to hide/show certain options depending on the value of other options chosen (#397)
- Updated the CSV preview in the web UI to make hyperlinks clickable (daa7291)
- Added links to a list of users with the tag to the 'Configuration tags' page in the control panel (9b4981d)
Full Changelog: v1.40...v1.41
v1.40 Long Dutch Winter release
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.
When updating a Docker-based 4CAT, upgrading to this version may fail or appear to not have made any changes the first time. This is due to a bug fixed in this version. If this happens to you, follow the 'Docker - how to upgrade with command' instructions via the link above.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
- Optimised Docker build (93ddb4b)
- Fix an issue with migrating/upgrading to a new version inside a Docker-based 4CAT (aeb8090)
- Fix an issue where the video downloader could fail when a link redirected too often or a video lacked a content type header (97209cb)
- Fix an issue where Twitter datasets exported as CSV could have different columns depending on the date the dataset was imported into 4CAT (ce2b2d5)
- Fix an issue where the 4CAT front-end log would not start properly when running 4CAT via gunicorn but not inside a Docker container (84168e9)
- Update LinkedIn item mapper to handle recently collected datasets (38a865e)
- Update Douban capture module to properly collect comment like counts (e1211c7)
- Add various explanatory tooltips to dataset result pages (c0aa4c7)
- Make 4CAT more robust in how it maps content imported with Zeeschuimer into CSV files (#409)
v1.39 Bugfixes and maintenance
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
- Fix several issues that could occur when trying to import CSV files (#404, 9963cd8, 02acd86, 0049de6, 151d498, cdfe75c)
- Fix an issue where the canonical host name for a Docker-based 4CAT front-end would not always be set correctly (#395, #403, 0d1dc05)
- Fix an issue where tags would linger in the database after they were no longer associated with any user, or would be stored in the wrong order of preference (c38a215, 79f58bf)
- Fix an issue where uploads from Zeeschuimer would fail for data sources whose ID starts with a number (e.g. 9gag)
- Fix an issue where making a user would crash the front-end if a user with the same e-mail already exists (14c1f9c)
- Fix a crash in the 'Post/topic matrix' processor (be6ea8c)
- Fix an issue where the user-specific setting for the max amount of downloaded TikTok videos would be ignored (df2462f)
- Fix a crash when duplicating a dataset and copying the dataset's owners (a3fdbad)
- Fix an issue where the TikTok metadata fetcher would fail with a 'proxy unavailable' error when trying to run multiple TikTok processors at the same time (04faf23)
- Fix an issue with the Telegram image downloader (23edf44)
- Fix an issue where network processors could crash with a divide by zero error if run on an empty dataset (e10cb2e)
- Add a clearer error message when trying to merge datasets that are not CSV or NDJSON (c5fbe02)
- Add an explicit edge weight to generated networks that is properly recognised by Gephi (c415870)
- Add the option to only capture the first frame of a video when extracting video frames (b3981c3)
- Add various image processors (e.g. image wall) as supported for video frame datasets (588290a)
- Add improved compatibility between video hash processor and image classification processor so that video frames can be visually categorised (a4e6904)
- Add data source options to explicitly define the Tumblr API key to use if no 4CAT-wide keys have been configured (fdbdca9)
- Add a 4CAT setting that determines which proxy headers will be taken into account for URLs generated in the front-end (5e47dac)
- Update the TikTok import processor to cope with the new data formats provided by Zeeschuimer (f24828b)
- Update code that determines place of a dataset in the queue to be more efficient (239726b)
- Update dataset importer to stream files, which should prevent issues with very large data files (7a1c4b9)
- Update the type of the jobid column of the jobs table to BIGINT to avoid issues with long-running 4CAT servers (3414a96)
- Update jobs table to no longer have a useless 'status' column (9f493b2)
- Update processor presets so they do not linger in an unfinished state if one of their components crashes; they now finish with an error instead (b815c54, ddf9aab)
- Update various image dataset processors to produce more compatible .metadata.json files (87ec4d0)
- Updated version for the videohash library dependency (5f5e10f)
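The change to stream files in the dataset importer follows a general pattern that can be sketched as below. This is only an illustration of streaming an NDJSON file item by item instead of loading it into memory at once; it is not the importer's actual code, and the function name `stream_ndjson` is an assumption.

```python
import json


def stream_ndjson(path):
    """Yield one parsed item per line of an NDJSON file.

    Hypothetical helper for illustration; not part of 4CAT.
    """
    with open(path, encoding="utf-8") as infile:
        # Iterating over the file object reads one line at a time,
        # so memory use does not grow with the size of the file
        for line in infile:
            line = line.strip()
            if line:  # skip empty lines
                yield json.loads(line)
```

Because the function is a generator, a caller can process or write each item as soon as it has been read.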
v1.38 CSV Upload hotfix
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fix for a bug introduced in the previous release, v1.37:
- Fixed an issue that made the CSV upload data source never get past the 'please define your columns' stage when uploading CSVs with a custom format.
v1.37 A release coincidental with AoIR 2023
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following features and fixes:
Processors
- Added a 'top hashtags' processor for datasets that contain hashtags (this is a preset for the 'count values' processor)
- Added more configuration options for the image wall processors to limit how large datasets can be
- Fixed a bug that made the co-link processor crash when used with particularly small datasets
- Fixed various issues with processing data via the DMI Service Manager
Data sources
- Data parsers for data imported from Zeeschuimer have been updated to allow data captured from the current version of the supported platforms.
- Added a data source that allows importing datasets from other 4CAT servers (#352, #375). This is not enabled by default but can be enabled in the Control Panel.
- Fixed an issue where CSV files would erroneously be detected as having no header rows upon importing them (#392)
- Fixed a number of issues with Telegram data parsing (#368, #371)
Deployment and configuration
- 4CAT will now update Python libraries to their latest compatible version when running migrate.py or upgrading via the control panel.
- Docker images are now published for both ARM and x64 processor architectures (#392)
- Added a button to the 'Restart or upgrade' control panel page to restart only the front-end
- Added the option to migrate to a development branch of 4CAT via the control panel's "Upgrade" page. This requires enabling the 'Can upgrade to development branch' privilege in user settings before it is available.
- Fixed bugs with restarting the 4CAT front-end via the control panel when running via Apache, gunicorn or uwsgi
- Fixed a bug where generated URLs could have the wrong scheme when running 4CAT behind a reverse proxy
- Fixed a race condition that could cause the front-end container to crash on start-up when using 4CAT via Docker (#378)
- Fixed a potential issue when installing 4CAT via Docker with the latest version of the Postgres image (#382)
Interface
- Added a panel to the control panel which shows the active user tags for the currently logged in user
- Added a page to the control panel that allows creating many users at once by uploading a CSV file with user data
- Added a 'User Interface' category to the Settings panel to configure 4CAT's interface, for example to show in-line dataset previews and what to use as the 4CAT 'home page' (#380)
- Added the option for users to receive an e-mail alert when their dataset is completed, which can be enabled via the control panel through the 'Show email when complete option' option in the 'User interface' settings (#329, #385)
- Added an indication of the precise place in the queue for queued datasets (#239)
- Added the option to force a particular configuration tag by passing a specific HTTP header. This can be used to serve a different configuration of 4CAT depending on e.g. the used domain name, or other factors as determined by the reverse proxy serving 4CAT (#380)
- Fixed an issue with the manipulation of user tags via the control panel (#383, #384)
- Fixed an issue with changing the ownership for many datasets at once via the 'Bulk dataset management' page
- Fixed an issue that allowed the 'About' page to appear twice in the site navigation
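The idea of forcing a configuration tag via an HTTP header set by the reverse proxy can be sketched roughly as follows. The release notes do not name the header, so `X-Example-Config-Tag` is a made-up placeholder, and the function below is a hypothetical illustration rather than 4CAT's implementation.

```python
def forced_tag(headers, header_name="X-Example-Config-Tag"):
    """Return the configuration tag requested via a header, or None.

    Both the function and the default header name are hypothetical.
    """
    # HTTP header names are case-insensitive, so normalise before lookup
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get(header_name.lower(), "").strip()
    # An absent or empty header means no tag is forced
    return value or None


tag = forced_tag({"x-example-config-tag": "students"})
```

In a set-up like this, the reverse proxy would set the header based on, for example, the domain name used, and the application would apply the matching configuration.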