Cartographer and PixPlot Image preview #400

Open: wants to merge 84 commits into base: master

Changes from 62 of 84 commits
4f0748e
basic mappings for a pixplot; known bug
dale-wahl Oct 24, 2023
9ca73e0
Merge branch 'master' into cartagrapher
dale-wahl Oct 25, 2023
cae91bc
add in LOTS of necessary steps so it actually works
dale-wahl Oct 26, 2023
6f62b13
add the pixplot template as a base
dale-wahl Oct 26, 2023
d473454
fix up atlas overlapping images & increase res
dale-wahl Oct 26, 2023
7177441
create a "plots" endpoint that uses pixplot_template and theoretical …
dale-wahl Nov 1, 2023
8c38621
fix up some path issues in the js file
dale-wahl Nov 1, 2023
4459394
add plot as preview!
dale-wahl Nov 1, 2023
c0a042e
rename
dale-wahl Nov 2, 2023
97bd8c3
Update .gitignore
dale-wahl Nov 2, 2023
946a167
Update .gitignore
dale-wahl Nov 2, 2023
0ae3382
add pixplot_template images
dale-wahl Nov 2, 2023
372d1ac
Update .gitignore
dale-wahl Nov 2, 2023
fad263d
add additional images for pixplot
dale-wahl Nov 2, 2023
c01c6f4
build paths based on two different sources (assets and data)
dale-wahl Nov 2, 2023
3c8256e
Merge branch 'master' into cartographer
dale-wahl Nov 2, 2023
2cbda77
Merge branch 'master' into cartographer
dale-wahl Nov 7, 2023
3e47291
create preset that auto runs cartographer
dale-wahl Nov 7, 2023
dfae9d9
only run preset image downloader
dale-wahl Nov 7, 2023
9ea862c
remove debug log
dale-wahl Nov 7, 2023
701a480
serve archived files
dale-wahl Nov 29, 2023
1ce0caf
serve archived files via frontend; use those images in cartographer
dale-wahl Nov 30, 2023
d1184b9
fix umap thumbsize when actually grid
dale-wahl Nov 30, 2023
64e5a2c
fix up the cartographer page a bit
dale-wahl Nov 30, 2023
1ca28a2
Merge branch 'master' into cartographer
dale-wahl Nov 30, 2023
e1fd2bf
do NOT change that 128 thumbnail size
dale-wahl Dec 12, 2023
067506b
serve archive files via generator (as opposed to extracting and delet…
dale-wahl Dec 12, 2023
c4ad487
fix adding annotation labels to mapped results
dale-wahl Dec 12, 2023
7ae8315
preview accepts url params to increase number/size of preview
dale-wahl Dec 12, 2023
ab6de26
attach preset to download images (instead of cartographer)
dale-wahl Dec 12, 2023
a6347db
preview zip datasets w/ cartographer if exists
dale-wahl Dec 12, 2023
98c3279
preset does not copy results_file but updates its results_file; clean…
dale-wahl Dec 12, 2023
6cbbf5b
render something for zips when no cartographer exists
dale-wahl Dec 12, 2023
73aa724
Merge branch 'master' into cartographer
dale-wahl Dec 14, 2023
ef0e633
Merge branch 'master' into cartographer
dale-wahl Dec 14, 2023
d07914e
add tiktok and telegram presets to use cartographer
dale-wahl Dec 14, 2023
9cc6e9e
add metadata to plot!
dale-wahl Dec 15, 2023
dd7f26e
modify cartographer to use max amount
dale-wahl Dec 18, 2023
13367be
use collages
dale-wahl Dec 18, 2023
409c3ba
dataset updates: add get_children method, mod get_all_children, allow…
dale-wahl Dec 20, 2023
7b3902f
Merge branch 'master' into cartographer
dale-wahl Dec 20, 2023
9772f53
moved hash_similarity_network.py to video_hasher.py
dale-wahl Dec 20, 2023
562a53b
staticmethod to init a dataset w/o db
dale-wahl Dec 21, 2023
82c8187
prep cartographer to check for coordinate-maps
dale-wahl Dec 21, 2023
adf58ca
create coordinate-map datasets from sigma network preview - disabled
dale-wahl Dec 21, 2023
d7030cc
Merge branch 'master' into cartographer
dale-wahl Jan 9, 2024
89c87e6
allow text on categorical layout only
dale-wahl Jan 12, 2024
b68ed8c
cartographer: enable date layout; and almost categorical (hidden curr…
dale-wahl Jan 12, 2024
e312888
Merge branch 'master' into cartographer
dale-wahl Jan 30, 2024
8e8e0ee
get archived file handle file not found
dale-wahl Jan 30, 2024
8ed5a32
cartographer use archive zip instead of results subfolder
dale-wahl Jan 30, 2024
f8168bd
update to use get_children() dataset method
dale-wahl Jan 31, 2024
ee32662
time some routes in debug mode
dale-wahl Jan 31, 2024
98ede11
If button is hidden (say because you don't want to implement it yet),…
dale-wahl Feb 1, 2024
96c7586
Merge branch 'master' into cartographer
dale-wahl Feb 7, 2024
f903b34
cartographer: fix front sizes on layout change!!!
dale-wahl Feb 7, 2024
6e0a6a2
cartographer: increase character count to display more categories
dale-wahl Feb 7, 2024
0c86246
cartographer: category view works now!
dale-wahl Feb 7, 2024
9d9934f
cartographer: found that stupid floating zero
dale-wahl Feb 8, 2024
98dba8e
pixplot_template: move metadata to left of image view; fix thumbs in …
dale-wahl Feb 8, 2024
dd25db3
cartographer: tested a better categorical point_size
dale-wahl Feb 8, 2024
6229333
fix get_all_children method to allow non instantiated datasets
dale-wahl Feb 20, 2024
c25f8ec
Merge branch 'master' into cartographer
dale-wahl Feb 20, 2024
14fe1f0
remove dataset.get
dale-wahl Feb 20, 2024
4d62dcf
fix typo and remove time_this debug
dale-wahl Feb 20, 2024
5d16065
revert .env change (mistake)
dale-wahl Feb 20, 2024
8970f64
add cartographer for video scenes
dale-wahl Feb 20, 2024
cb370e6
Merge branch 'master' into cartographer
dale-wahl Feb 21, 2024
a04ee29
deactivate video_scene_frames to plot pipeline
dale-wahl Feb 21, 2024
b5ee46b
cartographer handle directories
dale-wahl Feb 21, 2024
e053fee
reenable video-scene-frames preset to plot
dale-wahl Feb 21, 2024
5685d31
remove video-scene-frames preset; breaks other preset's `is_compatible`
dale-wahl Feb 21, 2024
14af493
add ui_only parameter to DataSet.get_available_processors() and Basic…
dale-wahl Feb 29, 2024
77fa3d3
Merge branch 'display_in_ui' into cartographer
dale-wahl Feb 29, 2024
9743254
update image downloaders and presets to use display_in_ui instead of …
dale-wahl Feb 29, 2024
1b6c0c8
don't delete twice
dale-wahl Feb 29, 2024
7720631
preview zip files opens new window as opposed to iframe
dale-wahl Feb 29, 2024
7f68486
Merge branch 'master' into cartographer
dale-wahl May 8, 2024
cb477c9
fix up ui display changes
dale-wahl May 8, 2024
c6bdc04
fix is_hidden from tiktok video to image downloader
dale-wahl May 8, 2024
219f30b
Merge branch 'master' into cartographer
dale-wahl May 28, 2024
ad79e70
fix up max images (if 0, max would always use 0)
dale-wahl May 28, 2024
8bcc5b1
map umap optional!
dale-wahl May 28, 2024
35c1c6d
alphabetic is also optional
dale-wahl May 28, 2024
18 changes: 10 additions & 8 deletions .gitignore
@@ -44,14 +44,6 @@ webtool/venv/
*.ipynb
venv/

# do not ignore interface images
!webtool/static/img/*.png
!webtool/static/img/*.gif
!webtool/static/img/*.jpg
!webtool/static/img/favicon/*.ico
!webtool/static/img/flags/*.png
!common/assets/github-screenshots/*.png

# generated by 4CAT
webtool/static/css/colours.css

@@ -65,3 +57,13 @@ keys/
images/
sphinx-3.3.1/
sphinx/

# do not ignore interface images
!webtool/static/img/*.png
!webtool/static/img/*.gif
!webtool/static/img/*.jpg
!webtool/static/img/favicon/*.ico
!webtool/static/img/flags/*.png
!webtool/static/pixplot_template/assets/images/*
!webtool/static/pixplot_template/assets/images/icons/*
!common/assets/github-screenshots/*.png
4 changes: 3 additions & 1 deletion backend/lib/preset.py
@@ -27,7 +27,9 @@ def process(self):
# also make sure there is always a "parameters" key
pipeline = [{"parameters": {}, **p} for p in pipeline.copy()]

pipeline[-1]["parameters"]["attach_to"] = self.dataset.key
# check that preset has an "attach_to" parameter in one of the processors
if not any("attach_to" in p["parameters"] for p in pipeline):
pipeline[-1]["parameters"]["attach_to"] = self.dataset.key

# map the linear pipeline to a nested processor parameter set
while len(pipeline) > 1:
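For context, `process()` folds a preset's linear pipeline into a single nested job, with each step carrying its successor in a "next" parameter; the fix above stops the preset from overwriting an "attach_to" that a step already declares. A minimal standalone sketch of that folding, with illustrative processor types and dataset key (not 4CAT's exact values):

# sketch of the pipeline folding done by process(); types/keys are illustrative
pipeline = [
    {"type": "image-downloader", "parameters": {"amount": 100}},
    {"type": "pix-plot"},
]

# also make sure there is always a "parameters" key (as in the diff)
pipeline = [{"parameters": {}, **p} for p in pipeline.copy()]

# only claim the preset's key if no step sets "attach_to" itself (the fix above)
if not any("attach_to" in p["parameters"] for p in pipeline):
    pipeline[-1]["parameters"]["attach_to"] = "preset-dataset-key"

# fold right-to-left: each step embeds its successor under "next"
while len(pipeline) > 1:
    last = pipeline.pop()
    pipeline[-1]["parameters"]["next"] = [last]

print(pipeline[0])  # one nested job description, outermost step first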
34 changes: 21 additions & 13 deletions backend/lib/processor.py
@@ -307,8 +307,7 @@ def after_process(self):

if self.dataset.get_results_path().exists():
# Update the surrogate's results file suffix to match this dataset's suffix
surrogate.data["result_file"] = surrogate.get_results_path().with_suffix(self.dataset.get_results_path().suffix)
shutil.copyfile(str(self.dataset.get_results_path()), str(surrogate.get_results_path()))
surrogate.result_file = str(self.dataset.get_results_path().name)

try:
surrogate.finish(self.dataset.data["num_rows"])
@@ -626,7 +625,7 @@ def write_csv_items_and_finish(self, data):
self.dataset.update_status("Finished")
self.dataset.finish(len(data))

def write_archive_and_finish(self, files, num_items=None, compression=zipfile.ZIP_STORED):
def write_archive_and_finish(self, filelist_or_folder, num_items=None, compression=zipfile.ZIP_STORED):
"""
Archive a bunch of files into a zip archive and finish processing

@@ -639,21 +638,30 @@ def write_archive_and_finish(self, files, num_items=None, compression=zipfile.ZI
are not compressed, to speed up unarchiving.
"""
is_folder = False
if issubclass(type(files), PurePath):
is_folder = files
if not files.exists() or not files.is_dir():
raise RuntimeError("Folder %s is not a folder that can be archived" % files)
if issubclass(type(filelist_or_folder), PurePath):
# folder with files
is_folder = filelist_or_folder
if not filelist_or_folder.exists() or not filelist_or_folder.is_dir():
raise RuntimeError("Folder %s is not a folder that can be archived" % filelist_or_folder)

files = files.glob("*")
#files = files.glob("*")

# create zip of archive and delete temporary files and folder
self.dataset.update_status("Compressing results into archive")
done = 0
with zipfile.ZipFile(self.dataset.get_results_path(), "w", compression=compression) as zip:
for output_path in files:
zip.write(output_path, output_path.name)
output_path.unlink()
done += 1
with zipfile.ZipFile(self.dataset.get_results_path(), "w", compression=compression) as zipf:
if is_folder:
for root, dirs, files in os.walk(filelist_or_folder):
for file in files:
zipf.write(os.path.join(root, file),
os.path.relpath(os.path.join(root, file), filelist_or_folder))
done += 1
else:
# list of files
for output_path in filelist_or_folder:
zipf.write(output_path, output_path.name)
output_path.unlink()
done += 1

# delete temporary folder
if is_folder:
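The new folder branch is essentially a recursive zip that stores paths relative to the folder root, so nested directories survive the round trip; unlike the file-list branch it does not unlink each file, since the whole temporary folder is deleted afterwards. A self-contained sketch of just that branch (the function name is illustrative):

import os
import zipfile
from pathlib import Path

def zip_folder(folder: Path, archive: Path, compression=zipfile.ZIP_STORED):
    """Zip a folder's contents, keeping paths relative to the folder root."""
    done = 0
    with zipfile.ZipFile(archive, "w", compression=compression) as zipf:
        for root, dirs, files in os.walk(folder):
            for file in files:
                full = os.path.join(root, file)
                # store e.g. "thumbs/0001.jpg" rather than the absolute path
                zipf.write(full, os.path.relpath(full, folder))
                done += 1
    return done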
4 changes: 3 additions & 1 deletion backend/workers/cleanup_tempfiles.py
@@ -51,10 +51,12 @@ def work(self):
# if for whatever reason there are multiple hashes in the filename,
# the key would always be the last one
key = possible_keys.pop()

try:
dataset = DataSet(key=key, db=self.db)
except DataSetException:
if self.db.fetchone(f"select * from datasets where result_file = '{file.name}'") is not None:
# Another dataset is using this file
continue
# the dataset has been deleted since, but the result file still
# exists - should be safe to clean up
self.log.info("No matching dataset with key %s for file %s, deleting file" % (key, str(file)))
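One caveat: the reuse check interpolates `file.name` directly into the SQL string. The database wrapper accepts `%s` placeholders elsewhere in this PR, so a parameterised form would sidestep quoting problems for odd filenames. A self-contained demonstration of the same check, using sqlite3 so it runs anywhere, with the table reduced to the two relevant columns:

import sqlite3

# toy stand-in for the datasets table
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE datasets (key TEXT, result_file TEXT)")
db.execute("INSERT INTO datasets VALUES ('abc123', 'abc123.csv')")

file_name = "abc123.csv"  # illustrative orphan candidate
# parameterised query: the driver handles quoting
row = db.execute(
    "SELECT 1 FROM datasets WHERE result_file = ?", (file_name,)
).fetchone()
print(row is not None)  # True -> another dataset still references this file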
106 changes: 80 additions & 26 deletions common/lib/dataset.py
@@ -39,7 +39,7 @@ class DataSet(FourcatModule):
data = None
key = ""

children = None
_children = None
available_processors = None
genealogy = None
preset_parent = None
@@ -71,7 +71,6 @@ def __init__(self, parameters=None, key=None, job=None, data=None, db=None, pare
# Ensure mutable attributes are set in __init__ as they are unique to each DataSet
self.data = {}
self.parameters = {}
self.children = []
self.available_processors = {}
self.genealogy = []
self.staging_areas = []
@@ -148,11 +147,6 @@ def __init__(self, parameters=None, key=None, job=None, data=None, db=None, pare
# Reserve filename and update data['result_file']
self.reserve_result_file(parameters, extension)

# retrieve analyses and processors that may be run for this dataset
analyses = self.db.fetchall("SELECT * FROM datasets WHERE key_parent = %s ORDER BY timestamp ASC", (self.key,))
self.children = sorted([DataSet(data=analysis, db=self.db) for analysis in analyses],
key=lambda dataset: dataset.is_finished(), reverse=True)

self.refresh_owners()

def check_dataset_finished(self):
@@ -566,16 +560,17 @@ def delete(self, commit=True):
self.db.delete("datasets_owners", where={"key": self.key}, commit=commit)
self.db.delete("users_favourites", where={"key": self.key}, commit=commit)

# delete from drive
try:
self.get_results_path().unlink()
if self.get_results_path().with_suffix(".log").exists():
self.get_results_path().with_suffix(".log").unlink()
if self.get_results_folder_path().exists():
shutil.rmtree(self.get_results_folder_path())
except FileNotFoundError:
# already deleted, apparently
pass
# delete from drive if not used elsewhere
if self.db.fetchone(f"select * from datasets where result_file = '{self.get_results_path().name}' and key != '{self.key}'") is None:
try:
self.get_results_path().unlink()
if self.get_results_path().with_suffix(".log").exists():
self.get_results_path().with_suffix(".log").unlink()
if self.get_results_folder_path().exists():
shutil.rmtree(self.get_results_folder_path())
except FileNotFoundError:
# already deleted, apparently
pass

def update_children(self, **kwargs):
"""
@@ -724,7 +719,7 @@ def add_owner(self, username, role="owner"):
self.refresh_owners()

# make sure children's owners remain in sync
for child in self.children:
for child in self.get_children(instantiate_datasets=True):
child.add_owner(username, role)
# not recursive, since we're calling it from recursive code!
child.copy_ownership_from(self, recursive=False)
@@ -755,7 +750,7 @@ def remove_owner(self, username):
del self.tagged_owners[username]

# make sure children's owners remain in sync
for child in self.children:
for child in self.get_children(instantiate_datasets=True):
child.remove_owner(username)
# not recursive, since we're calling it from recursive code!
child.copy_ownership_from(self, recursive=False)
@@ -800,7 +795,7 @@ def copy_ownership_from(self, dataset, recursive=True):

self.db.commit()
if recursive:
for child in self.children:
for child in self.get_children(instantiate_datasets=True):
child.copy_ownership_from(self, recursive=recursive)

def get_parameters(self):
@@ -1242,7 +1237,29 @@ def get_genealogy(self, inclusive=False):
self.genealogy = genealogy
return self.genealogy

def get_all_children(self, recursive=True):
def get_children(self, instantiate_datasets=True, update=False):
"""
Get children of this dataset

:param bool instantiate_datasets: Instantiate DataSet objects for each child else return ChildDataset objects w/ only key and type attributes
:param bool update: Update the list of children from database if True, else return cached value
:return list: List of child datasets
"""
if self._children and not update:
return self._children

if instantiate_datasets:
analyses = self.db.fetchall("SELECT * FROM datasets WHERE key_parent = %s ORDER BY timestamp ASC",
(self.key,))
self._children = sorted([DataSet(data=analysis, db=self.db) for analysis in analyses],
key=lambda dataset: dataset.is_finished(), reverse=True)
return self._children
else:
# Returns simple ChildDataset objects with only key and type
# Do not update self._children since this is not a list of DataSet objects
return [ChildDataset(key=key, type=dataset_type) for key, dataset_type in self.db.fetchall("SELECT key, type FROM datasets WHERE key_parent = %s ORDER BY timestamp ASC", (self.key,))]

def get_all_children(self, recursive=True, instantiate_datasets=True):
"""
Get all children of this dataset

@@ -1252,11 +1269,20 @@ def get_all_children(self, recursive=True):

:return list: List of DataSets
"""
children = [DataSet(data=record, db=self.db) for record in self.db.fetchall("SELECT * FROM datasets WHERE key_parent = %s", (self.key,))]
children = self.get_children(instantiate_datasets=instantiate_datasets)
results = children.copy()
if recursive:
for child in children:
results += child.get_all_children(recursive)
if instantiate_datasets:
# Can use the DataSet.get_all_children method for each child
for child in children:
results += child.get_all_children(recursive)
else:
# Need to check database directly for children of children
while children:
child = children.pop(0)
new_kids = [ChildDataset(key=key, type=dataset_type) for key, dataset_type in self.db.fetchall("SELECT key, type FROM datasets WHERE key_parent = %s ORDER BY timestamp ASC", (child.key,))]
children += new_kids
results += new_kids

return results

@@ -1374,9 +1400,11 @@ def get_own_processor(self):

:return: Processor class, or `None` if not available.
"""
processor_type = self.parameters.get("type", self.data.get("type"))
processor_type = self.type if hasattr(self, "type") else self.parameters.get("type")
return backend.all_modules.processors.get(processor_type)

def get(self, key):
return self.data.get(key)

def get_available_processors(self, user=None):
"""
@@ -1397,7 +1425,7 @@ def get_available_processors(self, user=None):

processors = self.get_compatible_processors(user=user)

for analysis in self.children:
for analysis in self.get_children(instantiate_datasets=False):
if analysis.type not in processors:
continue

@@ -1591,6 +1619,18 @@ def warn_unmappable_item(self, item_count, processor=None, error_message=None, w
else:
# No other log available
raise DataSetException(f"Unable to map item {item_count} for dataset {closest_dataset.key} and properly warn")
@staticmethod
def get_dataset_by_key(key, db=None):
"""
Get dataset by key

:param str key: Dataset key
:return DataSet: Dataset
"""
if db is None:
config.with_db()
db = config.db
return DataSet(key=key, db=db)

def __getattr__(self, attr):
"""
@@ -1640,3 +1680,17 @@ def __setattr__(self, attr, value):

if attr == "parameters":
self.parameters = json.loads(value)

class ChildDataset:
"""
Allows easy access to some child dataset attributes without instantiating them all
"""
def __init__(self, key, type):
self.key = key
self.type = type

def instantiate(self, db):
"""
Instantiates the dataset
"""
return DataSet(key=self.key, db=db)
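Taken together, `get_children()`, `get_all_children()` and `ChildDataset` let callers walk a dataset tree without paying for a full `DataSet` per node. A hypothetical usage sketch (assumes an existing `DataSet` instance `ds` and a `db` connection; names are illustrative):

# cheap traversal: ChildDataset stubs expose only .key and .type
for child in ds.get_children(instantiate_datasets=False):
    print(child.key, child.type)

# full objects when needed; this result is cached on the dataset
children = ds.get_children(instantiate_datasets=True)

# recursive listing of the whole subtree as lightweight stubs
subtree = ds.get_all_children(recursive=True, instantiate_datasets=False)

# promote a single stub to a full DataSet only when its data is needed
first = subtree[0].instantiate(db) if subtree else None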
27 changes: 26 additions & 1 deletion common/lib/helpers.py
@@ -2,6 +2,8 @@
Miscellaneous helper functions for the 4CAT backend
"""
import subprocess
import zipfile

import requests
import datetime
import smtplib
@@ -99,6 +101,13 @@ def sniff_encoding(file):
return "utf-8-sig" if maybe_bom == b"\xef\xbb\xbf" else "utf-8"


def get_html_redirect_page(url):
"""
Returns a html string to redirect to PixPlot.
"""
return f"<head><meta http-equiv='refresh' charset='utf-8' content='0; URL={url}'></head>"


def get_software_commit():
"""
Get current 4CAT commit hash
@@ -829,4 +838,20 @@ def _sets_to_lists_gen(d):
else:
yield k, v

return dict(_sets_to_lists_gen(d))
return dict(_sets_to_lists_gen(d))

def get_archived_file(archive_path, archived_file, temp_dir):
with zipfile.ZipFile(archive_path, "r") as archive_file:
archive_contents = sorted(archive_file.namelist())

if archived_file in archive_contents:
info = archive_file.getinfo(archived_file)
if info.is_dir():
raise IsADirectoryError("File is a directory")

archive_file.extract(archived_file, temp_dir)

return temp_dir.joinpath(archived_file)

else:
raise FileNotFoundError("File not found in archive")
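A usage sketch for `get_archived_file` (paths and member names are illustrative): it extracts a single archive member into `temp_dir` and returns the extracted path, raising if the member is missing or is a directory.

import tempfile
from pathlib import Path

from common.lib.helpers import get_archived_file

temp_dir = Path(tempfile.mkdtemp())
try:
    image = get_archived_file(Path("data/abc123.zip"), "thumbs/0001.jpg", temp_dir)
    print("extracted to", image)  # temp_dir/thumbs/0001.jpg
except FileNotFoundError:
    print("no such file in the archive")
except IsADirectoryError:
    print("requested member is a directory")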
10 changes: 2 additions & 8 deletions processors/machine-learning/pix-plot.py
@@ -11,7 +11,7 @@

from common.config_manager import config
from common.lib.dmi_service_manager import DmiServiceManager, DsmOutOfMemory, DmiServiceManagerException
from common.lib.helpers import UserInput, convert_to_int
from common.lib.helpers import UserInput, get_html_redirect_page
from backend.lib.processor import BasicProcessor

__author__ = "Dale Wahl"
@@ -227,7 +227,7 @@ def process(self):

# Results HTML file redirects to output_dir/index.html
plot_url = ('https://' if config.get("flask.https") else 'http://') + config.get("flask.server_name") + '/result/' + f"{os.path.relpath(self.dataset.get_results_folder_path(), self.dataset.folder)}/index.html"
html_file = self.get_html_page(plot_url)
html_file = get_html_redirect_page(plot_url)

# Write HTML file
with self.dataset.get_results_path().open("w", encoding="utf-8") as output_file:
@@ -362,12 +362,6 @@ def format_metadata(self, temp_path):
self.dataset.update_status("Metadata.csv created")
return metadata_file_path if rows_written != 0 else False

def get_html_page(self, url):
"""
Returns a html string to redirect to PixPlot.
"""
return f"<head><meta http-equiv='refresh' charset='utf-8' content='0; URL={url}'></head>"

def clean_filename(self, s):
"""
Given a string that points to a filename, return a clean filename
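With `get_html_page` moved to the shared helper, the processor's result file is just a one-line HTML shim that bounces the browser to the generated plot. A sketch of the write with made-up values (the real `plot_url` is assembled from the Flask config as shown above):

from common.lib.helpers import get_html_redirect_page

plot_url = "http://4cat.local/result/abc123/index.html"  # illustrative
with open("abc123.html", "w", encoding="utf-8") as output_file:
    output_file.write(get_html_redirect_page(plot_url))
# written contents: <head><meta http-equiv='refresh' charset='utf-8'
#                   content='0; URL=http://4cat.local/result/abc123/index.html'></head>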