
Evaluating ‘Graphical Perception’ with Multimodal Large Language Models


We investigate how well Multimodal Large Language Models (MLLMs) perform when applied to graphical perception tasks.

How do we answer this question?

To answer this research question, our study evaluates MLLMs' performance on graphical perception, inspired by two well-known studies from Cleveland and McGill, and Haehn et al. We include both pretrained and fine-tuned MLLMs in our experiments. In particular, we build the fine-tuned models on top of pretrained ones and examine their performance in depth, using efficient training configurations and scalable adaptation techniques to improve efficiency and save resources.

You can view the data behind our results here.

Quick Intro

Multimodal Large Language Models (MLLMs) have made remarkable progress in analyzing and understanding images. Despite these advancements, accurately regressing values from charts remains an underexplored area for MLLMs.
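Accuracy on these chart-value regression tasks is commonly scored with the log-absolute-error measure used by Cleveland and McGill and by Haehn et al. Below is a minimal sketch with illustrative values, not results from our study:

import numpy as np

# Cleveland-McGill style error for a single ratio judgment:
# log2(|judged percent - true percent| + 1/8)
def log_absolute_error(judged_percent, true_percent):
    return float(np.log2(abs(judged_percent - true_percent) + 0.125))

print(log_absolute_error(judged_percent=62.0, true_percent=57.3))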

What are our pretrained MLLMs?

We used four recent Multimodal Large Language Models: GPT-4o, Gemini 1.5 Flash, Gemini Pro Vision, and Llama 3.2 Vision.
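For the pretrained models, evaluation amounts to sending a stimulus image and a question to each model's API. Below is a minimal sketch for GPT-4o using the openai Python client; the prompt text and file name are illustrative assumptions, not our exact scripts:

import base64
from openai import OpenAI

client = OpenAI()  # reads the OpenAI API key from the environment

# Encode a hypothetical stimulus image for the multimodal prompt.
with open("stimulus.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Estimate what percentage the smaller marked bar is of the larger one."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)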

What are our finetuned MLLMs?

We also produced 15 fine-tuned MLLMs for this study (three per experiment), all built on top of Llama 3.2 Vision.
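Each fine-tuned model starts from the public Llama 3.2 Vision checkpoint on Hugging Face. A minimal loading sketch, assuming the 11B Instruct variant as the base:

import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed base checkpoint
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)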


🟢 What is our direction for this study?

Our study compares the performance of fine-tuned and pretrained models to see which perform better. We also want to understand which aspects of MLLMs succeed or fail on data visualization tasks.

We recreated stimuli for five experiments based on Cleveland and McGill’s foundational 1984 study and compared the results with human task performance.
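As an illustration of the kind of stimulus involved (a minimal sketch, not our exact generator), a Cleveland-and-McGill-style bar chart marks two bars with dots and asks what percentage the smaller marked bar is of the larger one:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
heights = rng.uniform(10, 90, size=5)          # five bars with random heights
marked = rng.choice(5, size=2, replace=False)  # the two bars to compare

fig, ax = plt.subplots(figsize=(3, 3), dpi=100)
ax.bar(np.arange(5), heights, color="black", width=0.6)
for i in marked:
    ax.plot(i, 2, "o", color="white", markersize=3)  # mark the compared bars
ax.set_ylim(0, 100)
ax.set_xticks([])
ax.set_yticks([])
fig.savefig("stimulus.png")

# Ground-truth answer the model should estimate (smaller / larger, in percent).
a, b = heights[marked]
print(round(min(a, b) / max(a, b) * 100, 1))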


What are the special features of our study?

We integrated the following packages and techniques into our study (see the sketch after this list for how they fit together):

  • PyTorch
  • PyTorch Lightning
  • Hugging Face Transformers
  • Distributed Data Parallel
  • Vision Encoder-Decoder Models
  • Parameter-Efficient Fine-tuning (PEFT)
  • Low-Rank Adaptation of Large Language Models (LoRA)
  • bitsandbytes quantization
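As a rough illustration of how these pieces fit together (a minimal sketch with assumed ranks and target modules, not our exact training configuration), the base Llama 3.2 Vision model can be loaded in 4-bit precision with bitsandbytes and wrapped with a LoRA adapter through PEFT so that only a small fraction of the parameters are trained:

import torch
from transformers import MllamaForConditionalGeneration, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",  # assumed base checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # illustrative choice of layers
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable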

We also designed a comprehensive pipeline that keeps everything in one place, from generating datasets and fine-tuning models to evaluating how well the MLLMs interpret and analyze visual data.

Installation

Prerequisites

You'll need to have the following installed:

  • Python 3.10 or higher
  • Pip
  • Conda

Setup

After you fork and clone the project, we recommend using Conda to set up a new environment:

conda env create -f conda-environment.yml
pip install -r pip-requirements.txt

After installing all packages and dependencies:

  1. Navigate to the LLMP directory:
cd LLMP
  2. Create a .env file with your API keys:
chatgpt_api_key="add your API key"
gemini_api_key="add your API key"
  3. Navigate to the experiment directory:
cd LLMP/EXP/EXP1
  4. Run the experiment:
python EXP1fullprogressrun.py

We use Experiment 1 for this demonstration.
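Inside the experiment scripts, the keys from .env can be read with python-dotenv; a minimal sketch, assuming the variable names from the .env example above:

import os
from dotenv import load_dotenv

load_dotenv()  # loads the .env file created in step 2
chatgpt_api_key = os.getenv("chatgpt_api_key")
gemini_api_key = os.getenv("gemini_api_key")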

Authors and Acknowledgements

We would like to thank:

  • Daniel Haehn for helping us build the foundation for this research and advising us to integrate the latest technology into our paper.
  • Kenichi Maeda and Masha Geshvadi for their assistance in writing and reviewing multiple scripts and building up our paper together.

We have also learned a great deal from this project, spanning #programming, #datavisualization, #imageprocessing, #machinelearning, and #computergraphics, and applied these insights to our study. Most importantly, we all contributed to this CS460 - Computer Graphics project at the University of Massachusetts Boston. Here's the link to the course website: CS460.org.
