[
{
"unique_idx": 2,
"image": "./data/images/2.jpg",
"question": "Identify the objects in the image and determine their potential function.",
"meta": {
"level": "Perception/Remembering",
"capability": "General Object Recognition",
"url": "https://pbs.twimg.com/media/F-jIjPIX0AAz_Z5?format=jpg&name=small",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the objects in the image and determine their potential function.",
"answer": "In the image, there is a group of seven pens placed inside a clear cup. These pens come in various colors, making them a visually pleasing assortment. The collection of pens in the cup might be a part of a creative workspace, meant for individuals who need to write, draw, or take notes frequently. Such colorful pens can also be used for decorative purposes, adding a touch of color and fun to the environment.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 3,
"image": "./data/images/3.jpeg",
"question": "Identify the object in the image and indicate the use case of the item.",
"meta": {
"level": "Perception/Remembering",
"capability": "General Object Recognition",
"url": "https://pbs.twimg.com/media/F0Xsp38WYBE77te?format=jpg&name=large",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the object in the image and indicate the use case of the item.",
"answer": "The image features a small blue, round machine with wheels, called a floor scraper, standing in a building with wood pallets. The floor scraper is designed for removing dirt, debris, and other small items from floors, which helps to maintain cleanliness and hygiene in the area. In this particular image, the machine is sitting on a dirt floor, possibly at the end of a workday, with the work done for the moment.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 6,
"image": "./data/images/6.jpg",
"question": "Identify the fish-like object in the image and indicate the use case of the item.",
"meta": {
"level": "Perception/Remembering",
"capability": "General Object Recognition",
"url": "https://pbs.twimg.com/media/F-qp042WcAEcr7-?format=jpg&name=large",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the fish-like object in the image and indicate the use case of the item.",
"answer": "In the image, there is a stuffed fish-like object lying on the bed next to the pillows. This object, which appears to be a stuffed crocodile, serves as a decorative piece or a toy. It might be a comforting item for the person who sleeps in the bed, adding a touch of personalization and warmth to the space. Additionally, it can also serve as a toy for a child or a simple collectible item for an individual who enjoys such objects.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 5,
"image": "./data/images/5.jpg",
"question": "Identify the object in the image and point out the difference between it and the general fan.",
"meta": {
"level": "Perception/Remembering",
"capability": "General Object Recognition",
"url": "https://pbs.twimg.com/media/F-JzosaW4AAGm1E?format=jpg&name=small",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the object in the image and point out the difference between it and the general fan.",
"answer": "The image features a portable space heater, often referred to as a small stove, sitting next to a black control panel. The heater has a circular fan next to it that can be used for direct heating of a small area. This is different from a general fan, as the heater is designed for providing warmth and comfort instead of just circulating air or cooling. The heater and the fan are both black and compact, making them suitable for personal use and travel. However, the heater is specifically designed for heating purposes and is not a general appliance like a regular fan.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 4,
"image": "./data/images/4.jpg",
"question": "Identify the objects in the image and determine the functionality of each socket.",
"meta": {
"level": "Perception/Remembering",
"capability": "General Object Recognition",
"url": "https://pbs.twimg.com/media/F-VliuDbAAAF4tz?format=png&name=large",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the objects in the image and determine the functionality of each socket.",
"answer": "In the image, there is a small box with several sockets displayed on it. One socket appears to be a USB plug, which can be used for charging or connecting devices like smartphones or external hard drives to a computer. Another socket is a microSD card slot, which is designed to provide a secure and convenient method for storing and accessing data on a removable, high-capacity memory card. A third socket is a headphone jack, which typically allows for the connection of headphones or speakers to a device for audio output, with the ability to play or listen to sound without disturbing others.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 1,
"image": "./data/images/1.jpg",
"question": "Identify the white objects in the image and determine their functionality.",
"meta": {
"level": "Perception/Remembering",
"capability": "General Object Recognition",
"url": "https://pbs.twimg.com/media/FmuQWeIaMAAqJ08?format=jpg&name=medium",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the white objects in the image and determine their functionality.",
"answer": "The image features a group of white snowmen, or snowflakes, and other white objects, such as cups and vases, arranged on a wooden dining table. These objects are designed as decorative items for the table, adding an element of festivity and charm to the setting. The snowflakes and snowmen may be part of a theme or an aesthetic choice, while the cups and vases serve as functional items for holding beverages or displaying flowers, respectively. Overall, the main focus of the image is the creative and decorative arrangement of these white objects on the tabletop.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 13,
"image": "./data/images/13.jpeg",
"question": "The image shows a receipt from 'Receiptify', an application that generates a music listening receipt based on a user's listening history. Based on the image, what is the number of times the song Future(Lollapalooza ver.) was played?",
"meta": {
"level": "Perception/Remembering",
"capability": "OCR",
"url": "https://pbs.twimg.com/media/F91QfgiWgAAphUD?format=jpg&name=large",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "The image shows a receipt from 'Receiptify', an application that generates a music listening receipt based on a user's listening history. Based on the image, what is the number of times the song Future(Lollapalooza ver.) was played?",
"answer": "The number of times the song Future(Lollapalooza ver.) was played is 3.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 0,
"image": "./data/images/0.jpeg",
"question": "Identify the primary structure in the image and describe its architectural style and features.",
"meta": {
"level": "Perception/Remembering",
"capability": "General Object Recognition",
"url": "https://pbs.twimg.com/media/GEbgySCbkAAJGyL?format=jpg&name=medium",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the primary structure in the image and describe its architectural style and features.",
"answer": "The primary structure in the image is the Brooklyn Bridge. Brooklyn Bridge is an iconic suspension bridge in New York City, built in 1883. It spans over the East River, connecting the borough of Brooklyn with Manhattan. The bridge has a distinctive twin-tower design, with two towers at the abutments on each end of the bridge. These towers house the cable supports and carry the weight of the suspended bridge deck, which is suspended from them. The bridge's architectural style is a blend of Gothic and Romanesque influences, making it an important historical and cultural landmark.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 10,
"image": "./data/images/10.jpg",
"question": "Which dishes can have extra ingredients added, and what is the cost of adding ingredients?",
"meta": {
"level": "Perception/Remembering",
"capability": "OCR",
"url": "https://pbs.twimg.com/media/F-Qz_HoXQAAnic9?format=jpg&name=large",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Which dishes can have extra ingredients added, and what is the cost of adding ingredients?",
"answer": "Several dishes on the menu allow for the addition of extra ingredients. These dishes include the spaghetti carbonara and the penne alla chiarra. The cost of adding extra ingredients for these dishes is either 50p or 70p, depending on the dish. The price can also vary depending on the specific ingredients added and their portions, so it is best to confirm the cost with the restaurant.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 8,
"image": "./data/images/8.jpeg",
"question": "Identify the object in the image and describe its usage.",
"meta": {
"level": "Perception/Remembering",
"capability": "General Object Recognition",
"url": "https://pbs.twimg.com/media/F_o2tYAX0AA7B9O?format=jpg&name=medium",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the object in the image and describe its usage.",
"answer": "The object in the image is a large lens, possibly the eyepiece of a projector, mounted on a ceiling. The lens is situated next to a fan, suggesting that it is likely connected to a projector unit, which is used for displaying images or videos onto a screen or surface.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 20,
"image": "./data/images/20.jpg",
"question": "Identify the possible badminton stroke being performed by the player.",
"meta": {
"level": "Perception/Remembering",
"capability": "Action Recognition",
"url": "https://pbs.twimg.com/media/DOL9khuVQAAcTdU?format=jpg&name=small",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the possible badminton stroke being performed by the player.",
"answer": "The possible badminton stroke being performed by the player is a serve. The man is seen jumping while holding a racquet, which is a common technique used in serving to generate more power and precision in the shot. The serve is an important part of a badminton game, as it initiates play and sets the tone for the rest of the rally.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 9,
"image": "./data/images/9.jpeg",
"question": "Identify the transportation in the picture and distinguish it from traditional means of transportation.",
"meta": {
"level": "Perception/Remembering",
"capability": "General Object Recognition",
"url": "https://pbs.twimg.com/media/F_1pcqcbUAAAVt6?format=jpg&name=medium",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the transportation in the picture and distinguish it from traditional means of transportation.",
"answer": "The transportation in the picture is a train traveling across a large bridge that spans over a lush green field. This is an example of modern transportation infrastructure, which differentiates it from more traditional means of transportation, such as boats, horses, or pedestrian travel. The bridge is built to accommodate and facilitate efficient transportation of passengers and goods over the vast landscape, providing a convenient and reliable connection between different parts of the country, connecting remote areas, and supporting economic growth. The presence of the train moving swiftly through the grassy landscape illustrates the capabilities and advantages of modern transportation methods.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 11,
"image": "./data/images/11.jpg",
"question": "Recognize the text in the image and the implicit message conveyed.",
"meta": {
"level": "Perception/Remembering",
"capability": "OCR",
"url": "https://images.unsplash.com/photo-1541419519724-45f1964fc4b0?w=500&auto=format&fit=crop&q=60&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxzZWFyY2h8MTF8fHNsb2dhbnxlbnwwfHwwfHx8Mg%3D%3D",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Recognize the text in the image and the implicit message conveyed.",
"answer": "The text in the image is a saying or a motivational quote, which reads \"Life is a Straight Road to Success.\" The message conveyed by this quote is that success is a direct and clear pathway, and that with determination and effort, one can achieve success in life. The red background and bold lettering emphasize the importance and positivity of the message.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 27,
"image": "./data/images/27.jpeg",
"question": "What are people in the image doing?",
"meta": {
"level": "Perception/Remembering",
"capability": "Action Recognition",
"url": "https://pbs.twimg.com/media/F_-CADFWYAAIRJr?format=jpg&name=medium",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "What are people in the image doing?",
"answer": "The people in the image are practicing dance moves, possibly yoga or another form of exercise, in a room.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 12,
"image": "./data/images/12.jpeg",
"question": "Assume a passenger bought an air ticket to Boston with Delta Airline on flight number 5706, which gate should the passenger go to wait for the flight",
"meta": {
"level": "Perception/Remembering",
"capability": "OCR",
"url": "https://pbs.twimg.com/media/FmM62WNWYAMC0EB?format=jpg&name=large",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Assume a passenger bought an air ticket to Boston with Delta Airline on flight number 5706, which gate should the passenger go to wait for the flight",
"answer": "Based on the image, the departure board at the National Airport lists gate numbers and flight times. To locate the correct gate for flight number 5706 to Boston with Delta Airline, you should find the board that indicates the specific gate assignment for Delta Airline flights. The board provides information on the departure time, gate number, and the airline's code. By following the Delta Airline codes and looking for the flight number 5706, the passenger can find the correct gate to wait for their flight.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 7,
"image": "./data/images/7.jpg",
"question": "Identify the object in the hands of the anime character and briefly describe the use and techniques of this object.",
"meta": {
"level": "Perception/Remembering",
"capability": "General Object Recognition",
"url": "https://pbs.twimg.com/media/F-XKNDIXcAAf0f4?format=jpg&name=small",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the object in the hands of the anime character and briefly describe the use and techniques of this object.",
"answer": "The anime character is holding a trombone, which is a musical instrument used to produce sound by vibrating the air inside it. The trombone is made of a long metal tube, with a flared bell at the end, and has a sliding mechanism that allows the player to change notes. The trombone uses brass and slide techniques to produce various tones and notes, making it a versatile instrument in different musical genres such as jazz, classical, and military marching bands.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 15,
"image": "./data/images/15.jpg",
"question": "Solve the problem in the image and provide a detailed explanation.",
"meta": {
"level": "Perception/Remembering",
"capability": "OCR",
"url": "https://th.bing.com/th/id/R.2191baa175e367378ee23033135bb00d?rik=qWn49SGq1phjiQ&riu=http%3a%2f%2f3.bp.blogspot.com%2f-uVOu4ngGfCk%2fVK7Y0P7pf5I%2fAAAAAAAAF8A%2fCM8ejP3rddM%2fs1600%2fIMG_7566.JPG&ehk=nqJnhjhTbK6I4mP3d8UqckEwn0GtMLnLQSN8jba4ijs%3d&risl=&pid=ImgRaw&r=0",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Solve the problem in the image and provide a detailed explanation.",
"answer": "The image features a piece of paper with handwritten text describing the movements of people at a zoo. It appears to be a timeline or a summary of people's visits to the zoo. The paper contains several times, such as 11:30 and 7:30, along with numbers of people visiting at these times. Additionally, there is a question at the end of the text that appears to be related to the overall content. Based on the information available, it's difficult to provide a more detailed explanation without additional context.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 22,
"image": "./data/images/22.jpeg",
"question": "Identify the action taking place in the image.",
"meta": {
"level": "Perception/Remembering",
"capability": "Action Recognition",
"url": "https://pbs.twimg.com/media/F_6Gh81WIAAHtCD?format=jpg&name=medium",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the action taking place in the image.",
"answer": "In the image, a woman is getting her blood pricked by a nurse, likely for medical purposes.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 16,
"image": "./data/images/16.jpg",
"question": "List all films from Italy in the image.",
"meta": {
"level": "Perception/Remembering",
"capability": "OCR",
"url": "https://i.ytimg.com/vi/JDtgQCRj3aQ/maxresdefault.jpg",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "List all films from Italy in the image.",
"answer": "\"Foreign Film Festival\" features several Italian films on its poster. They include:\n\n1. \"Boccaccio '70\"\n2. \"Charade\"\n3. \"The Victors\"\n4. \"La Dolce Vita\"\n5. \"Never on Sunday\"\n6. \"Les Liaisons Dangereuses\" (adapted from the French novel)\n7. \"Baby Elephant\"\n8. \"Sunday\"\n\nThese films showcase the diversity and richness of Italian cinema.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 18,
"image": "./data/images/18.jpg",
"question": "Determine the competition information in the picture, such as stage of the game, team information, etc.",
"meta": {
"level": "Perception/Remembering",
"capability": "OCR",
"url": "https://pbs.twimg.com/media/F-tTyHGaYAAsFS8?format=jpg&name=medium",
"question type": "open-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Determine the competition information in the picture, such as stage of the game, team information, etc.",
"answer": "The image displays a trophy sitting next to a TV screen, which is showing the scoreboard during a game. There are two teams playing in the game, as indicated by the presence of two logos or team names. The competition appears to be a tournament, as there is a suggestion of a series of games being played. The scoreboard indicates that \"JD Gaming\" is in the lead, while \"T1\" is trailing behind. The game seems to be at an advanced stage, with only a few minutes remaining. The trophy in the foreground adds to the competitive atmosphere of the scene.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 19,
"image": "./data/images/19.jpg",
"question": "Extract all text data from the boarding pass and organize them into JSON format, including passenger name, flight number, departure and destination cities, date, gate, boarding time, and seat number.",
"meta": {
"level": "Perception/Remembering",
"capability": "OCR",
"url": "https://static.turbosquid.com/Preview/2016/06/03__14_24_55/2.pngf5b386e3-37f4-44b1-8af4-ffe6332cac24Zoom.jpg",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Extract all text data from the boarding pass and organize them into JSON format, including passenger name, flight number, departure and destination cities, date, gate, boarding time, and seat number.",
"answer": "{\n\"from\": \"Dee John\",\n\"to\": \"OKL22\",\n\"flight\": \"OKL22\",\n\"date\": \"11/2/2018\",\n\"economy\": true,\n\"flight\\_class\": \"Economy\",\n\"boarding\\_time\": \"10:30\",\n\"gate\": \"Gate 10\",\n\"passenger\\_name\": \"Dee John\",\n\"departure\\_city\": \"Glasgow\",\n\"destination\\_city\": \"London\",\n\"seat\\_number\": \"26A\"\n}",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 25,
"image": "./data/images/25.jpg",
"question": "What is the woman doing? Why does she hold a microphone?",
"meta": {
"level": "Perception/Remembering",
"capability": "Action Recognition",
"url": "https://www.johornow.com/wp-content/uploads/sites/2/2019/01/2-min-6.jpg",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "What is the woman doing? Why does she hold a microphone?",
"answer": "The woman is singing with her mouth wide open while holding a microphone. She is likely at a concert or a live event, where she is the main performer. Holding a microphone is essential for amplifying her voice so that the audience can hear her clearly and loudly. The woman's dramatic performance, combined with her full-mouthed singing, enhances the energy and excitement of the event, engaging the audience and creating a memorable experience.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 29,
"image": "./data/images/29.jpeg",
"question": "Identify the activity taking place, why are they pulling the rope?",
"meta": {
"level": "Perception/Remembering",
"capability": "Action Recognition",
"url": "https://pbs.twimg.com/media/F_9x7nnXwAEA5Qm?format=jpg&name=large",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the activity taking place, why are they pulling the rope?",
"answer": "The image shows a group of people, including children and teenagers, standing on a field of grass and pulling a rope. This activity seems to be an organized team-building event or an organized game where participants need to work together to achieve a common goal. The exact purpose of pulling the rope is unclear; it could be related to a tug-of-war, an obstacle course, or a symbolic gesture. It is evident that the activity requires cooperation and coordination among the participants, which aims to foster a sense of camaraderie and mutual understanding among the team members.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 23,
"image": "./data/images/23.jpg",
"question": "Identify the phase of the hunting sequence the eagle is in.",
"meta": {
"level": "Perception/Remembering",
"capability": "Action Recognition",
"url": "https://ts1.cn.mm.bing.net/th/id/R-C.3565e58e91a5501b485c07252d38b329?rik=dKK0%2bx%2bFPsUPlw&riu=http%3a%2f%2fwallpapercave.com%2fwp%2fgqzKrvU.jpg&ehk=Z70rSlewwJx9Au6yiPgsucngwxkFXqJ5EhWocrWPNss%3d&risl=&pid=ImgRaw&r=0",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the phase of the hunting sequence the eagle is in.",
"answer": "The eagle is in the phase where it is either preparing to catch its prey or has just captured it. In the image, the eagle is in flight and has its wings fanned out, which indicates that it is actively engaged in its hunting process. The eagle could be in the initial stage where it is scanning the surroundings, or it might be in the final stage where it has successfully grabbed its prey and is preparing to return to its nest or another safe location. Eagles typically have exceptional eyesight and precision, making them highly adept at capturing their prey during these flying sequences.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 34,
"image": "./data/images/34.jpeg",
"question": "Critique the visual elements of the logo on the cup, such as color, shape, and imagery, and their effectiveness in conveying the company's ethos and appeal to its target demographic.",
"meta": {
"level": "Perception/Remembering",
"capability": "Symbol Recognition",
"url": "https://pbs.twimg.com/media/F_a-guGaQAAWEGO?format=jpg&name=small",
"question type": "open-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Critique the visual elements of the logo on the cup, such as color, shape, and imagery, and their effectiveness in conveying the company's ethos and appeal to its target demographic.",
"answer": "The logo on the Starbucks cup is represented by the company's iconic mermaid, which is stylized and colorful. The mermaid figure is depicted with a crown on her head, and the cup itself is red, featuring a white lid. This distinctive design and color combination make the Starbucks logo visually memorable and easily recognizable to its target demographic.\n\nBy incorporating a mermaid figure, the logo connects with the imaginative and magical aspects of the brand's coffee experience. The crown on the mermaid's head emphasizes the brand's premium and luxurious positioning. The red cup and white lid provide a bold contrast, which further draws attention to the logo. Additionally, the cup's bright color and the logo's playful imagery make it appealing to a diverse range of customers, from families to young adults. Overall, the color, shape, and imagery of the Starbucks logo effectively convey the company's ethos and appeal to its target demographic by incorporating elements of luxury, imagination, and playfulness.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 41,
"image": "./data/images/41.jpeg",
"question": "What ingredients are used to make the dish shown in the picture?",
"meta": {
"level": "Perception/Remembering",
"capability": "Food Recognition",
"url": "https://pbs.twimg.com/media/GAJcNJ7bkAAjtBI?format=jpg&name=medium",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "What ingredients are used to make the dish shown in the picture?",
"answer": "The dish shown in the picture contains corn and lime as key ingredients.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 30,
"image": "./data/images/30.jpg",
"question": "Infer the brand identity or values that the logo aims to represent, considering elements such as motifs, stylization, and complexity.",
"meta": {
"level": "Perception/Remembering",
"capability": "Symbol Recognition",
"url": "https://images.unsplash.com/photo-1654356202106-dd2d6e09a176?q=80&w=2574&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
"question type": "open-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Infer the brand identity or values that the logo aims to represent, considering elements such as motifs, stylization, and complexity.",
"answer": "The logo in the image is a stylized and complex emblem of a car brand. The emblem features a stylized lion, possibly representing the strength, power, and elegance associated with the brand. The motif adds an element of sophistication and luxury to the brand identity. The logo's distinctive visuals may suggest a focus on creating a strong and memorable image for the car brand, which can be appealing to potential customers seeking high-quality and stylish vehicles.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 26,
"image": "./data/images/26.jpg",
"question": "Evaluate the writing posture of the girl in the picture and make some suggestions.",
"meta": {
"level": "Perception/Remembering",
"capability": "Action Recognition",
"url": "https://writereflections4u.com/wp-content/uploads/1-Girl-Writing.jpg",
"question type": "open-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Evaluate the writing posture of the girl in the picture and make some suggestions.",
"answer": "The girl in the photo appears to have her elbow resting on the desk while writing. While this posture may provide additional support and comfort, it is not ideal for maintaining a proper and healthy writing posture. For better posture, the girl should consider adopting one of the following suggestions:\n\n1. Hold the pen or pencil in both hands and use a firm grip to minimize strain on the wrist and fingers.\n2. Place the notebook or book on a flat surface, like a desk or a lap, allowing the writing surface to be easily accessible.\n3. Keep the elbows slightly bent and maintain a relaxed, neutral position while writing.\n4. Maintain a comfortable position for the wrists and avoid bending them excessively while writing.\n\nBy implementing these suggestions, the girl can improve her writing posture, reduce the risk of strain or discomfort, and maintain a healthy and efficient writing experience.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 32,
"image": "./data/images/32.jpg",
"question": "What type of symbol is this? Identify all elements in the symbol and describe their meanings.",
"meta": {
"level": "Perception/Remembering",
"capability": "Symbol Recognition",
"url": "NaN",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "What type of symbol is this? Identify all elements in the symbol and describe their meanings.",
"answer": "The symbol is a recycle logo, also known as the recycling arrows symbol. The symbol is usually represented by three chasing arrows forming a triangle, with the words \"HDPE\" present below the arrows. The three chasing arrows signify the process of recycling, with each arrow representing a different stage: breaking down the used material, melting it down, and remolding it into new products. The \"HDPE\" stands for high-density polyethylene, which is a type of plastic commonly used for bottles and containers. This symbol is used to indicate the recyclability of HDPE materials and encourages people to recycle items made of this type of plastic.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 14,
"image": "./data/images/14.jpeg",
"question": "Extract the definitions (def) and examples (ex) from the notes, and record the formulas using LaTeX.",
"meta": {
"level": "Perception/Remembering",
"capability": "OCR",
"url": "https://mzucker.github.io/images/noteshrink/notesA1.jpg",
"question type": "open-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Extract the definitions (def) and examples (ex) from the notes, and record the formulas using LaTeX.",
"answer": "def A: A transformation matrix A is an orthogonal matrix, meaning it has a determinant of 1 and its columns and rows are orthonormal.\nex: A transformation matrix is a fundamental tool in linear algebra and geometry, as it can transform points, vectors, and other geometric objects from one coordinate system to another.\n\ndef IFFT: The inverse of a forward Fourier Transform, which converts a time-domain signal back to its frequency components.\nex: IFFT is crucial in various applications, such as signal processing, digital image processing, and communication systems.\n\ndef If: An integral transform, which converts a function from the time domain to the frequency domain.\nex: If is essential in transforming periodic signals, such as those encountered in control systems and telecommunication networks.\n\ndef \u222bA: The integral of a function A over a specific range.\nex: \u222bA is a mathematical operation used in calculus, physics, and engineering to determine the area under the curve of a function.\n\ndef M: A matrix M is an orthogonal matrix, meaning it has a determinant of 1, and its columns and rows are orthonormal.\nex: M is used in linear algebra to represent transformations and transformations matrices.\n\ndef 2: A scalar quantity equal to 2.\nex: 2 is a fundamental number in mathematics and science, often used as a constant or a base for calculating other quantities.\n\ndef 2+I: A complex number, with a real part of 2 and an imaginary part of 1.\nex: 2+I is an important element in complex analysis, which can be used to represent rotations and other transformations in a two-dimensional space.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 48,
"image": "./data/images/48.jpg",
"question": "which country does this type of dish most likely originated from?",
"meta": {
"level": "Perception/Remembering",
"capability": "Food Recognition",
"url": "https://images.pexels.com/photos/884596/pexels-photo-884596.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=2",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "which country does this type of dish most likely originated from?",
"answer": "The dish in the image, featuring various food items such as rice, breaded chicken, coleslaw, and shrimp, along with a wooden tray, likely originated from Japan or a country influenced by Japanese cuisine. The use of a rectangular wooden tray and the presentation of the food with rice, chicken, and shrimp are common characteristics of Japanese bento box lunches, which often include a balance of protein, vegetables, and carbohydrates. These bento box lunches are a popular part of Japanese meal traditions and are known for their artful and visually appealing presentation.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 36,
"image": "./data/images/36.jpg",
"question": "The image is the logo of an NBA team: the Golden State Warriors. Identify the elements in the image and discuss how the elements of the logo in the image influence recognition and public perception of the team.",
"meta": {
"level": "Perception/Remembering",
"capability": "Symbol Recognition",
"url": "https://ts1.cn.mm.bing.net/th/id/R-C.79317048dbaba065dc182b8a04f31586?rik=OKFMQykH%2fkfzFA&riu=http%3a%2f%2flogos-download.com%2fwp-content%2fuploads%2f2016%2f04%2fGolden_State_Warriors_logo_alternative.png&ehk=7vyjHjNjtMpjM0fSlcnxf5PnHGhAcA52rYG7vP6D0QU%3d&risl=&pid=ImgRaw&r=0",
"question type": "open-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "The image is the logo of an NBA team: the Golden State Warriors. Identify the elements in the image and discuss how the elements of the logo in the image influence recognition and public perception of the team.",
"answer": "The logo of the Golden State Warriors prominently features their team mascot, a blue and yellow bridge, which is a distinct symbol of the city of San Francisco. The bridge's design and color scheme, along with the team's name, instantly identify the NBA franchise associated with the logo. \n\nThis logo design elements play a significant role in public perception and recognition of the team. The team's mascot, the Golden Gate Bridge, is a notable landmark and a symbol of the city, which can evoke a strong sense of pride in fans and residents alike. By incorporating this iconic symbol into their logo, the team is able to create a strong connection with their community and establish a unique identity within the NBA. Additionally, the consistent use of blue and yellow colors in the logo enhances the team's visual identity and makes it easily recognizable among other teams' logos. In conclusion, the elements of the logo work together to create a memorable symbol that reflects the team's values, history, and connection with its fans and the city of San Francisco.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 21,
"image": "./data/images/21.jpeg",
"question": "Identify the exercise in the image and briefly introduce the technical essentials of this excercise.",
"meta": {
"level": "Perception/Remembering",
"capability": "Action Recognition",
"url": "https://pbs.twimg.com/media/FsMXf3iaIAApvHQ?format=jpg&name=4096x4096",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the exercise in the image and briefly introduce the technical essentials of this excercise.",
"answer": "The image features a man performing a weighted bench press, which is a strength training exercise. The man is lifting a weight (barbell) above his head and lowering it behind his chest, pressing it down and then pushing it up again, working on his upper body muscles. The technical essentials of the bench press include maintaining proper form, engaging the correct muscle groups, and lifting the weight with a controlled motion, while keeping the focus on safety and avoiding excess strain on the body.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 43,
"image": "./data/images/43.jpg",
"question": "what is the green bean-like vegetable in the bowl?",
"meta": {
"level": "Perception/Remembering",
"capability": "Food Recognition",
"url": "https://images.pexels.com/photos/15913452/pexels-photo-15913452.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=2",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "what is the green bean-like vegetable in the bowl?",
"answer": "The green bean-like vegetable in the bowl is snap peas.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 37,
"image": "./data/images/37.jpg",
"question": "Decompose the logo into its constituent parts and based on the elements, make an hypothesis on the type of team or organization it represents.",
"meta": {
"level": "Perception/Remembering",
"capability": "Symbol Recognition",
"url": "https://static.vecteezy.com/system/resources/previews/000/674/154/original/shark-mascot-logo-vector.jpg",
"question type": "open-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Decompose the logo into its constituent parts and based on the elements, make an hypothesis on the type of team or organization it represents.",
"answer": "The logo features a shark with a scissor in its mouth. This unique combination could represent a business or organization that has a connection to the marine life (shark) and cutting-edge technology or precision (the scissor). One possible hypothesis is that the logo might represent a shark-related business, such as a diving or marine tours operator, or an aquarium or marine research facility. Alternatively, it could represent a business or organization that provides cutting-edge or precise services in an industry that is distinct from marine life, such as medical or beauty services.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 44,
"image": "./data/images/44.jpg",
"question": "What is the red berry in the smoothie bowl in the image? How is it different from the dark purple berry in the bowl? ",
"meta": {
"level": "Perception/Remembering",
"capability": "Food Recognition",
"url": "https://images.pexels.com/photos/1099680/pexels-photo-1099680.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=2",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "What is the red berry in the smoothie bowl in the image? How is it different from the dark purple berry in the bowl? ",
"answer": "The red berry in the smoothie bowl is the strawberry, whereas the dark purple berry is the acai berry. Both of these berries are included in the smoothie alongside other fruits, such as blueberries, blackberries, and bananas.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 39,
"image": "./data/images/39.jpg",
"question": "Analyze the crest for symbolic meanings, researching historical or cultural meanings to the elements on the logo, and provide an assessment of what organizations or entities might use such a crest.",
"meta": {
"level": "Perception/Remembering",
"capability": "Symbol Recognition",
"url": "https://static.igem.wiki/teams/4469/wiki/cuhk-logo.png",
"question type": "open-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Analyze the crest for symbolic meanings, researching historical or cultural meanings to the elements on the logo, and provide an assessment of what organizations or entities might use such a crest.",
"answer": "The crest features a decorative design with both yellow and purple elements, along with a variety of writing, possibly in Japanese. The crest shows a bird, a dragon or a lion, and a sword. These symbols can represent different meanings, depending on the context in which they are used.\n\nThe bird likely symbolizes freedom, strength, or determination, while the dragon is often associated with power, courage, and good fortune. The lion is strong, brave, and represents leadership qualities. The sword can symbolize protection, courage, or authority. \n\nRegarding organizations that might use such a crest, it could be associated with martial arts, sports teams, military groups, or companies that provide security or protective services. The presence of Japanese writing and the specific design elements might indicate a connection to Japanese culture, art, or history. In any case, the crest is a rich and complex symbol, reflecting a combination of positive attributes and qualities.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 50,
"image": "./data/images/50.jpg",
"question": "Which historical event or period is the landmark in the image usually associated with?",
"meta": {
"level": "Perception/Remembering",
"capability": "Landmark Recognition",
"url": "https://pbs.twimg.com/media/F-aIqMwbAAA2yfI?format=jpg&name=small",
"question type": "open-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Which historical event or period is the landmark in the image usually associated with?",
"answer": "The landmark in the image, the Eiffel Tower, is usually associated with the late 19th and early 20th centuries. This period is when the tower was built and it has since become an iconic symbol of Paris, France. The Eiffel Tower, which is a massive metal structure, was completed in 1889 and has since become a popular tourist attraction and historical marker for the city of Paris.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 17,
"image": "./data/images/17.jpg",
"question": "Organize the extracted text into JSON format, with separate fields for the album title, artist, track name, guest artist, and track durations.",
"meta": {
"level": "Perception/Remembering",
"capability": "OCR",
"url": "https://i.ytimg.com/vi/EFfUhtEuPWQ/maxresdefault.jpg",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Organize the extracted text into JSON format, with separate fields for the album title, artist, track name, guest artist, and track durations.",
"answer": "{\n\"album\\_title\": \"The Jersey Crime Boss Mack-Nice\",\n\"artist\": \"Mack-Nice\",\n\"tracks\": [\n{\n\"name\": \"Hate It or Love It\",\n\"guest\\_artist\": \"The Game\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"Fresstyle\",\n\"guest\\_artist\": \"Big Mace\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"Grinding\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"We Got It\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"Grab That\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"Ya Know\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"South Jersey Girl\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"No Problems\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"Dirty Money\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"Choose\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"16\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"Movin Pac\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"23\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"Black Divas\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"10\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"Got Locked Up\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n},\n{\n\"name\": \"My Black Diva\",\n\"guest\\_artist\": \"Radio B.\",\n\"duration\": \"3:41\"\n}\n]\n}",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 28,
"image": "./data/images/28.jpg",
"question": "Identify the type of dance based on dancers' attire and movement.",
"meta": {
"level": "Perception/Remembering",
"capability": "Action Recognition",
"url": "https://ts1.cn.mm.bing.net/th/id/R-C.ab63170b3261c142f8d20229562f1276?rik=q4H0%2bMinzLydLw&riu=http%3a%2f%2fjohndavidmann.com%2fwp-content%2fuploads%2f2014%2f05%2fDancers.jpg&ehk=KaWjMPFbK36v1xqCrT5ruSYUOoiwnJTG9PiG%2bWkpzvs%3d&risl=&pid=ImgRaw&r=0",
"question type": "open-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the type of dance based on dancers' attire and movement.",
"answer": "The type of dance depicted in the image is ballet, as the male and female dancers are wearing black clothing and demonstrating graceful movements characteristic of ballet. The man is lifting the woman in a split while standing, and she is reaching upwards with her arms, which are typical ballet positions and techniques. The dancers' black attire also suggests a formal or professional performance, further supporting the notion that they are participating in a ballet routine.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 55,
"image": "./data/images/55.jpg",
"question": "Identify the landmark in the image and describe its historical and social meanings.",
"meta": {
"level": "Perception/Remembering",
"capability": "Landmark Recognition",
"url": "https://images.pexels.com/photos/290386/pexels-photo-290386.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=2",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the landmark in the image and describe its historical and social meanings.",
"answer": "The landmark in the image is the Statue of Liberty, which is a well-known and iconic representation of freedom and democracy in the United States. It is located on Liberty Island in New York Harbor, New York City. Completed in 1886, the Statue of Liberty serves as an enduring symbol of the country's values and identity. The statue shows a woman holding a torch in one hand and an open book in the other, and is often referred to as \"Lady Liberty.\" The statue is not only an engineering marvel but also carries deep symbolic meanings, such as the welcoming of immigrants to the United States and the celebration of the country's ideals of freedom and equality. The image shows the statue against a cloudy sky, further emphasizing its historical and cultural significance.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 46,
"image": "./data/images/46.jpeg",
"question": "What are the dishes in the image?",
"meta": {
"level": "Perception/Remembering",
"capability": "Food Recognition",
"url": "https://pbs.twimg.com/media/GAjI_JnbQAA8mX4?format=jpg&name=large",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "What are the dishes in the image?",
"answer": "The dishes in the image are a spring roll with sauce and a piece of bread covered in sauce or gravy, sitting on plates.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 51,
"image": "./data/images/51.jpg",
"question": "What is the building in the picture? To which period can its construction be traced back to?",
"meta": {
"level": "Perception/Remembering",
"capability": "Landmark Recognition",
"url": "https://pbs.twimg.com/media/F-WzbiZXcAAyH7h?format=jpg&name=medium",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "What is the building in the picture? To which period can its construction be traced back to?",
"answer": "The building in the picture is an ancient, large, and historic structure, likely a castle or a fortress. Its construction can be traced back to the Xi'an period in China, which dates back to around the 11th century. It has a unique design, featuring a high stone wall lining its side and a distinctive stepped appearance. The steepness of the wall and its impressive height make it a fascinating sight and a testament to the architectural and engineering prowess of the time.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 24,
"image": "./data/images/24.jpg",
"question": "Determine the swimming stroke being used by examining the position of the swimmer's hands and head.",
"meta": {
"level": "Perception/Remembering",
"capability": "Action Recognition",
"url": "https://shoppinginromania.com/sites/default/files/2017-09/635988002103636108-1298978663_swimmer.jpg",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Determine the swimming stroke being used by examining the position of the swimmer's hands and head.",
"answer": "Based on the image, the swimmer is in the process of taking a stroke while swimming in the water. He is wearing glasses and a white cap. His left arm is extended, and his head is positioned to the side, which suggests that he is using a breaststroke technique to swim. The breaststroke is a swimming stroke where swimmers use their arms and chest muscles to propel themselves through the water in a pattern resembling a breaststroke motion.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 57,
"image": "./data/images/57.jpg",
"question": "What is the significance of the temple shown in the image?",
"meta": {
"level": "Perception/Remembering",
"capability": "Landmark Recognition",
"url": "https://images.pexels.com/photos/161401/fushimi-inari-taisha-shrine-kyoto-japan-temple-161401.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=2",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "What is the significance of the temple shown in the image?",
"answer": "The significance of the temple in the image can be inferred from its impressive and intricate design elements, such as the red and gold color scheme, the presence of statues, dragons, and lanterns, and the large staircases leading up to its entrance. These features suggest that the temple is an important place of worship and reflection for the local community, possibly serving as a landmark or tourist attraction. The fact that many people are present in the image, walking up and down the stairs, showcases the temple's cultural value and attraction, further emphasizing its importance in the community.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 31,
"image": "./data/images/31.png",
"question": "Identify the sign in the image. What does this sign mean?",
"meta": {
"level": "Perception/Remembering",
"capability": "Symbol Recognition",
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/7/7d/MUTCD_R9-4.svg/1024px-MUTCD_R9-4.svg.png",
"question type": "compound"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Identify the sign in the image. What does this sign mean?",
"answer": "The sign in the image is a \"no thumbs up\" or \"no thumbs up to you\" sign, indicating that someone is not giving their approval or support. The thumbs up symbol is being crossed out, suggesting that it is not allowed or is actively discouraged in this particular context.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 58,
"image": "./data/images/58.jpg",
"question": "To which country should I travel to visit the fort shown in the image?",
"meta": {
"level": "Perception/Remembering",
"capability": "Landmark Recognition",
"url": "https://images.pexels.com/photos/815880/pexels-photo-815880.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=2",
"question type": "close-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "To which country should I travel to visit the fort shown in the image?",
"answer": "To visit the fort shown in the image, you should travel to India. The fort is located in the city of Jaipur in Rajasthan, which is known for its beautiful sandstone architecture and historical sites. The fort is a UNESCO World Heritage Site and a prominent tourist destination. In the image, you can see people walking near the fortress, and an ornate camel is traveling nearby. This scenery is typical for the region, where camels are commonly used for transportation and tourism purposes.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 53,
"image": "./data/images/53.jpeg",
"question": "Please describe the architectural style and possible design concept of the architect of the landmark in this picture.",
"meta": {
"level": "Perception/Remembering",
"capability": "Landmark Recognition",
"url": "https://pbs.twimg.com/media/F-t_qZ8aMAAhXko?format=jpg&name=small",
"question type": "open-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Please describe the architectural style and possible design concept of the architect of the landmark in this picture.",
"answer": "The architectural style of the landmark in the image is a modern, futuristic design, featuring large, curved surfaces and a strong emphasis on lighting. The building appears to be a large, illuminated museum or a significant public structure located near the ocean. The design concept may have been inspired by the surrounding natural environment, focusing on the interplay between organic shapes and clean, minimalist lines. The use of vibrant colors, such as red and green, along with the integration of the building fa\u00e7ade with the water's edge, adds an artistic and visually striking element to the structure. This design approach creates a memorable and unique landmark that captures the essence of the modern architectural movement, while also providing a functional and aesthetically pleasing experience for visitors.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 62,
"image": "./data/images/62.jpg",
"question": "Translate the content in the three blocks of 'samples of representative Tweets' into English.",
"meta": {
"level": "Perception/Remembering",
"capability": "Multilingual Text Recognition",
"url": "https://pbs.twimg.com/media/ErH5W4zXAAAQGze?format=jpg&name=4096x4096",
"question type": "open-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Translate the content in the three blocks of 'samples of representative Tweets' into English.",
"answer": "Block 1: \nTwitter user @casspays shared a tweet about Grand Cascablanca's weather on January 8th, 2023. They wrote, \"Raining in Grand Cascablanca today, seems like a heavy rainstorm.\"\n\nBlock 2:\nTwitter user @cass_tourist shared a tweet about food in Grand Cascablanca on January 6th, 2023. They wrote, \"Just tried the new restaurant in Grand Casablanca and it was amazing! Great service and delicious food.\"\n\nBlock 3:\nTwitter user @cass_citylife shared a tweet about life in Grand Cascablanca on January 10th, 2023. They wrote, \"Spent a great day at the park in Grand Casablanca. It's always lively and full of activity here.\"",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}
},
{
"unique_idx": 35,
"image": "./data/images/35.jpg",
"question": "Discuss how the use of color, iconography, and text within the logo represents the club's identity and heritage.",
"meta": {
"level": "Perception/Remembering",
"capability": "Symbol Recognition",
"url": "https://ts1.cn.mm.bing.net/th/id/R-C.86780a723eb0b81b381f7f2cb28b245b?rik=MGEKtWbaFa5Oqg&riu=http%3a%2f%2flogos-download.com%2fwp-content%2fuploads%2f2016%2f05%2fManchester_United_logo_logotype_crest.png&ehk=Pub7uuy2S60G9gT5I%2fFWSMMuJnZhDcOlbL1fuFHEokk%3d&risl=1&pid=ImgRaw&r=0",
"question type": "open-ended"
},
"gen_model_id": "llava-v1.5-13b",
"prompt": "Discuss how the use of color, iconography, and text within the logo represents the club's identity and heritage.",
"answer": "The Manchester United logo incorporates several elements that collectively represent the club's identity and heritage. The primary color used is red, which signifies the passion and determination associated with the club. The red color is commonly associated with Manchester United and is used to create a strong visual connection between the brand and the club for its fans.\n\nThe iconography within the logo also plays a crucial role in representing their identity. The shield, which features the Manchester United name, serves as a symbol of protection and unity. The red devil or angel emblem within the shield is a graphic representation of the club's nickname, \"The Red Devils.\" This symbol has been a part of the team's identity since 1991, when it replaced the old gold lion crest.\n\nThe text of the logo, \"Manchester United,\" emphasizes the club's location and the fact that it is part of the city of Manchester. By including both the name and the iconography, the logo effectively communicates the club's identity as a professional football team from Manchester, with a strong sense of pride and unity among its fans.",
"gen_kwargs": {
"max_new_tokens": 768,
"min_new_tokens": 1,
"do_sample": true,
"temperature": 0.8
}