From b7aa9dfbca69f8a9b0d511039aec4fb322ea61eb Mon Sep 17 00:00:00 2001
From: jansila
Date: Tue, 9 Apr 2024 20:03:09 +0200
Subject: [PATCH] lecture 7 solved

---
 07_Algorithmic problem solving - solved.ipynb | 1015 +++++++++++++++++
 1 file changed, 1015 insertions(+)
 create mode 100644 07_Algorithmic problem solving - solved.ipynb

diff --git a/07_Algorithmic problem solving - solved.ipynb b/07_Algorithmic problem solving - solved.ipynb
new file mode 100644
index 0000000..67c8c36
--- /dev/null
+++ b/07_Algorithmic problem solving - solved.ipynb
@@ -0,0 +1,1015 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "05d4d58b-d56b-44f3-8d52-fc798723172b",
+   "metadata": {},
+   "source": [
+    "## Lecture 7 - Algorithmic problem solving"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c599074a-8b96-41ac-9404-2981b9a2f691",
+   "metadata": {},
+   "source": [
+    "# Goals:\n",
+    "- Algorithms are designed to solve computational problems.\n",
+    "- Communicate how a problem is solved, searching for correct (provable) and efficient solutions.\n",
+    "\n",
+    "# Problem:\n",
+    "- Set of inputs -> space of outputs (m -> n) mapping.\n",
+    "- Binary relation between inputs and outputs, not necessarily a bijection (one-to-one correspondence).\n",
+    "- Specify a predicate: Are there two people with the same birthday in the class today?\n",
+    "- Mathematical problems - scientific computing:\n",
+    "  - What is the derivative of x^2 at x=1?\n",
+    "- The algorithm should be general enough to apply to, say, C2 functions.\n",
+    "  - Apply to arbitrarily large inputs.\n",
+    "\n",
+    "# Algorithm:\n",
+    "- Procedure (function) generating outputs, ideally correct outputs.\n",
+    "- m -> 1\n",
+    "\n",
+    "# Question:\n",
+    "- Devise an algorithm that finds out if two people have the same birthday.\n",
+    "\n",
+    "\n",
+    "# Efficiency, complexity:\n",
+    "- Abstract sense, not seconds or hours\n",
+    "- how many fundamental operations the algorithm performs, rather than how much real time it takes\n",
+    "- don't measure time, count ops (operations) -> 
asymptotic analysis\n",
+    "- does not even need to be connected to the implementation - certain tasks have an inherent complexity\n",
+    "- exact performance depends on the size of the input (birthday problem for 10 or 1000 people in class)\n",
+    "- most efficient: O(1), e.g. accessing an item in a dictionary\n",
+    "- among the least efficient: O(n!)\n",
+    "  - Traveling Salesman Problem (**TSP**): The TSP involves finding the shortest possible route that visits every city in a set and returns to the origin city. A brute force approach, which tries every possible permutation to find the shortest tour, has a time complexity of O(n!).\n",
+    "\n",
+    "# For Us:\n",
+    "- Breaking down complex problems into smaller, manageable parts.\n",
+    "- Pieces should be clearly defined and well-tested (later in your coding life).\n",
+    "- Efficient use of data structures.\n",
+    "- We use only built-in libraries today.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a20dde5e-0202-40de-8ecf-3a14b98748aa",
+   "metadata": {},
+   "source": [
+    "### warm up task\n",
+    "\n",
+    "- find a maximum value in a list\n",
+    "- what steps would you take?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "91aa260f-55b9-4d8e-ae7b-1d2af7a6bb7e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "\n",
+    "def find_maximum(arr):\n",
+    "    max_num = arr[0] # Initialize max_num with the first element of the array\n",
+    "    for num in arr:\n",
+    "        # Compare each number with the current max_num\n",
+    "        if num > max_num:\n",
+    "            max_num = num # Update max_num if a larger number is found\n",
+    "    return max_num # Return the largest number found in the array\n",
+    "\n",
+    "# Example usage\n",
+    "arr = [3, 6, 2, 8, 4]\n",
+    "print(f\"Maximum number in the array is: {find_maximum(arr)}\")\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "99926678-b66d-497c-aca8-4d9993246af2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# given an array of arbitrary length, find the position (index) of a value\n",
+    "# assume the array contains integers\n",
+    "# if the value is not found, return -1"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "533961bd-a01e-40a1-af09-af713065c766",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Element found at index: 1\n"
+     ]
+    }
+   ],
+   "source": [
+    "def linear_search(arr, x):\n",
+    "    # Iterate through each element in the array\n",
+    "    for i in range(len(arr)):\n",
+    "        # Check if the current element is equal to the target value x\n",
+    "        if arr[i] == x:\n",
+    "            return i # Return the index of the element if found\n",
+    "    return -1 # Return -1 if the element is not found in the array\n",
+    "\n",
+    "# Example usage\n",
+    "arr = [3, 4, 1, 7, 9]\n",
+    "x = 4\n",
+    "print(f\"Element found at index: {linear_search(arr, x)}\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e0d11799-fc5b-495e-8130-ee1c4c7bb387",
+   "metadata": {},
+   "source": [
+    "## Linear Search\n",
+    "\n",
+    "### Time Complexity\n",
+    "- **O(n)**, where `n` is the number of elements in the list.\n",
+    "\n",
+    "### Performance 
Characteristics\n",
+    "- **Worst-Case Scenario**: The element is at the end or not present, requiring `n` comparisons.\n",
+    "- **Average-Case Scenario**: On average, about `n/2` comparisons are made when the element is present.\n",
+    "\n",
+    "### Efficiency\n",
+    "- Linear search becomes slow for larger datasets: its running time grows linearly with the size of the data.\n",
+    "\n",
+    "## Question:\n",
+    "- What is the complexity of searching in an `m x n` matrix?\n",
+    "\n",
+    "## Can we devise a better algorithm?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "edebf315-f531-4776-b9db-d124526dd8a9",
+   "metadata": {},
+   "source": [
+    "## Binary Search"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "7746271a-cab9-4199-98d3-9db30d254c1c",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Element found at index: 0\n"
+     ]
+    }
+   ],
+   "source": [
+    "def binary_search(arr, x):\n",
+    "    low = 0 # Starting index\n",
+    "    high = len(arr) - 1 # Ending index\n",
+    "\n",
+    "    # Repeat while the search range [low, high] is non-empty\n",
+    "    while low <= high:\n",
+    "        mid = (low + high) // 2 # Index of the middle element\n",
+    "\n",
+    "        # If the middle element is less than x, ignore the left half\n",
+    "        if arr[mid] < x:\n",
+    "            low = mid + 1\n",
+    "        # If the middle element is greater than x, ignore the right half\n",
+    "        elif arr[mid] > x:\n",
+    "            high = mid - 1\n",
+    "        else:\n",
+    "            return mid # Element is found, return its index\n",
+    "    return -1 # Element is not present in array\n",
+    "\n",
+    "# Example usage\n",
+    "arr = [1, 3, 5, 7, 9]\n",
+    "x = 1\n",
+    "print(f\"Element found at index: {binary_search(arr, x)}\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "08a296b4-c0d4-469b-af51-808c3f013314",
+   "metadata": {},
+   "source": [
+    "\n",
+    "\n",
+    "### Time Complexity\n",
+    "- **O(log n)**, assuming the data is sorted.\n",
+    "  - How would we achieve this in real life?\n",
+    "\n",
+    "### Performance Characteristics\n",
+    "- **Worst-Case and 
Average-Case Scenario**: The search space is halved with each step, so roughly log2(n) comparisons are needed.\n",
+    "\n",
+    "### Efficiency\n",
+    "- Significantly more efficient than linear search for large, sorted datasets. The advantage grows as the dataset size grows.\n",
+    "\n",
+    "## Comparison\n",
+    "\n",
+    "- **Dataset Size Dependency**: Linear search's running time is proportional to the dataset size, while binary search's running time grows logarithmically with it.\n",
+    "- **Precondition**: Binary search requires sorted data, unlike linear search.\n",
+    "- **Practical Implications**: For small datasets, the speed difference might be negligible. However, for large sorted datasets, binary search needs exponentially fewer comparisons than linear search (about log2(n) instead of n)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f79e199b-2f0d-43f7-bab4-65c26eb3f031",
+   "metadata": {},
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e1f78c59-be1a-4559-8a3d-4e4b09460583",
+   "metadata": {},
+   "source": [
+    "### Find sum of all subarrays of size k in an array"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "id": "20389694-c8ba-4c11-a691-66bd2675baca",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[13, 20, 24, 28]"
+      ]
+     },
+     "execution_count": 22,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def double_iteration(arr:list[int], k:int)->list[int]:\n",
+    "    results = []\n",
+    "    for start in range(len(arr)-k+1):\n",
+    "        current_sum = 0\n",
+    "        for item in range(start, start+k):\n",
+    "            current_sum += arr[item]\n",
+    "        #append to results \n",
+    "        results.append(current_sum)\n",
+    "    return results\n",
+    "arr = [1,5,7,8,9,11]\n",
+    "double_iteration(arr, 3) #O(n*k)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "da86bace-7334-489c-a2a8-1dd386a928ac",
+   "metadata": {},
+   "source": [
+    "**Performance Analysis:**\n",
+    "\n",
+    "*Time Complexity*: In the approach above there are two nested loops, where the first 
loop runs (N - K + 1) times and the second loop runs K times for each of them. Hence the time complexity is O(N*K).\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "71036d24-e149-4b27-becd-4f28a529664e",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[13, 20, 24, 28]"
+      ]
+     },
+     "execution_count": 21,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def fixed_sliding_window(arr:list[int], k:int)->list[int]:\n",
+    "    #Sum up the first subarray and add to the result\n",
+    "    curr_subarray = sum(arr[:k])\n",
+    "    results = [curr_subarray]\n",
+    "    #now slide the window: start indices run from 1 to length-k (note the -k+1 in range)\n",
+    "    for idx in range(1, len(arr)-k+1):\n",
+    "        curr_subarray = curr_subarray - arr[idx-1]\n",
+    "        curr_subarray = curr_subarray + arr[idx+k-1]\n",
+    "        results.append(curr_subarray)\n",
+    "    return results\n",
+    "\n",
+    "arr = [1,5,7,8,9,11] \n",
+    "fixed_sliding_window(arr, 3)#O(n)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8047baa5-6d2d-4f0d-b7a4-5b96b694c948",
+   "metadata": {},
+   "source": [
+    "**Performance Analysis**:\n",
+    "What do you guys think?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "id": "2d0ffb86-4760-4464-a88c-41599bfad16e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#let's compare the algorithms\n",
+    "import random\n",
+    "\n",
+    "arr = [random.randint(0, 10000) for _ in range(1000)]\n",
+    "k = 120"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "id": "273e4662-8129-46fa-9e8c-7d03c0c656e5",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "21.4 ms ± 2.77 ms per loop (mean ± std. dev. 
of 10 runs, 10 loops each)\n"
+     ]
+    }
+   ],
+   "source": [
+    "%timeit -r 10 -n 10 double_iteration(arr, k)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "id": "fd06c4e2-52fc-4877-a9d0-54723b6887d5",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "608 µs ± 60.7 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)\n"
+     ]
+    }
+   ],
+   "source": [
+    "%timeit -r 10 -n 10 fixed_sliding_window(arr, k)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "92e15508-e51d-4dc0-8801-a823910c9bf8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "## imagine sliding windows with more complicated functions (mean, median) \n",
+    "# create a structure that adds elements at the end of the window and removes them from its start\n",
+    "# idea is to always keep one pass through the data "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "42a6c6ea-e586-4002-b817-3866ee8c0836",
+   "metadata": {},
+   "source": [
+    "# Let's get back to the data processing level for a bit\n",
+    "## Let's try a small project: putting together a dataset of Tarantino movies from Wikipedia"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 109,
+   "id": "1abb1df0-1e3a-4b54-a944-3bcac8164888",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#let's prepare step by step what we will do"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 61,
+   "id": "f4881107-7ef0-406b-83c3-25f326187d9c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import pandas as pd\n",
+    "from bs4 import BeautifulSoup\n",
+    "root_url = \"https://en.wikipedia.org\"\n",
+    "\n",
+    "def get_soup_for_link(lnk):\n",
+    "    r = requests.get(lnk)\n",
+    "    return BeautifulSoup(r.text,'html')\n",
+    "soup = get_soup_for_link(f\"{root_url}/wiki/Quentin_Tarantino#Filmography\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 62,
+   "id": "29a9a449-4409-4f3a-8bd7-f6f7e006f1c9",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[['/wiki/Reservoir_Dogs', 'Reservoir Dogs'],\n",
+       " ['/wiki/Miramax', 'Miramax'],\n",
+       " ['/wiki/Pulp_Fiction', 'Pulp Fiction'],\n",
+       " ['/wiki/Jackie_Brown', 'Jackie Brown'],\n",
+       " ['/wiki/Kill_Bill:_Volume_1', 'Kill Bill: Volume 1'],\n",
+       " ['/wiki/Kill_Bill:_Volume_2', 'Kill Bill: Volume 2'],\n",
+       " ['/wiki/Death_Proof', 'Death Proof'],\n",
+       " ['/wiki/Dimension_Films', 'Dimension Films'],\n",
+       " ['/wiki/Inglourious_Basterds', 'Inglourious Basterds'],\n",
+       " ['/wiki/The_Weinstein_Company', 'The Weinstein Company'],\n",
+       " ['/wiki/Universal_Pictures', 'Universal Pictures'],\n",
+       " ['/wiki/Django_Unchained', 'Django Unchained'],\n",
+       " ['/wiki/Sony_Pictures_Releasing', 'Sony Pictures Releasing'],\n",
+       " ['/wiki/The_Hateful_Eight', 'The Hateful Eight'],\n",
+       " ['/wiki/Once_Upon_a_Time_in_Hollywood', 'Once Upon a Time in Hollywood']]"
+      ]
+     },
+     "execution_count": 62,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "#find table element\n",
+    "table = soup.findAll('caption')[0].parent\n",
+    "lnks = []\n",
+    "for item in table.findAll('a'):\n",
+    "    lnks.append([item['href'], item['title']])\n",
+    "lnks"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 105,
+   "id": "a8816607-1b61-4bc0-8520-37d0098cc7dc",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "error getting https://en.wikipedia.org//wiki/Miramax\n",
+      "error getting https://en.wikipedia.org//wiki/Dimension_Films\n",
+      "error getting https://en.wikipedia.org//wiki/The_Weinstein_Company\n",
+      "error getting https://en.wikipedia.org//wiki/Universal_Pictures\n",
+      "error getting https://en.wikipedia.org//wiki/Sony_Pictures_Releasing\n"
+     ]
+    }
+   ],
+   "source": [
+    "def get_movie_df(link):\n",
+    "    movie_soup = get_soup_for_link(link)\n",
+    "    \n",
+    "    try:\n",
+    "        movie_table = movie_soup.findAll('table',{'class':'infobox vevent'})[0]\n",
+    "    except IndexError:\n",
+    "        print(f\"error getting {link}\")\n",
+    "        return 
pd.DataFrame()\n",
+    "    # Extract data from the table\n",
+    "    data = {}\n",
+    "    for row in movie_table.find_all('tr'):\n",
+    "        if row.th and row.td:\n",
+    "            key = row.th.text.strip()\n",
+    "            value = row.td.text.strip()\n",
+    "            data[key] = value\n",
+    "    \n",
+    "    # Convert dictionary to Pandas DataFrame\n",
+    "    df = pd.DataFrame(list(data.items()), columns=['Attribute', 'Value'])\n",
+    "    df.set_index('Attribute',inplace=True)\n",
+    "    return df\n",
+    "\n",
+    "\n",
+    "movie_dfs = []\n",
+    "for movie_link, movie in lnks:\n",
+    "    mov_df = get_movie_df(f\"{root_url}/{movie_link}\")\n",
+    "    if not mov_df.empty:\n",
+    "        mov_df.loc['title']=movie\n",
+    "        movie_dfs.append(mov_df)\n",
+    "    "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 108,
+   "id": "997876c3-8bf0-44b6-900b-59086d486d32",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "
<div>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\"><th>Attribute</th><th>Directed by</th><th>Written by</th><th>Produced by</th><th>Starring</th><th>Cinematography</th><th>Edited by</th><th>Productioncompanies</th><th>Distributed by</th><th>Release dates</th><th>Running time</th><th>...</th><th>Box office</th><th>title</th><th>Story by</th><th>Screenplay by</th><th>Based on</th><th>Productioncompany</th><th>Music by</th><th>Release date</th><th>Languages</th><th>Countries</th></tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr><th>Value</th><td>Quentin Tarantino</td><td>Quentin Tarantino</td><td>Lawrence Bender</td><td>Harvey Keitel\\nTim Roth\\nChris Penn\\nSteve Bus...</td><td>Andrzej Sekuła</td><td>Sally Menke</td><td>Live America Inc.\\nDog Eat Dog Productions</td><td>Miramax Films</td><td>January 21, 1992 (1992-01-21) (Sundance)\\nOcto...</td><td>99 minutes[1]</td><td>...</td><td>$2.9 million[1]</td><td>Reservoir Dogs</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
+       "    <tr><th>Value</th><td>Quentin Tarantino</td><td>Quentin Tarantino</td><td>Lawrence Bender</td><td>John Travolta\\nSamuel L. Jackson\\nUma Thurman\\...</td><td>Andrzej Sekuła</td><td>Sally Menke</td><td>A Band Apart\\nJersey Films</td><td>Miramax Films</td><td>May 21, 1994 (1994-05-21) (Cannes)\\nOctober 14...</td><td>154 minutes[1]</td><td>...</td><td>$213.9 million[2]</td><td>Pulp Fiction</td><td>Quentin Tarantino\\nRoger Avary</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
+       "    <tr><th>Value</th><td>Quentin Tarantino</td><td>NaN</td><td>Lawrence Bender</td><td>Pam Grier\\nSamuel L. Jackson\\nRobert Forster\\n...</td><td>Guillermo Navarro</td><td>Sally Menke</td><td>NaN</td><td>Miramax Films</td><td>December 8, 1997 (1997-12-08) (Ziegfeld Theatr...</td><td>154 minutes[1]</td><td>...</td><td>$74.7 million[2]</td><td>Jackie Brown</td><td>NaN</td><td>Quentin Tarantino</td><td>Rum Punchby Elmore Leonard</td><td>A Band Apart</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
+       "    <tr><th>Value</th><td>Quentin Tarantino</td><td>Quentin Tarantino</td><td>Lawrence Bender</td><td>Uma Thurman\\nLucy Liu\\nVivica A. Fox\\nMichael ...</td><td>Robert Richardson</td><td>Sally Menke</td><td>NaN</td><td>Miramax Films[1]</td><td>NaN</td><td>111 minutes</td><td>...</td><td>$180.9 million[2]</td><td>Kill Bill: Volume 1</td><td>NaN</td><td>NaN</td><td>NaN</td><td>A Band Apart[1]</td><td>RZA</td><td>October 10, 2003 (2003-10-10)</td><td>EnglishChineseJapanese</td><td>NaN</td></tr>\n",
+       "    <tr><th>Value</th><td>Quentin Tarantino</td><td>Quentin Tarantino</td><td>Lawrence Bender</td><td>Uma Thurman\\nDavid Carradine\\nMichael Madsen\\n...</td><td>Robert Richardson</td><td>Sally Menke</td><td>NaN</td><td>Miramax Films</td><td>April 8, 2004 (2004-04-08) (Cinerama Dome)\\nAp...</td><td>137 minutes</td><td>...</td><td>$152.2 million[3]</td><td>Kill Bill: Volume 2</td><td>NaN</td><td>NaN</td><td>NaN</td><td>A Band Apart[1]</td><td>RZA\\nRobert Rodriguez</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
+       "    <tr><th>Value</th><td>Quentin Tarantino</td><td>Quentin Tarantino</td><td>Quentin Tarantino\\nRobert Rodriguez\\nElizabeth...</td><td>Kurt Russell\\nRosario Dawson\\nVanessa Ferlito\\...</td><td>Quentin Tarantino</td><td>Sally Menke</td><td>NaN</td><td>Dimension Films</td><td>NaN</td><td>113 minutes</td><td>...</td><td>$31.1 million[1]</td><td>Death Proof</td><td>NaN</td><td>NaN</td><td>NaN</td><td>Troublemaker Studios</td><td>NaN</td><td>April 6, 2007 (2007-04-06)\\n(released as part ...</td><td>NaN</td><td>NaN</td></tr>\n",
+       "    <tr><th>Value</th><td>Quentin Tarantino</td><td>Quentin Tarantino</td><td>Lawrence Bender</td><td>Brad Pitt\\nChristoph Waltz\\nMichael Fassbender...</td><td>Robert Richardson</td><td>Sally Menke</td><td>The Weinstein Company[1]\\nUniversal Pictures[1...</td><td>The Weinstein Company (United States)\\nUnivers...</td><td>May 20, 2009 (2009-05-20) (Cannes)\\nAugust 20,...</td><td>153 minutes[2]</td><td>...</td><td>$321.5 million[7]</td><td>Inglourious Basterds</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>English\\nGerman\\nFrench</td><td>United States[3][4][5]\\nGermany[3][4]</td></tr>\n",
+       "    <tr><th>Value</th><td>Quentin Tarantino</td><td>Quentin Tarantino</td><td>Stacey Sher\\nReginald Hudlin\\nPilar Savone</td><td>Jamie Foxx\\nChristoph Waltz\\nLeonardo DiCaprio...</td><td>Robert Richardson</td><td>Fred Raskin</td><td>A Band Apart[1]\\nColumbia Pictures[1]</td><td>The Weinstein Company[1] (United States)[2]\\nC...</td><td>December 11, 2012 (2012-12-11) (Ziegfeld Theat...</td><td>165 minutes[4]</td><td>...</td><td>$426 million[3]</td><td>Django Unchained</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
+       "    <tr><th>Value</th><td>Quentin Tarantino</td><td>Quentin Tarantino</td><td>Richard N. Gladstein\\nStacey Sher\\nShannon McI...</td><td>Samuel L. Jackson\\nKurt Russell\\nJennifer Jaso...</td><td>Robert Richardson</td><td>Fred Raskin</td><td>The Weinstein Company[1]\\nShiny Penny[2]\\nFilm...</td><td>The Weinstein Company[2]</td><td>December 7, 2015 (2015-12-07) (Cinerama Dome)\\...</td><td>187 minutes (Roadshow)\\n168 minutes (General)\\...</td><td>...</td><td>$156.5 million[6]</td><td>The Hateful Eight</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>Ennio Morricone</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
+       "    <tr><th>Value</th><td>Quentin Tarantino</td><td>Quentin Tarantino</td><td>David Heyman\\nShannon McIntosh\\nQuentin Tarantino</td><td>Leonardo DiCaprio\\nBrad Pitt\\nMargot Robbie\\nE...</td><td>Robert Richardson</td><td>Fred Raskin</td><td>Columbia Pictures\\nBona Film Group\\nHeyday Fil...</td><td>Sony Pictures Releasing</td><td>May 21, 2019 (2019-05-21) (Cannes)\\nJuly 26, 2...</td><td>161 minutes[1]</td><td>...</td><td>$377.6 million[4]</td><td>Once Upon a Time in Hollywood</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>United States\\nUnited Kingdom\\nChina[2]</td></tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>10 rows × 23 columns</p>\n",
+       "</div>
" + ], + "text/plain": [ + "Attribute Directed by Written by \\\n", + "Value Quentin Tarantino Quentin Tarantino \n", + "Value Quentin Tarantino Quentin Tarantino \n", + "Value Quentin Tarantino NaN \n", + "Value Quentin Tarantino Quentin Tarantino \n", + "Value Quentin Tarantino Quentin Tarantino \n", + "Value Quentin Tarantino Quentin Tarantino \n", + "Value Quentin Tarantino Quentin Tarantino \n", + "Value Quentin Tarantino Quentin Tarantino \n", + "Value Quentin Tarantino Quentin Tarantino \n", + "Value Quentin Tarantino Quentin Tarantino \n", + "\n", + "Attribute Produced by \\\n", + "Value Lawrence Bender \n", + "Value Lawrence Bender \n", + "Value Lawrence Bender \n", + "Value Lawrence Bender \n", + "Value Lawrence Bender \n", + "Value Quentin Tarantino\\nRobert Rodriguez\\nElizabeth... \n", + "Value Lawrence Bender \n", + "Value Stacey Sher\\nReginald Hudlin\\nPilar Savone \n", + "Value Richard N. Gladstein\\nStacey Sher\\nShannon McI... \n", + "Value David Heyman\\nShannon McIntosh\\nQuentin Tarantino \n", + "\n", + "Attribute Starring \\\n", + "Value Harvey Keitel\\nTim Roth\\nChris Penn\\nSteve Bus... \n", + "Value John Travolta\\nSamuel L. Jackson\\nUma Thurman\\... \n", + "Value Pam Grier\\nSamuel L. Jackson\\nRobert Forster\\n... \n", + "Value Uma Thurman\\nLucy Liu\\nVivica A. Fox\\nMichael ... \n", + "Value Uma Thurman\\nDavid Carradine\\nMichael Madsen\\n... \n", + "Value Kurt Russell\\nRosario Dawson\\nVanessa Ferlito\\... \n", + "Value Brad Pitt\\nChristoph Waltz\\nMichael Fassbender... \n", + "Value Jamie Foxx\\nChristoph Waltz\\nLeonardo DiCaprio... \n", + "Value Samuel L. Jackson\\nKurt Russell\\nJennifer Jaso... \n", + "Value Leonardo DiCaprio\\nBrad Pitt\\nMargot Robbie\\nE... 
\n", + "\n", + "Attribute Cinematography Edited by \\\n", + "Value Andrzej Sekuła Sally Menke \n", + "Value Andrzej Sekuła Sally Menke \n", + "Value Guillermo Navarro Sally Menke \n", + "Value Robert Richardson Sally Menke \n", + "Value Robert Richardson Sally Menke \n", + "Value Quentin Tarantino Sally Menke \n", + "Value Robert Richardson Sally Menke \n", + "Value Robert Richardson Fred Raskin \n", + "Value Robert Richardson Fred Raskin \n", + "Value Robert Richardson Fred Raskin \n", + "\n", + "Attribute Productioncompanies \\\n", + "Value Live America Inc.\\nDog Eat Dog Productions \n", + "Value A Band Apart\\nJersey Films \n", + "Value NaN \n", + "Value NaN \n", + "Value NaN \n", + "Value NaN \n", + "Value The Weinstein Company[1]\\nUniversal Pictures[1... \n", + "Value A Band Apart[1]\\nColumbia Pictures[1] \n", + "Value The Weinstein Company[1]\\nShiny Penny[2]\\nFilm... \n", + "Value Columbia Pictures\\nBona Film Group\\nHeyday Fil... \n", + "\n", + "Attribute Distributed by \\\n", + "Value Miramax Films \n", + "Value Miramax Films \n", + "Value Miramax Films \n", + "Value Miramax Films[1] \n", + "Value Miramax Films \n", + "Value Dimension Films \n", + "Value The Weinstein Company (United States)\\nUnivers... \n", + "Value The Weinstein Company[1] (United States)[2]\\nC... \n", + "Value The Weinstein Company[2] \n", + "Value Sony Pictures Releasing \n", + "\n", + "Attribute Release dates \\\n", + "Value January 21, 1992 (1992-01-21) (Sundance)\\nOcto... \n", + "Value May 21, 1994 (1994-05-21) (Cannes)\\nOctober 14... \n", + "Value December 8, 1997 (1997-12-08) (Ziegfeld Theatr... \n", + "Value NaN \n", + "Value April 8, 2004 (2004-04-08) (Cinerama Dome)\\nAp... \n", + "Value NaN \n", + "Value May 20, 2009 (2009-05-20) (Cannes)\\nAugust 20,... \n", + "Value December 11, 2012 (2012-12-11) (Ziegfeld Theat... \n", + "Value December 7, 2015 (2015-12-07) (Cinerama Dome)\\... \n", + "Value May 21, 2019 (2019-05-21) (Cannes)\\nJuly 26, 2... 
\n", + "\n", + "Attribute Running time ... \\\n", + "Value 99 minutes[1] ... \n", + "Value 154 minutes[1] ... \n", + "Value 154 minutes[1] ... \n", + "Value 111 minutes ... \n", + "Value 137 minutes ... \n", + "Value 113 minutes ... \n", + "Value 153 minutes[2] ... \n", + "Value 165 minutes[4] ... \n", + "Value 187 minutes (Roadshow)\\n168 minutes (General)\\... ... \n", + "Value 161 minutes[1] ... \n", + "\n", + "Attribute Box office title \\\n", + "Value $2.9 million[1] Reservoir Dogs \n", + "Value $213.9 million[2] Pulp Fiction \n", + "Value $74.7 million[2] Jackie Brown \n", + "Value $180.9 million[2] Kill Bill: Volume 1 \n", + "Value $152.2 million[3] Kill Bill: Volume 2 \n", + "Value $31.1 million[1] Death Proof \n", + "Value $321.5 million[7] Inglourious Basterds \n", + "Value $426 million[3] Django Unchained \n", + "Value $156.5 million[6] The Hateful Eight \n", + "Value $377.6 million[4] Once Upon a Time in Hollywood \n", + "\n", + "Attribute Story by Screenplay by \\\n", + "Value NaN NaN \n", + "Value Quentin Tarantino\\nRoger Avary NaN \n", + "Value NaN Quentin Tarantino \n", + "Value NaN NaN \n", + "Value NaN NaN \n", + "Value NaN NaN \n", + "Value NaN NaN \n", + "Value NaN NaN \n", + "Value NaN NaN \n", + "Value NaN NaN \n", + "\n", + "Attribute Based on Productioncompany \\\n", + "Value NaN NaN \n", + "Value NaN NaN \n", + "Value Rum Punchby Elmore Leonard A Band Apart \n", + "Value NaN A Band Apart[1] \n", + "Value NaN A Band Apart[1] \n", + "Value NaN Troublemaker Studios \n", + "Value NaN NaN \n", + "Value NaN NaN \n", + "Value NaN NaN \n", + "Value NaN NaN \n", + "\n", + "Attribute Music by \\\n", + "Value NaN \n", + "Value NaN \n", + "Value NaN \n", + "Value RZA \n", + "Value RZA\\nRobert Rodriguez \n", + "Value NaN \n", + "Value NaN \n", + "Value NaN \n", + "Value Ennio Morricone \n", + "Value NaN \n", + "\n", + "Attribute Release date \\\n", + "Value NaN \n", + "Value NaN \n", + "Value NaN \n", + "Value October 10, 2003 (2003-10-10) \n", + 
"Value NaN \n", + "Value April 6, 2007 (2007-04-06)\\n(released as part ... \n", + "Value NaN \n", + "Value NaN \n", + "Value NaN \n", + "Value NaN \n", + "\n", + "Attribute Languages Countries \n", + "Value NaN NaN \n", + "Value NaN NaN \n", + "Value NaN NaN \n", + "Value EnglishChineseJapanese NaN \n", + "Value NaN NaN \n", + "Value NaN NaN \n", + "Value English\\nGerman\\nFrench United States[3][4][5]\\nGermany[3][4] \n", + "Value NaN NaN \n", + "Value NaN NaN \n", + "Value NaN United States\\nUnited Kingdom\\nChina[2] \n", + "\n", + "[10 rows x 23 columns]" + ] + }, + "execution_count": 108, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.concat(movie_dfs,axis=1).transpose()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a46008bf-e03d-42eb-87df-f75aa5c021f2", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}
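Reviewer note on the open question in the lecture's first markdown cell ("Devise an algorithm that finds out if two people have the same birthday"): the notebook poses it but does not include a solution cell. A minimal sketch of two possible answers is given here, assuming birthdays are represented as `(month, day)` tuples. The function names and the sample data are hypothetical and do not come from the notebook. The first version compares every pair, the second keeps a set of birthdays already seen, which mirrors the O(n*k) versus O(n) contrast the lecture draws for the sliding window.

```python
from itertools import combinations


def has_shared_birthday_pairs(birthdays):
    """Brute force: compare every pair of people, O(n^2) comparisons."""
    return any(a == b for a, b in combinations(birthdays, 2))


def has_shared_birthday_set(birthdays):
    """Single pass with a set: O(n) expected time, O(n) extra memory."""
    seen = set()
    for day in birthdays:
        if day in seen:
            return True  # this birthday was already recorded for someone else
        seen.add(day)
    return False


# Hypothetical class data: (month, day) tuples, not taken from the lecture
class_birthdays = [(3, 14), (7, 1), (12, 25), (7, 1)]
print(has_shared_birthday_pairs(class_birthdays))
print(has_shared_birthday_set(class_birthdays))
```

Both functions answer the same predicate. The set-based version trades extra memory for a single pass over the data, which is the kind of data-structure choice the "For Us" list in the lecture refers to.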