{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "d78bb23f-2c37-4085-b03e-fbf5a22403c8", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "---------------------------------\n", "Working on the host: Joachims-MacBook-Pro.local\n", "\n", "---------------------------------\n", "Python version: 3.10.2 | packaged by conda-forge | (main, Feb 1 2022, 19:30:18) [Clang 11.1.0 ]\n", "\n", "---------------------------------\n", "Python interpreter: /opt/miniconda3/envs/srh/bin/python\n" ] } ], "source": [ "%matplotlib inline\n", "# Load the \"autoreload\" extension\n", "%load_ext autoreload\n", "# always reload modules\n", "%autoreload 2\n", "# black formatter for jupyter notebooks\n", "#%load_ext nb_black\n", "# black formatter for jupyter lab\n", "%load_ext lab_black\n", "\n", "%run ../../../src/notebook_env.py" ] }, { "cell_type": "markdown", "id": "dbaf8d98-cc4e-4a87-bbdf-78f41a2a8556", "metadata": {}, "source": [ "# The pandas Library" ] }, { "cell_type": "markdown", "id": "2abd91eb-e4a3-4cbd-a557-81ebf3167035", "metadata": {}, "source": [ "Wes McKinney began developing the pandas library in 2008. pandas provides **data structures** and **functions** for manipulating, processing, cleaning, and analyzing data. In the Python ecosystem, pandas is the standard tool for working with tabular or table-like data in which each column can have a different type (string, numeric, date, or others). pandas offers sophisticated indexing functionality that makes it easy to reshape, slice, aggregate, and select subsets of data. pandas builds on NumPy, works alongside other packages such as SciPy, and integrates matplotlib for plotting.\n", "\n", "If you are new to pandas, we strongly recommend working through the well-written pandas tutorials, which cover everything new users need to get started.\n", "\n", "After installation (see the documentation for details), pandas is imported under the canonical alias `pd`." ] }, 
{ "cell_type": "code", "execution_count": 2, "id": "ba0d7735-a041-4763-b12c-67934e6f0341", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 3, "id": "1b8af681-ea24-4063-b56f-9ff7d32782a6", "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "id": "c7e36759-f0c0-495e-b7ff-8d156d27cf5f", "metadata": {}, "source": [ "The pandas library provides two well-established data structures: **Series** and **DataFrame**.\n", "\n", " - the one-dimensional `pd.Series` object\n", " - the two-dimensional `pd.DataFrame` object" ] }, { "cell_type": "markdown", "id": "1b24e6d5-1255-48e4-bdec-392e01ecfae7", "metadata": { "tags": [] }, "source": [ "## The `pd.Series` object" ] }, { "cell_type": "markdown", "id": "b96fb633-44e9-4a1a-b2b2-c15782dd4dc4", "metadata": {}, "source": [ "Generating data" ] }, { "cell_type": "code", "execution_count": 4, "id": "7836d460-9561-4d94-b4e3-5101613eed28", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 3, -8, -8, -4, 7, 9, 0, -9, -10, 7, 5, -1, -10,\n", " 4, -10, 5, 9, 4, -6, -10, 6, -6, 7, -7, -8, -3])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# import NumPy's random module\n", "from numpy import random\n", "\n", "# set the seed so the results are reproducible\n", "random.seed(123)\n", "# draw 26 random integers from -10 (inclusive) up to 10 (exclusive)\n", "my_data = random.randint(low=-10, high=10, size=26)\n", "# display the result\n", "my_data" ] }, { "cell_type": "code", "execution_count": 5, "id": 
"8a9a012f-62d5-4f4f-9bf6-f73f56558a6e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "numpy.ndarray" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(my_data)" ] }, { "cell_type": "markdown", "id": "fa34c86f-b0f7-4ece-8831-8d311b691142", "metadata": {}, "source": [ "A Series is a one-dimensional, array-like object containing an array of data and an associated array of data labels, called its index. We create a `pd.Series` object by calling `pd.Series()`." ] }, { "cell_type": "code", "execution_count": 6, "id": "4df63558-005e-4042-906c-0ef7f966cade", "metadata": {}, "outputs": [], "source": [ "# Uncomment to view the documentation\n", "\n", "# docstring\n", "# ?pd.Series\n", "\n", "# source\n", "# ??pd.Series" ] }, { "cell_type": "code", "execution_count": 7, "id": "847c37b9-d52f-43fd-9434-b34a5fd31359", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 3\n", "1 -8\n", "2 -8\n", "3 -4\n", "4 7\n", "5 9\n", "6 0\n", "7 -9\n", "8 -10\n", "9 7\n", "10 5\n", "11 -1\n", "12 -10\n", "13 4\n", "14 -10\n", "15 5\n", "16 9\n", "17 4\n", "18 -6\n", "19 -10\n", "20 6\n", "21 -6\n", "22 7\n", "23 -7\n", "24 -8\n", "25 -3\n", "dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a pd.Series object\n", "s = pd.Series(data=my_data)\n", "s" ] }, { "cell_type": "code", "execution_count": 8, "id": "3b3482da-b780-49e3-9e2d-9a83c62e9481", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(s)" ] }, { "cell_type": "markdown", "id": "adcad019-64b2-4270-87c3-e638f9cea1ff", "metadata": {}, "source": [ "### `pd.Series` attributes" ] }, 
{ "cell_type": "markdown", "id": "e0b6af3d-efc3-492d-aaf1-d4fe1d6e9247", "metadata": {}, "source": [ "Python objects in general, and `pd.Series` objects in particular, provide useful object-specific attributes.\n", "\n", "*Attribute* -> `OBJECT.attribute` \n", "\n", "*Note that an attribute is accessed without parentheses.*" ] }, { "cell_type": "code", "execution_count": 9, "id": "82ff79e7-f027-4aec-a0bd-d8d347768637", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.dtypes" ] }, { "cell_type": "code", "execution_count": 10, "id": "a1b44685-7d0d-45e0-8aa8-025cfb7b5fb9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=26, step=1)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.index" ] }, { "cell_type": "markdown", "id": "23272ad3-f009-439b-abc9-45a23bfc7e63", "metadata": {}, "source": [ "We can assign to the `index` attribute to give a `pd.Series` object a new index.\n", "\n", "Let's use the letters of the alphabet as labels." 
] }, { "cell_type": "code", "execution_count": 11, "id": "dd174785-a581-416f-b934-d0af5a61037b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import string\n", "\n", "letters = string.ascii_uppercase\n", "letters" ] }, { "cell_type": "code", "execution_count": 12, "id": "768a33c4-c1eb-4428-8870-e00e0c48eb7e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A 3\n", "B -8\n", "C -8\n", "D -4\n", "E 7\n", "F 9\n", "G 0\n", "H -9\n", "I -10\n", "J 7\n", "K 5\n", "L -1\n", "M -10\n", "N 4\n", "O -10\n", "P 5\n", "Q 9\n", "R 4\n", "S -6\n", "T -10\n", "U 6\n", "V -6\n", "W 7\n", "X -7\n", "Y -8\n", "Z -3\n", "dtype: int64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.index = list(letters)\n", "s" ] }, { "cell_type": "markdown", "id": "8c02c73c-c2ee-48a3-99e9-3a51d8a5ecbb", "metadata": {}, "source": [ "### `pd.Series` methods" ] }, { "cell_type": "code", "execution_count": 13, "id": "097c71b4-be9f-414e-9bc5-685f4a4db861", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-34" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.sum()" ] }, { "cell_type": "code", "execution_count": 14, "id": "238f6dc6-2a15-4b45-8515-504dd056d176", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-1.3076923076923077" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.mean()" ] }, { "cell_type": "code", "execution_count": 15, "id": "e2fc7253-1df5-4eda-9c3c-167d639f0a43", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "9" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.max()" ] }, { "cell_type": "code", "execution_count": 16, "id": "30a29c16-f4e6-4b85-8dc3-a6c1bf685e0e", "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "-10" ] }, 
"execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.min()" ] }, { "cell_type": "code", "execution_count": 17, "id": "8d6beb4e-09a9-4a6a-8462-00c12e0412a7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-2.0" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.median()" ] }, { "cell_type": "code", "execution_count": 18, "id": "870d5789-305d-4352-9e1c-73f263b5966d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-2.0" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.quantile(q=0.5)" ] }, { "cell_type": "code", "execution_count": 19, "id": "55367dbc-c4f7-4e38-ab57-2902ddb87a65", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.25 -8.0\n", "0.50 -2.0\n", "0.75 5.0\n", "dtype: float64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.quantile(q=[0.25, 0.5, 0.75])" ] }, { "cell_type": "markdown", "id": "66b0d331-b0d5-4eb7-8a11-47bddeb169d6", "metadata": {}, "source": [ "### Element-wise arithmetic" ] }, { "cell_type": "markdown", "id": "a93f5c4b-3953-4f83-b7e9-9940b86f0ffd", "metadata": {}, "source": [ "A very useful property of `pd.Series` objects is that arithmetic operations are applied *element-wise*. For example, `s + 10` adds 10 to every value in `s`." 
] }, { "cell_type": "code", "execution_count": 20, "id": "44062f75-0c2d-47fb-9043-3a97095bb36a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A 13\n", "B 2\n", "C 2\n", "D 6\n", "E 17\n", "F 19\n", "G 10\n", "H 1\n", "I 0\n", "J 17\n", "K 15\n", "L 9\n", "M 0\n", "N 14\n", "O 0\n", "P 15\n", "Q 19\n", "R 14\n", "S 4\n", "T 0\n", "U 16\n", "V 4\n", "W 17\n", "X 3\n", "Y 2\n", "Z 7\n", "dtype: int64" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s + 10\n", "# s*0.1\n", "# 10/s\n", "# s**2\n", "# (2+s)*1**3\n", "# s+s" ] }, { "cell_type": "markdown", "id": "25af5412-e9c7-43a6-8da6-1fe4b7404f3b", "metadata": {}, "source": [ "### Selection and indexing" ] }, { "cell_type": "markdown", "id": "c3d52e0a-be9f-4153-aa5b-a75950327495", "metadata": {}, "source": [ "Another important data operation is indexing, that is, selecting specific subsets of a data object. pandas provides a very rich set of methods for this kind of task.\n", "\n", "In its simplest form, we index a Series NumPy-style with the `[ ]` operator, passing either an integer position or an index label. Note that a slice by position (`s[2:6]`) excludes its end point, whereas a slice by label (`s[\"C\":\"K\"]`) includes it. Newer pandas versions warn when an integer key inside `[ ]` is treated as a position on a non-integer index; the explicit accessors `s.iloc[...]` (by position) and `s.loc[...]` (by label) avoid this ambiguity." 
] }, { "cell_type": "code", "execution_count": 21, "id": "4d88458c-46aa-4a0a-9cc5-0e2d0dc71370", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A 3\n", "B -8\n", "C -8\n", "D -4\n", "E 7\n", "F 9\n", "G 0\n", "H -9\n", "I -10\n", "J 7\n", "K 5\n", "L -1\n", "M -10\n", "N 4\n", "O -10\n", "P 5\n", "Q 9\n", "R 4\n", "S -6\n", "T -10\n", "U 6\n", "V -6\n", "W 7\n", "X -7\n", "Y -8\n", "Z -3\n", "dtype: int64" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "code", "execution_count": 22, "id": "8944cb4a-30a9-4256-8089-04ad3e7e2f38", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-4" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[3]" ] }, { "cell_type": "code", "execution_count": 23, "id": "ecf6e8bf-9eb0-47c1-83bb-e3038d4bbc0c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "C -8\n", "D -4\n", "E 7\n", "F 9\n", "dtype: int64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[2:6]" ] }, { "cell_type": "code", "execution_count": 24, "id": "45227ee3-1a5a-4bc3-b017-a2870ea079f4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-8" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[\"C\"]" ] }, { "cell_type": "code", "execution_count": 25, "id": "af6324d8-f899-437e-8b10-f0aee675c926", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "C -8\n", "D -4\n", "E 7\n", "F 9\n", "G 0\n", "H -9\n", "I -10\n", "J 7\n", "K 5\n", "dtype: int64" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[\"C\":\"K\"]" ] }, { "cell_type": "markdown", "id": "7b0cc09e-6b4a-4d60-be1a-dfc5e0252ff9", "metadata": {}, "source": [ "## The `pd.DataFrame` object" ] }, 
{ "cell_type": "markdown", "id": "64183077-8155-4528-b4c8-b24361b66726", "metadata": {}, "source": [ "The primary data structure of pandas is the `DataFrame`: a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled rows and columns. Arithmetic operations align on both row and column labels. Conceptually, a `DataFrame` can be thought of as a dict-like container of Series objects, one per column.\n", "\n", "**Creating a `DataFrame` object from scratch**\n", "\n", "pandas makes it easy to import many kinds of data from many sources, but for this tutorial we create a DataFrame object from scratch.\n", "\n", "Source: http://duelingdata.blogspot.de/2016/01/the-beatles.html" ] }, { "cell_type": "code", "execution_count": 26, "id": "51fb7a75-893d-4759-8ebc-5702e7815cb3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id Name Last Name dead year_born no_of_songs\n", "0 1 John Lennon True 1940 62\n", "1 2 Paul McCartney False 1942 58\n", "2 3 George Harrison True 1943 24\n", "3 4 Ringo Star False 1940 3" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(\n", " {\n", " \"id\": range(1, 5),\n", " \"Name\": [\"John\", \"Paul\", \"George\", \"Ringo\"],\n", " \"Last Name\": [\"Lennon\", \"McCartney\", \"Harrison\", \"Star\"],\n", " \"dead\": [True, False, True, False],\n", " \"year_born\": [1940, 1942, 1943, 1940],\n", " \"no_of_songs\": [62, 58, 24, 3],\n", " }\n", ")\n", "df" ] }, { "cell_type": "markdown", "id": "8cd333f9-1f76-4226-ba57-1cb28fdb5d0b", "metadata": {}, "source": [ "### `pd.DataFrame` attributes" ] }, { "cell_type": "code", "execution_count": 27, "id": "1a6ae349-d43d-44ff-bf2e-9b0462008b46", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "id int64\n", "Name object\n", "Last Name object\n", "dead bool\n", "year_born int64\n", "no_of_songs int64\n", "dtype: object" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "code", "execution_count": 28, "id": "85340592-8c86-448f-b125-54d3c9f961a2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['id', 'Name', 'Last Name', 'dead', 'year_born', 'no_of_songs'], dtype='object')" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# axis 1: the column labels\n", "df.columns" ] }, { "cell_type": "code", "execution_count": 29, "id": "6bf53af5-7259-426f-8e64-627a3fda1470", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=4, step=1)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# axis 0: the row labels\n", "df.index" ] }, { "cell_type": "markdown", "id": "61eb7e94-2877-4e85-ba6a-d87a8cf801de", "metadata": {}, "source": [ "### `pd.DataFrame` methods" ] }, { 
"cell_type": "markdown", "id": "7f5ba15d-c365-46fd-b883-37302c20f674", "metadata": {}, "source": [ "**Get a quick overview of the dataset**" ] }, { "cell_type": "code", "execution_count": 30, "id": "1cb35f3d-c6c5-4d37-962d-5e969da8f7aa", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 4 entries, 0 to 3\n", "Data columns (total 6 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 id 4 non-null int64 \n", " 1 Name 4 non-null object\n", " 2 Last Name 4 non-null object\n", " 3 dead 4 non-null bool \n", " 4 year_born 4 non-null int64 \n", " 5 no_of_songs 4 non-null int64 \n", "dtypes: bool(1), int64(3), object(2)\n", "memory usage: 292.0+ bytes\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "code", "execution_count": 31, "id": "eb01c3eb-2826-42cf-8926-e76ba7979737", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id year_born no_of_songs\n", "count 4.000000 4.00 4.000000\n", "mean 2.500000 1941.25 36.750000\n", "std 1.290994 1.50 28.229712\n", "min 1.000000 1940.00 3.000000\n", "25% 1.750000 1940.00 18.750000\n", "50% 2.500000 1941.00 41.000000\n", "75% 3.250000 1942.25 59.000000\n", "max 4.000000 1943.00 62.000000" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "code", "execution_count": 32, "id": "3882d279-9d5a-4bee-bafb-6caa5927dcd0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id Name Last Name dead year_born no_of_songs\n", "count 4.000000 4 4 4 4.00 4.000000\n", "unique NaN 4 4 2 NaN NaN\n", "top NaN John Lennon True NaN NaN\n", "freq NaN 1 1 2 NaN NaN\n", "mean 2.500000 NaN NaN NaN 1941.25 36.750000\n", "std 1.290994 NaN NaN NaN 1.50 28.229712\n", "min 1.000000 NaN NaN NaN 1940.00 3.000000\n", "25% 1.750000 NaN NaN NaN 1940.00 18.750000\n", "50% 2.500000 NaN NaN NaN 1941.00 41.000000\n", "75% 3.250000 NaN NaN NaN 1942.25 59.000000\n", "max 4.000000 NaN NaN NaN 1943.00 62.000000" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe(include=\"all\")" ] }, { "cell_type": "markdown", "id": "e3fc9e0f-7c50-43dd-8c56-3ecc6547b909", "metadata": {}, "source": [ "**Use the `id` column as the index**" ] }, { "cell_type": "code", "execution_count": 33, "id": "d895c130-93c0-4faf-955a-2e502554f3f2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id Name Last Name dead year_born no_of_songs\n", "0 1 John Lennon True 1940 62\n", "1 2 Paul McCartney False 1942 58\n", "2 3 George Harrison True 1943 24\n", "3 4 Ringo Star False 1940 3" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 34, "id": "7eaf95c9-7e03-4a33-9986-b17b312fc11b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Last Name dead year_born no_of_songs\n", "id \n", "1 John Lennon True 1940 62\n", "2 Paul McCartney False 1942 58\n", "3 George Harrison True 1943 24\n", "4 Ringo Star False 1940 3" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.set_index(\"id\")" ] }, { "cell_type": "code", "execution_count": 35, "id": "cb0dc184-ecb3-46df-abd9-f6485ba2a86f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id Name Last Name dead year_born no_of_songs\n", "0 1 John Lennon True 1940 62\n", "1 2 Paul McCartney False 1942 58\n", "2 3 George Harrison True 1943 24\n", "3 4 Ringo Star False 1940 3" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "cd392ee2-98ba-4ac9-add4-c41e248ac728", "metadata": {}, "source": [ "Note that `df` itself has not changed!\n", "\n", "Like most pandas methods, `set_index()` does not modify the original object; it returns a new `DataFrame` with the change applied. To make the change permanent, we have to assign the result to a variable, here by reassigning it to `df`:\n", "\n", "`df = df.set_index(\"id\")`\n", "\n", "Alternatively, some methods accept the argument `inplace=True`:\n", "\n", "`df.set_index(\"id\", inplace=True)`" ] }, { "cell_type": "code", "execution_count": 36, "id": "3cb07063-60e4-4cd6-9022-cd1d0c8056bc", "metadata": {}, "outputs": [], "source": [ "df = df.set_index(\"id\")" ] }, { "cell_type": "code", "execution_count": 37, "id": "5a62f520-2f27-4ca6-a719-a65c4b700b95", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
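" ], "text/plain": [ " Name Last Name dead year_born no_of_songs\n", "id \n", "1 John Lennon True 1940 62\n", "2 Paul McCartney False 1942 58\n", "3 George Harrison True 1943 24\n", "4 Ringo Star False 1940 3" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "72bee6b1-99da-46ba-9100-a4e00ea490cd", "metadata": {}, "source": [ "**Arithmetic methods**\n", "\n", "Reductions such as `sum()` take an `axis` argument: `axis=0` aggregates down each column (one result per column), and `axis=1` aggregates across each row (one result per row). Applied to string columns, `sum()` concatenates the values, and the Boolean column `dead` is counted as 0/1. In the run below, `df.sum(axis=1)` emits a `FutureWarning` because the non-numeric columns are silently dropped; newer pandas versions raise a `TypeError` instead, so pass `numeric_only=True` to restrict the reduction to numeric columns." ] }, { "cell_type": "code", "execution_count": 38, "id": "55454ac6-7b5c-460e-a760-d58bfd37ebf9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "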
" ], "text/plain": [ " Name Last Name dead year_born no_of_songs\n", "id \n", "1 John Lennon True 1940 62\n", "2 Paul McCartney False 1942 58\n", "3 George Harrison True 1943 24\n", "4 Ringo Star False 1940 3" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 39, "id": "9be595a5-37ba-4784-80f1-776f287f466f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Name JohnPaulGeorgeRingo\n", "Last Name LennonMcCartneyHarrisonStar\n", "dead 2\n", "year_born 7765\n", "no_of_songs 147\n", "dtype: object" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sum(axis=0)" ] }, { "cell_type": "code", "execution_count": 40, "id": "2d269f9c-9100-482a-9ae0-d8323cdea594", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/4l/3kysx_3j3vj4h8l26gg_1q8c0000gn/T/ipykernel_30190/1459321664.py:1: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. 
Select only valid columns before calling the reduction.\n", " df.sum(axis=1)\n" ] }, { "data": { "text/plain": [ "id\n", "1 2003.0\n", "2 2000.0\n", "3 1968.0\n", "4 1943.0\n", "dtype: float64" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sum(axis=1)" ] }, { "cell_type": "markdown", "id": "8351ea53-f329-42b5-bd6a-2f7e85211458", "metadata": {}, "source": [ "### The `groupby` method" ] }, { "cell_type": "markdown", "id": "a03a942d-5e70-4d5a-86e6-74940b56804b", "metadata": {}, "source": [ "`groupby` follows the split-apply-combine strategy: the data are split into groups based on one or more keys, a function is applied to each group independently, and the results are combined into a new object (Hadley Wickham 2011: The Split-Apply-Combine Strategy for Data Analysis, Journal of Statistical Software, 40(1)). Calling `df.groupby(\"dead\")` on its own only returns a `DataFrameGroupBy` object; nothing is computed until an aggregation such as `sum()` or `mean()` is applied." ] }, { "cell_type": "markdown", "id": "a08ce865-0e7d-4aa6-a41b-98f29ba27ef1", "metadata": {}, "source": [ "![Split-apply-combine](../_img/split-apply-combine.svg)" ] }, { "cell_type": "code", "execution_count": 41, "id": "ea805e8a-6e5e-4dba-b585-8bbd49905906", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Last Name dead year_born no_of_songs\n", "id \n", "1 John Lennon True 1940 62\n", "2 Paul McCartney False 1942 58\n", "3 George Harrison True 1943 24\n", "4 Ringo Star False 1940 3" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 42, "id": "4c6fb570-6362-4dfe-a545-017b50eb14f4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"dead\")" ] }, { "cell_type": "code", "execution_count": 43, "id": "b6e5ed58-1845-4008-af2e-34862aed860d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " year_born no_of_songs\n", "dead \n", "False 3882 61\n", "True 3883 86" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"dead\").sum()" ] }, { "cell_type": "code", "execution_count": 44, "id": "e57cd5f5-57f3-4768-8bc8-143798e8028b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dead\n", "False 61\n", "True 86\n", "Name: no_of_songs, dtype: int64" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"dead\")[\"no_of_songs\"].sum()" ] }, { "cell_type": "code", "execution_count": 45, "id": "b36dbc33-dc88-4f56-99ba-4f224d7ee5c0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dead\n", "False 30.5\n", "True 43.0\n", "Name: no_of_songs, dtype: float64" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"dead\")[\"no_of_songs\"].mean()" ] }, { "cell_type": "code", "execution_count": 46, "id": "6b705200-7177-4f2e-beae-21c546981058", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " mean max min sum\n", "dead \n", "False 30.5 58 3 61\n", "True 43.0 62 24 86" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"dead\")[\"no_of_songs\"].agg([\"mean\", \"max\", \"min\", \"sum\"])" ] }, { "cell_type": "markdown", "id": "f73a89e9-055c-40a0-8b34-50c918fa8710", "metadata": {}, "source": [ "### The `apply`/`map` family of methods" ] }, { "cell_type": "markdown", "id": "e637804c-fc63-4694-a6ad-e26d84f96630", "metadata": {}, "source": [ "- `apply` calls a function on each column (`axis=0`, the default) or on each row (`axis=1`) of a `DataFrame`\n", "- `applymap` calls a function element-wise on a `DataFrame` (renamed to `DataFrame.map` in pandas 2.1)\n", "- `map` works element-wise on a `Series`." ] }, { "cell_type": "code", "execution_count": 47, "id": "badad631-5af7-4fd9-bb34-4736f9b32536", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Last Name dead year_born no_of_songs\n", "id \n", "1 John Lennon True 1940 62\n", "2 Paul McCartney False 1942 58\n", "3 George Harrison True 1943 24\n", "4 Ringo Star False 1940 3" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 48, "id": "73816330-65c9-4f56-af36-d43d4a7c77bf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Name JohnPaulGeorgeRingo\n", "Last Name LennonMcCartneyHarrisonStar\n", "dtype: object" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# (axis=0, default)\n", "df[[\"Name\", \"Last Name\"]].apply(lambda x: x.sum())" ] }, { "cell_type": "code", "execution_count": 49, "id": "d2ae8d95-c193-4ad7-9993-de70518a3489", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "id\n", "1 JohnLennon\n", "2 PaulMcCartney\n", "3 GeorgeHarrison\n", "4 RingoStar\n", "dtype: object" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# (axis=1)\n", "df[[\"Name\", \"Last Name\"]].apply(lambda x: x.sum(), axis=1)" ] }, { "cell_type": "markdown", "id": "5439ba74-cf3b-4304-a497-cba7e3c7427c", "metadata": {}, "source": [ "*... 
perhaps a more useful case ...*" ] }, { "cell_type": "code", "execution_count": 50, "id": "3d6fbb93-f5c2-4fa8-b358-220cc1a7f2b5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "id\n", "1 John Lennon\n", "2 Paul McCartney\n", "3 George Harrison\n", "4 Ringo Star\n", "dtype: object" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.apply(lambda x: \" \".join(x[[\"Name\", \"Last Name\"]]), axis=1)" ] }, { "cell_type": "markdown", "id": "bba7d7ba-4546-47e1-81af-2df44f35d97a", "metadata": {}, "source": [ "## Selection and indexing\n", "\n", "**Column indexing**" ] }, { "cell_type": "code", "execution_count": 51, "id": "1b367e2e-493e-45f5-b00b-61f4742d8b75", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "id\n", "1 John\n", "2 Paul\n", "3 George\n", "4 Ringo\n", "Name: Name, dtype: object" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"Name\"]" ] }, { "cell_type": "code", "execution_count": 52, "id": "aea26347-8654-4630-ac39-656fa9e1f2a6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Last Name\n", "id \n", "1 John Lennon\n", "2 Paul McCartney\n", "3 George Harrison\n", "4 Ringo Star" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[[\"Name\", \"Last Name\"]]" ] }, { "cell_type": "code", "execution_count": 53, "id": "dfe567b7-6d99-4d5e-bc21-2fd4c8087e09", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "id\n", "1 True\n", "2 False\n", "3 True\n", "4 False\n", "Name: dead, dtype: bool" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dead" ] }, { "cell_type": "markdown", "id": "05d47c1a-39e1-468f-a89f-34139e75a09b", "metadata": {}, "source": [ "**Row indexing**\n", "\n", "Besides the `[ ]` operator, pandas offers further indexing operators such as `.loc[]` and `.iloc[]`, to name just two.\n", "\n", " - `.loc[]` is primarily **label**-based, but can also be used with a boolean array.\n", " - `.iloc[]` is primarily **integer-position**-based (from $0$ to $n-1$, where $n$ is the length of the axis), but can also be used with a boolean array.\n", "\n", "Because the index `id` starts at 1, `df.loc[1]` selects the row labeled 1 (John), whereas `df.iloc[1]` selects the second row (Paul)." ] }, { "cell_type": "code", "execution_count": 54, "id": "da98ccba-b31e-412a-90a3-cb8983e1db19", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Last Name dead year_born no_of_songs\n", "id \n", "1 John Lennon True 1940 62\n", "2 Paul McCartney False 1942 58" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(2)" ] }, { "cell_type": "code", "execution_count": 55, "id": "5c3f27f1-b6bd-4700-84e2-8d6ae417385f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Name John\n", "Last Name Lennon\n", "dead True\n", "year_born 1940\n", "no_of_songs 62\n", "Name: 1, dtype: object" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[1]" ] }, { "cell_type": "code", "execution_count": 56, "id": "d7ff33c0-377c-4f69-a7b2-5e18d88857da", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Name Paul\n", "Last Name McCartney\n", "dead False\n", "year_born 1942\n", "no_of_songs 58\n", "Name: 2, dtype: object" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[1]" ] }, { "cell_type": "markdown", "id": "20f425cf-bcc0-45ad-b060-a8535d2de8ce", "metadata": {}, "source": [ "**Row and column indexing**\n", "\n", "`df.loc[row, col]`\n", "\n", "Note that, unlike regular Python slices, label-based slices such as `df.loc[2:4]` include the end label." ] }, { "cell_type": "code", "execution_count": 57, "id": "19aabf0f-93ee-4da7-beda-3756ef175536", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Lennon'" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[1, \"Last Name\"]" ] }, { "cell_type": "code", "execution_count": 58, "id": "c191954f-8cd2-42a1-9db4-895bf9b59483", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name dead\n", "id \n", "2 Paul False\n", "3 George True\n", "4 Ringo False" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[2:4, [\"Name\", \"dead\"]]" ] }, { "cell_type": "markdown", "id": "f9e5e850-d69e-437d-b672-8578f7c10ee9", "metadata": {}, "source": [ "**Boolean indexing**\n", "\n", "Conditions are combined with `&` (and) and `|` (or); each condition must be wrapped in parentheses, because these operators bind more tightly than comparisons such as `>`." ] }, { "cell_type": "code", "execution_count": 59, "id": "949834d2-0bfe-45c6-b50c-4e27341242f5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Last Name dead year_born no_of_songs\n", "id \n", "1 John Lennon True 1940 62\n", "2 Paul McCartney False 1942 58\n", "3 George Harrison True 1943 24\n", "4 Ringo Star False 1940 3" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 60, "id": "6263a98a-8835-42dc-83a9-81d41e0dd37b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "id\n", "1 True\n", "2 True\n", "3 False\n", "4 False\n", "Name: no_of_songs, dtype: bool" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"no_of_songs\"] > 50" ] }, { "cell_type": "code", "execution_count": 61, "id": "a7bafae2-1b5c-45ea-9312-d3b826249a34", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Last Name dead year_born no_of_songs\n", "id \n", "1 John Lennon True 1940 62\n", "2 Paul McCartney False 1942 58" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df[\"no_of_songs\"] > 50]" ] }, { "cell_type": "code", "execution_count": 62, "id": "01eb5d16-423b-4772-941a-ccdf612cdbcc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Last Name dead year_born no_of_songs\n", "id \n", "2 Paul McCartney False 1942 58" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[(df[\"no_of_songs\"] > 50) & (df[\"year_born\"] >= 1942)]" ] }, { "cell_type": "code", "execution_count": 63, "id": "267200de-277c-4e37-9f4f-7f4b5aaac69d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Last Name Name\n", "id \n", "2 McCartney Paul" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[(df[\"no_of_songs\"] > 50) & (df[\"year_born\"] >= 1942), [\"Last Name\", \"Name\"]]" ] }, { "cell_type": "markdown", "id": "8f65bee6-0f8b-45de-aa07-90f7fb398bf8", "metadata": {}, "source": [ "## Manipulating columns, rows, and individual entries" ] }, { "cell_type": "markdown", "id": "5e8d5a72-c87e-4ff8-bd97-b241ee084123", "metadata": {}, "source": [ "**Adding a row to the dataset**" ] }, { "cell_type": "code", "execution_count": 64, "id": "79bb7a6d-372c-4e24-8e10-7179544a6724", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/4l/3kysx_3j3vj4h8l26gg_1q8c0000gn/T/ipykernel_30190/1527406442.py:3: FutureWarning: Behavior when concatenating bool-dtype and numeric-dtype arrays is deprecated; in a future version these will cast to object dtype (instead of coercing bools to numeric values). To retain the old behavior, explicitly cast bool-dtype arrays to numeric dtype.\n", " df.loc[5] = [\"Mickey\", \"Mouse\", nan, 1928, nan]\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Name Last Name dead year_born no_of_songs\n", "id \n", "1 John Lennon 1.0 1940 62.0\n", "2 Paul McCartney 0.0 1942 58.0\n", "3 George Harrison 1.0 1943 24.0\n", "4 Ringo Star 0.0 1940 3.0\n", "5 Mickey Mouse NaN 1928 NaN" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from numpy import nan\n", "\n", "df.loc[5] = [\"Mickey\", \"Mouse\", nan, 1928, nan]\n", "df" ] }, { "cell_type": "code", "execution_count": 65, "id": "bec59a2b-dc03-4eea-b0b3-41a890af8c92", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Name object\n", "Last Name object\n", "dead float64\n", "year_born int64\n", "no_of_songs float64\n", "dtype: object" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "id": "80e663a3-8d38-4766-88d9-d1174f7cb666", "metadata": {}, "source": [ "Note that the column `dead` has changed: its values went from `True`/`False` to `1.0`/`0.0`, and its `dtype` consequently changed from `bool` to `float64`. The reason is the missing value (`nan`) in the new row, which a `bool` column cannot hold (see the `FutureWarning` above). For the same reason, `no_of_songs` changed from `int64` to `float64`."
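If the original semantics should survive a row with missing values, pandas' nullable extension dtypes (`"boolean"`, `"Int64"`) are one option: they store missing entries as `pd.NA` instead of forcing a cast to `float64` or `object`. The following is a minimal sketch, assuming a hypothetical two-row stand-in `df_small` for the notebook's `df`. It appends the new row with `pd.concat` and a one-row frame of matching dtypes rather than with `df.loc[...] = [...]`, because the dtype coercion of setting-with-enlargement has changed between pandas versions (as the `FutureWarning` above announces).

```python
import pandas as pd

# Hypothetical two-row stand-in for the notebook's Beatles DataFrame
df_small = pd.DataFrame(
    {"Name": ["John", "Paul"], "dead": [True, False], "no_of_songs": [62, 58]},
    index=pd.Index([1, 2], name="id"),
)

# Switch to nullable extension dtypes, which can hold pd.NA
nullable = df_small.astype({"dead": "boolean", "no_of_songs": "Int64"})

# One-row frame for the new entry, built with the same nullable dtypes
new_row = pd.DataFrame(
    {
        "Name": ["Mickey"],
        "dead": pd.array([pd.NA], dtype="boolean"),
        "no_of_songs": pd.array([pd.NA], dtype="Int64"),
    },
    index=pd.Index([3], name="id"),
)

# Concatenating columns of identical dtype keeps that dtype
result = pd.concat([nullable, new_row])
print(result)
print(result.dtypes)
```

Both columns keep their nullable dtypes after the concatenation, and the missing entries are displayed as `<NA>` instead of `NaN`.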
 ] }, { "cell_type": "markdown", "id": "6c5d3f3b-0fc1-4274-a212-630ee4dd03b8", "metadata": {}, "source": [ "**Adding a column to the dataset**" ] }, { "cell_type": "code", "execution_count": 66, "id": "469ca23e-687e-4098-993d-1c242e7a09bb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2022, 7, 4, 9, 58, 38, 627755)" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from datetime import datetime\n", "\n", "datetime.today()" ] }, { "cell_type": "code", "execution_count": 67, "id": "fdf0999a-8bf0-4934-bf64-601f3d81013b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2022" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "now = datetime.today().year\n", "now" ] }, { "cell_type": "code", "execution_count": 68, "id": "6462a53e-cb01-48c7-a326-9572f82f1bde", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Last Name dead year_born no_of_songs age\n", "id \n", "1 John Lennon 1.0 1940 62.0 82\n", "2 Paul McCartney 0.0 1942 58.0 80\n", "3 George Harrison 1.0 1943 24.0 79\n", "4 Ringo Star 0.0 1940 3.0 82\n", "5 Mickey Mouse NaN 1928 NaN 94" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"age\"] = now - df.year_born\n", "df" ] }, { "cell_type": "markdown", "id": "ce63d74b-77f0-43e4-9317-dca691809d3f", "metadata": {}, "source": [ "**Changing a specific entry**" ] }, { "cell_type": "code", "execution_count": 69, "id": "b2a82510-0514-414d-90d3-2f6572dd3845", "metadata": {}, "outputs": [], "source": [ "df.loc[5, \"Name\"] = \"Minnie\"" ] }, { "cell_type": "code", "execution_count": 70, "id": "98029ca6-e64b-472a-87af-3181a63399f9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Last Name dead year_born no_of_songs age\n", "id \n", "1 John Lennon 1.0 1940 62.0 82\n", "2 Paul McCartney 0.0 1942 58.0 80\n", "3 George Harrison 1.0 1943 24.0 79\n", "4 Ringo Star 0.0 1940 3.0 82\n", "5 Minnie Mouse NaN 1928 NaN 94" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "194f146c-6dba-476e-ba73-ea57a92fe576", "metadata": {}, "source": [ "## Plotting" ] }, { "cell_type": "markdown", "id": "910d2b24-54e3-4af3-b8d7-d7330d82592a", "metadata": {}, "source": [ "The plotting functionality in pandas is built on matplotlib. A convenient approach is to start the visualization with pandas' basic plotting methods and then switch to matplotlib to customize the resulting figure." ] }, { "cell_type": "markdown", "id": "667b3a73-fd98-4671-b701-acbfafdf5eb1", "metadata": {}, "source": [ "### `plot` methods" ] }, { "cell_type": "code", "execution_count": 71, "id": "0ab00d67-ac63-4c95-a4b3-3d209232f3be", "metadata": {}, "outputs": [], "source": [ "# this magic command renders figures inline, directly below the code cells\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 72, "id": "11c74355-e63d-429f-81da-cb2aaf25973c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Last Name dead year_born no_of_songs age\n", "id \n", "1 John Lennon 1.0 1940 62.0 82\n", "2 Paul McCartney 0.0 1942 58.0 80\n", "3 George Harrison 1.0 1943 24.0 79\n", "4 Ringo Star 0.0 1940 3.0 82\n", "5 Minnie Mouse NaN 1928 NaN 94" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 73, "id": "51567157-c3ff-4d08-a928-eb1e77ee0c56", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df[[\"no_of_songs\", \"age\"]].plot()" ] }, { "cell_type": "code", "execution_count": 74, "id": "4903e9c0-5193-460c-8cf0-ae8730db45c3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY0AAAD9CAYAAABA8iukAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAWgklEQVR4nO3dfbRcVXnH8e8DhGhDECmXCl1CIJRVWwwuDVVebK2lKtimoC21slSsNa4iShEoUARBBAVBDSJq2ipVdFlFSykgqNHKsrxoUEq1gEFsURQNwdBAFKI8/eOc2w7D3GSfyZyZuZnvZ627hrvPPvc+m8k9vzlv+0RmIklSia1GXYAkafYwNCRJxQwNSVIxQ0OSVMzQkCQV22bUBbRpp512ygULFoy6DEmaVW6++eb7MnOq17ItOjQWLFjAypUrR12GJM0qEfHfMy3z8JQkqZihIUkqZmhIkooZGpKkYkMPjYjYOiLeFBG3RcRDEfGfEXFMRMRG1tknIlZExIMRcXdEnLSx/pKkdozi6qnTgJOBs4AbgecC7wF+CTivu3NE7Ax8AfgmcATwTOBs4BfA+UOpWJIEDDk0ImIr4E3AOzPz7Lp5RURMASfQIzSA11PVuSQz1wNXR8Rc4JSIWJaZG4ZRuyRp+IenngR8BPhMV/sdwFREzOuxzsHAijowpl0O7Ajs10aRkqTehhoamfmTzDwmM7/RtegPge9n5kM9VtsbuLOr7a6OZZKkIRn5HeER8RdUexNvnKHL9sC6rrZ1Hcu6f95SYCnAbrvttlm1LTj5qs1av1//9Y4Xj+T3Shq8LW07MtJLbiPiSOADwGXARTN1A2Z6vOCj3Q2ZuTwzF2fm4qmpnlOnSJL6NLLQiIjjgI8CVwJH5szPnX0AmN/VNr9jmSRpSEYSGhFxDvAuqtD448x8ZCPdVwF7drVNf39HC+VJkmYwipv7jgVOAZYBR2Xmzzexygrg4K4rqw4D1gC3tFGjJKm3Yd+nsQtwLvAfwCeAZ3fd2L0S2B2Yyswb67aLgTdQ3Z/xTmBfqtA5eRN7KJKkARv21VMvBOYCTwdu6LF8iuqO8VdRnQAnM38YEQdT7ZlcBvwIODUzvRtckoZsqKGRmZcAl2yi21H1V+d6K4ED26hJklTOWW4lScUMDUlSMUNDklTM0JAkFTM0JEnFDA1JUjFDQ5JUzNCQJBUzNCRJxQwNSVIxQ0OSVMzQkCQVMzQkScUMDUlSMUNDklTM0JAkFTM0JEnFDA1JUjFDQ5JUzNCQJBUzNCRJxQwNSVIxQ0OSVMzQkCQVMzQkScUMDUlSMUNDklTM0JAkFTM0JEnFDA1JUjFDQ5JUzNCQJBUzNCRJxQwNSVIxQ0OSVMzQkCQVMzQkScUMDUlSsZGGRkQsiYh1Bf2ujIjs8bXdMOqUJFW2GdUvjogDgEuBKOi+CFgGfKKrff2g65IkzWzooRERc4FjgbOAh4BtN9F/B+CpwDWZeWPrBUqSZjSKw1OHAKcAJwLvLei/qH69tbWKJElFRhEaXwP2yMwLgSzovwh4GHhbRKyJiPUR8amIeEqrVUqSHmfooZGZ92Tm2garLALmAuuAw4Gjgf2BL9a
Huh4jIpZGxMqIWLl69epBlCxJqs2GS27fBTw/M4/NzOsy8xLgpcDTgCO6O2fm8sxcnJmLp6amhlyqJG3Zxj40MvP2zPxSV9tNwFpg35EUJUkTauxDIyJeFhG/3dUWVIes7htNVZI0mUZ2n0YDfwlsHxHPysxH67ZDgScC142uLEmaPGO3pxERCyPiOR1N51Adhro0In4/Il4PfBT4dGZeP5IiJWlCjV1oAKcBN0x/k5nXAkuAvYDLgVOBDwGvGEVxkjTJRhoamXlGZm7X1XZUZkZX25WZ+VuZOS8zd83MEzLzp8OtVpI0jnsakqQxZWhIkooZGpKkYoaGJKmYoSFJKlYcGhHx5DYLkSSNvyZ7GvdGxGUR8UcRMae1iiRJY6tJaLwaeALwKeCHEXFx153bkqQtXHFoZObHM/MPgF2AM4BnANdHxJ0RcXpELGynREnSuGh8Ijwz12TmRZl5APB04B6qEPl2RFwXEYcPuEZJ0phoHBoRMT8iXhkRnwW+ThUcH6R6qt43gX+MiPMHW6YkaRwUT40eES8B/gx4cb3eNcCRwBWZ+Ujd7YqI+AWwFDhhwLVKkkasyfM0LgNuAU4BPp6ZMz2A++t4/4ckbZGahMaizPxmRERmJkBEPAHYOjMfmu6UmR8GPjzgOiVJY6DJHsG3I+Ji4MaOtoOA+yLivIjYerClSZLGTZPQOAd4OfCRjrabgTcBrwH+ZoB1SZLGUJPQ+FPguMx833RDZv4kM98PnAz8+aCLkySNlyahsQNw7wzL7gZ+ZbOrkSSNtSah8XXgdRERPZYtBb4xmJIkSeOqydVTbwE+B9wWEVcDPwamgEOAhcALBl+eJGmcFIdGZv5rRBxEdZ/Gy4EdgQeA64FXZ+ZN7ZQoSRoXTfY0yMyvUk0XIkmaQI1CIyK2AvYF5tHjfEhmXjeguiRJY6jJ3FP7A58EdgV6nQxPwBv8JGkL1mRPYxmwFjga+D7waBsFSZLGV5PQeDrwksz8bFvFSJLGW5P7NO4Gtm+rEEnS+GsSGm8B3hoRz2qrGEnSeGtyeOp44CnAVyPi58DDXcszM580sMokSWOnSWhc2VoVkqRZockd4We2WYgkafw1eixrRDwpIt4cEV+KiNsi4jcj4qSIeGFbBUqSxkdxaETEAuA/qB669D/A3sBcYBFwZUQc0kaBkqTx0fTmvh8Cvwf8DHgEIDOPjIg5wOmA93BI0hasyeGp5wPnZOaDVFOGdPogsM/AqpIkjaUmofEI8MQZlu3I4y/BlSRtYZqExlXA2yLi1zraMiJ2pHrGxrUDrUySNHaahMbxVHsT3wJurdv+HvgO8CTgxMGWJkkaN03u01hdTyHyKuB5wD1UT+77B+BDmbmulQolSWOj6ZP7fkZ10vuDg/jlEbEE+Fhmzt9Ev32ort56NnA/8D7gvMzsPiEvSWpRk4cwnb6pPpn51gY/7wDgUno/0Kmz387AF4BvAkcAzwTOBn4BnF/6+yRJm6/JnsZxPdrm1T9jLXAnsMnQiIi5wLHAWcBDwLabWOX19e9Ykpnrgavrn3FKRCzLzA3FI5AkbZbiE+GZ+eQeX9sCBwL3UX36L3EI1dVWJwLvLeh/MLCiDoxpl1Nd5rtfaf2SpM3XaO6pXjLzBqpnbby9cJWvAXtk5oU8/ibBXvam2ovpdFfHMknSkGx2aNQeAPYo6ZiZ92Tm2gY/e3ug+8qsdR3LHiMilkbEyohYuXr16ga/RpK0KU1OhD+zR/NWwK5U5ydu7bF8EIKZ90ge7W7IzOXAcoDFixd7dZUkDVCTE+Er6b3xDqp7Nv5kIBU93gNA9yW58zuWSZKGpElo/G6PtqSaJv3WzHzcp/4BWQXs2dU2/f0dLf1OSVIPTe4I/3KbhWzECuB1ETEvMx+q2w4D1gC3jKgmSZpITc5pXNjg52ZmHttHPUTEQmAqM2+smy4G3kB1f8Y7gX2pLtk9OTMf6ed3SJL60+Tw1NOo7sZ+MvBd4AdU90rsTXVe43s
dfZPqBr5+nEY1v1UAZOYPI+JgqmlELgN+BJyamd4NLklD1uSS208D64H9M3NhZj43M38T+HXgNuCizNyj/uo+B9FTZp6Rmdt1tR2VmdHVtjIzD8zMJ2Tm7pl5boO6JUkD0iQ0TgVOzMybOhsz8ztUewdOjS5JW7gmobE91SSBvcwH5m5+OZKkcdYkND4PnBcRz+5sjIjnAe8APjW4siRJ46hJaBxDdU7j+ohYExG3R8T9VJfE3k7vWXAlSVuQJvdp3BsR+wJLgOcAO1DNbvuvmfm5dsqTJI2Tpk/u+znwmYi4GdiF6sFIkqQJ0WiW24h4aUSsopqa/CtU92h8LCIujYg5bRQoSRofxaEREUcAnwS+DPxpx7r/BBwObPJxsJKk2a3J4anTgWWZ+aaI2Hq6MTMviYgnU031cdqgC5QkjY8mh6f2Aq6eYdk3qM5xSJK2YE1C427goBmW/RaPnXtKkrQFanJ46iLg/IgIqj2OBH61fqLfqVRP75MkbcGa3KdxYX3u4iTgzVSz0P4zsAG40FlnJWnL1+R5Gk/IzDMj4j1UN/f9MtXjVm/KzPtaqk+SNEaaHJ76RkScmpmfAa5tqyBJ0vhqciJ8J2BdW4VIksZfkz2Nc4F3R8RpVA9d+nF3h8y8f1CFSZLGT5PQOIlqksLLNtJn640skyTNck1C44TWqpAkzQobDY2I+AFwaGbekpn/ULftCKzNzEeHUaAkaXxs6kT4U4Btp7+p55xaDTyjxZokSWOq0dTotRh4FZKkWaGf0JAkTShDQ5JUrCQ0srBNkrSFK7nk9oKIWFv/9/T5jPdExANd/TIz/2hglUmSxs6mQuM6qr2K+R1tX65f5z++uyRpS7bR0MjM5w2pDknSLOCJcElSMUNDklTM0JAkFTM0JEnFDA1JUjFDQ5JUzNCQJBUzNCRJxQwNSVIxQ0OSVMzQkCQVG0loRMRrI2JVRPw0Im6IiP030f/KiMgeX9sNq2ZJ0ghCIyJeCXwAuBR4KbAWuDYi9tjIaouAZcD+XV/rWy1WkvQYJc/TGJiICOCtwPLMPLNu+zxwB3Ac8MYe6+wAPBW4JjNvHF61kqRuw97T2AvYHbhiuiEzNwBXAS+aYZ1F9eut7ZYmSdqUYYfG3vXrnV3tdwELI2LrHussAh4G3hYRayJifUR8KiKe0mahkqTHG3ZobF+/rutqX1fXMq/HOouAuXWfw4Gjqc5nfDEi5nZ3joilEbEyIlauXr16YIVLkoYfGtPPGM8Z2h/tsc67gOdn5rGZeV1mXkJ1Av1pwBHdnTNzeWYuzszFU1NTAypbkgTDD40H6tfu54tvRxUYD3WvkJm3Z+aXutpuorrqat8WapQkzWDYobGqft2zq31P4I7M7N4DISJeFhG/3dUWVIes7mulSklST6MIje8Bh003RMQc4MXAihnW+UtgWUR01noo8ETgunbKlCT1MtT7NDIzI+IdwEUR8RPg34BjgJ2AdwNExEJgquOejHOAzwKXRsSHqa7AOgv4dGZeP8z6JWnSDf2O8My8GDgReAVwGbAD8MLMvKvuchpwQ0f/a4ElVPd4XA6cCnyoXl+SNERD3dOYlpkXABfMsOwo4KiutiuBK1svTJK0Uc5yK0kqZmhIkooZGpKkYoaGJKmYoSFJKmZoSJKKGRqSpGKGhiSpmKEhSSpmaEiSihkakqRihoYkqZihIUkqZmhIkooZGpKkYoaGJKmYoSFJKmZoSJKKGRqSpGKGhiSpmKEhSSpmaEiSihkakqRihoYkqZihIUkqZmhIkooZGpKkYoaGJKmYoSFJKmZoSJKKGRqSpGKGhiSpmKEhSSpmaEiSihkakqRihoYkqZihIUkqZmhIkooZGpKkYiMJjYh4bUSsioifRsQNEbH/JvrvExErIuLBiLg7Ik6KiBhWvZKkytBDIyJeCXwAuBR4KbAWuDYi9pih/87AF4AEjgCWA2cDxw+jXknS/9tmmL+s3jt4K7A8M8+s2z4P3AEcB7y
xx2qvp6pzSWauB66OiLnAKRGxLDM3DKd6SdKw9zT2AnYHrphuqDf6VwEvmmGdg4EVdWBMuxzYEdivnTIlSb0MOzT2rl/v7Gq/C1gYEVvPsE6v/p0/T5I0BEM9PAVsX7+u62pfRxVg84D/6bFOr/6dP+//RMRSYGn97YMRcUff1cJOwH2bsX5f4txh/8bHGMmYR2jSxguOeSLEuZs15t1nWjDs0Ji+4ilnaH90hnW6+097XP/MXE51snyzRcTKzFw8iJ81W0zamCdtvOCYJ0VbYx724akH6tf5Xe3bUQXAQzOs091/fscySdKQDDs0VtWve3a17wnckZm99ihWzdAfqquuJElDMorQ+B5w2HRDRMwBXgysmGGdFcDBETGvo+0wYA1wSxtFdhjIYa5ZZtLGPGnjBcc8KVoZc/T+cN+eiDgauAh4O/BvwDHAQcAzMvOuiFgITGXmjXX/XYDbgH8H3gnsC5wJnJyZ5w+1eEmacEMPDYCIOB44luqKhluA4zPzhnrZJcCrMjM6+i8GlgHPAn4EXJyZo73GSJIm0EhCQ5I0O03sLLeTOGliH2M+ICK+FBFrI+IHEfGRiPiVYdU7CE3H3LXuGREx6z5V9fE+T9Xv7f31e31FRHRffDLW+vy3/ZWIWBcRd0XEW+rzq7NORCyJiO572Xr1G8w2LDMn7gt4JfAL4C3AocBnqW4q3GOG/jsD91JNnHgo8Gbg58AJox5Li2N+GvBTqilfDgFeDnyH6nDinFGPp40xd627D/Bw9Scy+rG0+D7Pqd/T26kmED0M+BbVlYnbjno8LY15IfAgcA3wAuANwHrg/FGPpY+xH1CP9cFN9BvYNmzkgx7B/+QA/gt4f0fbHKqpSS6cYZ0zqe6s/KWOtrOoruAa+w1on2N+X718TkfbflQ3Wh466jG1MeaOflsDNwHfn02h0ef7/Jp6g7lbR9szgB8Azxr1mFoa80lUH4jmdbSdU298Y9RjKhz3XOCvqT7Y3F8QGgPbhk3i4alJnDSxnzF/C7ggHzuL8PR9MT2nsR8z/Yx52nFUU9S8t7Xq2tHPmA8HrsnMuzvWuSUzd83Mm9ssdkD6GfNcYANVcExbQ3WT8dx2yhy4Q4BTgBMp+3c6sG3YJIbGJE6a2HjMmXlxZr6vq/kP69fbB1xfG/p5n4mIvYAzgNdSfYqbTfoZ8yLg9vqY/r0R8XBEXBURu7Va6eD0M+aPUR3OentE7BgR+wF/BfxTZv6stUoH62tUh98uZOZpljoNbBs2iaFRMmlir3WKJ00cQ/2M+TEi4qnA+cBK4IsDra4djcdcnxT8O+CjmfmVdstrRT/v8xTwaqpP5a8BXgH8BnBVRAx7brp+NB5zZn4HOKH+WgN8Ffgx1f+HWSEz78nMtQ1WGdg2bDb8oxi01idNHEP9jPn/O1WBsYLqj/BlWR8QHXP9jPl1VIc7lrRVVMv6GfMcYFvgkOmNUETcRfVJ9iXAJwdf5kA1HnNE/AXwt1R3TP8jsCvVw+GuioiDM3O27WGWGNg2bBL3NCZx0sR+xgxUl+kB11N9Gvn9+lPabNBozHUwnkd10+n6+lP2VvWybSJiNvyt9PM+Pwjc1PmpNTNXUj2G+emDL3Hg+hnzycDVmfm6zPxiZl5KdUXRQcCRrVU6WgPbhs2GP4RBm8RJE/sZMxHxbOA6quO/z83MW9srceCajvn3qP6ILqM6SboBuKBetgE4vaU6B6mf9/lOqj2NbttQdqx81PoZ81OBGzsbMvN2qkNVvzHwCsfDwLZhkxoas2nSxEFoPOaIWEB1vfuPgAMyc1WvfmOs6Zj/heoqks6vd9XL9mN2THjXz7/tzwEHRsSuHev8DtUn9etbq3Rw+hnzt4EDOxvqCyB+GfhuK1WO3uC2YaO+3nhE1zgfTbXrejbVbunVVNdo71kvXwg8p6P/LlS7618G/gA4ldl3c1/TMf9zPcY/A57T9bXLqMfTxph7rP9XzKL7NPp8n6eobvr693oj8nL
gHqrJRLca9XhaGvOfUO1F/R3VHuaRVOHzXWD+qMfTx/jPoOs+jTa3YSMf8Aj/Rx8P3E11Y9P1wP4dyy7p3lgAi+s/pJ8B/w2cNOoxtDVmqpOjG+o/rF5fsyksG73PXevOutDoZ8z1BuZyqqtp7q/77DDqcbQ85pcAX6e6rPpu4O+BnUc9jj7H3is0WtuGOWGhJKnYJJ7TkCT1ydCQJBUzNCRJxQwNSVIxQ0OSVMzQkCQVMzQkScUMDUlSsf8F3zp24Y8uSWwAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df[\"dead\"].plot.hist()" ] }, { "cell_type": "code", "execution_count": 75, "id": "88bb0204-bfc6-472c-a1cd-bd72728bd05a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAEKCAYAAADticXcAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAQtUlEQVR4nO3de7BdZX3G8e8DgdQ2EEHS1mkNEB2KdCxiY8HiDCh2oFIttuJovUCLpq2ijBoEE8rFQcRLHQUtGsHxihSVdqJYLKaU0YLaUJ1xlAGEAt5owQJB7pBf/1grk+POuexDds4O7/l+Zvbss993rbV/Z519nr32uy47VYUkqQ07jLsASdLoGOqS1BBDXZIaYqhLUkMMdUlqyIJxF7DHHnvUXnvtNe4yJOlx5ZprrrmjqpYMto891Pfaay/Wr18/7jIk6XElyS2TtTv8IkkNMdQlqSGGuiQ1xFCXpIYY6pLUEENdkhpiqEtSQwx1SWqIoS5JDRn7GaWSNJf2OvnScZfAzWcfuc2W7Za6JDXEUJekhhjqktQQQ12SGmKoS1JDDHVJaoihLkkNMdQlqSGGuiQ1xFCXpIYY6pLUEENdkhpiqEtSQwx1SWqIoS5JDTHUJakhhrokNcRQl6SGGOqS1BBDXZIaYqhLUkMMdUlqiKEuSQ0x1CWpIUOFepIdk7wtyQ+T/CLJt5I8f0J/kqxOcmuS+5JcnmTfbVe2JGkyC4ac7kTgTOBU4NvAXwGXJTmwqr7Tt58MnATcDJwCrEuyX1XdPfKqB+x18qXb+ilmdPPZR467BGlK/o/MH8MOvxwDXFhVZ1XV14BXA7cBxyXZBVgJnF5V51TVWuBwYBfguG1RtCRpcsOG+kJgw6YHVfUocDewO3AQsAhYO6H/TuBK4IiRVSpJmtGwof5h4NVJDkuyOMkJwO8CFwH79NPcODDPTRP6JElzYNgx9fOA5wNfm9B2SlWtTfJ24MGqemhgnnuAXUdQoyRpSDOGepIAXwX2A14PXAu8ADgtyV1AgJpsVmDjFMtcAawAWLp06WOpW1Nwh5g0vw2zpX4w8FzgZVX1+b7t35MsAN4DrAIWJtmpqh6eMN8iunH3LVTVGmANwPLlyyd7Q5AkPQbDhPpT+vtvDrR/g+4QxqLbKt8buH5C/zLguq0tUHqs/NSi+WiYHaWbgvrggfYDgUeAS4AHgKM2dSTZDTgEWLf1JUqShjXjlnpVXZPkUuAfkuxON6Z+KN1W+ger6sdJzgXOTLKR7k1gNd0hkOdvs8olSVsY9uiXo+nOKF1Nd2z6DcCbgI/2/avodoqupBtLvwo4Zi7OJpUkbTZUqFfV/cBb+9tk/Y/QXSbg5NGVJkmaLa/SKEkNMdQlqSGGuiQ1xFCXpIYY6pLUEENdkhpiqEtSQwx1SWqIoS5JDTHUJakhhrokNcRQl6SGGOqS1BBDXZIaYqhLUkMMdUlqiKEuSQ0x1CWpIYa6JDXEUJekhhjqktQQQ12SGmKoS1JDDHVJaoihLkkNMdQlqSGGuiQ1xFCXpIYY6pLUEENdkhpiqEtSQ4YO9SSHJflWkvuT3JLkjCQ79n1JsjrJrUnuS3J5kn23XdmSpMkMFepJDgb+BbgW
OBL4EHAScEo/yan9z+8DXg4sBtYlWTzqgiVJU1sw5HRnA/9aVcf2j/8tyZOA5yV5P7ASOL2qzgFI8nXgFuA44P2jLVmSNJUZt9STLAEOBtZMbK+qk6vqUOAgYBGwdkLfncCVwBGjLFaSNL1hhl+eAQS4N8mXkjyQ5H+TnJ5kB2CffrobB+a7aUKfJGkODDP8sqS//xRwId1wyiF0Y+j3070xPFhVDw3Mdw+w64jqlCQNYZhQ36m//2pVndj/fEWSPeiC/WygJpkvwMbJFphkBbACYOnSpbMqWJI0tWGGX37R31820H453Vj6XcDCJDsN9C8C7p5sgVW1pqqWV9XyJUuWTDaJJOkxGCbUf9jf7zzQvinEH6bbKt97oH8ZcN1jL02SNFvDhPoPgJ8ARw+0Hwn8FLgIeAA4alNHkt3oxt3XjaRKSdJQZhxTr6qNSVYBn0xyHvAF4AXAMcDfVtWGJOcCZybZCFwPrAY2AOdvu9IlSYOGOvmoqj6V5GFgFfCXwI+Av6mqTceur6LbKbqSbiz9KuCYqpp0TF2StG0Me0YpVfU54HNT9D0CnNzfJElj4lUaJakhhrokNcRQl6SGGOqS1BBDXZIaYqhLUkMMdUlqiKEuSQ0x1CWpIYa6JDXEUJekhhjqktQQQ12SGmKoS1JDDHVJaoihLkkNMdQlqSGGuiQ1xFCXpIYY6pLUEENdkhpiqEtSQwx1SWqIoS5JDTHUJakhhrokNcRQl6SGGOqS1BBDXZIaYqhLUkMMdUlqiKEuSQ2ZVagnWZjk2iSfmNCWJKuT3JrkviSXJ9l35JVKkmY02y3104DBwD4VOAV4H/ByYDGwLsnirS9PkjQbQ4d6kgOANwF3TGjbBVgJnF5V51TVWuBwYBfguBHXKkmawVChnmQB8HHgvcBPJnQdBCwC1m5qqKo7gSuBI0ZXpiRpGMNuqZ8E7Ay8a6B9n/7+xoH2myb0SZLmyIKZJuh3eq4GDquqh5JM7N4VeLCqHhqY7Z6+b6plrgBWACxdunS2NUuSpjDtlnqSHYALgAuq6urJJgFqivaNUy23qtZU1fKqWr5kyZLZ1CtJmsZMW+pvBPYE/qQfV98k/eO7gYVJdqqqhyf0L+r7JElzaKYx9ZcAvwX8H/Bwf9sfeM2ExwH2HphvGXDdSCuVJM1oplD/a+DZA7frgS/3P18EPAActWmGJLsBhwDrRl+uJGk60w6/VNUWW9tJ7gd+XlXr+8fnAmcm2UgX+KuBDcD5oy9XkjSdGY9+GcIqup2iK+nG0q8Cjqkqx9QlaY7NOtSr6pkDjx8BTu5vkqQx8iqNktQQQ12SGmKoS1JDDHVJaoihLkkNMdQlqSGGuiQ1xFCXpIYY6pLUEENdkhpiqEtSQwx1SWqIoS5JDTHUJakhhrokNcRQl6SGGOqS1BBDXZIaYqhLUkMMdUlqiKEuSQ0x1CWpIYa6JDXEUJekhhjqktQQQ12SGmKoS1JDDHVJaoihLkkNMdQlqSGGuiQ1ZKhQT7JjkrckuTbJvUl+kOT4JOn7k2R1kluT3Jfk8iT7btvSJUmDht1S/zvgLOAzwIuBi4EPACf2/acCpwDvA14OLAbWJVk8ymIlSdNbMNMESXYA3gK8t6re2TevS7IEWJnkPGAlcHpVndPP83XgFuA44P3bpHJJ0haG2VJfDHwKuGSg/TpgCfB8YBGwdlNHVd0JXAkcMZoyJUnDmHFLvQ/o4yfpehHwY+C3+8c3DvTfBPzpVlUnSZqVx3T0S5LXAi8A3gPsCjxYVQ8NTHZP3zfZ/CuSrE+y/vbbb38sJUiSJjHrUE/ySuAjwBeADwEBarJJgY2TLaOq1lTV8qpavmTJktmWIEmawqxCPcmbgU8DXwZeWVUF3A0sTLLTwOSL+j5J0hwZOtSTnEV3JMungZdOGG65gW6rfO+BWZbR7UyVJM2RYU8+OgF4O/BB4NiqemRC91XAA8BRE6bfDTgEWDeySiVJMxrmOPUnA+8GvgdcBBzY
n0i6yXrgXODMJBuB64HVwAbg/FEXLEma2oyhDhwOLASeAVw9Sf8SYBXdTtGVdGPpVwHHVJVj6pI0h4Y5Tv0TwCeGWNbJ/U2SNCZepVGSGmKoS1JDDHVJaoihLkkNMdQlqSGGuiQ1xFCXpIYY6pLUEENdkhpiqEtSQwx1SWqIoS5JDTHUJakhhrokNcRQl6SGGOqS1BBDXZIaYqhLUkMMdUlqiKEuSQ0x1CWpIYa6JDXEUJekhhjqktQQQ12SGmKoS1JDDHVJaoihLkkNMdQlqSGGuiQ1xFCXpIaMNNSTvC7JDUnuT3J1kueMcvmSpOmNLNSTvAb4CPAZ4M+Bu4CvJtl7VM8hSZreSEI9SYB3AGuq6oyq+grwYuAO4M2jeA5J0sxGtaX+NGBPYO2mhqp6GLgUOGJEzyFJmsGoQn2f/v6HA+03AU9NsuOInkeSNI1U1dYvJHkFcCHw5Kq6bUL7a4GPAYurasOE9hXAiv7h7wDXbXURW2cPuqEiuS4mcl1s5rrYbHtZF3tW1ZLBxgUjWnj6+8F3iE3tGyc2VtUaYM2InnurJVlfVcvHXcf2wHWxmetiM9fFZtv7uhjV8Mvd/f0uA+2L6AL93hE9jyRpGqMK9Rv6+2UD7cuA62oUYzySpBmNMtR/BBy1qSHJTsCRwLoRPce2tN0MBW0HXBebuS42c11stl2vi5HsKAVI8nrgQ8C7gP8AjgeeCzyzqm4ayZNIkqY1slAHSPJW4AS6vcPfBd5aVVeP7AkkSdMaaahLksbLqzRKUkMM9XksyROm6dshye5zWc/2JMmOSX593HVsL5IsTTKq81oel5LsnGS/7f11YajPQ0lWJrkN+EWSW/qd3IOeDdw+x6XNuSRPSbIqyRlJnta3nQHcA/wsyc/6K5DOW/1lPv4beMa4a5kLSS5O8tSBtlOAnwPfo3tdXJvkyLEUOAPH1OeZJG8APgB8lO7yDC8GDgMuBl5VVY/00x0IXFVVzV63J8kBwBXATnRnQ28E3g2cBpwDfAc4HHgl8NKq+qcxlbrNJfn4dN3AMcCX6IKtquq4OSlsDJJsBA6qqm/3j0+kO6pvDXAZ8ATgpcBLgD+rqrVTLWscDPV5Jsn3gX+sqndMaHstcB7dVTaPrqqN8yTU1wH3AUcDjwAfpwvwMwbWz3nAH1TV74+l0DmQ5Fq66zDdDvx0kkl+j+58lPvpQv1Zc1jenJok1H8EXFhVJw1M9zHgWdvb62JeDb8k2TCL290zL/FxaU/g6xMbqup84Fi6k8fOn/uSxuZA4INV9UD/CeU0uq3SKwam+yKw31wXN8f2B84GfhW4BHh2VR1QVQfQDcUF+Iu+rdlAn8IewFcmab8YePoc1zKj+bbj41XAp4GH6U6Umo8fU26lC7NfCq6q+my/A+jvk9xJ94Jt3R388qUtbgZOB+4cmG4Z8LO5KWk8quohYFWSzwMXAC9Lcly/tTof/08mXsfqv4CnTDLN09keXxdVNa9uwB8CDwBvGHctY/r930I35HA6sP8k/WfRjS1/F3h03PVu43VxJt3XLp5Ad3nowf5foxtLvgN417jrncP1siOwmu5CfB8AFveviWeNu7Y5+v03Ao/SDUNdRvfJ9n+AZX3/k/rXzAbgzHHXu0X94y5gTH+0t9Ht8Nl13LWM4XffoQ/0u4FzppjmBLqx09ZDfSfgw/2b/P6T9B/b/4NfDPzKuOsdw/rZF/gG3ae7R+dRqO8OPK//P7gA+M/+De7Qvv91/evis8DCcdc7eJuXO0qT7Az8MfDtqtr+Pj7Ngf57ZRdX1V1T9P8mcHhVfXJOCxuDJLsC91bVowPtvwE8sarG/SUuY9O/Tt5Et79lRVXdMP0cberXQ6o7iODJdGF+85jLmtS8DHVJatW8OvpFklpnqEtSQwx1CUhSSVZO039oP812+92UEsy/49SlqTwHuGXcRUhby1CXgKr65rhrkEbB4ReJLYdfkrwwyXeS3J/kKmDvMZYnDc1QlwYkOYju
4mbX0V2J72t0FzyTtnsOv0hbehtwPfCK6k7kuCzJE4E3jrUqaQhuqUtbOhi4rH75zLwvjqsYaTYMdWlLu9FdxGui28ZRiDRbhrq0pZ8Dg99D+aRxFCLNlqEubekK4EUDX7T8wnEVI82GO0qlLb0TWA/8c5IP032V2/HjLUkajlvq0oCq+j7wR3RfY3YJ8ArgDWMtShqSl96VpIa4pS5JDTHUJakhhrokNcRQl6SGGOqS1BBDXZIaYqhLUkMMdUlqyP8Dh/CZEzKOORsAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df[\"age\"].plot.bar()" ] }, { "cell_type": "markdown", "id": "aa15d8e0-464e-4d2f-b1b2-a3214ac3cbef", "metadata": { "tags": [] }, "source": [ "> __...einige Anmerkungen zum Plotten mit Python__" ] }, { "cell_type": "markdown", "id": "2ece07df-292c-4614-b170-da13ca473120", "metadata": {}, "source": [ "Das Plotten ist ein wesentlicher Bestandteil der Datenanalyse. Die Welt der Python-Visualisierung kann jedoch ein frustrierender Ort sein. Es gibt viele verschiedene Optionen, und die Auswahl der richtigen ist eine Herausforderung. (Wenn Sie sich trauen, werfen Sie einen Blick auf die Python-Visualisierungslandschaft).\n", "\n", "matplotlib ist wahrscheinlich die bekannteste Python-Bibliothek für 2D-Diagramme. Mit ihr lassen sich plattformübergreifend Zahlen in Publikationsqualität in einer Vielzahl von Formaten und interaktiven Umgebungen erstellen. Allerdings ist matplotlib aufgrund der komplexen Syntax und der Existenz zweier Schnittstellen, einer **MATLAB-ähnlichen zustandsbasierten Schnittstelle** und einer **objektorientierten Schnittstelle**, schwer zugänglich. Daher gibt **es immer mehr als eine Möglichkeit, eine Visualisierung zu erstellen**. Eine weitere Quelle der Verwirrung ist die Tatsache, dass matplotlib gut in andere Python-Bibliotheken integriert ist, wie z. B. pandas, seaborn, xarray und andere. 
Daher gibt es Verwirrung darüber, wann man die reine matplotlib oder ein Tool, das auf matplotlib aufbaut, verwenden sollte.\n", "\n", "Wir importieren die `matplotlib`-Bibliothek und das `pyplot`-Modul von matplotlib mit den folgenden kanonischen Befehlen\n", "\n", "`import matplotlib as mpl`\n", "`import matplotlib.pyplot as plt`\n", "\n", "In Bezug auf die Terminologie von matplotlib ist es wichtig zu verstehen, dass die `Figure` das endgültige Bild ist, das eine oder mehrere `Axes` enthalten kann, und dass die `Axes` eine individuelle Darstellung repräsentieren.\n", "\n", "Um ein `Figure`-Objekt zu erstellen, rufen wir\n", "\n", "`fig = plt.figure()` auf.\n", "\n", "Ein bequemerer Weg, ein `Figure`-Objekt und ein `Axes`-Objekt auf einmal zu erstellen, ist jedoch der Aufruf\n", "\n", "`fig, ax = plt.subplots()` \n", "\n", "Dann können wir das `Axes`-Objekt verwenden, um Daten für die Darstellung hinzuzufügen." ] }, { "cell_type": "code", "execution_count": 76, "id": "88e5d523-8a7d-4424-ad8c-ad4f0bb044a1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 1.0, 'The Beatles and ... something else')" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "# Erzeuge Figure und Axes Objekt\n", "fig, ax = plt.subplots(figsize=(10, 5))\n", "\n", "# plot die Daten und referenzier das Axes Objekt\n", "df[\"age\"].plot.bar(ax=ax)\n", "\n", "# Passe das Axes Objekt an\n", "ax.set_xticklabels(df[\"Name\"], rotation=0)\n", "ax.set_xlabel(\"\")\n", "ax.set_ylabel(\"Age\", size=14)\n", "ax.set_title(\"The Beatles and ... something else\", size=18)" ] }, { "cell_type": "markdown", "id": "60949b63-0419-405b-8c47-2fb26746ef19", "metadata": {}, "source": [ "Beachten Sie, dass wir nur an der Oberfläche der Plot-Möglichkeiten mit Pandas kratzen. In der Online-Dokumentation von Pandas (hier) finden Sie einen umfassenden Überblick." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" } }, "nbformat": 4, "nbformat_minor": 5 }