{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(sec:groupby)=\n",
    "# 데이터 그룹화"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "PREVIOUS_MAX_ROWS = pd.options.display.max_rows\n",
    "pd.options.display.max_columns = 20\n",
    "pd.options.display.max_rows = 20\n",
    "pd.options.display.max_colwidth = 80"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.seed(12345)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plt.rc(\"figure\", figsize=(10, 6))\n",
    "np.set_printoptions(precision=4, suppress=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 시리즈와 데이터프레임 그룹화"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`groupby()` 메서드는 지정된 키를 기준으로 데이터를 그룹으로 쪼갠 다음에\n",
    "특정 집계 함수를 그룹별로 적용해서 그룹별로 하나의 값을 계산한다.\n",
    "그리고 생성된 그룹과 그룹별로 계산된 값을 조합해서 새로운 데이터프레임 또는 시리즈를 생성한다.\n",
    "이 과정을 간략하게 **쪼개고, 적용하고, 조합하기**<font size='2'>split-apply-combine</font>라 부른다."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ":::{admonition} 집계 함수\n",
    ":class: info\n",
    "\n",
    "집계 함수는 리스트 또는 1차원 어레이가 주어졌을 때 항목들을 대상으로 하나의 값을 계산하는 함수를 가리킨다.\n",
    "예를 들어, 평균값을 계산하거나, 항목의 개수 등을 계산하는 함수 등이 대표적인 집계 함수다.\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 쪼개고, 적용하고, 조합하기"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "다음 데이터프레임을 이용하여 쪼개고, 적용하고, 조합하기 과정을 설명한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Key</th>\n",
       "      <th>Data</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>B</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>C</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>B</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>C</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>A</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>B</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>C</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Key  Data\n",
       "0   A     0\n",
       "1   B     5\n",
       "2   C    10\n",
       "3   A     5\n",
       "4   B    10\n",
       "5   C    15\n",
       "6   A    10\n",
       "7   B    15\n",
       "8   C    20"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame({\"Key\" : [\"A\", \"B\", \"C\", \"A\", \"B\", \"C\", \"A\", \"B\", \"C\"],\n",
    "                   \"Data\" : [0, 5, 10, 5, 10, 15, 10, 15, 20]})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "아래 코드는 `Key` 열에 속한 `\"A\"`, `\"B\"`, `\"C\"` 세 값을 기준으로\n",
    "3 개의 그룹으로 쪼갠 후 각 그룹에 속합 값들을 열별로 더한 결과를 \n",
    "이용하애 새로운 데이터프레임을 생성한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Data</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Key</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>A</th>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>B</th>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>C</th>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Data\n",
       "Key      \n",
       "A      15\n",
       "B      30\n",
       "C      45"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(\"Key\").sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " 위 코드의 실행 과정에 포함된\n",
    "쪼개고, 적용하고, 조합하기를 잘 묘사하면 다음과 같다."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div align=\"center\" border=\"1px\"><img src=\"https://raw.githubusercontent.com/codingalzi/datapy/master/jupyter-book/images/groupby01.png\" style=\"width:50%;\"></div>\n",
    "\n",
    "<p><div style=\"text-align: center\">&lt;그림 출처: <a href=\"https://wesmckinney.com/book/numpy-basics.html\">Python for Data Analysis</a>&gt;</div></p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "다음 데이터프레임을 활용한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>one</td>\n",
       "      <td>-0.204708</td>\n",
       "      <td>0.281746</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>a</td>\n",
       "      <td>two</td>\n",
       "      <td>0.478943</td>\n",
       "      <td>0.769023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>None</td>\n",
       "      <td>one</td>\n",
       "      <td>-0.519439</td>\n",
       "      <td>1.246435</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>b</td>\n",
       "      <td>two</td>\n",
       "      <td>-0.555730</td>\n",
       "      <td>1.007189</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>b</td>\n",
       "      <td>one</td>\n",
       "      <td>1.965781</td>\n",
       "      <td>-1.296221</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>a</td>\n",
       "      <td>None</td>\n",
       "      <td>1.393406</td>\n",
       "      <td>0.274992</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>None</td>\n",
       "      <td>one</td>\n",
       "      <td>0.092908</td>\n",
       "      <td>0.228913</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   key1  key2     data1     data2\n",
       "0     a   one -0.204708  0.281746\n",
       "1     a   two  0.478943  0.769023\n",
       "2  None   one -0.519439  1.246435\n",
       "3     b   two -0.555730  1.007189\n",
       "4     b   one  1.965781 -1.296221\n",
       "5     a  None  1.393406  0.274992\n",
       "6  None   one  0.092908  0.228913"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame({\"key1\" : [\"a\", \"a\", None, \"b\", \"b\", \"a\", None],\n",
    "                   \"key2\" : pd.Series([\"one\", \"two\", \"one\", \"two\", \"one\", None, \"one\"]),\n",
    "                   \"data1\" : np.random.standard_normal(7),\n",
    "                   \"data2\" : np.random.standard_normal(7)})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**시리즈 그룹화**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "아래 시리즈를 이용하여 먼저 시리즈 그룹화를 설명한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   -0.204708\n",
       "1    0.478943\n",
       "2   -0.519439\n",
       "3   -0.555730\n",
       "4    1.965781\n",
       "5    1.393406\n",
       "6    0.092908\n",
       "Name: data1, dtype: float64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = df[\"data1\"]\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "아래 코드는 `df[\"key1\"]` 시리즈의 항목을 기준으로 그룹화를 진행하며, 동일한 행에 위치한 값들을 그룹화 한다.\n",
    "하지만 `groupby()` 메서드는 `GroupBy` 객체를 생성하며 \n",
    "실제로 그룹화를 진행하지는 않고 그룹화에 필요한 정보만 계산해 놓는다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.SeriesGroupBy object at 0x000001D3F460F5E0>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "k1 = df[\"key1\"]\n",
    "\n",
    "s.groupby(k1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "실제 그룹화는 집계 메서드를 실행할 때 이뤄진다."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `valut_counts()` 메서드: 그룹별 항목 및 항목 수 확인"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key1  data1    \n",
       "a     -0.204708    1\n",
       "       0.478943    1\n",
       "       1.393406    1\n",
       "b     -0.555730    1\n",
       "       1.965781    1\n",
       "Name: data1, dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.groupby(k1).value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `mean()` 메서드: 그룹별 평균값 계산"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key1\n",
       "a    0.555881\n",
       "b    0.705025\n",
       "Name: data1, dtype: float64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped = s.groupby(k1).mean()\n",
    "grouped"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "두 개 이상의 키를 기준으로 그룹화 진행 가능.\n",
    "아래 코드는 `df[\"key1\"]`와 `df[\"key2\"]` 열에 속한 값들을 기준으로 그룹화 진행."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key1  key2\n",
       "a     one    -0.204708\n",
       "      two     0.478943\n",
       "b     one     1.965781\n",
       "      two    -0.555730\n",
       "Name: data1, dtype: float64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "k2 = df[\"key2\"]\n",
    "\n",
    "means = s.groupby([k1, k2]).mean()\n",
    "means"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "여러 개의 키를 이용하여 그룹화를 한 경우 언스택을 활용하면 보다 보기 좋은 데이터프레임을 생성한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>key2</th>\n",
       "      <th>one</th>\n",
       "      <th>two</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>-0.204708</td>\n",
       "      <td>0.478943</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>1.965781</td>\n",
       "      <td>-0.555730</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "key2       one       two\n",
       "key1                    \n",
       "a    -0.204708  0.478943\n",
       "b     1.965781 -0.555730"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "means.unstack()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "어레이 또는 리스트를 키로 활용하는 것도 가능하다.\n",
    "시리즈와 마찬가지로 각 값의 인덱스에 맞춰 그룹화가 이뤄진다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "states = np.array([\"OH\", \"CA\", \"CA\", \"OH\", \"OH\", \"CA\", \"OH\"])\n",
    "years = [2005, 2005, 2006, 2005, 2006, 2005, 2006]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CA  2005    0.936175\n",
       "    2006   -0.519439\n",
       "OH  2005   -0.380219\n",
       "    2006    1.029344\n",
       "Name: data1, dtype: float64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.groupby([states, years]).mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**데이터프레임 그룹화**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "그룹별 집계 연산은 열 별로 독립적으로 적용된다.\n",
    "\n",
    "- `numeric_only=True` 키워드 인자: `key1` 열을 기준으로 그룹화 하면 `key2` 열에 대해서는 `mean()` 함수를 적용할 수\n",
    "    없기에 수치형 데이터를 담고 있는 열에 대해서만 평균값을 구하라고 지정한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>0.555881</td>\n",
       "      <td>0.441920</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>0.705025</td>\n",
       "      <td>-0.144516</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         data1     data2\n",
       "key1                    \n",
       "a     0.555881  0.441920\n",
       "b     0.705025 -0.144516"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(\"key1\").mean(numeric_only=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**다중 키 활용 그룹화**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "여러 개의 키를 이용한 그룹화도 동일하게 작동한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">a</th>\n",
       "      <th>one</th>\n",
       "      <td>-0.204708</td>\n",
       "      <td>0.281746</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td>0.478943</td>\n",
       "      <td>0.769023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">b</th>\n",
       "      <th>one</th>\n",
       "      <td>1.965781</td>\n",
       "      <td>-1.296221</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td>-0.555730</td>\n",
       "      <td>1.007189</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              data1     data2\n",
       "key1 key2                    \n",
       "a    one  -0.204708  0.281746\n",
       "     two   0.478943  0.769023\n",
       "b    one   1.965781 -1.296221\n",
       "     two  -0.555730  1.007189"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"key1\", \"key2\"]).mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `dropna=True` 키워드 인자: 결측치 자동 제거"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key1\n",
       "a    3\n",
       "b    2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(\"key1\", dropna=True).size()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key1  key2\n",
       "a     one     1\n",
       "      two     1\n",
       "b     one     1\n",
       "      two     1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"key1\", \"key2\"]).size() # dropna = True 가 기본값"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `dropna=False` 로 지정하면 결측치를 그대로 둚"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key1\n",
       "a      3\n",
       "b      2\n",
       "NaN    2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(\"key1\", dropna=False).size()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key1  key2\n",
       "a     one     1\n",
       "      two     1\n",
       "      NaN     1\n",
       "b     one     1\n",
       "      two     1\n",
       "NaN   one     2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"key1\", \"key2\"], dropna=False).size()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `count()` 집계 함수: 그룹별 항목 수 계산"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key2</th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      key2  data1  data2\n",
       "key1                    \n",
       "a        2      3      3\n",
       "b        2      2      2"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(\"key1\").count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 그룹 확인"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**for 반복문 활용**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "집계 함수를 적용하기 이전의 그룹들의 상태를 확인하기 위해 `for` 반복문을 이용할 수 있다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a\n",
      "---\n",
      "  key1  key2     data1     data2\n",
      "0    a   one -0.204708  0.281746\n",
      "1    a   two  0.478943  0.769023\n",
      "5    a  None  1.393406  0.274992\n",
      "\n",
      "b\n",
      "---\n",
      "  key1 key2     data1     data2\n",
      "3    b  two -0.555730  1.007189\n",
      "4    b  one  1.965781 -1.296221\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for name, group in df.groupby(\"key1\"):\n",
    "    print(name)\n",
    "    print(\"---\")\n",
    "    print(group)\n",
    "    print() # 한 칸 띄우기 용도"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('a', 'one')\n",
      "---\n",
      "  key1 key2     data1     data2\n",
      "0    a  one -0.204708  0.281746\n",
      "\n",
      "('a', 'two')\n",
      "---\n",
      "  key1 key2     data1     data2\n",
      "1    a  two  0.478943  0.769023\n",
      "\n",
      "('b', 'one')\n",
      "---\n",
      "  key1 key2     data1     data2\n",
      "4    b  one  1.965781 -1.296221\n",
      "\n",
      "('b', 'two')\n",
      "---\n",
      "  key1 key2    data1     data2\n",
      "3    b  two -0.55573  1.007189\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for (k1, k2), group in df.groupby([\"key1\", \"key2\"]):\n",
    "    print((k1, k2))\n",
    "    print(\"---\")\n",
    "    print(group)\n",
    "    print() # 한 칸 띄우기 용도"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**사전 활용**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "사전 형식으로 저장하면 쉽게 그룹 단위로 확인할 수 있다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>one</td>\n",
       "      <td>-0.204708</td>\n",
       "      <td>0.281746</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>a</td>\n",
       "      <td>two</td>\n",
       "      <td>0.478943</td>\n",
       "      <td>0.769023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>None</td>\n",
       "      <td>one</td>\n",
       "      <td>-0.519439</td>\n",
       "      <td>1.246435</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>b</td>\n",
       "      <td>two</td>\n",
       "      <td>-0.555730</td>\n",
       "      <td>1.007189</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>b</td>\n",
       "      <td>one</td>\n",
       "      <td>1.965781</td>\n",
       "      <td>-1.296221</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>a</td>\n",
       "      <td>None</td>\n",
       "      <td>1.393406</td>\n",
       "      <td>0.274992</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>None</td>\n",
       "      <td>one</td>\n",
       "      <td>0.092908</td>\n",
       "      <td>0.228913</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   key1  key2     data1     data2\n",
       "0     a   one -0.204708  0.281746\n",
       "1     a   two  0.478943  0.769023\n",
       "2  None   one -0.519439  1.246435\n",
       "3     b   two -0.555730  1.007189\n",
       "4     b   one  1.965781 -1.296221\n",
       "5     a  None  1.393406  0.274992\n",
       "6  None   one  0.092908  0.228913"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "pieces = {name: group for name, group in df.groupby(\"key1\")}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>b</td>\n",
       "      <td>two</td>\n",
       "      <td>-0.555730</td>\n",
       "      <td>1.007189</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>b</td>\n",
       "      <td>one</td>\n",
       "      <td>1.965781</td>\n",
       "      <td>-1.296221</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key1 key2     data1     data2\n",
       "3    b  two -0.555730  1.007189\n",
       "4    b  one  1.965781 -1.296221"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pieces[\"b\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "pieces = {name: group for name, group in df.groupby([\"key1\", \"key2\"])}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>b</td>\n",
       "      <td>two</td>\n",
       "      <td>-0.55573</td>\n",
       "      <td>1.007189</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key1 key2    data1     data2\n",
       "3    b  two -0.55573  1.007189"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pieces[(\"b\", \"two\")]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 열 선택"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "그룹화 후 집계를 진행하기 전에 열 라벨을 선택하면 해당 열에 대해서만 \n",
    "집계 함수가 적용된다."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 열 라벨의 리스트를 이용할 때: 데이터프레임 생성"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">a</th>\n",
       "      <th>one</th>\n",
       "      <td>0.281746</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td>0.769023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">b</th>\n",
       "      <th>one</th>\n",
       "      <td>-1.296221</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td>1.007189</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              data2\n",
       "key1 key2          \n",
       "a    one   0.281746\n",
       "     two   0.769023\n",
       "b    one  -1.296221\n",
       "     two   1.007189"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"key1\", \"key2\"])[[\"data2\"]].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 열 라벨을 하나만 지정할 때: 시리즈 생성"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key1  key2\n",
       "a     one     0.281746\n",
       "      two     0.769023\n",
       "b     one    -1.296221\n",
       "      two     1.007189\n",
       "Name: data2, dtype: float64"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"key1\", \"key2\"])[\"data2\"].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 열 기준 그룹화"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Joe</th>\n",
       "      <td>1.352917</td>\n",
       "      <td>0.886429</td>\n",
       "      <td>-2.001637</td>\n",
       "      <td>-0.371843</td>\n",
       "      <td>1.669025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Steve</th>\n",
       "      <td>-0.438570</td>\n",
       "      <td>-0.539741</td>\n",
       "      <td>0.476985</td>\n",
       "      <td>3.248944</td>\n",
       "      <td>-1.021228</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Wanda</th>\n",
       "      <td>-0.577087</td>\n",
       "      <td>0.124121</td>\n",
       "      <td>0.302614</td>\n",
       "      <td>0.523772</td>\n",
       "      <td>0.000940</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jill</th>\n",
       "      <td>1.343810</td>\n",
       "      <td>-0.713544</td>\n",
       "      <td>-0.831154</td>\n",
       "      <td>-2.370232</td>\n",
       "      <td>-1.860761</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Trey</th>\n",
       "      <td>-0.860757</td>\n",
       "      <td>0.560145</td>\n",
       "      <td>-1.265934</td>\n",
       "      <td>0.119827</td>\n",
       "      <td>-1.063512</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              a         b         c         d         e\n",
       "Joe    1.352917  0.886429 -2.001637 -0.371843  1.669025\n",
       "Steve -0.438570 -0.539741  0.476985  3.248944 -1.021228\n",
       "Wanda -0.577087  0.124121  0.302614  0.523772  0.000940\n",
       "Jill   1.343810 -0.713544 -0.831154 -2.370232 -1.860761\n",
       "Trey  -0.860757  0.560145 -1.265934  0.119827 -1.063512"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "people = pd.DataFrame(np.random.standard_normal((5, 5)),\n",
    "                      columns=[\"a\", \"b\", \"c\", \"d\", \"e\"],\n",
    "                      index=[\"Joe\", \"Steve\", \"Wanda\", \"Jill\", \"Trey\"])\n",
    "\n",
    "people"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "연습을 위해 결측치 두 개를 의도적으로 추가한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Joe</th>\n",
       "      <td>1.352917</td>\n",
       "      <td>0.886429</td>\n",
       "      <td>-2.001637</td>\n",
       "      <td>-0.371843</td>\n",
       "      <td>1.669025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Steve</th>\n",
       "      <td>-0.438570</td>\n",
       "      <td>-0.539741</td>\n",
       "      <td>0.476985</td>\n",
       "      <td>3.248944</td>\n",
       "      <td>-1.021228</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Wanda</th>\n",
       "      <td>-0.577087</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.523772</td>\n",
       "      <td>0.000940</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jill</th>\n",
       "      <td>1.343810</td>\n",
       "      <td>-0.713544</td>\n",
       "      <td>-0.831154</td>\n",
       "      <td>-2.370232</td>\n",
       "      <td>-1.860761</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Trey</th>\n",
       "      <td>-0.860757</td>\n",
       "      <td>0.560145</td>\n",
       "      <td>-1.265934</td>\n",
       "      <td>0.119827</td>\n",
       "      <td>-1.063512</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              a         b         c         d         e\n",
       "Joe    1.352917  0.886429 -2.001637 -0.371843  1.669025\n",
       "Steve -0.438570 -0.539741  0.476985  3.248944 -1.021228\n",
       "Wanda -0.577087       NaN       NaN  0.523772  0.000940\n",
       "Jill   1.343810 -0.713544 -0.831154 -2.370232 -1.860761\n",
       "Trey  -0.860757  0.560145 -1.265934  0.119827 -1.063512"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "people.iloc[2:3, [1, 2]] = np.nan\n",
    "people"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**사전 활용 열 기준 그룹화**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "사전을 이용하여 열 인덱스의 라벨을 `\"red\"`, `\"blue\"` 로 구분한다.\n",
    "즉, 키는 열 인덱스의 라벨을, 키의 값은 그룹을 지정한다.\n",
    "단, `\"f\"` 라벨은 존재하지 않기에 무시된다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "mapping = {\"a\": \"red\", \"b\": \"red\", \"c\": \"blue\",\n",
    "           \"d\": \"blue\", \"e\": \"red\", \"f\" : \"orange\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "아래 코드는 지정된 열로 구성된 그룹별로 `sum()` 메서드를 각 행에 대해 적용한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>blue</th>\n",
       "      <th>red</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Joe</th>\n",
       "      <td>-2.373480</td>\n",
       "      <td>3.908371</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Steve</th>\n",
       "      <td>3.725929</td>\n",
       "      <td>-1.999539</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Wanda</th>\n",
       "      <td>0.523772</td>\n",
       "      <td>-0.576147</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jill</th>\n",
       "      <td>-3.201385</td>\n",
       "      <td>-1.230495</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Trey</th>\n",
       "      <td>-1.146107</td>\n",
       "      <td>-1.364125</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           blue       red\n",
       "Joe   -2.373480  3.908371\n",
       "Steve  3.725929 -1.999539\n",
       "Wanda  0.523772 -0.576147\n",
       "Jill  -3.201385 -1.230495\n",
       "Trey  -1.146107 -1.364125"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "by_column = people.groupby(mapping, axis=\"columns\")\n",
    "by_column.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "다음은 그룹별, 행별 평균값을 계산한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>blue</th>\n",
       "      <th>red</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Joe</th>\n",
       "      <td>-1.186740</td>\n",
       "      <td>1.302790</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Steve</th>\n",
       "      <td>1.862964</td>\n",
       "      <td>-0.666513</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Wanda</th>\n",
       "      <td>0.523772</td>\n",
       "      <td>-0.288074</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jill</th>\n",
       "      <td>-1.600693</td>\n",
       "      <td>-0.410165</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Trey</th>\n",
       "      <td>-0.573054</td>\n",
       "      <td>-0.454708</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           blue       red\n",
       "Joe   -1.186740  1.302790\n",
       "Steve  1.862964 -0.666513\n",
       "Wanda  0.523772 -0.288074\n",
       "Jill  -1.600693 -0.410165\n",
       "Trey  -0.573054 -0.454708"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "by_column.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**시리즈 활용 열 기준 그룹화**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a       red\n",
       "b       red\n",
       "c      blue\n",
       "d      blue\n",
       "e       red\n",
       "f    orange\n",
       "dtype: object"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "map_series = pd.Series(mapping)\n",
    "map_series"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "시리즈를 활용하면 행의 라벨이 해당 값에 따라 그룹화된다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>blue</th>\n",
       "      <th>red</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Joe</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Steve</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Wanda</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jill</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Trey</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       blue  red\n",
       "Joe       2    3\n",
       "Steve     2    3\n",
       "Wanda     1    2\n",
       "Jill      2    3\n",
       "Trey      2    3"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "people.groupby(map_series, axis=\"columns\").count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 함수 활용 그룹화"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`groupby()` 메서드의 첫째 인자는 `by` 키워드 인자로 지정된다.\n",
    "`by` 키워드의 인자가 함수이면 지정된 축에 따라 \n",
    "행 또는 열 인덱스 라벨에 해당 함수를 적용한 결과를\n",
    "기준으로 그룹화한다.\n",
    "\n",
    "예를 들어 아래 코드는 행 인덱스의 라벨의 길이를 기준으로 그룹화한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Joe</th>\n",
       "      <td>1.352917</td>\n",
       "      <td>0.886429</td>\n",
       "      <td>-2.001637</td>\n",
       "      <td>-0.371843</td>\n",
       "      <td>1.669025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Steve</th>\n",
       "      <td>-0.438570</td>\n",
       "      <td>-0.539741</td>\n",
       "      <td>0.476985</td>\n",
       "      <td>3.248944</td>\n",
       "      <td>-1.021228</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Wanda</th>\n",
       "      <td>-0.577087</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.523772</td>\n",
       "      <td>0.000940</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jill</th>\n",
       "      <td>1.343810</td>\n",
       "      <td>-0.713544</td>\n",
       "      <td>-0.831154</td>\n",
       "      <td>-2.370232</td>\n",
       "      <td>-1.860761</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Trey</th>\n",
       "      <td>-0.860757</td>\n",
       "      <td>0.560145</td>\n",
       "      <td>-1.265934</td>\n",
       "      <td>0.119827</td>\n",
       "      <td>-1.063512</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              a         b         c         d         e\n",
       "Joe    1.352917  0.886429 -2.001637 -0.371843  1.669025\n",
       "Steve -0.438570 -0.539741  0.476985  3.248944 -1.021228\n",
       "Wanda -0.577087       NaN       NaN  0.523772  0.000940\n",
       "Jill   1.343810 -0.713544 -0.831154 -2.370232 -1.860761\n",
       "Trey  -0.860757  0.560145 -1.265934  0.119827 -1.063512"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "people"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.352917</td>\n",
       "      <td>0.886429</td>\n",
       "      <td>-2.001637</td>\n",
       "      <td>-0.371843</td>\n",
       "      <td>1.669025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.483052</td>\n",
       "      <td>-0.153399</td>\n",
       "      <td>-2.097088</td>\n",
       "      <td>-2.250405</td>\n",
       "      <td>-2.924273</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>-1.015657</td>\n",
       "      <td>-0.539741</td>\n",
       "      <td>0.476985</td>\n",
       "      <td>3.772716</td>\n",
       "      <td>-1.020287</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          a         b         c         d         e\n",
       "3  1.352917  0.886429 -2.001637 -0.371843  1.669025\n",
       "4  0.483052 -0.153399 -2.097088 -2.250405 -2.924273\n",
       "5 -1.015657 -0.539741  0.476985  3.772716 -1.020287"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "people.groupby(len).sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "그룹화의 기준으로 함수와 함께 어레이, 사전, 시리즈, 또는 열 인덱스 라벨 등을 함께 사용할 수 있다.\n",
    "연습을 위해 먼저 새로운 열을 추가한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "people[\"School\"] = [\"A\", \"B\", \"C\", \"A\", \"B\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "      <th>School</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Joe</th>\n",
       "      <td>1.352917</td>\n",
       "      <td>0.886429</td>\n",
       "      <td>-2.001637</td>\n",
       "      <td>-0.371843</td>\n",
       "      <td>1.669025</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Steve</th>\n",
       "      <td>-0.438570</td>\n",
       "      <td>-0.539741</td>\n",
       "      <td>0.476985</td>\n",
       "      <td>3.248944</td>\n",
       "      <td>-1.021228</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Wanda</th>\n",
       "      <td>-0.577087</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.523772</td>\n",
       "      <td>0.000940</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jill</th>\n",
       "      <td>1.343810</td>\n",
       "      <td>-0.713544</td>\n",
       "      <td>-0.831154</td>\n",
       "      <td>-2.370232</td>\n",
       "      <td>-1.860761</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Trey</th>\n",
       "      <td>-0.860757</td>\n",
       "      <td>0.560145</td>\n",
       "      <td>-1.265934</td>\n",
       "      <td>0.119827</td>\n",
       "      <td>-1.063512</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              a         b         c         d         e School\n",
       "Joe    1.352917  0.886429 -2.001637 -0.371843  1.669025      A\n",
       "Steve -0.438570 -0.539741  0.476985  3.248944 -1.021228      B\n",
       "Wanda -0.577087       NaN       NaN  0.523772  0.000940      C\n",
       "Jill   1.343810 -0.713544 -0.831154 -2.370232 -1.860761      A\n",
       "Trey  -0.860757  0.560145 -1.265934  0.119827 -1.063512      B"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "people"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "아래 리스트도 그룹화에 사용한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [],
   "source": [
    "key_list = [\"one\", \"one\", \"two\", \"three\", \"two\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "아래 코드는 행 인덱스의 라벨의 길이, `key_list`, `\"School\"` 열을 기준으로 그룹하를 진행한 \n",
    "다음에 그룹별 최소값을 계산한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>School</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <th>one</th>\n",
       "      <th>A</th>\n",
       "      <td>1.352917</td>\n",
       "      <td>0.886429</td>\n",
       "      <td>-2.001637</td>\n",
       "      <td>-0.371843</td>\n",
       "      <td>1.669025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">4</th>\n",
       "      <th>three</th>\n",
       "      <th>A</th>\n",
       "      <td>1.343810</td>\n",
       "      <td>-0.713544</td>\n",
       "      <td>-0.831154</td>\n",
       "      <td>-2.370232</td>\n",
       "      <td>-1.860761</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <th>B</th>\n",
       "      <td>-0.860757</td>\n",
       "      <td>0.560145</td>\n",
       "      <td>-1.265934</td>\n",
       "      <td>0.119827</td>\n",
       "      <td>-1.063512</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">5</th>\n",
       "      <th>one</th>\n",
       "      <th>B</th>\n",
       "      <td>-0.438570</td>\n",
       "      <td>-0.539741</td>\n",
       "      <td>0.476985</td>\n",
       "      <td>3.248944</td>\n",
       "      <td>-1.021228</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <th>C</th>\n",
       "      <td>-0.577087</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.523772</td>\n",
       "      <td>0.000940</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                       a         b         c         d         e\n",
       "        School                                                  \n",
       "3 one   A       1.352917  0.886429 -2.001637 -0.371843  1.669025\n",
       "4 three A       1.343810 -0.713544 -0.831154 -2.370232 -1.860761\n",
       "  two   B      -0.860757  0.560145 -1.265934  0.119827 -1.063512\n",
       "5 one   B      -0.438570 -0.539741  0.476985  3.248944 -1.021228\n",
       "  two   C      -0.577087       NaN       NaN  0.523772  0.000940"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "people.groupby([len, key_list, \"School\"]).min()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 멀티 인덱스 레벨 활용"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "인덱스의 레벨에 포함된 라벨을 기준으로 그룹화를 진행할 수 있다."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**행 인덱스 레벨 활용**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>one</td>\n",
       "      <td>-0.204708</td>\n",
       "      <td>0.281746</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>a</td>\n",
       "      <td>two</td>\n",
       "      <td>0.478943</td>\n",
       "      <td>0.769023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>None</td>\n",
       "      <td>one</td>\n",
       "      <td>-0.519439</td>\n",
       "      <td>1.246435</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>b</td>\n",
       "      <td>two</td>\n",
       "      <td>-0.555730</td>\n",
       "      <td>1.007189</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>b</td>\n",
       "      <td>one</td>\n",
       "      <td>1.965781</td>\n",
       "      <td>-1.296221</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>a</td>\n",
       "      <td>None</td>\n",
       "      <td>1.393406</td>\n",
       "      <td>0.274992</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>None</td>\n",
       "      <td>one</td>\n",
       "      <td>0.092908</td>\n",
       "      <td>0.228913</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   key1  key2     data1     data2\n",
       "0     a   one -0.204708  0.281746\n",
       "1     a   two  0.478943  0.769023\n",
       "2  None   one -0.519439  1.246435\n",
       "3     b   two -0.555730  1.007189\n",
       "4     b   one  1.965781 -1.296221\n",
       "5     a  None  1.393406  0.274992\n",
       "6  None   one  0.092908  0.228913"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">a</th>\n",
       "      <th>one</th>\n",
       "      <td>-0.204708</td>\n",
       "      <td>0.281746</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td>0.478943</td>\n",
       "      <td>0.769023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NaN</th>\n",
       "      <th>one</th>\n",
       "      <td>-0.519439</td>\n",
       "      <td>1.246435</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">b</th>\n",
       "      <th>two</th>\n",
       "      <td>-0.555730</td>\n",
       "      <td>1.007189</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td>1.965781</td>\n",
       "      <td>-1.296221</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <th>NaN</th>\n",
       "      <td>1.393406</td>\n",
       "      <td>0.274992</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NaN</th>\n",
       "      <th>one</th>\n",
       "      <td>0.092908</td>\n",
       "      <td>0.228913</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              data1     data2\n",
       "key1 key2                    \n",
       "a    one  -0.204708  0.281746\n",
       "     two   0.478943  0.769023\n",
       "NaN  one  -0.519439  1.246435\n",
       "b    two  -0.555730  1.007189\n",
       "     one   1.965781 -1.296221\n",
       "a    NaN   1.393406  0.274992\n",
       "NaN  one   0.092908  0.228913"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = df.set_index([\"key1\", \"key2\"])\n",
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 행 인덱스의 0-레벨 기준 그룹화"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      data1  data2\n",
       "key1              \n",
       "a         3      3\n",
       "b         2      2"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.groupby(level=0, axis=\"index\").count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 행 인덱스의 1-레벨 기준 그룹화"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key2</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      data1  data2\n",
       "key2              \n",
       "one       4      4\n",
       "two       2      2"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.groupby(level=1).count() # axis=\"index\" 가 기본값"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key2</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td>0.333636</td>\n",
       "      <td>0.115218</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td>-0.038393</td>\n",
       "      <td>0.888106</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         data1     data2\n",
       "key2                    \n",
       "one   0.333636  0.115218\n",
       "two  -0.038393  0.888106"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.groupby(level=1).mean() # axis=\"index\" 가 기본값"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**열 인덱스 레벨 활용**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">Foo</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Bar</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>1</th>\n",
       "      <th>3</th>\n",
       "      <th>5</th>\n",
       "      <th>1</th>\n",
       "      <th>3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.332883</td>\n",
       "      <td>-2.359419</td>\n",
       "      <td>-0.199543</td>\n",
       "      <td>-1.541996</td>\n",
       "      <td>-0.970736</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-1.307030</td>\n",
       "      <td>0.286350</td>\n",
       "      <td>0.377984</td>\n",
       "      <td>-0.753887</td>\n",
       "      <td>0.331286</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.349742</td>\n",
       "      <td>0.069877</td>\n",
       "      <td>0.246674</td>\n",
       "      <td>-0.011862</td>\n",
       "      <td>1.004812</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.327195</td>\n",
       "      <td>-0.919262</td>\n",
       "      <td>-1.549106</td>\n",
       "      <td>0.022185</td>\n",
       "      <td>0.758363</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Foo                           Bar          \n",
       "          1         3         5         1         3\n",
       "0  0.332883 -2.359419 -0.199543 -1.541996 -0.970736\n",
       "1 -1.307030  0.286350  0.377984 -0.753887  0.331286\n",
       "2  1.349742  0.069877  0.246674 -0.011862  1.004812\n",
       "3  1.327195 -0.919262 -1.549106  0.022185  0.758363"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "columns = pd.MultiIndex.from_arrays([[\"Foo\", \"Foo\", \"Foo\", \"Bar\", \"Bar\"],\n",
    "                                    [1, 3, 5, 1, 3]])\n",
    "\n",
    "hier_df = pd.DataFrame(np.random.standard_normal((4, 5)), columns=columns)\n",
    "hier_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 열 인덱스의 0-레벨 기준 그룹화"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Bar</th>\n",
       "      <th>Foo</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Bar  Foo\n",
       "0    2    3\n",
       "1    2    3\n",
       "2    2    3\n",
       "3    2    3"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hier_df.groupby(level=0, axis=\"columns\").count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 열 인덱스의 1-레벨 기준 그룹화"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>1</th>\n",
       "      <th>3</th>\n",
       "      <th>5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.604556</td>\n",
       "      <td>-1.665077</td>\n",
       "      <td>-0.199543</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-1.030458</td>\n",
       "      <td>0.308818</td>\n",
       "      <td>0.377984</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.668940</td>\n",
       "      <td>0.537344</td>\n",
       "      <td>0.246674</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.674690</td>\n",
       "      <td>-0.080449</td>\n",
       "      <td>-1.549106</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          1         3         5\n",
       "0 -0.604556 -1.665077 -0.199543\n",
       "1 -1.030458  0.308818  0.377984\n",
       "2  0.668940  0.537344  0.246674\n",
       "3  0.674690 -0.080449 -1.549106"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hier_df.groupby(level=1, axis=\"columns\").mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 데이터 집계"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 집계 함수"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**그룹화 집계 메서드**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "아래 함수가 `GroupBy` 객체의 집계 메서드로 최적화되어 있다."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| 집계 메서드 | 기능 |\n",
    "| :--- | :--- |\n",
    "| any, all | 최소 하나의 항목이 또는 모든 항목이 참인지 여부 확인 |\n",
    "| count | 그룹별 항목 수. NaN 제외. |\n",
    "| cummin, cummax | 누적 최소값 또는 최대값. NaN 제외 |\n",
    "| cumsum | 누적합. NaN 제외 |\n",
    "| cumprod | 누적곱, NaN 제외 |\n",
    "| first, last | 처음 또는 마지막 항목. NaN 제외 |\n",
    "| mean | 평균값. NaN 제외 |\n",
    "| median | 중앙값. NaN 제외 |\n",
    "| min, max | 최소값 또는 최대값. NaN 제외 |\n",
    "| nth | 정렬했을 때 n 번째 값 | \n",
    "| ohlc | 시계열 데이터의 \"open-high-low-close\" 값 네 개 계산 |\n",
    "| prod | 모든 항목의 곱. NaN 제외 |\n",
    "| quantile | 백분위수 계산 |\n",
    "| rank | 오름차순으로 정렬했을 때의 항목별 순서. NaN 제외 |\n",
    "| size | 그룹 크기 |\n",
    "| sum | 모든 항목의 합. NaN 제외 |\n",
    "| std, var | 표준편차 또는 분산 |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**기타 시리즈 집계 메서드** "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "언급된 집계 메서드 이외에 시리즈의 메서드를 모두 집계 함수로 사용할 수 있다.\n",
    "단, 속도가 좀 느릴 수 있다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>one</td>\n",
       "      <td>-0.204708</td>\n",
       "      <td>0.281746</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>a</td>\n",
       "      <td>two</td>\n",
       "      <td>0.478943</td>\n",
       "      <td>0.769023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>None</td>\n",
       "      <td>one</td>\n",
       "      <td>-0.519439</td>\n",
       "      <td>1.246435</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>b</td>\n",
       "      <td>two</td>\n",
       "      <td>-0.555730</td>\n",
       "      <td>1.007189</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>b</td>\n",
       "      <td>one</td>\n",
       "      <td>1.965781</td>\n",
       "      <td>-1.296221</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>a</td>\n",
       "      <td>None</td>\n",
       "      <td>1.393406</td>\n",
       "      <td>0.274992</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>None</td>\n",
       "      <td>one</td>\n",
       "      <td>0.092908</td>\n",
       "      <td>0.228913</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   key1  key2     data1     data2\n",
       "0     a   one -0.204708  0.281746\n",
       "1     a   two  0.478943  0.769023\n",
       "2  None   one -0.519439  1.246435\n",
       "3     b   two -0.555730  1.007189\n",
       "4     b   one  1.965781 -1.296221\n",
       "5     a  None  1.393406  0.274992\n",
       "6  None   one  0.092908  0.228913"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `Series.nsmallest(n=5)` 메서드: 가장 작은 n 개의 항목 반환. `n=5`가 기본값."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key1   \n",
       "a     0   -0.204708\n",
       "      1    0.478943\n",
       "b     3   -0.555730\n",
       "      4    1.965781\n",
       "Name: data1, dtype: float64"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped = df.groupby(\"key1\")\n",
    "grouped[\"data1\"].nsmallest(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**사용자 정의 집계 함수**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1차원 어레이를 집계하는 임의의 함수를 그룹화 집계 함수로 이용할 수 있다.\n",
    "예를 들어, 아래 함수는 어레이에 포함된 최대값과 최소값의 차이,\n",
    "즉, 값들의 범위를 계산한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [],
   "source": [
    "def peak_to_peak(arr):\n",
    "    return arr.max() - arr.min()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `agg()` 메서드: 사용자 정의 집계 함수를 그룹화 집계 함수로 사용하려면 `agg()` 메서드의 인자로 지정한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>1.598113</td>\n",
       "      <td>0.494031</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>2.521511</td>\n",
       "      <td>2.303410</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         data1     data2\n",
       "key1                    \n",
       "a     1.598113  0.494031\n",
       "b     2.521511  2.303410"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped[[\"data1\", \"data2\"]].agg(peak_to_peak)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**비집계 함수**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`DataFrame.describe()` 처럼 집계 함수가 아닌 경우도 작동하기도 한다. \n",
    "\n",
    "- `DataFrame.describe()` 메서드: 수치 데이터셋의 분포를 요약한다. 여기서는 그룹별로 작동한다.\n",
    "\n",
    "이와같은 비집계 함수가 작동하는 원리를 이해하려면 먼저 `apply()` 메서드를 이해해야 한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"8\" halign=\"left\">data1</th>\n",
       "      <th colspan=\"8\" halign=\"left\">data2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>3.0</td>\n",
       "      <td>0.555881</td>\n",
       "      <td>0.801830</td>\n",
       "      <td>-0.204708</td>\n",
       "      <td>0.137118</td>\n",
       "      <td>0.478943</td>\n",
       "      <td>0.936175</td>\n",
       "      <td>1.393406</td>\n",
       "      <td>3.0</td>\n",
       "      <td>0.441920</td>\n",
       "      <td>0.283299</td>\n",
       "      <td>0.274992</td>\n",
       "      <td>0.278369</td>\n",
       "      <td>0.281746</td>\n",
       "      <td>0.525384</td>\n",
       "      <td>0.769023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>2.0</td>\n",
       "      <td>0.705025</td>\n",
       "      <td>1.782977</td>\n",
       "      <td>-0.555730</td>\n",
       "      <td>0.074647</td>\n",
       "      <td>0.705025</td>\n",
       "      <td>1.335403</td>\n",
       "      <td>1.965781</td>\n",
       "      <td>2.0</td>\n",
       "      <td>-0.144516</td>\n",
       "      <td>1.628757</td>\n",
       "      <td>-1.296221</td>\n",
       "      <td>-0.720368</td>\n",
       "      <td>-0.144516</td>\n",
       "      <td>0.431337</td>\n",
       "      <td>1.007189</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     data1                                                              \\\n",
       "     count      mean       std       min       25%       50%       75%   \n",
       "key1                                                                     \n",
       "a      3.0  0.555881  0.801830 -0.204708  0.137118  0.478943  0.936175   \n",
       "b      2.0  0.705025  1.782977 -0.555730  0.074647  0.705025  1.335403   \n",
       "\n",
       "               data2                                                    \\\n",
       "           max count      mean       std       min       25%       50%   \n",
       "key1                                                                     \n",
       "a     1.393406   3.0  0.441920  0.283299  0.274992  0.278369  0.281746   \n",
       "b     1.965781   2.0 -0.144516  1.628757 -1.296221 -0.720368 -0.144516   \n",
       "\n",
       "                          \n",
       "           75%       max  \n",
       "key1                      \n",
       "a     0.525384  0.769023  \n",
       "b     0.431337  1.007189  "
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 열별로 여러 함수 적용하기"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "팁 데이터를 다시 이용한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [],
   "source": [
    "base_url = \"https://raw.githubusercontent.com/codingalzi/datapy/master/jupyter-book/examples/\"\n",
    "file = \"tips.csv\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>total_bill</th>\n",
       "      <th>tip</th>\n",
       "      <th>smoker</th>\n",
       "      <th>day</th>\n",
       "      <th>time</th>\n",
       "      <th>size</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>16.99</td>\n",
       "      <td>1.01</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10.34</td>\n",
       "      <td>1.66</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21.01</td>\n",
       "      <td>3.50</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>23.68</td>\n",
       "      <td>3.31</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>24.59</td>\n",
       "      <td>3.61</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   total_bill   tip smoker  day    time  size\n",
       "0       16.99  1.01     No  Sun  Dinner     2\n",
       "1       10.34  1.66     No  Sun  Dinner     3\n",
       "2       21.01  3.50     No  Sun  Dinner     3\n",
       "3       23.68  3.31     No  Sun  Dinner     2\n",
       "4       24.59  3.61     No  Sun  Dinner     4"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips = pd.read_csv(base_url + file)\n",
    "\n",
    "tips.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "전체 수입에 대한 팁이 비율을 새로운 열로 추가한다.\n",
    "열의 라벨은 `\"tip_pct\"`이다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>total_bill</th>\n",
       "      <th>tip</th>\n",
       "      <th>smoker</th>\n",
       "      <th>day</th>\n",
       "      <th>time</th>\n",
       "      <th>size</th>\n",
       "      <th>tip_pct</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>16.99</td>\n",
       "      <td>1.01</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.059447</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10.34</td>\n",
       "      <td>1.66</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>3</td>\n",
       "      <td>0.160542</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21.01</td>\n",
       "      <td>3.50</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>3</td>\n",
       "      <td>0.166587</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>23.68</td>\n",
       "      <td>3.31</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.139780</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>24.59</td>\n",
       "      <td>3.61</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>4</td>\n",
       "      <td>0.146808</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   total_bill   tip smoker  day    time  size   tip_pct\n",
       "0       16.99  1.01     No  Sun  Dinner     2  0.059447\n",
       "1       10.34  1.66     No  Sun  Dinner     3  0.160542\n",
       "2       21.01  3.50     No  Sun  Dinner     3  0.166587\n",
       "3       23.68  3.31     No  Sun  Dinner     2  0.139780\n",
       "4       24.59  3.61     No  Sun  Dinner     4  0.146808"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips[\"tip_pct\"] = tips[\"tip\"] / tips[\"total_bill\"]\n",
    "tips.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`\"day\"`와 `\"smoker\"` 기준으로 그룹을 짓는다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [],
   "source": [
    "grouped = tips.groupby([\"day\", \"smoker\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "그룹별 팁 비율의 평균값을 계산한다.\n",
    "`agg()` 메서드에 `GroupBy` 객체의 집계 메서드를 인자로 사용할 수 있으며\n",
    "이때 함수의 이름을 문자열로 지정한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "day   smoker\n",
       "Fri   No        0.151650\n",
       "      Yes       0.174783\n",
       "Sat   No        0.158048\n",
       "      Yes       0.147906\n",
       "Sun   No        0.160113\n",
       "      Yes       0.187250\n",
       "Thur  No        0.160298\n",
       "      Yes       0.163863\n",
       "Name: tip_pct, dtype: float64"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_pct = grouped[\"tip_pct\"]\n",
    "grouped_pct.agg(\"mean\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "여러 개의 집계 함수를 사용하면 사용된 집계 함수별로 열이 생성된다.\n",
    "아래 코드는 그룹별로 팁 비율의 평균값, 표준편차, 최대-최소 오차 값을 계산한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>peak_to_peak</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>day</th>\n",
       "      <th>smoker</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Fri</th>\n",
       "      <th>No</th>\n",
       "      <td>0.151650</td>\n",
       "      <td>0.028123</td>\n",
       "      <td>0.067349</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.174783</td>\n",
       "      <td>0.051293</td>\n",
       "      <td>0.159925</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sat</th>\n",
       "      <th>No</th>\n",
       "      <td>0.158048</td>\n",
       "      <td>0.039767</td>\n",
       "      <td>0.235193</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.147906</td>\n",
       "      <td>0.061375</td>\n",
       "      <td>0.290095</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sun</th>\n",
       "      <th>No</th>\n",
       "      <td>0.160113</td>\n",
       "      <td>0.042347</td>\n",
       "      <td>0.193226</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.187250</td>\n",
       "      <td>0.154134</td>\n",
       "      <td>0.644685</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Thur</th>\n",
       "      <th>No</th>\n",
       "      <td>0.160298</td>\n",
       "      <td>0.038774</td>\n",
       "      <td>0.193350</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.163863</td>\n",
       "      <td>0.039389</td>\n",
       "      <td>0.151240</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 mean       std  peak_to_peak\n",
       "day  smoker                                  \n",
       "Fri  No      0.151650  0.028123      0.067349\n",
       "     Yes     0.174783  0.051293      0.159925\n",
       "Sat  No      0.158048  0.039767      0.235193\n",
       "     Yes     0.147906  0.061375      0.290095\n",
       "Sun  No      0.160113  0.042347      0.193226\n",
       "     Yes     0.187250  0.154134      0.644685\n",
       "Thur No      0.160298  0.038774      0.193350\n",
       "     Yes     0.163863  0.039389      0.151240"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_pct.agg([\"mean\", \"std\", peak_to_peak])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "집계 함수와 생성되는 열의 라벨을 쌍으로 구성하면 지정된 라벨이 사용된다.\n",
    "\n",
    "- `np.std()` 함수: 집계 함수 `\"std\"` 대신 사용됨"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>average</th>\n",
       "      <th>stdev</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>day</th>\n",
       "      <th>smoker</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Fri</th>\n",
       "      <th>No</th>\n",
       "      <td>0.151650</td>\n",
       "      <td>0.028123</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.174783</td>\n",
       "      <td>0.051293</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sat</th>\n",
       "      <th>No</th>\n",
       "      <td>0.158048</td>\n",
       "      <td>0.039767</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.147906</td>\n",
       "      <td>0.061375</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sun</th>\n",
       "      <th>No</th>\n",
       "      <td>0.160113</td>\n",
       "      <td>0.042347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.187250</td>\n",
       "      <td>0.154134</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Thur</th>\n",
       "      <th>No</th>\n",
       "      <td>0.160298</td>\n",
       "      <td>0.038774</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.163863</td>\n",
       "      <td>0.039389</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              average     stdev\n",
       "day  smoker                    \n",
       "Fri  No      0.151650  0.028123\n",
       "     Yes     0.174783  0.051293\n",
       "Sat  No      0.158048  0.039767\n",
       "     Yes     0.147906  0.061375\n",
       "Sun  No      0.160113  0.042347\n",
       "     Yes     0.187250  0.154134\n",
       "Thur No      0.160298  0.038774\n",
       "     Yes     0.163863  0.039389"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_pct.agg([(\"average\", \"mean\"), (\"stdev\", np.std)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "여러 개의 함수를 여러 개의 열에 적용하면 멀티 인덱스가 열의 인덱스로 사용된다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">tip_pct</th>\n",
       "      <th colspan=\"3\" halign=\"left\">total_bill</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>day</th>\n",
       "      <th>smoker</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Fri</th>\n",
       "      <th>No</th>\n",
       "      <td>4</td>\n",
       "      <td>0.151650</td>\n",
       "      <td>0.187735</td>\n",
       "      <td>4</td>\n",
       "      <td>18.420000</td>\n",
       "      <td>22.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>15</td>\n",
       "      <td>0.174783</td>\n",
       "      <td>0.263480</td>\n",
       "      <td>15</td>\n",
       "      <td>16.813333</td>\n",
       "      <td>40.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sat</th>\n",
       "      <th>No</th>\n",
       "      <td>45</td>\n",
       "      <td>0.158048</td>\n",
       "      <td>0.291990</td>\n",
       "      <td>45</td>\n",
       "      <td>19.661778</td>\n",
       "      <td>48.33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>42</td>\n",
       "      <td>0.147906</td>\n",
       "      <td>0.325733</td>\n",
       "      <td>42</td>\n",
       "      <td>21.276667</td>\n",
       "      <td>50.81</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sun</th>\n",
       "      <th>No</th>\n",
       "      <td>57</td>\n",
       "      <td>0.160113</td>\n",
       "      <td>0.252672</td>\n",
       "      <td>57</td>\n",
       "      <td>20.506667</td>\n",
       "      <td>48.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>19</td>\n",
       "      <td>0.187250</td>\n",
       "      <td>0.710345</td>\n",
       "      <td>19</td>\n",
       "      <td>24.120000</td>\n",
       "      <td>45.35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Thur</th>\n",
       "      <th>No</th>\n",
       "      <td>45</td>\n",
       "      <td>0.160298</td>\n",
       "      <td>0.266312</td>\n",
       "      <td>45</td>\n",
       "      <td>17.113111</td>\n",
       "      <td>41.19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>17</td>\n",
       "      <td>0.163863</td>\n",
       "      <td>0.241255</td>\n",
       "      <td>17</td>\n",
       "      <td>19.190588</td>\n",
       "      <td>43.11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            tip_pct                     total_bill                  \n",
       "              count      mean       max      count       mean    max\n",
       "day  smoker                                                         \n",
       "Fri  No           4  0.151650  0.187735          4  18.420000  22.75\n",
       "     Yes         15  0.174783  0.263480         15  16.813333  40.17\n",
       "Sat  No          45  0.158048  0.291990         45  19.661778  48.33\n",
       "     Yes         42  0.147906  0.325733         42  21.276667  50.81\n",
       "Sun  No          57  0.160113  0.252672         57  20.506667  48.17\n",
       "     Yes         19  0.187250  0.710345         19  24.120000  45.35\n",
       "Thur No          45  0.160298  0.266312         45  17.113111  41.19\n",
       "     Yes         17  0.163863  0.241255         17  19.190588  43.11"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "functions = [\"count\", \"mean\", \"max\"]\n",
    "result = grouped[[\"tip_pct\", \"total_bill\"]].agg(functions)\n",
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "열의 라벨을 지정하는 방식도 동일하게 작동한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">tip_pct</th>\n",
       "      <th colspan=\"2\" halign=\"left\">total_bill</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Average</th>\n",
       "      <th>Variance</th>\n",
       "      <th>Average</th>\n",
       "      <th>Variance</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>day</th>\n",
       "      <th>smoker</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Fri</th>\n",
       "      <th>No</th>\n",
       "      <td>0.151650</td>\n",
       "      <td>0.000791</td>\n",
       "      <td>18.420000</td>\n",
       "      <td>25.596333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.174783</td>\n",
       "      <td>0.002631</td>\n",
       "      <td>16.813333</td>\n",
       "      <td>82.562438</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sat</th>\n",
       "      <th>No</th>\n",
       "      <td>0.158048</td>\n",
       "      <td>0.001581</td>\n",
       "      <td>19.661778</td>\n",
       "      <td>79.908965</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.147906</td>\n",
       "      <td>0.003767</td>\n",
       "      <td>21.276667</td>\n",
       "      <td>101.387535</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sun</th>\n",
       "      <th>No</th>\n",
       "      <td>0.160113</td>\n",
       "      <td>0.001793</td>\n",
       "      <td>20.506667</td>\n",
       "      <td>66.099980</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.187250</td>\n",
       "      <td>0.023757</td>\n",
       "      <td>24.120000</td>\n",
       "      <td>109.046044</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Thur</th>\n",
       "      <th>No</th>\n",
       "      <td>0.160298</td>\n",
       "      <td>0.001503</td>\n",
       "      <td>17.113111</td>\n",
       "      <td>59.625081</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.163863</td>\n",
       "      <td>0.001551</td>\n",
       "      <td>19.190588</td>\n",
       "      <td>69.808518</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              tip_pct           total_bill            \n",
       "              Average  Variance    Average    Variance\n",
       "day  smoker                                           \n",
       "Fri  No      0.151650  0.000791  18.420000   25.596333\n",
       "     Yes     0.174783  0.002631  16.813333   82.562438\n",
       "Sat  No      0.158048  0.001581  19.661778   79.908965\n",
       "     Yes     0.147906  0.003767  21.276667  101.387535\n",
       "Sun  No      0.160113  0.001793  20.506667   66.099980\n",
       "     Yes     0.187250  0.023757  24.120000  109.046044\n",
       "Thur No      0.160298  0.001503  17.113111   59.625081\n",
       "     Yes     0.163863  0.001551  19.190588   69.808518"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ftuples = [(\"Average\", \"mean\"), (\"Variance\", np.var)]\n",
    "grouped[[\"tip_pct\", \"total_bill\"]].agg(ftuples)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "열별로 다른 집계 함수를 적용하려면 사전을 이용한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>tip</th>\n",
       "      <th>size</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>day</th>\n",
       "      <th>smoker</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Fri</th>\n",
       "      <th>No</th>\n",
       "      <td>3.50</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>4.73</td>\n",
       "      <td>31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sat</th>\n",
       "      <th>No</th>\n",
       "      <td>9.00</td>\n",
       "      <td>115</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>10.00</td>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sun</th>\n",
       "      <th>No</th>\n",
       "      <td>6.00</td>\n",
       "      <td>167</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>6.50</td>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Thur</th>\n",
       "      <th>No</th>\n",
       "      <td>6.70</td>\n",
       "      <td>112</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>5.00</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               tip  size\n",
       "day  smoker             \n",
       "Fri  No       3.50     9\n",
       "     Yes      4.73    31\n",
       "Sat  No       9.00   115\n",
       "     Yes     10.00   104\n",
       "Sun  No       6.00   167\n",
       "     Yes      6.50    49\n",
       "Thur No       6.70   112\n",
       "     Yes      5.00    40"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped.agg({\"tip\" : np.max, \"size\" : \"sum\"})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "열별로 여러 함수를 적용할 수도 있다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th colspan=\"4\" halign=\"left\">tip_pct</th>\n",
       "      <th>size</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>sum</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>day</th>\n",
       "      <th>smoker</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Fri</th>\n",
       "      <th>No</th>\n",
       "      <td>0.120385</td>\n",
       "      <td>0.187735</td>\n",
       "      <td>0.151650</td>\n",
       "      <td>0.028123</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.103555</td>\n",
       "      <td>0.263480</td>\n",
       "      <td>0.174783</td>\n",
       "      <td>0.051293</td>\n",
       "      <td>31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sat</th>\n",
       "      <th>No</th>\n",
       "      <td>0.056797</td>\n",
       "      <td>0.291990</td>\n",
       "      <td>0.158048</td>\n",
       "      <td>0.039767</td>\n",
       "      <td>115</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.035638</td>\n",
       "      <td>0.325733</td>\n",
       "      <td>0.147906</td>\n",
       "      <td>0.061375</td>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sun</th>\n",
       "      <th>No</th>\n",
       "      <td>0.059447</td>\n",
       "      <td>0.252672</td>\n",
       "      <td>0.160113</td>\n",
       "      <td>0.042347</td>\n",
       "      <td>167</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.065660</td>\n",
       "      <td>0.710345</td>\n",
       "      <td>0.187250</td>\n",
       "      <td>0.154134</td>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Thur</th>\n",
       "      <th>No</th>\n",
       "      <td>0.072961</td>\n",
       "      <td>0.266312</td>\n",
       "      <td>0.160298</td>\n",
       "      <td>0.038774</td>\n",
       "      <td>112</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.090014</td>\n",
       "      <td>0.241255</td>\n",
       "      <td>0.163863</td>\n",
       "      <td>0.039389</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              tip_pct                               size\n",
       "                  min       max      mean       std  sum\n",
       "day  smoker                                             \n",
       "Fri  No      0.120385  0.187735  0.151650  0.028123    9\n",
       "     Yes     0.103555  0.263480  0.174783  0.051293   31\n",
       "Sat  No      0.056797  0.291990  0.158048  0.039767  115\n",
       "     Yes     0.035638  0.325733  0.147906  0.061375  104\n",
       "Sun  No      0.059447  0.252672  0.160113  0.042347  167\n",
       "     Yes     0.065660  0.710345  0.187250  0.154134   49\n",
       "Thur No      0.072961  0.266312  0.160298  0.038774  112\n",
       "     Yes     0.090014  0.241255  0.163863  0.039389   40"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped.agg({\"tip_pct\" : [\"min\", \"max\", \"mean\", \"std\"],\n",
    "             \"size\" : \"sum\"})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Apply: 다목적 데이터 집계"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "지금까지 `groupby`를 이용한 그룹화 이후에 사용한 집계 함수는 \n",
    "모두 그룹별로 하나의 값만 생성하였다.\n",
    "반면에 `apply()` 메서드를 이용하면 그런 제한 없이 임의의 함수를\n",
    "그룹별로 적용할 수 있다.\n",
    "실행 결과는 그룹별 결과를 합친 데이터프레임이다. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**서빙 팁 데이터 활용**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>total_bill</th>\n",
       "      <th>tip</th>\n",
       "      <th>smoker</th>\n",
       "      <th>day</th>\n",
       "      <th>time</th>\n",
       "      <th>size</th>\n",
       "      <th>tip_pct</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>16.99</td>\n",
       "      <td>1.01</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.059447</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10.34</td>\n",
       "      <td>1.66</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>3</td>\n",
       "      <td>0.160542</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21.01</td>\n",
       "      <td>3.50</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>3</td>\n",
       "      <td>0.166587</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>23.68</td>\n",
       "      <td>3.31</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.139780</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>24.59</td>\n",
       "      <td>3.61</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>4</td>\n",
       "      <td>0.146808</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>239</th>\n",
       "      <td>29.03</td>\n",
       "      <td>5.92</td>\n",
       "      <td>No</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>3</td>\n",
       "      <td>0.203927</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>240</th>\n",
       "      <td>27.18</td>\n",
       "      <td>2.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.073584</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>241</th>\n",
       "      <td>22.67</td>\n",
       "      <td>2.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.088222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>242</th>\n",
       "      <td>17.82</td>\n",
       "      <td>1.75</td>\n",
       "      <td>No</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.098204</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>243</th>\n",
       "      <td>18.78</td>\n",
       "      <td>3.00</td>\n",
       "      <td>No</td>\n",
       "      <td>Thur</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.159744</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>244 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     total_bill   tip smoker   day    time  size   tip_pct\n",
       "0         16.99  1.01     No   Sun  Dinner     2  0.059447\n",
       "1         10.34  1.66     No   Sun  Dinner     3  0.160542\n",
       "2         21.01  3.50     No   Sun  Dinner     3  0.166587\n",
       "3         23.68  3.31     No   Sun  Dinner     2  0.139780\n",
       "4         24.59  3.61     No   Sun  Dinner     4  0.146808\n",
       "..          ...   ...    ...   ...     ...   ...       ...\n",
       "239       29.03  5.92     No   Sat  Dinner     3  0.203927\n",
       "240       27.18  2.00    Yes   Sat  Dinner     2  0.073584\n",
       "241       22.67  2.00    Yes   Sat  Dinner     2  0.088222\n",
       "242       17.82  1.75     No   Sat  Dinner     2  0.098204\n",
       "243       18.78  3.00     No  Thur  Dinner     2  0.159744\n",
       "\n",
       "[244 rows x 7 columns]"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "아래 `top()` 함수는 지정된 데이터프레임을 특정 열을 기준으로 내림차순으로 정렬한 다음에\n",
    "처음 `n` 개의 행을 반환한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [],
   "source": [
    "def top(df, n=5, column=\"tip_pct\"):\n",
    "    return df.sort_values(column, ascending=False)[:n]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "서빙팁을 가장 많이 받은 6일에 대한 정보는 다음과 같다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>total_bill</th>\n",
       "      <th>tip</th>\n",
       "      <th>smoker</th>\n",
       "      <th>day</th>\n",
       "      <th>time</th>\n",
       "      <th>size</th>\n",
       "      <th>tip_pct</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>172</th>\n",
       "      <td>7.25</td>\n",
       "      <td>5.15</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.710345</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>178</th>\n",
       "      <td>9.60</td>\n",
       "      <td>4.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.416667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>67</th>\n",
       "      <td>3.07</td>\n",
       "      <td>1.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>1</td>\n",
       "      <td>0.325733</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>232</th>\n",
       "      <td>11.61</td>\n",
       "      <td>3.39</td>\n",
       "      <td>No</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.291990</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>183</th>\n",
       "      <td>23.17</td>\n",
       "      <td>6.50</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>4</td>\n",
       "      <td>0.280535</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>109</th>\n",
       "      <td>14.31</td>\n",
       "      <td>4.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.279525</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     total_bill   tip smoker  day    time  size   tip_pct\n",
       "172        7.25  5.15    Yes  Sun  Dinner     2  0.710345\n",
       "178        9.60  4.00    Yes  Sun  Dinner     2  0.416667\n",
       "67         3.07  1.00    Yes  Sat  Dinner     1  0.325733\n",
       "232       11.61  3.39     No  Sat  Dinner     2  0.291990\n",
       "183       23.17  6.50    Yes  Sun  Dinner     4  0.280535\n",
       "109       14.31  4.00    Yes  Sat  Dinner     2  0.279525"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "top(tips, n=6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "흡연 여부를 기준으로 그룹을 나눈 뒤 흡연 그룹과 비흡연 그룹에 대해\n",
    "서빙팁이 가장 많았던 5일에 대한 정보를\n",
    "`top()` 함수를 이용하여 구한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>total_bill</th>\n",
       "      <th>tip</th>\n",
       "      <th>smoker</th>\n",
       "      <th>day</th>\n",
       "      <th>time</th>\n",
       "      <th>size</th>\n",
       "      <th>tip_pct</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>smoker</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">No</th>\n",
       "      <th>232</th>\n",
       "      <td>11.61</td>\n",
       "      <td>3.39</td>\n",
       "      <td>No</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.291990</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149</th>\n",
       "      <td>7.51</td>\n",
       "      <td>2.00</td>\n",
       "      <td>No</td>\n",
       "      <td>Thur</td>\n",
       "      <td>Lunch</td>\n",
       "      <td>2</td>\n",
       "      <td>0.266312</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51</th>\n",
       "      <td>10.29</td>\n",
       "      <td>2.60</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.252672</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>185</th>\n",
       "      <td>20.69</td>\n",
       "      <td>5.00</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>5</td>\n",
       "      <td>0.241663</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>24.71</td>\n",
       "      <td>5.85</td>\n",
       "      <td>No</td>\n",
       "      <td>Thur</td>\n",
       "      <td>Lunch</td>\n",
       "      <td>2</td>\n",
       "      <td>0.236746</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">Yes</th>\n",
       "      <th>172</th>\n",
       "      <td>7.25</td>\n",
       "      <td>5.15</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.710345</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>178</th>\n",
       "      <td>9.60</td>\n",
       "      <td>4.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.416667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>67</th>\n",
       "      <td>3.07</td>\n",
       "      <td>1.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>1</td>\n",
       "      <td>0.325733</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>183</th>\n",
       "      <td>23.17</td>\n",
       "      <td>6.50</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>4</td>\n",
       "      <td>0.280535</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>109</th>\n",
       "      <td>14.31</td>\n",
       "      <td>4.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.279525</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            total_bill   tip smoker   day    time  size   tip_pct\n",
       "smoker                                                           \n",
       "No     232       11.61  3.39     No   Sat  Dinner     2  0.291990\n",
       "       149        7.51  2.00     No  Thur   Lunch     2  0.266312\n",
       "       51        10.29  2.60     No   Sun  Dinner     2  0.252672\n",
       "       185       20.69  5.00     No   Sun  Dinner     5  0.241663\n",
       "       88        24.71  5.85     No  Thur   Lunch     2  0.236746\n",
       "Yes    172        7.25  5.15    Yes   Sun  Dinner     2  0.710345\n",
       "       178        9.60  4.00    Yes   Sun  Dinner     2  0.416667\n",
       "       67         3.07  1.00    Yes   Sat  Dinner     1  0.325733\n",
       "       183       23.17  6.50    Yes   Sun  Dinner     4  0.280535\n",
       "       109       14.31  4.00    Yes   Sat  Dinner     2  0.279525"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.groupby(\"smoker\").apply(top)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`top()` 함수의 키워드 인자를 변경하려면 `apply()` 함수의 키워드 인자로 지정하면 된다.\n",
    "아래 코드는 흡연 여부와 요일 기준에 따른 그룹화 후에 그룹별로 가장 많은 수입을 올린 날에 대한 정보를 보여준다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>total_bill</th>\n",
       "      <th>tip</th>\n",
       "      <th>smoker</th>\n",
       "      <th>day</th>\n",
       "      <th>time</th>\n",
       "      <th>size</th>\n",
       "      <th>tip_pct</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>smoker</th>\n",
       "      <th>day</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">No</th>\n",
       "      <th>Fri</th>\n",
       "      <th>94</th>\n",
       "      <td>22.75</td>\n",
       "      <td>3.25</td>\n",
       "      <td>No</td>\n",
       "      <td>Fri</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.142857</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sat</th>\n",
       "      <th>212</th>\n",
       "      <td>48.33</td>\n",
       "      <td>9.00</td>\n",
       "      <td>No</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>4</td>\n",
       "      <td>0.186220</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sun</th>\n",
       "      <th>156</th>\n",
       "      <td>48.17</td>\n",
       "      <td>5.00</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>6</td>\n",
       "      <td>0.103799</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thur</th>\n",
       "      <th>142</th>\n",
       "      <td>41.19</td>\n",
       "      <td>5.00</td>\n",
       "      <td>No</td>\n",
       "      <td>Thur</td>\n",
       "      <td>Lunch</td>\n",
       "      <td>5</td>\n",
       "      <td>0.121389</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">Yes</th>\n",
       "      <th>Fri</th>\n",
       "      <th>95</th>\n",
       "      <td>40.17</td>\n",
       "      <td>4.73</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Fri</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>4</td>\n",
       "      <td>0.117750</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sat</th>\n",
       "      <th>170</th>\n",
       "      <td>50.81</td>\n",
       "      <td>10.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>3</td>\n",
       "      <td>0.196812</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sun</th>\n",
       "      <th>182</th>\n",
       "      <td>45.35</td>\n",
       "      <td>3.50</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>3</td>\n",
       "      <td>0.077178</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thur</th>\n",
       "      <th>197</th>\n",
       "      <td>43.11</td>\n",
       "      <td>5.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Thur</td>\n",
       "      <td>Lunch</td>\n",
       "      <td>4</td>\n",
       "      <td>0.115982</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 total_bill    tip smoker   day    time  size   tip_pct\n",
       "smoker day                                                             \n",
       "No     Fri  94        22.75   3.25     No   Fri  Dinner     2  0.142857\n",
       "       Sat  212       48.33   9.00     No   Sat  Dinner     4  0.186220\n",
       "       Sun  156       48.17   5.00     No   Sun  Dinner     6  0.103799\n",
       "       Thur 142       41.19   5.00     No  Thur   Lunch     5  0.121389\n",
       "Yes    Fri  95        40.17   4.73    Yes   Fri  Dinner     4  0.117750\n",
       "       Sat  170       50.81  10.00    Yes   Sat  Dinner     3  0.196812\n",
       "       Sun  182       45.35   3.50    Yes   Sun  Dinner     3  0.077178\n",
       "       Thur 197       43.11   5.00    Yes  Thur   Lunch     4  0.115982"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.groupby([\"smoker\", \"day\"]).apply(top, n=1, column=\"total_bill\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>smoker</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>No</th>\n",
       "      <td>151.0</td>\n",
       "      <td>0.159328</td>\n",
       "      <td>0.039910</td>\n",
       "      <td>0.056797</td>\n",
       "      <td>0.136906</td>\n",
       "      <td>0.155625</td>\n",
       "      <td>0.185014</td>\n",
       "      <td>0.291990</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>93.0</td>\n",
       "      <td>0.163196</td>\n",
       "      <td>0.085119</td>\n",
       "      <td>0.035638</td>\n",
       "      <td>0.106771</td>\n",
       "      <td>0.153846</td>\n",
       "      <td>0.195059</td>\n",
       "      <td>0.710345</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        count      mean       std       min       25%       50%       75%  \\\n",
       "smoker                                                                      \n",
       "No      151.0  0.159328  0.039910  0.056797  0.136906  0.155625  0.185014   \n",
       "Yes      93.0  0.163196  0.085119  0.035638  0.106771  0.153846  0.195059   \n",
       "\n",
       "             max  \n",
       "smoker            \n",
       "No      0.291990  \n",
       "Yes     0.710345  "
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = tips.groupby(\"smoker\")[\"tip_pct\"].describe()\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "       smoker\n",
       "count  No        151.000000\n",
       "       Yes        93.000000\n",
       "mean   No          0.159328\n",
       "       Yes         0.163196\n",
       "std    No          0.039910\n",
       "       Yes         0.085119\n",
       "min    No          0.056797\n",
       "       Yes         0.035638\n",
       "25%    No          0.136906\n",
       "       Yes         0.106771\n",
       "50%    No          0.155625\n",
       "       Yes         0.153846\n",
       "75%    No          0.185014\n",
       "       Yes         0.195059\n",
       "max    No          0.291990\n",
       "       Yes         0.710345\n",
       "dtype: float64"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result.unstack(\"smoker\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 그룹 키 제거"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>total_bill</th>\n",
       "      <th>tip</th>\n",
       "      <th>smoker</th>\n",
       "      <th>day</th>\n",
       "      <th>time</th>\n",
       "      <th>size</th>\n",
       "      <th>tip_pct</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>232</th>\n",
       "      <td>11.61</td>\n",
       "      <td>3.39</td>\n",
       "      <td>No</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.291990</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149</th>\n",
       "      <td>7.51</td>\n",
       "      <td>2.00</td>\n",
       "      <td>No</td>\n",
       "      <td>Thur</td>\n",
       "      <td>Lunch</td>\n",
       "      <td>2</td>\n",
       "      <td>0.266312</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51</th>\n",
       "      <td>10.29</td>\n",
       "      <td>2.60</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.252672</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>185</th>\n",
       "      <td>20.69</td>\n",
       "      <td>5.00</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>5</td>\n",
       "      <td>0.241663</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>24.71</td>\n",
       "      <td>5.85</td>\n",
       "      <td>No</td>\n",
       "      <td>Thur</td>\n",
       "      <td>Lunch</td>\n",
       "      <td>2</td>\n",
       "      <td>0.236746</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>172</th>\n",
       "      <td>7.25</td>\n",
       "      <td>5.15</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.710345</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>178</th>\n",
       "      <td>9.60</td>\n",
       "      <td>4.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.416667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>67</th>\n",
       "      <td>3.07</td>\n",
       "      <td>1.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>1</td>\n",
       "      <td>0.325733</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>183</th>\n",
       "      <td>23.17</td>\n",
       "      <td>6.50</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>4</td>\n",
       "      <td>0.280535</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>109</th>\n",
       "      <td>14.31</td>\n",
       "      <td>4.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.279525</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     total_bill   tip smoker   day    time  size   tip_pct\n",
       "232       11.61  3.39     No   Sat  Dinner     2  0.291990\n",
       "149        7.51  2.00     No  Thur   Lunch     2  0.266312\n",
       "51        10.29  2.60     No   Sun  Dinner     2  0.252672\n",
       "185       20.69  5.00     No   Sun  Dinner     5  0.241663\n",
       "88        24.71  5.85     No  Thur   Lunch     2  0.236746\n",
       "172        7.25  5.15    Yes   Sun  Dinner     2  0.710345\n",
       "178        9.60  4.00    Yes   Sun  Dinner     2  0.416667\n",
       "67         3.07  1.00    Yes   Sat  Dinner     1  0.325733\n",
       "183       23.17  6.50    Yes   Sun  Dinner     4  0.280535\n",
       "109       14.31  4.00    Yes   Sat  Dinner     2  0.279525"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.groupby(\"smoker\", group_keys=False).apply(top)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>total_bill</th>\n",
       "      <th>tip</th>\n",
       "      <th>smoker</th>\n",
       "      <th>day</th>\n",
       "      <th>time</th>\n",
       "      <th>size</th>\n",
       "      <th>tip_pct</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>smoker</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">No</th>\n",
       "      <th>232</th>\n",
       "      <td>11.61</td>\n",
       "      <td>3.39</td>\n",
       "      <td>No</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.291990</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149</th>\n",
       "      <td>7.51</td>\n",
       "      <td>2.00</td>\n",
       "      <td>No</td>\n",
       "      <td>Thur</td>\n",
       "      <td>Lunch</td>\n",
       "      <td>2</td>\n",
       "      <td>0.266312</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51</th>\n",
       "      <td>10.29</td>\n",
       "      <td>2.60</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.252672</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>185</th>\n",
       "      <td>20.69</td>\n",
       "      <td>5.00</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>5</td>\n",
       "      <td>0.241663</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>24.71</td>\n",
       "      <td>5.85</td>\n",
       "      <td>No</td>\n",
       "      <td>Thur</td>\n",
       "      <td>Lunch</td>\n",
       "      <td>2</td>\n",
       "      <td>0.236746</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">Yes</th>\n",
       "      <th>172</th>\n",
       "      <td>7.25</td>\n",
       "      <td>5.15</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.710345</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>178</th>\n",
       "      <td>9.60</td>\n",
       "      <td>4.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.416667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>67</th>\n",
       "      <td>3.07</td>\n",
       "      <td>1.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>1</td>\n",
       "      <td>0.325733</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>183</th>\n",
       "      <td>23.17</td>\n",
       "      <td>6.50</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>4</td>\n",
       "      <td>0.280535</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>109</th>\n",
       "      <td>14.31</td>\n",
       "      <td>4.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.279525</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            total_bill   tip smoker   day    time  size   tip_pct\n",
       "smoker                                                           \n",
       "No     232       11.61  3.39     No   Sat  Dinner     2  0.291990\n",
       "       149        7.51  2.00     No  Thur   Lunch     2  0.266312\n",
       "       51        10.29  2.60     No   Sun  Dinner     2  0.252672\n",
       "       185       20.69  5.00     No   Sun  Dinner     5  0.241663\n",
       "       88        24.71  5.85     No  Thur   Lunch     2  0.236746\n",
       "Yes    172        7.25  5.15    Yes   Sun  Dinner     2  0.710345\n",
       "       178        9.60  4.00    Yes   Sun  Dinner     2  0.416667\n",
       "       67         3.07  1.00    Yes   Sat  Dinner     1  0.325733\n",
       "       183       23.17  6.50    Yes   Sun  Dinner     4  0.280535\n",
       "       109       14.31  4.00    Yes   Sat  Dinner     2  0.279525"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.groupby(\"smoker\").apply(top)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>total_bill</th>\n",
       "      <th>tip</th>\n",
       "      <th>smoker</th>\n",
       "      <th>day</th>\n",
       "      <th>time</th>\n",
       "      <th>size</th>\n",
       "      <th>tip_pct</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">0</th>\n",
       "      <th>232</th>\n",
       "      <td>11.61</td>\n",
       "      <td>3.39</td>\n",
       "      <td>No</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.291990</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149</th>\n",
       "      <td>7.51</td>\n",
       "      <td>2.00</td>\n",
       "      <td>No</td>\n",
       "      <td>Thur</td>\n",
       "      <td>Lunch</td>\n",
       "      <td>2</td>\n",
       "      <td>0.266312</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51</th>\n",
       "      <td>10.29</td>\n",
       "      <td>2.60</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.252672</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>185</th>\n",
       "      <td>20.69</td>\n",
       "      <td>5.00</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>5</td>\n",
       "      <td>0.241663</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>24.71</td>\n",
       "      <td>5.85</td>\n",
       "      <td>No</td>\n",
       "      <td>Thur</td>\n",
       "      <td>Lunch</td>\n",
       "      <td>2</td>\n",
       "      <td>0.236746</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">1</th>\n",
       "      <th>172</th>\n",
       "      <td>7.25</td>\n",
       "      <td>5.15</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.710345</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>178</th>\n",
       "      <td>9.60</td>\n",
       "      <td>4.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.416667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>67</th>\n",
       "      <td>3.07</td>\n",
       "      <td>1.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>1</td>\n",
       "      <td>0.325733</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>183</th>\n",
       "      <td>23.17</td>\n",
       "      <td>6.50</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>4</td>\n",
       "      <td>0.280535</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>109</th>\n",
       "      <td>14.31</td>\n",
       "      <td>4.00</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Sat</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.279525</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       total_bill   tip smoker   day    time  size   tip_pct\n",
       "0 232       11.61  3.39     No   Sat  Dinner     2  0.291990\n",
       "  149        7.51  2.00     No  Thur   Lunch     2  0.266312\n",
       "  51        10.29  2.60     No   Sun  Dinner     2  0.252672\n",
       "  185       20.69  5.00     No   Sun  Dinner     5  0.241663\n",
       "  88        24.71  5.85     No  Thur   Lunch     2  0.236746\n",
       "1 172        7.25  5.15    Yes   Sun  Dinner     2  0.710345\n",
       "  178        9.60  4.00    Yes   Sun  Dinner     2  0.416667\n",
       "  67         3.07  1.00    Yes   Sat  Dinner     1  0.325733\n",
       "  183       23.17  6.50    Yes   Sun  Dinner     4  0.280535\n",
       "  109       14.31  4.00    Yes   Sat  Dinner     2  0.279525"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.groupby(\"smoker\", as_index=False).apply(top)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 분위/구간 분석"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.660524</td>\n",
       "      <td>-0.612905</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.862580</td>\n",
       "      <td>0.316447</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.010032</td>\n",
       "      <td>0.838295</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.050009</td>\n",
       "      <td>-1.034423</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.670216</td>\n",
       "      <td>0.434304</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      data1     data2\n",
       "0 -0.660524 -0.612905\n",
       "1  0.862580  0.316447\n",
       "2 -0.010032  0.838295\n",
       "3  0.050009 -1.034423\n",
       "4  0.670216  0.434304"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame = pd.DataFrame({\"data1\": np.random.standard_normal(1000),\n",
    "                      \"data2\": np.random.standard_normal(1000)})\n",
    "frame.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     (-1.23, 0.489]\n",
       "1     (0.489, 2.208]\n",
       "2     (-1.23, 0.489]\n",
       "3     (-1.23, 0.489]\n",
       "4     (0.489, 2.208]\n",
       "5     (0.489, 2.208]\n",
       "6     (-1.23, 0.489]\n",
       "7     (-1.23, 0.489]\n",
       "8    (-2.956, -1.23]\n",
       "9     (-1.23, 0.489]\n",
       "Name: data1, dtype: category\n",
       "Categories (4, interval[float64, right]): [(-2.956, -1.23] < (-1.23, 0.489] < (0.489, 2.208] < (2.208, 3.928]]"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quartiles = pd.cut(frame[\"data1\"], 4)\n",
    "quartiles.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_stats(group):\n",
    "    return pd.DataFrame(\n",
    "        {\"min\": group.min(), \"max\": group.max(),\n",
    "        \"count\": group.count(), \"mean\": group.mean()}\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [],
   "source": [
    "grouped = frame.groupby(quartiles)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">(-2.956, -1.23]</th>\n",
       "      <th>data1</th>\n",
       "      <td>-2.949343</td>\n",
       "      <td>-1.230179</td>\n",
       "      <td>94</td>\n",
       "      <td>-1.658818</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-3.399312</td>\n",
       "      <td>1.670835</td>\n",
       "      <td>94</td>\n",
       "      <td>-0.033333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">(-1.23, 0.489]</th>\n",
       "      <th>data1</th>\n",
       "      <td>-1.228918</td>\n",
       "      <td>0.488675</td>\n",
       "      <td>598</td>\n",
       "      <td>-0.329524</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-2.989741</td>\n",
       "      <td>3.260383</td>\n",
       "      <td>598</td>\n",
       "      <td>-0.002622</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">(0.489, 2.208]</th>\n",
       "      <th>data1</th>\n",
       "      <td>0.489965</td>\n",
       "      <td>2.200997</td>\n",
       "      <td>298</td>\n",
       "      <td>1.065727</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-3.745356</td>\n",
       "      <td>2.954439</td>\n",
       "      <td>298</td>\n",
       "      <td>0.078249</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">(2.208, 3.928]</th>\n",
       "      <th>data1</th>\n",
       "      <td>2.212303</td>\n",
       "      <td>3.927528</td>\n",
       "      <td>10</td>\n",
       "      <td>2.644253</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-1.929776</td>\n",
       "      <td>1.765640</td>\n",
       "      <td>10</td>\n",
       "      <td>0.024750</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                            min       max  count      mean\n",
       "data1                                                     \n",
       "(-2.956, -1.23] data1 -2.949343 -1.230179     94 -1.658818\n",
       "                data2 -3.399312  1.670835     94 -0.033333\n",
       "(-1.23, 0.489]  data1 -1.228918  0.488675    598 -0.329524\n",
       "                data2 -2.989741  3.260383    598 -0.002622\n",
       "(0.489, 2.208]  data1  0.489965  2.200997    298  1.065727\n",
       "                data2 -3.745356  2.954439    298  0.078249\n",
       "(2.208, 3.928]  data1  2.212303  3.927528     10  2.644253\n",
       "                data2 -1.929776  1.765640     10  0.024750"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped.apply(get_stats)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"4\" halign=\"left\">data1</th>\n",
       "      <th colspan=\"4\" halign=\"left\">data2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>(-2.956, -1.23]</th>\n",
       "      <td>-2.949343</td>\n",
       "      <td>-1.230179</td>\n",
       "      <td>94</td>\n",
       "      <td>-1.658818</td>\n",
       "      <td>-3.399312</td>\n",
       "      <td>1.670835</td>\n",
       "      <td>94</td>\n",
       "      <td>-0.033333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(-1.23, 0.489]</th>\n",
       "      <td>-1.228918</td>\n",
       "      <td>0.488675</td>\n",
       "      <td>598</td>\n",
       "      <td>-0.329524</td>\n",
       "      <td>-2.989741</td>\n",
       "      <td>3.260383</td>\n",
       "      <td>598</td>\n",
       "      <td>-0.002622</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(0.489, 2.208]</th>\n",
       "      <td>0.489965</td>\n",
       "      <td>2.200997</td>\n",
       "      <td>298</td>\n",
       "      <td>1.065727</td>\n",
       "      <td>-3.745356</td>\n",
       "      <td>2.954439</td>\n",
       "      <td>298</td>\n",
       "      <td>0.078249</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(2.208, 3.928]</th>\n",
       "      <td>2.212303</td>\n",
       "      <td>3.927528</td>\n",
       "      <td>10</td>\n",
       "      <td>2.644253</td>\n",
       "      <td>-1.929776</td>\n",
       "      <td>1.765640</td>\n",
       "      <td>10</td>\n",
       "      <td>0.024750</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    data1                               data2                  \\\n",
       "                      min       max count      mean       min       max count   \n",
       "data1                                                                           \n",
       "(-2.956, -1.23] -2.949343 -1.230179    94 -1.658818 -3.399312  1.670835    94   \n",
       "(-1.23, 0.489]  -1.228918  0.488675   598 -0.329524 -2.989741  3.260383   598   \n",
       "(0.489, 2.208]   0.489965  2.200997   298  1.065727 -3.745356  2.954439   298   \n",
       "(2.208, 3.928]   2.212303  3.927528    10  2.644253 -1.929776  1.765640    10   \n",
       "\n",
       "                           \n",
       "                     mean  \n",
       "data1                      \n",
       "(-2.956, -1.23] -0.033333  \n",
       "(-1.23, 0.489]  -0.002622  \n",
       "(0.489, 2.208]   0.078249  \n",
       "(2.208, 3.928]   0.024750  "
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped.agg([\"min\", \"max\", \"count\", \"mean\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1\n",
       "1    3\n",
       "2    2\n",
       "3    2\n",
       "4    3\n",
       "Name: data1, dtype: int64"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quartiles_samp = pd.qcut(frame[\"data1\"], 4, labels=False)\n",
    "quartiles_samp.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [],
   "source": [
    "grouped = frame.groupby(quartiles_samp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">0</th>\n",
       "      <th>data1</th>\n",
       "      <td>-2.949343</td>\n",
       "      <td>-0.685484</td>\n",
       "      <td>250</td>\n",
       "      <td>-1.212173</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-3.399312</td>\n",
       "      <td>2.628441</td>\n",
       "      <td>250</td>\n",
       "      <td>-0.027045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1</th>\n",
       "      <th>data1</th>\n",
       "      <td>-0.683066</td>\n",
       "      <td>-0.030280</td>\n",
       "      <td>250</td>\n",
       "      <td>-0.368334</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-2.630247</td>\n",
       "      <td>3.260383</td>\n",
       "      <td>250</td>\n",
       "      <td>-0.027845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2</th>\n",
       "      <th>data1</th>\n",
       "      <td>-0.027734</td>\n",
       "      <td>0.618965</td>\n",
       "      <td>250</td>\n",
       "      <td>0.295812</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-3.056990</td>\n",
       "      <td>2.458842</td>\n",
       "      <td>250</td>\n",
       "      <td>0.014450</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">3</th>\n",
       "      <th>data1</th>\n",
       "      <td>0.623587</td>\n",
       "      <td>3.927528</td>\n",
       "      <td>250</td>\n",
       "      <td>1.248875</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>data2</th>\n",
       "      <td>-3.745356</td>\n",
       "      <td>2.954439</td>\n",
       "      <td>250</td>\n",
       "      <td>0.115899</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  min       max  count      mean\n",
       "data1                                           \n",
       "0     data1 -2.949343 -0.685484    250 -1.212173\n",
       "      data2 -3.399312  2.628441    250 -0.027045\n",
       "1     data1 -0.683066 -0.030280    250 -0.368334\n",
       "      data2 -2.630247  3.260383    250 -0.027845\n",
       "2     data1 -0.027734  0.618965    250  0.295812\n",
       "      data2 -3.056990  2.458842    250  0.014450\n",
       "3     data1  0.623587  3.927528    250  1.248875\n",
       "      data2 -3.745356  2.954439    250  0.115899"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped.apply(get_stats)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 예제: 그룹별 결측치 채우기"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [],
   "source": [
    "s = pd.Series(np.random.standard_normal(6))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0         NaN\n",
       "1    0.227290\n",
       "2         NaN\n",
       "3   -2.153545\n",
       "4         NaN\n",
       "5   -0.375842\n",
       "dtype: float64"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[::2] = np.nan\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   -0.767366\n",
       "1    0.227290\n",
       "2   -0.767366\n",
       "3   -2.153545\n",
       "4   -0.767366\n",
       "5   -0.375842\n",
       "dtype: float64"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.fillna(s.mean())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [],
   "source": [
    "states = [\"Ohio\", \"New York\", \"Vermont\", \"Florida\",\n",
    "          \"Oregon\", \"Nevada\", \"California\", \"Idaho\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [],
   "source": [
    "group_key = [\"East\", \"East\", \"East\", \"East\",\n",
    "             \"West\", \"West\", \"West\", \"West\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ohio          0.329939\n",
       "New York      0.981994\n",
       "Vermont       1.105913\n",
       "Florida      -1.613716\n",
       "Oregon        1.561587\n",
       "Nevada        0.406510\n",
       "California    0.359244\n",
       "Idaho        -0.614436\n",
       "dtype: float64"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.Series(np.random.standard_normal(8), index=states)\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ohio          0.329939\n",
       "New York      0.981994\n",
       "Vermont            NaN\n",
       "Florida      -1.613716\n",
       "Oregon        1.561587\n",
       "Nevada             NaN\n",
       "California    0.359244\n",
       "Idaho              NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[[\"Vermont\", \"Nevada\", \"Idaho\"]] = np.nan\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "East    4\n",
       "West    4\n",
       "dtype: int64"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.groupby(group_key).size()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "East    3\n",
       "West    2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.groupby(group_key).count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "East   -0.100594\n",
       "West    0.960416\n",
       "dtype: float64"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.groupby(group_key).mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [],
   "source": [
    "def fill_mean(group):\n",
    "    return group.fillna(group.mean())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ohio          0.329939\n",
       "New York      0.981994\n",
       "Vermont      -0.100594\n",
       "Florida      -1.613716\n",
       "Oregon        1.561587\n",
       "Nevada        0.960416\n",
       "California    0.359244\n",
       "Idaho         0.960416\n",
       "dtype: float64"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.groupby(group_key, group_keys=False).apply(fill_mean)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "East  Ohio          0.329939\n",
       "      New York      0.981994\n",
       "      Vermont      -0.100594\n",
       "      Florida      -1.613716\n",
       "West  Oregon        1.561587\n",
       "      Nevada        0.960416\n",
       "      California    0.359244\n",
       "      Idaho         0.960416\n",
       "dtype: float64"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.groupby(group_key, group_keys=True).apply(fill_mean)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [],
   "source": [
    "fill_values = {\"East\": 0.5, \"West\": -1}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [],
   "source": [
    "def fill_func(group):\n",
    "    return group.fillna(fill_values[group.name])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ohio          0.329939\n",
       "New York      0.981994\n",
       "Vermont       0.500000\n",
       "Florida      -1.613716\n",
       "Oregon        1.561587\n",
       "Nevada       -1.000000\n",
       "California    0.359244\n",
       "Idaho        -1.000000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.groupby(group_key, group_keys=False).apply(fill_func)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 예제: 무작위 샘플링"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [],
   "source": [
    "suits = [\"H\", \"S\", \"C\", \"D\"]  # Hearts, Spades, Clubs, Diamonds\n",
    "card_val = (list(range(1, 11)) + [10] * 3) * 4\n",
    "base_names = [\"A\"] + list(range(2, 11)) + [\"J\", \"K\", \"Q\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [],
   "source": [
    "cards = []\n",
    "for suit in suits:\n",
    "    cards.extend(str(num) + suit for num in base_names)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AH      1\n",
       "2H      2\n",
       "3H      3\n",
       "4H      4\n",
       "5H      5\n",
       "6H      6\n",
       "7H      7\n",
       "8H      8\n",
       "9H      9\n",
       "10H    10\n",
       "JH     10\n",
       "KH     10\n",
       "QH     10\n",
       "dtype: int64"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "deck = pd.Series(card_val, index=cards)\n",
    "\n",
    "deck.head(13)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [],
   "source": [
    "def draw(deck, n=5):\n",
    "    return deck.sample(n)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4D     4\n",
       "QH    10\n",
       "8S     8\n",
       "7D     7\n",
       "9C     9\n",
       "dtype: int64"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "draw(deck)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_suit(card):\n",
    "    # last letter is suit\n",
    "    return card[-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "C  6C     6\n",
       "   KC    10\n",
       "D  7D     7\n",
       "   3D     3\n",
       "H  7H     7\n",
       "   9H     9\n",
       "S  2S     2\n",
       "   QS    10\n",
       "dtype: int64"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "deck.groupby(get_suit).apply(draw, n=2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AC      1\n",
       "3C      3\n",
       "5D      5\n",
       "4D      4\n",
       "10H    10\n",
       "7H      7\n",
       "QS     10\n",
       "7S      7\n",
       "dtype: int64"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "deck.groupby(get_suit, group_keys=False).apply(draw, n=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 그룹별 가중치 합"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>category</th>\n",
       "      <th>data</th>\n",
       "      <th>weights</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>-1.691656</td>\n",
       "      <td>0.955905</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>a</td>\n",
       "      <td>0.511622</td>\n",
       "      <td>0.012745</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>a</td>\n",
       "      <td>-0.401675</td>\n",
       "      <td>0.137009</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>a</td>\n",
       "      <td>0.968578</td>\n",
       "      <td>0.763037</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>b</td>\n",
       "      <td>-1.818215</td>\n",
       "      <td>0.492472</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>b</td>\n",
       "      <td>0.279963</td>\n",
       "      <td>0.832908</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>b</td>\n",
       "      <td>-0.200819</td>\n",
       "      <td>0.658331</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>b</td>\n",
       "      <td>-0.217221</td>\n",
       "      <td>0.612009</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  category      data   weights\n",
       "0        a -1.691656  0.955905\n",
       "1        a  0.511622  0.012745\n",
       "2        a -0.401675  0.137009\n",
       "3        a  0.968578  0.763037\n",
       "4        b -1.818215  0.492472\n",
       "5        b  0.279963  0.832908\n",
       "6        b -0.200819  0.658331\n",
       "7        b -0.217221  0.612009"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame({\"category\": [\"a\", \"a\", \"a\", \"a\",\n",
    "                                \"b\", \"b\", \"b\", \"b\"],\n",
    "                   \"data\": np.random.standard_normal(8),\n",
    "                   \"weights\": np.random.uniform(size=8)})\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [],
   "source": [
    "grouped = df.groupby(\"category\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_wavg(group):\n",
    "    return np.average(group[\"data\"], weights=group[\"weights\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "category\n",
       "a   -0.495807\n",
       "b   -0.357273\n",
       "dtype: float64"
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped.apply(get_wavg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [],
   "source": [
    "base_url = \"https://raw.githubusercontent.com/codingalzi/datapy/master/jupyter-book/examples/\"\n",
    "file = \"stock_px.csv\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [],
   "source": [
    "close_px = pd.read_csv(base_url+file, parse_dates=True,\n",
    "                       index_col=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "DatetimeIndex: 5472 entries, 1990-02-01 to 2011-10-14\n",
      "Data columns (total 9 columns):\n",
      " #   Column  Non-Null Count  Dtype  \n",
      "---  ------  --------------  -----  \n",
      " 0   AA      5472 non-null   float64\n",
      " 1   AAPL    5472 non-null   float64\n",
      " 2   GE      5472 non-null   float64\n",
      " 3   IBM     5472 non-null   float64\n",
      " 4   JNJ     5472 non-null   float64\n",
      " 5   MSFT    5472 non-null   float64\n",
      " 6   PEP     5471 non-null   float64\n",
      " 7   SPX     5472 non-null   float64\n",
      " 8   XOM     5472 non-null   float64\n",
      "dtypes: float64(9)\n",
      "memory usage: 427.5 KB\n"
     ]
    }
   ],
   "source": [
    "close_px.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>AA</th>\n",
       "      <th>AAPL</th>\n",
       "      <th>GE</th>\n",
       "      <th>IBM</th>\n",
       "      <th>JNJ</th>\n",
       "      <th>MSFT</th>\n",
       "      <th>PEP</th>\n",
       "      <th>SPX</th>\n",
       "      <th>XOM</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2011-10-11</th>\n",
       "      <td>10.30</td>\n",
       "      <td>400.29</td>\n",
       "      <td>16.14</td>\n",
       "      <td>185.00</td>\n",
       "      <td>63.96</td>\n",
       "      <td>27.00</td>\n",
       "      <td>60.95</td>\n",
       "      <td>1195.54</td>\n",
       "      <td>76.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2011-10-12</th>\n",
       "      <td>10.05</td>\n",
       "      <td>402.19</td>\n",
       "      <td>16.40</td>\n",
       "      <td>186.12</td>\n",
       "      <td>64.33</td>\n",
       "      <td>26.96</td>\n",
       "      <td>62.70</td>\n",
       "      <td>1207.25</td>\n",
       "      <td>77.16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2011-10-13</th>\n",
       "      <td>10.10</td>\n",
       "      <td>408.43</td>\n",
       "      <td>16.22</td>\n",
       "      <td>186.82</td>\n",
       "      <td>64.23</td>\n",
       "      <td>27.18</td>\n",
       "      <td>62.36</td>\n",
       "      <td>1203.66</td>\n",
       "      <td>76.37</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2011-10-14</th>\n",
       "      <td>10.26</td>\n",
       "      <td>422.00</td>\n",
       "      <td>16.60</td>\n",
       "      <td>190.53</td>\n",
       "      <td>64.72</td>\n",
       "      <td>27.27</td>\n",
       "      <td>62.24</td>\n",
       "      <td>1224.58</td>\n",
       "      <td>78.11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               AA    AAPL     GE     IBM    JNJ   MSFT    PEP      SPX    XOM\n",
       "2011-10-11  10.30  400.29  16.14  185.00  63.96  27.00  60.95  1195.54  76.27\n",
       "2011-10-12  10.05  402.19  16.40  186.12  64.33  26.96  62.70  1207.25  77.16\n",
       "2011-10-13  10.10  408.43  16.22  186.82  64.23  27.18  62.36  1203.66  76.37\n",
       "2011-10-14  10.26  422.00  16.60  190.53  64.72  27.27  62.24  1224.58  78.11"
      ]
     },
     "execution_count": 111,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "close_px.tail(4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [],
   "source": [
    "def spx_corr(group):\n",
    "    return group.corrwith(group[\"SPX\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {},
   "outputs": [],
   "source": [
    "rets = close_px.pct_change().dropna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_year(x):\n",
    "    return x.year"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {},
   "outputs": [],
   "source": [
    "by_year = rets.groupby(get_year)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>AA</th>\n",
       "      <th>AAPL</th>\n",
       "      <th>GE</th>\n",
       "      <th>IBM</th>\n",
       "      <th>JNJ</th>\n",
       "      <th>MSFT</th>\n",
       "      <th>PEP</th>\n",
       "      <th>SPX</th>\n",
       "      <th>XOM</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1990</th>\n",
       "      <td>0.595024</td>\n",
       "      <td>0.545067</td>\n",
       "      <td>0.752187</td>\n",
       "      <td>0.738361</td>\n",
       "      <td>0.801145</td>\n",
       "      <td>0.586691</td>\n",
       "      <td>0.783168</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.517586</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1991</th>\n",
       "      <td>0.453574</td>\n",
       "      <td>0.365315</td>\n",
       "      <td>0.759607</td>\n",
       "      <td>0.557046</td>\n",
       "      <td>0.646401</td>\n",
       "      <td>0.524225</td>\n",
       "      <td>0.641775</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.569335</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1992</th>\n",
       "      <td>0.398180</td>\n",
       "      <td>0.498732</td>\n",
       "      <td>0.632685</td>\n",
       "      <td>0.262232</td>\n",
       "      <td>0.515740</td>\n",
       "      <td>0.492345</td>\n",
       "      <td>0.473871</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.318408</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1993</th>\n",
       "      <td>0.259069</td>\n",
       "      <td>0.238578</td>\n",
       "      <td>0.447257</td>\n",
       "      <td>0.211269</td>\n",
       "      <td>0.451503</td>\n",
       "      <td>0.425377</td>\n",
       "      <td>0.385089</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.318952</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1994</th>\n",
       "      <td>0.428549</td>\n",
       "      <td>0.268420</td>\n",
       "      <td>0.572996</td>\n",
       "      <td>0.385162</td>\n",
       "      <td>0.372962</td>\n",
       "      <td>0.436585</td>\n",
       "      <td>0.450516</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.395078</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2007</th>\n",
       "      <td>0.642427</td>\n",
       "      <td>0.508118</td>\n",
       "      <td>0.796945</td>\n",
       "      <td>0.603906</td>\n",
       "      <td>0.568423</td>\n",
       "      <td>0.658770</td>\n",
       "      <td>0.651911</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.786264</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2008</th>\n",
       "      <td>0.781057</td>\n",
       "      <td>0.681434</td>\n",
       "      <td>0.777337</td>\n",
       "      <td>0.833074</td>\n",
       "      <td>0.801005</td>\n",
       "      <td>0.804626</td>\n",
       "      <td>0.709264</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.828303</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2009</th>\n",
       "      <td>0.735642</td>\n",
       "      <td>0.707103</td>\n",
       "      <td>0.713086</td>\n",
       "      <td>0.684513</td>\n",
       "      <td>0.603146</td>\n",
       "      <td>0.654902</td>\n",
       "      <td>0.541474</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.797921</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010</th>\n",
       "      <td>0.745700</td>\n",
       "      <td>0.710105</td>\n",
       "      <td>0.822285</td>\n",
       "      <td>0.783638</td>\n",
       "      <td>0.689896</td>\n",
       "      <td>0.730118</td>\n",
       "      <td>0.626655</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.839057</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2011</th>\n",
       "      <td>0.882045</td>\n",
       "      <td>0.691931</td>\n",
       "      <td>0.864595</td>\n",
       "      <td>0.802730</td>\n",
       "      <td>0.752379</td>\n",
       "      <td>0.800996</td>\n",
       "      <td>0.592029</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.859975</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>22 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            AA      AAPL        GE       IBM       JNJ      MSFT       PEP  \\\n",
       "1990  0.595024  0.545067  0.752187  0.738361  0.801145  0.586691  0.783168   \n",
       "1991  0.453574  0.365315  0.759607  0.557046  0.646401  0.524225  0.641775   \n",
       "1992  0.398180  0.498732  0.632685  0.262232  0.515740  0.492345  0.473871   \n",
       "1993  0.259069  0.238578  0.447257  0.211269  0.451503  0.425377  0.385089   \n",
       "1994  0.428549  0.268420  0.572996  0.385162  0.372962  0.436585  0.450516   \n",
       "...        ...       ...       ...       ...       ...       ...       ...   \n",
       "2007  0.642427  0.508118  0.796945  0.603906  0.568423  0.658770  0.651911   \n",
       "2008  0.781057  0.681434  0.777337  0.833074  0.801005  0.804626  0.709264   \n",
       "2009  0.735642  0.707103  0.713086  0.684513  0.603146  0.654902  0.541474   \n",
       "2010  0.745700  0.710105  0.822285  0.783638  0.689896  0.730118  0.626655   \n",
       "2011  0.882045  0.691931  0.864595  0.802730  0.752379  0.800996  0.592029   \n",
       "\n",
       "      SPX       XOM  \n",
       "1990  1.0  0.517586  \n",
       "1991  1.0  0.569335  \n",
       "1992  1.0  0.318408  \n",
       "1993  1.0  0.318952  \n",
       "1994  1.0  0.395078  \n",
       "...   ...       ...  \n",
       "2007  1.0  0.786264  \n",
       "2008  1.0  0.828303  \n",
       "2009  1.0  0.797921  \n",
       "2010  1.0  0.839057  \n",
       "2011  1.0  0.859975  \n",
       "\n",
       "[22 rows x 9 columns]"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "by_year.apply(spx_corr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [],
   "source": [
    "def corr_aapl_msft(group):\n",
    "    return group[\"AAPL\"].corr(group[\"MSFT\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1990    0.408271\n",
       "1991    0.266807\n",
       "1992    0.450592\n",
       "1993    0.236917\n",
       "1994    0.361638\n",
       "          ...   \n",
       "2007    0.417738\n",
       "2008    0.611901\n",
       "2009    0.432738\n",
       "2010    0.571946\n",
       "2011    0.581987\n",
       "Length: 22, dtype: float64"
      ]
     },
     "execution_count": 118,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "by_year.apply(corr_aapl_msft)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 예제: 그룹 단위 선형 회귀"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {},
   "outputs": [],
   "source": [
    "import statsmodels.api as sm\n",
    "\n",
    "def regress(data, yvar=None, xvars=None):\n",
    "    Y = data[yvar]\n",
    "    X = data[xvars]\n",
    "    X[\"intercept\"] = 1.\n",
    "    result = sm.OLS(Y, X).fit()\n",
    "    return result.params"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>SPX</th>\n",
       "      <th>intercept</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1990</th>\n",
       "      <td>1.512772</td>\n",
       "      <td>0.001395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1991</th>\n",
       "      <td>1.187351</td>\n",
       "      <td>0.000396</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1992</th>\n",
       "      <td>1.832427</td>\n",
       "      <td>0.000164</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1993</th>\n",
       "      <td>1.390470</td>\n",
       "      <td>-0.002657</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1994</th>\n",
       "      <td>1.190277</td>\n",
       "      <td>0.001617</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2007</th>\n",
       "      <td>1.198761</td>\n",
       "      <td>0.003438</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2008</th>\n",
       "      <td>0.968016</td>\n",
       "      <td>-0.001110</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2009</th>\n",
       "      <td>0.879103</td>\n",
       "      <td>0.002954</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010</th>\n",
       "      <td>1.052608</td>\n",
       "      <td>0.001261</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2011</th>\n",
       "      <td>0.806605</td>\n",
       "      <td>0.001514</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>22 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           SPX  intercept\n",
       "1990  1.512772   0.001395\n",
       "1991  1.187351   0.000396\n",
       "1992  1.832427   0.000164\n",
       "1993  1.390470  -0.002657\n",
       "1994  1.190277   0.001617\n",
       "...        ...        ...\n",
       "2007  1.198761   0.003438\n",
       "2008  0.968016  -0.001110\n",
       "2009  0.879103   0.002954\n",
       "2010  1.052608   0.001261\n",
       "2011  0.806605   0.001514\n",
       "\n",
       "[22 rows x 2 columns]"
      ]
     },
     "execution_count": 120,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "by_year.apply(regress, yvar=\"AAPL\", xvars=[\"SPX\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 그룹 변환"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>b</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>c</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>a</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>b</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>c</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>a</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>b</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>c</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>a</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>b</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>c</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   key  value\n",
       "0    a    0.0\n",
       "1    b    1.0\n",
       "2    c    2.0\n",
       "3    a    3.0\n",
       "4    b    4.0\n",
       "5    c    5.0\n",
       "6    a    6.0\n",
       "7    b    7.0\n",
       "8    c    8.0\n",
       "9    a    9.0\n",
       "10   b   10.0\n",
       "11   c   11.0"
      ]
     },
     "execution_count": 121,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame({'key': ['a', 'b', 'c'] * 4,\n",
    "                   'value': np.arange(12.)})\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [],
   "source": [
    "g = df.groupby('key', group_keys=False)['value']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key\n",
       "a    4.5\n",
       "b    5.5\n",
       "c    6.5\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 123,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_mean(group):\n",
    "    return group.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     4.5\n",
       "1     5.5\n",
       "2     6.5\n",
       "3     4.5\n",
       "4     5.5\n",
       "5     6.5\n",
       "6     4.5\n",
       "7     5.5\n",
       "8     6.5\n",
       "9     4.5\n",
       "10    5.5\n",
       "11    6.5\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 125,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.transform(get_mean)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     4.5\n",
       "1     5.5\n",
       "2     6.5\n",
       "3     4.5\n",
       "4     5.5\n",
       "5     6.5\n",
       "6     4.5\n",
       "7     5.5\n",
       "8     6.5\n",
       "9     4.5\n",
       "10    5.5\n",
       "11    6.5\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 126,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.transform('mean')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "metadata": {},
   "outputs": [],
   "source": [
    "def times_two(group):\n",
    "    return group * 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 128,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      0.0\n",
       "1      2.0\n",
       "2      4.0\n",
       "3      6.0\n",
       "4      8.0\n",
       "5     10.0\n",
       "6     12.0\n",
       "7     14.0\n",
       "8     16.0\n",
       "9     18.0\n",
       "10    20.0\n",
       "11    22.0\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 128,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.transform(times_two)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_ranks(group):\n",
    "    return group.rank(ascending=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     4.0\n",
       "1     4.0\n",
       "2     4.0\n",
       "3     3.0\n",
       "4     3.0\n",
       "5     3.0\n",
       "6     2.0\n",
       "7     2.0\n",
       "8     2.0\n",
       "9     1.0\n",
       "10    1.0\n",
       "11    1.0\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 130,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.transform(get_ranks)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "metadata": {},
   "outputs": [],
   "source": [
    "def normalize(x):\n",
    "    return (x - x.mean()) / x.std()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    -1.161895\n",
       "1    -1.161895\n",
       "2    -1.161895\n",
       "3    -0.387298\n",
       "4    -0.387298\n",
       "5    -0.387298\n",
       "6     0.387298\n",
       "7     0.387298\n",
       "8     0.387298\n",
       "9     1.161895\n",
       "10    1.161895\n",
       "11    1.161895\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 132,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.transform(normalize)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 133,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    -1.161895\n",
       "1    -1.161895\n",
       "2    -1.161895\n",
       "3    -0.387298\n",
       "4    -0.387298\n",
       "5    -0.387298\n",
       "6     0.387298\n",
       "7     0.387298\n",
       "8     0.387298\n",
       "9     1.161895\n",
       "10    1.161895\n",
       "11    1.161895\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 133,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.apply(normalize)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 134,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     4.5\n",
       "1     5.5\n",
       "2     6.5\n",
       "3     4.5\n",
       "4     5.5\n",
       "5     6.5\n",
       "6     4.5\n",
       "7     5.5\n",
       "8     6.5\n",
       "9     4.5\n",
       "10    5.5\n",
       "11    6.5\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 134,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.transform('mean')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    -1.161895\n",
       "1    -1.161895\n",
       "2    -1.161895\n",
       "3    -0.387298\n",
       "4    -0.387298\n",
       "5    -0.387298\n",
       "6     0.387298\n",
       "7     0.387298\n",
       "8     0.387298\n",
       "9     1.161895\n",
       "10    1.161895\n",
       "11    1.161895\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 135,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "normalized = (df['value'] - g.transform('mean')) / g.transform('std')\n",
    "normalized"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 피벗 테이블"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 136,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>total_bill</th>\n",
       "      <th>tip</th>\n",
       "      <th>smoker</th>\n",
       "      <th>day</th>\n",
       "      <th>time</th>\n",
       "      <th>size</th>\n",
       "      <th>tip_pct</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>16.99</td>\n",
       "      <td>1.01</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.059447</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10.34</td>\n",
       "      <td>1.66</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>3</td>\n",
       "      <td>0.160542</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21.01</td>\n",
       "      <td>3.50</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>3</td>\n",
       "      <td>0.166587</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>23.68</td>\n",
       "      <td>3.31</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>2</td>\n",
       "      <td>0.139780</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>24.59</td>\n",
       "      <td>3.61</td>\n",
       "      <td>No</td>\n",
       "      <td>Sun</td>\n",
       "      <td>Dinner</td>\n",
       "      <td>4</td>\n",
       "      <td>0.146808</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   total_bill   tip smoker  day    time  size   tip_pct\n",
       "0       16.99  1.01     No  Sun  Dinner     2  0.059447\n",
       "1       10.34  1.66     No  Sun  Dinner     3  0.160542\n",
       "2       21.01  3.50     No  Sun  Dinner     3  0.166587\n",
       "3       23.68  3.31     No  Sun  Dinner     2  0.139780\n",
       "4       24.59  3.61     No  Sun  Dinner     4  0.146808"
      ]
     },
     "execution_count": 136,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 137,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>size</th>\n",
       "      <th>tip</th>\n",
       "      <th>tip_pct</th>\n",
       "      <th>total_bill</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>day</th>\n",
       "      <th>smoker</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Fri</th>\n",
       "      <th>No</th>\n",
       "      <td>2.250000</td>\n",
       "      <td>2.812500</td>\n",
       "      <td>0.151650</td>\n",
       "      <td>18.420000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>2.066667</td>\n",
       "      <td>2.714000</td>\n",
       "      <td>0.174783</td>\n",
       "      <td>16.813333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sat</th>\n",
       "      <th>No</th>\n",
       "      <td>2.555556</td>\n",
       "      <td>3.102889</td>\n",
       "      <td>0.158048</td>\n",
       "      <td>19.661778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>2.476190</td>\n",
       "      <td>2.875476</td>\n",
       "      <td>0.147906</td>\n",
       "      <td>21.276667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sun</th>\n",
       "      <th>No</th>\n",
       "      <td>2.929825</td>\n",
       "      <td>3.167895</td>\n",
       "      <td>0.160113</td>\n",
       "      <td>20.506667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>2.578947</td>\n",
       "      <td>3.516842</td>\n",
       "      <td>0.187250</td>\n",
       "      <td>24.120000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Thur</th>\n",
       "      <th>No</th>\n",
       "      <td>2.488889</td>\n",
       "      <td>2.673778</td>\n",
       "      <td>0.160298</td>\n",
       "      <td>17.113111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>2.352941</td>\n",
       "      <td>3.030000</td>\n",
       "      <td>0.163863</td>\n",
       "      <td>19.190588</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 size       tip   tip_pct  total_bill\n",
       "day  smoker                                          \n",
       "Fri  No      2.250000  2.812500  0.151650   18.420000\n",
       "     Yes     2.066667  2.714000  0.174783   16.813333\n",
       "Sat  No      2.555556  3.102889  0.158048   19.661778\n",
       "     Yes     2.476190  2.875476  0.147906   21.276667\n",
       "Sun  No      2.929825  3.167895  0.160113   20.506667\n",
       "     Yes     2.578947  3.516842  0.187250   24.120000\n",
       "Thur No      2.488889  2.673778  0.160298   17.113111\n",
       "     Yes     2.352941  3.030000  0.163863   19.190588"
      ]
     },
     "execution_count": 137,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.pivot_table(index=[\"day\", \"smoker\"], values=[\"size\", \"tip\", \"tip_pct\", \"total_bill\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 138,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>size</th>\n",
       "      <th>tip</th>\n",
       "      <th>tip_pct</th>\n",
       "      <th>total_bill</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>day</th>\n",
       "      <th>smoker</th>\n",
       "      <th>time</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">Fri</th>\n",
       "      <th rowspan=\"2\" valign=\"top\">No</th>\n",
       "      <th>Dinner</th>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.750000</td>\n",
       "      <td>0.139622</td>\n",
       "      <td>19.233333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lunch</th>\n",
       "      <td>3.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>0.187735</td>\n",
       "      <td>15.980000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Yes</th>\n",
       "      <th>Dinner</th>\n",
       "      <td>2.222222</td>\n",
       "      <td>3.003333</td>\n",
       "      <td>0.165347</td>\n",
       "      <td>19.806667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lunch</th>\n",
       "      <td>1.833333</td>\n",
       "      <td>2.280000</td>\n",
       "      <td>0.188937</td>\n",
       "      <td>12.323333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sat</th>\n",
       "      <th>No</th>\n",
       "      <th>Dinner</th>\n",
       "      <td>2.555556</td>\n",
       "      <td>3.102889</td>\n",
       "      <td>0.158048</td>\n",
       "      <td>19.661778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <th>Dinner</th>\n",
       "      <td>2.476190</td>\n",
       "      <td>2.875476</td>\n",
       "      <td>0.147906</td>\n",
       "      <td>21.276667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Sun</th>\n",
       "      <th>No</th>\n",
       "      <th>Dinner</th>\n",
       "      <td>2.929825</td>\n",
       "      <td>3.167895</td>\n",
       "      <td>0.160113</td>\n",
       "      <td>20.506667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <th>Dinner</th>\n",
       "      <td>2.578947</td>\n",
       "      <td>3.516842</td>\n",
       "      <td>0.187250</td>\n",
       "      <td>24.120000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">Thur</th>\n",
       "      <th rowspan=\"2\" valign=\"top\">No</th>\n",
       "      <th>Dinner</th>\n",
       "      <td>2.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>0.159744</td>\n",
       "      <td>18.780000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lunch</th>\n",
       "      <td>2.500000</td>\n",
       "      <td>2.666364</td>\n",
       "      <td>0.160311</td>\n",
       "      <td>17.075227</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <th>Lunch</th>\n",
       "      <td>2.352941</td>\n",
       "      <td>3.030000</td>\n",
       "      <td>0.163863</td>\n",
       "      <td>19.190588</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        size       tip   tip_pct  total_bill\n",
       "day  smoker time                                            \n",
       "Fri  No     Dinner  2.000000  2.750000  0.139622   19.233333\n",
       "            Lunch   3.000000  3.000000  0.187735   15.980000\n",
       "     Yes    Dinner  2.222222  3.003333  0.165347   19.806667\n",
       "            Lunch   1.833333  2.280000  0.188937   12.323333\n",
       "Sat  No     Dinner  2.555556  3.102889  0.158048   19.661778\n",
       "     Yes    Dinner  2.476190  2.875476  0.147906   21.276667\n",
       "Sun  No     Dinner  2.929825  3.167895  0.160113   20.506667\n",
       "     Yes    Dinner  2.578947  3.516842  0.187250   24.120000\n",
       "Thur No     Dinner  2.000000  3.000000  0.159744   18.780000\n",
       "            Lunch   2.500000  2.666364  0.160311   17.075227\n",
       "     Yes    Lunch   2.352941  3.030000  0.163863   19.190588"
      ]
     },
     "execution_count": 138,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.pivot_table(index=[\"day\", \"smoker\", \"time\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 139,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">size</th>\n",
       "      <th colspan=\"2\" halign=\"left\">tip_pct</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>smoker</th>\n",
       "      <th>No</th>\n",
       "      <th>Yes</th>\n",
       "      <th>No</th>\n",
       "      <th>Yes</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>time</th>\n",
       "      <th>day</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">Dinner</th>\n",
       "      <th>Fri</th>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.222222</td>\n",
       "      <td>0.139622</td>\n",
       "      <td>0.165347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sat</th>\n",
       "      <td>2.555556</td>\n",
       "      <td>2.476190</td>\n",
       "      <td>0.158048</td>\n",
       "      <td>0.147906</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sun</th>\n",
       "      <td>2.929825</td>\n",
       "      <td>2.578947</td>\n",
       "      <td>0.160113</td>\n",
       "      <td>0.187250</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thur</th>\n",
       "      <td>2.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.159744</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Lunch</th>\n",
       "      <th>Fri</th>\n",
       "      <td>3.000000</td>\n",
       "      <td>1.833333</td>\n",
       "      <td>0.187735</td>\n",
       "      <td>0.188937</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thur</th>\n",
       "      <td>2.500000</td>\n",
       "      <td>2.352941</td>\n",
       "      <td>0.160311</td>\n",
       "      <td>0.163863</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 size             tip_pct          \n",
       "smoker             No       Yes        No       Yes\n",
       "time   day                                         \n",
       "Dinner Fri   2.000000  2.222222  0.139622  0.165347\n",
       "       Sat   2.555556  2.476190  0.158048  0.147906\n",
       "       Sun   2.929825  2.578947  0.160113  0.187250\n",
       "       Thur  2.000000       NaN  0.159744       NaN\n",
       "Lunch  Fri   3.000000  1.833333  0.187735  0.188937\n",
       "       Thur  2.500000  2.352941  0.160311  0.163863"
      ]
     },
     "execution_count": 139,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.pivot_table(index=[\"time\", \"day\"], columns=\"smoker\",\n",
    "                 values=[\"tip_pct\", \"size\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 140,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">size</th>\n",
       "      <th colspan=\"3\" halign=\"left\">tip_pct</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>smoker</th>\n",
       "      <th>No</th>\n",
       "      <th>Yes</th>\n",
       "      <th>All</th>\n",
       "      <th>No</th>\n",
       "      <th>Yes</th>\n",
       "      <th>All</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>time</th>\n",
       "      <th>day</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">Dinner</th>\n",
       "      <th>Fri</th>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.222222</td>\n",
       "      <td>2.166667</td>\n",
       "      <td>0.139622</td>\n",
       "      <td>0.165347</td>\n",
       "      <td>0.158916</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sat</th>\n",
       "      <td>2.555556</td>\n",
       "      <td>2.476190</td>\n",
       "      <td>2.517241</td>\n",
       "      <td>0.158048</td>\n",
       "      <td>0.147906</td>\n",
       "      <td>0.153152</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sun</th>\n",
       "      <td>2.929825</td>\n",
       "      <td>2.578947</td>\n",
       "      <td>2.842105</td>\n",
       "      <td>0.160113</td>\n",
       "      <td>0.187250</td>\n",
       "      <td>0.166897</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thur</th>\n",
       "      <td>2.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>0.159744</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.159744</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Lunch</th>\n",
       "      <th>Fri</th>\n",
       "      <td>3.000000</td>\n",
       "      <td>1.833333</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>0.187735</td>\n",
       "      <td>0.188937</td>\n",
       "      <td>0.188765</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thur</th>\n",
       "      <td>2.500000</td>\n",
       "      <td>2.352941</td>\n",
       "      <td>2.459016</td>\n",
       "      <td>0.160311</td>\n",
       "      <td>0.163863</td>\n",
       "      <td>0.161301</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>All</th>\n",
       "      <th></th>\n",
       "      <td>2.668874</td>\n",
       "      <td>2.408602</td>\n",
       "      <td>2.569672</td>\n",
       "      <td>0.159328</td>\n",
       "      <td>0.163196</td>\n",
       "      <td>0.160803</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 size                       tip_pct                    \n",
       "smoker             No       Yes       All        No       Yes       All\n",
       "time   day                                                             \n",
       "Dinner Fri   2.000000  2.222222  2.166667  0.139622  0.165347  0.158916\n",
       "       Sat   2.555556  2.476190  2.517241  0.158048  0.147906  0.153152\n",
       "       Sun   2.929825  2.578947  2.842105  0.160113  0.187250  0.166897\n",
       "       Thur  2.000000       NaN  2.000000  0.159744       NaN  0.159744\n",
       "Lunch  Fri   3.000000  1.833333  2.000000  0.187735  0.188937  0.188765\n",
       "       Thur  2.500000  2.352941  2.459016  0.160311  0.163863  0.161301\n",
       "All          2.668874  2.408602  2.569672  0.159328  0.163196  0.160803"
      ]
     },
     "execution_count": 140,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.pivot_table(index=[\"time\", \"day\"], columns=\"smoker\",\n",
    "                 values=[\"tip_pct\", \"size\"], margins=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 141,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>day</th>\n",
       "      <th>Fri</th>\n",
       "      <th>Sat</th>\n",
       "      <th>Sun</th>\n",
       "      <th>Thur</th>\n",
       "      <th>All</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>time</th>\n",
       "      <th>smoker</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Dinner</th>\n",
       "      <th>No</th>\n",
       "      <td>3.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>57.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>9.0</td>\n",
       "      <td>42.0</td>\n",
       "      <td>19.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Lunch</th>\n",
       "      <th>No</th>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>44.0</td>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>6.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>17.0</td>\n",
       "      <td>23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>All</th>\n",
       "      <th></th>\n",
       "      <td>19.0</td>\n",
       "      <td>87.0</td>\n",
       "      <td>76.0</td>\n",
       "      <td>62.0</td>\n",
       "      <td>244</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "day             Fri   Sat   Sun  Thur  All\n",
       "time   smoker                             \n",
       "Dinner No       3.0  45.0  57.0   1.0  106\n",
       "       Yes      9.0  42.0  19.0   NaN   70\n",
       "Lunch  No       1.0   NaN   NaN  44.0   45\n",
       "       Yes      6.0   NaN   NaN  17.0   23\n",
       "All            19.0  87.0  76.0  62.0  244"
      ]
     },
     "execution_count": 141,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.pivot_table(index=[\"time\", \"smoker\"], columns=\"day\",\n",
    "                 values=\"tip_pct\", aggfunc=len, margins=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 142,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>day</th>\n",
       "      <th>Fri</th>\n",
       "      <th>Sat</th>\n",
       "      <th>Sun</th>\n",
       "      <th>Thur</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>time</th>\n",
       "      <th>size</th>\n",
       "      <th>smoker</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">Dinner</th>\n",
       "      <th rowspan=\"2\" valign=\"top\">1</th>\n",
       "      <th>No</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.137931</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.325733</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2</th>\n",
       "      <th>No</th>\n",
       "      <td>0.139622</td>\n",
       "      <td>0.162705</td>\n",
       "      <td>0.168859</td>\n",
       "      <td>0.159744</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.171297</td>\n",
       "      <td>0.148668</td>\n",
       "      <td>0.207893</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <th>No</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.154661</td>\n",
       "      <td>0.152663</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">Lunch</th>\n",
       "      <th>3</th>\n",
       "      <th>Yes</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.204952</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">4</th>\n",
       "      <th>No</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.138919</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yes</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.155410</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <th>No</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.121389</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <th>No</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.173706</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>21 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "day                      Fri       Sat       Sun      Thur\n",
       "time   size smoker                                        \n",
       "Dinner 1    No      0.000000  0.137931  0.000000  0.000000\n",
       "            Yes     0.000000  0.325733  0.000000  0.000000\n",
       "       2    No      0.139622  0.162705  0.168859  0.159744\n",
       "            Yes     0.171297  0.148668  0.207893  0.000000\n",
       "       3    No      0.000000  0.154661  0.152663  0.000000\n",
       "...                      ...       ...       ...       ...\n",
       "Lunch  3    Yes     0.000000  0.000000  0.000000  0.204952\n",
       "       4    No      0.000000  0.000000  0.000000  0.138919\n",
       "            Yes     0.000000  0.000000  0.000000  0.155410\n",
       "       5    No      0.000000  0.000000  0.000000  0.121389\n",
       "       6    No      0.000000  0.000000  0.000000  0.173706\n",
       "\n",
       "[21 rows x 4 columns]"
      ]
     },
     "execution_count": 142,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tips.pivot_table(index=[\"time\", \"size\", \"smoker\"], columns=\"day\",\n",
    "                 values=\"tip_pct\", fill_value=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Cross-Tabulations: Crosstab"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 143,
   "metadata": {},
   "outputs": [],
   "source": [
    "from io import StringIO\n",
    "\n",
    "data = \"\"\"Sample  Nationality  Handedness\n",
    "1   USA  Right-handed\n",
    "2   Japan    Left-handed\n",
    "3   USA  Right-handed\n",
    "4   Japan    Right-handed\n",
    "5   Japan    Left-handed\n",
    "6   Japan    Right-handed\n",
    "7   USA  Right-handed\n",
    "8   USA  Left-handed\n",
    "9   Japan    Right-handed\n",
    "10  USA  Right-handed\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 144,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sample</th>\n",
       "      <th>Nationality</th>\n",
       "      <th>Handedness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>USA</td>\n",
       "      <td>Right-handed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Japan</td>\n",
       "      <td>Left-handed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>USA</td>\n",
       "      <td>Right-handed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Japan</td>\n",
       "      <td>Right-handed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Japan</td>\n",
       "      <td>Left-handed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>Japan</td>\n",
       "      <td>Right-handed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>USA</td>\n",
       "      <td>Right-handed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>USA</td>\n",
       "      <td>Left-handed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>Japan</td>\n",
       "      <td>Right-handed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10</td>\n",
       "      <td>USA</td>\n",
       "      <td>Right-handed</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Sample Nationality    Handedness\n",
       "0       1         USA  Right-handed\n",
       "1       2       Japan   Left-handed\n",
       "2       3         USA  Right-handed\n",
       "3       4       Japan  Right-handed\n",
       "4       5       Japan   Left-handed\n",
       "5       6       Japan  Right-handed\n",
       "6       7         USA  Right-handed\n",
       "7       8         USA   Left-handed\n",
       "8       9       Japan  Right-handed\n",
       "9      10         USA  Right-handed"
      ]
     },
     "execution_count": 144,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.read_table(StringIO(data), sep=\"\\s+\")\n",
    "\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 145,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>Handedness</th>\n",
       "      <th>Left-handed</th>\n",
       "      <th>Right-handed</th>\n",
       "      <th>All</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Nationality</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Japan</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>USA</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>All</th>\n",
       "      <td>3</td>\n",
       "      <td>7</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Handedness   Left-handed  Right-handed  All\n",
       "Nationality                                \n",
       "Japan                  2             3    5\n",
       "USA                    1             4    5\n",
       "All                    3             7   10"
      ]
     },
     "execution_count": 145,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.crosstab(data[\"Nationality\"], data[\"Handedness\"], margins=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 146,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>smoker</th>\n",
       "      <th>No</th>\n",
       "      <th>Yes</th>\n",
       "      <th>All</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>time</th>\n",
       "      <th>day</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">Dinner</th>\n",
       "      <th>Fri</th>\n",
       "      <td>3</td>\n",
       "      <td>9</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sat</th>\n",
       "      <td>45</td>\n",
       "      <td>42</td>\n",
       "      <td>87</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sun</th>\n",
       "      <td>57</td>\n",
       "      <td>19</td>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thur</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Lunch</th>\n",
       "      <th>Fri</th>\n",
       "      <td>1</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thur</th>\n",
       "      <td>44</td>\n",
       "      <td>17</td>\n",
       "      <td>61</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>All</th>\n",
       "      <th></th>\n",
       "      <td>151</td>\n",
       "      <td>93</td>\n",
       "      <td>244</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "smoker        No  Yes  All\n",
       "time   day                \n",
       "Dinner Fri     3    9   12\n",
       "       Sat    45   42   87\n",
       "       Sun    57   19   76\n",
       "       Thur    1    0    1\n",
       "Lunch  Fri     1    6    7\n",
       "       Thur   44   17   61\n",
       "All          151   93  244"
      ]
     },
     "execution_count": 146,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.crosstab([tips[\"time\"], tips[\"day\"]], tips[\"smoker\"], margins=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 연습문제"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "참고: [(실습) 데이터 그룹화](https://colab.research.google.com/github/codingalzi/datapy/blob/master/practices/practice-pandas_6.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}