pandas: DataFrame
Ha Khanh Nguyen (hknguyen)
Data is commonly stored in csv (comma-separated values) files.
Use the read_csv() function to import data into Python.
import pandas as pd
ramen = pd.read_csv('https://stat430.hknguyen.org/files/datasets/clean-ramen.csv')
ramen
| | Brand | Variety | Style | Country | Stars |
---|---|---|---|---|---|
0 | New Touch | T's Restaurant Tantanmen | Cup | Japan | 3.75 |
1 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack | Taiwan | 1.00 |
2 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 2.25 |
3 | Wei Lih | GGE Ramen Snack Tomato Flavor | Pack | Taiwan | 2.75 |
4 | Ching's Secret | Singapore Curry | Pack | India | 3.75 |
... | ... | ... | ... | ... | ... |
2570 | Vifon | Hu Tiu Nam Vang ["Phnom Penh" style] Asian Sty... | Bowl | Vietnam | 3.50 |
2571 | Wai Wai | Oriental Style Instant Noodles | Pack | Thailand | 1.00 |
2572 | Wai Wai | Tom Yum Shrimp | Pack | Thailand | 2.00 |
2573 | Wai Wai | Tom Yum Chili Flavor | Pack | Thailand | 2.00 |
2574 | Westbrae | Miso Ramen | Pack | USA | 0.50 |
2575 rows × 5 columns
The output of the read_csv() function is a pandas DataFrame.
In the ramen dataset, there are 2575 observations and 5 attributes, which are:
Brand
Variety
Style
Country
Stars
A DataFrame can also be created without read_csv(), for example from a dict of equal-length lists:
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)
frame
| | state | year | pop |
---|---|---|---|
0 | Ohio | 2000 | 1.5 |
1 | Ohio | 2001 | 1.7 |
2 | Ohio | 2002 | 3.6 |
3 | Nevada | 2001 | 2.4 |
4 | Nevada | 2002 | 2.9 |
5 | Nevada | 2003 | 3.2 |
ramen['Brand']
0            New Touch
1             Just Way
2               Nissin
3              Wei Lih
4       Ching's Secret
             ...
2570             Vifon
2571           Wai Wai
2572           Wai Wai
2573           Wai Wai
2574          Westbrae
Name: Brand, Length: 2575, dtype: object
type(ramen['Brand'])
pandas.core.series.Series
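Use the describe() method of the DataFrame object to get the summary statistics of the variables in the DataFrame. By default, only numeric columns are summarized, which is why only Stars appears below.
ramen.describe()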
| | Stars |
---|---|
count | 2575.000000 |
mean | 3.654893 |
std | 1.015641 |
min | 0.000000 |
25% | 3.250000 |
50% | 3.750000 |
75% | 4.250000 |
max | 5.000000 |
ramen.dtypes
Brand       object
Variety     object
Style       object
Country     object
Stars      float64
dtype: object
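As a rough illustration of how dtypes and describe() relate, the sketch below builds a small stand-in DataFrame. The name sample is hypothetical, and its five rows are copied from the first rows printed earlier with the Variety column left out. It is meant only to show that describe() with default arguments reports the numeric column alone.

```python
import pandas as pd

# Hypothetical five-row sample; values copied from the first rows printed above
# (the Variety column is left out to keep the example short).
sample = pd.DataFrame({
    'Brand': ['New Touch', 'Just Way', 'Nissin', 'Wei Lih', "Ching's Secret"],
    'Style': ['Cup', 'Pack', 'Cup', 'Pack', 'Pack'],
    'Country': ['Japan', 'Taiwan', 'USA', 'Taiwan', 'India'],
    'Stars': [3.75, 1.00, 2.25, 2.75, 3.75],
})

print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # one dtype per column

# With default arguments, describe() summarizes only the numeric columns.
summary = sample.describe()
print(summary)
```

Here summary has a single column, Stars, because the other three columns hold text.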
Use the unique() function to find out!
ramen['Country']
0          Japan
1         Taiwan
2            USA
3         Taiwan
4          India
          ...
2570     Vietnam
2571    Thailand
2572    Thailand
2573    Thailand
2574         USA
Name: Country, Length: 2575, dtype: object
ramen.Country
0          Japan
1         Taiwan
2            USA
3         Taiwan
4          India
          ...
2570     Vietnam
2571    Thailand
2572    Thailand
2573    Thailand
2574         USA
Name: Country, Length: 2575, dtype: object
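One possible way to use unique() on a column is sketched here. The DataFrame sample is a hypothetical stand-in with values copied from the first rows printed earlier, not the full ramen dataset, so the counts apply to the sample only. The sketch also checks that bracket access and attribute access return the same column.

```python
import pandas as pd

# Hypothetical five-row sample; values copied from the first rows printed above.
sample = pd.DataFrame({
    'Brand': ['New Touch', 'Just Way', 'Nissin', 'Wei Lih', "Ching's Secret"],
    'Country': ['Japan', 'Taiwan', 'USA', 'Taiwan', 'India'],
    'Stars': [3.75, 1.00, 2.25, 2.75, 3.75],
})

# Distinct values of the column, in order of first appearance.
countries = sample['Country'].unique()

# Number of distinct values.
n_countries = sample['Country'].nunique()

# Bracket access and attribute access select the same column.
same_column = sample['Country'].equals(sample.Country)

print(list(countries), n_countries, same_column)
```

Attribute access such as sample.Country only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so the bracket form is the safer default.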
To select multiple columns, pass a list of column names inside []:
ramen[['Brand', 'Country']]
| | Brand | Country |
---|---|---|
0 | New Touch | Japan |
1 | Just Way | Taiwan |
2 | Nissin | USA |
3 | Wei Lih | Taiwan |
4 | Ching's Secret | India |
... | ... | ... |
2570 | Vifon | Vietnam |
2571 | Wai Wai | Thailand |
2572 | Wai Wai | Thailand |
2573 | Wai Wai | Thailand |
2574 | Westbrae | USA |
2575 rows × 2 columns
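Notes:
The result of selecting a single column with [] is a pandas Series.
The result of selecting a list of columns with [] is a pandas DataFrame.
dplyr) or a DataFrame.
BIG NOTES: As a DataFrame is a dict of Series (where each key is the column name and each value is a Series), we cannot access a single row by its index using []. For example, ramen[0] looks for a column named 0 and raises a KeyError.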
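The notes above can be checked with a short sketch. The DataFrame sample is a hypothetical stand-in with values copied from the rows printed earlier. The sketch assumes string column names, as in the ramen dataset, so that looking up 0 with [] fails.

```python
import pandas as pd

# Hypothetical three-row sample; values copied from the rows printed above.
sample = pd.DataFrame({
    'Brand': ['New Touch', 'Just Way', 'Nissin'],
    'Country': ['Japan', 'Taiwan', 'USA'],
    'Stars': [3.75, 1.00, 2.25],
})

one_column = sample['Brand']                  # a single name gives a Series
two_columns = sample[['Brand', 'Country']]    # a list of names gives a DataFrame
one_column_frame = sample[['Brand']]          # a one-element list still gives a DataFrame

# [] looks up column names, so asking for 0 fails: there is no column named 0.
try:
    sample[0]
except KeyError:
    row_lookup_failed = True
else:
    row_lookup_failed = False

print(type(one_column), type(two_columns), row_lookup_failed)
```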
iloc
To select by position, use iloc, which is short for integer-location based indexing.
iloc syntax: dataframe.iloc[<row index>, <column index>].
For example, to get the Stars rating of the first observation (index 0):
ramen.iloc[0, 4]
3.75
ramen.iloc[0, 1:5]
Variety    T's Restaurant Tantanmen
Style                           Cup
Country                       Japan
Stars                          3.75
Name: 0, dtype: object
ramen.iloc[0, :]
Brand                     New Touch
Variety    T's Restaurant Tantanmen
Style                           Cup
Country                       Japan
Stars                          3.75
Name: 0, dtype: object
What is the type of the output of iloc?
type(ramen.iloc[0, 4])
numpy.float64
type(ramen.iloc[0, 1:5])
pandas.core.series.Series
Notice the NumPy data structures and types in pandas? There is a strong relationship between the two libraries.
ramen.iloc[0:5, 4]
0    3.75
1    1.00
2    2.25
3    2.75
4    3.75
Name: Stars, dtype: float64
ramen.iloc[[0, 5, 10, 15], :]
| | Brand | Variety | Style | Country | Stars |
---|---|---|---|---|---|
0 | New Touch | T's Restaurant Tantanmen | Cup | Japan | 3.75 |
5 | Samyang Foods | Kimchi song Song Ramen | Pack | South Korea | 4.75 |
10 | Tao Kae Noi | Creamy tom Yum Kung Flavour | Pack | Thailand | 5.00 |
15 | KOKA | Mushroom Flavour Instant Noodles | Cup | Singapore | 3.50 |
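The iloc patterns above (a single cell, a whole row, a slice of rows, and a list of row positions) can be tried on a small stand-in DataFrame. The name sample is hypothetical and its values are copied from the rows printed earlier. Because the Variety column is left out, Stars sits at column position 3 here rather than 4.

```python
import pandas as pd

# Hypothetical five-row sample; values copied from the first rows printed above.
# Variety is left out, so Stars is at column position 3 in this sample.
sample = pd.DataFrame({
    'Brand': ['New Touch', 'Just Way', 'Nissin', 'Wei Lih', "Ching's Secret"],
    'Style': ['Cup', 'Pack', 'Cup', 'Pack', 'Pack'],
    'Country': ['Japan', 'Taiwan', 'USA', 'Taiwan', 'India'],
    'Stars': [3.75, 1.00, 2.25, 2.75, 3.75],
})

first_stars = sample.iloc[0, 3]       # one cell: row 0, column 3
first_row = sample.iloc[0, :]         # one row, all columns, returned as a Series
top_three = sample.iloc[0:3, 3]       # rows 0, 1, 2 (the end of the slice is excluded)
picked = sample.iloc[[0, 2, 4], :]    # a list of row positions returns a DataFrame

print(first_stars)
print(top_three.tolist())
print(picked)
```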
loc
The i in iloc stands for integer. That is why with iloc, we always use numbers for indexing.
With loc, we use labels (names) or a Boolean list/array for indexing instead.
ramen.loc[0, 'Stars']
3.75
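A brief sketch of label-based selection with loc, set next to the positional equivalent with iloc. The DataFrame sample is a hypothetical stand-in with values copied from the rows printed earlier. One difference worth noticing is that a label slice in loc includes its end label, while a positional slice in iloc excludes its end position.

```python
import pandas as pd

# Hypothetical five-row sample; values copied from the first rows printed above.
sample = pd.DataFrame({
    'Brand': ['New Touch', 'Just Way', 'Nissin', 'Wei Lih', "Ching's Secret"],
    'Style': ['Cup', 'Pack', 'Cup', 'Pack', 'Pack'],
    'Country': ['Japan', 'Taiwan', 'USA', 'Taiwan', 'India'],
    'Stars': [3.75, 1.00, 2.25, 2.75, 3.75],
})

by_label = sample.loc[0, 'Stars']                    # row label 0, column label 'Stars'

# Label slices include the end label: rows 0, 1 and 2.
label_slice = sample.loc[0:2, ['Brand', 'Stars']]

# Position slices exclude the end position: rows 0 and 1 only.
position_slice = sample.iloc[0:2, [0, 3]]

print(by_label, len(label_slice), len(position_slice))
```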
Try to use loc to select the following rows, columns:
Filter ramen and only print out the rows where Country is USA.
ramen['Country'] == 'USA'
0       False
1       False
2        True
3       False
4       False
        ...
2570    False
2571    False
2572    False
2573    False
2574     True
Name: Country, Length: 2575, dtype: bool
# method 1
ramen[ramen['Country'] == 'USA']
| | Brand | Variety | Style | Country | Stars |
---|---|---|---|---|---|
2 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 2.25 |
11 | Yamachan | Yokohama Tonkotsu Shoyu | Pack | USA | 5.00 |
17 | Yamachan | Tokyo Shoyu Ramen | Pack | USA | 5.00 |
21 | Jackpot Teriyaki | Beef Ramen | Pack | USA | 5.00 |
23 | Yamachan | Sapporo Miso Ramen | Pack | USA | 4.75 |
... | ... | ... | ... | ... | ... |
2511 | Sapporo Ichiban | Chicken Flavor | Pack | USA | 3.50 |
2541 | Maruchan | Ramen Noodle Soup Shrimp | Pack | USA | 2.00 |
2552 | Nissin | Top Ramen Creamy Chicken | Pack | USA | 4.50 |
2565 | Smack | Vegetable Beef | Pack | USA | 1.50 |
2574 | Westbrae | Miso Ramen | Pack | USA | 0.50 |
323 rows × 5 columns
# method 2
ramen.loc[ramen['Country'] == 'USA']
| | Brand | Variety | Style | Country | Stars |
---|---|---|---|---|---|
2 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 2.25 |
11 | Yamachan | Yokohama Tonkotsu Shoyu | Pack | USA | 5.00 |
17 | Yamachan | Tokyo Shoyu Ramen | Pack | USA | 5.00 |
21 | Jackpot Teriyaki | Beef Ramen | Pack | USA | 5.00 |
23 | Yamachan | Sapporo Miso Ramen | Pack | USA | 4.75 |
... | ... | ... | ... | ... | ... |
2511 | Sapporo Ichiban | Chicken Flavor | Pack | USA | 3.50 |
2541 | Maruchan | Ramen Noodle Soup Shrimp | Pack | USA | 2.00 |
2552 | Nissin | Top Ramen Creamy Chicken | Pack | USA | 4.50 |
2565 | Smack | Vegetable Beef | Pack | USA | 1.50 |
2574 | Westbrae | Miso Ramen | Pack | USA | 0.50 |
323 rows × 5 columns
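The two methods above return the same rows. The sketch below checks this on a small stand-in DataFrame. The name sample is hypothetical and its values are copied from rows printed earlier. It also shows one thing loc can do in the same step that plain [] cannot: choosing columns alongside the row filter.

```python
import pandas as pd

# Hypothetical six-row sample; values copied from rows printed above.
sample = pd.DataFrame({
    'Brand': ['New Touch', 'Just Way', 'Nissin', 'Wei Lih', "Ching's Secret", 'Westbrae'],
    'Country': ['Japan', 'Taiwan', 'USA', 'Taiwan', 'India', 'USA'],
    'Stars': [3.75, 1.00, 2.25, 2.75, 3.75, 0.50],
})

is_usa = sample['Country'] == 'USA'   # Boolean Series, one value per row

method_1 = sample[is_usa]             # Boolean mask inside []
method_2 = sample.loc[is_usa]         # same mask inside loc
same_result = method_1.equals(method_2)

# loc also accepts a column selection next to the mask.
usa_stars = sample.loc[is_usa, ['Brand', 'Stars']]

print(same_result)
print(usa_stars)
```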
What about the rows where Country is USA and Brand is Nissin?
ramen.loc[(ramen['Country'] == 'USA') & (ramen['Brand'] == 'Nissin')]
| | Brand | Variety | Style | Country | Stars |
---|---|---|---|---|---|
2 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 2.25 |
38 | Nissin | Cup Noodles Very Veggie Spicy Chicken Flavor R... | Cup | USA | 5.00 |
41 | Nissin | Cup Noodles Very Veggie Beef Flavor Ramen Nood... | Cup | USA | 5.00 |
44 | Nissin | Cup Noodles Very Veggie Chicken Flavor Ramen N... | Cup | USA | 5.00 |
195 | Nissin | Cup Noodles Beef Flavor Ramen Noodle Soup (New... | Cup | USA | 3.50 |
... | ... | ... | ... | ... | ... |
2422 | Nissin | Creamy Chicken | Cup | USA | 1.75 |
2434 | Nissin | Chow Mein Teriyaki Beef | Tray | USA | 4.50 |
2438 | Nissin | Bowl Noodles Rich & Savory Chicken | Bowl | USA | 1.75 |
2510 | Nissin | Top Ramen Oriental | Pack | USA | 2.50 |
2552 | Nissin | Top Ramen Creamy Chicken | Pack | USA | 4.50 |
95 rows × 5 columns
# or
ramen[(ramen['Country'] == 'USA') & (ramen['Brand'] == 'Nissin')]
| | Brand | Variety | Style | Country | Stars |
---|---|---|---|---|---|
2 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 2.25 |
38 | Nissin | Cup Noodles Very Veggie Spicy Chicken Flavor R... | Cup | USA | 5.00 |
41 | Nissin | Cup Noodles Very Veggie Beef Flavor Ramen Nood... | Cup | USA | 5.00 |
44 | Nissin | Cup Noodles Very Veggie Chicken Flavor Ramen N... | Cup | USA | 5.00 |
195 | Nissin | Cup Noodles Beef Flavor Ramen Noodle Soup (New... | Cup | USA | 3.50 |
... | ... | ... | ... | ... | ... |
2422 | Nissin | Creamy Chicken | Cup | USA | 1.75 |
2434 | Nissin | Chow Mein Teriyaki Beef | Tray | USA | 4.50 |
2438 | Nissin | Bowl Noodles Rich & Savory Chicken | Bowl | USA | 1.75 |
2510 | Nissin | Top Ramen Oriental | Pack | USA | 2.50 |
2552 | Nissin | Top Ramen Creamy Chicken | Pack | USA | 4.50 |
95 rows × 5 columns
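A short sketch of why each condition is wrapped in parentheses and joined with & rather than the Python keyword and. The DataFrame sample is a hypothetical stand-in with values copied from rows printed earlier. The | example is an addition for contrast and does not appear in the notes above.

```python
import pandas as pd

# Hypothetical five-row sample; values copied from rows printed above.
sample = pd.DataFrame({
    'Brand': ['New Touch', 'Nissin', 'Yamachan', 'Nissin', 'Nissin'],
    'Country': ['Japan', 'USA', 'USA', 'USA', 'USA'],
    'Stars': [3.75, 2.25, 5.00, 2.50, 4.50],
})

# & combines two Boolean Series row by row. The parentheses are needed
# because & binds more tightly than ==.
both = sample[(sample['Country'] == 'USA') & (sample['Brand'] == 'Nissin')]

# | keeps rows where at least one condition holds.
either = sample[(sample['Country'] == 'USA') | (sample['Brand'] == 'Nissin')]

# The keyword `and` asks for a single True/False value from a whole Series,
# which pandas refuses with a ValueError.
try:
    (sample['Country'] == 'USA') and (sample['Brand'] == 'Nissin')
except ValueError:
    keyword_and_failed = True
else:
    keyword_and_failed = False

print(len(both), len(either), keyword_and_failed)
```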
# step 1
ramen.loc[(ramen['Brand'] == 'Nissin') & (ramen['Country'] == 'USA') & (ramen['Stars'] == 1.5)]
| | Brand | Variety | Style | Country | Stars |
---|---|---|---|---|---|
2313 | Nissin | Chow Mein Kung Pao Chicken | Tray | USA | 1.5 |
# step 2
ramen.loc[(ramen['Brand'] == 'Nissin') & (ramen['Country'] == 'USA') & (ramen['Stars'] == 1.5), 'Stars'] = 2.5
ramen.loc[(ramen['Brand'] == 'Nissin') & (ramen['Country'] == 'USA') & (ramen['Stars'] <= 2.5)]
| | Brand | Variety | Style | Country | Stars |
---|---|---|---|---|---|
2 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 2.250 |
199 | Nissin | Cup Noodles Hot & Spicy Shrimp Flavor Ramen No... | Cup | USA | 2.500 |
2197 | Nissin | Chow Mein Thai Peanut | Tray | USA | 2.500 |
2216 | Nissin | Cup Noodles Hearty Chicken | Cup | USA | 2.500 |
2238 | Nissin | Big Cup Noodles Chicken | Cup | USA | 2.250 |
2242 | Nissin | Big Cup Noodles Beef | Cup | USA | 2.125 |
2243 | Nissin | Big Cup Noodles Shrimp | Cup | USA | 2.250 |
2313 | Nissin | Chow Mein Kung Pao Chicken | Tray | USA | 2.500 |
2393 | Nissin | Bowl Noodles Hot & Spicy Chicken | Bowl | USA | 2.000 |
2422 | Nissin | Creamy Chicken | Cup | USA | 1.750 |
2438 | Nissin | Bowl Noodles Rich & Savory Chicken | Bowl | USA | 1.750 |
2510 | Nissin | Top Ramen Oriental | Pack | USA | 2.500 |
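The two steps above (find the rows, then assign through loc) can be sketched on a small stand-in DataFrame. The name sample is hypothetical and its values are copied from rows printed earlier. Storing the condition in a variable, here called mask, is only a convenience to avoid repeating it.

```python
import pandas as pd

# Hypothetical four-row sample; values copied from rows printed above.
sample = pd.DataFrame({
    'Brand': ['Nissin', 'Nissin', 'Smack', 'Nissin'],
    'Country': ['USA', 'USA', 'USA', 'USA'],
    'Stars': [2.25, 1.5, 1.5, 2.5],
})

# Step 1: build the row filter once.
mask = (
    (sample['Brand'] == 'Nissin')
    & (sample['Country'] == 'USA')
    & (sample['Stars'] == 1.5)
)

# Step 2: select rows and column in a single loc call, then assign.
sample.loc[mask, 'Stars'] = 2.5

print(sample)
```

Only the Nissin row rated 1.5 changes. The Smack row keeps its 1.5 rating because it fails the Brand condition.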
Now repeat the update using [] instead of loc:
ramen = pd.read_csv('https://stat107.hknguyen.org/files/datasets/clean-ramen.csv')
ramen.loc[(ramen['Brand'] == 'Nissin') & (ramen['Country'] == 'USA') & (ramen['Stars'] == 1.5)]
| | Brand | Variety | Style | Country | Stars |
---|---|---|---|---|---|
2313 | Nissin | Chow Mein Kung Pao Chicken | Tray | USA | 1.5 |
ramen[(ramen['Brand'] == 'Nissin') & (ramen['Country'] == 'USA') & (ramen['Stars'] == 1.5)]['Stars'] = 2.5
<ipython-input-27-7c925de0f5d3>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ramen[(ramen['Brand'] == 'Nissin') & (ramen['Country'] == 'USA') & (ramen['Stars'] == 1.5)]['Stars'] = 2.5
ramen.loc[(ramen['Brand'] == 'Nissin') & (ramen['Country'] == 'USA') & (ramen['Stars'] <= 2.5)]
| | Brand | Variety | Style | Country | Stars |
---|---|---|---|---|---|
2 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 2.250 |
199 | Nissin | Cup Noodles Hot & Spicy Shrimp Flavor Ramen No... | Cup | USA | 2.500 |
2197 | Nissin | Chow Mein Thai Peanut | Tray | USA | 2.500 |
2216 | Nissin | Cup Noodles Hearty Chicken | Cup | USA | 2.500 |
2238 | Nissin | Big Cup Noodles Chicken | Cup | USA | 2.250 |
2242 | Nissin | Big Cup Noodles Beef | Cup | USA | 2.125 |
2243 | Nissin | Big Cup Noodles Shrimp | Cup | USA | 2.250 |
2313 | Nissin | Chow Mein Kung Pao Chicken | Tray | USA | 1.500 |
2393 | Nissin | Bowl Noodles Hot & Spicy Chicken | Bowl | USA | 2.000 |
2422 | Nissin | Creamy Chicken | Cup | USA | 1.750 |
2438 | Nissin | Bowl Noodles Rich & Savory Chicken | Bowl | USA | 1.750 |
2510 | Nissin | Top Ramen Oriental | Pack | USA | 2.500 |
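After ramen = pd.read_csv(...), the expression ramen['Stars'] refers to a column of ramen itself, so ramen['Stars'] = 0 assigns into the original DataFrame. In contrast, ramen[ramen['Brand'] == 'Nissin']['Stars'] first builds a filtered copy of ramen and then selects a column from that copy, so assigning to it does not change ramen.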
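The difference between chained [] assignment and loc assignment can be reproduced on a small stand-in DataFrame. The name sample is hypothetical and its values are copied from rows printed earlier. The warning that pandas emits for the chained form is silenced here only to keep the output short; its exact name and wording depend on the pandas version.

```python
import warnings

import pandas as pd

# Hypothetical four-row sample; values copied from rows printed above.
sample = pd.DataFrame({
    'Brand': ['Nissin', 'Nissin', 'Smack', 'Nissin'],
    'Country': ['USA', 'USA', 'USA', 'USA'],
    'Stars': [2.25, 1.5, 1.5, 2.5],
})

mask = (sample['Brand'] == 'Nissin') & (sample['Stars'] == 1.5)

# Chained indexing: sample[mask] builds a new DataFrame, and the assignment
# lands on that temporary copy, not on sample.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')   # hide the chained-assignment warning for this demo
    sample[mask]['Stars'] = 2.5
after_chained = sample['Stars'].tolist()

# A single loc call selects and assigns on sample itself.
sample.loc[mask, 'Stars'] = 2.5
after_loc = sample['Stars'].tolist()

print(after_chained)
print(after_loc)
```

The first printed list still contains the original 1.5 for the Nissin row, while the second shows it updated to 2.5.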
These lecture notes reference material from Chapter 5 of Wes McKinney's Python for Data Analysis, 2nd Ed.