pandas Tutorial: The DataFrame Object (2) • Frontend Tech Sharing

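DataFrame: a DataFrame object resembles a two-dimensional table made up of rows and columns. Like Series, it supports many data types. It is a tabular data structure containing an ordered set of columns, and each column can hold a different value type (numeric, string, boolean, etc.). A DataFrame has both a row index and a column index, and can be thought of as a dictionary of Series objects that share the same row index.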
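The "dictionary of Series" description can be checked directly. This small sketch uses illustrative column names 'one' and 'two'. It shows that selecting one column gives back a Series, and that this Series shares the DataFrame's row index:

```python
import pandas as pd

# A DataFrame behaves like a dict of Series that share one row index
df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]}, index=['a', 'b', 'c'])

col = df['one']                      # selecting one column returns a Series
print(type(col))                     # <class 'pandas.core.series.Series'>
print(list(df.columns))              # column index: ['one', 'two']
print(list(df.index))                # row index: ['a', 'b', 'c']
print(col.index.equals(df.index))    # True: the column shares the row index
```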

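Installing pandas

Install pandas with pip. Reading and writing Excel files also requires extra dependency libraries:

pip install pandas
pip install xlrd      # read legacy .xls files (xlrd 2.x no longer reads .xlsx)
pip install openpyxl  # read and write .xlsx files
pip install xlwt      # write .xls files; only usable with pandas versions before 2.0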

Importing pandas and creating a DataFrame

Create a DataFrame object with pd.DataFrame(data, index, columns, dtype, copy):

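data: the input data. It can be an ndarray, Series, list, dict, scalar, or another DataFrame.
index: the row labels. If no index is passed, the row labels default to np.arange(n), where n is the number of rows in data, so they start from 0.
columns: the column labels. If no columns are passed, they default to np.arange(n), where n is the number of columns (for dict input, the dict keys are used instead).
dtype: a data type to force on the data. Only a single dtype is allowed, and it applies to every column. If it is omitted, each column's type is inferred.
copy: whether to copy the input data. In recent pandas versions the default is None, which copies dict input but not DataFrame or 2-D ndarray input (older versions defaulted to False).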
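The following quick sketch shows how the index, columns, and dtype parameters behave. The labels 'r1' to 'r3', 'x', and 'y' are illustrative and do not come from the original:

```python
import pandas as pd

data = [[1, 2], [3, 4], [5, 6]]

# No index/columns passed: both default to 0, 1, 2, ... (a RangeIndex)
df_default = pd.DataFrame(data)
print(df_default.index.tolist())    # [0, 1, 2]
print(df_default.columns.tolist())  # [0, 1]

# Explicit labels plus a single dtype forced on every column
df = pd.DataFrame(data, index=['r1', 'r2', 'r3'], columns=['x', 'y'], dtype=float)
print(df.dtypes)                    # x and y are both float64
```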

import pandas as pd  # import pandas under the alias pd
# Create a DataFrame from a list of lists (one inner list per row)
data = [['Mate50pro',8999.99,100],['无线鼠标',205.6,500],['充电宝',88.8,1001]]
# Column labels
columns = ['名称','价格','数量']
df = pd.DataFrame(data=data,columns=columns)
print(df)
print(type(df))
'''
         名称       价格    数量
0  Mate50pro  8999.99   100
1       无线鼠标   205.60   500
2        充电宝    88.80  1001

<class 'pandas.core.frame.DataFrame'>
'''
# Create a DataFrame from a dict of lists (each key becomes a column label)
data2 = {
    '名称':['Mate50pro','无线鼠标','充电宝'],
    '价格':[8999.99,205.6,88.8],
    '数量':[100,500,1001]
}
df2 = pd.DataFrame(data=data2)
print(df2)

# Create a DataFrame from a list of dicts (one dict per row)
data3 = [
    {'名称':'Mate50pro','价格':8999},
    {'名称':'无线鼠标','价格':205,'数量':500},
    {'名称':'充电宝','价格':88.8,'数量':1001}
]
df3 = pd.DataFrame(data=data3)
print(df3)
# If a value is missing, i.e. a dict has no value for some key, NaN is used in its place
'''
          名称      价格      数量
0  Mate50pro  8999.0     NaN
1       无线鼠标   205.0   500.0
2        充电宝    88.8  1001.0
'''

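# Create a DataFrame from a dict of Series: the row index is the union of the Series indexes, and missing values become NaN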
data4 = {'one':pd.Series([1,2,3],index=['a','b','c']),'two':pd.Series([1,2,3,4],index=['a','b','c','d'])}
df4 = pd.DataFrame(data=data4)
print(df4)
'''
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
'''

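Manually setting index values:

# Manually specify the index labels (shown on a Series; a flat list of names is used as the data)
names = ['张三','李四','王二']
s = pd.Series(data=names,index=[1,2,3])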
print(s)
'''
1    张三
2    李四
3    王二
dtype: object
'''
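A DataFrame accepts index values the same way. This sketch reuses the product data from above. The replacement labels 'p1' to 'p3' are illustrative. Labels can be given when the DataFrame is created, or they can be replaced afterwards by assigning to df.index:

```python
import pandas as pd

data = [['Mate50pro', 8999.99, 100], ['无线鼠标', 205.6, 500], ['充电宝', 88.8, 1001]]
columns = ['名称', '价格', '数量']

# Set row labels when the DataFrame is created ...
df = pd.DataFrame(data=data, columns=columns, index=[1, 2, 3])
print(df.index.tolist())   # [1, 2, 3]

# ... or replace them afterwards; the new list needs exactly one label per row,
# otherwise pandas raises a ValueError (length mismatch)
df.index = ['p1', 'p2', 'p3']
print(df.index.tolist())   # ['p1', 'p2', 'p3']
```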

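Column index operations

A DataFrame can use its column index (columns) to select, add, and delete data.

Selecting columns by column label

Access a column's data through its column name. The result is a Series: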

cd = {'one':[1,2,3],'two':[4,5,6]}
df = pd.DataFrame(data=cd,index=['a','b','c'])
print(df['one'])
'''
a    1
b    2
c    3
Name: one, dtype: int64
'''
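Selecting a single name returns a Series. A minimal sketch of two related forms: passing a list of names, which returns a DataFrame with the columns in the requested order, and attribute access, which works only when the column name is a valid Python identifier:

```python
import pandas as pd

cd = {'one': [1, 2, 3], 'two': [4, 5, 6]}
df = pd.DataFrame(data=cd, index=['a', 'b', 'c'])

# A list of column names returns a DataFrame rather than a Series
sub = df[['two', 'one']]
print(sub)
#    two  one
# a    4    1
# b    5    2
# c    6    3

# Attribute access also works when the column name is a valid identifier
print(df.two.tolist())   # [4, 5, 6]
```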

Adding and combining columns by column label

New data columns can be added through column labels: assigning df['column'] = values inserts a new column.

cd = {'one':[1,2,3],'two':[4,5,6]}
df = pd.DataFrame(data=cd,index=['a','b','c'])
df['three'] = [7,8,9]
print(df)
'''
   one  two  three
a    1    4      7
b    2    5      8
c    3    6      9
'''
# Combine existing columns: add 'one' and 'three' element-wise to create 'four'
df['four'] = df['one']+df['three']
print(df)
'''
   one  two  three  four
a    1    4      7     8
b    2    5      8    10
c    3    6      9    12
'''
# A column can also be inserted with the insert() method
# 1 is the position in the columns at which to insert, column is the column name, and value is the data to write
df.insert(1,column='five',value=[10,11,12])
print(df)
'''
   one  five  two  three  four
a    1    10    4      7     8
b    2    11    5      8    10
c    3    12    6      9    12
'''
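A list is assigned to the rows by position. A Series is aligned on the row index instead. The sketch below uses an illustrative column 'six' to show what happens with unmatched labels. A row whose label is missing from the Series gets NaN. A Series label that is not in the DataFrame is ignored:

```python
import pandas as pd

cd = {'one': [1, 2, 3], 'two': [4, 5, 6]}
df = pd.DataFrame(data=cd, index=['a', 'b', 'c'])

# A list is assigned by position and must match the number of rows
df['three'] = [7, 8, 9]

# A Series is aligned on the row index: label 'b' is missing, so it gets NaN,
# and label 'z' (not in df.index) is ignored
df['six'] = pd.Series([100, 300, 999], index=['a', 'c', 'z'])
print(df)
#    one  two  three    six
# a    1    4      7  100.0
# b    2    5      8    NaN
# c    3    6      9  300.0
```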

Deleting columns

Columns can be removed from a DataFrame with either del or pop().

cd = {'one':[1,2,3],'two':[4,5,6],'three':[7,8,9]}
df = pd.DataFrame(data=cd,index=['a','b','c'])
del df['one']
print(df)
'''
   two  three
a    4      7
b    5      8
c    6      9
'''
# pop() removes the column and also returns it as a Series
df.pop('two')
print(df)
'''
   three
a      7
b      8
c      9
'''
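The practical difference between the two is the return value. This sketch shows it: pop() hands back the removed column as a Series, so it can be reused, while del only deletes. The variable name `removed` is illustrative:

```python
import pandas as pd

cd = {'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]}
df = pd.DataFrame(data=cd, index=['a', 'b', 'c'])

# pop() removes the column in place and hands it back as a Series
removed = df.pop('two')
print(removed.tolist())      # [4, 5, 6]
print(df.columns.tolist())   # ['one', 'three']

# del returns nothing; deleting a column that does not exist raises KeyError
del df['one']
print(df.columns.tolist())   # ['three']
```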

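Row index operations

loc

loc[] selects data by label. It accepts labels only, not integer positions (unless the row labels themselves are integers). When slicing by label, the range is closed on both ends, so both the start and the end labels are included. loc[] takes two arguments separated by ',': the first selects rows and the second selects columns.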

data = {'one':[1,2,3],'two':[4,5,6],'three':[7,8,9]}
df = pd.DataFrame(data=data,index=['a','b','c'])
print(df.loc['a'])
'''
one      1
two      4
three    7
Name: a, dtype: int64
'''
# The value at row 'a', column 'three'
print(df.loc['a','three']) # 7
print(df.loc['a':'b',:]) # same as df.loc['a':'b']; the end label 'b' is included
'''
   one  two  three
a    1    4      7
b    2    5      8
'''
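The two-argument form also applies to columns, and the inclusive slicing rule holds on both axes. This sketch on the same data shows label slices, lists of labels, and a boolean condition used as the row selector:

```python
import pandas as pd

data = {'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]}
df = pd.DataFrame(data=data, index=['a', 'b', 'c'])

# Label slices are inclusive on both axes: rows 'b'..'c', columns 'one'..'two'
print(df.loc['b':'c', 'one':'two'])
#    one  two
# b    2    5
# c    3    6

# Lists of labels pick specific rows/columns in the given order
print(df.loc[['c', 'a'], ['three']])
#    three
# c      9
# a      7

# A boolean condition can also be used as the row selector
print(df.loc[df['one'] > 1, 'three'].tolist())   # [8, 9]
```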

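iloc

iloc[] accepts integer positions only, not labels. When slicing by integer position, the range is half-open: the end position is excluded. As in Python and NumPy, positions start at 0.

iloc[] supports the following ways of selecting data:

an integer
a list of integers
a range of values (slice)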

data = {'one':[1,2,3],'two':[4,5,6],'three':[7,8,9]}
df = pd.DataFrame(data=data,index=['a','b','c'])

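# Select by integer position, counting from 0
print(df.iloc[2])
'''
one      3
two      6
three    9
Name: c, dtype: int64
'''
# Half-open slice: this prints only the second row (position 1)
print(df.iloc[1:2])
'''
   one  two  three
b    2    5      8
'''
# Print from the second row onward
print(df.iloc[1:])
'''
   one  two  three
b    2    5      8
c    3    6      9
'''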
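The examples above cover a single integer and a slice. This sketch adds the other selection forms from the list: an integer list, plus a row position and column position together, on the same data:

```python
import pandas as pd

data = {'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]}
df = pd.DataFrame(data=data, index=['a', 'b', 'c'])

# Integer list: rows at positions 0 and 2
print(df.iloc[[0, 2]])
#    one  two  three
# a    1    4      7
# c    3    6      9

# Row position and column position together give a single value
print(df.iloc[0, 2])        # 7

# Slices on both axes are half-open: rows 0-1, columns 1-2
print(df.iloc[0:2, 1:3])
#    two  three
# a    4      7
# b    5      8
```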

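Adding rows

The pd.concat() function appends new data rows to a DataFrame: the rows of each subsequent object are added after the existing rows. (DataFrame.append(), seen in older tutorials, was removed in pandas 2.0.)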

data = {'one':[1,2,3],'two':[4,5,6],'three':[7,8,9]}
data2 = {'one':[4,5,6],'two':[7,8,9],'three':[10,11,12]}
df1 = pd.DataFrame(data=data,index=['a','b','c'])
df2 = pd.DataFrame(data=data2,index=['d','e','f'])
df1 = pd.concat([df1,df2])
print(df1)
'''
   one  two  three
a    1    4      7
b    2    5      8
c    3    6      9
d    4    7     10
e    5    8     11
f    6    9     12

'''
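Two related cases are sketched below. The label 'd' and the one-row frame `extra` are illustrative. Assigning to a label that does not exist yet through loc appends a single row in place. By default concat() keeps the original row labels, so duplicates are possible; passing ignore_index=True renumbers the result from 0:

```python
import pandas as pd

data = {'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]}
df = pd.DataFrame(data=data, index=['a', 'b', 'c'])

# Assigning to a new label with loc appends one row in place
df.loc['d'] = [10, 11, 12]
print(df.index.tolist())         # ['a', 'b', 'c', 'd']

# concat() keeps the original labels by default; ignore_index=True renumbers 0..n-1
extra = pd.DataFrame({'one': [0], 'two': [0], 'three': [0]}, index=['a'])
combined = pd.concat([df, extra], ignore_index=True)
print(combined.index.tolist())   # [0, 1, 2, 3, 4]
```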

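Deleting rows

Use drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise'). labels is the row or column label (or list of labels) to delete. axis selects whether rows or columns are deleted: 0 means rows (the default) and 1 means columns. drop() returns a new DataFrame unless inplace=True.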

data = {'one':[1,2,3],'two':[4,5,6],'three':[7,8,9]}
data2 = {'one':[4,5,6],'two':[7,8,9],'three':[10,11,12]}
df1 = pd.DataFrame(data=data,index=['a','b','c'])
df2 = pd.DataFrame(data=data2,index=['d','e','f'])
df1 = pd.concat([df1,df2])
# Delete row 'a'; if several rows share the label 'a', they are all deleted
df1 = df1.drop('a')
print(df1)
'''
   one  two  three
b    2    5      8
c    3    6      9
d    4    7     10
e    5    8     11
f    6    9     12
'''
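The sketch below exercises the other drop() parameters described above: several labels at once, axis=1 or the columns= keyword for columns, errors='ignore', and inplace=True. The label 'zzz' is a deliberately nonexistent, illustrative label:

```python
import pandas as pd

data = {'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]}
df = pd.DataFrame(data=data, index=['a', 'b', 'c'])

# Several row labels at once; drop() returns a new DataFrame and leaves df unchanged
print(df.drop(['a', 'c']).index.tolist())                  # ['b']

# axis=1 (or the columns= keyword) drops columns instead of rows
print(df.drop('two', axis=1).columns.tolist())             # ['one', 'three']
print(df.drop(columns=['one', 'three']).columns.tolist())  # ['two']

# errors='ignore' skips labels that do not exist instead of raising KeyError
print(df.drop('zzz', errors='ignore').shape)               # (3, 3)

# inplace=True modifies df itself and returns None
df.drop('b', inplace=True)
print(df.index.tolist())                                   # ['a', 'c']
```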

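### Common attributes and methods

DataFrame's attributes and methods are nearly the same as those of Series; for the shared ones, see the previous article. This section focuses on the T attribute and the shift() method.

T: transposes rows and columns.
axes: returns a list whose only members are the row axis labels and the column axis labels.
dtypes: returns the data type of each column.
empty: returns True if the DataFrame contains no data or the length of any axis is 0.
ndim: the number of axes, i.e. the number of array dimensions.
shape: returns a tuple representing the DataFrame's dimensions (rows, columns).
size: the number of elements in the DataFrame.
values: the DataFrame's element values as a NumPy array.
head(): returns the first n rows (n defaults to 5).
tail(): returns the last n rows (n defaults to 5).
shift(): shifts rows or columns by a specified number of periods.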
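The attributes and methods in the list can be seen on one small DataFrame. In this sketch the column types (int, float, bool) are chosen only so that dtypes has something to show:

```python
import pandas as pd

data = {'one': [1, 2, 3], 'two': [4.0, 5.0, 6.0], 'three': [True, False, True]}
df = pd.DataFrame(data=data, index=['a', 'b', 'c'])

print(df.shape)               # (3, 3)
print(df.size)                # 9
print(df.ndim)                # 2
print(df.empty)               # False
print(df.axes)                # [row labels, column labels] as two Index objects
print(df.dtypes)              # one: int64, two: float64, three: bool
print(df.values.shape)        # (3, 3): values is a 2-D NumPy array
print(df.head(2))             # first 2 rows
print(df.tail(1))             # last row
print(pd.DataFrame().empty)   # True: a DataFrame with no data
```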

# T: transpose rows and columns
data = {'one':[1,2,3],'two':[4,5,6],'three':[7,8,9]}
df = pd.DataFrame(data=data,index=['a','b','c'])

print(df.T)
'''
       a  b  c
one    1  2  3
two    4  5  6
three  7  8  9
'''

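# shift(periods=1, freq=None, axis=0, fill_value) moves rows or columns
# periods     int, the number of positions to move; may be positive or negative, defaults to 1.
# freq        date offset or frequency string, defaults to None; for time series. If given, the index labels are shifted and the data stays in place.
# axis        0 or "index" moves the data up/down; 1 or "columns" moves it left/right.
# fill_value  the value used to fill the newly vacated positions (NaN by default).
data = {'one':[1,2,3],'two':[4,5,6],'three':[7,8,9]}
df = pd.DataFrame(data=data,index=['a','b','c'])
print(df.shift(periods=2))
'''
   one  two  three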
a  NaN  NaN    NaN
b  NaN  NaN    NaN
c  1.0  4.0    7.0
'''
# 左右移动
print(df.shift(periods=2,axis=1))
'''
   one  two  three
a  NaN  NaN      1
b  NaN  NaN      2
c  NaN  NaN      3
'''
# Fill the vacated positions with 10 instead of NaN (an int fill value keeps the int64 dtype)
print(df.shift(periods=2,fill_value=10))
'''
   one  two  three
a   10   10     10
b   10   10     10
c    1    4      7
'''
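The freq parameter described above only makes sense with a time-based index. This sketch assumes a daily DatetimeIndex starting on an arbitrary date, 2023-01-01. It contrasts a plain shift, where the data moves and NaN appears, with a shift using freq, where the index labels move and the data is unchanged:

```python
import pandas as pd

# A small time series with a daily DatetimeIndex
ts = pd.DataFrame({'value': [1, 2, 3]},
                  index=pd.date_range('2023-01-01', periods=3, freq='D'))

# Without freq the data moves down and NaN fills the gap
print(ts.shift(1)['value'].tolist())        # [nan, 1.0, 2.0]

# With freq the index labels move forward one day; the data itself is unchanged
shifted = ts.shift(periods=1, freq='D')
print(shifted.index[0].date())              # 2023-01-02
print(shifted['value'].tolist())            # [1, 2, 3]
```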