Contents
- 1. read_excel function prototype
- 2. Parameter usage examples
- 2.1. io and sheet_name parameters
- 2.2. header parameter
- 2.3. skipfooter parameter
- 2.4. index_col parameter
- 2.5. parse_dates parameter
- 2.6. converters parameter
- 2.7. na_values parameter
- 2.8. usecols parameter
- Summary
Pandas read_excel() Parameters in Detail
1. read_excel function prototype
def read_excel(io, sheet_name=0, header=0, names=None, index_col=None, parse_cols=None, usecols=None, squeeze=False, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, parse_dates=False, date_parser=None, thousands=None, comment=None, skip_footer=0, skipfooter=0, convert_float=True, mangle_dupe_cols=True, **kwds)
Parameter description:
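The prototype above reflects one particular pandas release, and the parameter list changes between versions. As a minimal sketch (assuming only that pandas is installed), the signature of the installed version can be inspected directly; the `discussed` list is simply the set of parameters covered in this article:

```python
import inspect

import pandas as pd

# Parameters accepted by the installed pandas version (varies by release).
params = inspect.signature(pd.read_excel).parameters
print(pd.__version__)
print(list(params))

# Parameters discussed in this article; check that none of them is missing.
discussed = ["io", "sheet_name", "header", "skipfooter", "index_col",
             "parse_dates", "converters", "na_values", "usecols"]
missing = [name for name in discussed if name not in params]
print("missing:", missing)
```

Calling help(pd.read_excel) prints the full description of each parameter for that same version.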
2. Parameter usage examples
2.1. io and sheet_name parameters
[Example 1] Reading an Excel sheet with io and sheet_name
Contents of records.xlsx:
date        val          percent
2014/3/1    0.947014982  10%
2014/6/1    0.746103818  11%
2014/9/1    0.736764841  12%
2014/12/1   0.724937624  13%
2015/3/1    0.85043738   14%
2015/6/1    0.332503212  15%
2015/9/1    0.75289366   16%
2015/12/1   0.358275104  17%
2016/3/1    0.077250716  18%
2016/6/1    0.436182277  19%
2016/9/1    0.424714671  20%
2016/12/1   0.842471104  21%
2017/3/1    0.740035625  22%
2017/6/1    0.183588529  23%
2017/9/1    0.143363207  24%
Code:
In [166]: import pandas as pd
     ...: df = pd.read_excel(io="records.xlsx", sheet_name="Sheet1")
     ...: df
Out[166]:
         date       val percent
0    2014/3/1  0.947015     10%
1    2014/6/1  0.746104     11%
2    2014/9/1  0.736765     12%
3   2014/12/1  0.724938     13%
4    2015/3/1  0.850437     14%
5    2015/6/1  0.332503     15%
6    2015/9/1  0.752894     16%
7   2015/12/1  0.358275     17%
8    2016/3/1  0.077251     18%
9    2016/6/1  0.436182     19%
10   2016/9/1  0.424715     20%
11  2016/12/1  0.842471     21%
12   2017/3/1  0.740036     22%
13   2017/6/1  0.183589     23%
14   2017/9/1  0.143363     24%
Note: neither io nor sheet_name has to be passed by keyword here; both can be given positionally:
df = pd.read_excel("records.xlsx", "Sheet1")
If records.xlsx contains only one sheet, or the sheet to read is the first one, sheet_name can be omitted:
df = pd.read_excel("records.xlsx")
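Besides a sheet name, sheet_name also accepts a zero-based position, and None reads every sheet into a dict. The following is a minimal sketch, assuming an Excel engine such as openpyxl is installed; the two-sheet workbook and its contents are made up for illustration and written to a temporary directory:

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-sheet workbook written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "records.xlsx")
first = pd.DataFrame({"date": ["2014/3/1", "2014/6/1"], "val": [0.947015, 0.746104]})
second = pd.DataFrame({"date": ["2015/3/1"], "val": [0.850437]})
with pd.ExcelWriter(path) as writer:
    first.to_excel(writer, sheet_name="Sheet1", index=False)
    second.to_excel(writer, sheet_name="Sheet2", index=False)

by_name = pd.read_excel(path, sheet_name="Sheet1")  # select by sheet name
by_position = pd.read_excel(path, sheet_name=1)     # select by zero-based position
all_sheets = pd.read_excel(path, sheet_name=None)   # dict: sheet name -> DataFrame

print(by_position)
print(list(all_sheets))
```

Reading all sheets at once is convenient when the sheet names are not known in advance; each value of the returned dict is an ordinary DataFrame.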
2.2. header parameter
[Example 2] Specifying the header row with the header parameter
Contents of records.xlsx:
2020 XXX Table
date        val          percent
2014/3/1    0.947014982  10%
2014/6/1    0.746103818  11%
2014/9/1    0.736764841  12%
2014/12/1   0.724937624  13%
2015/3/1    0.85043738   14%
2015/6/1    0.332503212  15%
2015/9/1    0.75289366   16%
2015/12/1   0.358275104  17%
2016/3/1    0.077250716  18%
2016/6/1    0.436182277  19%
2016/9/1    0.424714671  20%
2016/12/1   0.842471104  21%
2017/3/1    0.740035625  22%
2017/6/1    0.183588529  23%
2017/9/1    0.143363207  24%
Building on [Example 1], a title row has been added above the column headers in "Sheet1" of records.xlsx. Running the code from [Example 1] unchanged gives this result:
In [169]: import pandas as pd
     ...: df = pd.read_excel("records.xlsx", "Sheet1")
     ...: df
Out[169]:
   2020 XXX Table Unnamed: 1 Unnamed: 2
0            date        val    percent
1        2014/3/1   0.947015        10%
2        2014/6/1   0.746104        11%
3        2014/9/1   0.736765        12%
4       2014/12/1   0.724938        13%
5        2015/3/1   0.850437        14%
6        2015/6/1   0.332503        15%
7        2015/9/1   0.752894        16%
8       2015/12/1   0.358275        17%
9        2016/3/1   0.077251        18%
10       2016/6/1   0.436182        19%
11       2016/9/1   0.424715        20%
12      2016/12/1   0.842471        21%
13       2017/3/1   0.740036        22%
14       2017/6/1   0.183589        23%
15       2017/9/1   0.143363        24%
Neither the column labels nor the data are what we want. In this case the header row has to be specified with the header parameter. The header sits on the second row of the sheet, and header counts rows from 0, so header should be 1.
Code:
In [170]: import pandas as pd
     ...: df = pd.read_excel("records.xlsx", "Sheet1", header=1)
     ...: df
Out[170]:
         date       val percent
0    2014/3/1  0.947015     10%
1    2014/6/1  0.746104     11%
2    2014/9/1  0.736765     12%
3   2014/12/1  0.724938     13%
4    2015/3/1  0.850437     14%
5    2015/6/1  0.332503     15%
6    2015/9/1  0.752894     16%
7   2015/12/1  0.358275     17%
8    2016/3/1  0.077251     18%
9    2016/6/1  0.436182     19%
10   2016/9/1  0.424715     20%
11  2016/12/1  0.842471     21%
12   2017/3/1  0.740036     22%
13   2017/6/1  0.183589     23%
14   2017/9/1  0.143363     24%
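To reproduce the effect of header without the original workbook, here is a minimal sketch, assuming an Excel engine such as openpyxl is installed. It writes a hypothetical miniature of the sheet (title row, header row, two data rows) to a temporary file and reads it back three ways:

```python
import os
import tempfile

import pandas as pd

# Hypothetical miniature sheet: a title row above the real header row.
rows = [
    ["2020 XXX Table", None, None],
    ["date", "val", "percent"],
    ["2014/3/1", 0.947015, "10%"],
    ["2014/6/1", 0.746104, "11%"],
]
path = os.path.join(tempfile.mkdtemp(), "records.xlsx")
pd.DataFrame(rows).to_excel(path, header=False, index=False)

default = pd.read_excel(path)              # header=0: the title row becomes the header
fixed = pd.read_excel(path, header=1)      # second row (index 1) is the header
raw = pd.read_excel(path, header=None)     # no header row: integer column labels

print(list(default.columns))
print(list(fixed.columns))
print(raw.shape)
```

Reading with header=None first is a quick way to see on which row the real column names sit before choosing the header value.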
2.3. skipfooter parameter
[Example 3] Ignoring trailing rows with the skipfooter parameter
Data obtained from a third party often ends with a line such as "Data source: xxx", for example:
2020 XXX Table
date        val          percent
2014/3/1    0.947014982  10%
2014/6/1    0.746103818  11%
2014/9/1    0.736764841  12%
2014/12/1   0.724937624  13%
2015/3/1    0.85043738   14%
2015/6/1    0.332503212  15%
2015/9/1    0.75289366   16%
2015/12/1   0.358275104  17%
2016/3/1    0.077250716  18%
2016/6/1    0.436182277  19%
2016/9/1    0.424714671  20%
2016/12/1   0.842471104  21%
2017/3/1    0.740035625  22%
2017/6/1    0.183588529  23%
2017/9/1    0.143363207  24%
Data source: XXX
In this case, the skipfooter parameter can be used to skip that row.
Code:
In [173]: import pandas as pd
     ...: df = pd.read_excel("records.xlsx", "Sheet1", header=1, skipfooter=1)
     ...: df
Out[173]:
         date       val percent
0    2014/3/1  0.947015     10%
1    2014/6/1  0.746104     11%
2    2014/9/1  0.736765     12%
3   2014/12/1  0.724938     13%
4    2015/3/1  0.850437     14%
5    2015/6/1  0.332503     15%
6    2015/9/1  0.752894     16%
7   2015/12/1  0.358275     17%
8    2016/3/1  0.077251     18%
9    2016/6/1  0.436182     19%
10   2016/9/1  0.424715     20%
11  2016/12/1  0.842471     21%
12   2017/3/1  0.740036     22%
13   2017/6/1  0.183589     23%
14   2017/9/1  0.143363     24%
2.4. index_col parameter
[Example 4] Specifying the DataFrame index with the index_col parameter
Look at the index of the DataFrame read in [Example 3]:
In [174]: df.index
Out[174]: RangeIndex(start=0, stop=15, step=1)
It is an automatically generated integer index. To use the "date" column as the index instead, specify it with the index_col parameter:
In [175]: import pandas as pd
     ...: df = pd.read_excel("records.xlsx", "Sheet1", header=1, skipfooter=1, index_col=0)
     ...: df
Out[175]:
                val percent
date
2014/3/1   0.947015     10%
2014/6/1   0.746104     11%
2014/9/1   0.736765     12%
2014/12/1  0.724938     13%
2015/3/1   0.850437     14%
2015/6/1   0.332503     15%
2015/9/1   0.752894     16%
2015/12/1  0.358275     17%
2016/3/1   0.077251     18%
2016/6/1   0.436182     19%
2016/9/1   0.424715     20%
2016/12/1  0.842471     21%
2017/3/1   0.740036     22%
2017/6/1   0.183589     23%
2017/9/1   0.143363     24%
In [176]: df.index
Out[176]:
Index(['2014/3/1', '2014/6/1', '2014/9/1', '2014/12/1', '2015/3/1', '2015/6/1',
       '2015/9/1', '2015/12/1', '2016/3/1', '2016/6/1', '2016/9/1',
       '2016/12/1', '2017/3/1', '2017/6/1', '2017/9/1'],
      dtype='object', name='date')
Alternatively, pass the column name:
df = pd.read_excel("records.xlsx", "Sheet1", header=1, skipfooter=1, index_col="date")
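Setting index_col while reading gives the same row labels as reading with the default integer index and calling set_index afterwards. The following is a minimal sketch, assuming an Excel engine such as openpyxl is installed; the miniature sheet (title row, header row, two data rows, footer row) is hypothetical and written to a temporary file:

```python
import os
import tempfile

import pandas as pd

# Hypothetical miniature sheet: title row, header row, data rows, footer row.
rows = [
    ["2020 XXX Table", None, None],
    ["date", "val", "percent"],
    ["2014/3/1", 0.947015, "10%"],
    ["2014/6/1", 0.746104, "11%"],
    ["Data source: XXX", None, None],
]
path = os.path.join(tempfile.mkdtemp(), "records.xlsx")
pd.DataFrame(rows).to_excel(path, header=False, index=False)

# Set the index while reading ...
indexed = pd.read_excel(path, header=1, skipfooter=1, index_col=0)
# ... or read with the default integer index and promote the column afterwards.
promoted = pd.read_excel(path, header=1, skipfooter=1).set_index("date")

print(indexed)
print(indexed.index.name)
```

The second form is handy when the index column is only decided after a first look at the data.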
2.5. parse_dates parameter
Check the type of the index values from [Example 4]:
In [183]: type(df.index[0])
Out[183]: str
The values are str rather than the date type we want. One way to convert them to dates is the parse_dates parameter.
[Example 5] Handling dates with the parse_dates parameter
Code:
In [184]: import pandas as pd
     ...: df = pd.read_excel("records.xlsx", "Sheet1", header=1, skipfooter=1, index_col="date", parse_dates=True)
     ...: df
Out[184]:
                 val percent
date
2014-03-01  0.947015     10%
2014-06-01  0.746104     11%
2014-09-01  0.736765     12%
2014-12-01  0.724938     13%
2015-03-01  0.850437     14%
2015-06-01  0.332503     15%
2015-09-01  0.752894     16%
2015-12-01  0.358275     17%
2016-03-01  0.077251     18%
2016-06-01  0.436182     19%
2016-09-01  0.424715     20%
2016-12-01  0.842471     21%
2017-03-01  0.740036     22%
2017-06-01  0.183589     23%
2017-09-01  0.143363     24%
In [185]: type(df.index[0])
Out[185]: pandas._libs.tslibs.timestamps.Timestamp
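When parse_dates is set to True, pandas tries to parse the index as dates.
If the column to parse is not the index, name it in a list: parse_dates=["date"]. parse_dates expects a bool, a list, or a dict, so even a single column goes in a list.
To parse several columns, use parse_dates=["col1", "col2", ...].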
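As a minimal sketch of the list form, the example below parses two regular (non-index) columns while reading. It assumes an Excel engine such as openpyxl is installed; the temporary file and the second date column "updated" are hypothetical and exist only for illustration:

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "records.xlsx")
pd.DataFrame({
    "date": ["2014/3/1", "2014/6/1", "2014/9/1"],     # dates stored as text
    "updated": ["2014/3/5", "2014/6/5", "2014/9/5"],  # hypothetical second date column
    "val": [0.947015, 0.746104, 0.736765],
}).to_excel(path, index=False)

# Without parse_dates the text columns stay as strings.
as_text = pd.read_excel(path)

# With a list of column names, both columns are parsed while reading.
parsed = pd.read_excel(path, parse_dates=["date", "updated"])

print(as_text.dtypes)
print(parsed.dtypes)
```

Comparing the two dtypes listings shows which columns were actually converted.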
2.6. converters parameter
In the previous examples, the values in the percent column are written as xx% and are of type str:
In [187]: type(df["percent"][0])
Out[187]: str
We would rather have float values, which can be achieved with the converters parameter.
[Example 6] Converting data types with the converters parameter
Code:
In [189]: import pandas as pd
     ...: def convertPercent(val):
     ...:     return float(val.split("%")[0])*0.01
     ...:
     ...: df = pd.read_excel("records.xlsx", "Sheet1", header=1, skipfooter=1, index_col="date", parse_dates=True, converters={"percent": convertPercent})
     ...: df
Out[189]:
                 val  percent
date
2014-03-01  0.947015     0.10
2014-06-01  0.746104     0.11
2014-09-01  0.736765     0.12
2014-12-01  0.724938     0.13
2015-03-01  0.850437     0.14
2015-06-01  0.332503     0.15
2015-09-01  0.752894     0.16
2015-12-01  0.358275     0.17
2016-03-01  0.077251     0.18
2016-06-01  0.436182     0.19
2016-09-01  0.424715     0.20
2016-12-01  0.842471     0.21
2017-03-01  0.740036     0.22
2017-06-01  0.183589     0.23
2017-09-01  0.143363     0.24
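A converter receives each raw cell value, so it fails if a cell is not a well-formed "xx%" string. The following is a minimal sketch of a more defensive variant, assuming an Excel engine such as openpyxl is installed; the function name percent_to_float and the sample file are hypothetical, and mapping unparseable cells to NaN is one possible choice rather than the only one:

```python
import os
import tempfile

import pandas as pd


def percent_to_float(value):
    """Turn '10%' into 0.1; anything that is not an 'xx%' string becomes NaN."""
    if isinstance(value, str) and value.strip().endswith("%"):
        return float(value.strip().rstrip("%")) / 100
    return float("nan")


path = os.path.join(tempfile.mkdtemp(), "records.xlsx")
pd.DataFrame({
    "date": ["2014/3/1", "2014/6/1", "2014/12/1"],
    "percent": ["10%", "11%", "--"],   # the last cell is not a valid percentage
}).to_excel(path, index=False)

df = pd.read_excel(path, converters={"percent": percent_to_float})
print(df)
```

Dividing by 100 instead of multiplying by 0.01 is only a stylistic preference here; both follow the conversion used in [Example 6].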
2.7. na_values parameter
[Example 7] Handling missing data with the na_values parameter
Not all data is always valid. In the table below, for example, the values in the 2014/12/1 and 2016/6/1 rows are both "--":
2020 XXX Table
date        val          percent
2014/3/1    0.947014982  10%
2014/6/1    0.746103818  11%
2014/9/1    0.736764841  12%
2014/12/1   --           --
2015/3/1    0.85043738   14%
2015/6/1    0.332503212  15%
2015/9/1    0.75289366   16%
2015/12/1   0.358275104  17%
2016/3/1    0.077250716  18%
2016/6/1    --           --
2016/9/1    0.424714671  20%
2016/12/1   0.842471104  21%
2017/3/1    0.740035625  22%
2017/6/1    0.183588529  23%
2017/9/1    0.143363207  24%
Data source: XXX
In this case the na_values parameter can be used.
Code:
In [191]: import pandas as pd
     ...: df = pd.read_excel("records.xlsx", "Sheet1", header=1, skipfooter=1, index_col="date", parse_dates=True, na_values="--")
     ...: df
Out[191]:
                 val percent
date
2014-03-01  0.947015     10%
2014-06-01  0.746104     11%
2014-09-01  0.736765     12%
2014-12-01       NaN     NaN
2015-03-01  0.850437     14%
2015-06-01  0.332503     15%
2015-09-01  0.752894     16%
2015-12-01  0.358275     17%
2016-03-01  0.077251     18%
2016-06-01       NaN     NaN
2016-09-01  0.424715     20%
2016-12-01  0.842471     21%
2017-03-01  0.740036     22%
2017-06-01  0.183589     23%
2017-09-01  0.143363     24%
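na_values also accepts a list of markers, or a dict that assigns markers per column. The following is a minimal sketch, assuming an Excel engine such as openpyxl is installed; the sample file and the extra marker "unknown" are hypothetical:

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "records.xlsx")
pd.DataFrame({
    "date": ["2014/3/1", "2014/12/1", "2016/6/1"],
    "val": [0.947015, "--", 0.077251],
    "percent": ["10%", "--", "unknown"],   # "unknown" is a made-up second marker
}).to_excel(path, index=False)

# A list applies the same markers to every column.
everywhere = pd.read_excel(path, na_values=["--", "unknown"])

# A dict limits each marker to the listed column.
per_column = pd.read_excel(path, na_values={"val": ["--"], "percent": ["--", "unknown"]})

print(per_column)
print(per_column.dtypes)
```

Once the markers are treated as missing, the val column can be read as a numeric column instead of a mix of numbers and strings.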
2.8. usecols parameter
[Example 8] Selecting columns with the usecols parameter
To process only certain columns of the sheet, specify them with the usecols parameter. For example, to read only the "date" and "val" columns, pass:
usecols=["date","val"]
Code:
In [193]: import pandas as pd
     ...: df = pd.read_excel("records.xlsx", "Sheet1", header=1, skipfooter=1, index_col="date", parse_dates=True, na_values="--", usecols=["date","val"])
     ...: df
Out[193]:
                 val
date
2014-03-01  0.947015
2014-06-01  0.746104
2014-09-01  0.736765
2014-12-01       NaN
2015-03-01  0.850437
2015-06-01  0.332503
2015-09-01  0.752894
2015-12-01  0.358275
2016-03-01  0.077251
2016-06-01       NaN
2016-09-01  0.424715
2016-12-01  0.842471
2017-03-01  0.740036
2017-06-01  0.183589
2017-09-01  0.143363
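Besides a list of column names, usecols can be given as Excel column letters, as zero-based positions, or as a callable that is evaluated against each column name. The following is a minimal sketch, assuming a reasonably recent pandas and an Excel engine such as openpyxl; the sample file is hypothetical:

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "records.xlsx")
pd.DataFrame({
    "date": ["2014/3/1", "2014/6/1"],
    "val": [0.947015, 0.746104],
    "percent": ["10%", "11%"],
}).to_excel(path, index=False)

by_name = pd.read_excel(path, usecols=["date", "val"])      # column names
by_letter = pd.read_excel(path, usecols="A:B")              # Excel column letters
by_position = pd.read_excel(path, usecols=[0, 1])           # zero-based positions
by_callable = pd.read_excel(path, usecols=lambda name: name != "percent")

for frame in (by_name, by_letter, by_position, by_callable):
    print(list(frame.columns))
```

All four calls select the same two columns here; which form is most convenient depends on whether the sheet layout or the column names are the more stable reference.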
Summary
This concludes the walkthrough of the read_excel() parameters in the Python pandas library.