Python pandas module
2016-11-05
  
Comment: just how big is the IQ gap between me and the author, that I needed 60 minutes to get through what the author considers 10 minutes of material...

By convention, we start with these imports:

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: import matplotlib.pyplot as plt

Create a Series; the input can be a list:

In [4]: s = pd.Series([1, 3, 5, np.nan, 6, 8])
In [5]: s
Out[5]:
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

Create a DataFrame:

In [6]: dates = pd.date_range('20130101', periods=6)
In [7]: dates
Out[7]:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
In [8]: df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
In [9]: df
Out[9]:
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988

Create a DataFrame; the input can also be a dict:

In [10]: df2 = pd.DataFrame({'A': 1.,
   ....:                     'B': pd.Timestamp('20130102'),
   ....:                     'C': pd.Series(1, index=list(range(4)), dtype='float32'),
   ....:                     'D': np.array([3] * 4, dtype='int32'),
   ....:                     'E': pd.Categorical(["test", "train", "test", "train"]),
   ....:                     'F': 'foo'})
   ....:
In [11]: df2
Out[11]:
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo

The dtype of each column:

In [12]: df2.dtypes
Out[12]:
A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object

If you are using IPython, the Tab key completes attribute names (only a subset is listed here):

In [13]: df2.<TAB>
df2.A                  df2.boxplot
df2.abs                df2.C
df2.add                df2.clip
df2.add_prefix         df2.clip_lower
df2.add_suffix         df2.clip_upper
df2.align              df2.columns
df2.all                df2.combine
df2.any                df2.combineAdd
df2.append             df2.combine_first
df2.apply              df2.combineMult
df2.applymap           df2.compound
df2.as_blocks          df2.consolidate
df2.asfreq             df2.convert_objects
df2.as_matrix          df2.copy
df2.astype             df2.corr
df2.at                 df2.corrwith
df2.at_time            df2.count
df2.axes               df2.cov
df2.B                  df2.cummax
df2.between_time       df2.cummin
df2.bfill              df2.cumprod
df2.blocks             df2.cumsum
df2.bool               df2.D

Viewing data

View the first and last rows:

In [14]: df.head()
Out[14]:
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
In [15]: df.tail(3)
Out[15]:
                   A         B         C         D
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988

Display the index (rows), the columns, and the underlying NumPy values:

In [16]: df.index
Out[16]:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
In [17]: df.columns
Out[17]: Index([u'A', u'B', u'C', u'D'], dtype='object')
In [18]: df.values
Out[18]:
array([[ 0.4691, -0.2829, -1.5091, -1.1356],
       [ 1.2121, -0.1732,  0.1192, -1.0442],
       [-0.8618, -2.1046, -0.4949,  1.0718],
       [ 0.7216, -0.7068, -1.0396,  0.2719],
       [-0.425 ,  0.567 ,  0.2762, -1.0874],
       [-0.6737,  0.1136, -1.4784,  0.525 ]])

Summary statistics for each column:

In [19]: df.describe()
Out[19]:
              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean   0.073711 -0.431125 -0.687758 -0.233103
std    0.843157  0.922818  0.779887  0.973118
min   -0.861849 -2.104569 -1.509059 -1.135632
25%   -0.611510 -0.600794 -1.368714 -1.076610
50%    0.022070 -0.228039 -0.767252 -0.386188
75%    0.658444  0.041933 -0.034326  0.461706
max    1.212112  0.567020  0.276232  1.071804

Transpose:
In [20]: df.T
Out[20]:
   2013-01-01  2013-01-02  2013-01-03  2013-01-04  2013-01-05  2013-01-06
A    0.469112    1.212112   -0.861849    0.721555   -0.424972   -0.673690
B   -0.282863   -0.173215   -2.104569   -0.706771    0.567020    0.113648
C   -1.509059    0.119209   -0.494929   -1.039575    0.276232   -1.478427
D   -1.135632   -1.044236    1.071804    0.271860   -1.087401    0.524988

Sort by an axis:

In [21]: df.sort_index(axis=1, ascending=False)
Out[21]:
                   D         C         B         A
2013-01-01 -1.135632 -1.509059 -0.282863  0.469112
2013-01-02 -1.044236  0.119209 -0.173215  1.212112
2013-01-03  1.071804 -0.494929 -2.104569 -0.861849
2013-01-04  0.271860 -1.039575 -0.706771  0.721555
2013-01-05 -1.087401  0.276232  0.567020 -0.424972
2013-01-06  0.524988 -1.478427  0.113648 -0.673690

Sort by values:

In [22]: df.sort_values(by='B')
Out[22]:
                   A         B         C         D
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-06 -0.673690  0.113648 -1.478427  0.524988
2013-01-05 -0.424972  0.567020  0.276232 -1.087401

Selection
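
As a compact preview of the selection methods in this section, the sketch below contrasts label-based and position-based access. It is only an illustration under stated assumptions: the frame is filled with np.arange values instead of the random data used in this article so the results are reproducible, and the names by_label, by_position, mask and filtered are made up for the example.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the article's random frame (an assumption of this sketch).
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# Label-based slicing with .loc includes BOTH endpoints: 3 rows here.
by_label = df.loc["20130102":"20130104", ["A", "B"]]

# Position-based slicing with .iloc excludes the end position: 2 rows here.
by_position = df.iloc[1:3, 0:2]

# Three ways to read the same single cell (first row, column A).
scalar_loc = df.loc[dates[0], "A"]
scalar_at = df.at[dates[0], "A"]   # scalar-only access by label
scalar_iat = df.iat[0, 0]          # scalar-only access by position

# Boolean indexing: keep the rows whose A value is greater than 8.
mask = df["A"] > 8
filtered = df[mask]

print(by_label.shape, by_position.shape, len(filtered))
```

The main thing to take away is the endpoint rule: a label slice keeps its right endpoint, while an integer slice follows the usual Python and NumPy convention and drops it.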
Select a single column:

In [23]: df['A']
Out[23]:
2013-01-01    0.469112
2013-01-02    1.212112
2013-01-03   -0.861849
2013-01-04    0.721555
2013-01-05   -0.424972
2013-01-06   -0.673690
Freq: D, Name: A, dtype: float64

Select rows by slicing:

In [24]: df[0:3]
Out[24]:
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
In [25]: df['20130102':'20130104']
Out[25]:
                   A         B         C         D
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860

Select by label:

In [26]: df.loc[dates[0]]
Out[26]:
A    0.469112
B   -0.282863
C   -1.509059
D   -1.135632
Name: 2013-01-01 00:00:00, dtype: float64

Select on multiple axes by label:

In [27]: df.loc[:, ['A', 'B']]
Out[27]:
                   A         B
2013-01-01  0.469112 -0.282863
2013-01-02  1.212112 -0.173215
2013-01-03 -0.861849 -2.104569
2013-01-04  0.721555 -0.706771
2013-01-05 -0.424972  0.567020
2013-01-06 -0.673690  0.113648

Slice by label, which works much like the above (both endpoints are included):

In [28]: df.loc['20130102':'20130104', ['A', 'B']]
Out[28]:
                   A         B
2013-01-02  1.212112 -0.173215
2013-01-03 -0.861849 -2.104569
2013-01-04  0.721555 -0.706771
In [29]: df.loc['20130102', ['A', 'B']]
Out[29]:
A    1.212112
B   -0.173215
Name: 2013-01-02 00:00:00, dtype: float64
In [30]: df.loc[dates[0], 'A']
Out[30]: 0.46911229990718628

Fast scalar access; equivalent to the method above, and probably a little faster:

In [31]: df.at[dates[0], 'A']
Out[31]: 0.46911229990718628

Select by position:

In [32]: df.iloc[3]
Out[32]:
A    0.721555
B   -0.706771
C   -1.039575
D    0.271860
Name: 2013-01-04 00:00:00, dtype: float64

Integer slices behave much like NumPy:

In [33]: df.iloc[3:5, 0:2]
Out[33]:
                   A         B
2013-01-04  0.721555 -0.706771
2013-01-05 -0.424972  0.567020

The input can also be lists of integer positions:

In [34]: df.iloc[[1, 2, 4], [0, 2]]
Out[34]:
                   A         C
2013-01-02  1.212112  0.119209
2013-01-03 -0.861849 -0.494929
2013-01-05 -0.424972  0.276232

Slice rows:

In [35]: df.iloc[1:3, :]
Out[35]:
                   A         B         C         D
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804

Slice columns:

In [36]: df.iloc[:, 1:3]
Out[36]:
                   B         C
2013-01-01 -0.282863 -1.509059
2013-01-02 -0.173215  0.119209
2013-01-03 -2.104569 -0.494929
2013-01-04 -0.706771 -1.039575
2013-01-05  0.567020  0.276232
2013-01-06  0.113648 -1.478427

Get the value at a specific position:

In [37]: df.iloc[1, 1]
Out[37]: -0.17321464905330858

Fast scalar access, equivalent to the previous method:

In [38]: df.iat[1, 1]
Out[38]: -0.17321464905330858

Boolean indexing

Select rows using the values of a single column:

In [39]: df[df.A > 0]
Out[39]:
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-04  0.721555 -0.706771 -1.039575  0.271860

Apply the condition to every element:

In [40]: df[df > 0]
Out[40]:
                   A         B         C         D
2013-01-01  0.469112       NaN       NaN       NaN
2013-01-02  1.212112       NaN  0.119209       NaN
2013-01-03       NaN       NaN       NaN  1.071804
2013-01-04  0.721555       NaN       NaN  0.271860
2013-01-05       NaN  0.567020  0.276232       NaN
2013-01-06       NaN  0.113648       NaN  0.524988

Filter with the isin() method:

In [41]: df2 = df.copy()
In [42]: df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']
In [43]: df2
Out[43]:
                   A         B         C         D      E
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632    one
2013-01-02  1.212112 -0.173215  0.119209 -1.044236    one
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804    two
2013-01-04  0.721555 -0.706771 -1.039575  0.271860  three
2013-01-05 -0.424972  0.567020  0.276232 -1.087401   four
2013-01-06 -0.673690  0.113648 -1.478427  0.524988  three
In [44]: df2[df2['E'].isin(['two', 'four'])]
Out[44]:
                   A         B         C         D     E
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804   two
2013-01-05 -0.424972  0.567020  0.276232 -1.087401  four

Setting

In [45]: s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))
In [46]: s1
Out[46]:
2013-01-02    1
2013-01-03    2
2013-01-04    3
2013-01-05    4
2013-01-06    5
2013-01-07    6
Freq: D, dtype: int64
In [47]: df['F'] = s1

Set a value by label:

In [48]: df.at[dates[0], 'A'] = 0

Set a value by position:

In [49]: df.iat[0, 1] = 0

Set a column from a NumPy array:

In [50]: df.loc[:, 'D'] = np.array([5] * len(df))

The result:

In [51]: df
Out[51]:
                   A         B         C  D    F
2013-01-01  0.000000  0.000000 -1.509059  5  NaN
2013-01-02  1.212112 -0.173215  0.119209  5  1.0
2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0
2013-01-04  0.721555 -0.706771 -1.039575  5  3.0
2013-01-05 -0.424972  0.567020  0.276232  5  4.0
2013-01-06 -0.673690  0.113648 -1.478427  5  5.0

Set values with a boolean condition:

In [52]: df2 = df.copy()
In [53]: df2[df2 > 0] = -df2
In [54]: df2
Out[54]:
                   A         B         C  D    F
2013-01-01  0.000000  0.000000 -1.509059 -5  NaN
2013-01-02 -1.212112 -0.173215 -0.119209 -5 -1.0
2013-01-03 -0.861849 -2.104569 -0.494929 -5 -2.0
2013-01-04 -0.721555 -0.706771 -1.039575 -5 -3.0
2013-01-05 -0.424972 -0.567020 -0.276232 -5 -4.0
2013-01-06 -0.673690 -0.113648 -1.478427 -5 -5.0

Missing data

In [55]: df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
In [56]: df1.loc[dates[0]:dates[1], 'E'] = 1
In [57]: df1
Out[57]:
                   A         B         C  D    F    E
2013-01-01  0.000000  0.000000 -1.509059  5  NaN  1.0
2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0
2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0  NaN
2013-01-04  0.721555 -0.706771 -1.039575  5  3.0  NaN

Drop every row that contains missing data:

In [58]: df1.dropna(how='any')
Out[58]:
                   A         B         C  D    F    E
2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0

Fill in missing data:

In [59]: df1.fillna(value=5)
Out[59]:
                   A         B         C  D    F    E
2013-01-01  0.000000  0.000000 -1.509059  5  5.0  1.0
2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0
2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0  5.0
2013-01-04  0.721555 -0.706771 -1.039575  5  3.0  5.0

Get a boolean mask marking where data is missing:

In [60]: pd.isnull(df1)
Out[60]:
                A      B      C      D      F      E
2013-01-01  False  False  False  False   True  False
2013-01-02  False  False  False  False  False  False
2013-01-03  False  False  False  False  False   True
2013-01-04  False  False  False  False  False   True

Operations

Mean of each column:

In [61]: df.mean()
Out[61]:
A   -0.004474
B   -0.383981
C   -0.687758
D    5.000000
F    3.000000
dtype: float64

Mean of each row:

In [62]: df.mean(1)
Out[62]:
2013-01-01    0.872735
2013-01-02    1.431621
2013-01-03    0.707731
2013-01-04    1.395042
2013-01-05    1.883656
2013-01-06    1.592306
Freq: D, dtype: float64

Note the shift() call, which moves the values two positions down the index before the subtraction:

In [63]: s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
In [64]: s
Out[64]:
2013-01-01    NaN
2013-01-02    NaN
2013-01-03    1.0
2013-01-04    3.0
2013-01-05    5.0
2013-01-06    NaN
Freq: D, dtype: float64
In [65]: df.sub(s, axis='index')
Out[65]:
                   A         B         C    D    F
2013-01-01       NaN       NaN       NaN  NaN  NaN
2013-01-02       NaN       NaN       NaN  NaN  NaN
2013-01-03 -1.861849 -3.104569 -1.494929  4.0  1.0
2013-01-04 -2.278445 -3.706771 -4.039575  2.0  0.0
2013-01-05 -5.424972 -4.432980 -4.723768  0.0 -1.0
2013-01-06       NaN       NaN       NaN  NaN  NaN

Apply functions to the data:

In [66]: df.apply(np.cumsum)
Out[66]:
                   A         B         C   D     F
2013-01-01  0.000000  0.000000 -1.509059   5   NaN
2013-01-02  1.212112 -0.173215 -1.389850  10   1.0
2013-01-03  0.350263 -2.277784 -1.884779  15   3.0
2013-01-04  1.071818 -2.984555 -2.924354  20   6.0
2013-01-05  0.646846 -2.417535 -2.648122  25  10.0
2013-01-06 -0.026844 -2.303886 -4.126549  30  15.0
In [67]: df.apply(lambda x: x.max() - x.min())
Out[67]:
A    2.073961
B    2.671590
C    1.785291
D    0.000000
F    4.000000
dtype: float64

Histogram-style counts:

In [68]: s = pd.Series(np.random.randint(0, 7, size=10))
In [69]: s
Out[69]:
0    4
1    2
2    1
3    2
4    6
5    4
6    4
7    6
8    4
9    4
dtype: int64
In [70]: s.value_counts()
Out[70]:
4    5
6    2
2    2
1    1
dtype: int64

String methods

Convert to lowercase:

In [71]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
In [72]: s.str.lower()
Out[72]:
0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object

Merging

Concatenate pieces with concat():

In [73]: df = pd.DataFrame(np.random.randn(10, 4))
In [74]: df
Out[74]:
          0         1         2         3
0 -0.548702  1.467327 -1.015962 -0.483075
1  1.637550 -1.217659 -0.291519 -1.745505
2 -0.263952  0.991460 -0.919069  0.266046
3 -0.709661  1.669052  1.037882 -1.705775
4 -0.919854 -0.042379  1.247642 -0.009920
5  0.290213  0.495767  0.362949  1.548106
6 -1.131345 -0.089329  0.337863 -0.945867
7 -0.932132  1.956030  0.017587 -0.016692
8 -0.575247  0.254161 -1.143704  0.215897
9  1.193555 -0.077118 -0.408530 -0.862495
# break it into pieces
In [75]: pieces = [df[:3], df[3:7], df[7:]]
In [76]: pd.concat(pieces)
Out[76]:
          0         1         2         3
0 -0.548702  1.467327 -1.015962 -0.483075
1  1.637550 -1.217659 -0.291519 -1.745505
2 -0.263952  0.991460 -0.919069  0.266046
3 -0.709661  1.669052  1.037882 -1.705775
4 -0.919854 -0.042379  1.247642 -0.009920
5  0.290213  0.495767  0.362949  1.548106
6 -1.131345 -0.089329  0.337863 -0.945867
7 -0.932132  1.956030  0.017587 -0.016692
8 -0.575247  0.254161 -1.143704  0.215897
9  1.193555 -0.077118 -0.408530 -0.862495

SQL-style join:

In [77]: left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
In [78]: right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
In [79]: left
Out[79]:
   key  lval
0  foo     1
1  foo     2
In [80]: right
Out[80]:
   key  rval
0  foo     4
1  foo     5
In [81]: pd.merge(left, right, on='key')
Out[81]:
   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5

Append a row:

In [82]: df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
In [83]: df
Out[83]:
          A         B         C         D
0  1.346061  1.511763  1.627081 -0.990582
1 -0.441652  1.211526  0.268520  0.024580
2 -1.577585  0.396823 -0.105381 -0.532532
3  1.453749  1.208843 -0.080952 -0.264610
4 -0.727965 -0.589346  0.339969 -0.693205
5 -0.339355  0.593616  0.884345  1.591431
6  0.141809  0.220390  0.435589  0.192451
7 -0.096701  0.803351  1.715071 -0.708758
In [84]: s = df.iloc[3]
In [85]: df.append(s, ignore_index=True)
Out[85]:
          A         B         C         D
0  1.346061  1.511763  1.627081 -0.990582
1 -0.441652  1.211526  0.268520  0.024580
2 -1.577585  0.396823 -0.105381 -0.532532
3  1.453749  1.208843 -0.080952 -0.264610
4 -0.727965 -0.589346  0.339969 -0.693205
5 -0.339355  0.593616  0.884345  1.591431
6  0.141809  0.220390  0.435589  0.192451
7 -0.096701  0.803351  1.715071 -0.708758
8  1.453749  1.208843 -0.080952 -0.264610

Grouping

In [86]: df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
   ....:                          'foo', 'bar', 'foo', 'foo'],
   ....:                    'B': ['one', 'one', 'two', 'three',
   ....:                          'two', 'two', 'one', 'three'],
   ....:                    'C': np.random.randn(8),
   ....:                    'D': np.random.randn(8)})
   ....:
In [87]: df
Out[87]:
     A      B         C         D
0  foo    one -1.202872 -0.055224
1  bar    one -1.814470  2.395985
2  foo    two  1.018601  1.552825
3  bar  three -0.595447  0.166599
4  foo    two  1.395433  0.047609
5  bar    two -0.392670 -0.136473
6  foo    one  0.007207 -0.561757
7  foo  three  1.928123 -1.623033

Group by one column, then sum:

In [88]: df.groupby('A').sum()
Out[88]:
            C        D
A
bar -2.802588  2.42611
foo  3.146492 -0.63958

Group by several columns, then sum:

In [89]: df.groupby(['A', 'B']).sum()
Out[89]:
                  C         D
A   B
bar one   -1.814470  2.395985
    three -0.595447  0.166599
    two   -0.392670 -0.136473
foo one   -1.195665 -0.616981
    three  1.928123 -1.623033
    two    2.414034  1.600434

Reshaping

In [90]: tuples = list(zip(*[['bar', 'bar', 'baz', 'baz',
   ....:                      'foo', 'foo', 'qux', 'qux'],
   ....:                     ['one', 'two', 'one', 'two',
   ....:                      'one', 'two', 'one', 'two']]))
   ....:
In [91]: index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
In [92]: df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
In [93]: df2 = df[:4]
In [94]: df2
Out[94]:
                     A         B
first second
bar   one     0.029399 -0.542108
      two     0.282696 -0.087302
baz   one    -1.575170  1.771208
      two     0.816482  1.100230

The stack() method compresses the columns into a level of the index:

In [95]: stacked = df2.stack()
In [96]: stacked
Out[96]:
first  second
bar    one     A    0.029399
               B   -0.542108
       two     A    0.282696
               B   -0.087302
baz    one     A   -1.575170
               B    1.771208
       two     A    0.816482
               B    1.100230
dtype: float64

Unstack (the inverse operation):

In [97]: stacked.unstack()
Out[97]:
                     A         B
first second
bar   one     0.029399 -0.542108
      two     0.282696 -0.087302
baz   one    -1.575170  1.771208
      two     0.816482  1.100230
In [98]: stacked.unstack(1)
Out[98]:
second        one       two
first
bar   A  0.029399  0.282696
      B -0.542108 -0.087302
baz   A -1.575170  0.816482
      B  1.771208  1.100230
In [99]: stacked.unstack(0)
Out[99]:
first          bar       baz
second
one    A  0.029399 -1.575170
       B -0.542108  1.771208
two    A  0.282696  0.816482
       B -0.087302  1.100230

Pivot tables

In [100]: df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
   .....:                    'B': ['A', 'B', 'C'] * 4,
   .....:                    'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
   .....:                    'D': np.random.randn(12),
   .....:                    'E': np.random.randn(12)})
   .....:
In [101]: df
Out[101]:
        A  B    C         D         E
0     one  A  foo  1.418757 -0.179666
1     one  B  foo -1.879024  1.291836
2     two  C  foo  0.536826 -0.009614
3   three  A  bar  1.006160  0.392149
4     one  B  bar -0.029716  0.264599
5     one  C  bar -1.146178 -0.057409
6     two  A  foo  0.100900 -1.425638
7   three  B  foo -1.035018  1.024098
8     one  C  foo  0.314665 -0.106062
9     one  A  bar -0.773723  1.824375
10    two  B  bar -1.170653  0.595974
11  three  C  bar  0.648740  1.167115

Build the pivot table:

In [102]: pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
Out[102]:
C             bar       foo
A     B
one   A -0.773723  1.418757
      B -0.029716 -1.879024
      C -1.146178  0.314665
three A  1.006160       NaN
      B       NaN -1.035018
      C  0.648740       NaN
two   A       NaN  0.100900
      B -1.170653       NaN
      C       NaN  0.536826

Time series

In [103]: rng = pd.date_range('1/1/2012', periods=100, freq='S')
In [104]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
In [105]: ts.resample('5Min').sum()
Out[105]:
2012-01-01    25083
Freq: 5T, dtype: int64

Time zone representation:

In [106]: rng = pd.date_range('3/6/2012 00:00', periods=5, freq='D')
In [107]: ts = pd.Series(np.random.randn(len(rng)), rng)
In [108]: ts
Out[108]:
2012-03-06    0.464000
2012-03-07    0.227371
2012-03-08   -0.496922
2012-03-09    0.306389
2012-03-10   -2.290613
Freq: D, dtype: float64
In [109]: ts_utc = ts.tz_localize('UTC')
In [110]: ts_utc
Out[110]:
2012-03-06 00:00:00+00:00    0.464000
2012-03-07 00:00:00+00:00    0.227371
2012-03-08 00:00:00+00:00   -0.496922
2012-03-09 00:00:00+00:00    0.306389
2012-03-10 00:00:00+00:00   -2.290613
Freq: D, dtype: float64

Convert to another time zone:

In [111]: ts_utc.tz_convert('US/Eastern')
Out[111]:
2012-03-05 19:00:00-05:00    0.464000
2012-03-06 19:00:00-05:00    0.227371
2012-03-07 19:00:00-05:00   -0.496922
2012-03-08 19:00:00-05:00    0.306389
2012-03-09 19:00:00-05:00   -2.290613
Freq: D, dtype: float64

Convert between timestamp and period representations:

In [112]: rng = pd.date_range('1/1/2012', periods=5, freq='M')
In [113]: ts = pd.Series(np.random.randn(len(rng)), index=rng)
In [114]: ts
Out[114]:
2012-01-31   -1.134623
2012-02-29   -1.561819
2012-03-31   -0.260838
2012-04-30    0.281957
2012-05-31    1.523962
Freq: M, dtype: float64
In [115]: ps = ts.to_period()
In [116]: ps
Out[116]:
2012-01   -1.134623
2012-02   -1.561819
2012-03   -0.260838
2012-04    0.281957
2012-05    1.523962
Freq: M, dtype: float64
In [117]: ps.to_timestamp()
Out[117]:
2012-01-01   -1.134623
2012-02-01   -1.561819
2012-03-01   -0.260838
2012-04-01    0.281957
2012-05-01    1.523962
Freq: MS, dtype: float64

Quarterly conversion:

In [118]: prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')
In [119]: ts = pd.Series(np.random.randn(len(prng)), prng)
In [120]: ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
In [121]: ts.head()
Out[121]:
1990-03-01 09:00   -0.902937
1990-06-01 09:00    0.068159
1990-09-01 09:00   -0.057873
1990-12-01 09:00   -0.368204
1991-03-01 09:00   -1.144073
Freq: H, dtype: float64

Categoricals

In [122]: df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})

Convert to the category dtype:

In [123]: df["grade"] = df["raw_grade"].astype("category")
In [124]: df["grade"]
Out[124]:
0    a
1    b
2    b
3    a
4    a
5    e
Name: grade, dtype: category
Categories (3, object): [a, b, e]

Rename the categories:

In [125]: df["grade"].cat.categories = ["very good", "good", "very bad"]

Reorder the categories:

In [126]: df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])
In [127]: df["grade"]
Out[127]:
0    very good
1         good
2         good
3    very good
4    very good
5     very bad
Name: grade, dtype: category
Categories (5, object): [very bad, bad, medium, good, very good]

Sort (the order follows the categories, not the alphabet):

In [128]: df.sort_values(by="grade")
Out[128]:
   id raw_grade      grade
5   6         e   very bad
1   2         b       good
2   3         b       good
0   1         a  very good
3   4         a  very good
4   5         a  very good

Group by a categorical column:

In [129]: df.groupby("grade").size()
Out[129]:
grade
very bad     1
bad          0
medium       0
good         2
very good    3
dtype: int64

Plotting

In [130]: ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
In [131]: ts = ts.cumsum()
In [132]: ts.plot()
Out[132]: <matplotlib.axes._subplots.AxesSubplot at 0x10efd5a90>
In [133]: df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
   .....:                   columns=['A', 'B', 'C', 'D'])
   .....:
In [134]: df = df.cumsum()
In [135]: plt.figure(); df.plot(); plt.legend(loc='best')
Out[135]: <matplotlib.legend.Legend at 0x112854d90>

Reading and writing local files

Write a CSV file:

In [136]: df.to_csv('foo.csv')

Read a CSV file:

In [137]: pd.read_csv('foo.csv')
Out[137]:
     Unnamed: 0          A          B         C          D
0    2000-01-01   0.266457  -0.399641 -0.219582   1.186860
1    2000-01-02  -1.170732  -0.345873  1.653061  -0.282953
2    2000-01-03  -1.734933   0.530468  2.060811  -0.515536
3    2000-01-04  -1.555121   1.452620  0.239859  -1.156896
4    2000-01-05   0.578117   0.511371  0.103552  -2.428202
5    2000-01-06   0.478344   0.449933 -0.741620  -1.962409
6    2000-01-07   1.235339  -0.091757 -1.543861  -1.084753
..          ...        ...        ...       ...        ...
993  2002-09-20 -10.628548  -9.153563 -7.883146  28.313940
994  2002-09-21 -10.390377  -8.727491 -6.399645  30.914107
995  2002-09-22  -8.985362  -8.485624 -4.669462  31.367740
996  2002-09-23  -9.558560  -8.781216 -4.499815  30.518439
997  2002-09-24  -9.902058  -9.340490 -4.386639  30.105593
998  2002-09-25 -10.216020  -9.480682 -3.933802  29.758560
999  2002-09-26 -11.856774 -10.671012 -3.216025  29.369368
[1000 rows x 5 columns]

Write an HDF5 file:

In [138]: df.to_hdf('foo.h5', 'df')

Read an HDF5 file:

In [139]: pd.read_hdf('foo.h5', 'df')
Out[139]:
                    A          B         C          D
2000-01-01   0.266457  -0.399641 -0.219582   1.186860
2000-01-02  -1.170732  -0.345873  1.653061  -0.282953
2000-01-03  -1.734933   0.530468  2.060811  -0.515536
2000-01-04  -1.555121   1.452620  0.239859  -1.156896
2000-01-05   0.578117   0.511371  0.103552  -2.428202
2000-01-06   0.478344   0.449933 -0.741620  -1.962409
2000-01-07   1.235339  -0.091757 -1.543861  -1.084753
...               ...        ...       ...        ...
2002-09-20 -10.628548  -9.153563 -7.883146  28.313940
2002-09-21 -10.390377  -8.727491 -6.399645  30.914107
2002-09-22  -8.985362  -8.485624 -4.669462  31.367740
2002-09-23  -9.558560  -8.781216 -4.499815  30.518439
2002-09-24  -9.902058  -9.340490 -4.386639  30.105593
2002-09-25 -10.216020  -9.480682 -3.933802  29.758560
2002-09-26 -11.856774 -10.671012 -3.216025  29.369368
[1000 rows x 4 columns]

Write an Excel file:

In [140]: df.to_excel('foo.xlsx', sheet_name='Sheet1')

Read an Excel file:

In [141]: pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA'])
Out[141]:
                    A          B         C          D
2000-01-01   0.266457  -0.399641 -0.219582   1.186860
2000-01-02  -1.170732  -0.345873  1.653061  -0.282953
2000-01-03  -1.734933   0.530468  2.060811  -0.515536
2000-01-04  -1.555121   1.452620  0.239859  -1.156896
2000-01-05   0.578117   0.511371  0.103552  -2.428202
2000-01-06   0.478344   0.449933 -0.741620  -1.962409
2000-01-07   1.235339  -0.091757 -1.543861  -1.084753
...               ...        ...       ...        ...
2002-09-20 -10.628548  -9.153563 -7.883146  28.313940
2002-09-21 -10.390377  -8.727491 -6.399645  30.914107
2002-09-22  -8.985362  -8.485624 -4.669462  31.367740
2002-09-23  -9.558560  -8.781216 -4.499815  30.518439
2002-09-24  -9.902058  -9.340490 -4.386639  30.105593
2002-09-25 -10.216020  -9.480682 -3.933802  29.758560
2002-09-26 -11.856774 -10.671012 -3.216025  29.369368
[1000 rows x 4 columns]
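
The read_csv output in Out[137] has five columns rather than four, because the saved index comes back as an ordinary column named "Unnamed: 0". The sketch below shows one way to get the original shape back. It rests on a few assumptions that are not part of the article: it writes to an in-memory io.StringIO buffer instead of foo.csv, it uses a tiny deterministic frame, and the names buffer, plain and restored are made up for the illustration.

```python
import io

import numpy as np
import pandas as pd

# Small frame with a DatetimeIndex, standing in for the article's df (assumption).
df = pd.DataFrame(
    np.arange(12, dtype="float64").reshape(3, 4),
    index=pd.date_range("2000-01-01", periods=3),
    columns=list("ABCD"),
)

# Write to an in-memory buffer so the sketch does not touch the disk.
buffer = io.StringIO()
df.to_csv(buffer)

# Default read: the index is returned as a regular column.
buffer.seek(0)
plain = pd.read_csv(buffer)

# Naming the index column and asking for date parsing restores the layout.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)

print(plain.columns.tolist())
print(restored.shape)
```

With index_col=0 the first CSV column becomes the index again, and parse_dates=True asks pandas to parse that index as dates, so restored has the same four data columns as the frame that was written.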
(First bookmarked by 雨亭之东)