Comment: just how big is the IQ gap between me and the author, that it took me 60 minutes to get through what the author considers 10 minutes of material...

By convention, we start with the imports:

    In [1]: import pandas as pd
    In [2]: import numpy as np
    In [3]: import matplotlib.pyplot as plt

Create a Series; the input can be a list:

    In [4]: s = pd.Series([1, 3, 5, np.nan, 6, 8])
    In [5]: s
    Out[5]:
    0    1.0
    1    3.0
    2    5.0
    3    NaN
    4    6.0
    5    8.0
    dtype: float64

Create a DataFrame:

    In [6]: dates = pd.date_range('20130101', periods=6)
    In [7]: dates
    Out[7]:
    DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
                   '2013-01-05', '2013-01-06'],
                  dtype='datetime64[ns]', freq='D')
    In [8]: df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
    In [9]: df
    Out[9]:
                       A         B         C         D
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401
    2013-01-06 -0.673690  0.113648 -1.478427  0.524988

Create a DataFrame; the input can also be a dict:

    In [10]: df2 = pd.DataFrame({'A': 1.,
       ....:                     'B': pd.Timestamp('20130102'),
       ....:                     'C': pd.Series(1, index=list(range(4)), dtype='float32'),
       ....:                     'D': np.array([3] * 4, dtype='int32'),
       ....:                     'E': pd.Categorical(["test", "train", "test", "train"]),
       ....:                     'F': 'foo'})
       ....:
    In [11]: df2
    Out[11]:
         A          B    C  D      E    F
    0  1.0 2013-01-02  1.0  3   test  foo
    1  1.0 2013-01-02  1.0  3  train  foo
    2  1.0 2013-01-02  1.0  3   test  foo
    3  1.0 2013-01-02  1.0  3  train  foo

The dtype of each column:

    In [12]: df2.dtypes
    Out[12]:
    A           float64
    B    datetime64[ns]
    C           float32
    D             int32
    E          category
    F            object
    dtype: object

If you are using IPython, the Tab key gives you auto-completion (only part of the list is shown):

    In [13]: df2.<TAB>
    df2.A                  df2.boxplot
    df2.abs                df2.C
    df2.add                df2.clip
    df2.add_prefix         df2.clip_lower
    df2.add_suffix         df2.clip_upper
    df2.align              df2.columns
    df2.all                df2.combine
    df2.any                df2.combineAdd
    df2.append             df2.combine_first
    df2.apply              df2.combineMult
    df2.applymap           df2.compound
    df2.as_blocks          df2.consolidate
    df2.asfreq             df2.convert_objects
    df2.as_matrix          df2.copy
    df2.astype             df2.corr
    df2.at                 df2.corrwith
    df2.at_time            df2.count
    df2.axes               df2.cov
    df2.B                  df2.cummax
    df2.between_time       df2.cummin
    df2.bfill              df2.cumprod
    df2.blocks             df2.cumsum
    df2.bool               df2.D

Viewing data

View the first and last rows:

    In [14]: df.head()
    Out[14]:
                       A         B         C         D
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401
    In [15]: df.tail(3)
    Out[15]:
                       A         B         C         D
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401
    2013-01-06 -0.673690  0.113648 -1.478427  0.524988

Show the row labels (index), the column labels (columns), and the underlying values (a NumPy array):

    In [16]: df.index
    Out[16]:
    DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
                   '2013-01-05', '2013-01-06'],
                  dtype='datetime64[ns]', freq='D')
    In [17]: df.columns
    Out[17]: Index([u'A', u'B', u'C', u'D'], dtype='object')
    In [18]: df.values
    Out[18]:
    array([[ 0.4691, -0.2829, -1.5091, -1.1356],
           [ 1.2121, -0.1732,  0.1192, -1.0442],
           [-0.8618, -2.1046, -0.4949,  1.0718],
           [ 0.7216, -0.7068, -1.0396,  0.2719],
           [-0.425 ,  0.567 ,  0.2762, -1.0874],
           [-0.6737,  0.1136, -1.4784,  0.525 ]])

Summary statistics for each column:

    In [19]: df.describe()
    Out[19]:
                  A         B         C         D
    count  6.000000  6.000000  6.000000  6.000000
    mean   0.073711 -0.431125 -0.687758 -0.233103
    std    0.843157  0.922818  0.779887  0.973118
    min   -0.861849 -2.104569 -1.509059 -1.135632
    25%   -0.611510 -0.600794 -1.368714 -1.076610
    50%    0.022070 -0.228039 -0.767252 -0.386188
    75%    0.658444  0.041933 -0.034326  0.461706
    max    1.212112  0.567020  0.276232  1.071804

Transpose:

    In [20]: df.T
    Out[20]:
       2013-01-01  2013-01-02  2013-01-03  2013-01-04  2013-01-05  2013-01-06
    A    0.469112    1.212112   -0.861849    0.721555   -0.424972   -0.673690
    B   -0.282863   -0.173215   -2.104569   -0.706771    0.567020    0.113648
    C   -1.509059    0.119209   -0.494929   -1.039575    0.276232   -1.478427
    D   -1.135632   -1.044236    1.071804    0.271860   -1.087401    0.524988

Sort along an axis:

    In [21]: df.sort_index(axis=1, ascending=False)
    Out[21]:
                       D         C         B         A
    2013-01-01 -1.135632 -1.509059 -0.282863  0.469112
    2013-01-02 -1.044236  0.119209 -0.173215  1.212112
    2013-01-03  1.071804 -0.494929 -2.104569 -0.861849
    2013-01-04  0.271860 -1.039575 -0.706771  0.721555
    2013-01-05 -1.087401  0.276232  0.567020 -0.424972
    2013-01-06  0.524988 -1.478427  0.113648 -0.673690

Sort by value:

    In [22]: df.sort_values(by='B')
    Out[22]:
                       A         B         C         D
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-06 -0.673690  0.113648 -1.478427  0.524988
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401

Selection

Access a column:

    In [23]: df['A']
    Out[23]:
    2013-01-01    0.469112
    2013-01-02    1.212112
    2013-01-03   -0.861849
    2013-01-04    0.721555
    2013-01-05   -0.424972
    2013-01-06   -0.673690
    Freq: D, Name: A, dtype: float64

Access rows by slicing:

    In [24]: df[0:3]
    Out[24]:
                       A         B         C         D
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    In [25]: df['20130102':'20130104']
    Out[25]:
                       A         B         C         D
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860

Access by label:

    In [26]: df.loc[dates[0]]
    Out[26]:
    A    0.469112
    B   -0.282863
    C   -1.509059
    D   -1.135632
    Name: 2013-01-01 00:00:00, dtype: float64

Select along multiple axes by label:

    In [27]: df.loc[:, ['A', 'B']]
    Out[27]:
                       A         B
    2013-01-01  0.469112 -0.282863
    2013-01-02  1.212112 -0.173215
    2013-01-03 -0.861849 -2.104569
    2013-01-04  0.721555 -0.706771
    2013-01-05 -0.424972  0.567020
    2013-01-06 -0.673690  0.113648

Slice by label; this works just like the example above, and both endpoints of a label slice are included:

    In [28]: df.loc['20130102':'20130104', ['A', 'B']]
    Out[28]:
                       A         B
    2013-01-02  1.212112 -0.173215
    2013-01-03 -0.861849 -2.104569
    2013-01-04  0.721555 -0.706771
    In [29]: df.loc['20130102', ['A', 'B']]
    Out[29]:
    A    1.212112
    B   -0.173215
    Name: 2013-01-02 00:00:00, dtype: float64
    In [30]: df.loc[dates[0], 'A']
    Out[30]: 0.46911229990718628

Fast scalar access; equivalent to the method above, and probably just a little faster:

    In [31]: df.at[dates[0], 'A']
    Out[31]: 0.46911229990718628

Access by position:

    In [32]: df.iloc[3]
    Out[32]:
    A    0.721555
    B   -0.706771
    C   -1.039575
    D    0.271860
    Name: 2013-01-04 00:00:00, dtype: float64

This looks a lot like NumPy:

    In [33]: df.iloc[3:5, 0:2]
    Out[33]:
                       A         B
    2013-01-04  0.721555 -0.706771
    2013-01-05 -0.424972  0.567020

The input can also be lists of positions:

    In [34]: df.iloc[[1, 2, 4], [0, 2]]
    Out[34]:
                       A         C
    2013-01-02  1.212112  0.119209
    2013-01-03 -0.861849 -0.494929
    2013-01-05 -0.424972  0.276232

Slice rows:

    In [35]: df.iloc[1:3, :]
    Out[35]:
                       A         B         C         D
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804

Slice columns:

    In [36]: df.iloc[:, 1:3]
    Out[36]:
                       B         C
    2013-01-01 -0.282863 -1.509059
    2013-01-02 -0.173215  0.119209
    2013-01-03 -2.104569 -0.494929
    2013-01-04 -0.706771 -1.039575
    2013-01-05  0.567020  0.276232
    2013-01-06  0.113648 -1.478427

Get the value at a specific position:

    In [37]: df.iloc[1, 1]
    Out[37]: -0.17321464905330858

Fast scalar access; no different from the method above:

    In [38]: df.iat[1, 1]
    Out[38]: -0.17321464905330858

Boolean indexing

Select rows by the values of a single column:

    In [39]: df[df.A > 0]
    Out[39]:
                       A         B         C         D
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860

Apply the condition to every cell; cells that fail it become NaN:

    In [40]: df[df > 0]
    Out[40]:
                       A         B         C         D
    2013-01-01  0.469112       NaN       NaN       NaN
    2013-01-02  1.212112       NaN  0.119209       NaN
    2013-01-03       NaN       NaN       NaN  1.071804
    2013-01-04  0.721555       NaN       NaN  0.271860
    2013-01-05       NaN  0.567020  0.276232       NaN
    2013-01-06       NaN  0.113648       NaN  0.524988

Filter with the isin() method:

    In [41]: df2 = df.copy()
    In [42]: df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']
    In [43]: df2
    Out[43]:
                       A         B         C         D      E
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632    one
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236    one
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804    two
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860  three
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401   four
    2013-01-06 -0.673690  0.113648 -1.478427  0.524988  three
    In [44]: df2[df2['E'].isin(['two', 'four'])]
    Out[44]:
                       A         B         C         D     E
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804   two
    2013-01-05 -0.424972  0.567020  0.276232 -1.087401  four

Setting

Assigning a Series as a new column aligns it on the index:

    In [45]: s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))
    In [46]: s1
    Out[46]:
    2013-01-02    1
    2013-01-03    2
    2013-01-04    3
    2013-01-05    4
    2013-01-06    5
    2013-01-07    6
    Freq: D, dtype: int64
    In [47]: df['F'] = s1

Set a value by label:

    In [48]: df.at[dates[0], 'A'] = 0

Set a value by position:

    In [49]: df.iat[0, 1] = 0

Set a column from a NumPy array:

    In [50]: df.loc[:, 'D'] = np.array([5] * len(df))

The result:

    In [51]: df
    Out[51]:
                       A         B         C  D    F
    2013-01-01  0.000000  0.000000 -1.509059  5  NaN
    2013-01-02  1.212112 -0.173215  0.119209  5  1.0
    2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0
    2013-01-04  0.721555 -0.706771 -1.039575  5  3.0
    2013-01-05 -0.424972  0.567020  0.276232  5  4.0
    2013-01-06 -0.673690  0.113648 -1.478427  5  5.0

Set values with a boolean condition:

    In [52]: df2 = df.copy()
    In [53]: df2[df2 > 0] = -df2
    In [54]: df2
    Out[54]:
                       A         B         C  D    F
    2013-01-01  0.000000  0.000000 -1.509059 -5  NaN
    2013-01-02 -1.212112 -0.173215 -0.119209 -5 -1.0
    2013-01-03 -0.861849 -2.104569 -0.494929 -5 -2.0
    2013-01-04 -0.721555 -0.706771 -1.039575 -5 -3.0
    2013-01-05 -0.424972 -0.567020 -0.276232 -5 -4.0
    2013-01-06 -0.673690 -0.113648 -1.478427 -5 -5.0

Missing data

    In [55]: df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
    In [56]: df1.loc[dates[0]:dates[1], 'E'] = 1
    In [57]: df1
    Out[57]:
                       A         B         C  D    F    E
    2013-01-01  0.000000  0.000000 -1.509059  5  NaN  1.0
    2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0
    2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0  NaN
    2013-01-04  0.721555 -0.706771 -1.039575  5  3.0  NaN

Drop the rows that contain missing data:

    In [58]: df1.dropna(how='any')
    Out[58]:
                       A         B         C  D    F    E
    2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0

Fill in the missing data:

    In [59]: df1.fillna(value=5)
    Out[59]:
                       A         B         C  D    F    E
    2013-01-01  0.000000  0.000000 -1.509059  5  5.0  1.0
    2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0
    2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0  5.0
    2013-01-04  0.721555 -0.706771 -1.039575  5  3.0  5.0

Get a boolean mask marking where the data is missing:

    In [60]: pd.isnull(df1)
    Out[60]:
                    A      B      C      D      F      E
    2013-01-01  False  False  False  False   True  False
    2013-01-02  False  False  False  False  False  False
    2013-01-03  False  False  False  False  False   True
    2013-01-04  False  False  False  False  False   True

Operations

Mean (per column):

    In [61]: df.mean()
    Out[61]:
    A   -0.004474
    B   -0.383981
    C   -0.687758
    D    5.000000
    F    3.000000
    dtype: float64

Mean (per row):

    In [62]: df.mean(1)
    Out[62]:
    2013-01-01    0.872735
    2013-01-02    1.431621
    2013-01-03    0.707731
    2013-01-04    1.395042
    2013-01-05    1.883656
    2013-01-06    1.592306
    Freq: D, dtype: float64

Note the shift() call; the subtraction below is aligned on the index:

    In [63]: s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
    In [64]: s
    Out[64]:
    2013-01-01    NaN
    2013-01-02    NaN
    2013-01-03    1.0
    2013-01-04    3.0
    2013-01-05    5.0
    2013-01-06    NaN
    Freq: D, dtype: float64
    In [65]: df.sub(s, axis='index')
    Out[65]:
                       A         B         C    D    F
    2013-01-01       NaN       NaN       NaN  NaN  NaN
    2013-01-02       NaN       NaN       NaN  NaN  NaN
    2013-01-03 -1.861849 -3.104569 -1.494929  4.0  1.0
    2013-01-04 -2.278445 -3.706771 -4.039575  2.0  0.0
    2013-01-05 -5.424972 -4.432980 -4.723768  0.0 -1.0
    2013-01-06       NaN       NaN       NaN  NaN  NaN

Apply functions to the data:

    In [66]: df.apply(np.cumsum)
    Out[66]:
                       A         B         C   D     F
    2013-01-01  0.000000  0.000000 -1.509059   5   NaN
    2013-01-02  1.212112 -0.173215 -1.389850  10   1.0
    2013-01-03  0.350263 -2.277784 -1.884779  15   3.0
    2013-01-04  1.071818 -2.984555 -2.924354  20   6.0
    2013-01-05  0.646846 -2.417535 -2.648122  25  10.0
    2013-01-06 -0.026844 -2.303886 -4.126549  30  15.0
    In [67]: df.apply(lambda x: x.max() - x.min())
    Out[67]:
    A    2.073961
    B    2.671590
    C    1.785291
    D    0.000000
    F    4.000000
    dtype: float64

Histogram-style counts:

    In [68]: s = pd.Series(np.random.randint(0, 7, size=10))
    In [69]: s
    Out[69]:
    0    4
    1    2
    2    1
    3    2
    4    6
    5    4
    6    4
    7    6
    8    4
    9    4
    dtype: int64
    In [70]: s.value_counts()
    Out[70]:
    4    5
    6    2
    2    2
    1    1
    dtype: int64

String methods

Lowercasing:

    In [71]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
    In [72]: s.str.lower()
    Out[72]:
    0       a
    1       b
    2       c
    3    aaba
    4    baca
    5     NaN
    6    caba
    7     dog
    8     cat
    dtype: object

Merge

Concatenation:

    In [73]: df = pd.DataFrame(np.random.randn(10, 4))
    In [74]: df
    Out[74]:
              0         1         2         3
    0 -0.548702  1.467327 -1.015962 -0.483075
    1  1.637550 -1.217659 -0.291519 -1.745505
    2 -0.263952  0.991460 -0.919069  0.266046
    3 -0.709661  1.669052  1.037882 -1.705775
    4 -0.919854 -0.042379  1.247642 -0.009920
    5  0.290213  0.495767  0.362949  1.548106
    6 -1.131345 -0.089329  0.337863 -0.945867
    7 -0.932132  1.956030  0.017587 -0.016692
    8 -0.575247  0.254161 -1.143704  0.215897
    9  1.193555 -0.077118 -0.408530 -0.862495
    # break it into pieces
    In [75]: pieces = [df[:3], df[3:7], df[7:]]
    In [76]: pd.concat(pieces)
    Out[76]:
              0         1         2         3
    0 -0.548702  1.467327 -1.015962 -0.483075
    1  1.637550 -1.217659 -0.291519 -1.745505
    2 -0.263952  0.991460 -0.919069  0.266046
    3 -0.709661  1.669052  1.037882 -1.705775
    4 -0.919854 -0.042379  1.247642 -0.009920
    5  0.290213  0.495767  0.362949  1.548106
    6 -1.131345 -0.089329  0.337863 -0.945867
    7 -0.932132  1.956030  0.017587 -0.016692
    8 -0.575247  0.254161 -1.143704  0.215897
    9  1.193555 -0.077118 -0.408530 -0.862495

SQL-style join:

    In [77]: left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
    In [78]: right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
    In [79]: left
    Out[79]:
       key  lval
    0  foo     1
    1  foo     2
    In [80]: right
    Out[80]:
       key  rval
    0  foo     4
    1  foo     5
    In [81]: pd.merge(left, right, on='key')
    Out[81]:
       key  lval  rval
    0  foo     1     4
    1  foo     1     5
    2  foo     2     4
    3  foo     2     5

Append a row:

    In [82]: df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
    In [83]: df
    Out[83]:
              A         B         C         D
    0  1.346061  1.511763  1.627081 -0.990582
    1 -0.441652  1.211526  0.268520  0.024580
    2 -1.577585  0.396823 -0.105381 -0.532532
    3  1.453749  1.208843 -0.080952 -0.264610
    4 -0.727965 -0.589346  0.339969 -0.693205
    5 -0.339355  0.593616  0.884345  1.591431
    6  0.141809  0.220390  0.435589  0.192451
    7 -0.096701  0.803351  1.715071 -0.708758
    In [84]: s = df.iloc[3]
    In [85]: df.append(s, ignore_index=True)
    Out[85]:
              A         B         C         D
    0  1.346061  1.511763  1.627081 -0.990582
    1 -0.441652  1.211526  0.268520  0.024580
    2 -1.577585  0.396823 -0.105381 -0.532532
    3  1.453749  1.208843 -0.080952 -0.264610
    4 -0.727965 -0.589346  0.339969 -0.693205
    5 -0.339355  0.593616  0.884345  1.591431
    6  0.141809  0.220390  0.435589  0.192451
    7 -0.096701  0.803351  1.715071 -0.708758
    8  1.453749  1.208843 -0.080952 -0.264610

Grouping

    In [86]: df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
       ....:                          'foo', 'bar', 'foo', 'foo'],
       ....:                    'B': ['one', 'one', 'two', 'three',
       ....:                          'two', 'two', 'one', 'three'],
       ....:                    'C': np.random.randn(8),
       ....:                    'D': np.random.randn(8)})
       ....:
    In [87]: df
    Out[87]:
         A      B         C         D
    0  foo    one -1.202872 -0.055224
    1  bar    one -1.814470  2.395985
    2  foo    two  1.018601  1.552825
    3  bar  three -0.595447  0.166599
    4  foo    two  1.395433  0.047609
    5  bar    two -0.392670 -0.136473
    6  foo    one  0.007207 -0.561757
    7  foo  three  1.928123 -1.623033

Group by one column, then sum:

    In [88]: df.groupby('A').sum()
    Out[88]:
                C        D
    A
    bar -2.802588  2.42611
    foo  3.146492 -0.63958

Group by several columns, then sum:

    In [89]: df.groupby(['A', 'B']).sum()
    Out[89]:
                      C         D
    A   B
    bar one   -1.814470  2.395985
        three -0.595447  0.166599
        two   -0.392670 -0.136473
    foo one   -1.195665 -0.616981
        three  1.928123 -1.623033
        two    2.414034  1.600434

Reshaping

    In [90]: tuples = list(zip(*[['bar', 'bar', 'baz', 'baz',
       ....:                      'foo', 'foo', 'qux', 'qux'],
       ....:                     ['one', 'two', 'one', 'two',
       ....:                      'one', 'two', 'one', 'two']]))
       ....:
    In [91]: index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
    In [92]: df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
    In [93]: df2 = df[:4]
    In [94]: df2
    Out[94]:
                         A         B
    first second
    bar   one     0.029399 -0.542108
          two     0.282696 -0.087302
    baz   one    -1.575170  1.771208
          two     0.816482  1.100230

The stack() method compresses the columns into an extra index level:

    In [95]: stacked = df2.stack()
    In [96]: stacked
    Out[96]:
    first  second
    bar    one     A    0.029399
                   B   -0.542108
           two     A    0.282696
                   B   -0.087302
    baz    one     A   -1.575170
                   B    1.771208
           two     A    0.816482
                   B    1.100230
    dtype: float64

The inverse operation, unstack(), unstacks the last level by default:

    In [97]: stacked.unstack()
    Out[97]:
                         A         B
    first second
    bar   one     0.029399 -0.542108
          two     0.282696 -0.087302
    baz   one    -1.575170  1.771208
          two     0.816482  1.100230
    In [98]: stacked.unstack(1)
    Out[98]:
    second        one       two
    first
    bar   A  0.029399  0.282696
          B -0.542108 -0.087302
    baz   A -1.575170  0.816482
          B  1.771208  1.100230
    In [99]: stacked.unstack(0)
    Out[99]:
    first          bar       baz
    second
    one    A  0.029399 -1.575170
           B -0.542108  1.771208
    two    A  0.282696  0.816482
           B -0.087302  1.100230

Pivot tables

    In [100]: df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
       .....:                    'B': ['A', 'B', 'C'] * 4,
       .....:                    'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
       .....:                    'D': np.random.randn(12),
       .....:                    'E': np.random.randn(12)})
       .....:
    In [101]: df
    Out[101]:
            A  B    C         D         E
    0     one  A  foo  1.418757 -0.179666
    1     one  B  foo -1.879024  1.291836
    2     two  C  foo  0.536826 -0.009614
    3   three  A  bar  1.006160  0.392149
    4     one  B  bar -0.029716  0.264599
    5     one  C  bar -1.146178 -0.057409
    6     two  A  foo  0.100900 -1.425638
    7   three  B  foo -1.035018  1.024098
    8     one  C  foo  0.314665 -0.106062
    9     one  A  bar -0.773723  1.824375
    10    two  B  bar -1.170653  0.595974
    11  three  C  bar  0.648740  1.167115

Build the pivot table:

    In [102]: pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
    Out[102]:
    C             bar       foo
    A     B
    one   A -0.773723  1.418757
          B -0.029716 -1.879024
          C -1.146178  0.314665
    three A  1.006160       NaN
          B       NaN -1.035018
          C  0.648740       NaN
    two   A       NaN  0.100900
          B -1.170653       NaN
          C       NaN  0.536826

Time series

    In [103]: rng = pd.date_range('1/1/2012', periods=100, freq='S')
    In [104]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
    In [105]: ts.resample('5Min').sum()
    Out[105]:
    2012-01-01    25083
    Freq: 5T, dtype: int64

Time zone representation:

    In [106]: rng = pd.date_range('3/6/2012 00:00', periods=5, freq='D')
    In [107]: ts = pd.Series(np.random.randn(len(rng)), rng)
    In [108]: ts
    Out[108]:
    2012-03-06    0.464000
    2012-03-07    0.227371
    2012-03-08   -0.496922
    2012-03-09    0.306389
    2012-03-10   -2.290613
    Freq: D, dtype: float64
    In [109]: ts_utc = ts.tz_localize('UTC')
    In [110]: ts_utc
    Out[110]:
    2012-03-06 00:00:00+00:00    0.464000
    2012-03-07 00:00:00+00:00    0.227371
    2012-03-08 00:00:00+00:00   -0.496922
    2012-03-09 00:00:00+00:00    0.306389
    2012-03-10 00:00:00+00:00   -2.290613
    Freq: D, dtype: float64

Convert to another time zone:

    In [111]: ts_utc.tz_convert('US/Eastern')
    Out[111]:
    2012-03-05 19:00:00-05:00    0.464000
    2012-03-06 19:00:00-05:00    0.227371
    2012-03-07 19:00:00-05:00   -0.496922
    2012-03-08 19:00:00-05:00    0.306389
    2012-03-09 19:00:00-05:00   -2.290613
    Freq: D, dtype: float64

Convert between time span representations (timestamps and periods):

    In [112]: rng = pd.date_range('1/1/2012', periods=5, freq='M')
    In [113]: ts = pd.Series(np.random.randn(len(rng)), index=rng)
    In [114]: ts
    Out[114]:
    2012-01-31   -1.134623
    2012-02-29   -1.561819
    2012-03-31   -0.260838
    2012-04-30    0.281957
    2012-05-31    1.523962
    Freq: M, dtype: float64
    In [115]: ps = ts.to_period()
    In [116]: ps
    Out[116]:
    2012-01   -1.134623
    2012-02   -1.561819
    2012-03   -0.260838
    2012-04    0.281957
    2012-05    1.523962
    Freq: M, dtype: float64
    In [117]: ps.to_timestamp()
    Out[117]:
    2012-01-01   -1.134623
    2012-02-01   -1.561819
    2012-03-01   -0.260838
    2012-04-01    0.281957
    2012-05-01    1.523962
    Freq: MS, dtype: float64

Quarterly conversion:

    In [118]: prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')
    In [119]: ts = pd.Series(np.random.randn(len(prng)), prng)
    In [120]: ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
    In [121]: ts.head()
    Out[121]:
    1990-03-01 09:00   -0.902937
    1990-06-01 09:00    0.068159
    1990-09-01 09:00   -0.057873
    1990-12-01 09:00   -0.368204
    1991-03-01 09:00   -1.144073
    Freq: H, dtype: float64

Categoricals

    In [122]: df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})

Convert to the category dtype:

    In [123]: df["grade"] = df["raw_grade"].astype("category")
    In [124]: df["grade"]
    Out[124]:
    0    a
    1    b
    2    b
    3    a
    4    a
    5    e
    Name: grade, dtype: category
    Categories (3, object): [a, b, e]

Rename the categories:

    In [125]: df["grade"].cat.categories = ["very good", "good", "very bad"]

Reorder the categories (and add the missing ones):

    In [126]: df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])
    In [127]: df["grade"]
    Out[127]:
    0    very good
    1         good
    2         good
    3    very good
    4    very good
    5     very bad
    Name: grade, dtype: category
    Categories (5, object): [very bad, bad, medium, good, very good]

Sort; the order follows the category order, not alphabetical order:

    In [128]: df.sort_values(by="grade")
    Out[128]:
       id raw_grade      grade
    5   6         e   very bad
    1   2         b       good
    2   3         b       good
    0   1         a  very good
    3   4         a  very good
    4   5         a  very good

Group by a categorical column; empty categories are included:

    In [129]: df.groupby("grade").size()
    Out[129]:
    grade
    very bad     1
    bad          0
    medium       0
    good         2
    very good    3
    dtype: int64

Plotting

    In [130]: ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
    In [131]: ts = ts.cumsum()
    In [132]: ts.plot()
    Out[132]: <matplotlib.axes._subplots.AxesSubplot at 0x10efd5a90>
    In [133]: df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
       .....:                   columns=['A', 'B', 'C', 'D'])
       .....:
    In [134]: df = df.cumsum()
    In [135]: plt.figure(); df.plot(); plt.legend(loc='best')
    Out[135]: <matplotlib.legend.Legend at 0x112854d90>

Reading and writing local files

Write a CSV file:

    In [136]: df.to_csv('foo.csv')

Read a CSV file:

    In [137]: pd.read_csv('foo.csv')
    Out[137]:
         Unnamed: 0          A          B         C          D
    0    2000-01-01   0.266457  -0.399641 -0.219582   1.186860
    1    2000-01-02  -1.170732  -0.345873  1.653061  -0.282953
    2    2000-01-03  -1.734933   0.530468  2.060811  -0.515536
    3    2000-01-04  -1.555121   1.452620  0.239859  -1.156896
    4    2000-01-05   0.578117   0.511371  0.103552  -2.428202
    5    2000-01-06   0.478344   0.449933 -0.741620  -1.962409
    6    2000-01-07   1.235339  -0.091757 -1.543861  -1.084753
    ..          ...        ...        ...       ...        ...
    993  2002-09-20 -10.628548  -9.153563 -7.883146  28.313940
    994  2002-09-21 -10.390377  -8.727491 -6.399645  30.914107
    995  2002-09-22  -8.985362  -8.485624 -4.669462  31.367740
    996  2002-09-23  -9.558560  -8.781216 -4.499815  30.518439
    997  2002-09-24  -9.902058  -9.340490 -4.386639  30.105593
    998  2002-09-25 -10.216020  -9.480682 -3.933802  29.758560
    999  2002-09-26 -11.856774 -10.671012 -3.216025  29.369368
    [1000 rows x 5 columns]

Write an HDF5 file:

    In [138]: df.to_hdf('foo.h5', 'df')

Read an HDF5 file:

    In [139]: pd.read_hdf('foo.h5', 'df')
    Out[139]:
                        A          B         C          D
    2000-01-01   0.266457  -0.399641 -0.219582   1.186860
    2000-01-02  -1.170732  -0.345873  1.653061  -0.282953
    2000-01-03  -1.734933   0.530468  2.060811  -0.515536
    2000-01-04  -1.555121   1.452620  0.239859  -1.156896
    2000-01-05   0.578117   0.511371  0.103552  -2.428202
    2000-01-06   0.478344   0.449933 -0.741620  -1.962409
    2000-01-07   1.235339  -0.091757 -1.543861  -1.084753
    ...               ...        ...       ...        ...
    2002-09-20 -10.628548  -9.153563 -7.883146  28.313940
    2002-09-21 -10.390377  -8.727491 -6.399645  30.914107
    2002-09-22  -8.985362  -8.485624 -4.669462  31.367740
    2002-09-23  -9.558560  -8.781216 -4.499815  30.518439
    2002-09-24  -9.902058  -9.340490 -4.386639  30.105593
    2002-09-25 -10.216020  -9.480682 -3.933802  29.758560
    2002-09-26 -11.856774 -10.671012 -3.216025  29.369368
    [1000 rows x 4 columns]

Write an Excel file:

    In [140]: df.to_excel('foo.xlsx', sheet_name='Sheet1')

Read an Excel file:

    In [141]: pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA'])
    Out[141]:
                        A          B         C          D
    2000-01-01   0.266457  -0.399641 -0.219582   1.186860
    2000-01-02  -1.170732  -0.345873  1.653061  -0.282953
    2000-01-03  -1.734933   0.530468  2.060811  -0.515536
    2000-01-04  -1.555121   1.452620  0.239859  -1.156896
    2000-01-05   0.578117   0.511371  0.103552  -2.428202
    2000-01-06   0.478344   0.449933 -0.741620  -1.962409
    2000-01-07   1.235339  -0.091757 -1.543861  -1.084753
    ...               ...        ...       ...        ...
    2002-09-20 -10.628548  -9.153563 -7.883146  28.313940
    2002-09-21 -10.390377  -8.727491 -6.399645  30.914107
    2002-09-22  -8.985362  -8.485624 -4.669462  31.367740
    2002-09-23  -9.558560  -8.781216 -4.499815  30.518439
    2002-09-24  -9.902058  -9.340490 -4.386639  30.105593
    2002-09-25 -10.216020  -9.480682 -3.933802  29.758560
    2002-09-26 -11.856774 -10.671012 -3.216025  29.369368
    [1000 rows x 4 columns]
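A note for readers on a recent pandas: two calls in the session above, df.append(s, ignore_index=True) and assigning a list to df["grade"].cat.categories, come from an older release. DataFrame.append was removed in pandas 2.0, and recent versions no longer accept assignment to .cat.categories. The sketch below shows one possible way to get the same results with pd.concat and rename_categories. The small frame and the variable names (row, appended, grades, renamed) are illustrative assumptions, not part of the original session.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption; the original session used random numbers).
df = pd.DataFrame(np.arange(12.0).reshape(3, 4), columns=["A", "B", "C", "D"])

# Append a copy of row 1 without DataFrame.append.
# The double brackets keep the selection as a one-row DataFrame.
row = df.iloc[[1]]
appended = pd.concat([df, row], ignore_index=True)

# Rename categories without assigning to .cat.categories.
# The categories are inferred in sorted order: a, b, e.
grades = pd.Series(["a", "b", "b", "a", "a", "e"], dtype="category")
renamed = grades.cat.rename_categories(["very good", "good", "very bad"])

print(appended)
print(renamed)
```

Both calls return new objects rather than modifying their inputs, so the results have to be assigned back (for example df["grade"] = df["grade"].cat.rename_categories(...)) to mirror the in-place style used in the session.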