
Pandas aggregation

Examples of Pandas aggregation operations

Once rolling, expanding, and ewm objects have been created, several methods are available for aggregating the data.
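As a minimal sketch of the three window objects named above, the following uses a small hand-checkable Series. The data, the window size of 2, and the `com=0.5` decay setting are assumptions chosen for illustration only.

```python
import pandas as pd

# Small deterministic series so the results are easy to check by hand.
s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Rolling: aggregate over the last 2 values (fewer at the start, since min_periods=1).
rolling_sum = s.rolling(window=2, min_periods=1).sum()

# Expanding: aggregate over every value seen so far.
expanding_sum = s.expanding(min_periods=1).sum()

# Exponentially weighted: recent values count more than older ones.
ewm_mean = s.ewm(com=0.5).mean()

print(rolling_sum.tolist())    # [1.0, 3.0, 5.0, 7.0]
print(expanding_sum.tolist())  # [1.0, 3.0, 6.0, 10.0]
print(ewm_mean)
```

The rolling and expanding objects differ only in which rows fall inside the window; the aggregation call itself looks the same on each.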

DataFrame aggregation

Let's create a DataFrame and apply aggregation to it.

 import pandas as pd
 import numpy as np
 # 10 rows of random values indexed by consecutive days
 df = pd.DataFrame(np.random.randn(10, 4),
     index=pd.date_range('1/1/2000', periods=10),
     columns=['A', 'B', 'C', 'D'])
 print(df)
 # Rolling window over 3 rows; min_periods=1 gives a result from the first row on
 r = df.rolling(window=3, min_periods=1)
 print(r)

The output is as follows:

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  0.790670 -0.387854 -0.668132  0.267283
2000-01-03 -0.575523 -0.965025  0.060427 -2.179780
2000-01-04  1.669653  1.211759 -0.254695  1.429166
2000-01-05  0.100568 -0.236184  0.491646 -0.466081
2000-01-06  0.155172  0.992975 -1.205134  0.320958
2000-01-07  0.309468 -0.724053 -1.412446  0.627919
2000-01-08  0.099489 -1.028040  0.163206 -1.274331
2000-01-09  1.639500 -0.068443  0.714008 -0.565969
2000-01-10  0.326761  1.479841  0.664282 -1.361169
Rolling [window=3,min_periods=1,center=False,axis=0]
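The `min_periods=1` argument used above affects the first rows of every result in this section. As a rough illustration on made-up data (the Series values here are an assumption, not taken from the example above), compare it with leaving `min_periods` at its default, which for an integer window equals the window size:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Default min_periods: a full window of 3 values is required, so the
# first two results are NaN.
strict = s.rolling(window=3).sum()

# min_periods=1: a result is produced as soon as one value is available.
lenient = s.rolling(window=3, min_periods=1).sum()

print(strict.tolist())   # [nan, nan, 6.0, 9.0]
print(lenient.tolist())  # [1.0, 3.0, 6.0, 9.0]
```

This is why the first row of each rolling result below equals the first row of the input.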

We can aggregate by passing a function to the whole DataFrame, or select a column first with the standard get item method (square brackets).

Applying aggregation to the whole DataFrame

 import pandas as pd
 import numpy as np
 df = pd.DataFrame(np.random.randn(10, 4),
     index=pd.date_range('1/1/2000', periods=10),
     columns=['A', 'B', 'C', 'D'])
 print(df)
 r = df.rolling(window=3, min_periods=1)
 # Rolling sum of every column
 print(r.aggregate(np.sum))

The output is as follows:

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  0.790670 -0.387854 -0.668132  0.267283
2000-01-03 -0.575523 -0.965025  0.060427 -2.179780
2000-01-04  1.669653  1.211759 -0.254695  1.429166
2000-01-05  0.100568 -0.236184  0.491646 -0.466081
2000-01-06  0.155172  0.992975 -1.205134  0.320958
2000-01-07  0.309468 -0.724053 -1.412446  0.627919
2000-01-08  0.099489 -1.028040  0.163206 -1.274331
2000-01-09  1.639500 -0.068443  0.714008 -0.565969
2000-01-10  0.326761  1.479841  0.664282 -1.361169
                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  1.879182 -1.038796 -3.215581 -0.299575
2000-01-03  1.303660 -2.003821 -3.155154 -2.479355
2000-01-04  1.884801 -0.141119 -0.862400 -0.483331
2000-01-05  1.194699  0.010551  0.297378 -1.216695
2000-01-06  1.925393  1.968551 -0.968183  1.284044
2000-01-07  0.565208  0.032738 -2.125934  0.482797
2000-01-08  0.564129 -0.759118 -2.454374 -0.325454
2000-01-09  2.048458 -1.820537 -0.535232 -1.212381
2000-01-10  2.065750  0.383357  1.541496 -3.201469
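For comparison, here is a small sketch suggesting that aggregating the whole DataFrame by the string name `"sum"` gives the same result as calling the window's own `sum()` method. The data is fixed rather than random so the numbers can be verified by hand; the values are an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0],
                   "B": [10.0, 20.0, 30.0, 40.0]})
r = df.rolling(window=3, min_periods=1)

by_name = r.aggregate("sum")   # aggregation selected by name
by_method = r.sum()            # the dedicated rolling method

print(by_name)
print(by_name.equals(by_method))
```

Column A comes out as 1, 3, 6, 9: each row is the sum of itself and up to two preceding rows.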

Applying aggregation to a single column of a DataFrame

 import pandas as pd
 import numpy as np
 df = pd.DataFrame(np.random.randn(10, 4),
     index=pd.date_range('1/1/2000', periods=10),
     columns=['A', 'B', 'C', 'D'])
 print(df)
 r = df.rolling(window=3, min_periods=1)
 # Select column A from the rolling object, then sum it
 print(r['A'].aggregate(np.sum))

The output is as follows:

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  0.790670 -0.387854 -0.668132  0.267283
2000-01-03 -0.575523 -0.965025  0.060427 -2.179780
2000-01-04  1.669653  1.211759 -0.254695  1.429166
2000-01-05  0.100568 -0.236184  0.491646 -0.466081
2000-01-06  0.155172  0.992975 -1.205134  0.320958
2000-01-07  0.309468 -0.724053 -1.412446  0.627919
2000-01-08  0.099489 -1.028040  0.163206 -1.274331
2000-01-09  1.639500 -0.068443  0.714008 -0.565969
2000-01-10  0.326761  1.479841  0.664282 -1.361169
2000-01-01    1.088512
2000-01-02    1.879182
2000-01-03    1.303660
2000-01-04    1.884801
2000-01-05    1.194699
2000-01-06    1.925393
2000-01-07    0.565208
2000-01-08    0.564129
2000-01-09    2.048458
2000-01-10    2.065750
Freq: D, Name: A, dtype: float64

Applying aggregation to multiple columns of a DataFrame

 import pandas as pd
 import numpy as np
 df = pd.DataFrame(np.random.randn(10, 4),
     index=pd.date_range('1/1/2000', periods=10),
     columns=['A', 'B', 'C', 'D'])
 print(df)
 r = df.rolling(window=3, min_periods=1)
 # Select columns A and B with a list, then sum each
 print(r[['A', 'B']].aggregate(np.sum))

The output is as follows:

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  0.790670 -0.387854 -0.668132  0.267283
2000-01-03 -0.575523 -0.965025  0.060427 -2.179780
2000-01-04  1.669653  1.211759 -0.254695  1.429166
2000-01-05  0.100568 -0.236184  0.491646 -0.466081
2000-01-06  0.155172  0.992975 -1.205134  0.320958
2000-01-07  0.309468 -0.724053 -1.412446  0.627919
2000-01-08  0.099489 -1.028040  0.163206 -1.274331
2000-01-09  1.639500 -0.068443  0.714008 -0.565969
2000-01-10  0.326761  1.479841  0.664282 -1.361169
                   A         B
2000-01-01  1.088512 -0.650942
2000-01-02  1.879182 -1.038796
2000-01-03  1.303660 -2.003821
2000-01-04  1.884801 -0.141119
2000-01-05  1.194699  0.010551
2000-01-06  1.925393  1.968551
2000-01-07  0.565208  0.032738
2000-01-08  0.564129 -0.759118
2000-01-09  2.048458 -1.820537
2000-01-10  2.065750  0.383357

Applying multiple functions to a single column of a DataFrame

 import pandas as pd
 import numpy as np
 df = pd.DataFrame(np.random.randn(10, 4),
     index=pd.date_range('1/1/2000', periods=10),
     columns=['A', 'B', 'C', 'D'])
 print(df)
 r = df.rolling(window=3, min_periods=1)
 # A list of functions yields one result column per function
 print(r['A'].aggregate([np.sum, np.mean]))

The output is as follows:

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  0.790670 -0.387854 -0.668132  0.267283
2000-01-03 -0.575523 -0.965025  0.060427 -2.179780
2000-01-04  1.669653  1.211759 -0.254695  1.429166
2000-01-05  0.100568 -0.236184  0.491646 -0.466081
2000-01-06  0.155172  0.992975 -1.205134  0.320958
2000-01-07  0.309468 -0.724053 -1.412446  0.627919
2000-01-08  0.099489 -1.028040  0.163206 -1.274331
2000-01-09  1.639500 -0.068443  0.714008 -0.565969
2000-01-10  0.326761  1.479841  0.664282 -1.361169
                 sum      mean
2000-01-01  1.088512  1.088512
2000-01-02  1.879182  0.939591
2000-01-03  1.303660  0.434553
2000-01-04  1.884801  0.628267
2000-01-05  1.194699  0.398233
2000-01-06  1.925393  0.641798
2000-01-07  0.565208  0.188403
2000-01-08  0.564129  0.188043
2000-01-09  2.048458  0.682819
2000-01-10  2.065750  0.688583

Applying multiple functions to multiple columns of a DataFrame

 import pandas as pd
 import numpy as np
 df = pd.DataFrame(np.random.randn(10, 4),
     index=pd.date_range('1/1/2000', periods=10),
     columns=['A', 'B', 'C', 'D'])
 print(df)
 r = df.rolling(window=3, min_periods=1)
 # Each selected column gets one sub-column per function
 print(r[['A', 'B']].aggregate([np.sum, np.mean]))

The output is as follows:

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  0.790670 -0.387854 -0.668132  0.267283
2000-01-03 -0.575523 -0.965025  0.060427 -2.179780
2000-01-04  1.669653  1.211759 -0.254695  1.429166
2000-01-05  0.100568 -0.236184  0.491646 -0.466081
2000-01-06  0.155172  0.992975 -1.205134  0.320958
2000-01-07  0.309468 -0.724053 -1.412446  0.627919
2000-01-08  0.099489 -1.028040  0.163206 -1.274331
2000-01-09  1.639500 -0.068443  0.714008 -0.565969
2000-01-10  0.326761  1.479841  0.664282 -1.361169
                   A                   B
                 sum      mean       sum      mean
2000-01-01  1.088512  1.088512 -0.650942 -0.650942
2000-01-02  1.879182  0.939591 -1.038796 -0.519398
2000-01-03  1.303660  0.434553 -2.003821 -0.667940
2000-01-04  1.884801  0.628267 -0.141119 -0.047040
2000-01-05  1.194699  0.398233  0.010551  0.003517
2000-01-06  1.925393  0.641798  1.968551  0.656184
2000-01-07  0.565208  0.188403  0.032738  0.010913
2000-01-08  0.564129  0.188043 -0.759118 -0.253039
2000-01-09  2.048458  0.682819 -1.820537 -0.606846
2000-01-10  2.065750  0.688583  0.383357  0.127786

Applying different functions to different columns of a DataFrame

 import pandas as pd
 import numpy as np
 df = pd.DataFrame(np.random.randn(3, 4),
     index=pd.date_range('1/1/2000', periods=3),
     columns=['A', 'B', 'C', 'D'])
 print(df)
 r = df.rolling(window=3, min_periods=1)
 # A dict maps each column to its own function: sum for A, mean for B
 print(r.aggregate({'A': np.sum, 'B': np.mean}))

The output is as follows:

                   A         B         C         D
2000-01-01 -1.575749 -1.018105  0.317797  0.545081
2000-01-02 -0.164917 -1.361068  0.258240  1.113091
2000-01-03  1.258111  1.037941 -0.047487  0.867371
                   A         B
2000-01-01 -1.575749 -1.018105
2000-01-02 -1.740666 -1.189587
2000-01-03 -0.482555 -0.447078
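Because the example above uses random data, its numbers change on every run. The sketch below repeats the same per-column mapping on fixed, made-up values (an assumption for illustration) so the result can be checked by hand; it passes the function names as strings rather than NumPy functions.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, 6.0, 8.0]},
    index=pd.date_range("1/1/2000", periods=3),
)
r = df.rolling(window=3, min_periods=1)

# Rolling sum for column A, rolling mean for column B.
result = r.aggregate({"A": "sum", "B": "mean"})
print(result)
```

Column A should read 1, 3, 6 (running totals within the window) and column B should read 4, 5, 6 (running averages within the window). Columns not named in the dict are left out of the result.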