Suppose you are a teacher, and want to keep track of and do some simple descriptive stats on your student’s scores in a number of subjects. So you have recorded scores in a frequency table as below:

Left-most column has the scores, and the next four columns show the frequency of those scores for the four subjects.
By summing the columns we can find the number of students having obtained these scores in each subject:
subject math 11 physics 14 chemistry 11 biology 15
Now, suppose you’re interested in the average score (median & mean) of the students, in each subject.
How would you go about obtaining those descriptive stats…? You can’t simply apply a mean() or median() function on each column, because that would give you the average frequency for the subject, not the average score.
The problem is that we have weights (the frequencies) on each score.
Pandas makes the problem a breeze. Below code with just a handful lines to set up the dataframe for the table, and to massage that dataframe to produce the descriptive stats we want:
import numpy as np import pandas as pd np.random.seed(666) vals = [0,1,2,3,4,5] nr_categories = 5 df_freq = pd.DataFrame({'math' : np.random.choice(vals,nr_categories,replace=False), 'physics' : np.random.choice(vals,nr_categories,replace=False), 'chemistry' : np.random.choice(vals,nr_categories,replace=False), 'biology' : np.random.choice(vals,nr_categories,replace=False)}, index=range(1,nr_categories + 1)) df_freq.index.name='points' df_freq.columns.name='subject' df_freq.sort_index(inplace=True) df_freq_subgrps = pd.DataFrame(df_freq.unstack()) df_freq_subgrps.columns=['freq'] df_freq_subgrps['points'] = df_freq_subgrps.index.get_level_values(1) df_freq_subgrps = df_freq_subgrps[['points','freq']] df_freq_subgrps = df_freq_subgrps.droplevel(1) serialized_points = df_freq_subgrps.groupby('subject',group_keys=False).apply( lambda row: np.repeat(row['points'],row['freq'])) stats = serialized_points.groupby('subject').agg(['median','mean','count','min','max']) stats
The output:
