python - Pandas: Querying a DataFrame - multiple criteria -


i have large dataframe 2 of columns are:

  • a question code (e.g q453)
  • true or false denote whether or not question answered correctly

i not know possible question codes , each question has been answered multiple times. like:

q666 true

q463 true

q056 true

q666 false

i wish know question easiest, precisely question has highest ratio of attempts correct answers. steps achieve using pandas?

i made sample of data post. you're looking for?

i made true/false indicators lowercase strings; not sure stored in file. if you're on python2, believe can change from io import stringio from stringio import stringio.

in [105]: import pandas pd      ...: io import stringio      ...:      ...: data = """q666 true      ...:      ...: q463 true      ...:      ...: q056 true      ...:      ...: q666 false      ...: q666 true      ...:      ...:      ...: q463 true      ...:      ...: q056 true      ...: q666 false      ...: q463 false      ...: q666 false"""      ...:  in [106]: df = pd.read_csv(stringio(data), sep=" ", header=none)  in [107]: df.columns = [['question', 'answer']]  in [108]: df['answer'] = df['answer'].astype("str").apply(lambda x: x.lower())  in [109]: df out[109]:   question answer 0     q666   true 1     q463   true 2     q056   true 3     q666  false 4     q666   true 5     q463   true 6     q056   true 7     q666  false 8     q463  false 9     q666  false 

assign constant value to dataframe can count groupby our conditions, move multiindex need it.

in [110]: piv = (df      ...:        .assign(val = 1)      ...:        .groupby(["question", "answer"])      ...:        .count()      ...:        .unstack()['val']) 

it's easy here...

in [111]: piv['ratio'] = piv['true'] / piv.sum(axis=1)  in [112]: piv.sort_values("ratio") out[112]: answer    false  true     ratio question q666        3.0   2.0  0.400000 q463        1.0   2.0  0.666667 q056        nan   2.0  1.000000 

Comments

Popular posts from this blog

sql server - Cannot query correctly (MSSQL - PHP - JSON) -

php - trouble displaying mysqli database results in correct order -

C++ Linked List -