Python + oTree Crash Course

Session 4 - Common Modules

Ali Seyhun Saral (IAST)

IMPRS Be Smart Summer School

2023-08-07

Recap

Modules

  • One great advantage of Python is that it has a vast ecosystem of packages.

  • Some packages are built in, but they still need to be imported.

  • Python uses the syntax import packagename to import a package.

  • The functions, classes, etc. inside a package are accessed with a dot, e.g. packagename.function().

Importing modules

  • Modules usually contain a lot of functions, classes, etc.
import random

random.choice(['ali', "bob", "chiara"])
'bob'

You can find the documentation of the random module at docs.python.org/3/library/random.html

Modules

  • You can also import all objects directly. Then you won't need to write the package name before them:
from random import *

choice(['ali', "bob", "chiara"])
'ali'
  • Or a subset:
from random import choice

choice(['ali', "bob", "chiara"])
'chiara'

Modules

  • Or you can use an alias for the module:
import random as rnd

rnd.choice(['ali', "bob", "chiara"])
'bob'

Built-in packages

  • Python has a lot of built-in modules that you can use without installing anything.

  • You can find the list of built-in modules here: docs.python.org/3/library/

  • Some common built-in modules are:

    • math for mathematical functions
    • random for random number generation
    • os for operating system related functions
    • sys for system related functions
    • datetime for date and time related functions
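A quick sketch touching each of the built-in modules listed above. The specific functions (math.sqrt, os.getcwd, sys.version_info, datetime.date) are just illustrative picks; the folder and version printed will differ on your machine.

```python
import datetime
import math
import os
import sys

# math: mathematical functions and constants
root = math.sqrt(16)            # 4.0
circle_area = math.pi * 2 ** 2  # area of a circle with radius 2

# os: interact with the operating system
current_folder = os.getcwd()    # the folder Python is running in
print(current_folder)

# sys: information about the Python interpreter itself
print(sys.version_info.major, sys.version_info.minor)

# datetime: dates and times
session_day = datetime.date(2023, 8, 7)
print(session_day.strftime("%A"))  # Monday
```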

Installing modules/packages

  • You can install modules using the pip command.

  • pip is a package manager for Python.

  • You can install a package by running this in a terminal (not inside Python):

pip install packagename

  • Some common packages are:
    • numpy for numerical computing
    • pandas for data analysis
    • scikit-learn for machine learning
    • And we will use the otree package!
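If you are unsure whether a package is already installed, you can ask Python before importing it. This is a small sketch: the helper name is_installed is made up for illustration, while importlib.util.find_spec is part of the standard library.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can find the package, without importing it."""
    return importlib.util.find_spec(package_name) is not None


for name in ["random", "numpy", "otree"]:
    if is_installed(name):
        print(name, "-> installed")
    else:
        print(name, "-> missing, try: pip install " + name)
```

Note that the name you pip install is not always the name you import: scikit-learn, for example, is imported as sklearn.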

Lists

  • Lists are very flexible and can hold elements of different types
  • But they are not well suited for mathematical operations
  • They are not very efficient for large data

List calculations

prices = [3, 7, 9, 2]
quantities = [3, 3, 2, 1]

# this will not work
total_cost = prices * quantities

# instead, you have to loop over the items
TypeError: can't multiply sequence by non-int of type 'list'
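With plain lists, the usual approach is to pair up the elements with zip and multiply them one by one, either in a list comprehension or in an explicit loop:

```python
prices = [3, 7, 9, 2]
quantities = [3, 3, 2, 1]

# zip pairs the i-th price with the i-th quantity
total_cost = [p * q for p, q in zip(prices, quantities)]
print(total_cost)       # [9, 21, 18, 2]

# the same with an explicit loop
total_cost_loop = []
for p, q in zip(prices, quantities):
    total_cost_loop.append(p * q)

print(sum(total_cost))  # 50
```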

NumPy

  • NumPy (“Numerical Python”) is a library for scientific computing

  • It provides the NumPy array data type, which works much like a mathematical vector

  • Install it by:

pip install numpy

  • And import it:

import numpy as np

NumPy Arrays

import numpy as np

prices = np.array([3, 7, 9, 2])
print(prices)
[3 7 9 2]
quantities = np.array([3, 3, 2, 1])
print(quantities)
[3 3 2 1]
# now this will work
total_cost = prices * quantities
print(total_cost)
[ 9 21 18  2]
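Because the multiplication is element-wise, the total bill takes one more step: sum the result, or compute the dot product of the two arrays directly with the @ operator. Operations with a single number are applied to every element.

```python
import numpy as np

prices = np.array([3, 7, 9, 2])
quantities = np.array([3, 3, 2, 1])

total_cost = prices * quantities   # element-wise: [ 9 21 18  2]
print(total_cost.sum())            # 50

# the dot product multiplies element-wise and sums in one step
print(prices @ quantities)         # 50

# a single number is combined with every element
discounted = prices * 0.5
print(discounted)                  # [1.5 3.5 4.5 1. ]
```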

NumPy Arrays

  • All elements share a single data type (mixed inputs are converted to a common type)

  • Arithmetic works element-wise, and arrays come with their own methods

quantities1 = np.array([3, 3, 2, 1])

quantities2 = np.array([1, 1, 2, 3])

quantities1 + quantities2
array([4, 4, 4, 4])
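To see what "same type" means in practice, you can inspect an array's dtype: NumPy converts mixed inputs to one common type. The methods shown (sum, mean, max) are only a small selection of what arrays offer.

```python
import numpy as np

ints = np.array([3, 3, 2, 1])
print(ints.dtype)          # an integer type, e.g. int64 (platform-dependent)

mixed = np.array([1, 2.5, 3])
print(mixed.dtype)         # float64: the integers were converted to floats

as_text = np.array([1, "a"])
print(as_text)             # ['1' 'a']: everything became a string

# a few of the array methods
quantities = np.array([3, 3, 2, 1])
print(quantities.sum())    # 9
print(quantities.mean())   # 2.25
print(quantities.max())    # 3
```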

Indexing is similar to lists

quantities = np.array([3, 4, 2, 1])

quantities[0]
3
quantities[1:3]
array([4, 2])

You can also subset based on a condition

quantities = np.array([3, 4, 2, 1])
quantities[quantities > 2]
array([3, 4])
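Under the hood, the condition produces a boolean array (a mask) of the same length, and indexing keeps the positions marked True. Several conditions can be combined with & (and) and | (or); the parentheses are needed because of operator precedence.

```python
import numpy as np

quantities = np.array([3, 4, 2, 1])

mask = quantities > 2
print(mask)                   # [ True  True False False]
print(quantities[mask])       # [3 4]

# combine conditions: between 2 and 3 (inclusive)
print(quantities[(quantities >= 2) & (quantities <= 3)])   # [3 2]

# count how many elements meet a condition (True counts as 1)
print((quantities > 2).sum())  # 2
```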

NumPy arrays can have two (or more) dimensions

locations = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],[10, 11, 12]])

print(locations)
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
print(locations.shape)
(4, 3)
# get the second row, third column
locations[1][2]

# or 
locations[1, 2]
6

NumPy subsetting

locations = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],[10, 11, 12]])
print(locations)
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
# get the second row
locations[1, :]
array([4, 5, 6])
# get the middle-right four elements
locations[1:3, 1:3]
array([[5, 6],
       [8, 9]])
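The same slicing syntax picks out columns: `:` in the row position means "all rows". Many methods also take an axis argument, where axis=0 works down the rows (one result per column) and axis=1 works across the columns (one result per row).

```python
import numpy as np

locations = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# first column (all rows, column index 0)
print(locations[:, 0])         # [ 1  4  7 10]

# last two rows, every column
print(locations[-2:, :])

# sums per column and per row
print(locations.sum(axis=0))   # [22 26 30]
print(locations.sum(axis=1))   # [ 6 15 24 33]
```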

Some other things you can do with NumPy

  • Statistical operations
  • Linear algebra
  • Random number generation
  • And many more

Refer to the NumPy documentation for more details
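A few lines for each of the first three areas listed. np.random.default_rng is NumPy's recommended interface for new random-number code; the seed 42 and the small data set are arbitrary choices for illustration.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# statistical operations (np.std divides by n; pass ddof=1 for the sample version)
print(np.mean(data), np.median(data), np.std(data))  # 5.0 4.5 2.0

# linear algebra: solve 2a + b = 3 and a + 3b = 5
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
solution = np.linalg.solve(A, b)
print(solution)                                  # approximately [0.8 1.4]

# random number generation with a seeded (reproducible) generator
rng = np.random.default_rng(42)
draws = rng.normal(loc=0, scale=1, size=5)       # five draws from N(0, 1)
print(draws)
```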

Pandas

  • NumPy arrays hold only one type of data
  • Pandas is a library for data analysis
  • It provides the DataFrame data type: a table whose columns can have different types
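The example DataFrame on this slide can be built from a dictionary that maps each column name to a list of values; printing it gives the table below. This construction is reconstructed from the displayed table, not necessarily the exact code behind the slide.

```python
import pandas as pd

# one list per column; all lists must have the same length
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "City": ["New York", "Chicago", "Chicago", "Los Angeles"],
})

print(df)
```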
      Name  Age         City
0    Alice   25     New York
1      Bob   30      Chicago
2  Charlie   22      Chicago
3    David   28  Los Angeles
type(df)
pandas.core.frame.DataFrame

Pandas (2)

  • Select rows based on a condition
# select the rows where City is Chicago
df[df['City'] == 'Chicago']
      Name  Age     City
1      Bob   30  Chicago
2  Charlie   22  Chicago

Pandas (3)

  • Select columns
df[['Name', 'Age']]
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   22
3    David   28
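Row and column selection can be done in one step with .loc, which takes a row condition and a list of columns. Columns also come with summary methods such as mean(). The DataFrame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "City": ["New York", "Chicago", "Chicago", "Los Angeles"],
})

# rows where City is Chicago, only the Name and Age columns
chicago = df.loc[df["City"] == "Chicago", ["Name", "Age"]]
print(chicago)

# combine conditions with & and wrap each one in parentheses
older_in_chicago = df[(df["City"] == "Chicago") & (df["Age"] > 25)]
print(older_in_chicago)

# summary statistics of a single column
print(df["Age"].mean())   # 26.25
```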

Statistical/Econometric Analysis: Statsmodels

  • Statsmodels is a library for statistical and econometric analysis
  • It has many models and methods
  • Some of them are:
    • Regression and Linear models
    • Logistic regression
    • Time series analysis
    • Panel data analysis
    • And many more

Linear regression

import numpy as np
import statsmodels.api as sm

# Generate some data
x = np.random.normal(size=100)
y = 2 * x + np.random.normal(size=100)

# Fit and summarize OLS model
model = sm.OLS(y, sm.add_constant(x))
results = model.fit()
print(results.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.734
Model:                            OLS   Adj. R-squared:                  0.731
Method:                 Least Squares   F-statistic:                     269.9
Date:                Sun, 06 Aug 2023   Prob (F-statistic):           6.58e-30
Time:                        22:07:19   Log-Likelihood:                -148.48
No. Observations:                 100   AIC:                             301.0
Df Residuals:                      98   BIC:                             306.2
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.1549      0.108     -1.436      0.154      -0.369       0.059
x1             1.9397      0.118     16.430      0.000       1.705       2.174
==============================================================================
Omnibus:                        5.195   Durbin-Watson:                   1.787
Prob(Omnibus):                  0.074   Jarque-Bera (JB):                4.557
Skew:                          -0.441   Prob(JB):                        0.102
Kurtosis:                       3.563   Cond. No.                         1.09
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
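When the data sits in a pandas DataFrame, statsmodels also offers a formula interface (statsmodels.formula.api) that adds the intercept automatically and labels the coefficients by column name. A sketch with simulated data like the example above; the seed and the column names x and y are arbitrary choices. Passing cov_type="HC1" to fit requests heteroskedasticity-robust standard errors instead of the default nonrobust ones mentioned in the note.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# simulate the same kind of data as above, reproducibly
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)
data = pd.DataFrame({"x": x, "y": y})

# "y ~ x" means: regress y on x (an intercept is included by default)
results = smf.ols("y ~ x", data=data).fit()
print(results.params)       # coefficients named "Intercept" and "x"
print(results.rsquared)     # R-squared as a plain number

# same point estimates, robust (HC1) standard errors
robust = smf.ols("y ~ x", data=data).fit(cov_type="HC1")
print(robust.bse)
```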

Plotting: Seaborn

Seaborn

  • Seaborn is a library for statistical data visualization, built on top of matplotlib
  • It offers many plot types, such as scatter, regression, distribution and categorical plots

Seaborn example

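import seaborn as sns
import matplotlib.pyplot as plt

# x and y are the simulated data from the linear regression example
# regplot draws a scatter plot plus a fitted regression line
sns.regplot(x=x, y=y)
plt.title('Scatter Plot of x vs y')
plt.show()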
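Seaborn functions also accept a DataFrame through the data argument, with column names for x and y. A sketch with the pandas example data from earlier (rebuilt here); barplot shows the mean age per city. The Agg backend and the file name age_by_city.png are choices made so the snippet runs without a screen; in Jupyter, leave out the matplotlib.use line and call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # draw to files instead of a window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "City": ["New York", "Chicago", "Chicago", "Los Angeles"],
})

# bar height = mean Age in each City (seaborn's default estimator)
ax = sns.barplot(data=df, x="City", y="Age")
ax.set_title("Average age by city")
plt.savefig("age_by_city.png")
```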

Scikit-learn

Scikit-learn

  • Scikit-learn is a library for machine learning
  • It has many models and methods
  • Some of them are:
    • Classification
    • Regression
    • Clustering
    • Dimensionality reduction
    • And many more
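A minimal scikit-learn sketch of the regression case: most models follow the same pattern of creating the model, calling fit, then calling predict. The simulated data mirrors the statsmodels example (the seed is arbitrary); scikit-learn expects the features as a 2-D array with one column per feature, hence the reshape.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)

X = x.reshape(-1, 1)        # 100 rows, 1 feature column

model = LinearRegression()  # 1. create the model
model.fit(X, y)             # 2. fit it to the data

print(model.coef_, model.intercept_)   # slope close to 2, intercept close to 0

# 3. predict for new x values
predictions = model.predict(np.array([[0.0], [1.0]]))
print(predictions)
```

Unlike statsmodels, scikit-learn focuses on prediction and does not print a regression table with standard errors and p-values.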

Python Ecosystem for Research (AFAIK 🤷‍♂️)


What we covered

  • Python basics
  • Functions
  • Lists
  • Dictionaries
  • Classes

What we didn’t cover

  • Tuples
  • Sets
  • (many other things)

Things to do before the next class


Install Python 3.8 or higher https://www.python.org/downloads/

Install Jupyter Lab https://jupyter.org/install

Install Visual Studio Code https://code.visualstudio.com/ (or any other text editor)

Get a GitHub account