Python + oTree Crash Course

Session 4 - Common Modules

Ali Seyhun Saral (IAST)

IMPRS Be Smart Summer School

2023-08-07

Recap

Modules

One great advantage of Python is that it has a vast ecosystem of packages.
Some packages are built in, but they still need to be imported.
Python uses the syntax import packagename to import a package.
The functions, classes, etc. inside a package are reached with a dot after the package name, for example random.choice.

Importing modules

Modules usually contain a lot of functions, classes, etc.

import random

random.choice(['ali', "bob", "chiara"])

'bob'

You can find the documentation of the module in the Python standard library reference.

Modules

You can also import all objects directly. Then you won't need to write the package name in front of them.

from random import *

choice(['ali', "bob", "chiara"])

'ali'

Or a subset:

from random import choice

choice(['ali', "bob", "chiara"])

'chiara'

Modules

Or you can use an alias for the module:

import random as rnd

rnd.choice(['ali', "bob", "chiara"])

'bob'

Built-in packages

Python has a lot of built-in modules that you can use without installing anything.
You can find the list of built-in modules here: docs.python.org/3/library/
Some common built-in modules are:
- math for mathematical functions
- random for random number generation
- os for operating system related functions
- sys for system related functions
- datetime for date and time related functions
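
As a rough illustration of the modules listed above, the sketch below calls one small feature from each. The dates and the file path are made up for the example.

```python
import math
import os
import sys
from datetime import date

# math: mathematical functions and constants
hypotenuse = math.sqrt(3**2 + 4**2)
print(hypotenuse)  # 5.0

# datetime: date arithmetic (both dates are arbitrary illustrations)
days_between = (date(2023, 8, 11) - date(2023, 8, 7)).days
print(days_between)  # 4

# os: operating system helpers, such as building a file path
# ("data" and "results.csv" are hypothetical names)
data_path = os.path.join("data", "results.csv")
print(data_path)

# sys: information about the running interpreter
print(sys.version_info.major)
```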

Installing modules/packages

You can install modules using the pip command.
pip is a package manager for Python.
You can install a package using:

`pip install packagename`

Some common packages are:
- numpy for numerical computing
- pandas for data analysis
- scikit-learn for machine learning
- …
- And we will use the otree package!

Lists

Lists are very flexible and can hold different types
But they are not well suited to mathematical operations
And they are not very efficient for larger data

List calculations

prices = [3, 7, 9, 2]
quantities = [3, 3, 2, 1]

# this will not work
total_cost = prices * quantities

# with plain lists you have to loop over the items instead

TypeError: can't multiply sequence by non-int of type 'list'
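
One possible way to write the loop mentioned above is a list comprehension with zip, which pairs up the items of both lists. This is a sketch of the plain-Python approach, not the only way to do it.

```python
prices = [3, 7, 9, 2]
quantities = [3, 3, 2, 1]

# zip pairs up the items of both lists: (3, 3), (7, 3), (9, 2), (2, 1)
total_cost = [price * quantity for price, quantity in zip(prices, quantities)]
print(total_cost)  # [9, 21, 18, 2]
```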

NumPy

NumPy (“Numerical Python”) is a library for scientific computing
It brings the NumPy array data type, which is similar to a vector
Install it with:

pip install numpy

And import it:

import numpy as np

NumPy Arrays

import numpy as np

prices = np.array([3, 7, 9, 2])
print(prices)

[3 7 9 2]

quantities = np.array([3, 3, 2, 1])
print(quantities)

[3 3 2 1]

# now this will work
total_cost = prices * quantities
print(total_cost)

[ 9 21 18  2]

NumPy Arrays

A NumPy array assumes that all of its elements are of the same type
It has its own methods

quantities1 = np.array([3, 3, 2, 1])

quantities2 = np.array([1, 1, 2, 3])

quantities1 + quantities2

array([4, 4, 4, 4])
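
The two points on this slide can be checked with a short sketch: mixing integers and a float produces an array with a single type, and arrays carry methods such as sum and mean. The variable name mixed is only for illustration.

```python
import numpy as np

# mixing integers and a float: NumPy converts every element to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

quantities = np.array([3, 3, 2, 1])

# arrays have their own methods, for example sum and mean
print(quantities.sum())   # 9
print(quantities.mean())  # 2.25
```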

Indexing is similar to lists

quantities = np.array([3, 4, 2, 1])

quantities[0]

quantities[1:3]

array([4, 2])

You can also subset based on a condition

quantities = np.array([3, 4, 2, 1])
quantities[quantities > 2]

array([3, 4])
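
To see why the conditional subset works, it can help to look at the intermediate step. The comparison itself returns an array of booleans, and indexing with that array keeps the positions where it is True. The name mask is an illustrative choice.

```python
import numpy as np

quantities = np.array([3, 4, 2, 1])

# the comparison is applied element by element and returns booleans
mask = quantities > 2
print(mask)  # [ True  True False False]

# indexing with the mask keeps the positions where it is True
print(quantities[mask])  # [3 4]
```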

NumPy can hold arrays with two (or more) dimensions

locations = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],[10, 11, 12]])

print(locations)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

print(locations.shape)

(4, 3)

# get the second row, third column
locations[1][2]

# or 
locations[1, 2]

Numpy subsetting

locations = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],[10, 11, 12]])
print(locations)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

# get the second row
locations[1, :]

array([4, 5, 6])

# get the middle-right four elements
locations[1:3, 1:3]

array([[5, 6],
       [8, 9]])
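
The same slicing syntax also selects columns. As a small sketch, a colon in the row position means "all rows", and negative indices count from the end.

```python
import numpy as np

locations = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# a colon in the row position means "all rows": this is the first column
first_column = locations[:, 0]
print(first_column)  # [ 1  4  7 10]

# negative indices count from the end: this is the last row
last_row = locations[-1, :]
print(last_row)  # [10 11 12]
```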

Some other things you can do with NumPy

Statistical operations
Linear algebra
Random number generation
And many more

Refer to the documentation for more details
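
A minimal sketch of the three areas named above, using made-up numbers. The equation system and the seed value 42 are arbitrary choices for the example.

```python
import numpy as np

prices = np.array([3, 7, 9, 2])

# statistical operations
print(np.mean(prices))    # 5.25
print(np.median(prices))  # 5.0

# linear algebra: solve 2a + b = 5 and a + 3b = 10
coefficients = np.array([[2, 1], [1, 3]])
constants = np.array([5, 10])
solution = np.linalg.solve(coefficients, constants)
print(solution)  # approximately a = 1, b = 3

# random number generation; the seed 42 is arbitrary
rng = np.random.default_rng(42)
draws = rng.integers(1, 7, size=5)  # five dice rolls between 1 and 6
print(draws)
```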

Pandas

A NumPy array holds only one type of data
Pandas is a library for data analysis
It brings the DataFrame data type, which can hold a different type in each column
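
The slides show a DataFrame called df without the code that builds it. One possible way to create it, assuming the data starts out as a dictionary of columns, is:

```python
import pandas as pd

# each dictionary key becomes a column name
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "City": ["New York", "Chicago", "Chicago", "Los Angeles"],
})

print(df)
print(df.shape)  # (4, 3)
```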

      Name  Age         City
0    Alice   25     New York
1      Bob   30      Chicago
2  Charlie   22      Chicago
3    David   28  Los Angeles

type(df)

pandas.core.frame.DataFrame

Pandas (2)

Select rows

# select the city Chicago
df[df['City'] == 'Chicago']

      Name  Age     City
1      Bob   30  Chicago
2  Charlie   22  Chicago

Pandas (3)

Select columns

df[['Name', 'Age']]

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   22
3    David   28
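
Row and column selection can also be combined in one step. The sketch below rebuilds the same example data so that it runs on its own, then uses .loc with a condition and a list of column names. The threshold of 24 is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 28],
    "City": ["New York", "Chicago", "Chicago", "Los Angeles"],
})

# rows where Age is above 24, keeping only the Name and City columns
older = df.loc[df["Age"] > 24, ["Name", "City"]]
print(older)

# a summary statistic for a single column
print(df["Age"].mean())  # 26.25
```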

Statistical/Economic Analysis: Statsmodels

Statsmodels is a library for statistical and econometric analysis
It has many models and methods
Some of them are:
- Regression and Linear models
- Logistic regression
- Time series analysis
- Panel data analysis
- And many more

Linear regression

import numpy as np
import statsmodels.api as sm

# Generate some data
x = np.random.normal(size=100)
y = 2 * x + np.random.normal(size=100)

# Fit and summarize OLS model
model = sm.OLS(y, sm.add_constant(x))
results = model.fit()
print(results.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.734
Model:                            OLS   Adj. R-squared:                  0.731
Method:                 Least Squares   F-statistic:                     269.9
Date:                Sun, 06 Aug 2023   Prob (F-statistic):           6.58e-30
Time:                        22:07:19   Log-Likelihood:                -148.48
No. Observations:                 100   AIC:                             301.0
Df Residuals:                      98   BIC:                             306.2
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.1549      0.108     -1.436      0.154      -0.369       0.059
x1             1.9397      0.118     16.430      0.000       1.705       2.174
==============================================================================
Omnibus:                        5.195   Durbin-Watson:                   1.787
Prob(Omnibus):                  0.074   Jarque-Bera (JB):                4.557
Skew:                          -0.441   Prob(JB):                        0.102
Kurtosis:                       3.563   Cond. No.                         1.09
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Plotting: Seaborn

Seaborn

Seaborn is a library for statistical data visualization
It has many types of plots

Seaborn example

import seaborn as sns
import matplotlib.pyplot as plt

sns.regplot(x=x, y=y)
plt.title('Scatter Plot of x vs y')
plt.show()

Scikit-learn

Scikit-learn is a library for machine learning
It has many models and methods
Some of them are:
- Classification
- Regression
- Clustering
- Dimensionality reduction
- And many more
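
As a small sketch of the regression case, the same kind of simulated data can be fitted with scikit-learn's LinearRegression. The seed 0 and the new x values are arbitrary. Most scikit-learn models follow this fit-then-predict pattern.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# simulated data: y is roughly 2 * x plus noise (seed value is arbitrary)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)

# scikit-learn expects a 2-D feature matrix: one row per observation
X = x.reshape(-1, 1)

model = LinearRegression()
model.fit(X, y)

print(model.coef_)       # estimated slope, expected to be near 2
print(model.intercept_)  # estimated intercept, expected to be near 0

# predict y for two new x values
predictions = model.predict(np.array([[0.0], [1.0]]))
print(predictions)
```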

Python Ecosystem for Research (AFAIK 🤷‍♂️)

[Figure: overview of the Python ecosystem for research]

What we covered

Python basics
Functions
Lists
Dictionaries
Classes

What we didn’t cover

Tuples
Sets
(many other things)

Things to do before the next class

Install Python 3.8 or higher https://www.python.org/downloads/

Install Jupyter Lab https://jupyter.org/install

Install Visual Studio Code https://code.visualstudio.com/ (or any other text editor)

Get a GitHub account