First Session: Introduction to Python and Jupyter Notebooks

1. Getting Started with Jupyter Notebooks

The first thing to do is to open this introduction document using Jupyter Notebook. To do this, in a terminal, run the command:

jupyter notebook

This will automatically open the browser, where you can start working. The main tab shows the file tree; navigate to the folder where you saved the introduction.ipynb file and open it.

Notebooks are composed of cells containing either code (in Python) or text (plain or formatted with Markdown). IPython allows for interactive calculations in Python. We use Python version 3 (hence the command jupyter). When a cell ends with an expression, such as a calculation or a single variable, IPython displays Out[n]: followed by its value upon evaluation. If the cell ends with a statement (an assignment, for example), nothing is displayed, but the statement is still executed.

You can edit a cell by double-clicking on it, and evaluate it by typing Ctrl+Enter (Shift+Enter is often used instead, to evaluate the cell and move to the next one). The buttons in the toolbar will be very useful; hover over them to see a tooltip if their icon is not clear enough. Don’t forget to save your work from time to time, even though Jupyter saves automatically at regular intervals.

(a) Edit the three example cells below, modify them, and validate them.

(b) Add a code cell (before the beginning of part 2) where you display the value of $12a + 5$ (after validating the cell where $a$ is initialized).

(c) Re-edit a code cell, validate the result, and observe that the number after In and Out is incremented. You can go back to a previous cell, and it will take into account the changes made in cells that have already been validated.

In [ ]:
## Markdown Cell ##
Here is a cell that should have been considered as text but is saved as code. Change the cell type to "Markdown" instead of "Code," and you will see a change in the syntax highlighting.

Modify its content as you wish and evaluate it.
### Lab work of the best student of the CIMPA School *Ordered Structures with Applications in Finance and Machine Learning* ###
In [1]:
# Here, a Python code cell. Comments start with a hash #.
a = 12
a**2 + 12 # The power operator is written as **
# The cell may appear to have already been evaluated, but this is not the case.
# The results of previous evaluations (before opening the notebook) remain displayed.
# But if the cell is not re-evaluated, the variable a will not be assigned.
Out[1]:
156

Finally, here is a Markdown cell already validated. Double-click to edit and see the syntax. In Markdown markup (like in this cell), you can simply format the text in italic or bold. You can make

  • lists
  • numbered lists:
    1. Equations
    2. Links

Equations can be directly given in $\LaTeX$ format, so for example, you can write $x_i$ in the middle of a line using $, or write more complete equations using $$: $$\sum_{n=1}^{\infty}\frac1{n^2}=\frac{\pi^2}6. $$

Modify this formula to insert another one (not necessarily mathematically correct, but using at least one Greek letter and a fraction).

Some basic $\LaTeX$ commands are explained on this page, and for more details, you can consult the translation of "A not so short introduction to $\LaTeX2e$," available here, particularly chapter 3.

(d) Advanced Handling: Use the menus Help → User Interface Tour and Help → Keyboard Shortcuts for keyboard usage. You can navigate between cells with the arrow keys and use a large number of shortcuts in Command mode (use Esc to enter Command mode or Enter for Edit mode).

2. Basics of Python

Structure of Python

The foundation of Python code structure is indentation.

(a) Here is an example of a for loop and list usage (the append function adds an element). Validate the cell and understand what it does.

In [ ]:
squares_list = []
for i in range(4):
    # For loop, indentation starts after the colon
    # It is done automatically in IPython.
    squares_list.append(i**2)
# When we return to the previous indentation, we are no longer in the loop

Note: In Python, indices start at ZERO!
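As a small illustration of zero-based indexing, the sketch below rebuilds the list of squares from the loop above and reads its first and last elements. The names first, last and n are introduced here for the example only.

```python
# Rebuild the list from the loop above: [0, 1, 4, 9]
squares_list = []
for i in range(4):
    squares_list.append(i**2)

first = squares_list[0]    # index 0 is the first element
last = squares_list[-1]    # negative indices count from the end
n = len(squares_list)      # valid indices run from 0 to n - 1

print(first, last, n)      # prints: 0 9 4
# squares_list[n] would raise an IndexError, since the last valid index is n - 1
```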

(b) Try to produce the same result with a while loop. It is recommended to use Edit → Split Cell to split this cell into two, so you can insert a code cell between these two questions (b) and (c)!

(c) Validate the following cell, read the entire error message (it’s in English, you’ll have to get used to it), and try to understand it in order to debug the code. Along the way, notice another way to write a for loop, by iterating over an arbitrary list.

In [ ]:
for i in [12, 15, 6, 20]
    if i > 12:
        print(i, "is really greater than 12")
   else:
        print("12 is greater than", i)
print("Well done")

(d) Use the Tab key for autocompletion (which gives you the range function) and Shift+Tab for help on functions. If you do it twice in a row, it extends the help tooltip. Calculate, for example, $\sum_{i=3}^{12} i(i+1)$ using range and a for loop.

In [ ]:
rang

Python code aims to be very readable. Here are some practical tips:

In [ ]:
# Defining multiple values at the same time
A, B = 12, 15
# No need for a temporary variable to swap the values of A and B
A, B = B, A
print(A, B)
In [ ]:
# Arithmetic operators += -= *=
A += 3
# Avoid formulas like x = x + 1.
B *= 2
print(A, B)
# Using ** (power) to calculate roots
2**.5

(e) Creating lists using list comprehension. It is one way to create a list in a readable manner. Validate the following example, and similarly create a list of lists where the elements will be of the form $3i + j$ for $0 \leq i \leq 5$ and $0 \leq j \leq i$, then evaluate the second element $(j = 1)$ of the third $(i = 2)$ list.

In [ ]:
# Example of list comprehension to create a list of squares
squares_list = [i**2 for i in range(0, 12)]
print(squares_list[5])  # Evaluating the sixth element (index 5) of the list of squares

# Creating a list of lists where elements are of the form 3i + j
list_of_lists = [[3*i + j for j in range(i+1)] for i in range(6)]
print(list_of_lists[2][1])  # Evaluating the second element (j = 1) of the third (i = 2) list

Different Types of Variables

For basic Python, we will primarily use floats, integers (which have no size limit in Python), booleans (with value True or False), strings, and lists, which can contain elements of different types. The print function that we have already encountered displays the content of these different variables in a readable manner.

(f) Understand the result of the floating-point calculations below. Observe (and experiment with) the different possibilities of operations on variables of different types.

In [ ]:
print((.2+.005)*5-1.025)
for i in [2,5,12,20]:
    print((1+10**(-i)-1)/10**(-i))
In [ ]:
print([1, 2., 3] + ["twelve"] * 2 + [3 > 12, 4 <= 12])
print("text " + "copy " * 2 + 
      "string with the apostrophe \n" + 
      'or the "straight quotes"')
# Notice that the statement is split over several lines: inside parentheses this is not a problem.
print("Formatting the number 2π: %.3f" % 6.283185)

print(12/3, 12//3, 12/5, 12.//5.)    
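The surprising results above come from the fact that floats are stored in binary with a finite number of digits. The following sketch shows the same effect on a smaller example; the variable names x and eps_lost are chosen for this illustration only.

```python
import math

x = 0.1 + 0.2
print(x == 0.3)              # False: 0.1, 0.2 and 0.3 are not exactly representable in binary
print(math.isclose(x, 0.3))  # True: compare floats with a tolerance instead of ==

# Adding a number far below the relative precision of floats (about 2.2e-16) changes nothing
eps_lost = (1 + 1e-20) - 1
print(eps_lost)              # 0.0

# True division always returns a float, floor division keeps integers as integers
print(7 / 2, 7 // 2)         # 3.5 3
```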

3. Using Software Libraries for Scientific Computing

Importing Various Components

(a) Read the error message of the following cell in its entirety and try to understand everything that is said, then move on.

In [ ]:
sin(B)

The function sin is not part of the basic commands of Python. We need to use a library, and we will use numpy. You can import a library in the following way, optionally giving it a short name (most people simply write np).

In [ ]:
import numpy as np
np.sin(12)

It is also possible to import specific functions from a library without having to import everything and without specifying the library name afterwards:

(b) Validate the following cell and then revalidate the previous one containing only sin(B). The error message should have disappeared.

In [ ]:
from numpy import sin

There is actually a special command in Jupyter: %pylab, which loads the tools from numpy (for scientific computing) and matplotlib (for plotting). The keyword inline displays plots directly in the notebook (if you remove it, the plots open in a new window), so that you can submit the lab work at the end of the session together with the plots you have produced. For the following lab sessions, we will ALWAYS start with this line, and we must not forget to validate it (and to revalidate it every time the kernel is restarted).

In [3]:
%pylab inline
%pylab is deprecated, use %matplotlib inline and import the required libraries.
Populating the interactive namespace from numpy and matplotlib
In [ ]:
cos(B), exp(A)
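The deprecation message printed above suggests importing the libraries explicitly. A minimal sketch of what the same cell could look like without %pylab follows; the values of A and B are the ones these variables hold if the earlier cells were run in order, which is an assumption, and the plotting imports are shown only as comments because the %matplotlib magic only exists inside IPython.

```python
import numpy as np
# In a notebook, plotting would additionally need:
#   %matplotlib inline
#   import matplotlib.pyplot as plt

A, B = 18, 24   # assumed values, for illustration only

# With explicit imports, every numpy function is prefixed by np.
values = (np.cos(B), np.exp(A))
print(values)
```

The price of the explicit style is the np. prefix; the benefit is that it is always clear which library a function comes from.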

Functions from numpy that we will use

We now move on to functions related to numpy, a library that allows for scientific computing. With the %pylab directive that we used at the beginning, all numpy functions are directly accessible.

The basic type is the array. Unlike in lists, all elements of an array have the same type (in our case, they will almost always be floats). Arithmetic operations act element-wise, both between an array and a scalar and between two arrays of the same shape.

(c) Observe what the different cells below produce, and experiment to familiarize yourself with the array type.

In [ ]:
## Creating arrays (type array)
n = 3
T = zeros([n, n])
print(eye(n))       # identity matrix of size n
print(ones([n, 2]))  # a matrix of ones, of size n x 2

# Filling with two loops,
for i in range(n):
    for j in range(n):
        T[i, j] = 1 + (i - 2*j)**2

print(T + 1)
print(T + 2 * eye(n))
In [ ]:
# We could have more directly used list comprehension.
# We create the array with array(list).
# This method is better in terms of performance but also readability.
A = array([[1 + i/2 + (i - j)**3 for j in range(n)] for i in range(n)])
print(A)
In [ ]:
# Indexing, accessing elements
print(T[2, 0:2])
print(T[:, 1])
print(T[1:, :-1])
In [ ]:
# Direct calculation of sums
print(sum(T))
print(sum(T, 1))
print(sum(T, 0))
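The second argument of sum is the axis along which the entries are added. The sketch below shows the three cases on a small rectangular array, written with explicit numpy imports; the array M is a hypothetical example. Note that under %pylab the name sum refers to numpy's function, whereas Python's built-in sum applied to a two-dimensional array would add its rows together.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])    # hypothetical 2 x 3 example array

total = np.sum(M)            # no axis: sum of all entries
col_sums = np.sum(M, 0)      # axis 0: add down the rows, one value per column
row_sums = np.sum(M, 1)      # axis 1: add across the columns, one value per row

print(total)      # 21
print(col_sums)   # [5 7 9]
print(row_sums)   # [ 6 15]
```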

In almost all sessions, we will use arrays for matrix calculations. There is also a matrix class, but we will deliberately avoid it, as it causes confusion in many cases and will not be useful to us.

Note: The multiplication * is not matrix multiplication, but element-wise multiplication of arrays. For matrix multiplication, we will use the dot function: dot(A, B) returns the matrix product of A and B if they are two-dimensional arrays. Similarly, common functions generally operate element-wise.

Vectors will be represented by one-dimensional arrays, and matrices by two-dimensional arrays.
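To make the difference between the two products concrete, here is a short sketch with explicit numpy imports. The arrays P, Q and v are hypothetical examples, and the @ operator is shown as an alternative spelling of dot for two-dimensional arrays.

```python
import numpy as np

P = np.array([[1., 2.],
              [3., 4.]])
Q = np.array([[0., 1.],
              [1., 0.]])

elementwise = P * Q          # multiplies matching entries: [[0, 2], [3, 0]]
matrix_prod = np.dot(P, Q)   # rows of P times columns of Q: [[2, 1], [4, 3]]
same_prod = P @ Q            # the @ operator gives the same matrix product

v = np.array([1., 1.])       # a vector is a one-dimensional array
Pv = np.dot(P, v)            # matrix-vector product, again one-dimensional

print(elementwise)
print(matrix_prod)
print(Pv)                    # [3. 7.]
```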

(d) Familiarize yourself with matrix calculations and common functions on arrays using the examples below. Calculate the exponential of the identity matrix. Is it the result given by exp(eye(n))?

In [ ]:
print("Matrix product: \n", dot(T, transpose(T)))
In [ ]:
# Solving a linear system of the type Ax = b (without calculating the inverse)
b = array([1., 2, 12])
print(b)
x = solve(T, b)
print(x)
print(norm(dot(T, x) - b))

print("Dot product: ", dot(b, x))
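Without %pylab, solve and norm are found in the numpy.linalg submodule. The following sketch solves a small system with explicit imports; the matrix M and the right-hand side rhs are hypothetical values chosen so that the solution is easy to check by hand.

```python
import numpy as np

M = np.array([[2., 1.],
              [1., 3.]])     # hypothetical invertible matrix
rhs = np.array([3., 5.])

sol = np.linalg.solve(M, rhs)                # solves M x = rhs without forming the inverse
residual = np.linalg.norm(M @ sol - rhs)     # should be close to zero

print(sol)        # approximately [0.8 1.4]
print(residual)
```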

Graphs

We use matplotlib to plot graphs. Here too, the functions are directly accessible thanks to the initial %pylab directive. Don't hesitate to use the interactive help by typing plot(, then Shift+Tab, to get the syntax and the various possible options. You can also explicitly request help on a function.

(e) The most useful basic function for obtaining graphs is probably linspace. Get help by validating the cell below, then do the same for the plot function. Plot your favorite curve inspired by the example.

In [ ]:
linspace?
In [4]:
# Displaying graphs
X = linspace(0, 3, 100)
# Common functions also work on arrays
plot(X, sin(5*X)*exp(-X))
Y = linspace(0, 3, 4)
plot(Y, cos(Y), '-o')
axhline(0, color="black", lw=.5)
show()

Different options are available for plotting graphs: see examples on the Matplotlib website: http://matplotlib.org/gallery.html.

(f) Advanced: use contour to plot contour lines of surfaces. You can use the example below.

In [5]:
X = linspace(-2, 3, 100)
Y = linspace(-1, 1, 40)
Z = [[exp(-y**2)*sin(2*x+y) for x in X] for y in Y]
contour(X, Y, Z, 30)
Out[5]:
<matplotlib.contour.QuadContourSet at 0x2140a2a9d50>

4. Functions

In the various lab sessions, we will often want to program generic optimization algorithms that can be applied to any function. Here is how we define such functions, using the example of the bisection method to find a zero of a function.

(a) Carefully read these examples, validate them, and understand them.

In [ ]:
square = 1/2

# Definition of a function (note the indentation)
def function1(x):
    return x**2 - square  # All variables defined outside the function are considered global
In [ ]:
# Functions can take other functions as arguments, and we can provide default values
def bisection(f, x0=0, x1=1, tolerance=1e-6):
    xg, xd = x0, x1
    sg, sd = sign(f(xg)), sign(f(xd))
    while (xd - xg) > tolerance:
        xm = (xg + xd) / 2.
        sm = sign(f(xm))
        if sm == sd:
            xd, sd = xm, sm
        else:
            xg, sg = xm, sm
    return xm
In [ ]:
xg  # Variables defined inside a function are local: observe the error message.
In [ ]:
y = bisection(sin, 3, 4)

# Arguments can be passed by name (argument name, then its value), in any order, skipping those with default values
x = bisection(function1, tolerance=1e-9)

print(x, y)

square = 1/3  # If we modify a global variable, the function takes it into account.
x = bisection(function1)
print(x**2)
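The bisection function above relies on f(x0) and f(x1) having opposite signs, which it does not verify. As a sketch only, here is one possible variant that checks this assumption first. The name bisection_checked, the variable names and the choice of raising a ValueError are hypothetical additions, not part of the session's code.

```python
import math

def bisection_checked(f, x0=0.0, x1=1.0, tolerance=1e-6):
    """Bisection with a sign check (hypothetical variant of the function above)."""
    f0, f1 = f(x0), f(x1)
    if f0 * f1 > 0:
        raise ValueError("f(x0) and f(x1) must have opposite signs")
    left, right = x0, x1
    while (right - left) > tolerance:
        mid = (left + right) / 2
        # Keep the half-interval on which the sign change still occurs
        if f(mid) * f0 > 0:
            left = mid
        else:
            right = mid
    return (left + right) / 2

root = bisection_checked(math.sin, 3, 4, tolerance=1e-9)
print(root)   # close to pi
```

Calling bisection_checked(math.cos, 0, 1) raises the ValueError, since cos is positive at both ends of that interval.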

5. Introduction to Pandas

Pandas is a powerful data manipulation and analysis library for Python. It provides data structures like Series (one-dimensional) and DataFrame (two-dimensional) that are used to manipulate structured data efficiently. Pandas is widely used in data analysis, cleaning, transformation, and visualization tasks.

Key Features of Pandas:

  1. Data Structures: Series and DataFrame.
  2. Data Alignment and Indexing: Align data easily with a powerful and flexible indexing system.
  3. Handling Missing Data: Functions to detect and fill missing values.
  4. Data Wrangling: Merge, join, concatenate, and reshape data.
  5. Input/Output: Read and write data from/to various formats like CSV, Excel, SQL, and more.
  6. Time Series: Convenient handling of date and time data.

Basic Usage:

  1. Importing Pandas:
In [7]:
import pandas as pd
  2. Creating a DataFrame:
In [10]:
data = {'Name': ['Mohamed', 'Emmanuel', 'Yury'],
        'Age': [43, 53, 35],
        'Salary': [5000, 60000, 700000000]}
df = pd.DataFrame(data)
print(df)
       Name  Age     Salary
0   Mohamed   43       5000
1  Emmanuel   53      60000
2      Yury   35  700000000
  3. Reading Data from a CSV File:
In [11]:
df = pd.read_csv('AAPL.csv')
print(df.head())  # Display the first 5 rows
         Date      Open      High       Low     Close  Adj Close     Volume
0  1980-12-12  0.128348  0.128906  0.128348  0.128348   0.100323  469033600
1  1980-12-15  0.122210  0.122210  0.121652  0.121652   0.095089  175884800
2  1980-12-16  0.113281  0.113281  0.112723  0.112723   0.088110  105728000
3  1980-12-17  0.115513  0.116071  0.115513  0.115513   0.090291   86441600
4  1980-12-18  0.118862  0.119420  0.118862  0.118862   0.092908   73449600
  4. Basic Operations:
In [13]:
print(df.describe())  # Summary statistics
print(df['High'].mean())  # Mean of the 'High' column
df['Volume'] += 1  # Increment the 'Volume' column by 1
               Open          High           Low         Close     Adj Close  \
count  10409.000000  10409.000000  10409.000000  10409.000000  10409.000000   
mean      13.959910     14.111936     13.809163     13.966757     13.350337   
std       30.169244     30.514878     29.835055     30.191696     29.911132   
min        0.049665      0.049665      0.049107      0.049107      0.038384   
25%        0.281964      0.287946      0.274554      0.281250      0.234799   
50%        0.468750      0.477679      0.459821      0.468750      0.386853   
75%       14.217857     14.364286     14.043571     14.206071     12.188149   
max      182.630005    182.940002    179.119995    182.009995    181.778397   

             Volume  
count  1.040900e+04  
mean   3.321778e+08  
std    3.393344e+08  
min    0.000000e+00  
25%    1.247604e+08  
50%    2.199680e+08  
75%    4.126108e+08  
max    7.421641e+09  
14.111935729945241
  5. Filtering Data:
In [14]:
df_filtered = df[df['Volume'] > df['Volume'].mean()]
print(df_filtered)
             Date        Open        High         Low       Close   Adj Close  \
0      1980-12-12    0.128348    0.128906    0.128348    0.128348    0.100323   
429    1982-08-25    0.077009    0.077567    0.077009    0.077009    0.060194   
532    1983-01-20    0.150112    0.166853    0.150112    0.166853    0.130420   
533    1983-01-21    0.166853    0.174107    0.165179    0.166853    0.130420   
538    1983-01-28    0.181920    0.187500    0.180804    0.183036    0.143070   
...           ...         ...         ...         ...         ...         ...   
9902   2020-03-23   57.020000   57.125000   53.152500   56.092499   55.332169   
9993   2020-07-31  102.885002  106.415001  100.824997  106.260002  105.103409   
10008  2020-08-21  119.262497  124.867500  119.250000  124.370003  123.238098   
10009  2020-08-24  128.697495  128.785004  123.937500  125.857498  124.712036   
10018  2020-09-04  120.070000  123.699997  110.889999  120.959999  119.859123   

          Volume  
0      469033601  
429    357078401  
532    707840001  
533    402595201  
538    397734401  
...          ...  
9902   336752801  
9993   374336801  
10008  338054801  
10009  345937601  
10018  332607201  

[3424 rows x 7 columns]

Practical Exercise in Finance

Exercise: Analyzing Stock Prices

Objective: Analyze historical stock prices to calculate returns and visualize trends.

Step-by-Step Instructions:

  1. Import Necessary Libraries:
In [ ]:
 
  2. Load Historical Stock Price Data: Download historical stock price data for a specific stock (e.g., Apple Inc.) from a source like Yahoo Finance, and save it as AAPL.csv.
In [ ]:
 
  3. Calculate Daily Returns:
In [ ]:
 
  4. Plot Closing Prices and Daily Returns:
In [ ]:
 
  5. Calculate and Plot Moving Averages:
In [ ]:
 
  6. Save the Modified DataFrame:
In [ ]:
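
As a hint for the computational steps, here is one possible outline on a small synthetic price series rather than on AAPL.csv. The prices, the column names Return and MA3, and the 3-day window are assumptions made for this sketch; the daily return is taken to be $r_t = P_t / P_{t-1} - 1$, where $P_t$ is the closing price on day $t$.

```python
import pandas as pd

# Synthetic closing prices standing in for the 'Close' column of AAPL.csv
df = pd.DataFrame({"Close": [100.0, 102.0, 101.0, 105.0, 110.0]})

# Daily return: Close_t / Close_{t-1} - 1 (the first row has no predecessor, hence NaN)
df["Return"] = df["Close"].pct_change()

# Moving average over a 3-day window (window length chosen for illustration)
df["MA3"] = df["Close"].rolling(window=3).mean()

print(df)
```

The same two lines should apply unchanged to the DataFrame read from the CSV file, with a longer window for the moving averages.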