numpy
Tabular data with pandas
- next time?
Could use
But what if we want to see intermediate or final results that cannot easily be appreciated as text? e.g.
^
| x
y | x x YUK ('~')
| x
|________>
x
Why not use the browser as a REPL?
Browsers can present bitmap & vector images, tables, maths (using MathJax), video, audio...
Work interactively with Python (or R/Julia/Octave/JS/...) in the browser!
Some LaTeX maths, rendered in a Markdown cell:
$y(x) = x^3 + \frac{x^2}{3} - 6 \cdot x + 4$
import numpy as np
def f(a, b):
    return 20 - ((a**2 - (10 * np.cos(2 * np.pi * a))) +
                 (b**2 - (10 * np.cos(2 * np.pi * b))))
f(3, 4)
f(5, 6)
-21.0
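Because f is built only from NumPy arithmetic and np.cos, it should also accept whole arrays, not just scalars. A minimal sketch of that, assuming the same f as above; the names a_vals, b_vals, pairwise and grid are made up for illustration:

```python
import numpy as np

def f(a, b):
    return 20 - ((a**2 - (10 * np.cos(2 * np.pi * a))) +
                 (b**2 - (10 * np.cos(2 * np.pi * b))))

# Illustrative inputs: the same two points evaluated above, packed into arrays
a_vals = np.array([3, 5])
b_vals = np.array([4, 6])

# Element-wise evaluation: f(3, 4) and f(5, 6) in one call
pairwise = f(a_vals, b_vals)

# Evaluation over every (a, b) combination via a coordinate grid
A, B = np.meshgrid(a_vals, b_vals)
grid = f(A, B)

print(pairwise)
print(grid.shape)
```

The grid form is what a surface or contour plot of f would typically be fed with.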
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
%matplotlib inline
x = np.linspace(-1, 1, 1000)
y = -x**5 - 4 * x**3
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(x, y)
[<matplotlib.lines.Line2D at 0x7ff52a44af28>]
Special _repr_*_ methods:
class Circle(object):
    def __init__(self, radius, color_str):
        # Check both bounds so the test matches the error message
        if not 0 < radius <= 50:
            raise ValueError('Must have 0 < radius <= 50')
        self.radius = radius
        self.color = color_str

    def _repr_html_(self):
        # The notebook renders this inline SVG instead of the plain-text repr
        return """<svg height="100" width="100">
<circle cx="50" cy="50" r="{}" fill="{}" />
Sorry, your browser does not support inline SVG.
</svg>""".format(self.radius, self.color)
Circle(31.2, 'red')
from IPython.display import display
display(Circle(15, 'blue'))
display(Circle(31.2, 'green'))
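The same hook exists for LaTeX output: if an object defines _repr_latex_, the notebook renders the returned string as maths. A minimal sketch, assuming a hypothetical Polynomial class (not part of any library) whose coefficients are stored lowest power first:

```python
class Polynomial(object):
    """Hypothetical example class: renders itself as LaTeX in a notebook."""

    def __init__(self, coeffs):
        # coeffs[i] is the coefficient of x**i
        self.coeffs = list(coeffs)

    def _repr_latex_(self):
        terms = []
        for power, c in enumerate(self.coeffs):
            if c == 0:
                continue  # skip terms that vanish
            if power == 0:
                terms.append(str(c))
            elif power == 1:
                terms.append("{} x".format(c))
            else:
                # Doubled braces produce literal { } around the exponent
                terms.append("{} x^{{{}}}".format(c, power))
        # Dollar signs mark the string as inline maths for the notebook
        return "$" + " + ".join(terms) + "$"

p = Polynomial([4, 2, 0, 1])
latex = p._repr_latex_()
print(latex)
```

In a notebook, evaluating p as the last expression of a cell (or calling display(p)) would show the typeset formula rather than the raw string.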
Many projects now include _repr_html_ and _repr_latex_ methods for the IPython notebook:
sympy for analytical maths
pandas for manipulating tabular data
matplotlib/bokeh/seaborn for plotting
ipynb file
nbviewer.ipython.org as static HTML
nbconvert utility (generate HTML, PDF or LaTeX)
reveal.js
ipyparallel
Much data in engineering/science/economics is array / tabular / matrix
How best to access/manipulate such data interactively?
import numpy as np
a = np.array([[0, 1], [1, 0]])
a
array([[0, 1], [1, 0]])
b = np.linspace(0.3, 8.9, 10)
b
array([ 0.3 , 1.25555556, 2.21111111, 3.16666667, 4.12222222, 5.07777778, 6.03333333, 6.98888889, 7.94444444, 8.9 ])
a.dtype, b.dtype
(dtype('int64'), dtype('float64'))
a.shape, b.shape
((2, 2), (10,))
x_max = 10000
x1 = list(range(x_max))
x2 = np.arange(x_max)
%timeit [x + 1 for x in x1]
%timeit x2 + 1
1000 loops, best of 3: 746 µs per loop
The slowest run took 23.73 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.07 µs per loop
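%timeit is an IPython magic, so it only works inside IPython or a notebook. A rough equivalent for a plain Python script, sketched with the standard-library timeit module; n_runs is an arbitrary choice and the timings will differ from machine to machine:

```python
import timeit
import numpy as np

x_max = 10000
x1 = list(range(x_max))
x2 = np.arange(x_max)

# Both approaches compute the same values
list_result = [x + 1 for x in x1]
array_result = x2 + 1

# Average time per call over n_runs repetitions (n_runs chosen arbitrarily)
n_runs = 100
list_time = timeit.timeit(lambda: [x + 1 for x in x1], number=n_runs) / n_runs
array_time = timeit.timeit(lambda: x2 + 1, number=n_runs) / n_runs

print("list comprehension: {:.1f} µs per loop".format(list_time * 1e6))
print("numpy array:        {:.1f} µs per loop".format(array_time * 1e6))
```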
Note that many standard operators (arithmetic, comparison, element access, etc.) work on whole arrays and on slices, which are views onto the original array.
These operations are fast because they are implemented in C.
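A small sketch of what array-level operators and slice views look like in practice; the variable names are illustrative only:

```python
import numpy as np

a = np.arange(10)

# Arithmetic and comparison apply element-wise to the whole array
doubled = a * 2
mask = a > 6          # boolean array, True where the condition holds

# A basic slice is a view: it shares memory with the original array
view = a[2:5]
view[0] = 99          # this also changes a[2]

# The view's .base attribute points back at the array that owns the data
shares_memory = view.base is a

print(doubled)
print(mask.sum())
print(a)
print(shares_memory)
```

Writing through a view changes the original array, so call .copy() on a slice when an independent array is wanted.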
distances = np.linspace(0, 1774, 50)
distances.shape, distances.max()
((50,), 1774.0)
concentrations = np.random.normal(5, 3, len(distances))
concentrations.shape, concentrations.min()
((50,), 0.29912674162937236)
Ensure no concentrations are negative (negative values are set to zero):
concentrations[concentrations < 0] = 0
Find the median concentration over all but the first and last 100 units of distance:
np.median(concentrations[(distances > 100) & (distances <= distances[-1] - 100)])
6.1252944385858328
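The random data above gives a different answer on every run, so here is the same masking idea sketched on a small hand-made dataset where the result can be checked by eye. The values are invented purely for illustration:

```python
import numpy as np

# Made-up data: 11 sample points spaced 100 units apart
distances = np.linspace(0, 1000, 11)
concentrations = np.array([5., -1., 3., 7., 2., 8., 4., 6., 1., 9., 0.])

# Boolean-mask assignment: replace negative values with zero
concentrations[concentrations < 0] = 0

# Combine two conditions with & (each needs its own parentheses)
interior = (distances > 100) & (distances <= distances[-1] - 100)

# Median over the interior points only
interior_median = np.median(concentrations[interior])
print(interior.sum(), interior_median)
```

The mask keeps the eight points from 200 to 900, whose values 3, 7, 2, 8, 4, 6, 1, 9 have a median of 5.0.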
A word of warning:
base
array numpy
docs for rules numpy
Thanks for listening