NumPy: The Power of Array-Based Data Processing
NumPy, the Numerical Python package, underpins much of the data processing done in Python by providing an efficient and expressive array-based framework. In this article, we will explore NumPy arrays and their main benefits: faster execution, vectorized operations, and a wide range of built-in functions.
Vectorization: The Key to Speed
NumPy arrays let us express many data processing tasks as simple array expressions instead of explicit Python loops. This practice, called vectorization, typically yields large performance gains: vectorized operations often run one or two orders of magnitude faster than equivalent pure Python code, because the looping happens inside NumPy's compiled code rather than in the Python interpreter.
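As a minimal sketch (the array size and the sum-of-squares task are arbitrary choices for illustration), the same computation can be written as an explicit Python loop and as a single array expression, and timed with the standard-library timeit module. Timings vary by machine, so the code only prints them rather than claiming a specific speedup.

```python
import timeit

import numpy as np

values = np.arange(1_000_000, dtype=np.float64)

def sum_of_squares_loop(arr):
    # Explicit Python loop: one interpreter step per element
    total = 0.0
    for v in arr:
        total += v * v
    return total

def sum_of_squares_vectorized(arr):
    # Single array expression: the loop runs inside NumPy's compiled code
    return float((arr * arr).sum())

loop_result = sum_of_squares_loop(values)
vec_result = sum_of_squares_vectorized(values)
print(np.isclose(loop_result, vec_result))  # both approaches agree

loop_time = timeit.timeit(lambda: sum_of_squares_loop(values), number=1)
vec_time = timeit.timeit(lambda: sum_of_squares_vectorized(values), number=1)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```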
Conditional Logic with NumPy
When dealing with conditional logic, we can use NumPy’s array operations instead of explicit loops. For instance, np.where(cond, x, y) is a vectorized version of the ternary expression x if c else y: at each position it takes the element of x where cond is True and the element of y otherwise.
import numpy as np
x_arr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
y_arr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])
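# Pure Python: processes elements one at a time (slow for large arrays, and fails on multidimensional input)
result = [(x if c else y) for x, y, c in zip(x_arr, y_arr, cond)]
print(result)
# Vectorized equivalent: [1.1 2.2 1.3 1.4 2.5]
print(np.where(cond, x_arr, y_arr))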
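The second and third arguments to np.where() do not have to be arrays, because scalars are broadcast. A minimal sketch on a fixed 2-D array (the values are arbitrary), a case the list comprehension above cannot handle: the first call replaces every element with 2 or -2, and the second replaces only the positive values and keeps the rest.

```python
import numpy as np

arr = np.array([[0.5, -1.2], [-0.3, 2.0]])

# Both branches are scalars: every element becomes 2 or -2
signs = np.where(arr > 0, 2, -2)
print(signs)

# Mix a scalar and the array itself: only positive values are replaced
capped = np.where(arr > 0, 2, arr)
print(capped)
```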
Statistical Functions
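NumPy provides a set of statistical functions, such as mean(), sum(), and std(), that compute aggregate statistics over a whole array or along a single axis. They are available both as array methods (arr.mean()) and as top-level functions (np.mean(arr)).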
import numpy as np
arr = np.random.randn(5, 4)  # 5x4 array of normally distributed random numbers
print(arr)
print(arr.mean())
print(arr.sum())
print(arr.mean(axis=1)) # averaging elements of each row
print(arr.sum(0)) # summing elements of each column
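A small fixed array makes the effect of the axis argument easy to check by hand. This sketch also shows std() and var(); note that NumPy computes population statistics by default (ddof=0), while ddof=1 gives the sample versions.

```python
import numpy as np

arr = np.array([[1.0, 2.0], [3.0, 4.0]])

print(arr.mean())          # mean of all four elements: 2.5
print(arr.mean(axis=0))    # mean of each column: [2. 3.]
print(arr.sum(axis=1))     # sum of each row: [3. 7.]
print(arr.std())           # population standard deviation (ddof=0), about 1.118
print(arr.var(ddof=1))     # sample variance (ddof=1), about 1.667
print(arr.argmax())        # index of the largest element in the flattened array: 3
```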
Cumulative Sum and Product
The cumsum() and cumprod() functions compute the cumulative sum and product of an array along a specified axis; without an axis argument, they operate on the flattened array.
import numpy as np
arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
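print(arr.cumsum(0))   # running sum down each column
print(arr.cumsum(1))   # running sum across each row
print(arr.cumprod(0))  # running product down each column
print(arr.cumprod(1))  # running product across each row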
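Unlike sum(), which collapses an axis, the cumulative functions return every intermediate result. When no axis is given, the array is flattened first, as this small sketch on the same 3x3 array shows.

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

flat = arr.cumsum()          # no axis: operates on the flattened array
print(flat)                  # [ 0  1  3  6 10 15 21 28 36]
print(flat.shape)            # (9,)

rows = arr.cumprod(axis=1)   # running product within each row
print(rows)
```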
Boolean Array Operations
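Boolean values are coerced to 1 (True) and 0 (False) in arithmetic, so sum() is often used to count the True values in a Boolean array such as arr > 0. The any() and all() methods test whether at least one value, or every value, is True.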
import numpy as np
import numpy.random as np_random
arr = np_random.randn(100)
print(arr)
print((arr > 0).sum())  # number of positive values
print((arr > 0).any()) # True if any element is True
print((arr > 0).all()) # True if all elements are True
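The element-wise logical operators &, | and ~ combine Boolean arrays, and the result can be counted with sum() in the same way. Python's and/or keywords do not work element-wise on arrays, which is why these operators are used. A minimal sketch on a fixed array (the values and the range are arbitrary), so the counts can be checked by hand:

```python
import numpy as np

arr = np.array([-2.0, -0.5, 0.0, 0.7, 1.5, 3.0])

positive = arr > 0
in_range = (arr > -1) & (arr < 1)   # element-wise AND; parentheses are required
outside = ~in_range                 # element-wise NOT

print(positive.sum())   # 3 positive values
print(in_range.sum())   # 3 values strictly between -1 and 1
print(outside.any(), outside.all())   # True False
print(arr[positive | (arr == 0)])     # element-wise OR used as a Boolean index
```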
Sorting
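The sort() method sorts an array in place, optionally along a specified axis; the top-level np.sort() function instead returns a sorted copy and leaves the original unchanged.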
import numpy as np
import numpy.random as np_random
arr = np_random.randn(8)
print(arr)
arr.sort()  # sorts arr in place and returns None
print(arr)
arr = np_random.randn(5, 3)
print(arr)
arr.sort(1)  # sort each row in place (along axis 1)
print(arr)
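Because arr.sort() modifies the array in place, np.sort() is the safer choice when the original order must be kept. This sketch with fixed values contrasts the two, and also sorts along axis 0, which orders each column independently.

```python
import numpy as np

arr = np.array([3, 1, 2])
sorted_copy = np.sort(arr)   # returns a new sorted array
print(arr, sorted_copy)      # [3 1 2] [1 2 3]: original unchanged

arr.sort()                   # sorts in place and returns None
print(arr)                   # [1 2 3]

grid = np.array([[5, 2], [1, 4], [3, 0]])
grid.sort(axis=0)            # each column sorted independently
print(grid)
```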
Deduplication
The np.unique() function returns the sorted unique values of an array, which removes duplicates in a single call.
import numpy as np
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
print(sorted(set(names)))  # pure Python alternative
print(np.unique(names))    # ['Bob' 'Joe' 'Will']
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
print(np.unique(ints))
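np.unique() can also report how often each value occurs through its return_counts parameter. A short sketch reusing the integer array above:

```python
import numpy as np

ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])

# return_counts=True returns the unique values and their frequencies
values, counts = np.unique(ints, return_counts=True)
print(values)  # [1 2 3 4]
print(counts)  # [2 2 3 2]
print(dict(zip(values.tolist(), counts.tolist())))
```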
Set Operations
NumPy also provides basic set operations for one-dimensional arrays. For example, a membership test checks whether each element of one array is contained in another and returns a Boolean array.
import numpy as np
values = np.array([6, 0, 0, 3, 2, 5, 6])
# np.in1d is deprecated as of NumPy 2.0; np.isin is the recommended replacement
print(np.isin(values, [2, 3, 6]))  # [ True False False  True  True False  True]
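Alongside the membership test above, NumPy has functions for the usual set operations on one-dimensional arrays, each returning sorted unique values. A small sketch with arbitrary inputs:

```python
import numpy as np

a = np.array([4, 1, 3, 2, 3])
b = np.array([3, 5, 4])

print(np.intersect1d(a, b))  # values in both: [3 4]
print(np.union1d(a, b))      # values in either: [1 2 3 4 5]
print(np.setdiff1d(a, b))    # values in a but not in b: [1 2]
print(np.setxor1d(a, b))     # values in exactly one of the arrays: [1 2 5]
```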
Saving and Loading Arrays
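NumPy provides functions to save arrays to disk and load them back, both in its own binary formats and as plain text. np.save() writes a single array to an uncompressed .npy file (appending the extension if it is missing), np.savez() stores several arrays in an uncompressed .npz archive, and np.loadtxt() reads delimited text.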
import numpy as np
arr = np.arange(10)
np.save('some_array', arr)
print(np.load('some_array.npy'))
np.savez('array_archive.npz', a=arr, b=arr)
arch = np.load('array_archive.npz')
print(arch['b'])
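# Write a small comma-separated text file first so the example is self-contained
np.savetxt('array_ex.txt', np.arange(12).reshape(3, 4), delimiter=',')
arr = np.loadtxt('array_ex.txt', delimiter=',')
print(arr)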
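When file size matters, np.savez_compressed() writes the same .npz layout with compression. The object np.load() returns for an .npz file can be used as a context manager, so the file is closed afterwards. This sketch writes to a temporary directory (an arbitrary choice, to avoid leaving files behind); the file name compressed_archive.npz is hypothetical.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'compressed_archive.npz')  # hypothetical file name
    # Keyword names become the keys inside the archive
    np.savez_compressed(path, a=arr, b=arr * 2)

    # The with-block closes the archive file when done
    with np.load(path) as archive:
        print(sorted(archive.files))  # ['a', 'b']
        restored_b = archive['b']     # reads the array into memory

print(restored_b)  # [ 0  2  4  6  8 10 12 14 16 18]
```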
Linear Algebra
The numpy.linalg module provides standard linear algebra routines, including matrix inversion and QR decomposition, while matrix multiplication is available through the dot() method and the np.dot() function.
import numpy as np
from numpy.linalg import inv, qr
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
print(x.dot(y))
print(np.dot(x, np.ones(3)))
x = np.random.randn(5, 5)
print('matrix inversion')
mat = x.T.dot(x)  # X^T X is symmetric and, for random X, almost surely invertible
print(inv(mat))  # matrix inversion
print(mat.dot(inv(mat)))  # close to the identity matrix, up to floating-point rounding
q, r = qr(mat)
print(q)
print(r)
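Since Python 3.5, the @ operator also performs matrix multiplication and matches dot() for 2-D arrays. For solving a linear system Ax = b, numpy.linalg.solve() is generally preferred over computing inv(A) explicitly, since it avoids forming the inverse and is more numerically stable. A minimal sketch with small, hand-checkable matrices (the values of A and b are arbitrary):

```python
import numpy as np
from numpy.linalg import det, inv, solve

x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
print(x @ y)  # same result as x.dot(y)

A = np.array([[3., 1.], [1., 2.]])
b = np.array([9., 8.])

sol = solve(A, b)                 # solves A @ sol = b without forming inv(A)
print(sol)                        # approximately [2. 3.]
print(np.allclose(A @ sol, b))    # True
print(det(A))                     # 3*2 - 1*1 = 5, up to rounding
print(np.allclose(A @ inv(A), np.eye(2)))  # True
```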
Merging and Splitting Arrays
NumPy provides functions to merge and split arrays, including concatenate(), vstack(), hstack(), and split().
import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
print(np.concatenate([arr1, arr2], axis=0))  # join along rows: shape (4, 3)
print(np.concatenate([arr1, arr2], axis=1))  # join along columns: shape (2, 6)
print(np.vstack((arr1, arr2)))  # same as concatenate with axis=0
print(np.hstack((arr1, arr2)))  # same as concatenate with axis=1
arr = np.random.randn(5, 5)
print(arr)
print('split along axis 0 (rows)')
first, second, third = np.split(arr, [1, 3], axis=0)  # rows [:1], [1:3], [3:]
print('first')
print(first)
print('second')
print(second)
print('third')
print(third)
print('split along axis 1 (columns)')
first, second, third = np.split(arr, [1, 3], axis=1)  # columns [:1], [1:3], [3:]
print('first')
print(first)
print('second')
print(second)
print('third')
print(third)
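np.split() with a list of indices cuts the array before each listed index, so [1, 3] produces the pieces [:1], [1:3] and [3:] along the chosen axis. np.vsplit() and np.hsplit() are shorthands for axis=0 and axis=1. A sketch on a fixed 5x5 array that checks the resulting shapes and the round trip back through concatenate():

```python
import numpy as np

arr = np.arange(25).reshape(5, 5)

row_parts = np.split(arr, [1, 3], axis=0)
print([p.shape for p in row_parts])   # [(1, 5), (2, 5), (2, 5)]

col_parts = np.hsplit(arr, [1, 3])    # same as np.split(arr, [1, 3], axis=1)
print([p.shape for p in col_parts])   # [(5, 1), (5, 2), (5, 2)]

# vsplit is the axis=0 shorthand, so it matches the first split
same = all(np.array_equal(p, q) for p, q in zip(row_parts, np.vsplit(arr, [1, 3])))
print(same)  # True

# Concatenating the pieces back along the same axis restores the array
print(np.array_equal(np.concatenate(col_parts, axis=1), arr))  # True
```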