NumPy: The Power of Array-Based Data Processing

NumPy, short for Numerical Python, provides an efficient and expressive array-based framework for data processing. This article looks at NumPy arrays and their main benefits: faster execution, vectorized operations, and a wide range of built-in functions.

Vectorization: The Key to Speed

NumPy arrays let us express many data processing tasks as concise array expressions instead of explicit loops. This practice, known as vectorization, often yields significant performance gains: vectorized operations typically run one or two orders of magnitude faster than their pure Python counterparts.
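
To make the speed claim concrete, here is a minimal sketch that computes a sum of squares twice, once with an explicit Python loop and once with a single array expression. The function names and the array size are illustrative assumptions, and the measured timings will vary by machine, so the code prints them rather than promising a particular ratio.

```python
import time

import numpy as np


def sum_of_squares_loop(values):
    """Accumulate the sum of squares one element at a time."""
    total = 0.0
    for v in values:
        total += v * v
    return float(total)


def sum_of_squares_vectorized(values):
    """Compute the same quantity with one vectorized expression."""
    return float(np.sum(values * values))


# Hypothetical workload: 200,000 double-precision values
data = np.arange(200_000, dtype=np.float64)

start = time.perf_counter()
loop_result = sum_of_squares_loop(data)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vec_result = sum_of_squares_vectorized(data)
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f} s, vectorized: {vec_seconds:.4f} s")
```

Both versions should agree up to floating-point rounding; only the running time differs.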

Conditional Logic with NumPy

Conditional logic can also be expressed with array operations. For instance, np.where(cond, x, y) is a vectorized version of the expression x if cond else y: it takes each element from x where the condition is True and from y otherwise. The example below first builds the result with a list comprehension, then with np.where().

import numpy as np

x_arr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
y_arr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Pure Python: slow for large arrays and limited to one dimension
result = [(x if c else y) for x, y, c in zip(x_arr, y_arr, cond)]
print(result)

# Vectorized equivalent
print(np.where(cond, x_arr, y_arr))
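
The second and third arguments of np.where() do not have to be arrays; scalars work as well. The sketch below, using a small made-up array, replaces values according to their sign and then zeroes out the non-positive entries.

```python
import numpy as np

# Illustrative data, not taken from the article
arr = np.array([[1.5, -0.3], [-2.0, 0.7]])

# Scalars in both branches: 2 where positive, -2 elsewhere
signs = np.where(arr > 0, 2, -2)
print(signs)

# Mix of array and scalar: keep positive values, replace the rest with 0.0
clipped = np.where(arr > 0, arr, 0.0)
print(clipped)
```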

Statistical Functions

NumPy provides a set of statistical functions, including mean(), sum(), and std(), available both as array methods and as top-level functions. Each accepts an optional axis argument to reduce along rows or columns instead of over the whole array.

import numpy as np

arr = np.random.randn(5, 4)  # 5x4 array of standard normal samples
print(arr)
print(arr.mean())
print(arr.sum())
print(arr.mean(axis=1))  # averaging elements of each row
print(arr.sum(0))  # summing elements of each column
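
The listing above does not exercise std(), so here is a short sketch on a small fixed array (chosen only for illustration) that also shows var(), min(), and argmax(). Note that std() and var() default to the population formulas; passing ddof=1 gives the sample versions.

```python
import numpy as np

# Small fixed array so the results are easy to check by hand
arr = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

population_var = arr.var()       # divides by N
sample_var = arr.var(ddof=1)     # divides by N - 1
population_std = arr.std()       # square root of the population variance

row_minimums = arr.min(axis=1)   # smallest value in each row
column_maximums = arr.max(axis=0)  # largest value in each column
flat_index_of_max = arr.argmax()   # index into the flattened array

print(population_var, sample_var, population_std)
print(row_minimums, column_maximums, flat_index_of_max)
```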

Cumulative Sum and Product

The cumsum() and cumprod() functions can be used to compute the cumulative sum and product of arrays along a specified axis.

import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
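print(arr.cumsum(0))  # running sum down each column
print(arr.cumsum(1))  # running sum across each row
print(arr.cumprod(0))  # running product down each column
print(arr.cumprod(1))  # running product across each row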
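
As a complementary sketch, the one-dimensional case is often the easiest way to see what these functions do. The arrays below are invented for illustration. The last lines show that calling cumsum() on a two-dimensional array without an axis argument operates on the flattened array.

```python
import numpy as np

# Hypothetical series of deposits; cumsum gives the running balance
deposits = np.array([100, 50, 25, 75])
running_balance = deposits.cumsum()
print(running_balance)

# cumprod of 1..4 yields the factorials 1!, 2!, 3!, 4!
factors = np.array([1, 2, 3, 4])
factorials = factors.cumprod()
print(factorials)

# Without an axis, a 2-D array is flattened first
grid = np.array([[0, 1, 2], [3, 4, 5]])
flat_running_sum = grid.cumsum()
print(flat_running_sum)
```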

Boolean Array Operations

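Boolean arrays are treated as 1 (True) and 0 (False) in numeric reductions, so sum() counts the True values of a comparison. The any() and all() methods test whether at least one element, or every element, is True.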

import numpy as np
import numpy.random as np_random

arr = np_random.randn(100)
print(arr)
print((arr > 0).sum())  # number of positive values
print((arr > 0).any())  # True if any element is True
print((arr > 0).all())  # True if all elements are True
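
Boolean arrays can also be combined element by element with the operators & (and), | (or), and ~ (not), and the result can be used as a mask. A minimal sketch on made-up data:

```python
import numpy as np

# Illustrative values
arr = np.array([-1.5, 0.2, 3.0, -0.4, 1.1])

positive = arr > 0          # [False, True, True, False, True]
small = np.abs(arr) < 1     # [False, True, False, True, False]

both = positive & small     # element-wise AND
either = positive | small   # element-wise OR
not_positive = ~positive    # element-wise NOT

# A Boolean array used as an index keeps only the True positions
selected = arr[both]
print(both, either, not_positive, selected)
```

The parentheses around each comparison matter when writing such expressions inline, for example (arr > 0) & (arr < 1), because & binds more tightly than the comparison operators.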

Sorting

The sort() method sorts an array in place, optionally along a specified axis.

import numpy as np
import numpy.random as np_random

arr = np_random.randn(8)
print(arr)
arr.sort()
print(arr)

arr = np_random.randn(5, 3)
print(arr)
arr.sort(1)  # sorting along the second axis
print(arr)
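
Because the sort() method modifies the array in place, it helps to know the alternatives. The sketch below, on a small invented array, contrasts it with the top-level np.sort(), which returns a sorted copy, and with np.argsort(), which returns the indices that would sort the array.

```python
import numpy as np

values = np.array([3, 1, 2])

# np.sort returns a new array and leaves the input untouched
sorted_copy = np.sort(values)
print(values, sorted_copy)

# np.argsort gives the ordering as indices into the original array
order = np.argsort(values)
print(order, values[order])

# Reversing a sorted copy is a simple way to get descending order
descending = np.sort(values)[::-1]
print(descending)
```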

Deduplication

The np.unique() function returns the sorted unique values of an array, which is the array equivalent of sorted(set(...)) in pure Python.

import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
print(sorted(set(names)))
print(np.unique(names))

ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
print(np.unique(ints))
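
np.unique() can also report how often each value occurs through its return_counts parameter. A short sketch reusing the names array from the listing above:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# With return_counts=True, np.unique returns a (values, counts) pair
unique_names, counts = np.unique(names, return_counts=True)

for name, count in zip(unique_names, counts):
    print(name, count)
```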

Set Membership

The np.isin() function tests whether each element of one array is present in another collection of values and returns a Boolean array of the same shape.

import numpy as np

values = np.array([6, 0, 0, 3, 2, 5, 6])
print(np.isin(values, [2, 3, 6]))  # np.isin() supersedes the older np.in1d()
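
Alongside membership tests, NumPy offers a few set-style functions for one-dimensional arrays. The sketch below applies np.intersect1d(), np.union1d(), and np.setdiff1d() to the same values array; the candidates array is an illustrative assumption. Each function returns sorted, unique results.

```python
import numpy as np

values = np.array([6, 0, 0, 3, 2, 5, 6])
candidates = np.array([2, 3, 6])  # illustrative second set

common = np.intersect1d(values, candidates)      # present in both
combined = np.union1d(values, candidates)        # present in either
only_in_values = np.setdiff1d(values, candidates)  # in values, not in candidates

print(common, combined, only_in_values)
```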

Saving and Loading Arrays

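NumPy provides functions to save and load arrays on disk: np.save() and np.load() use the binary .npy format, np.savez() stores several arrays in one uncompressed .npz archive, and np.loadtxt() reads delimited text files.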

import numpy as np

arr = np.arange(10)
np.save('some_array', arr)  # the .npy extension is appended automatically
print(np.load('some_array.npy'))

np.savez('array_archive.npz', a=arr, b=arr)
arch = np.load('array_archive.npz')
print(arch['b'])

# Assumes a comma-delimited text file named array_ex.txt exists in the working directory
arr = np.loadtxt('array_ex.txt', delimiter=',')
print(arr)
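
Since the np.loadtxt() call above depends on a file that is not created in the listing, here is a self-contained round trip as a sketch. It writes a small array with np.savetxt() into a temporary directory and reads it back; the file name table_example.txt is a hypothetical choice for this illustration.

```python
import os
import tempfile

import numpy as np

table = np.arange(6, dtype=float).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name used only for this example
    path = os.path.join(tmp_dir, 'table_example.txt')

    # Write the array as comma-separated text, one row per line
    np.savetxt(path, table, delimiter=',')

    # Read it back with the same delimiter
    restored = np.loadtxt(path, delimiter=',')

print(restored)
```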

Linear Algebra

NumPy provides a comprehensive set of linear algebra functions, including matrix multiplication, inversion, and QR decomposition.

import numpy as np
from numpy.linalg import inv, qr

x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
print(x.dot(y))
print(np.dot(x, np.ones(3)))

x = np.random.randn(5, 5)
print('matrix inversion')
mat = x.T.dot(x)
print(inv(mat))  # inverse of mat
print(mat.dot(inv(mat)))  # product with the inverse: approximately the identity matrix

q, r = qr(mat)
print(q)
print(r)
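
Two follow-ups may help here, shown as a sketch on a small system chosen for illustration. First, np.linalg.solve() solves a linear system directly, which is generally preferable to computing an explicit inverse. Second, the factors returned by qr() can be checked by multiplying them back together with the @ operator, which is equivalent to dot() for two-dimensional arrays.

```python
import numpy as np

# Illustrative system: 3x + y = 9 and x + 2y = 8
a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

solution = np.linalg.solve(a, b)
print(solution)

# QR decomposition: q has orthonormal columns, r is upper triangular
q, r = np.linalg.qr(a)
reconstructed = q @ r
print(np.allclose(reconstructed, a))
print(np.allclose(q.T @ q, np.eye(2)))
```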

Merging and Splitting Arrays

NumPy provides functions to merge and split arrays, including concatenate(), vstack(), hstack(), and split().

import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
print(np.concatenate([arr1, arr2], axis=0))  # rows connected
print(np.concatenate([arr1, arr2], axis=1))  # columns connected

print(np.vstack((arr1, arr2)))  # stacked vertically
print(np.hstack((arr1, arr2)))  # horizontally stacked

arr = np.random.randn(5, 5)
print(arr)
print('split along axis 0 (rows)')
first, second, third = np.split(arr, [1, 3], axis=0)  # rows 0, 1-2, and 3-4
print('first')
print(first)
print('second')
print(second)
print('third')
print(third)

print('split along axis 1 (columns)')
first, second, third = np.split(arr, [1, 3], axis=1)  # columns 0, 1-2, and 3-4
print('first')
print(first)
print('second')
print(second)
print('third')
print(third)
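
Besides a list of split indices, np.split() also accepts an integer, in which case the array is divided into that many equal parts. The sketch below, on small invented arrays, shows this form and the related np.array_split(), which tolerates sizes that do not divide evenly.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

# An integer asks for equal parts: two blocks of two columns each
left, right = np.split(grid, 2, axis=1)
print(left.shape, right.shape)

# np.split would raise an error for 7 elements in 3 parts;
# np.array_split allows uneven pieces instead
parts = np.array_split(np.arange(7), 3)
for part in parts:
    print(part)
```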