2.6. Array Import
2.6.1. Setup
>>> import numpy as np
2.6.2. np.loadtxt()
>>> DATA = 'https://python.astrotech.io/_static/iris.csv'
>>> a = np.loadtxt(DATA)
Traceback (most recent call last):
ValueError: could not convert string 'sepal_length,sepal_width,petal_length,petal_width,species' to float64 at row 0, column 1.
>>> a = np.loadtxt(DATA, skiprows=1)
Traceback (most recent call last):
ValueError: could not convert string '5.4,3.9,1.3,0.4,setosa' to float64 at row 0, column 1.
>>> a = np.loadtxt(DATA, skiprows=1, delimiter=',')
Traceback (most recent call last):
ValueError: could not convert string 'setosa' to float64 at row 0, column 5.
>>> a = np.loadtxt(DATA, skiprows=1, delimiter=',', max_rows=5, usecols=(0,1,2,3))
>>> a
array([[5.4, 3.9, 1.3, 0.4],
       [5.9, 3. , 5.1, 1.8],
       [6. , 3.4, 4.5, 1.6],
       [7.3, 2.9, 6.3, 1.8],
       [5.6, 2.5, 3.9, 1.1]])
>>> header = np.loadtxt(DATA, max_rows=1, delimiter=',', dtype=str, usecols=(0,1,2,3))
>>> data = np.loadtxt(DATA, skiprows=1, max_rows=3, delimiter=',', usecols=(0,1,2,3))
>>>
>>> header
array(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='<U12')
>>>
>>> data
array([[5.4, 3.9, 1.3, 0.4],
       [5.9, 3. , 5.1, 1.8],
       [6. , 3.4, 4.5, 1.6]])
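The same steps can be tried without network access. The following is a minimal sketch, assuming a small in-memory sample (`SAMPLE` is a hypothetical stand-in for the file behind `DATA`) wrapped in `io.StringIO`, which `np.loadtxt()` accepts like an open file. It also reads the text column separately by passing `dtype=str`:

```python
import io
import numpy as np

# Hypothetical sample that mimics the layout of iris.csv: one header line plus rows
SAMPLE = """sepal_length,sepal_width,petal_length,petal_width,species
5.4,3.9,1.3,0.4,setosa
5.9,3.0,5.1,1.8,virginica
6.0,3.4,4.5,1.6,versicolor
"""

# Header: read only the first line, as strings
header = np.loadtxt(io.StringIO(SAMPLE), max_rows=1, delimiter=',', dtype=str)

# Numeric part: skip the header and keep only the four float columns
data = np.loadtxt(io.StringIO(SAMPLE), skiprows=1, delimiter=',', usecols=(0, 1, 2, 3))

# Text part: same rows, but only the last column, read as strings
species = np.loadtxt(io.StringIO(SAMPLE), skiprows=1, delimiter=',', usecols=4, dtype=str)

print(header)
print(data.shape)
print(species)
```

Each call gets a fresh `io.StringIO` because a file-like object is consumed once it has been read.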
2.6.3. Other
| Method | Data Type | Description |
|---|---|---|
| | Text | Load data from text file such as |
| | Binary | Load data from |
| | Binary | Load binary data from |
| | Text | Load data from string |
| | Text | Load data from file using regex to parse |
| | Text | Load data with missing values handled as specified |
| | Binary | Read MATLAB data files |
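To illustrate loading text with missing values, the sketch below uses `np.genfromtxt()`, one NumPy function that fits that description. The sample text `RAW` is hypothetical and is read from memory through `io.StringIO` rather than from a file:

```python
import io
import numpy as np

# Hypothetical CSV with two empty fields (row 1 column b, row 2 column c)
RAW = """a,b,c
1.0,,3.0
4.0,5.0,
"""

# By default, missing float fields are returned as nan
with_nan = np.genfromtxt(io.StringIO(RAW), delimiter=',', skip_header=1)

# filling_values substitutes a chosen value for every missing field
filled = np.genfromtxt(io.StringIO(RAW), delimiter=',', skip_header=1, filling_values=0.0)

print(with_nan)
print(filled)
```

Note that `np.genfromtxt()` names its header-skipping parameter `skip_header`, whereas `np.loadtxt()` uses `skiprows`.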
>>> # Load one numeric column, then split it into three non-overlapping ranges
>>> data = np.loadtxt('myfile.csv', delimiter=',', usecols=1, skiprows=1, dtype=np.float16)
>>>
>>> small = (data < 1)
>>> medium = (data >= 1) & (data < 2)
>>> large = (data >= 2)
>>>
>>> # np.save() appends the .npy extension to each path
>>> np.save('/tmp/small', data[small])
>>> np.save('/tmp/medium', data[medium])
>>> np.save('/tmp/large', data[large])
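Arrays written with `np.save()` can be read back with `np.load()`. The following is a minimal round-trip sketch, assuming the three ranges are meant to be non-overlapping. The sample values and the temporary directory are hypothetical and stand in for `myfile.csv` and `/tmp`:

```python
import os
import tempfile
import numpy as np

# Hypothetical values standing in for the column loaded from a file
data = np.array([0.5, 1.0, 1.5, 2.0, 2.5])

# Boolean masks that split the values into three disjoint ranges
small = data < 1
medium = (data >= 1) & (data < 2)
large = data >= 2

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'medium')
    np.save(path, data[medium])         # np.save() appends '.npy' to the path
    restored = np.load(path + '.npy')   # binary round trip keeps dtype and shape

print(restored)
```

Because the `.npy` format stores dtype and shape alongside the values, the restored array needs no `delimiter` or `dtype` arguments.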
2.6.4. Assignments
"""
* Assignment: Numpy Loadtext
* Complexity: easy
* Lines of code: 4 lines
* Time: 5 min
English:
    1. Load text from `DATA`
    2. From the first line select Iris species names and save as `str` to `species: np.ndarray`
    3. For the other lines:
        a. Read the columns with data and save as `float` to `features: np.ndarray`
        b. Read the last column with species numbers and save as `int` to `labels: np.ndarray`
    4. Run doctests - all must succeed
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert species is not Ellipsis, \
'Assign result to variable: `species`'
>>> assert type(species) is np.ndarray, \
'Variable `species` has invalid type, expected: np.ndarray'
>>> assert features is not Ellipsis, \
'Assign result to variable: `features`'
>>> assert type(features) is np.ndarray, \
'Variable `features` has invalid type, expected: np.ndarray'
>>> assert labels is not Ellipsis, \
'Assign result to variable: `labels`'
>>> assert type(labels) is np.ndarray, \
'Variable `labels` has invalid type, expected: np.ndarray'
>>> species
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
>>> len(features)
151
>>> features[:3]
array([[5.4, 3.9, 1.3, 0.4],
       [5.9, 3. , 5.1, 1.8],
       [6. , 3.4, 4.5, 1.6]])
>>> features[-3:]
array([[4.9, 2.5, 4.5, 1.7],
       [6.3, 2.8, 5.1, 1.5],
       [6.8, 3.2, 5.9, 2.3]])
>>> labels
array([0, 2, 1, 2, 1, 0, 1, 1, 0, 2, 2, 0, 0, 2, 2, 1, 2, 2, 2, 1, 0, 1,
       1, 0, 0, 0, 2, 2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2, 1, 1, 1, 2, 2,
       0, 1, 1, 1, 1, 1, 2, 0, 2, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 2, 0, 0,
       0, 0, 0, 0, 1, 0, 2, 0, 0, 1, 1, 2, 2, 1, 0, 2, 1, 0, 1, 0, 2, 1,
       0, 2, 0, 2, 1, 0, 2, 1, 1, 0, 0, 1, 2, 2, 2, 1, 0, 1, 1, 1, 2, 2,
       0, 2, 2, 0, 2, 1, 2, 0, 0, 1, 0, 2, 0, 2, 1, 2, 2, 2, 1, 0, 2, 1,
       0, 0, 2, 0, 2, 1, 1, 1, 0, 1, 1, 2, 0, 1, 1, 0, 2, 2, 2])
"""
import numpy as np
DATA = 'https://python.astrotech.io/_static/iris-dirty.csv'
species = ...
features = ...
labels = ...