Adding row/column headers to NumPy arrays
I have a NumPy ndarray to which I would like to add row/column headers.
The data is actually 7x12x12, but I can represent it like this:
A = [[[0, 1, 2, 3, 4, 5],
      [1, 0, 3, 4, 5, 6],
      [2, 3, 0, 5, 6, 7],
      [3, 4, 5, 0, 7, 8],
      [4, 5, 6, 7, 0, 9],
      [5, 6, 7, 8, 9, 0]],
     [[0, 1, 2, 3, 4, 5],
      [1, 0, 3, 4, 5, 6],
      [2, 3, 0, 5, 6, 7],
      [3, 4, 5, 0, 7, 8],
      [4, 5, 6, 7, 0, 9],
      [5, 6, 7, 8, 9, 0]]]
where A is my 2x6x6 array.
How do I insert headers across the first row and the first column, so that each array looks like this in my CSV output file?
A, a, b, c, d, e, f
a, 0, 1, 2, 3, 4, 5
b, 1, 0, 3, 4, 5, 6
c, 2, 3, 0, 5, 6, 7
d, 3, 4, 5, 0, 7, 8
e, 4, 5, 6, 7, 0, 9
f, 5, 6, 7, 8, 9, 0
What I have done is make the array 7x13x13 and insert the data so that each matrix has an extra row and column of zeros, but I'd much prefer strings there.
I guess I could just write an Excel macro to replace the zeros with strings. However, if I try to reassign those zeros to the strings I want, NumPy complains that it cannot convert string to float.
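A minimal sketch of the failure described above, assuming the padded array was created with np.zeros (which gives dtype float64). A NumPy array has a single fixed dtype, so assigning a string into a float array raises a ValueError rather than storing the string:

```python
import numpy as np

# Padded array as described above: one extra row and column per matrix
padded = np.zeros((7, 13, 13))  # float64 by default

message = None
try:
    # Try to put a header label into the top row of the first matrix
    padded[0, 0, 1] = 'a'
except ValueError as err:
    # NumPy tries to convert 'a' to a float and fails
    message = str(err)

print(message)
```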
Solution 1:[1]
NumPy handles n-dimensional arrays fine, but many of its facilities are limited to 2-dimensional arrays. I'm not even sure how you want the output file to look.
Many people who wish for named columns overlook NumPy's record arrays (recarray). They are good to know about, but they only "name" one dimension.
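As a small illustration of that limitation (the labels and the 6x6 data below are made up for this sketch), np.rec.fromarrays can give each column a name, while rows stay addressable only by integer index:

```python
import numpy as np

labels = list('abcdef')
A = np.arange(36).reshape(6, 6)

# One named field per column of A; rows get no names.
rec = np.rec.fromarrays([A[:, j] for j in range(A.shape[1])], names=labels)

col_a = rec.a        # column 'a' via attribute access
col_b = rec['b']     # column 'b' via field-name indexing
first_row = rec[0]   # rows are still selected by integer position only

print(rec.dtype.names)
```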
For two dimensions, pandas is very cool:
In [274]: import pandas as pd

In [275]: pd.DataFrame.from_dict({'A': [1, 2, 3], 'B': [4, 5, 6]},
   .....:                        orient='index', columns=['one', 'two', 'three'])
Out[275]:
   one  two  three
A    1    2      3
B    4    5      6
If output is the only problem you are trying to solve here, I'd probably just stick with a few lines of hand-coded magic, as that is less weighty than installing another package for one feature.
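A sketch of such hand-coded output using only the standard csv module. The helper name write_with_headers, the "A" corner label, and the blank line between matrices are choices made for this example, not part of the question:

```python
import csv
import io

import numpy as np

# Example data: the question's 6x6 matrix, stacked twice to make a 2x6x6 array
base = np.array([[0, 1, 2, 3, 4, 5],
                 [1, 0, 3, 4, 5, 6],
                 [2, 3, 0, 5, 6, 7],
                 [3, 4, 5, 0, 7, 8],
                 [4, 5, 6, 7, 0, 9],
                 [5, 6, 7, 8, 9, 0]])
A = np.stack([base, base])
labels = list('abcdef')


def write_with_headers(stream, arr, corner, labels):
    """Write each 2-D slice of arr as a CSV block with row and column headers."""
    writer = csv.writer(stream)
    for mat in arr:
        # Header row: the corner label followed by the column labels
        writer.writerow([corner] + labels)
        # Each data row starts with its row label
        for label, row in zip(labels, mat):
            writer.writerow([label] + row.tolist())
        # Empty row to separate consecutive matrices
        writer.writerow([])


buf = io.StringIO()
write_with_headers(buf, A, 'A', labels)
print(buf.getvalue())
```

To write a real file instead, open it with open('out.csv', 'w', newline='') (the csv documentation recommends newline='') and pass the file object in place of buf.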
Solution 2:[2]
With pandas.DataFrame.to_csv you can write the columns and the index to a file:
import numpy as np
import pandas as pd
# Random 6x6 integer matrix standing in for the real data
A = np.random.randint(0, 10, size=36).reshape(6, 6)
names = list('abcdef')
df = pd.DataFrame(A, index=names, columns=names)
df.to_csv('df.csv', index=True, header=True, sep=' ')
will give you a df.csv file like the following (the values are random):
a b c d e f
a 1 5 5 0 4 4
b 2 7 5 4 0 9
c 6 5 6 9 7 0
d 4 3 7 9 9 3
e 8 1 5 1 9 0
f 2 8 0 0 5 1
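The same idea extends to the question's 3-D case by looping over the first axis. A sketch, assuming random 2x6x6 data in place of the real 7x12x12 array; index_label='A' fills the empty top-left cell so the header row reads A,a,b,c,d,e,f:

```python
import io

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
A = rng.integers(0, 10, size=(2, 6, 6))  # stands in for the real data
names = list('abcdef')

buf = io.StringIO()
for mat in A:
    df = pd.DataFrame(mat, index=names, columns=names)
    # index_label puts 'A' in the top-left cell of the header row
    df.to_csv(buf, index_label='A')
    buf.write('\n')  # blank line between matrices

print(buf.getvalue())
```

Passing a file object opened once (instead of buf) writes all matrices into a single CSV file.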
Solution 3:[3]
I think this does the trick generically:
Input
import numpy as np

mats = np.array([[[0, 1, 2, 3, 4, 5],
                  [1, 0, 3, 4, 5, 6],
                  [2, 3, 0, 5, 6, 7],
                  [3, 4, 5, 0, 7, 8],
                  [4, 5, 6, 7, 0, 9],
                  [5, 6, 7, 8, 9, 0]],
                 [[0, 1, 2, 3, 4, 5],
                  [1, 0, 3, 4, 5, 6],
                  [2, 3, 0, 5, 6, 7],
                  [3, 4, 5, 0, 7, 8],
                  [4, 5, 6, 7, 0, 9],
                  [5, 6, 7, 8, 9, 0]]])
Code
import numpy as np

# Recursively builds spreadsheet-style headers: a..z, aa..az, ba, ...
# (integer division // is needed in Python 3)
def make_head(n):
    pre = ''
    if n // 26:
        pre = make_head(n // 26 - 1)
    alph = "abcdefghijklmnopqrstuvwxyz"
    pre += alph[n % 26]
    return pre

# Generator that yields one header label for each of nitems rows or columns
def gen_header(nitems):
    for n in range(nitems):
        yield make_head(n)

# Convert the NumPy array to nested lists
lmats = mats.tolist()
# Loop through each "matrix"
for mat in lmats:
    # Store the number of columns before the rows are modified
    ncols = len(mat[0])
    # Prepend a header label to each row
    for row, hd in zip(mat, gen_header(len(mat))):
        row.insert(0, hd)
    # Create a header line for all the columns, with "A" in the corner
    col_hd = list(gen_header(ncols))
    col_hd.insert(0, "A")
    # Insert the header line as the first row of the matrix
    mat.insert(0, col_hd)
# Convert back to NumPy (every element becomes a string)
mats = np.array(lmats)
Output (value stored in mats):
array([[['A', 'a', 'b', 'c', 'd', 'e', 'f'],
['a', '0', '1', '2', '3', '4', '5'],
['b', '1', '0', '3', '4', '5', '6'],
['c', '2', '3', '0', '5', '6', '7'],
['d', '3', '4', '5', '0', '7', '8'],
['e', '4', '5', '6', '7', '0', '9'],
['f', '5', '6', '7', '8', '9', '0']],
[['A', 'a', 'b', 'c', 'd', 'e', 'f'],
['a', '0', '1', '2', '3', '4', '5'],
['b', '1', '0', '3', '4', '5', '6'],
['c', '2', '3', '0', '5', '6', '7'],
['d', '3', '4', '5', '0', '7', '8'],
['e', '4', '5', '6', '7', '0', '9'],
['f', '5', '6', '7', '8', '9', '0']]],
dtype='<U21')
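To get from this string array to the CSV file the question asks for, each 2-D slice can be written with np.savetxt using fmt='%s', since the elements are strings rather than numbers. A sketch using a cut-down 1x3x3 array in the same layout as the output above, for brevity:

```python
import io

import numpy as np

# Small string array in the same layout as the output above
mats = np.array([[['A', 'a', 'b'],
                  ['a', '0', '1'],
                  ['b', '1', '0']]])

buf = io.StringIO()
for mat in mats:
    # fmt='%s' because the elements are strings, not numbers
    np.savetxt(buf, mat, fmt='%s', delimiter=',')
    buf.write('\n')  # blank line between matrices

print(buf.getvalue())
```

Replacing buf with a file opened via open('out.csv', 'w') writes the same text to disk.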
Solution 4:[4]
I am not aware of any method for adding headers to a matrix (even though I would find it useful). What I would do is create a small class that prints the object for me by overriding its __str__ method.
Something like this:
class myMat:
    """Wraps a 2-D array so that printing it shows row/column headers."""

    def __init__(self, mat, name):
        self.mat = mat
        self.name = name
        self.head = ['a', 'b', 'c', 'd', 'e', 'f']
        self.sep = ','

    def __str__(self):
        # First line: the matrix name followed by the column headers
        lines = [self.sep.join([self.name] + self.head)]
        # One line per row: the row header followed by the row's values
        for label, row in zip(self.head, self.mat):
            lines.append(self.sep.join([label] + [str(x) for x in row]))
        return '\n'.join(lines)
Then you can easily print them with their headers:
print(myMat(A, 'A'))
print(myMat(B, 'B'))
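The class above hard-codes six labels. A sketch of a variant (the name HeaderedMat and the sep parameter are made up here) that derives the labels from the matrix shape with string.ascii_lowercase, so it also handles non-square matrices but only supports up to 26 rows or columns:

```python
import string

import numpy as np


class HeaderedMat:
    """Print a 2-D array as CSV text with letter labels on rows and columns."""

    def __init__(self, mat, name, sep=','):
        self.mat = np.asarray(mat)
        self.name = name
        self.sep = sep
        nrows, ncols = self.mat.shape
        if max(nrows, ncols) > 26:
            raise ValueError("this sketch only supports up to 26 labels")
        self.row_head = list(string.ascii_lowercase[:nrows])
        self.col_head = list(string.ascii_lowercase[:ncols])

    def __str__(self):
        # Header line: the matrix name followed by the column labels
        lines = [self.sep.join([self.name] + self.col_head)]
        # One line per row: the row label followed by the row's values
        for label, row in zip(self.row_head, self.mat):
            lines.append(self.sep.join([label] + [str(x) for x in row]))
        return '\n'.join(lines)


print(HeaderedMat([[0, 1], [1, 0]], 'A'))
```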
Solution 5:[5]
I'm not really sure, but you may want to have a look at pandas.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | LCC |
Solution 2 | bmu |
Solution 3 | |
Solution 4 | Oriol Nieto |
Solution 5 | Davide |