在Python中稀疏的3dmatrix/数组？

在scipy中，我们可以使用scipy.sparse.lil_matrix（）等构造一个稀疏matrix，但matrix在2d中。

我想知道在Python中是否有稀疏的3Dmatrix/数组（张量）的现有数据结构？

ps我有很多稀疏的3D数据，需要张量来存储/执行乘法。如果没有现有的数据结构，有什么build议来实现这样一个张量？

很高兴能提出一个（可能是显而易见的）这个实现，如果你有足够的时间和空间来处理新的依赖关系，那么可以用纯Python或C / Cython来实现，而且需要更快。

N维中的稀疏matrix可以假设大多数元素是空的，所以我们使用一个字典在元组上键入：

class NDSparseMatrix: def __init__(self): self.elements = {} def addValue(self, tuple, value): self.elements[tuple] = value def readValue(self, tuple): try: value = self.elements[tuple] except KeyError: # could also be 0.0 if using floats... value = 0 return value

你会像这样使用它：

 sparse = NDSparseMatrix() sparse.addValue((1,2,3), 15.7) should_be_zero = sparse.readValue((1,5,13))

通过validationinput实际上是一个元组，并且它只包含整数，你可以使这个实现更健壮，但是这只会减慢速度，所以我不会担心，除非你稍后将代码发布到世界上。

编辑 – matrix乘法问题的Cython实现，假设其他张量是N维NumPy数组（ numpy.ndarray ）可能看起来像这样：

 #cython: boundscheck=False #cython: wraparound=False cimport numpy as np def sparse_mult(object sparse, np.ndarray[double, ndim=3] u): cdef unsigned int i, j, k out = np.ndarray(shape=(u.shape[0],u.shape[1],u.shape[2]), dtype=double) for i in xrange(1,u.shape[0]-1): for j in xrange(1, u.shape[1]-1): for k in xrange(1, u.shape[2]-1): # note, here you must define your own rank-3 multiplication rule, which # is, in general, nontrivial, especially if LxMxN tensor... # loop over a dummy variable (or two) and perform some summation: out[i,j,k] = u[i,j,k] * sparse((i,j,k)) return out

虽然你总是需要手动处理这个问题，因为（正如代码注释中所提到的），你需要定义你正在求和的索引，并且要注意数组的长度或者事情将不起作用！

编辑2 – 如果其他matrix也稀疏，那么你不需要做三路循环：

 def sparse_mult(sparse, other_sparse): out = NDSparseMatrix() for key, value in sparse.elements.items(): i, j, k = key # note, here you must define your own rank-3 multiplication rule, which # is, in general, nontrivial, especially if LxMxN tensor... # loop over a dummy variable (or two) and perform some summation # (example indices shown): out.addValue(key) = out.readValue(key) + other_sparse.readValue((i,j,k+1)) * sparse((i-3,j,k)) return out

我对C实现的build议是使用一个简单的结构来保存索引和值：

 typedef struct { int index[3]; float value; } entry_t;

那么你需要一些函数来分配和维护一个这样的结构的dynamic数组，并且尽可能快的search它们。但是您应该在考虑性能之前testingPython实现，然后再担心这些问题。

看一下Python中的sparray – 稀疏n维数组（由Jan Erik Solem撰写）。也可以在github上。

比写新的东西更好，可能是尽可能使用scipy的稀疏模块。这可能会导致（很多）更好的性能。我有一个类似的问题，但我只需要有效地访问数据，而不是执行任何操作。而且，我的数据在三个维度中只有两个是稀疏的。

我已经写了一门课来解决我的问题，并且可以（就我看来）轻易地扩展到满足OP的需求。不过，它可能仍然有一些改进的潜力。

 import scipy.sparse as sp import numpy as np class Sparse3D(): """ Class to store and access 3 dimensional sparse matrices efficiently """ def __init__(self, *sparseMatrices): """ Constructor Takes a stack of sparse 2D matrices with the same dimensions """ self.data = sp.vstack(sparseMatrices, "dok") self.shape = (len(sparseMatrices), *sparseMatrices[0].shape) self._dim1_jump = np.arange(0, self.shape[1]*self.shape[0], self.shape[1]) self._dim1 = np.arange(self.shape[0]) self._dim2 = np.arange(self.shape[1]) def __getitem__(self, pos): if not type(pos) == tuple: if not hasattr(pos, "__iter__") and not type(pos) == slice: return self.data[self._dim1_jump[pos] + self._dim2] else: return Sparse3D(*(self[self._dim1[i]] for i in self._dim1[pos])) elif len(pos) > 3: raise IndexError("too many indices for array") else: if (not hasattr(pos[0], "__iter__") and not type(pos[0]) == slice or not hasattr(pos[1], "__iter__") and not type(pos[1]) == slice): if len(pos) == 2: result = self.data[self._dim1_jump[pos[0]] + self._dim2[pos[1]]] else: result = self.data[self._dim1_jump[pos[0]] + self._dim2[pos[1]], pos[2]].T if hasattr(pos[2], "__iter__") or type(pos[2]) == slice: result = result.T return result else: if len(pos) == 2: return Sparse3D(*(self[i, self._dim2[pos[1]]] for i in self._dim1[pos[0]])) else: if not hasattr(pos[2], "__iter__") and not type(pos[2]) == slice: return sp.vstack([self[self._dim1[pos[0]], i, pos[2]] for i in self._dim2[pos[1]]]).T else: return Sparse3D(*(self[i, self._dim2[pos[1]], pos[2]] for i in self._dim1[pos[0]])) def toarray(self): return np.array([self[i].toarray() for i in range(self.shape[0])])

在Python中稀疏的3dmatrix/数组？

漂亮的打印2D Python列表

使用scipy计算matrix排名

如何在一列中对MATLAB中的二维数组进行sorting？

平滑SVGmatrix在Inkscape中转换

数组中最大值和最小值的索引

NumPy Matrix与Array类的乘法有何不同？

数据types不明白

Java中的稀疏matrix/数组

C ++中的graphics问题有更好的邻接表或邻接matrix吗？

转置/解压缩函数（zip的反转）？