Numpy 两个矩阵部分维度相乘,有没有很快的方法? - V2EX
V2EX = way to explore
V2EX 是一个关于分享和探索的地方
现在注册
已注册用户请  登录
推荐学习书目
Learn Python the Hard Way
Python Sites
PyPI - Python Package Index
http://diveintopython.org/toc/index.html
Pocoo
值得关注的项目
PyPy
Celery
Jinja2
Read the Docs
gevent
pyenv
virtualenv
Stackless Python
Beautiful Soup
结巴中文分词
Green Unicorn
Sentry
Shovel
Pyflakes
pytest
Python 编程
pep8 Checker
Styles
PEP 8
Google Python Style Guide
Code Style from The Hitchhiker's Guide
xinhangliu
V2EX    Python

Numpy 两个矩阵部分维度相乘,有没有很快的方法?

  •  
  •   xinhangliu 2018-09-27 19:49:39 +08:00 5397 次点击
    这是一个创建于 2571 天前的主题,其中的信息可能已经有所发展或是发生改变。

    例如,有两个三维矩阵 m1,m2,shape 都是 (10000, 2, 2)

    m1 = array([x1, x2, ..., x9999]) m2 = array([y1, y2, ..., y9999]) 

    xn, yn 都是 2x2 的矩阵。

    希望得到的结果是:

    res = array([dot(x1, y1), dot(x2, y2), ..., dot(x9999, y9999)]) 

    目前我试过最快的方法是 map(循环、列表推导都试过,numpy.vectorize 没成功),但感觉还是不够快。各位有没有更好的方法?

    13 条回复    2018-09-28 07:42:31 +08:00
    JeffKing
        1
    JeffKing  
       2018-09-27 20:02:51 +08:00 via iPhone   1
    多进程试试
    Philippa
        2
    Philippa  
       2018-09-27 20:24:47 +08:00 via iPhone   1
    Python 已经很快了底层做了很多优化,不行把矩阵拆开多进程大批量的情况下可以加快最后再组合回来。再不行试一下用 cpp。
    justou
        3
    justou  
       2018-09-27 20:39:41 +08:00   1
    放到 cython 里面用 prange 多线程算
    Alan1312
        4
    Alan1312  
       2018-09-27 20:42:18 +08:00 via iPhone   1
    https://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl

    试试这个,再不行就只能用 Matlab 了,不要尝试手写 cpp,几乎不可能比 MKL 更快的了
    GenkunAbe
        5
    GenkunAbe  
       2018-09-27 22:13:50 +08:00   1
    不知道是不是我理解的有问题,numpy.matmul 是可以实现多个矩阵对应元素相乘的。

    按照我对 lz 的描述的理解,直接 np.matmul(m1, m2) 就可以达到目的。
    wizardforcel
        6
    wizardforcel  
       2018-09-27 22:25:04 +08:00 via Android   1
    https://docs.scipy.org/doc/numpy/reference/generated/numpy.matmul.html

    If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.

    意思是,如果输入大于 2d,会当做矩阵的数组 /矩阵...

    所以直接乘就行了
    nooper
        7
    nooper  
       2018-09-27 22:25:55 +08:00 via iPad   1
    有的有的,直接选择矩阵特定的切片元素相乘就好因为直接是矩阵计算不需要循环,性能可以的
    necomancer
        8
    necomancer  
       2018-09-27 22:31:48 +08:00   1
    matmul(...)
    matmul(a, b, out=None)

    Matrix product of two arrays.

    The behavior depends on the arguments in the following way.

    - If both arguments are 2-D they are multiplied like conventional
    matrices.
    - If either argument is N-D, N > 2, it is treated as a stack of
    matrices residing in the last two indexes and broadcast accordingly.
    ...

    所以 np.matmul(m1, m2) 就行了。

    In [1]: a = np.random.random((100000,2,2))

    In [2]: b = np.random.random((100000,2,2))

    In [3]: c = np.matmul(a, b)

    In [4]: d = np.array([np.dot(i,j) for i,j in zip(a,b)])

    In [5]: np.allclose(c,d)
    Out[5]: True

    In [6]: %timeit np.matmul(a,b)
    41.1 ms ± 81.6 s per loop (mean ± std. dev. of 7 runs, 10 loops each)

    In [7]: %timeit np.array([np.dot(i,j) for i, j in zip(a,b)])
    231 ms ± 5.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    necomancer
      &nbp; 9
    necomancer  
       2018-09-27 23:53:12 +08:00   1
    加上 vectorize 的:

    In [28]: def mydot(a, b):
    ...: return np.dot(a,b)
    ...:
    ...:

    In [29]:

    In [29]: vmydot = np.vectorize(mydot, signature='(n,m),(m,n)->(n,n)')

    In [30]: e = vmydot(a, b)

    In [31]: np.allclose(d,e)
    Out[31]: True

    In [32]: %timeit vmydot(a,b)
    467 ms ± 4.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    xinhangliu
        10
    xinhangliu  
    OP
       2018-09-27 23:54:26 +08:00 via Android
    @necomancer 非常感谢!看来我还是学艺不精
    necomancer
        11
    necomancer  
       2018-09-28 00:31:34 +08:00
    不客气。我很久没鼓捣过这玩意儿了,我记得 numba 的各路 jit 处理这个问题是很嗨的,你要是经常遇到这种问题,去看看 numba 吧,有 GPU 加速更开心 ^_^

    In [1]: from numba import guvectorize, float64

    In [2]: a = np.random.random((100000,2,2))

    In [3]: b = np.random.random((100000,2,2))

    In [4]: c = np.matmul(a, b)

    In [5]: d = np.array([np.dot(i,j) for i,j in zip(a,b)])

    In [6]: @guvectorize([(float64[:,:],float64[:,:], float64[:,:])], '(n,m),(m,n)->(n,n)', target='parallel') #target='cpu','gpu'
    ...: def mydot(a,b,res):
    ...: for i in range(res.shape[0]):
    ...: for j in range(res.shape[1]):
    ...: tmp = 0.
    ...: for k in range(a.shape[1]):
    ...: tmp += a[i, k] * b[k, j]
    ...: res[i, j] = tmp
    ...:

    In [7]: e = mydot(a, b)

    In [8]: np.allclose(c,e)
    Out[8]: True

    In [9]: np.allclose(c,d)
    Out[9]: True

    In [10]: %timeit mydot(a,b)
    234 s ± 4.02 s per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [11]: %timeit np.array(list(map(np.dot, a, b)))
    210 ms ± 2.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [12]: %timeit np.array([np.dot(i,j) for i, j in zip(a,b)])
    235 ms ± 5.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [13]: %timeit np.matmul(a,b)
    41.1 ms ± 90 s per loop (mean ± std. dev. of 7 runs, 10 loops each)
    zhucegeqiu
        12
    zhucegeqiu  
       2018-09-28 07:42:05 +08:00 via iPhone   1
    no.tensordot
    zhucegeqiu
        13
    zhucegeqiu  
       2018-09-28 07:42:31 +08:00 via iPhone
    @zhucegeqiu 打错,np.tensordot
    关于     助文档     自助推广系统     博客     API     FAQ     Solana     2732 人在线   最高记录 6679       Select Language
    创意工作者们的社区
    World is powered by solitude
    VERSION: 3.9.8.5 31ms UTC 09:14 PVG 17:14 LAX 02:14 JFK 05:14
    Do have faith in what you're doing.
    ubao snddm index pchome yahoo rakuten mypaper meadowduck bidyahoo youbao zxmzxm asda bnvcg cvbfg dfscv mmhjk xxddc yybgb zznbn ccubao uaitu acv GXCV ET GDG YH FG BCVB FJFH CBRE CBC GDG ET54 WRWR RWER WREW WRWER RWER SDG EW SF DSFSF fbbs ubao fhd dfg ewr dg df ewwr ewwr et ruyut utut dfg fgd gdfgt etg dfgt dfgd ert4 gd fgg wr 235 wer3 we vsdf sdf gdf ert xcv sdf rwer hfd dfg cvb rwf afb dfh jgh bmn lgh rty gfds cxv xcv xcs vdas fdf fgd cv sdf tert sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf shasha9178 shasha9178 shasha9178 shasha9178 shasha9178 liflif2 liflif2 liflif2 liflif2 liflif2 liblib3 liblib3 liblib3 liblib3 liblib3 zhazha444 zhazha444 zhazha444 zhazha444 zhazha444 dende5 dende denden denden2 denden21 fenfen9 fenf619 fen619 fenfe9 fe619 sdf sdf sdf sdf sdf zhazh90 zhazh0 zhaa50 zha90 zh590 zho zhoz zhozh zhozho zhozho2 lislis lls95 lili95 lils5 liss9 sdf0ty987 sdft876 sdft9876 sdf09876 sd0t9876 sdf0ty98 sdf0976 sdf0ty986 sdf0ty96 sdf0t76 sdf0876 df0ty98 sf0t876 sd0ty76 sdy76 sdf76 sdf0t76 sdf0ty9 sdf0ty98 sdf0ty987 sdf0ty98 sdf6676 sdf876 sd876 sd876 sdf6 sdf6 sdf9876 sdf0t sdf06 sdf0ty9776 sdf0ty9776 sdf0ty76 sdf8876 sdf0t sd6 sdf06 s688876 sd688 sdf86