Computational Graph

  • patterns in backward flow (see the sketch after this list)
    • add gate: gradient distributor (passes the upstream gradient unchanged to all of its inputs)
    • max gate: gradient router (routes the full upstream gradient to the input that was the maximum; the others get zero)
    • mul gate: gradient switcher (each input receives the upstream gradient scaled by the other input's forward value)
    • branches: gradients arriving along different branches sum up (multivariable chain rule)
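
A minimal sketch of these backward-flow patterns, assuming scalar inputs; the helper function names are illustrative, not from the original notes:

```python
def add_backward(upstream):
    # add gate distributes: every input receives the upstream gradient unchanged
    return upstream, upstream

def max_backward(x, y, upstream):
    # max gate routes: the full gradient goes to whichever input was larger
    return (upstream, 0.0) if x > y else (0.0, upstream)

def mul_backward(x, y, upstream):
    # mul gate switches: each input's gradient is the upstream gradient
    # scaled by the *other* input's forward value
    return upstream * y, upstream * x

# branching: if a value feeds two downstream nodes, the gradients
# flowing back along the two paths add up (multivariable chain rule)
g1, g2 = 0.3, -1.2
total_grad = g1 + g2
```
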
  • pros
    • gradients have an intuitive interpretation: how sensitive the output is to each intermediate value
    • new nodes are easy to define via the forward/backward pattern (only these two functions must be implemented; see the sketch after this list)
    • any complex learning architecture can be composed out of atomic nodes (node composition or factorization)
    • no need to derive complex gradients by hand; backpropagation composes the local gradients automatically
    • the loss function can be seen as extra nodes at the end of the graph
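
A minimal sketch of the forward/backward pattern, with hypothetical class names (MultiplyNode and SquaredLossNode are illustrative, not from any library). Note the loss is appended as just another node at the end of the graph:

```python
class MultiplyNode:
    def forward(self, x, y):
        self.x, self.y = x, y          # cache inputs for the backward pass
        return x * y

    def backward(self, upstream):
        # local gradients times upstream gradient (chain rule)
        return upstream * self.y, upstream * self.x

class SquaredLossNode:
    # the loss is just one more node with the same two-function interface
    def forward(self, pred, target):
        self.diff = pred - target
        return 0.5 * self.diff ** 2

    def backward(self, upstream=1.0):
        return upstream * self.diff, -upstream * self.diff

# usage: chain forward passes, then run backward in reverse order
mul, loss = MultiplyNode(), SquaredLossNode()
pred = mul.forward(2.0, 3.0)        # forward: 6.0
L = loss.forward(pred, 5.0)         # forward: 0.5
dpred, _ = loss.backward()          # backward through the loss node
dx, dy = mul.backward(dpred)        # dx = 3.0, dy = 2.0
```
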