Some applications in computational physics involve solving a "variational" probl...

ChrisRackauckas · on July 21, 2020

Yes, I think this is a great use for neural networks since they are effectively high dimensional function approximators, and something like Schrondinger's equation is a PDE where the number of dimensions is the number of observables so it can get very high dimensional very fast. Classical methods don't necessarily scale that well in high dimensions (curse of dimensionality: cost is exponential in dimensions), but using neural networks does very well. This gives rise to the physics-informed neural network and deep backwards stochastic differential equation approaches which will likely be driving a lot of future HPC applications in a way that blends physical equations with neural network approaches. We recently released a library, NeuralPDE [1], which utilizes a lot of these approaches to solve what were traditionally difficult equations in an automated form. I think the future is bright for scientific machine learning!

[1] https://neuralpde.sciml.ai/dev/

wenc · on July 21, 2020

This is fascinating. ELI5: how does this work? (I'm couldn't find references on the linked site)

Let's say I supply a high-dimensional DAE, f(x', x, z) = 0, x(0) = x₀, where classical methods like quadrature are unwieldy. Does the algorithm generate n samples in the solution space by integrating n times and then fitting an NN? With different initial conditions? Or does it perform quadrature with NNs instead of polynomial basis functions?

ChrisRackauckas · on July 21, 2020

A lot of these methods here utilize the universal differential equation framework described here: https://arxiv.org/abs/2001.04385 . Specifically, the last example in this preprint describes how high dimensional parabolic PDEs can be solving using neural networks inside of a specific SDE (derivation in the supplemental). Discrete physics-informed neural networks also are a subset of this methodology.

The other subset of methods, continuous physics-informed neural networks, are described in https://www.sciencedirect.com/science/article/pii/S002199911... .

For a very basic introduction, I wrote some lecture notes on how this is done for a simple ODE with code examples: https://mitmath.github.io/18S096SciML/lecture2/ml

jedbrown · on July 21, 2020

These methods are really interesting for high-dimensional PDE (like HJB), but there's a ton of skepticism about the applicability of NN models for solving the more common PDE that arise in physical sciences and engineering.

The tests are rarely equivalent, in that standard PDE technology can move to new domains, boundary conditions, materials, etc., without new training phases. If one needs to solve many nearby problems, there are many established techniques for leveraging that similarity. There is active research on ML to refine these techniques, but it isn't a silver bullet.

Far more exciting, IMO, is to use known methods for representing (reference-frame invariant and entropy-compatible) constitutive relations while training their form from observations of the PDE, and to do so using multiscale modeling in which a fine-scale simulation (e.g., atomistic or grain-resolving for granular/composite media) is used to train/support multiscale constitutive relations. In this approach, the PDEs are still solved by "standard" methods such as finite element or finite volume, and thus can be designed with desired accuracy and exact conservation/compatibility properties and generalize immediately to new domains/boundary conditions, but the trained constitutive models are better able to represent real materials.

A good overview paper on ML in the context of multiscale modeling: https://arxiv.org/pdf/2006.02619.pdf

ChrisRackauckas · on July 21, 2020

Yes, and our recent work https://arxiv.org/abs/2001.04385 gives a fairly general form for how to mix known scientific structural knowledge directly with machine learning. In fact, some of these PDE solvers are just instantiations of specific choices of universal differential equations. I agree that in many cases the "fully uninformed" physics-informed neural network won't work well, but we need to fully optimize a library with all of the training techniques possible in order to prove that, which is what we plan to do. In the end, I think PINNs will be most applicable to (1) non-local PDEs where classical methods have not fared well, so things like fractional differential equations, and (2) very high dimensional PDEs, like 100's of dimensions, but paired with constraints on the architecture to preserve physical quantities and relationships. But of course, something like a fractional differential equation is not an example for the first pages of tutorials since they are quite niche equations to solve!

jedbrown · on July 21, 2020

You've got a lot of broken references (??) in that preprint, BTW.

I think I understand why you're putting in the learned derivative operator, but I think it's rarely desirable. Computing derivatives with compatibility properties is a well-studied domain (e.g., finite element exterior calculus), as is tensor invariance theory (e.g., Zheng 1994, though this subject is sorely in need of a modern software-centric review). When the exact theory is known and readily computable, it's hard to see science/engineering value in "learned" surrogates that merely approximate the symmetries.

More generally, it is disheartening to see trends that would conflate discretization errors with modeling errors, lest it bring back the chaos of early turbulence modeling days that prompted this 1986 Editorial Policy Statement for the Journal of Fluids Engineering. https://jedbrown.org/files/RoacheGhiaWhite-JFEEditorialPolic...

ChrisRackauckas · on July 21, 2020

>When the exact theory is known and readily computable, it's hard to see science/engineering value in "learned" surrogates that merely approximate the symmetries.

I completely agree, which is why the approach I am taking is to only utilize surrogates to think which are unknown or do not have an exact theory. I don't think surrogates will be more efficient than methods developed that exploit specific properties of the problem. In fact, I think the recent proof of convergence for PINNs simultaneously demonstrates this might be an issue (there was no upper bound to the proved convergence rate, but the one they could prove was low order).

>More generally, it is disheartening to see trends that would conflate discretization errors with modeling errors, lest it bring back the chaos of early turbulence modeling days that prompted this 1986 Editorial Policy Statement for the Journal of Fluids Engineering. https://jedbrown.org/files/RoacheGhiaWhite-JFEEditorialPolic....

Agree, this is a difficult issue with approaches that augment numerical approaches with data-driven components. There are ways to validate these trained components independent of the training data (i.e. by using other data), but validation will always be more difficult.

jedbrown · on July 21, 2020

With enough coaxing, we can get the optimizer to converge to known methods (high-order, conservative, entropy-stable, ...), and I'm sure this tactic will lead to more papers, though they'll be kind of empty unless we're really discovering good methods that were not previously known.

I presume you meant "verify" in the last sentence.

ChrisRackauckas · on July 22, 2020

No, what I am doing is using high order, conservative (universal DAEs), strong-stability preserving, etc. discretizations for the numerics but utilizing neural networks to represent unknown quantities to transform it into a functional inverse problem. In the discussion of the HJB equation, we mention that we solve the equation by writing down an SDE such that the solution to the functional inverse problem gives the PDE's solution, and then utilize adaptive, high order, implicit, etc. SDE integrators on the inverse problem. Essentially the idea is to utilize neural networks in conjunction with all of the classical tricks you can, making the neural network have to perform as small of a job as possible. It does not need to learn good methods if you have already designed the training problem to utilize those kinds of discretizations: you just need a methodology to differentiate through your FEM, FVM, discrete Galarkin, implicit ODE solver, Gaussian quadrature, etc. algorithms to augment the full algorithm with neural networks, which is precisely what we are building.

So I completely agree with you that throwing away classical knowledge won't go very far, which is why that's not what we're doing. We utilizing neural networks within and on top of classical methods to try and solve problems where they have not traditionally performed well, or utilizing it to cover epistemic uncertainty from model misspecification.

fluffything · on July 22, 2020

This looks really interesting.

I think it would be a good topic for a blog post or teaching paper that shows how to do this for very simple problems "end-to-end" (e.g. advection eqt, diffusion eq, advection-diffusion, burgers eqt., poisson eqt, etc.).

I see the appeal in showing that these can be used for very complex problems, but what I want to understand is what are the trade-offs for the most basic hyperbolic, parabolic, and elliptic one-dimensional problems. What's the accuracy? What's the order of convergence in practice? Are there tight upper bounds? (does that even matter?), what's the performance, how does the performance scale with the number of degrees of freedom, what does a good training pipeline look like, what's the cost of training, inference, etc.

There are well-understood methods that are optimal for all of the problems above. Knowing that you can apply these NN for problems without optimal methods is good, but I'd be more convinced that this is not just "NN-all-the-things hype" if I were to understand how these methods fair against problems for which optimal methods are indeed available.

ChrisRackauckas · on July 22, 2020

No, it will not work well without the optimal method. But the method is no longer optimal if say a nonlinear term is added to these equations, so you can use the "optimal" method as a starting point and then try to nudge towards something better. Don't throw away any information that you have.

jedbrown · on July 22, 2020

This comment sounds good. I was objecting to approaches like Eq 10 of your paper and much of the Karniadakis approach.

castratikron · on July 22, 2020

Cool example, thanks!

currymj · on July 21, 2020

this is very cool.

I was thinking specifically of this and related approaches https://arxiv.org/abs/1909.08423 where they search for the ground state by iteratively using an MCMC sampler and doing SGD. The innovation is a network architecture that takes classic approaches from physics and judiciously replaces parts with flexible NNs.

I had not even considered how things might work if you actually want to think about time.

Do you know if anybody has been running this NN+DiffEq solver stuff on big HPC systems that also have GPUs? If you know of any papers where they tried this, would be interesting to look at.

fluffything · on July 21, 2020

I see a Poisson solver in the docs.

Is there a paper comparing the performance of this particular solver against the state of the art ?

(if you are using GPUs, the AmgX library has a finite-difference solver for Poisson in their examples - very far from the state of the art, but a comparison might put performance in perspective)

whinvik · on July 21, 2020

Almost every time a PDE is solved on a computer, it is a variational problem. Maybe neural networks are indeed good at this but I haven't seen any literature that shows that it is provably better. A reference would be good, especially to this point "But neural networks are very good parametric function approximators, generally better than what traditionally gets used in physics (b-splines or whatever)."

currymj · on July 21, 2020

https://arxiv.org/abs/1909.08423 and https://arxiv.org/abs/1909.02487 are some examples I've been looking at recently.

whinvik · on July 21, 2020

Thanks, not familiar with QM at all, but it seems to me from glancing through one of the papers that the neural network is used to replace a popular way of representing the wave function which itself is an Ansatz. Not very convincing, but of course, as I said not familiar with the background and so I may be overlooking something.

currymj · on July 21, 2020

that's exactly it -- they take an existing form for the ansatz (or the general idea of it, at least), and make it more flexible by replacing pieces with neural networks that have many more parameters, while maintaining constraints required by physics. I think this will become very common in the future.

whinvik · on July 21, 2020

That maybe true but what I was looking for is a more convincing way of showing that a neural network approximates a function better than other functions, such as say a b-spline. For example, you say that the neural network with many more parameters works better, but what if we had a b-spline with many more nodes.

hnarayanan · on July 21, 2020

I don't know anything about anything but I'm willing to bet that the end result is very similar. They're "just" using neural networks as rich approximators.

jhrmnn · on July 22, 2020

I'm an author of one of the arXiv papers above. One thing to consider is that the approximative power of a given parametric function is not the only criterium. Being able to optimize that function efficiently is as important. Neural networks excel in this. So the comparison you ask for most likely won't appear, because any other parametric ansatz with tens of thousands (or more) parameters would be impossible to optimize. At the least that's the case in quantum Monte Carlo, the domain of our paper. As for "provable", I also don't think that will appear. All the exact theorems about neural networks are way too abstract to be applicable to practical problems.

btrettel · on July 21, 2020

> Almost every time a PDE is solved on a computer, it is a variational problem.

Not true. In computational fluid dynamics, variational methods are only one category out of many, and they aren't dominant.

whinvik · on July 21, 2020

Its usually only finite difference methods that are not variational. But finite difference is dominant in academia, not in industry. And that is changing as well with methods such as the discontinuous Galerkin method. The more popular finite volume method in industry, can also be seen as a variational problem.

Yes, I exaggerated when I said that, but its still mostly variational problems.

btrettel · on July 21, 2020

> The more popular finite volume method in industry, can also be seen as a variational problem.

By that standard, you could interpret almost any numerical method for PDEs used in academia or industry as variational (aside from some fringe ones). By "variational" I mean methods which are designed in a variational way from the start, not can be merely interpreted variationally.

whinvik · on July 21, 2020

Well, it helps to see these connections. For example, realising that the lowest order DG method is finite volume lets one think about how to extend well studied finite volume properties to high order DG methods.

wenc · on July 21, 2020

So the idea of surrogate models (for parameter estimation) has been around for some time, where f(x, θ) is some (computationally) simplified model of a complex model/simulation (x = factors, θ = parameters).

f can be any arbitrary choice that works.

Not sure if the choice of f being a NN is necessarily related to AI, where some cognitive function is being replicated. It is a good function approximator though.

mumbisChungo · on July 21, 2020

Why ought machine learning be boring to anyone but specialists? Does this imply that specialists ought to be born, rather than become specialists out of interest?

currymj · on July 21, 2020

There are lots of things that I think are comparable to machine learning in the sense that they combine applied math and heavy computation and are very practically important, like simulating chemical reactions, solving operations research problems, or computational fluid dynamics. You cannot talk about these things at cocktail parties, though, because people will slowly shuffle away from you -- whereas you can talk about deep learning, which is odd.

Basically, I think if somebody wants to work in machine learning then they should be encouraged, and I think it's great that barriers to entry are lower than most fields, but the average person should not feel like they need to care about it, and if they do it might be because they have an inaccurate narrative.

mumbisChungo · on July 21, 2020

>You cannot talk about these things at cocktail parties, though, because people will slowly shuffle away from you -- whereas you can talk about deep learning, which is odd.

It's really not odd at all. The average person has some familiarity with ML/AI, so you don't have to expend the energy to introduce them to the topic in a way that is understandable and also engaging to them. They already have a baseline, and are likely already aware of some interesting use cases. By contrast, they might not even know what "operations research" is, so you have to be both willing and able to expend the energy to explain the field in a way that is comprehensible and interesting. I'm sure it's possible, but the cross-section of people with the knowledge, the interest, and the social graces to do it is probably small.

To me it seems that a large swath of the science community dislikes buzzwordification and pop science more than they like proliferation of knowledge, based on how negative responses seem to be to things like normie interest in AI here. I would be fascinated to read any peer reviewed studies on the negative impacts of pop science on long term scientific advancement, so that I could understand this bias (and debunk my own bias that more interest in science is better in the long term).

KineticLensman · on July 21, 2020

I interpreted this to mean something like 'how phones / car engines / etc work' is not of interest to most of their users as long as they get the job done. If they get 'interesting' it can means that something isn't working right. Where 'interesting' = 'suddenly noticeable'.

catalogia · on July 21, 2020

I don't think they mean ML in general is boring, just that this particular application of it isn't particularly flashy.

mumbisChungo · on July 21, 2020

Maybe I misinterpreted this:

>it will be boring to anyone but specialists, as the rest of machine learning ought to be.

noobermin · on July 21, 2020

Thank you. More importantly, this is not new. Sheesh, how much of AI is just hype. I am in favor in not using the term AI for such things.

baq · on July 21, 2020

> John McCarthy, the Father of AI, famously said: "As soon as it works, no one calls it AI any more." Leading researcher Rodney Brooks says "Every time we figure out a piece of it, it stops being magical; we say, 'Oh, that's just a computation. '"

https://cacm.acm.org/blogs/blog-cacm/138907-john-mccarthy/fu...

Barrin92 · on July 21, 2020

this is an extremely overused fortune-cookie like quote. There's a legitimate distinction to be made between intelligence on one hand, and simply computation or calculation on the other. If we start calling every numerical method AI we're rendering the term meaningless.

xvilka · on July 22, 2020

None of existing AI, ML, DL, RL algorithms is intelligence either.

glial · on July 21, 2020

What’s the distinction?

Barrin92 · on July 21, 2020

In the most basic sense intelligence involves the aqcuisition of knowledge, which is representation or generalisation at some higher level of abstraction and the ability to make decisions.

The mere ability to perform computational work is something virtually even the tiniest piece of hardware entails, or even an abacus for that matter.

avereveard · on July 21, 2020

intelligence is inductive

finding correlations and using a human to filter the interesting ones from the flukes doesn't make the correlations engine an ai, the intelligence is still in the human

exabyte · on July 21, 2020

The power that I see in machine learning is the techniques being developed to handle the unavoidable noise in empirical data. I think that poses a large obstacle for traditional techniques although I am not familiar enough to compare.

willis936 · on July 22, 2020

To me the value is in matching relationships (equations) of curated parameters from empirical data and using simulated recreations of the experiment as the objective. As soon as you can recreate experimental results in a simulation then you’ve made a successful model for that domain. This is an incredibly important and difficult task for fluid dynamics and plasma physics.

stainforth · on July 22, 2020

Fusion is right around the corner!

amelius · on July 21, 2020

Is this AI taking the place of preconditioners as used in iterative solvers?

Would an AI be able to learn how to apply multigrid methods?

ChrisRackauckas · on July 21, 2020

We have prototypes in Julia for this. The answer is yes, there are tricks you can use to do this effectively.

astroH · on July 21, 2020

Could you elaborate a bit more on this? I have thought about employing NNs in this way for quite a while but the thing I never wrapped my head around was ensuring how it generalizes to different problems.

bencw · on July 21, 2020

This is something I've wondered about (along with potential applications of autograd outside of deep learning). Do you have a recommended starting point for someone who wants to learn more about this?