Only a single thread executes any given process instance, so the branches of a parallel gateway actually run sequentially, one after another, rather than concurrently. I don't know the internals well enough to say anything about the order in which the branches run.
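To make that concrete, here is a rough toy model in Python, not the engine's actual code. The names `run_instance`, `branch_a`, and `branch_b` are made up for illustration, and the branch order shown is simply the order of the dictionary. The real engine may pick any order.

```python
from collections import deque


def run_instance(branches):
    """Run every branch of a toy 'parallel gateway' on the calling thread.

    Hypothetical model: one thread, one queue of branches, each branch
    finished completely before the next one starts.
    """
    log = []
    # Assumption of this sketch only: branches run in dictionary order.
    pending = deque(branches.items())
    while pending:
        name, steps = pending.popleft()
        for step in steps:
            log.append(f"{name}:{step}")
    return log


branches = {
    "branch_a": ["validate", "notify"],
    "branch_b": ["archive"],
}
log = run_instance(branches)
print(log)
```

The point is only that the steps of one branch never interleave with the steps of another, because there is nothing running alongside them.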
Another way to reason about this is to consider how many people could write thread-safe scripts for their before and after actions. No offense to the non-programmer workflow creators out there, but making code thread-safe is tough, and making a system as extensible and customizable as this one thread-safe is a nightmare.
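As a small illustration of why thread safety is hard, here is a sketch of the classic lost update. It is a hand-simulated interleaving, not real threads and not anything from the product. The `shared` dictionary and the `approvals` key are hypothetical stand-ins for a process variable that two action scripts both increment.

```python
# Hypothetical process variable that two action scripts both increment.
shared = {"approvals": 0}

# Sequential execution: each script reads and writes before the next starts.
for _ in range(2):
    seen = shared["approvals"]
    shared["approvals"] = seen + 1
sequential_result = shared["approvals"]

# Hand-simulated concurrent execution of the same two scripts:
# both read the old value before either one writes.
shared["approvals"] = 0
seen_by_a = shared["approvals"]
seen_by_b = shared["approvals"]
shared["approvals"] = seen_by_a + 1
shared["approvals"] = seen_by_b + 1  # overwrites script A's update
interleaved_result = shared["approvals"]

print(sequential_result, interleaved_result)
```

Run one after the other, the two scripts count both approvals. With the interleaving, one increment is silently lost, and every script author would have to know to guard against that.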