* wgpu: Initial implementation of PixelBender shader execution
The implementation is split across four crates:
* `ruffle_render` now holds the main PixelBender bytecode parsing
implementation (previously, this was in `ruffle_core`).
* `ruffle_core` holds some helper functions for converting between
AVM2 `Value`s and the PixelBender vector types.
* `naga-pixelbender` (newly created) constructs a Naga `Module`
from parsed PixelBender bytecode.
* `ruffle_render_wgpu` sets up the render pipeline for the shader
constructed by `naga-pixelbender`, and actually executes the shader.
The ActionScript-side shader parameters are passed in through uniforms.
This allows us to cache the compiled `naga::Module` and associated
wgpu types inside `ShaderData`, when it's first created. Each invocation
of a `ShaderJob` only needs to create a bind group and render pass.
Limitations:
* Only a few of the PixelBender opcodes are implemented - however, this is
enough to get Stemlands cannon rotation working, as well as a cool
"donut" shader that I found and included as a test.
* PixelBender matrix types are not supported.
* Only BitmapData is supported as an input/output type - Flash Player
also supports using Vector and ByteArray.
* ShaderJob execution is always synchronous.
* Adjust comments
* Address review comments
In a previous PR, I introduced an optimization that used
`copy_texture_to_texture` to copy directly from a BitmapData GPU
texture to a Stage3D GPU texture.
Unfortunately, this optimization is incorrect. A BitmapData GPU
texture can be modified at any time by normal AVM2 code - in
particular, it might be modified before we submit the encoded
`copy_texture_to_texture` command. This shows up in Sniper Team,
which re-uses BitmapData objects for multiple distinct textures.
The previous 'optimization' resulted in the wrong BitmapData contents
getting uploaded to a texture (since it was changed before the copy
command was submitted).
wgpu requires buffer copy sizes and offsets to be 4-byte aligned.
Unfortunately, ActionScript can perform 2-byte aligned uploads
into an IndexBuffer3D.
To support this, we now keep a copy of the IndexBuffer3D on the CPU.
When performing an upload to the buffer, we round the offset down
and the size up to the nearest 4-byte aligned value. The CPU buffer
is used to fill out the write with existing data, so that we don't
corrupt the contents of the GPU buffer.
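The rounding described above can be sketched roughly as follows. This is a minimal illustration, not the actual backend code: `aligned_upload` is a hypothetical helper, the alignment constant is hard-coded to 4 (the value of `wgpu::COPY_BUFFER_ALIGNMENT`) to keep the example dependency-free, and it assumes the CPU-side copy has a length that is a multiple of 4.

```rust
/// Copy alignment required by wgpu (hard-coded here; assumption for this sketch).
const ALIGN: usize = 4;

/// Hypothetical helper: applies a possibly unaligned upload to the CPU-side
/// copy, then returns the aligned offset and the aligned bytes that would be
/// written to the GPU buffer.
fn aligned_upload(cpu: &mut [u8], offset: usize, data: &[u8]) -> (usize, Vec<u8>) {
    // Assumption: the buffer itself is 4-byte sized, so rounding up never
    // runs past its end.
    assert_eq!(cpu.len() % ALIGN, 0);

    // Update the CPU copy first, so the padding bytes taken from it below
    // reflect the current buffer contents.
    cpu[offset..offset + data.len()].copy_from_slice(data);

    // Round the start down and the end up to the nearest multiple of 4.
    let start = offset / ALIGN * ALIGN;
    let end = (offset + data.len() + ALIGN - 1) / ALIGN * ALIGN;

    (start, cpu[start..end].to_vec())
}
```

The returned range always covers the requested write, and the extra bytes on either side are existing data, so writing it to the GPU buffer leaves neighboring indices unchanged.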
To avoid introducing a new RefCell, I've changed IndexBuffer3D
to use a `Box` instead of an `Rc` to store the trait object.
This allows us to pass a mutable reference down to the backend.
Generally, when transforming a difference between two points, `p1`
and `p2`, with a matrix `m`, we would like the following property
to hold:
```
m * (p1 - p2) == m * p1 - m * p2
```
Unfortunately, this did not hold before, because matrices have a
translation component, which makes the transform non-linear. In `m * p1 - m * p2`,
the translations of `m * p1` and `m * p2` are the same and therefore
cancel out each other. However, in `m * (p1 - p2)` the translation
stays.
In order to preserve this property, introduce a new `PointDelta`
type which is not subject to translation when transformed by a matrix.
For now, the following operations are supported:
* `Point - Point -> PointDelta`
* `Point + PointDelta -> Point`
* `Point += PointDelta`
* `Point - PointDelta -> Point`
* `Point -= PointDelta`
As a consequence, the expression `position + global_to_local_matrix * mouse_delta`
in `update_drag()` now ignores translation, which fixes #817.
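A minimal sketch of the idea, assuming simplified `f64` fields instead of the real coordinate types and showing only two of the listed operators. The struct layouts and field names here are illustrative assumptions, not the actual definitions:

```rust
use std::ops::{Add, Mul, Sub};

#[derive(Clone, Copy, Debug, PartialEq)]
struct Point { x: f64, y: f64 }

/// A difference between two points; not affected by translation.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PointDelta { dx: f64, dy: f64 }

/// 2D affine matrix: x' = a*x + c*y + tx, y' = b*x + d*y + ty.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Matrix { a: f64, b: f64, c: f64, d: f64, tx: f64, ty: f64 }

impl Sub for Point {
    type Output = PointDelta;
    fn sub(self, rhs: Point) -> PointDelta {
        PointDelta { dx: self.x - rhs.x, dy: self.y - rhs.y }
    }
}

impl Add<PointDelta> for Point {
    type Output = Point;
    fn add(self, rhs: PointDelta) -> Point {
        Point { x: self.x + rhs.dx, y: self.y + rhs.dy }
    }
}

impl Mul<Point> for Matrix {
    type Output = Point;
    fn mul(self, p: Point) -> Point {
        // Points pick up the translation.
        Point {
            x: self.a * p.x + self.c * p.y + self.tx,
            y: self.b * p.x + self.d * p.y + self.ty,
        }
    }
}

impl Mul<PointDelta> for Matrix {
    type Output = PointDelta;
    fn mul(self, d: PointDelta) -> PointDelta {
        // Deltas only see the linear part; tx and ty are ignored.
        PointDelta {
            dx: self.a * d.dx + self.c * d.dy,
            dy: self.b * d.dx + self.d * d.dy,
        }
    }
}
```

With these impls, `m * (p1 - p2)` and `m * p1 - m * p2` both produce a `PointDelta`, and the translation cancels in both.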
Flash does not support nested mask regions and instead merges them
into a single clip region.
For example, this occurs when using a dynamic text field as a mask.
One mask layer contains the glyphs, while the second layer is the
bounds of the text field. The text field bounds end up being
ignored when the text field is used as a mask, allowing the text
outside the bounds to be visible.
Add `CommandList::maskers_in_progress` to keep track of the mask
state and discard drawing commands for inner maskers.
Fixes #9664.
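One way such a counter could work is sketched below. This is an illustrative assumption about the bookkeeping, not the actual `CommandList` code: the `Command` variants, the `draw` method, and the exact thresholds are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Command {
    PushMask,
    ActivateMask,
    DeactivateMask,
    PopMask,
    Draw(u32), // stand-in for any drawing command (hypothetical)
}

#[derive(Default)]
struct CommandList {
    commands: Vec<Command>,
    /// How many maskers are currently being drawn (nesting depth).
    maskers_in_progress: u32,
}

impl CommandList {
    fn push_mask(&mut self) {
        // Only the outermost masker emits a real mask command.
        if self.maskers_in_progress == 0 {
            self.commands.push(Command::PushMask);
        }
        self.maskers_in_progress += 1;
    }

    fn activate_mask(&mut self) {
        self.maskers_in_progress -= 1;
        if self.maskers_in_progress == 0 {
            self.commands.push(Command::ActivateMask);
        }
    }

    fn deactivate_mask(&mut self) {
        if self.maskers_in_progress == 0 {
            self.commands.push(Command::DeactivateMask);
        }
        self.maskers_in_progress += 1;
    }

    fn pop_mask(&mut self) {
        self.maskers_in_progress -= 1;
        if self.maskers_in_progress == 0 {
            self.commands.push(Command::PopMask);
        }
    }

    fn draw(&mut self, id: u32) {
        // Depth 2 or more means we are inside an inner masker: discard.
        if self.maskers_in_progress <= 1 {
            self.commands.push(Command::Draw(id));
        }
    }
}
```

In this sketch, content masked by the inner masker is still drawn into the outer mask, which mirrors the text-field case above: the glyphs contribute to the clip region while the bounds layer is dropped.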
This matches the Context3D docs. Calling 'present' swaps
the buffers.
I wasn't certain if we actually need a double-buffered depth
texture, but I included one just to be safe.
Now that most of the complicated Context3D methods have been
implemented, we can simplify the overall design. Instead of queueing
up commands and having `present` execute them in a loop, we
can execute each command immediately. The key insight is that
a `RenderPass` is only needed for `DrawTriangles`, so we don't
have to store it in `Context3D` and deal with complicated lifetime
issues.
The old behavior gave us implicit double-buffering behavior,
since nothing would get rendered until a 'present' call.
Now that a 'drawTriangles' call will immediately submit
a draw command, we need to implement actual double buffering.
This is done in the next commit.
* `global_to_local` returns `None` if the object has zero scale.
* Adjust AVM `globalToLocal` methods to return the untransformed
point on failure.
* Add `DisplayObject::mouse_to_local` to handle AVM `mouseX`
and `mouseY` coordinates. For zero scale objects, these end up
returning values based on the twips-to-pixels scale,
divided by 20.
* Add `Matrix::determinant`.
* Rename `Matrix::invert` to `inverse`.
* `Matrix::inverse` returns an `Option`, with `None` returned
for non-invertible matrices.
* AVM `Matrix::invert` duplicates the code as the behavior is
different (works in f64 and not twips, etc.)
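The `determinant`/`inverse` pair and the fallback on failure can be sketched as below. This is a simplified illustration that assumes plain `f32` fields throughout (the real matrix stores translation in twips), and the free function `global_to_local` with tuple points is a hypothetical stand-in for the display object method:

```rust
/// 2D affine matrix: x' = a*x + c*y + tx, y' = b*x + d*y + ty.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Matrix { a: f32, b: f32, c: f32, d: f32, tx: f32, ty: f32 }

impl Matrix {
    fn determinant(&self) -> f32 {
        self.a * self.d - self.b * self.c
    }

    /// Returns `None` for non-invertible matrices (e.g. zero scale).
    fn inverse(&self) -> Option<Matrix> {
        let det = self.determinant();
        if det == 0.0 || !det.is_finite() {
            return None;
        }
        Some(Matrix {
            a: self.d / det,
            b: -self.b / det,
            c: -self.c / det,
            d: self.a / det,
            tx: (self.c * self.ty - self.d * self.tx) / det,
            ty: (self.b * self.tx - self.a * self.ty) / det,
        })
    }
}

/// Hypothetical stand-in: maps a global point into local space, or `None`
/// if the local-to-global matrix cannot be inverted.
fn global_to_local(local_to_global: &Matrix, p: (f32, f32)) -> Option<(f32, f32)> {
    let inv = local_to_global.inverse()?;
    Some((
        inv.a * p.0 + inv.c * p.1 + inv.tx,
        inv.b * p.0 + inv.d * p.1 + inv.ty,
    ))
}
```

A caller on the AVM side could then write `global_to_local(&m, p).unwrap_or(p)` to return the untransformed point on failure.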