* wpgu: Initial implementation of PixelBender shader execution
The implementation is split across four crates:
* `ruffle_render` now holds the main PixelBender bytecode parsing
implementation (previously, this was in `ruffle_core`).
* `ruffle_core` holds some helper functions for converting between
AVM2 `Value`s and the PixelBender vector types.
* `naga-pixelbender` (newly created) constructs a Naga `Module`
from parsed PixelBender bytecode
* `ruffle_render_wgpu` sets up the render pipeline for the shader
constructed by `naga-pixelbender`, and actually executes the shader.
The Actionscript-side shader parameters are passed in through uniforms.
This allows us to cache the compiled `naga::Module` and associated
wgpu types inside `ShaderData`, when it's first created. Each invocation
of a `ShaderJob` only needs to create a bind group and render pass.
Limitations:
* Only a few of the PixelBender opcodes are implemented - however, this is
enough to get Stemlands cannon rotation working, as well as a cool
"donut" shader that I found and included as a test.
* PixelBender matrix types are not supported.
* Only BitmapData is supported as an input/output type - Flash Player
also supports using Vector and ByteArray
* ShaderJob execution is always synchronous.
* Adjust comments
* Address review comments
We use an `lru::LruCache` inside `ShaderModuleAgal`. This automatically
gives us the proper garbage-collection behavior (when the Flash
Program3D instance is garbage collected, we'll drop the
`ShaderModuleAgal` and the cache).
The cache is keyed on the data needed to compile the shader (vertex
attributes and sampler overrides). This lets us avoid shader
recompilations when a Stage3D program repeatedly uses the same
Program3D with different sampler overrides / vertex attribute formats.
These are poorly documented, but from looking at OpenFL
and AGALMiniAssembler, they allow performing loads of the
form `vc[va0.x + offset]` - that is, computing a dynamic register
number, instead of using the register number present in the opcode.
In a previous PR, I introduced an optimization that used
`copy_texture_to_texture` to copy directly from a BitmapData GPU
texture to a Stage3D GPU texture.
Unfortunately, this optimization is incorrect. A BitmapData GPU
texture can be modified at any time by normal AVM2 code - in
particular, in might be modified before we submit the encoded
`copy_texture_to_texture` command. This shows up in Sniper Team,
which re-uses BitmapData objects for multiple distinct textures.
The previous 'optimization' resulted in the wrong BitmapData contents
getting uploaded to a texture (since it was changed before the copy
command was submitted).
When multisampling is enabled, we should create a new multisampled texture,
and use the existing texture as the resolve buffer. We also need to
call `update_has_depth_texture` to keep our pipeline aware of whether
or not we currently have a depth buffer attached.
Makes progress on #10641 (it has a stack overflow after
this PR, due to an unrelated issue).
wgpu requires buffer copy sizes and offsets to be 4-byte aligned.
Unfortunately, ActionScript can perform 2-byte aligned uploads
into an IndexBuffer3D.
To support this, we now keep a copy of the IndexBuffer3D on the CPU.
When performing an upload to the buffer, we round the offset down
and the size up to the nearest 4-byte aligned value. The cpu buffer
is used to fill out the write with existing data, so that we don't
corrupt the contents of the GPU buffer.
To avoid introducing a new RefCell, I've changed IndexBuffer3D
to use a `Box` instead of an `Rc` to store the trait object.
This allows us to pass a mutable reference down to the backend.
Generally, when transforming a difference between two points, `p1`
and `p2`, with a matrix `m`, we would like the following property
to hold:
```
m * (p1 - p2) == m * p1 - m * p2
```
Unfortunately, it wasn't like this before, because matrices have a
translation component, which is non-linear. In `m * p1 - m * p2`,
the translations of `m * p1` and `m * p2` are the same and therefore
cancel out each other. However, in `m * (p1 - p2)` the translation
stays.
In order to preserve this property, introduce a new `PointDelta`
type which is not subject to translation when transformed by a matrix.
For now, the following operations are supported:
* `Point - Point -> PointDelta`
* `Point + PointDelta -> Point`
* `Point += PointDelta`
* `Point - PointDelta -> Point`
* `Point -= PointDelta`
As a consequence, the expression `position + global_to_local_matrix * mouse_delta`
in `update_drag()` now ignores translation, which fixes#817.
Flash does not support nested mask regions and instead merges them
into a single clip region.
For example, this occurs when using a dynamic text field as a mask.
One mask layer contains the glyphs, while the second layer is the
bounds of the text field. The text field bounds end up being
ignored when the text field is used as a mask, allowing the text
outside the bounds to be visible.
Add `CommandList::maskers_in_progress` to keep track of the mask
state and discard drawing commands for inner maskers.
Fixes#9664.