* wgpu: Initial implementation of PixelBender shader execution
The implementation is split across four crates:
* `ruffle_render` now holds the main PixelBender bytecode parsing
implementation (previously, this was in `ruffle_core`).
* `ruffle_core` holds some helper functions for converting between
AVM2 `Value`s and the PixelBender vector types.
* `naga-pixelbender` (newly created) constructs a Naga `Module`
from parsed PixelBender bytecode.
* `ruffle_render_wgpu` sets up the render pipeline for the shader
constructed by `naga-pixelbender`, and actually executes the shader.
The ActionScript-side shader parameters are passed in through uniforms.
This allows us to cache the compiled `naga::Module` and associated
wgpu types inside `ShaderData` when it's first created. Each invocation
of a `ShaderJob` only needs to create a bind group and render pass.
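As a rough sketch of that split (hypothetical types, not the actual Ruffle API), the expensive compilation artifacts live with the `ShaderData`, while each `ShaderJob` only builds the cheap per-invocation state:
```rust
// Hypothetical sketch of the caching split; names are illustrative.
use std::sync::OnceLock;

struct CompiledShader {
    // Built once by naga-pixelbender from the parsed bytecode, then reused.
    module: naga::Module,
    // ... plus the wgpu pipeline / bind group layout derived from the module.
}

struct ShaderData {
    bytecode: Vec<u8>,
    compiled: OnceLock<CompiledShader>, // populated on first use, then cached
}

struct ShaderJob<'a> {
    shader: &'a ShaderData,
    // Per-invocation state only: the uniform values for the ActionScript-side
    // parameters, from which a bind group and render pass are created each run.
    parameters: Vec<f32>,
}
```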
Limitations:
* Only a few of the PixelBender opcodes are implemented - however, this is
enough to get Stemlands cannon rotation working, as well as a cool
"donut" shader that I found and included as a test.
* PixelBender matrix types are not supported.
* Only BitmapData is supported as an input/output type - Flash Player
also supports using Vector and ByteArray.
* ShaderJob execution is always synchronous.
* Adjust comments
* Address review comments
Generally, when transforming a difference between two points, `p1`
and `p2`, with a matrix `m`, we would like the following property
to hold:
```
m * (p1 - p2) == m * p1 - m * p2
```
Unfortunately, this property did not hold before, because matrices have
a translation component, which makes the transform non-linear. In
`m * p1 - m * p2`, the translations of `m * p1` and `m * p2` are the
same and therefore cancel each other out. However, in `m * (p1 - p2)`
the translation is applied once and stays.
In order to preserve this property, introduce a new `PointDelta`
type which is not subject to translation when transformed by a matrix.
For now, the following operations are supported:
* `Point - Point -> PointDelta`
* `Point + PointDelta -> Point`
* `Point += PointDelta`
* `Point - PointDelta -> Point`
* `Point -= PointDelta`
As a consequence, the expression `position + global_to_local_matrix * mouse_delta`
in `update_drag()` now ignores translation, which fixes #817.
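A minimal sketch of the idea (simplified 2D types, not Ruffle's actual definitions): transforming a `PointDelta` applies only the linear part of the matrix, while transforming a `Point` also applies the translation.
```rust
use std::ops::{Mul, Sub};

#[derive(Clone, Copy, PartialEq, Debug)]
struct Point { x: f64, y: f64 }

#[derive(Clone, Copy, PartialEq, Debug)]
struct PointDelta { dx: f64, dy: f64 }

#[derive(Clone, Copy)]
struct Matrix { a: f64, b: f64, c: f64, d: f64, tx: f64, ty: f64 }

impl Sub for Point {
    type Output = PointDelta;
    fn sub(self, rhs: Point) -> PointDelta {
        PointDelta { dx: self.x - rhs.x, dy: self.y - rhs.y }
    }
}

impl Mul<Point> for Matrix {
    type Output = Point;
    fn mul(self, p: Point) -> Point {
        // Points get the full affine transform, including translation.
        Point {
            x: self.a * p.x + self.c * p.y + self.tx,
            y: self.b * p.x + self.d * p.y + self.ty,
        }
    }
}

impl Mul<PointDelta> for Matrix {
    type Output = PointDelta;
    fn mul(self, d: PointDelta) -> PointDelta {
        // Deltas only get the linear part, so m * (p1 - p2) == m * p1 - m * p2.
        PointDelta {
            dx: self.a * d.dx + self.c * d.dy,
            dy: self.b * d.dx + self.d * d.dy,
        }
    }
}
```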
This matches the Context3D docs. Calling 'present' swaps
the buffers.
I wasn't certain if we actually need a double-buffered depth
texture, but I included one just to be safe.
Now that most of the complicated Context3D methods have been
implemented, we can simplify the overall design. Instead of queueing
up commands and having `present` execute them in a loop, we
can execute each command immediately. The key insight is that
a `RenderPass` is only needed for `DrawTriangles`, so we don't
have to store it in `Context3D` and deal with complicated lifetime
issues.
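As a rough illustration of why the lifetime problem goes away (not Ruffle's actual code; the descriptor fields follow an older wgpu release where `store` is a bool, and may differ in current versions), the render pass only lives for the duration of a single draw call:
```rust
fn draw_triangles(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    target: &wgpu::TextureView,
    pipeline: &wgpu::RenderPipeline,
    vertices: u32,
) {
    let mut encoder =
        device.create_command_encoder(&wgpu::CommandEncoderDescriptor { label: None });
    {
        // The pass borrows the encoder, but both are scoped to this function,
        // so Context3D never has to store a RenderPass.
        let mut pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
            label: Some("drawTriangles"),
            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
                view: target,
                resolve_target: None,
                ops: wgpu::Operations {
                    load: wgpu::LoadOp::Load,
                    store: true,
                },
            })],
            depth_stencil_attachment: None,
        });
        pass.set_pipeline(pipeline);
        pass.draw(0..vertices, 0..1);
    }
    // Submit immediately instead of waiting for a later `present` call.
    queue.submit(std::iter::once(encoder.finish()));
}
```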
The old behavior gave us implicit double-buffering behavior,
since nothing would get rendered until a 'present' call.
Now that a 'drawTriangles' call will immediately submit
a draw command, we need to implement actual double buffering.
This is done in the next commit.
This is a very large diff, but most of it comes from test files and
output.
This PR adds partial support for the following Stage3D shader features:
* Normal (square), rectangle, and cube textures
* Varying and temporary registers
* Lots of opcodes
The combination of these allows us to get a raytracing program
fully working in Ruffle. I've included it as an image test.
Currently, this test is very slow (about 90 seconds on my machine),
as the code I'm using (https://github.com/saharan/OGSL) includes
its own shader language and compiler. The raytracing demo
first compiles its own shader language to AGAL, and then starts
rendering the scene.
Limitations:
* Many opcodes are still unimplemented
* Most non-default texture options (e.g. mipmaps) are not implemented
* Take two: Delay reading image back from render backend using `SyncHandle`
This allows us to avoid blocking immediately after a `BitmapData.draw` call.
Instead, we only attempt to use the `SyncHandle` when performing an operation
that requires the CPU-side pixels (e.g. `BitmapData.getPixel` or `BitmapData.setPixel`).
In the best case, the SWF will never explicitly access the pixels of
the target BitmapData, removing the need to ever copy the render backend
image back to our BitmapData. If the SWF doesn't require access to the pixels immediately,
we can delay copying the pixels until they're actually needed, hopefully allowing
the render backend to finish processing the `BitmapData.draw` operation in
the background before we need the result.
Now that the CPU and GPU pixels can be intentionally out of sync with
each other, we need to ensure that we don't accidentally expose 'stale'
CPU-side pixels to ActionScript (which needs to remain unaware of
our internal laziness). We now use a wrapper type `BitmapDataWrapper`
to enforce that the `SyncHandle` is consumed before accessing the
underlying `BitmapData`.
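A simplified sketch of the invariant the wrapper enforces (illustrative names, not Ruffle's exact GC-managed types): the CPU-side pixels can only be reached through a method that first resolves any pending GPU work.
```rust
trait SyncHandle {
    // Blocks until the render backend has finished, returning the rendered pixels.
    fn retrieve_pixels(self: Box<Self>) -> Vec<u32>;
}

struct BitmapData {
    pixels: Vec<u32>,
    pending_sync: Option<Box<dyn SyncHandle>>,
}

// ActionScript-facing code only ever holds a BitmapDataWrapper, so it cannot
// observe stale CPU pixels by accident.
struct BitmapDataWrapper(BitmapData);

impl BitmapDataWrapper {
    // Consume any outstanding SyncHandle before exposing the underlying data.
    fn sync(&mut self) -> &mut BitmapData {
        if let Some(handle) = self.0.pending_sync.take() {
            self.0.pixels = handle.retrieve_pixels();
        }
        &mut self.0
    }
}
```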
* core: Skip GPU->CPU sync for source and target BitmapData during draw
* Introduce DirtyState enum
There were two issues:
1. We were accidentally calling `as_any` on `handle`, rather than
`handle.0`
2. Calling `as_any` can invoke the wrong implementation, depending on
what traits are in scope. We want the method implemented on the
underlying type (`RegistryData`) to be used, but if `Downcast` is
explicitly imported, then we appear to invoke it on the trait object
`dyn BitmapHandleImpl` itself (using the fact that trait objects
themselves implement `Any`). We now explicitly call the generated
method on the trait object, which avoids this issue.
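A simplified illustration of the second pitfall, using plain `std::any` instead of the `downcast-rs` traits (hypothetical types): calling `Any` methods through the wrong layer downcasts against the wrapper rather than the concrete `RegistryData`.
```rust
use std::any::Any;

trait BitmapHandleImpl: Any {
    fn as_any(&self) -> &dyn Any;
}

struct RegistryData;

impl BitmapHandleImpl for RegistryData {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

fn main() {
    let handle: Box<dyn BitmapHandleImpl> = Box::new(RegistryData);

    // Wrong layer: `&handle` is treated as `Any` for `Box<dyn BitmapHandleImpl>`,
    // so downcasting to the concrete type fails.
    let outer: &dyn Any = &handle;
    assert!(outer.downcast_ref::<RegistryData>().is_none());

    // Right layer: calling the method on the trait object dispatches to
    // RegistryData's impl, so the downcast succeeds.
    assert!(handle.as_any().downcast_ref::<RegistryData>().is_some());
}
```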
`BitmapHandle` now holds `Arc<dyn BitmapHandleImpl>`.
This allows us to move all of the per-bitmap backend data into
`BitmapHandle`, instead of holding an id to a backend-specific
hashmap.
This fixes the memory leak issue with bitmaps. Once the AVM side of a
bitmap (`Bitmap`/`BitmapData`) gets garbage-collected, the
`BitmapHandle` will get dropped, freeing all of the GPU resources
associated with the bitmap.
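A rough sketch of the ownership change (illustrative; the real wgpu backend data has more fields): the backend's per-bitmap data hangs directly off the handle, so dropping the last `BitmapHandle` releases the GPU resources instead of leaving an orphaned registry entry.
```rust
use std::sync::Arc;

// Implemented by each render backend for its per-bitmap data.
trait BitmapHandleImpl {}

// Shared, reference-counted handle; cloning it is cheap, and dropping the
// last clone drops the backend data (and with it the GPU texture).
#[derive(Clone)]
struct BitmapHandle(Arc<dyn BitmapHandleImpl>);

// The wgpu backend's per-bitmap data, previously kept in a backend hashmap
// keyed by an id.
struct RegistryData {
    texture: wgpu::Texture,
    // ...
}

impl BitmapHandleImpl for RegistryData {}
```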
This PR implements core 'stage3D' APIs. We are now able
to render at least two demos from the Context3D docs - a simple
triangle render, and a rotating cube.
Implemented in this PR:
* Stage3D access and Context3D creation
* IndexBuffer3D and VertexBuffer3D creation, uploading, and usage
* Program3D uploading and usage (via `naga-agal`)
* Context3D: configureBackBuffer, clear, drawTriangles, and present
Not yet implemented:
* Any 'dispose()' methods
* Depth and stencil buffers
* Context3D texture apis
* Scissor rectangle
General implementation strategy:
A new `Object` variant is added for each of the Stage3D objects
(VertexBuffer3D, Program3D, etc). This stores a handle to the
parent `Context3D`, and (depending on the object) a handle
to the underlying native resource, via
`Rc<dyn SomeRenderBackendTrait>`.
Calling methods on Context3D does not usually result in an immediate
call to a `wgpu` method. Instead, we queue up commands in our
`Context3D` instance, and execute them all on a call to `present`.
This avoids some nasty wgpu lifetime issues, and is very similar
to the approach we use for normal rendering.
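A rough sketch of this queueing pattern (hypothetical, simplified command set; the real commands carry buffers, programs, and texture handles): ActionScript-facing methods only record commands, and `present` replays them.
```rust
enum Context3DCommand {
    ConfigureBackBuffer { width: u32, height: u32 },
    Clear { red: f64, green: f64, blue: f64, alpha: f64 },
    DrawTriangles { num_triangles: i32 },
}

struct Context3D {
    pending: Vec<Context3DCommand>,
}

impl Context3D {
    // ActionScript-facing methods just record what to do...
    fn clear(&mut self, red: f64, green: f64, blue: f64, alpha: f64) {
        self.pending.push(Context3DCommand::Clear { red, green, blue, alpha });
    }

    // ...and `present` drains the queue and issues the wgpu calls in one go.
    fn present(&mut self) {
        for command in self.pending.drain(..) {
            match command {
                Context3DCommand::ConfigureBackBuffer { .. } => { /* recreate target texture */ }
                Context3DCommand::Clear { .. } => { /* record a clearing render pass */ }
                Context3DCommand::DrawTriangles { .. } => { /* record a draw call */ }
            }
        }
    }
}
```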
The actual rendering happens on a `Texture`, with dimensions
determined by `configureBackBuffer`. During 'Stage' rendering,
we render all of these Stage3D textures *behind* the normal
stage (but in front of the overall stage background color).
We only called `get_bitmap_pixels` when creating a `BitmapData`
for an SWF-provided `Bitmap`. We now store the initial pixels
in `Character::Bitmap`, and use them to initialize a `BitmapData`
when needed.
This lets us simplify the wgpu backend, which no longer needs
to store a `Bitmap` object. In addition to saving space for
`BitmapData` objects that lack an SWF `Bitmap`, this will make
it easier to move data from `bitmap_registry` into `BitmapHandle`
itself.