We use an `lru::LruCache` inside `ShaderModuleAgal`. This automatically
gives us the proper garbage-collection behavior (when the Flash
Program3D instance is garbage collected, we'll drop the
`ShaderModuleAgal` and the cache).
The cache is keyed on the data needed to compile the shader (vertex
attributes and sampler overrides). This lets us avoid shader
recompilations when a Stage3D program repeatedly uses the same
Program3D with different sampler overrides / vertex attribute formats.
In a previous PR, I introduced an optimization that used
`copy_texture_to_texture` to copy directly from a BitmapData GPU
texture to a Stage3D GPU texture.
Unfortunately, this optimization is incorrect. A BitmapData GPU
texture can be modified at any time by normal AVM2 code - in
particular, in might be modified before we submit the encoded
`copy_texture_to_texture` command. This shows up in Sniper Team,
which re-uses BitmapData objects for multiple distinct textures.
The previous 'optimization' resulted in the wrong BitmapData contents
getting uploaded to a texture (since it was changed before the copy
command was submitted).
When multisampling is enabled, we should create a new multisampled texture,
and use the existing texture as the resolve buffer. We also need to
call `update_has_depth_texture` to keep our pipeline aware of whether
or not we currently have a depth buffer attached.
Makes progress on #10641 (it has a stack overflow after
this PR, due to an unrelated issue).
wgpu requires buffer copy sizes and offsets to be 4-byte aligned.
Unfortunately, ActionScript can perform 2-byte aligned uploads
into an IndexBuffer3D.
To support this, we now keep a copy of the IndexBuffer3D on the CPU.
When performing an upload to the buffer, we round the offset down
and the size up to the nearest 4-byte aligned value. The cpu buffer
is used to fill out the write with existing data, so that we don't
corrupt the contents of the GPU buffer.
To avoid introducing a new RefCell, I've changed IndexBuffer3D
to use a `Box` instead of an `Rc` to store the trait object.
This allows us to pass a mutable reference down to the backend.
This matches the Context3D docs. Calling 'present' swaps
the buffers.
I wasn't certain if we actually need a double-buffered depth
texture, but I included one just to be safe.
Now that most of the complicated Context3D methods have been
implemented, we can simplify the overall design. Instead of queueing
up commands and having `present` execute them in a loop, we
can execute each command immediately. The key insight is that
a `RenderPass` is only needed for `DrawTriangles`, so we don't
have to store it in `Context3D` and deal with complicated lifetime
issues.
The old behavior gave us implicit double-buffering behavior,
since nothing would get rendered until a 'present' call.
Now that a 'drawTriangles' call will immediately submit
a draw command, we need to implement actual double buffering.
This is done in the next commit.