Instead of binding every supported sampler combination
and selecting the correct index in our AGAL builder,
we now determine the correct sampler on the wgpu side,
and create one sampler binding per texture slot.
We now validate the passed in profile, and return the selected profile
from 'Context3D.profile'. We don't yet alter the available
registers/textures based on the profile.
After some testing, and looking at OpenFL, I believe I've
determined the correct behavior for AGAL sampling:
Each time a Context3D.setProgram or Context3D.setSamplerStateAt
call is made, the sampler config for the used texture slot(s)
is updated with the new wrapping/filter behavior. For setProgram,
this comes from all of the 'tex' opcodes used within the program.
However, when the 'ignoresampler' flag is set in a 'tex' opcode,
the setProgram call does *not* override the existing sampler config.
As a result, that program will sample with the behavior determined
by the most recent setSamplerStateAt or setProgram call involving
the used texture slot(s).
Previously, we were always overriding the opcode sampler config
with the values from Context3D.setSamplerStateAt. However, I didn't
realize that the order of the calls matter, so none of my tests ended
up observing the effect of 'ignoresampler'.
We now need to process AGAL bytecode twice - a quick initial
parse to determine the sampler configs (which need to be updated
when we call 'setProgram'), and a second time when to build the
Naga module (which needs to wait until we have the vertex attributes
available, which can be changed by ActionScript after setting
the program).
This is our first non-rgba texture format (it uses Bc3RgbaUnorm).
ATF files store these textures in a very convoluted way - fortunately,
the 'dds2atf' tool is open-source, which allowed me to figure out
how to decode the texture back to a DXT5/DXT1 texture.
The 'gc_arena' dependency was only used to manipulate the `GcCell`s
containing the vertex and fragment shaders; replacing these by a
reference to a plain old `Cell` means tha the Context3D traits and
types do not need to interact with GC'd object anymore.
As a knock-on effect, we can also remove the `Activation` parameter
from most of the `Context3DObject` methods.
The bind group layout only depends on the texture registers
(and 2D/cubemap type) accessed by the fragment shader, not on
the runtime texture bound with Context3D. This means that we can
build and cache it when we compile the AGAL program to a Naga
module.
Since the bind group layout is used for the overall pipeline, I've
refactored the shader caching code into `ShaderPairAgal`, which
holds both the vertex and fragment shader bytecode, and compiles
both in the `compile` function.
We use an `lru::LruCache` inside `ShaderModuleAgal`. This automatically
gives us the proper garbage-collection behavior (when the Flash
Program3D instance is garbage collected, we'll drop the
`ShaderModuleAgal` and the cache).
The cache is keyed on the data needed to compile the shader (vertex
attributes and sampler overrides). This lets us avoid shader
recompilations when a Stage3D program repeatedly uses the same
Program3D with different sampler overrides / vertex attribute formats.
In a previous PR, I introduced an optimization that used
`copy_texture_to_texture` to copy directly from a BitmapData GPU
texture to a Stage3D GPU texture.
Unfortunately, this optimization is incorrect. A BitmapData GPU
texture can be modified at any time by normal AVM2 code - in
particular, in might be modified before we submit the encoded
`copy_texture_to_texture` command. This shows up in Sniper Team,
which re-uses BitmapData objects for multiple distinct textures.
The previous 'optimization' resulted in the wrong BitmapData contents
getting uploaded to a texture (since it was changed before the copy
command was submitted).
When multisampling is enabled, we should create a new multisampled texture,
and use the existing texture as the resolve buffer. We also need to
call `update_has_depth_texture` to keep our pipeline aware of whether
or not we currently have a depth buffer attached.
Makes progress on #10641 (it has a stack overflow after
this PR, due to an unrelated issue).
wgpu requires buffer copy sizes and offsets to be 4-byte aligned.
Unfortunately, ActionScript can perform 2-byte aligned uploads
into an IndexBuffer3D.
To support this, we now keep a copy of the IndexBuffer3D on the CPU.
When performing an upload to the buffer, we round the offset down
and the size up to the nearest 4-byte aligned value. The cpu buffer
is used to fill out the write with existing data, so that we don't
corrupt the contents of the GPU buffer.
To avoid introducing a new RefCell, I've changed IndexBuffer3D
to use a `Box` instead of an `Rc` to store the trait object.
This allows us to pass a mutable reference down to the backend.
This matches the Context3D docs. Calling 'present' swaps
the buffers.
I wasn't certain if we actually need a double-buffered depth
texture, but I included one just to be safe.
Now that most of the complicated Context3D methods have been
implemented, we can simplify the overall design. Instead of queueing
up commands and having `present` execute them in a loop, we
can execute each command immediately. The key insight is that
a `RenderPass` is only needed for `DrawTriangles`, so we don't
have to store it in `Context3D` and deal with complicated lifetime
issues.
The old behavior gave us implicit double-buffering behavior,
since nothing would get rendered until a 'present' call.
Now that a 'drawTriangles' call will immediately submit
a draw command, we need to implement actual double buffering.
This is done in the next commit.
When we receieve a nonzero 'antiAlias' parameter, we create
create a non-multisampled resolve buffer to use with WGPU.
Several tests were already requesting antialiasing, so their
output images are now anti-aliased without any changes to
the tests themselves.
In the process, I fixed a bug where we were clearing the depth
and stencil buffers with the incorrect value.
This makes Fancy Pants World 4 Part 1 playable to completion
(though there are still some rendering issues that need
to be fixed).
We don't need to perform a sync when getting the width/height,
getting or setting the 'disposed' status, or uploading to
a Context3D texture.
The Context3D change (using `copy_texture_to_texture` instead
of relying on the CPU pixels) has the added advantage of avoiding
a validation error when our source image row length isn't aligned
to `COPY_BYTES_PER_ROW_ALIGNMENT`
This dramatically speeds up the Fancy Pants World 4 loading time
(on a branch with my XML prs merged). Without this change, my
machine spends around 10 seconds on a blank white screen after
clicking 'Play'. With this change, the time spent on that screen
is reduced to around 1-2 seconds.
None of these formats can currently be implemented
correctly with wgpu, so we just use Rgba8Unorm instead.
The handling of opaque compressed textures is a little
sketchy - it should work for 'normal' SWFs that upload
an opaque BitmapData, but we might need to manually
adjust the alpha values if
We were ignoreing 'data32PerVertex'.
To make the code clearer, I've renamed the variable to
'data32_per_vertex', and made it a 'u8' (as it has a maximum of 64)
Webgl doesn't support BGRA textures, so this lets us use
Stage3D textures on the web backend. As a bonus, this speeds up
uploading an BitmapData to a Context3dTextureFormat.BGRA texture,
since we no longer need to change the format before copying.
This makes Solarmax2 playable on the web backend.
This is a very large diff, but most of it comes from test files and
output.
This PR ads partial support for the following Stage3D shader features:
* Normal (square), rectangle, and cube textures
* Varying and temporary registers
* Lots of opcodes
The combination of these allows us to get a raytracing program
fully working in Ruffle. I've included it as image test.
Currently, this test is very slow (about 90 seconds on my machine),
as the code I'm using (https://github.com/saharan/OGSL) includes
its own shader language and compiler. THe raytracing demo
first compiles its own shader language to AGAL, and then starts
rendering the scene.
Limitations:
* Many opcodes are still unimplemented
* Most non-default texture options (e.g. mipmaps) are not implemented