Puppygames

patreon


Transcripts 6th December 2017 - 12th December 2017

Cas: mornin

Dan: I'm stuck in a sleep/eat/schoolwork cycle...

Cas: feck 😦

we going to have something to show on Friday?

Dan: I should have some time to work on the SSAO tonight.

I'm not sure what you'd call showable

Most of the stuff we have are technical improvements.

Cas: that's ok so long as we can talk about it

besides if you get the SSAO working with the 2-stage stuff tonight maybe that's something to show. Also MSAA

Dan: Performance improvements, I should be able to get the correct normals working, reduced memory usage, MSAA.

Cas: doesnt matter how small progress is so long as it's progress 😃

Dan: I think I can cut the memory usage quite a lot by not needing an alpha channel.

Yeah I can summarize it.

Cas: yep

Dan: I managed to get some cool work done.

Cas: oh-ho how cool is cool?

Dan: Well not insanely uber cool

but pretty cool.

Slope-based depth-aware blur!

Cas: 😎

Dan: Normally you do a depth-aware bilateral blur on the SSAO

basically you make sure that the neighboring pixels have the same depth as the one you're blurring with

to prevent bleeding over edges.

Cas: nice

Dan: I had two major problems:

- We need a really sensitive depth test to make sure we don't blur over voxel edges

- Having a sensitive depth test means that as soon as we're not looking at a surface head-on, its slope makes the depth test fail even for neighboring pixels belonging to the same triangle.

To solve this I store the X and Y slope of depth for each pixel together with the linear depth value.

So when blurring I modify the depth I compare with based on this slope.

Basically this means that for a perfectly flat surface it will blur 100% no matter the view angle, because it compensates for it.

I could set the depth error threshold to 0.1% and still get accurate results.
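
A minimal sketch of the slope-compensated depth test described above, in one dimension and in Python rather than shader code. The function name, the box-filter weights and the relative threshold are assumptions for illustration; only the idea of offsetting the compared depth by the stored slope comes from the discussion.

```python
def slope_aware_blur(ao, depth, slope, radius=2, threshold=0.001, use_slope=True):
    """Blur ao[] along one row, rejecting neighbors that lie off the center pixel's plane.

    depth[i] is linear depth, slope[i] is the depth change per pixel step at i.
    threshold is a relative depth error (0.001 = 0.1%).
    """
    out = []
    n = len(ao)
    for i in range(n):
        total, weight = 0.0, 0.0
        for k in range(-radius, radius + 1):
            j = i + k
            if j < 0 or j >= n:
                continue
            # Depth expected at the neighbor if it lies on the same flat surface.
            expected = depth[i] + (slope[i] * k if use_slope else 0.0)
            if abs(depth[j] - expected) <= threshold * depth[i]:
                total += ao[j]
                weight += 1.0
        # The center sample (k == 0) always passes, so weight is at least 1.
        out.append(total / weight)
    return out
```

On a sloped plane the plain test (`use_slope=False`) rejects every neighbor and nothing gets blurred, while the slope-compensated test accepts them all. Across a real depth step both variants reject the far side, so there is no bleeding over the edge.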

had to do quite a bit of crazy stuff to fit all the data though.

SSAO is 8 bits, depth is 24 bits and slope is 2 half floats = 64 bits.

The addition of the slope translates to just one more add per sample in the blur shader too, so minor cost there.
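
To make the 64-bit budget concrete, here is a rough sketch of one way the four values could share two 32-bit words. The exact layout (SSAO in the top byte, normalized depth in the low 24 bits, both slopes in the second word) is an assumption, not necessarily what the shader does.

```python
import struct

def pack_pixel(ssao, depth, slope_x, slope_y):
    """Pack SSAO (0..1), depth (0..1) and two slopes into two 32-bit words (64 bits)."""
    ao8 = round(ssao * 255) & 0xFF            # 8-bit occlusion
    d24 = round(depth * 0xFFFFFF) & 0xFFFFFF  # 24-bit normalized depth
    word0 = (ao8 << 24) | d24
    # 'e' is the IEEE 754 half-precision format; two of them fill the second word.
    (word1,) = struct.unpack("<I", struct.pack("<2e", slope_x, slope_y))
    return word0, word1

def unpack_pixel(word0, word1):
    """Inverse of pack_pixel, up to quantization error."""
    ssao = (word0 >> 24) / 255
    depth = (word0 & 0xFFFFFF) / 0xFFFFFF
    slope_x, slope_y = struct.unpack("<2e", struct.pack("<I", word1))
    return ssao, depth, slope_x, slope_y
```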

Cas: lovely

this means nice easy 2-level SSAO then?

Dan: Well it solves some problems for it.

For the large scale SSAO we REALLY want a blur

and this blur is precise enough that it doesn't ruin the small radius SSAO.

Cas: sounds like... all of the problems?

Dan: It opens it up a bit yeah.

I still need to add the normal support

Cas: that bit's easy surely

Dan: or rather now I need to output the slope properly

yeah but possibly expensive...

Cas: thought normals would be cheap as chips to write?

Dan: I managed to win back quite a lot of performance by reducing the main MSAA buffer from GL_RGBA16F to GL_R11F_G11F_B10F, halving memory usage there.

The slopes need half precision so another GL_RG16F multisampled buffer brings us back to the same amount of memory usage again.
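
A back-of-the-envelope check of that memory trade-off, assuming 1440p means 2560x1440, 8 samples per pixel, and no driver padding or compression. The helper name is made up for illustration; the bytes per sample follow from the bit widths in the format names.

```python
BYTES_PER_SAMPLE = {
    "GL_RGBA16F": 8,          # 4 channels x 16 bits
    "GL_R11F_G11F_B10F": 4,   # 11 + 11 + 10 bits
    "GL_RG16F": 4,            # 2 channels x 16 bits
}

def msaa_buffer_bytes(width, height, samples, fmt):
    """Raw size of a multisampled color buffer, ignoring driver overhead."""
    return width * height * samples * BYTES_PER_SAMPLE[fmt]

MIB = 1024 * 1024
before = msaa_buffer_bytes(2560, 1440, 8, "GL_RGBA16F")
color = msaa_buffer_bytes(2560, 1440, 8, "GL_R11F_G11F_B10F")
slopes = msaa_buffer_bytes(2560, 1440, 8, "GL_RG16F")
after = color + slopes

print(before / MIB, color / MIB, after / MIB)  # 225.0 112.5 225.0
```

So the smaller color format frees half the buffer, and the extra slope buffer spends exactly that half again.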

My biggest worry is that the depth aware MSAA upscale of SSAO isn't good enough and we need to take the slope into account

which will slow down the merge even more...

Hopefully it all works out well.

The reconstructed normal looks like shit in certain cases where there's just not enough info to reconstruct it.

Cas: it can't fail 😄

Dan: *nervous laughter*

Cas: heh

hopefully soon have all this fancy stuff behind us and on to particles or something soon then eh

Dan: Yeah hopefully.

Current timings: 

Frame 3650 : 4.573ms (100.0%)
    Camera 1 : 4.535ms (99.1%)
        Shadow map rendering : 0.397ms (8.6%)
            Shadow map 1 : 0.395ms (8.6%)
        Clear main buffer : 0.062ms (1.3%)
        Terrain rendering : 0.923ms (20.1%)
        Skybox : 0.001ms (0.0%)
        Post processing : 3.149ms (68.8%)
            SSAO rendering : 1.145ms (25.0%)
                Linearize depth : 0.226ms (4.9%)
                Generate depth mipmaps : 0.096ms (2.1%)
                Compute SSAO : 0.462ms (10.1%)
                Blur : 0.358ms (7.8%)
            Merge : 0.906ms (19.8%)
            Bloom : 0.652ms (14.2%)
            Tone mapping : 0.442ms (9.6%)

8xMSAA 1440p.

Linearize depth is technically not part of SSAO, we'll need it for particle blending anyway.

Cas: way over target spec then, and it's still doing good

1080p 4xMSAA is the high end i'm targeting

Dan: Hopefully I can successfully revert to the old branch to compare performance and see if there's anything I've missed.

1080p 4xMSAA:

Frame 3640 : 2.52ms (100.0%)
    Camera 1 : 2.475ms (98.2%)
        Shadow map rendering : 0.391ms (15.5%)
            Shadow map 1 : 0.389ms (15.4%)
        Clear main buffer : 0.019ms (0.7%)
        Terrain rendering : 0.632ms (25.1%)
        Skybox : 0.001ms (0.0%)
        Post processing : 1.43ms (56.7%)
            SSAO rendering : 0.695ms (27.5%)
                Linearize depth : 0.102ms (4.0%)
                Generate depth mipmaps : 0.07ms (2.8%)
                Compute SSAO : 0.294ms (11.7%)
                Blur : 0.224ms (8.8%)
            Merge : 0.289ms (11.4%)
            Bloom : 0.295ms (11.7%)
            Tone mapping : 0.147ms (5.8%)

Cas: bearing in mind your card is pretty much top of the range

Dan: I mean there's the GTX 1080 TI but yeah lol

2.5ms on my card for the highest settings is pretty nice though.

Cas: 960 is sorta the smart gamer's choice

Dan: Hey!

lol

Cas: sensible money

Dan: HEY!!! >=((((

Cas: well if you've got money to burn ... 😄

Dan: HEYYYYYYYYYYYYYYYYYYY!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

lol

Trying out the lowest postprocessing/MSAA settings:

660FPS 75% GPU load

Frame 9505 : 1.07ms (100.0%)
    Camera 1 : 1.033ms (96.5%)
        Shadow map rendering : 0.368ms (34.4%)
            Shadow map 1 : 0.367ms (34.3%)
        Clear main buffer : 0.021ms (2.0%)
        Terrain rendering : 0.388ms (36.2%)
        Skybox : 0.0ms (0.0%)
        Post processing : 0.248ms (23.2%)
            Merge : 0.031ms (2.9%)
            Bloom : 0.137ms (12.8%)
            Tone mapping : 0.077ms (7.2%)

1080p

Cas: no MSAA?

Dan: Nope.

Could maybe cut down that time by a bit more by reducing shadow map resolution

or disabling shadow maps completely.

Cas: plenty of options on the table.

Dan: Then we're talking ~0.7ms on my card.

Cas: bear in mind we've got to add 10000 particles and a few thousand models too 😃

well few hundred realistically

Dan: 10k particles shouldn't be a major issue. Overdraw is the bigger issue to worry about here.

1k models shouldn't be a major issue either, we're talking vertex limitations at that point.

Oh those above minimum numbers were still with bloom so that's another 0.15ms or so gone.

Prolly 0.5ms at absolute lowest.

Even lower if you drop the resolution even more.

Cas: pretty remarkable

so normals tonight...?

Dan: We'll see I gotta take care of some more school crap

Cas: doh

I'll do my little tasks anyway

Dan: but one of my 3 projects ended up being pushed back until saturday and it's pretty much 75% done already

so I've got a bit more time than I was expecting which is nice.

Note that the ultimate performance will depend on the vertex count a lot.

You may want to have simplified models for the bots

not as LOD but rather as complete low-end alternatives.

Get what I mean?

If we determine that they are the major performance hog it may be worth it even though it'd require more resources.

Cas: probably not much need - they're going to be rather simple really

Dan: Could you maybe make an example robot so we can start with the robots at some point?

Cas: your typical bot has to fit into 16x16x32 ish voxels

Dan: I'd like to see how many tris it ends up being and so.

Cas: Ill make a simple one

maybe see if @Chaz will draw one for us

Dan: Cool.

Hehe

Just tried out the game at 1440p, 8x SUPERsampling, 16k x 16k shadow maps and SSAO

80+ FPS

Cas: bah

Dan: Crazy settings. xD

---

Dan:

Hey sorry been in school all day as we had a project deadline and presentation

and another one coming up tomorrow...

It's super stressful and I'm not sure what I can finish until tomorrow

Cas: No worries. Don't panic.

Dan: When exactly is the deadline for the blog post tomorrow?

I'm busy until 19:00 with projects AT LEAST

Cas: No particular time. Could leave it till Sunday

Dan: and probably going out for dinner afterwards to not die x___x

Cas: What we do have to do though is dial back the sprint contents to be more realistic taking into account school etc

I did a crap little robot btw

G101.vox

Not textured

And only one vox

Dan: Textured meaning...?

Cas: Well coloured

It's just grey

Dan: There would be small discrepancies in performance in that case as it doesn't need to sample a texture

but it should be minimal.

Cas: Well I will colour it a bit then

Really I need to separate the legs into a separate model

That'll be more representative

Dan: You may want to use some kind of special marker palette index to act as a face hider

So the top of the robot's leg section doesn't get faces that are never visible.

Get what I mean?
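
A rough sketch of the face-hider idea, assuming a voxel model stored as a dictionary from coordinates to palette indices. The marker index 255 and the function name are hypothetical, and this counts unmerged faces (two triangles each) rather than running a real meshifier.

```python
EMPTY = 0
HIDER = 255  # hypothetical palette index reserved as the "face hider" marker

NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def count_faces(voxels):
    """Count visible faces; voxels maps (x, y, z) -> palette index, missing cells are empty."""
    faces = 0
    for (x, y, z), index in voxels.items():
        if index in (EMPTY, HIDER):
            continue  # hider voxels never emit faces themselves
        for dx, dy, dz in NEIGHBORS:
            neighbor = voxels.get((x + dx, y + dy, z + dz), EMPTY)
            if neighbor == EMPTY:
                faces += 1  # only faces next to empty space are emitted
    return faces

legs = {(0, 0, 0): 1, (0, 0, 1): 1}
legs_with_hider = dict(legs)
legs_with_hider[(0, 0, 2)] = HIDER  # marker sitting where the torso attaches

print(count_faces(legs), count_faces(legs_with_hider))  # 10 9
```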

Cas: Well... Yes. But hard with voxels

More trouble than worth.....

Also we blow the tops off sometimes and just leave smoking boots

Or have wreckages lying around

Dan: If we can save a decent chunk of triangles by doing it it could be the difference between the game running well and not.

Cas: For the small number of models on screen it's not worth it

Dan: Anyway one test model is good enough for future purposes.

Cas: Yup

I'll separate the legs

Dan: Gonna fire it up and check out the model real quick

It's not committed?

Cas: It's in the trunk branch

Dan: gah

Cas: Silly me

Dan: ... can you send it to me? x___x

I gotta get back to sadder and less fun stuff soon..

Cas: Yeah in a bit you go do school stuff first fun stuff later

Dan: OK see you sunday evening lol

Cas: Heh

Dan: I'm waiting for a response from someone so I got a moment to spare

Cas: Ok sec

https://cdn.discordapp.com/attachments/380372019251773444/388420012559302657/g101.vox

chop that in 2

as in separate the legs into a separate model

that's how 90% of the robots will be in the real game

some will be bigger

some made of more bits some less

Dan: It's much simpler than I expected tbh

Like smaller

Cas: told you 16x16 base for rank-and-file robots. Small.

if we were using 32x32 voxel tiles they'd be bigger

but the map would be 1/4 the area

Dan: If you export that as .obj what's the triangle count?

Cas: having to download and install Blender just to find out...

Dan: oh blender can do it?

Cas: probably

Dan: testing myself...

240 tris

but the triangulation they're doing is better than mine

Cas: even with 2000 of them on screen then thats only 480k

Dan: It'd probably be 300-350 for mine.

that's almost a million tris

well 600-700k

Cas: aye but thats a) the biggest battle the system allows and b) more robots than can fit on the screen even at max zoom out

Dan: Remember we got shadow maps to render too.

Cas: realistically we'll be dealing with a few hundred robots tops

and in most cases only a couple of hundred

however there will also be a few hundred small models scattered about....

trees rocks etc

Dan: we're at 2250k tris right now with 3 lights.

Cas: 3 lights?

Dan: 2k robots x (1 cam + 2 shadow maps) = 2000*700*3 = 4 200 000

2000 x 700 x 3

discord formatted that due to *

Cas: `use backticks`

Dan: err?

Cas: `200*300*400`

Dan: 'err

''' err

Cas: single backtick `

then end with another `

for oneline snippets

Dan: yah don't got a key for that so it's annoying as fuck to write.

Cas: triple for multiline

oh yeah

i remember

doh

Dan: Need like 5 presses to write it

one

ANYWAY

Cas: ANYWAY

Dan: it's potentially a lot of tris is what I'm saying

Cas: don't worry

nah

you worry about the pathologically worst case

there will not be more than a couple hundred robots onscreen

plus another er say 100 smallish models

another 2.5m tris tops

Dan: 2.5 mill???

not that much of an improvement over 4.2 mill...

Cas: 500 models 500 tris each

Dan: = 250k?

Cas: yeah sorry 😃

x 3

Dan: *3 = 750k at worst?

Cas: for camera & lights

yeah

so that doesn't look too bad

Dan: That gives us a 750k high end, 250k low end.
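
The triangle estimates traded above all follow one formula: models times triangles per model times the number of passes (one camera pass plus one per shadow map). A small sketch with the figures from the chat; the function name is just for illustration.

```python
def triangles_per_frame(models, tris_per_model, shadow_maps):
    """Every model is drawn once for the camera and once per shadow map."""
    passes = 1 + shadow_maps
    return models * tris_per_model * passes

worst_case = triangles_per_frame(2000, 700, 2)    # pathological battle
typical_high = triangles_per_frame(500, 500, 2)   # realistic scene with shadows
typical_low = triangles_per_frame(500, 500, 0)    # same scene, shadows off

print(worst_case, typical_high, typical_low)  # 4200000 750000 250000
```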

I think we can aim for and achieve <1m tris without shadows.

Cas: but but bu... we want shadows

Dan: I meant FOR mega low-end

Cas: oh right

Dan: The worst case scenario doesn't matter much if you can reduce the graphics settings

Cas: yeah no worries

Dan: It's good that the settings max out a modern GPU

means that you can throw more juice at it and get something out of it

Cas: bear in mind by the time we get to release this damned game everything will be 1.5x faster...

Dan: A very small fraction of people will have the absolute newest cards though

the biggest gain is the old hardware being phased out

Cas: yeah the AVERAGE computer will be 1.5x faster

Dan: hmmm...

Cas: we're already seeing dualcores disappearing

Dan: Well we'll see

Cas: http://store.steampowered.com/hwsurvey

Dan: I gotta get back to school stuff

Cas: look at the ends of the graphs

kk

Dan: It sounds like it can work out decently.

Cas: it will look amazing

Dan: I don't think we can get any better triangle generation than what I've already managed to get btw

(The .obj is better technically)

well at least not without wasting a lot of memory.

The texturing is based on quads

Cas: yeah

Dan: Hmm.

It could be doable I guess...

Is there any way to turn a model into a single voxel type in MagicaVoxel?

AKA turn a voxel model into a 1-bit model where each voxel is either on or off?

Cas: erm

no but its a trivial operation

Dan: You can use the Paint tool to do it, it seems.

If we do that we can use MagicaVoxel's obj exporting to get a seemingly perfect mesh.

Then generate textures for it using the colored .vox model + the .obj model as input.
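
The 1-bit conversion really is a few lines if done outside the editor. A minimal sketch, assuming the model is already loaded as a dictionary from coordinates to palette indices with 0 meaning empty; loading and saving .vox files is not shown and the function name is made up.

```python
def to_one_bit(voxels, solid_index=1):
    """Replace every non-empty palette index with a single index."""
    return {pos: solid_index for pos, index in voxels.items() if index != 0}

colored = {(0, 0, 0): 5, (1, 0, 0): 17, (2, 0, 0): 0}
print(to_one_bit(colored))  # {(0, 0, 0): 1, (1, 0, 0): 1}
```

The colored original is kept around untouched, since it is still needed as the texture source for the optimized mesh.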

Cas: if you want to look at how it does it just go look at the source and port it to java

Dan: It might waste a tiny bit of memory but it's a drop in the ocean compared to the terrain.

If we can save 25% tris that way that'd be huge.

Going down to 240 tris instead of 300+ would be great.

Cas: we can preprocess vox models using some optimiser to our hearts content if we want

Dan: The vox models?

Oh you mea

yeah I get it

Yeah we got infinite time here.

Did a quick search, couldn't find MagicaVoxel source code

is it really open source?

Cas: it used to be, it seems... now it's vanished, hmm

still the algorithms i think are well understood

I bet your meshifier isn't far off optimal already

despite what you think

Dan: It is though

for some cases

A voxel meshifier may be good enough to fix those problems though

err

a mesh optimizer*

---

Dan: I woke up with a good idea today.

Cas: you home now?

Dan: The problem with MSAA is that postprocessing needs to be supersampled.

Nope on my way to school

Cas: doh

Dan: Won't be home until late evening

Anyway

Postprocessing per sample is expensive

Extra memory usage for intermediate buffers is very bad too.

Remember my initial idea?

Cas: er no

Dan: Do a non-MSAA prepass, then (at about the cost of a shadow map read) read SSAO/transparency while drawing the terrain!

A non-MSAA prepass would cost around 0.3-0.4ms and ELIMINATE the merge pass.

Cas: well do that then 😃

Dan: The terrain is rasterizer limited so the cost there is almost 0.

Cas: I was trying to get streaming working properly last night using the old sort mechanism but after hours I just couldnt get the fucking thing to work. It kept leaving holes in the terrain and I don't know why.

---

Cas: Hola

Been out all weekend

You been up to much?

Dan: Haven't had time, last project deadline tonight...

Sorry just got finished with the last project...

Been one of the most hectic weeks of my life. x___x

Pretty sure "got finished" is an incorrect literal Swedish translation. xd

Cas: Heh

Well I'm off to bed

Maybe we can do a patreon post tomorrow evening?

Dan: What do you want finished for it?

If I know what I need to do exactly I can try to plan for that.

Cas: I dunno... Just the fancier ssao and msaa I suppose

That's all we were planning to show

But if it takes a bit more effort there's no point in rusbing

Rushing

Dan: So in other words just:

>two level SSAO

>adding accurate normals output from the rendering pass

>MSAA support for all of this

<____<

Cas: Er yeah 😬

Don't rush

Can't be helped

Dan: Do you know of a game called Factorio?

Cas: Yes

Dan: You seen their Friday Facts posts?

Cas: Nope

Dan: They just post work in progress stuff, optimization results, comparison images and stuff

I could probably at least get in accurate normals tomorrow. We could do a comparison of the two if you think it'd be interesting. >___>

I dunno how much the readers would appreciate it though...

Cas: I think it would be interesting

---

Cas: hiya

you home?

Dan: I am finishing up some school work...

Will need a bit more time...

Do you need anything urgently?

Cas: not urgently no just wondered if there was anything we could do this evening together on it

might take another swing at the streaming chunks problem

Dan: Sure I should have time in around 60-90 min and from then on.

Cas: ooh nice

Dan: Almost done here... q.q

Cas: yay

Dan: Alright I'm good to go.

Cas: so... normals writing then?

Dan: Well slope writing but yeah.

I'll probably do it in a prepass.

argh the hacks keep piling up

gonna take a step back and clean it up a bit

Cas: ohoh

yes

don't rush it with hacks

Dan: Gonna do no-MSAA first.

Cas: best to get it right

Dan: It's the cleanest one, no need for a prepass.

Hmm having trouble with MRT again...

Probably my fault

How do the fragment output annotations work with inheritance?

Cas: it should inherit

Dan: Found another problem that was the cause

Slope is being output correctly finally.

Cas: nice

Dan: zero measurable cost

Cas: annotations working properly then?

Dan: This is without MSAA.

Checking

I moved them to the last class so there was no inheritance

checking

Yes seems to be working.

Cas: I suspect I didn't account for inheritance but it should be a simple fix

Dan: It did work so no problem.

Looks like SSAO normals are fixed for no MSAA.

Cas: cool

I suppose MSAA is vastly more fiddly?

Dan: Yeah doing a prepass messed up the frustum culling.

I don't want to do it twice

so I'll need to store data for it to work...

Need to restructure the code a bit.

Cas: how does it mess the culling?

Dan: Well render() currently does both the culling and rendering.

I want to do culling once and render twice.

Cas: yeah see... update() is supposed to do stuff like culling

and then render() should do the rendering based on what was computed in update()

or at least thats how I've generally structured it everywhere else

Dan: which won't work with multiple terrains/cameras/shadow maps.

I'll ultimately want to do all culling (multithreaded) and store the result for each job

then pick the right result when doing the rendering later.
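
A rough sketch of that structure: cull once per view, keep the results keyed by view, and let any number of render passes reuse them. The chunk and view dictionaries and the one-dimensional "frustum" are stand-ins invented for illustration, not the engine's types.

```python
from concurrent.futures import ThreadPoolExecutor

def cull(view, chunks):
    """Hypothetical culling test: keep chunks whose x lies inside the view's range."""
    lo, hi = view["range"]
    return [c for c in chunks if lo <= c["x"] <= hi]

def cull_all(views, chunks):
    """Cull once per view (camera or shadow map) in parallel, keyed by view name."""
    with ThreadPoolExecutor() as pool:
        futures = {v["name"]: pool.submit(cull, v, chunks) for v in views}
        return {name: f.result() for name, f in futures.items()}

def render(view_name, visible, pass_name, log):
    """Stand-in for issuing draw calls: records what would be drawn."""
    log.append((view_name, pass_name, [c["id"] for c in visible[view_name]]))

chunks = [{"id": i, "x": i * 10} for i in range(5)]
views = [{"name": "camera", "range": (0, 20)}, {"name": "shadow", "range": (10, 40)}]

visible = cull_all(views, chunks)           # culling happens exactly once per view
log = []
render("camera", visible, "prepass", log)   # both passes reuse the same result
render("camera", visible, "main", log)
render("shadow", visible, "depth", log)
```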

Cas: yup

Dan: I know how to do it, it's just extra work which is why I'm whining lol

Cas: hehe

Dan: hmm need to make the blur check the slope too... =___=

getting ugly lines.

At the edge of each staircase step you get some bleeding.

You see it?

Cas: eww

Dan: (the checkerboard pattern is intentional, it's not visible.)

the problem is the bleeding there.

It will be 100x worse with actual MSAA as well since it'll stand out more.

Right now everything just looks like an aliased mess so......

You see the bleeding at least?

Right?

Cas: yes

can even see it when pic is shrunk down

Dan: Fixed it, no more bleeding like that.

No significant cost increase to the blur.

Will have to do the same thing when upscaling MSAA...

hmm, but I have no slope info then, fuck.

not per sample

Damn the (better) slope aware blur looks awesome with high radius SSAO.

open original and switch between the two.

Cas: its rather hard to see a difference except the 2nd one is like a shade darker...

Dan: Look at the in-shadow mountain walls.

Cas: ah yes!

Dan:

Cas: fantastic

Dan: Do you still want the small SSAO as well?

Cas: well....

hard to say till I see it for reals at home

Dan: I'll see if I can add it.

Cas: that last screenie seems to be the small radius SSAO though doesnt it?

Dan: No it's not.

It's still the same radius as the previous one.

Cas: oh

Dan: You do get some shaded edges of the bumps/holes in the ground as those surfaces are sloped.

The SSAO filters sample in a half-sphere going out from the surface.

For a vertical surface half the sphere is underground.

Hence the holes/walls get a lot of occlusion and therefore pop out.
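
To illustrate why sloped or vertical surfaces pick up occlusion near the ground, here is a small sketch that counts how many hemisphere sample directions dip below a flat ground plane. The Fibonacci-lattice sampling and the function names are assumptions chosen to keep it deterministic; the real SSAO kernel is not shown in the chat.

```python
import math

def hemisphere_samples(normal, count=64):
    """Fibonacci-lattice directions, kept only if they point away from the surface."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    samples = []
    for i in range(count):
        z = 1.0 - 2.0 * (i + 0.5) / count
        r = math.sqrt(1.0 - z * z)
        p = (r * math.cos(golden * i), r * math.sin(golden * i), z)
        if sum(a * b for a, b in zip(p, normal)) > 0.0:
            samples.append(p)
    return samples

def ground_occlusion(normal, count=64):
    """Fraction of hemisphere samples that end up below a flat ground plane z = 0."""
    samples = hemisphere_samples(normal, count)
    below = sum(1 for p in samples if p[2] < 0.0)
    return below / len(samples)

flat_ground = ground_occlusion((0.0, 0.0, 1.0))    # normal points straight up
wall_at_base = ground_occlusion((1.0, 0.0, 0.0))   # vertical wall meeting the ground
```

For the upward-facing ground no sample goes underground, while for the wall roughly half of them do, which is the darkening described above.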

That's kinda why I asked about it.

But sec almost got it working with two radii

It's unoptimized but...

When are you gonna be home?

Could use some discussion on this

Cas: leaving work in 20 mins so back home in an hour or so from now... but I'm making dinner tonight so I'll be a bit busy for a few hours

Dan: Able to test stuff right now?

Cas: you in the test branch still?

Dan: err yeah

Cas: ill check it out here at work ...

Dan: Pushed Voxoid.

It shows small scale SSAO on the left side of the screen, large scale on the right side and both combined in the middle right now.

So you can see which algorithm does what.

Cas: woo neat

sec

Dan:


Cas: yeah combined does look good

Dan: Not much different to large scale only though

the only real difference at all is the small holes in the ground

which we probably won't have in the end.

Cas: well there might be a few

here and there

Dan: We pretty much need twice as many samples for it to look good...

Cas: what level MSAA is that?

Dan: none, still not implemented with MSAA

Cas: uh

looks quite good already then which is promising

Dan: but it's rendered at 1440p so if you don't show it at full res it's pretty much supersampling

This is with 32 samples, 16 on each of the two levels

Hmm wait.

There's a bug in how I divide the samples...

sec

Cas: well im gonna get ready and go home now

ill be back online a bit later

Dan: Alright

fuuuuuck.

School just pulled another quick one on me and it seems like there's a risk I can't start on my master's thesis in January

I'm gonna need to focus more on my school work during christmas...

Cas: gahhh

i'm going to go and cook dinner

Dan: Pushed a small update fixed some SSAO issues.

Gotten a chance to test it yet?

Cas: Sec

Wrangling

Dan: WHY YOU LITTLE----

Cas: _syncs_

shimmers a lot without msaa!

Dan: sure does

Cas: hm or is it the MSAA

zoomed in it still shimmers as the mouse is moved

needs more samples?

Dan: Mouse is moved?

Cas: yeah the mouse light - turn it back on and see

Dan: That's the mouse light's shadows that shimmer.

Needs higher res shadow map and a bit more filtering.

I turned it off for that reason.

Cas: are you sure?

Dan: Hold G to turn off the SSAO.

Cas: coz its shimmering on a flat surface that's not got shadow

hm yes still does it with SSAO off

but looks oddly like its not the shadow

Dan: It's probably at an edge.

You're looking at the side of a mountain right?

The shadow map res is fairly low right now.

Cas: ah maybe it is

Dan: 2k x 2k

4k x 4k + filtering should be fine.

I think we can both agree that the large scale SSAO looks really good

but I'm not convinced the small scale SSAO is worth it.

Compute SSAO : 0.716ms (31.4%)

Half of that is the small-scale SSAO so if we only use the big-scale SSAO that'd be 0.35ms.

Entire frame:

Frame 73423 : 2.278ms (100.0%)
    Camera 1 : 2.24ms (98.3%)
        Shadow map rendering : 0.193ms (8.4%)
            Shadow map 1 : 0.192ms (8.4%)
        Clear main buffer : 0.018ms (0.8%)
        Terrain rendering : 0.428ms (18.7%)
        Skybox : 0.0ms (0.0%)
        Post processing : 1.597ms (70.1%)
            SSAO rendering : 1.281ms (56.2%)
                Linearize depth : 0.062ms (2.7%)
                Generate depth mipmaps : 0.106ms (4.6%)
                Compute SSAO : 0.716ms (31.4%)
                Blur : 0.392ms (17.2%)
            Merge : 0.184ms (8.0%)
            Tone mapping : 0.129ms (5.6%)

Cas: the one on the left is small scale?

Dan: Left is small scale, right is big scale.

(middle is both)

Cas: i have to say the middle one is the best

has the most detail

Dan: I don't think it adds too much but OK

We could add quality setting for it if you want.

Low quality SSAO = large scale only.

or something like that.

Cas: could even crank the effect up a bit - might play with the figures

Dan: It's hard to tweak the two individually now though.

They're all accumulated together.

I can separate them I think.

I'll disable the test mode too and let you tweak it

Cas: k

and i suppose that the MSAA version is next and is gonna be a bugger?

Dan: It'll take a bit of time yes.

I'm sorry shit has really hit the fan here.

Been throwing emails out all evening...

Cas: dont worry

im gonna hit the hay I think anyway... knackered, up at 6:30am every day to go to work

Dan: Oh

=<

You got like 10 more mins or so...? .__.

Cas: sure

Dan: Pushed

Re-enabled bloom, added separate strength variables to SSAO.

See ssao.frag.

Cas: k

Dan: Too much SSAO makes the whole thing look dirty I guess.

Especially look at the mountains in direct sunlight.

Cas: jacking up the wide radius SSAO strength makes it look nice

still... we can fiddle with that when theres real graphics

---

Cas: dont worry if you think school sucks just imagine being at work

Dan: work only sucks at work

school sucks no matter where you happen to be at the moment

it never leaves you

I saw someone had written "Drop out while you still can!" on a whiteboard at uni today

Cas: lol

