regolovr

Regolo Engine S 1.40

Added 2022-05-01 01:52:01 +0000 UTC

Hi guys,

Those of you who read the latest release posts knew some important changes were in the air, and in fact here we are. Regolo Engine gets an additional "S", focuses on Simulation and starts a new cycle. It begins with the features that have been quieter during these last months: audio and communication.

Besides this, reworking the engine to isolate the Simulation part was no joke. Also, the release number means that there's a leap forward, but it's still not complete. You will see other advancements like this one soon, hopefully.

I hope you will enjoy RES 1.40 in the meantime.

Let's go check the new stuff!

A VERY important notice before going on

This is a beta version that uses external software and (optionally) connects to the Microsoft cloud platform, where you could be billed for the services used. Please check the Azure Cognitive Services free tier and billing terms and conditions before using Regolo Engine S with it.

The Regolo Engine software and its developer are not accountable for any billing issue or unexpected payment arising as a consequence of its usage.

This is an unfinished product and a beta version, and it doesn't contain any mechanism that can prevent Microsoft from charging you any amount of money.

All the details I know and can share to help you are in the related paragraph.

The video

RES 1.40 video full HD

What's new

Here's an overview of the new stuff; check the next paragraphs for insights and instructions on some of it.

Simulation 1.40

simulation is the base behaviour
girl can speak freely in real time (no prerecorded audio)
girl can modulate her voice with different intonations
girl can move her mouth in sync with free speech
girl can better understand the context
girl can formulate English sentences with a good degree of freedom (simple ones still)
girl can use gestures to underline what she wants to communicate
new idle is more dynamic and linked to conversational moves

new features

grammatical module to build English sentences in real time related to the context
Regolo Talking Machine (external software) to connect to Azure Cloud (for Enchanter)

new extroversion stat changes the talking rate (only while walking for now)
new animaSim preset "talker" sets a quite talkative and curious personality

misc

final step while walking is smoother now
added simExtroversion to the config file to set the starting extroversion value

audio (enchanter)

free real time speech through Azure Cloud
real time mouth sync
girl can speak freely and will produce voice regardless of the sentence type

fixes

fixed major 1.30 bug that prevented girl from starting to walk again after interaction
removed wrong idle facial expressions (mouth too open and eyes closed)
she can now always climb max height steps if you take her by the hand

current known bugs

Regolo Engine S still shares the same folder and config file names as the standard engine
she cannot stop acting while speaking/chatting, even if she is grabbed
talking machine is still an early version
a lot of various issues still occur in the different modes, like bad collisions and stuff

As always, if you notice something that is not working properly, please let me know and I will try to fix it asap.

It's a leap of faith

I had some fears about it, but I think the time was right for this jump.

I left behind the old engine, which will be available in 1.30 and past releases only. It was kinda sad, but I decided some time ago that Simulation is the real thing and that legacy/wanderer belong to the past.

This way a lot of useless stuff can be removed and other parts can be simplified or converted to the new status with ease. Also, it's simpler to add things. This is not a one-shot task, since I have to get rid of months (years?) of coding, so I will probably keep cleaning up stuff over the next releases.

Anyway, now the user can experience the Simulation mode directly, without trying to reach it through complex and infuriating settings.

Also, the girl UI and the plugin menu now work together and are linked to the same experience. Before, they set options related to different gaming modes, which created confusion and made the plugin hard to approach.

Then, the focus on Simulation allowed me to start adding contextual and conversational skills. The girl has started to express herself more freely through chat and audio, building sentences linked to her contextual understanding.

For example, she is able to express an opinion about:

how many steps she has taken
the last song she danced
the hour of the day
if it's morning or night
if she is tired and needs to rest
if she climbed something and she is up
if boy is far
if boy is kneeling

She also expresses herself on other general topics.

In the end I think this was one of the biggest releases ever. I worked a lot on it and I hope you will appreciate the execution and in particular the vision behind it. This was possible because the engine is developed entirely by me, from the first line of code to the last one, which gives me complete understanding and control over its different components.

I didn't work much on the explorations and interactions, for obvious reasons, but they will be improved as soon as possible as well. They are still the same as in 1.30 at the moment.

Please be aware that Regolo Engine S shares the same folder and config file names as the standard engine. This could lead to errors. I suggest moving all the old files to a backup folder before copying the new ones.

Spiritually, ecumenically, grammatically.

So I wanted the girl's communication skills to improve, or better, to be part of the experience, so I focused on them. I started to think about syntax and semantics, about what it really means to "talk". I started teaching the girl what it means to talk and the rules that are at the foundation of language. It was an inextricable puzzle at the beginning, but it slowly started to take shape.

This is the first step in that direction, a limited deterministic way of building sentences, hence she still has very limited talking abilities. As you can imagine, it's very hard to give real context awareness to a virtual doll, and it's even harder to give her the language capabilities to express it.

Also, this ability leverages the engine's complexity and all the information that is gathered in a centralized way while it is running and the girl is doing her stuff. The girl has space and time information, along with perception data, so she can definitely tell something about them. For now this information is too limited and is interpreted linearly, but it can obviously increase in number and density.
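To give an idea of what "a limited deterministic way of building sentences" from context can look like, here is a minimal sketch in Python. It is not the engine's actual code: the `Context` fields, the thresholds and the sentence templates are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Context:
    """Hypothetical snapshot of what the girl knows (field names are illustrative)."""
    steps_taken: int
    hour: int              # 0-23, local time
    tired: bool
    boy_distance_m: float  # distance from the boy, in meters


def part_of_day(hour: int) -> str:
    """Map an hour to a coarse part of the day (boundaries are an assumption)."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"


def build_sentences(ctx: Context) -> List[str]:
    """Turn context facts into simple English sentences, always in the same order."""
    noun = "step" if ctx.steps_taken == 1 else "steps"
    sentences = [
        f"I have taken {ctx.steps_taken} {noun}.",
        f"It is {part_of_day(ctx.hour)}.",
    ]
    if ctx.tired:
        sentences.append("I am tired and I need to rest.")
    if ctx.boy_distance_m > 5.0:  # "far" threshold is an assumption
        sentences.append("You are far from me.")
    return sentences
```

The same context always produces the same sentences, which is what makes the approach deterministic and also what makes it feel limited: more variety needs more facts and more templates.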

Also, let me vent a bit about going back to studying English and building a conversational model on it. I wanted to kill myself sometimes 😂. I'm pretty sure you now understand why this release took so long to be released. All the stuff in here took away a huge amount of time spent on tests and study. I felt like I was back in school. Also I'm definitely not the best English speaker around here (and I'm being nice). You will notice some sentences with bad grammar. It could depend on the software or it could depend on me. In any case, let me know if I can improve the sentence building or if I am doing it completely wrong.

I also don't know if this is a semantic model I'm going to keep. It has a lot of weaknesses and it's not very robust. I created it with the purpose of showcasing the infinite possibilities of free speech, but real verbal communication is another thing, obviously.

I feel I could change it, but I'm not sure about it, we will see. I would like to have something able to scale well with more language freedom, so I will probably try to figure out something else at a certain point.

Open your mouth. Say, "Ah."

I felt that body language is as important as the spoken one, so I made an extra effort.

First of all, I added mouth movements synchronized with the Azure text to speech. It adds a lot to immersion.

There are also new body postures and hand movements that accompany the speech, in order to make the communication more real. As with the speech, they are not fixed but can vary depending on the construction and meaning of the sentence.

Unfortunately there are no facial expressions aligned to the message yet, so you will see negative speech paired with a happy expression, etc. This is a hole I hope to fill when possible.

It's talking, Merry. The tree is talking!

Finally, as I anticipated some builds ago, I worked on a new way to add sounds to conversation and bring it to a whole new level. I tried a lot of things, but using an external app is the only feasible solution I came up with.

I leverage the Azure speech service, and unfortunately VaM doesn't allow external calls or bundling DLLs that use these libraries, so I had no choice.

So basically the new tool just waits for Regolo Engine S to send it some speech info and queries the Microsoft service to generate the requested speech in real time. This is a game changer for me. No more prerecorded audio, just the expressive skills of the girl and the infinite possibilities linked to a powerful AI cloud platform like Azure.

Obviously this is quite tricky to set up (spoiler: you need an Azure account) and it has some limitations (the free tier allows half a million characters per month), but the benefits are worth the hassle in my opinion.

Also, this is for Enchanter version only, as usual for the audio features.

Anyway I repeat the key point here: the girl builds a sentence and pronounces it in real time with a good range of intonations, syncing the mouth as well 😍.

How to improve this and make it evolve? Well, for sure it's possible to give even more color to the sentences depending on the intention, using the Microsoft cloud power. This is something that I didn't have the time to do in this release, but I could work on it for the next ones.

Also, as I said, the model is very simplistic; there's a need for something that would offer more evolved conversational skills. But this is not something that can be built in a short time. It needs a lot of experience and trial and error, so it will be a process more than a release feature.

For sure the current model can be expanded with more awareness, more knowledge (from the girl's point of view), more occasions for speech. Also a lot of work could be done on aligning facial expressions with speech.

On the negative side, this release made me realize even more clearly the limits of working with VaM. I struggled a lot searching for a solution to the free speech problem, and I had to give up other cool ideas. VaM has too many restrictions and it makes it hard to connect to external services, so it basically limits interoperability and hence creativity by a lot. I am kinda disappointed by these findings and I think this is detrimental to the project at a level I probably have yet to fully realize.

So in the end, revolution or missed opportunity/chance? Who knows, but I'm sure this is in the direction I imagined when I wrote the first line of code of the engine, so it's good like this. We will see where this will lead though 😎 .

The Regolo Talking Machine

Ok, let's talk about this new piece of software available for the Enchanter version. This is the external tool that asks Azure Cloud to perform a speech synthesis and receives the answer.

If you want the girl to speak using the Azure cloud, you need to have this program running. First you need to enable it by entering valid Azure credentials. This is explained better in the next paragraph.

Then you simply execute it from the Regolo Engine S folder. A window will open and tell you that it's waiting for instructions from the engine, as in the picture below.

It basically waits for an input from the Regolo Engine S and assembles the proper request for Azure. It works with text files so you will see some new stuff appearing in the Regolo Engine S folder.
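For the curious, a handoff through text files between two programs can be sketched roughly like this. This is only an illustration of the general technique, not the real protocol: the file naming (`speech_request_NNNN.txt`) and both function names are hypothetical.

```python
from pathlib import Path
from typing import List

# Hypothetical naming scheme for request files dropped in the shared folder.
REQUEST_GLOB = "speech_request_*.txt"


def write_request(folder: Path, index: int, sentence: str) -> Path:
    """Writer side: drop one sentence into the shared folder.

    The text is written to a temporary name first and then renamed, so the
    reader never picks up a half-written file.
    """
    final = folder / f"speech_request_{index:04d}.txt"
    tmp = folder / (final.name + ".tmp")
    tmp.write_text(sentence, encoding="utf-8")
    tmp.replace(final)
    return final


def collect_requests(folder: Path) -> List[str]:
    """Reader side: read pending sentences in name order and delete the files."""
    sentences = []
    for path in sorted(folder.glob(REQUEST_GLOB)):
        sentences.append(path.read_text(encoding="utf-8"))
        path.unlink()  # consumed, so it is not spoken twice
    return sentences
```

A reader built this way would call `collect_requests` in a loop with a short sleep. Leftover files from a crash are one plausible reason why a clean restart can help in a setup like this.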

You can execute it and leave it running even if you start the plugin multiple times (but not if you restart VaM). It will listen to the engine, waiting for new speeches.

If it quits for any reason or VaM crashes, I suggest starting it again and closing it right after, so that it can repair the messy state with a clean exit. In those cases it can in fact cause the plugin to run really slowly; if you think you're in one of those cases, open and close it again to clean things up.

I know a new piece of software is quite a surprise at this point, but I didn't find any other way of connecting to an external service, so it was a necessary step. Anyway, using it is not mandatory, so if you want you can avoid it or wait for a more stable version.

Also, all the caveats and recommendations you will read in the next paragraph are valid here as well. So please always consider that this is a beta product and that the software is used to connect to a Microsoft service.

Azure is the sky

So here we are, the technical paragraph with all the details about Regolo Engine S connecting to the Azure Cloud platform.

First, why Azure? Well, I tried other platforms and I think Azure is the one that suited me best. I appreciated the connection system, I liked the communication formats, I enjoyed the features and I loved the voices. The other ones always gave me some trouble instead, so Azure was a no-brainer.

Regolo Engine S doesn't try to connect to Azure itself; instead it tries to communicate with the Regolo Talking Machine program if:

it is running
you have flagged it as enabled in the config file

Otherwise Regolo Engine S just uses the dialog box and that's it.

Then, if you launch the Regolo Talking Machine and enable it in the config file, you must have specified valid Azure text to speech key1 and region parameters, otherwise the RTM fails to connect (there's a step by step guide to set and retrieve them below).

If you have done each of the previous steps correctly, then the RTM sends a request to the Azure cognitive speech service for every sentence created by Regolo Engine S (well, all the new sentences; the old fixed ones like "ooops" or "hanging around" are not in the right format).
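As a rough idea of what such a request looks like, here is a sketch in Python that builds (but does not send) a call to the Azure text to speech REST endpoint. It is not the RTM's actual code, which is not shown in this post. The function names, the default voice `en-US-JennyNeural`, the output format and the `User-Agent` value are choices made for this example.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_ssml(text: str, voice: str = "en-US-JennyNeural",
               pitch: str = "medium", rate: str = "medium") -> str:
    """Wrap a sentence in SSML; pitch and rate are where intonation could be varied."""
    return (
        "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' "
        "xml:lang='en-US'>"
        f"<voice name='{voice}'><prosody pitch='{pitch}' rate='{rate}'>"
        f"{escape(text)}"  # escape &, < and > so the XML stays valid
        "</prosody></voice></speak>"
    )


def build_request(text: str, key: str, region: str) -> urllib.request.Request:
    """Prepare the POST request; key and region come from the Azure resource page."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
        "User-Agent": "tts-sketch",  # arbitrary client name for this example
    }
    return urllib.request.Request(
        url, data=build_ssml(text).encode("utf-8"), headers=headers, method="POST"
    )
```

Passing the result to `urllib.request.urlopen` would return the audio bytes, and every call of that kind counts against the character quota discussed below.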

Please consider that when you start using the Azure cognitive service as specified in my step by step guide below, you have half a million free characters available monthly, as specified here (https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/), so a good number of words on average, but not infinite.

I spoke with support and it seems that after a month you have to upgrade from the trial account to the pay-as-you-go one, but basically the free quota remains, so if you don't exceed it there should be no risk. Probably there's a way to create alerts and thresholds to avoid bad surprises at the end of the month, making sure to use just the free quota. I'm still learning about it.

Before proceeding, please read the important notice at the top of this post once more and think twice before going on, since Microsoft is an external entity and I cannot be held responsible for any amount of money they decide to charge you.

This build in particular is rudimentary and it doesn't provide any protection mechanism, so it should be used for testing purposes only!

Anyway, if you read this far and didn't run away in fear, here is the step by step guide on how to make an Azure account, create a text to speech resource and get the key and region info 😁 .
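Until proper alerts are in place, a very rough local guard is simply to count characters before sending them. The sketch below is not part of this build (which has no protection mechanism), and the class and method names are made up. Azure may also count billable characters differently, for example by including some markup, so treat the number as an estimate only.

```python
FREE_CHARS_PER_MONTH = 500_000  # free quota mentioned above


class CharBudget:
    """Hypothetical local counter that refuses sentences once the quota is used up."""

    def __init__(self, limit: int = FREE_CHARS_PER_MONTH, used: int = 0) -> None:
        self.limit = limit
        self.used = used

    def remaining(self) -> int:
        """Characters still available this month (never negative)."""
        return max(self.limit - self.used, 0)

    def try_spend(self, sentence: str) -> bool:
        """Reserve the sentence's characters; return False if it would exceed the limit."""
        cost = len(sentence)
        if self.used + cost > self.limit:
            return False
        self.used += cost
        return True
```

As a back-of-the-envelope figure, assuming an average of 40 characters per sentence, 500,000 characters is about 12,500 spoken sentences per month.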

Azure step by step guide

1. go to the Azure free account portal: https://azure.microsoft.com/free
2. click on the "Start free" green button
3. login with your Microsoft account or create a new one
4. fill in the requested information (name, address, ...)
5. enter the credit card information
6. once the account creation process has finished you will be redirected to the quickstart center: https://portal.azure.com/?quickstart=true
7. in any case, wherever you are, you can easily find the vertical drop down menu in the upper left corner. Open it and click on "Create a resource"
8. search for "Speech" in the search field, select it and then click on "Create" on the next page (the one with the service details)
9. on the creation editor you need to enter:
   - the subscription (the current one should be already selected)
   - the resource group (click on create new one and just choose a name)
   - the location (e.g. West US)
   - the pricing tier (choose Free F0) <-------------- very important
10. click on validate and create, and then on create again; the deployment will start and the text to speech resource will be created
11. congratulations, your resource has been created, now you can go to the resource page (there are many ways to do that: from the home, from the dashboard, from the vertical menu or from the blue "go to resource" button that appears after creation)
12. on the resource page, click on "keys and endpoint" on the left menu
13. copy key1 and location and put them between quotes in the related fields of the "talkingMachine" section in the regoloEngine_E.json config file. Also don't forget to set the "enabled" field to "true"
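After the last step, a quick sanity check of the config can save some head scratching. The sketch below is only an illustration: the "talkingMachine" section and the "enabled" field come from the guide above, but the field names `key` and `region` are assumptions, so use the names you actually find in your regoloEngine_E.json.

```python
import json
from typing import List


def check_talking_machine(config_text: str) -> List[str]:
    """Return a list of problems found in the talkingMachine section (empty if fine)."""
    section = json.loads(config_text).get("talkingMachine")
    if not isinstance(section, dict):
        return ["missing talkingMachine section"]
    problems = []
    # Accept both the JSON boolean true and the quoted string "true".
    if str(section.get("enabled")).lower() != "true":
        problems.append("enabled is not true")
    for field in ("key", "region"):  # hypothetical field names
        value = section.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field} is empty")
    return problems
```

For example, you could read the file with `Path("regoloEngine_E.json").read_text(encoding="utf-8")` (after `from pathlib import Path`) and pass the text to `check_talking_machine`; an empty list means the section looks filled in, not that the key is actually valid.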

In the end

This was really a make or break kind of release, and we came really close to death 😂. But I really wanted it, because I have dreamed about free speech since the engine's inception.

I'm sorry for this long wait though. I'm moving into complex territory and, more often now than in the past, it's impossible to deliver something meaningful in 4 weeks. I'm sure it's clear now why I was venting a lot 😋 .

Next release I will try to make the experience more robust and to make the contextual system evolve. I'm planning other good stuff but I need time, obviously.

Tomorrow I will be out of town the whole day but I will check and answer in the evening, so if you have any issue with the new build please let me know.

Have fun, see you in a few weeks!

RegoloVR