Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Admin-controllable synchronous execution support #2168

Closed
wants to merge 5 commits into from

Conversation

SolraBizna
Copy link

OpenComputers computers execute in worker threads separate from the world thread. 99.9% of the time, this is a very good thing. Sometimes, however, precise tick-perfect timing is needed. Nuclear reactor controllers, note block playback engines, and Endergenic Generators are all examples of things where tick jitter can have negative consequences. (Obviously, some of these have higher stakes than others.)

This PR adds the ability for computers to request synchronous execution, and be executed on the world thread. Since doing this across the board would be suicidal, it also adds features that allow admins to control where it can be used.

These changes were inspired by @gamax92 's experiment earlier today, and guided by feedback in #oc. @gamax92 was incredibly helpful in tracking down almost every change that needed to be made. Skye requested the microcontrollerOnly option.

Defaults

Unless specifically configured otherwise, no computer is ever Secure, and a Secure boot is required to be synchronous. With an unchanged configuration, the only change from this PR is the ability for computers to voluntarily shorten their own execution timeout.

Secure Boot

There is a new option in the configuration file, computer.sync.whitelist. When a computer first boots up, if its architecture name or EEPROM checksum is in that whitelist, it boots into a Secure state. Otherwise, it boots into an Insecure state. This allows admins to strictly control what code is given the dangerous powers this PR adds. (If the config setting computer.sync.microcontrollerOnly is true, only Microcontrollers can be Secure.)

Secure computers are allowed to request synchronous operation. If the config setting computer.sync.allowOverrideTimeout is true, Secure computers can also increase their execution timeout. (Both Secure and Insecure computers can decrease it, regardless of this config option.)

A Secure computer may enter the Insecure state at will with component.computer.giveUpSyncPrivileges(). Once it is Insecure, the only way for it to become Secure again is to reboot with a whitelisted architecture or EEPROM. This would allow "secure bootloaders" to pass Secure status only to trusted code.

Whitelisting a specific EEPROM is useful if the admin has vetted it as being "safe"; either a bootloader that only boots admin-signed code, or a self-contained "safe" program on its own. Whitelisting a specific architecture is useful if the architecture in question has its own lag-friendly execution caps. (With OC-ARM, for instance, four computers configured to run at 100x the normal clock rate, running busy looping test programs, only added ~8 milliseconds to each tick.) Either way, the computer doesn't actually become synchronous until synchronous operation is requested.

Synchronous Operation

Computers may request synchronous operation with component.computer.setSyncMode(true). Secure computers may always request synchronous operation. Insecure computers may also do so if the (dangerous!) config setting computer.sync.insecure is true.

A synchronous computer is executed on the world thread rather than in a worker thread. This means that it can respond to events in the world with tick-perfect timing. This also means it can horribly lag the world if its execution takes too long, which is why the Secure Boot feature is there.

Computers that wish to do so can return to asynchronous operation with component.computer.setSyncMode(false). If a computer is in synchronous mode and computer.sync.insecure is false and component.computer.giveUpSyncPrivileges() is called, that will also cause it to become asynchronous.

Custom Timeouts

To help mitigate potential timeout-related problems, this PR also adds a computer.setCustomTimeout function to machine.lua. Normally, only shorter timeouts than the configured default are honored; this allows synchronous computers to voluntarily limit their world-lagging potential. If the config option computer.sync.allowOverrideTimeout is set, Secure computers can also lengthen the timeout. They should only do this when the alternative is oblivion; e.g. nuclear reactor control computers.

@payonel
Copy link
Member

payonel commented Dec 9, 2016

if a machine is in synchronous mode, and it computer.pullSignal(0)s, is my machine coroutine resumed at MOST in the next tick?

@Vexatos
Copy link
Contributor

Vexatos commented Dec 9, 2016

It's a good thing that EEPROMs and architectures can only be added directly via the config file; it's a tedious process making sure that you really, really know what you're getting yourself into.

@SolraBizna
Copy link
Author

SolraBizna commented Dec 9, 2016

if a machine is in synchronous mode, and it computer.pullSignal(0)s, is my machine coroutine resumed at MOST in the next tick?

Yes. If there is a signal in the queue, it will return immediately. If there is not a signal in the queue, it will return next tick. In either case, do not pass go and do not collect $200.

Copy link
Member

@payonel payonel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some suggestions for refactoring a bit. Afterwards I would be glad to test this.

setCustomTimeout = function(timeout)
if timeout ~= nil then
checkArg(1, timeout, "number")
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please consider changing to checkArg(1, timeout, "number", "nil")

@@ -12,6 +12,8 @@ local function checkDeadline()
end
end

local customTimeout = nil

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move this up with the other machine variables at line 4

/** The computer is in the Secure state, and is currently synchronous. */
val SecureSync = Value("SecureSync")
}

/** Signals are messages sent to the Lua state from Java asynchronously. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some opinions about the naming of values here and the joining of two orthagonal ideas. I would prefer two separate enums, Insecure/Secure and Async/Sync
Also, I would prefer Untrusted/Trusted in place of Insecure/Secure
I would like you to genuinely consider this separation of enums as it would simplfy the code. Give some consideration for the name change as I feel it makes it more clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this would cause quite a few changes to the way you've already written the code, but I feel it would make the correctness of the state changes more obvious.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good feedback. I will execute changes along these lines tomorrow.

In particular, Trusted/Untrusted is way better terminology for communicating what's actually happening than Secure/Insecure.

@SolraBizna
Copy link
Author

I made the requested changes. I also went ahead and added trustedSecret, since without it it's difficult to make a "secure bootloader" that's actually secure. (It was going to be part of the original PR, but I honestly forgot about it...)

@asiekierka
Copy link
Contributor

I feel a better approach would be to sync the worker thread to the server ticks (even delaying the end of the server tick until all computer computation has been done, if necessary) instead of removing the multithreaded-ness - mainly for performance reasons, there's already far too much happening in the server thread.

Also, Secure Boot? Please.

@SoniEx2
Copy link
Contributor

SoniEx2 commented Dec 9, 2016

Oh come on just use the time and yield-resume loops (good ol' queueEvent-pullEvent-checkTime-doStuff).

@SolraBizna
Copy link
Author

I feel a better approach would be to sync the worker thread to the server ticks (even delaying the end of the server tick until all computer computation has been done, if necessary) instead of removing the multithreaded-ness - mainly for performance reasons

I'm not sure what you mean by "sync the worker thread to the server tick". With the current world model, there's no way to thread a TileEntity update while remaining perfectly synchronized with the rest of the world. At the very least, it introduces inconsistencies in timing between the computer's TileEntity ticking and the computer making decisions that affect the world, which is exactly what synchronous mode is designed to prevent. If your application can handle that degree of inconsistency, then it can almost certainly handle just running asynchronously.

To reiterate, this PR does not make all computers synchronous, and the computers it does make synchronous are not necessarily synchronous all the time. And every synchronous application I ever planned to write will have an "inner loop" that takes up less CPU time than your average IC2 crop block.

Also, Secure Boot? Please.

Your intended meaning is not clear.

@asiekierka
Copy link
Contributor

asiekierka commented Dec 10, 2016 via email

@SolraBizna
Copy link
Author

No, but you can move just the operations requiring thread safety to the server thread and get the kind of timing you want without negatively impacting performance.

I don't see how. I think I'm not understanding what you're suggesting.

You maybe, but can you say that for every user of the mod?

I most emphatically cannot. But the existence of fools does not mean we stop making dangerous tools.

In any case, it's not the users of the mod who are the real threat, it's the server admins. In order for any user to cause real problems with this PR, they have to be enabled to do so by the server admin. Providing this feature on a server is much like providing MystCraft on a server, except that the default configuration is safe, and it's much harder to reach a dangerous configuration by accident.

In normal use on servers, the admin would have a very small set of "safe EEPROMs" that are whitelisted, and be very cautious about adding any new ones to that list. (That's ignoring the majority of servers, which would simply never turn this feature on.)

In a survival game, the idea of unhackable computers, inventories, machines, etc. terrifies me.

Rightly so. But, consider:

  • How to hack a computer in current OC: Take the EEPROM out of the computer and put a malicious one in, optionally spoofing eeprom.get() to avoid detection. Computer hacked.
  • How to hack a post-"Secure Boot" computer: Take the EEPROM out of the computer and put in a malicious one in that spoofs computer.isSecure(), computer.setSyncState(), and optionally eeprom.get(). Computer hacked. (This is only necessary if a misguided admin went far out of their way to "protect" this computer; otherwise, the pre-"Secure Boot" solution still works just as well.)

"Secure Boot" isn't designed to protect computers, it's designed to protect worlds.

@SolraBizna
Copy link
Author

No, but you can move just the operations requiring thread safety to the server thread and get the kind of timing you want without negatively impacting performance.

I don't see how. I think I'm not understanding what you're suggesting.

If I understand you correctly, I think you're suggesting having the "computer thread" running in its own Java thread, which receives signals and writes callbacks once a tick, on the tick. Is this correct?

@SolraBizna
Copy link
Author

Upon understanding what you were suggesting, I recognized it as a better solution. It's a bit of extra overhead for very short inner loops, but it quickly becomes a big relief as inner loops grow larger. Instead of having to have a short inner loop, the synchronized computer now only need have an inner loop less than 50ms long to avoid causing world lag. Much more achievable. Unfortunately, in exchange, it causes a small amount of temporal decoherence.

Taking the current tick as T, the timing guarantees are now:

  • Pulling signals will pull every signal that was received before T, and possibly some between T and T+1.
  • Performing a synchronized call will take effect at exactly T+1.
  • Performing a direct call will take effect somewhere between T and T+1, exclusive.

I'm not positive that first one is good enough. I think there ought to be a stricter guarantee on signal receival. (Maybe synchronous computers can lock their signal queue until they yield?) The third one might also cause problems (on redstone inputs, for instance), but in practice such problems can generally be avoided by careful programming IFF a strict signal receival cutoff is added.

(Fiddly temporal mechanics note: tick T is actually a sequence point in the world's overall processing. When a computer experiences tick T, some of the world will already have experienced tick T, and some will still be waiting. This is okay in practice, as long as it's consistent.)

@SolraBizna
Copy link
Author

With considerable effort, the "trusted secret" does enable "unhackable" computers. However, so do tier 2 data cards; unlike those, "trusted secret" requires admin intervention, and since it doesn't (itself) provide a good encryption scheme, it actually involves more work than the data card method.

@xarses xarses mentioned this pull request Feb 10, 2017
@bonnedav
Copy link

Any update on this?

@SolraBizna
Copy link
Author

I wasn't able to get it working reliably with the new approach, and unfortunately I don't have any more spare time to spend trying. (Meanwhile, OpenComputers has continued evolving and the original changes don't work anymore.)

@payonel
Copy link
Member

payonel commented Jun 16, 2017

Thanks for the digging into this and contributing @SolraBizna! This was an interesting area to explore

@payonel payonel closed this Jun 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants