Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sock_accept() to snapshot1 #458

Merged
merged 2 commits into from
Jan 18, 2022
Merged

Add sock_accept() to snapshot1 #458

merged 2 commits into from
Jan 18, 2022

Conversation

npmccallum
Copy link
Contributor

@npmccallum npmccallum commented Dec 14, 2021

Update

Today (2022-01-13) we discussed this PR in the WASI subgroup call. There was general agreement that updating snapshot1 with an accept() call was a valuable endeavor. I have updated the PR as I believe it is now ready for merge.

Although this proposal previously suggested the publication of a snapshot2, the consensus on the call was that implicitly updating snapshot1 is the preferred route.

This proposal needs to be adopted and implemented in the libc implementation before transitioning to streams so that consumers can lock to an older version (per @sunfishcode).

Overview

WASI snapshot1 is very broadly implemented. It is also fairly usable for a number of cases. However, for networking its use is quite limited. This is particularly because there are a few omissions that create real blockers in downstream users of this API. For this reason while there has been extensive demand to support WASI in the Rust ecosystem, real penetration of support in the crate ecosystem remains elusive for major networking libraries.

This PR sets out to add accept(), which makes it possible to accept distinct connections inside of wasm.

Supporting Accept

All network APIs need to support calling accept() on pre-opened listening sockets. For example, this is widely implemented in systemd and other init systems.

For example, in Rust you can create a TcpListener from a RawFd on Unix platforms and a parallel exists for Windows. This doesn't work on WASI, but only because we are missing the accept() call. Enabling the creation of a TcpListener in Rust from a pre-opened listening socket would substantially increase the usability of WASM and would incentivize the adoption by frameworks like tokio.

Proposal

While I'm aware of the great strides being taken to modularize WASI, I am essentially proposing an incremental snapshot2 aimed at improving the existing toolchain support.

CC

@tschneidereit @lukewagner @sunfishcode @wgwoods @MikeCamel

@MikeCamel
Copy link

A few examples of possible uses for this functionality.

Simple echo server

Client connects to listening WASI process over TCP, process accepts connection, client sends data, process echoes data back to client.

HTTP(S) server

Similar to echo server, but with HTTP(S) support.

Database server

Database implemented in WASI, accepts connections from clients, accepts queries (reads and writes), updates database and/or serves data to client.

Remote logging server

Remote logging protocol implemented in WASI, accepts connections from logging agents, adds to log, writing periodically.

@programmerjake
Copy link
Contributor

If posix_spawn was also implemented, this should be enough to run cargo in wasi:
rust-lang/cargo#9287
#414

@npmccallum
Copy link
Contributor Author

@programmerjake While I'm not opposed to that, it isn't exactly low-hanging fruit. I would also like to avoid scope creep on this discussion.

@sunfishcode
Copy link
Member

Thanks for posting this! One of the big questions is whether it makes sense to do this kind of "snapshot2" at this time, using the old witx-based tooling and shapshot scheme. We have a lot of experience and feedback on the limitations of the old witx-based tooling and proposal repo format, and the underlying interface types design has evolved significantly from when witx was designed, so it's worth asking whether it makes sense to take time out of the efforts to work towards better tools.

The key observation that I see here is that accept is a very simple function that doesn't open up the kind of surface area that bind etc. have, while still being sufficient to support meaningful systemd-style use cases. That puts it in a kind of sweet spot, of relatively low complexity, while still having significant usefulness, and makes it worth considering here.

The pipe() call is broadly useful for composability of file descriptor oriented APIs. One important use of pipe() that currently blocks mio implementation on wasm32-wasi is that there is no way to wake up poll_oneoff(). Although this threaded use might be less common, the blocking factor for mio is that it is not possible to implement the mio API (regardless of how threading is used).

Could you be more specific about the motivation for pipe here? I'm aware of it's usefulness in general, but because of the nature the old witx-based tooling, it's important for us to have explicit motivations every time we propose to spend more effort on it.

Also, Re: accept returning an address, and the addition of getpeername. This is an interesting question: Should we expose IP addresses and Unix-domain socket paths to WASI? Without bind or similar things, the only thing WASI programs could do with these addresses would be to log them. Would it make sense to have code outside the wasm sandbox do that logging? That would avoid exposing network addresses to WASI programs, since they can carry some amount of identifying information. WASI libc could still implement the POSIX accept API by having it fill in addresses from one of the reserved ranges.

@npmccallum
Copy link
Contributor Author

Thanks for posting this! One of the big questions is whether it makes sense to do this kind of "snapshot2" at this time, using the old witx-based tooling and shapshot scheme. We have a lot of experience and feedback on the limitations of the old witx-based tooling and proposal repo format, and the underlying interface types design has evolved significantly from when witx was designed, so it's worth asking whether it makes sense to take time out of the efforts to work towards better tools.

I'm all for working on the better tools. My concern is that snapshot1 has basically been frozen for over a year with no development. The new tools aren't ready yet. I get that. But it would seem to me that the best thing for consumers of WASI is to have limited, intentional evolution of the "old" system until the "new" system becomes usable in many/most of the applications of the old system. OTOH, shutting down the "old" system with out something to replace it is pretty much the precise recipe for forks (That's not a threat! Promise!). What I'm hoping for is just enough evolution to keep people engaged with WASI until it is time to rebase on the "new" system.

The key observation that I see here is that accept is a very simple function that doesn't open up the kind of surface area that bind etc. have, while still being sufficient to support meaningful systemd-style use cases. That puts it in a kind of sweet spot, of relatively low complexity, while still having significant usefulness, and makes it worth considering here.

👍

The pipe() call is broadly useful for composability of file descriptor oriented APIs. One important use of pipe() that currently blocks mio implementation on wasm32-wasi is that there is no way to wake up poll_oneoff(). Although this threaded use might be less common, the blocking factor for mio is that it is not possible to implement the mio API (regardless of how threading is used).

Could you be more specific about the motivation for pipe here? I'm aware of it's usefulness in general, but because of the nature the old witx-based tooling, it's important for us to have explicit motivations every time we propose to spend more effort on it.

👍 I'm on board with this. To be clear accept() is a lot more valuable, I think.

Pipe is used to manually interact with main loop control functions. select(), poll(), epoll() and kqueue() are the big players here. WASI implements poll_oneoff() but offers no way to manually control the loop from another thread. Many main loop abstraction libraries expect for pipe() or one of its equivalents to be available. In C, major libraries like libev, libevent, tevent, etc require this. In Rust, mio requires it. The entirety of the Rust async stack is built on mio.

There are several alternatives. For example, Linux offers socketpair(). However, only pipe() is universal enough.

Also, Re: accept returning an address, and the addition of getpeername. This is an interesting question: Should we expose IP addresses and Unix-domain socket paths to WASI? Without bind or similar things, the only thing WASI programs could do with these addresses would be to log them. Would it make sense to have code outside the wasm sandbox do that logging? That would avoid exposing network addresses to WASI programs, since they can carry some amount of identifying information. WASI libc could still implement the POSIX accept API by having it fill in addresses from one of the reserved ranges.

That was my initial thought too. It isn't a bad idea. We could always do getpeername() later. However, for workflows where you do need the address after a new connection arrives (which is apparently quite common) it means that your one context switch now becomes two. This isn't great for DoS prevention.

@npmccallum
Copy link
Contributor Author

I'm going to remove the request for pipe(). I'm hoping to solve the problems with io libraries in other ways.

I think the major question that remain are:

  1. Do we want to expose IP addresses to WASI?
  2. Do we want to have a snapshot2 with accept()?

I will be changing the documents in future commits. This ensures they
are up to date before making any changes.

Signed-off-by: Nathaniel McCallum <nathaniel@profian.com>
This function allows a server to accept incoming connections. A few
notes are in order.

Although there are legitimate questions about the capabilities model in
numerous networking APIs, this particular API is both immensely useful
and not a violation of the capabilities model. There is still no way to
create an arbitrary socket. The only thing that WASM can do is accept
incoming connections on a predefined listening socket.

Signed-off-by: Nathaniel McCallum <nathaniel@profian.com>
@npmccallum
Copy link
Contributor Author

Per the discussion today, I have updated the pull request to remove the IP addressing functionality of accept. I believe this is now ready for merge. I will provide a paragraph of documentation containing recommended details for implementors in a subsequent comment. I am now marking this PR as ready for review.

@npmccallum npmccallum marked this pull request as ready for review January 13, 2022 21:02
@npmccallum
Copy link
Contributor Author

Recommendations for Implementors

The WASI sock_accept() call differs from the Berkeley sockets accept() call in that it does not return the address of the newly connected peer. Implementors, such as WASI's libc, will have to decide how to make this compatible with accept().

Return EINVAL

One possible approach is to return EINVAL if the addr or addrlen output variables are present. However, this approach is likely to significantly reduce compatibility. Its use is not advised.

Return an Invalid or Reserved Address and Port

A better approach is to return a reserved address and port. Under IPv4, the address 255.255.255.255 is invalid in this context. Likewise, under IPv6 the address :: is reserved. For the port, 0 is reserved for TCP.

Drawbacks

Although the compatibility of this approach is high, it could have some drawbacks.

  • Naive implementations might try to accept these addresses as valid. However, this behavior is already undefined and should not be relied upon in any existing code.

  • Conversely, a strict consumer of this API might try to defend against such invalid address. Such code would need to updated for proper execution under WASI. These cases are likely rare.

@npmccallum
Copy link
Contributor Author

@sunfishcode Can you please review the above text? Feel free to tag any other reviewers you deem relevant.

@npmccallum npmccallum changed the title A Discussion PR for low-hanging network API fruit on snapshot1 Add sock_accept() to snapshot1 Jan 13, 2022
Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good to me!

;;; The listening socket.
(param $fd $fd)
;;; The desired values of the file descriptor flags.
(param $flags $fdflags)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the purpose of accept a fdflags value here to have an equivalent to the SOCK_NONBLOCK flag? If so, it'd be good to mention that flag in the comment above, and document that accept doesn't recognize any of the other $fdflags flags.

@sunfishcode
Copy link
Member

One possible approach is to return EINVAL if the addr or addrlen output variables are present. However, this approach is likely to significantly reduce compatibility. Its use is not advised.

I agree, and I think it'd be simpler if we just don't mention the EINVAL option here.

Return an Invalid or Reserved Address and Port

A better approach is to return a reserved address and port. Under IPv4, the address 255.255.255.255 is invalid in this context. Likewise, under IPv6 the address :: is reserved. For the port, 0 is reserved for TCP.

255.255.255.255 is the IPv4 broadcast address, which could potentially be confusing. :: is the IPv6 wildcard address, which could also potentially be confusing. Also libc won't know if the connection is IPv4 or IPv6, so we'll need to just pick one for it to always use.

What would you think about using fec0::1 here? That's a normal address, but it's in the IPv6 unique local address range, so it will appear to the WASI program that it's talking to a host which won't be an IP address it will recognize from the public internet.

@cardil
Copy link

cardil commented Jan 14, 2022

I think, setting one specific value is pointless, as the returned value would be useless for the end user. In regular accept(), the addr parameter can be NULL if the caller is not interested in the client address. So, if we don't like to support those addresses, let's just do nothing or fail if user gives something different from NULL.

But, I think the address can be easily computed, and returned without exposing any real information to wasm module. We can describe a deterministic one-way function that takes a real client address and translates it to some CIDR like 10.0.0.0/8. The same could be done for port number. Example:

203.112.131.21:31231 -> 10.117.104.98:45911

Such a result could be useful to wasm module end-user, at the same time it doesn't expose real information. Limiting the output CIDR or port range, could be beneficial, as it prevents malicious users trying to reverse the one-way function to get a real address.

@PiotrSikora
Copy link
Contributor

@cardil I'm strongly against returning a valid-but-incorrect IP address to the application.

  1. All possible values in 32-bit space can be easily computed for a deterministic one-way hash, making it a two-way hash.
  2. If the application uses this valid-but-incorrect IP address, then this information won't match the real IP address in other pieces of complex/distributed systems, unless they are all WASI-based.
  3. If the application logs this valid-but-incorrect IP address, then this is pretty close to falsyfing information, and may point blame at the real people behind the valid-but-incorrect IP address.

@npmccallum
Copy link
Contributor Author

@sunfishcode If we are going to recommend the synthesizing of an IP address in libc, we need to provide suggestions for both IPv4 and IPv6 since the caller might supply sockaddr_in which has insufficient size to write an IPv6 address.

@npmccallum
Copy link
Contributor Author

I think, setting one specific value is pointless, as the returned value would be useless for the end user. In regular accept(), the addr parameter can be NULL if the caller is not interested in the client address. So, if we don't like to support those addresses, let's just do nothing or fail if user gives something different from NULL.

This only works for libc. For the Rust standard library, the API requires a SocketAddr value.

But, I think the address can be easily computed, and returned without exposing any real information to wasm module. We can describe a deterministic one-way function that takes a real client address and translates it to some CIDR like 10.0.0.0/8. The same could be done for port number. Example:

203.112.131.21:31231 -> 10.117.104.98:45911

Such a result could be useful to wasm module end-user, at the same time it doesn't expose real information. Limiting the output CIDR or port range, could be beneficial, as it prevents malicious users trying to reverse the one-way function to get a real address.

The problem here is that libc is calling WASI's sock_accept() which provides no address information. Therefore there is no input to this function. The only option is to hard-code a value.

@npmccallum
Copy link
Contributor Author

@cardil I'm strongly against returning a valid-but-incorrect IP address to the application.

This is why I proposed 255.255.255.255 and ::. Both might look odd in a log, but they are unambiguously invalid in this context and so they cannot be confused with other legitimately valid addresses. However, the idea of using a link-local, non-routable address such as @sunfishcode suggested is not a bad idea. This only works for IPv6, however, and therefore would produce compatibility issues if someone called accept(&sockaddr_in). So we at least need an IPv4 fallback.

@cardil
Copy link

cardil commented Jan 14, 2022

At @PiotrSikora:

  1. All possible values in 32-bit space can be easily computed for a deterministic one-way hash, making it a two-way hash.

This is why I propose to use limited CIDR. For example, using CIDR 10.117.0.0/16 gives an address space of 65k. That means there are 65k different real addresses which could result in a specific crafted one - utilizing a hash collision. That makes it impossible to effectively reverse this function.

  1. If the application uses this valid-but-incorrect IP address, then this information won't match the real IP address in other pieces of complex/distributed systems, unless they are all WASI-based.

Yes. It is simple address translation. Host knows both addresses, real and crafted. Wasm module knows only the crafted one. This should be enough from monitoring purposes.

  1. If the application logs this valid-but-incorrect IP address, then this is pretty close to falsyfing information, and may point blame at the real people behind the valid-but-incorrect IP address.

I don't think so. Docker/Podman/Kubernetes also uses crafted addresses, and hostnames. They are valid within the system, giving users an abstraction layer, but they are essentially fake. They do not work outside the system.

Here similarly, the crafted addresses could be valid within system - across different WASM modules, but be invalid outside, where specific translation is required.

This is just a NAT. A private, separated network. And the router is doing the translation. In our case, the WASM host can act as network gateway.

At @npmccallum:

he problem here is that libc is calling WASI's sock_accept() which provides no address information. Therefore there is no input to this function. The only option is to hard-code a value.

That's a bummer. So, at least for this implementation, we need a static value, then. The thing I'm proposing could be possible with full implementation, right?

In general

I'm not strongly after this addresses translation, at least in scope of this PR.

But, I just have got an idea, that this could help create sandbox abstraction for networking in general.

What if we allow the host to define CIDR mount-points?! With it, host could say that, for ex.: real addresses 192.168.225.8/29 be "mounted" in WASM module as 10.225.0.32/29. Then, WASM module could safely interact with those mounted addresses, and host translate the communication (just like NAT). They have been mounted by host, similarly like host pre-opens files for WASM, so it creates a sandbox environment.

@sbc100
Copy link
Member

sbc100 commented Jan 14, 2022

Given that the proposal sock_accept does not return an address, why don't we focus on just adding this new syscall for now. Then we can consider any recommendations for higher level libraries in rust and libc separately/later.

Its not clear to me we even need to make such recommendations, given that libc/library implementations can choose whatever solution they want, regardless.

@npmccallum
Copy link
Contributor Author

@sunfishcode Do you agree with the proposal from @sbc100 ? I think we have general consensus that:

  1. we want this
  2. we don't want IP addresses in the API
  3. we can add recommendations later

@sunfishcode
Copy link
Member

@npmccallum Yes, that sounds right to me.

@devsnek
Copy link
Member

devsnek commented Jan 18, 2022

can we call it socket_accept 👀

@sunfishcode
Copy link
Member

Merging, as discussed in the meeting, and with overall consensus here.

I'm not strongly after this addresses translation, at least in scope of this PR.

This is an interesting space to be thinking about. There are several folks talking about networking proposals, and as these progress, it will be useful to bring these ideas up. And if future proposals need a new accept call which can return an address, the existence of this sock_accept here won't preclude adding a new one.

can we call it socket_accept eyes

@devsnek Since this is just a special one-time extension of snapshot_preview1, and not WASI's last work on the topic of sockets, we don't need to worry too much about the aesthetics of the name here :-).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants