OpenACS / openacs-4 / packages / acs-tcl / tcl

cluster-procs.tcl

oacs-5-10:gustafn:20221229130248

added multiple delivery methods to intra-server talk

Here is some background information for my experiments with the delivery methods.
For this experiment, I compared five different means for this kind of
communication:

- ns_http over HTTP (the standard setup, which is used in OpenACS 5.10)
- ns_http over HTTPS
- ns_connchan over HTTP using persistent connections
- ns_connchan over HTTPS using persistent connections
- ns_udp using UDP

To keep the measurements simple, I tested this in a 2-node cluster consisting
of the canonical server and one node listening on the following protocols/ports:

- http://127.0.0.1:8101
- https://127.0.0.1:8444
- udp://127.0.0.1:8101

The first test sends, for each delivery method, 1000 intra-server commands
from the canonical server to the 2nd node:

set times 1000
lappend _ ns_http-[time {::acs::CS_127.0.0.1_8101 message set x ns_http} $times]
lappend _ ns_https-[time {::acs::CS_127.0.0.1_8444 message set x ns_https} $times]
lappend _ ns_connchan-http-[time {::acs::CS_127.0.0.1_8101 message -delivery connchan set x ns_http} $times]
lappend _ ns_connchan-https-[time {::acs::CS_127.0.0.1_8444 message -delivery connchan set x ns_https} $times]
lappend _ ns_udp-[time {::acs::CS_127.0.0.1_8101 message -delivery udp set x udp} $times]
join $_ \n

This leads to the following results:

ns_http 564.027083 microseconds per iteration
ns_https 1483.478916 microseconds per iteration
ns_connchan-http 147.688541 microseconds per iteration
ns_connchan-https 68.480875 microseconds per iteration
ns_udp 198.343416 microseconds per iteration

Since the commands are sent in sequence, the variants with persistent
connections (ns_connchan) are the fastest, although this is implemented
in Tcl. The slowest is the version with HTTPS via ns_http without
persistent connections. Between the slowest and the fastest variant,
we see a factor of about 20 in terms of performance.
When using ns_udp with the "-noreply" option, we would have
a "fire and forget" solution, which might be ok when the packet loss
rate is low. That would lead to 54 microseconds.
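
The factor of about 20 can be checked directly from the timings listed above. The following is a minimal sketch in plain Tcl (no NaviServer commands are needed); the variable names are chosen only for illustration.

```tcl
# Timings in microseconds per iteration, copied from the results above.
set usec {
    ns_http            564.027083
    ns_https          1483.478916
    ns_connchan-http   147.688541
    ns_connchan-https   68.480875
    ns_udp             198.343416
}

# Determine the fastest variant; it serves as the baseline.
set fastest ""
dict for {method t} $usec {
    if {$fastest eq "" || $t < [dict get $usec $fastest]} {
        set fastest $method
    }
}

# Report every method relative to the fastest one.
set report {}
dict for {method t} $usec {
    set factor [expr {$t / [dict get $usec $fastest]}]
    lappend report [format "%-18s %9.1f us  %5.1fx" $method $t $factor]
}
puts [join $report \n]
```

The ratio between ns_https and the fastest variant comes out between 21 and 22, which matches the rough factor of 20 mentioned above.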

Clearly, the numbers for persistent connections look the best, but this
approach also has some disadvantages compared to the other solutions:
- the server has to keep a socket open to every node (but no
connection thread)
- the keepalive setting of the server must be set sufficiently long to
take advantage of persistent connections (e.g. 5 sec keepalive,
heart beat frequency of 1s)
- Since the whole communication goes over a single connection, it is
necessary to serialize the requests to avoid that multiple
connection threads write concurrently to the same connection and
interfere with each other
- It is probably necessary to have a separate thread handling the
outgoing intra-server talk (implementing cmd queuing,
async-handling, heart-beat, etc.). Since this has to be a Tcl-thread
it will use up some memory (similar to a connection thread).
- This intra-server talk thread requires queuing and event handling we
have so far just in xotcl-core, so when implemented, it will require
the xotcl-core package (maybe this can be put later to acs-core).
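
The serialization requirement from the list above can be illustrated with a queue that has a single writer. This is only a sketch in plain Tcl under stated assumptions: the namespace `::sketch`, the procs `enqueue`, `drain`, and `record` are hypothetical names and not part of acs-tcl, xotcl-core, or NaviServer. In a real server, the queue would live in the dedicated intra-server talk thread, and the `send` callback would write to the persistent connection.

```tcl
namespace eval ::sketch {
    variable queue {}   ;# pending commands, oldest first
    variable sent  {}   ;# what the single writer has written, in order

    # Called by any number of producers: only appends, never writes.
    proc enqueue {cmd} {
        variable queue
        lappend queue $cmd
    }

    # Called only by the single writer: sends queued commands one by one,
    # so that writes to the shared connection can never interleave.
    proc drain {send} {
        variable queue
        while {[llength $queue] > 0} {
            set cmd [lindex $queue 0]
            set queue [lrange $queue 1 end]
            {*}$send $cmd
        }
    }

    # Hypothetical stand-in for writing to the persistent connection.
    proc record {cmd} {
        variable sent
        lappend sent $cmd
    }
}

::sketch::enqueue {set x 1}
::sketch::enqueue {set y 2}
::sketch::drain ::sketch::record
puts $::sketch::sent
```

The point of the design is that connection threads only pay for the enqueue step, while the ordering on the wire is decided in exactly one place.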

As a second experiment, I've implemented a simple heart-beat service
inside the request monitor that checks the liveness of the nodes
every second. So, in contrast to the back-to-back commands of the
first experiment, these are single calls. Here are some sample
values for the 5 delivery methods:

[27/Dec/2022:20:29:34.171376][::throttle] Notice: -cluster: http://127.0.0.1:8101 set x ns_http sent total 2.907ms
[27/Dec/2022:20:29:34.182241][::throttle] Notice: -cluster: https://127.0.0.1:8444 set x ns_https sent total 10.798ms
[27/Dec/2022:20:29:34.183475][::throttle] Notice: -cluster: http://127.0.0.1:8101 set x ns_connchan sent total 1.161ms
[27/Dec/2022:20:29:34.183657][::throttle] Notice: -cluster: https://127.0.0.1:8444 set x https-connchan sent total 0.086ms
[27/Dec/2022:20:29:34.188564][::throttle] Notice: -cluster: udp://127.0.0.1:8101 set x udp sent total 4.861ms

[27/Dec/2022:20:30:25.494080][::throttle] Notice: -cluster: http://127.0.0.1:8101 set x ns_http sent total 2.049ms
[27/Dec/2022:20:30:25.516306][::throttle] Notice: -cluster: https://127.0.0.1:8444 set x ns_https sent total 21.903ms
[27/Dec/2022:20:30:25.517239][::throttle] Notice: -cluster: http://127.0.0.1:8101 set x ns_connchan sent total 0.814ms
[27/Dec/2022:20:30:25.522957][::throttle] Notice: -cluster: https://127.0.0.1:8444 set x https-connchan sent total 0.33ms
[27/Dec/2022:20:30:25.534274][::throttle] Notice: -cluster: udp://127.0.0.1:8101 set x udp sent total 11.099ms

[27/Dec/2022:20:31:54.993455][::throttle] Notice: -cluster: http://127.0.0.1:8101 set x ns_http sent total 2.431ms
[27/Dec/2022:20:31:55.003036][::throttle] Notice: -cluster: https://127.0.0.1:8444 set x ns_https sent total 9.499ms
[27/Dec/2022:20:31:55.010100][::throttle] Notice: -cluster: http://127.0.0.1:8101 set x ns_connchan sent total 6.981ms
[27/Dec/2022:20:31:55.010585][::throttle] Notice: -cluster: https://127.0.0.1:8444 set x https-connchan sent total 0.322ms
[27/Dec/2022:20:31:55.017764][::throttle] Notice: -cluster: udp://127.0.0.1:8101 set x udp sent total 7.13ms

We see in essence the same pattern. The approach with the persistent
connections looks the best here as well. It is not clear to me why
HTTPS over connchan is the best, but the communication seems ok. Maybe
some buffering or Nagle's algorithm is responsible for this. We see as well
that the round-trip typically takes single to double-digit
milliseconds. So when a single HTTP request to nsd triggers multiple
cache-flush operations to multiple nodes, this will take some
time. When, e.g., the request issues 5 cache-flush operations, which are
sent to 5 nodes, and every request takes 1ms, the cache flushing
will make the original request about 25ms slower (5 x 5 x 1ms). This might
also be an argument for a separate thread doing these operations
asynchronously.
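
The estimate above is simple arithmetic: with sequential delivery, the added latency is the product of the number of flush operations, the number of nodes, and the time per call. A small sketch in plain Tcl; the proc name `flush_overhead_ms` is made up for illustration, and the input values are the ones from the example.

```tcl
# Added latency (in ms) when every flush operation is sent to every
# node one after the other within the original request.
proc flush_overhead_ms {flushes nodes ms_per_call} {
    return [expr {$flushes * $nodes * $ms_per_call}]
}

puts [flush_overhead_ms 5 5 1]     ;# 25
puts [flush_overhead_ms 5 5 0.3]   ;# same load with a faster delivery method
```

With a separate thread doing the delivery asynchronously, the original request would only pay for handing over the commands, not for the round-trips.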

29 Dec 22 gustafn oacs-5-10:gustafn:20221229130248
