TCP Socket Implementation On Golang

Golang is surely my first to go language to write web application, it hides many details but still gives a lot of flexibility. Recently I did strace to an http service app, actually did it without a reason but I found something interesting.

Here’s what strace -c gave me :

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 91.24    0.397615         336      1185        29 futex
  4.13    0.018009           3      7115           clock_gettime
  2.92    0.012735          19       654           epoll_wait
  1.31    0.005701           6       911           write
  0.20    0.000878           3       335           epoll_ctl
  0.12    0.000525           1       915       457 read
  0.02    0.000106           2        59           select
  0.01    0.000059           0       170           close
  0.01    0.000053           0       791           setsockopt
  0.01    0.000035           0       158           getpeername
  0.01    0.000034           0       170           socket
  0.01    0.000029           0       160           getsockname
  0.01    0.000026           0       159           getsockopt
  0.00    0.000000           0         7           sched_yield
  0.00    0.000000           0       166       166 connect
  0.00    0.000000           0         3         1 accept4
------ ----------- ----------- --------- --------- ----------------
100.00    0.435805                 12958       653 total

There are a lot of interesting things in that profiling result, but the highlight of this article are the error number of read and the number of futex call

At the first sight I didn’t think about futex call, which most of them are a wake call. I thought that this app serve hundreds req per second, it should has many go routine; At the other hand it also leverage channel, it should has many under the hood blocking mechanism. So a lot of futex call should be normal, later I found out that this number was also from other things -will back to it later-.

Why you no read

Who’s like error? Worst when it happens hundreds time in less than a minute, that’s what I thought when looking to that profiling result. What the heck with that read call?

read(36, "GET /xxx/v3?q=xx%20ch&d"..., 4096) = 520
...
read(36, 0xc422aa4291, 1)               = -1 EAGAIN (Resource temporarily unavailable)

After every read call there is (maybe) always one read to the same file descriptor with EAGAIN error. Hey I remember this error, it comes when the file descriptor is not ready to certain operation; in this case read . But Why golang do this way?

I thought there was bug from the epoll_wait which gave wrong ready event to every fd? every fd? It looks like double ready rather than wrong event, why double?

Honestly my experience with epoll is shallow, basic event loop socket handler [0] is the only one. No multi threading, no synchronization, it’s very plain. After googling I found Marek (clever guy) criticizing epoll [1], and it’s like boom!!!

Important TL;DR from that post : unnecessary wake up is ‘generally’ unavoidable when using epoll with multi threaded worker, because weneed to notify each worker about waiting ready events.

Hey that also explain about our futex wake numbers. But lets take a look at simplified version on how to use epoll in event loop socket implementation to further understand it :

  1. Bind socket listener to file descriptor, lets call it s_fd
  2. Create epoll file descriptor using epoll_create, lets call it e_fd
  3. Bind s_fd to e_fd using epol_ctl for particular events (usually bothEPOLLIN|EPOLLOUT )
  4. Create forever loop (event loop) which call epoll_wait in every iteration to get incoming ready connections fd from binded events
  5. Handle the ready connection, in multi worker implementation means notifying each worker

Using strace I found that golang using edge triggered epoll :

epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=2490298448, u64=140490870550608}}) = 0

Which means this is also valid on golang socket implementation :

  1. Kernel: Receives a new connection.
  2. Kernel: Notifies two waiting threads A and B. Due to “thundering herd” behavior with level-triggered notifications kernel must wake up both.
  3. Thread A: Finishes epoll_wait().
  4. Thread B: Finishes epoll_wait().
  5. Thread A: Performs accept(), this succeeds.
  6. Thread B: Performs accept(), this fails with EAGAIN.

At this point I am 80% confident this was the case, but let’s do some analysis using this simple web app [2]. Here is what it does on single request.

epoll_wait(4, [{EPOLLIN|EPOLLOUT, {u32=2186919600, u64=140542106779312}}], 128, -1) = 1
futex(0x7c1bd8, FUTEX_WAKE, 1)          = 1
futex(0x7c1b10, FUTEX_WAKE, 1)          = 1
read(5, "GET / HTTP/1.1\r\nHost: localhost:"..., 4096) = 348
futex(0xc420060110, FUTEX_WAKE, 1)      = 1
write(5, "HTTP/1.1 200 OK\r\nDate: Sat, 03 J"..., 116) = 116
futex(0xc420060110, FUTEX_WAKE, 1)      = 1
read(5, 0xc4200f6000, 4096)             = -1 EAGAIN (Resource temporarily unavailable)

Look how after epoll_wait there are two futex wake call which I think is the worker and then there are error read.

How if I set GOMAXPROCS to 1, enforcing single worker behavior.

epoll_wait(4,[{EPOLLIN, {u32=1969377136, u64=140245536493424}}], 128, -1) = 1
futex(0x7c1bd8, FUTEX_WAKE, 1)          = 1
accept4(3, {sa_family=AF_INET6, sin6_port=htons(54400), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28], SOCK_CLOEXEC|SOCK_NONBLOCK) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1969376752, u64=140245536493040}}) = 0
getsockname(6, {sa_family=AF_INET6, sin6_port=htons(8080), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(6, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(6, SOL_TCP, TCP_KEEPINTVL, [180], 4) = 0
setsockopt(6, SOL_TCP, TCP_KEEPIDLE, [180], 4) = 0
accept4(3, 0xc42004db78, 0xc42004db6c, SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
read(6, "GET /test?kjhkjhkjh HTTP/1.1\r\nHo"..., 4096) = 92
write(6, "HTTP/1.1 200 OK\r\nDate: Sat, 03 J"..., 139) = 139
read(6, "", 4096)                       = 0

When using 1 worker, there is only once futex wake call after epoll_wait and no read error. However I found this behavior is not consistent, in some trial I still got read error and 2 futex wake call.

And then what to do?

In the same post [1] Marek talk about EPOLLEXCLUSIVE which available starting from linux version 4.5. I’m using version 4.8 why it’s still happening on my maching? Turns out that golang is not using that flag yet, I hope at the next next version it will support that flag when available.

I learned a lot from this journey, hope you too.

[0] https://banu.com/blog/2/how-to-use-epoll-a-complete-example-in-c/
[1] https://idea.popcount.org/2017-02-20-epoll-is-fundamentally-broken-12/
[2] https://gist.github.com/wejick/2cef1f8799361318a62a59f6801eade8

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s