Browse Source

bitbake: server/process: Handle error in heartbeat funciton in OOM case

We've seen cases where an OOM error causes bitbake server to hang:

9171 02:21:09.127810 Command Completed
Traceback (most recent call last):
  File "/home/pokybuild/yocto-worker/qemux86/build/bitbake/bin/bitbake-server", line 51, in <module>
    bb.server.process.execServer(lockfd, readypipeinfd, lockname, sockname, timeout, xmlrpcinterface)
  File "/home/pokybuild/yocto-worker/qemux86/build/bitbake/lib/bb/server/process.py", line 550, in execServer
    server.run()
  File "/home/pokybuild/yocto-worker/qemux86/build/bitbake/lib/bb/server/process.py", line 108, in run
    ret = self.main()
  File "/home/pokybuild/yocto-worker/qemux86/build/bitbake/lib/bb/server/process.py", line 242, in main
    ready = self.idle_commands(.1, fds)
  File "/home/pokybuild/yocto-worker/qemux86/build/bitbake/lib/bb/server/process.py", line 370, in idle_commands
    bb.event.fire(heartbeat, self.cooker.data)
  File "/home/pokybuild/yocto-worker/qemux86/build/bitbake/lib/bb/event.py", line 216, in fire
    fire_class_handlers(event, d)
  File "/home/pokybuild/yocto-worker/qemux86/build/bitbake/lib/bb/event.py", line 123, in fire_class_handlers
    execute_handler(name, handler, event, d)
  File "/home/pokybuild/yocto-worker/qemux86/build/bitbake/lib/bb/event.py", line 93, in execute_handler
    ret = handler(event)
  File "/home/pokybuild/yocto-worker/qemux86/build/meta/classes/buildstats.bbclass", line 182, in defaultrun_buildstats
    write_host_data(os.path.join(bsdir, "host_stats"), e, d, "interval")
  File "/home/pokybuild/yocto-worker/qemux86/build/meta/classes/buildstats.bbclass", line 160, in write_host_data
    output = subprocess.check_output(c.split(), stderr=subprocess.STDOUT, timeout=limit).decode('utf-8')
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

We need to wrap the calls in the same high level wrapper as idle function calls
and trigger an exit upon an unhandled exception.

(Bitbake rev: 74042b5b89d5a170013fc1a327ce3a6530fbf7d5)

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Richard Purdie 4 years ago
parent
commit
2f12a20935
1 changed files with 6 additions and 1 deletions
  1. 6 1
      bitbake/lib/bb/server/process.py

+ 6 - 1
bitbake/lib/bb/server/process.py

@@ -367,7 +367,12 @@ class ProcessServer():
                 self.next_heartbeat = now + self.heartbeat_seconds
             if hasattr(self.cooker, "data"):
                 heartbeat = bb.event.HeartbeatEvent(now)
-                bb.event.fire(heartbeat, self.cooker.data)
+                try:
+                    bb.event.fire(heartbeat, self.cooker.data)
+                except Exception as exc:
+                    if not isinstance(exc, bb.BBHandledException):
+                        logger.exception('Running heartbeat function')
+                    self.quit = True
         if nextsleep and now + nextsleep > self.next_heartbeat:
             # Shorten timeout so that we we wake up in time for
             # the heartbeat.