bitbake: event/utils: Avoid deadlock from lock_timeout() and recursive events
We've been seeing intermittent failures on Ubuntu 22.04 in oe-selftest which
were problematic to debug. The failure was inside lock_timeout and, once that
was identified and the backtrace obtained, the problem became clearer:
File "X/bitbake/lib/bb/server/process.py", line 466, in idle_thread_internal
retval = function(self, data, False)
File "X/bitbake/lib/bb/command.py", line 123, in runAsyncCommand
self.cooker.updateCache()
File "X/bitbake/lib/bb/cooker.py", line 1629, in updateCache
self.parser = CookerParser(self, mcfilelist, total_masked)
File "X/bitbake/lib/bb/cooker.py", line 2141, in __init__
self.bb_caches = bb.cache.MulticonfigCache(self.cfgbuilder, self.cfghash, cooker.caches_array)
File "X/bitbake/lib/bb/cache.py", line 772, in __init__
loaded += c.prepare_cache(progress)
File "X/bitbake/lib/bb/cache.py", line 435, in prepare_cache
loaded = self.load_cachefile(progress)
File "X/bitbake/lib/bb/cache.py", line 516, in load_cachefile
progress(cachefile.tell() + previous_progress)
File "X/bitbake/lib/bb/cache.py", line 751, in progress
bb.event.fire(bb.event.CacheLoadProgress(current_progress, cachesize),
File "X/bitbake/lib/bb/event.py", line 234, in fire
fire_ui_handlers(event, d)
File "X/bitbake/lib/bb/event.py", line 210, in fire_ui_handlers
_ui_handlers[h].event.send(event)
File "X/bitbake/lib/bb/cooker.py", line 117, in send
str_event = codecs.encode(pickle.dumps(event), 'base64').decode('utf-8')
File "/usr/lib/python3.10/asyncio/sslproto.py", line 320, in __del__
_warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
File "/usr/lib/python3.10/warnings.py", line 109, in _showwarnmsg
sw(msg.message, msg.category, msg.filename, msg.lineno,
File "X/bitbake/lib/bb/main.py", line 113, in _showwarning
warnlog.warning(s)
File "/usr/lib/python3.10/logging/__init__.py", line 1489, in warning
self._log(WARNING, msg, args, **kwargs)
File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
self.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
self.callHandlers(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
hdlr.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
self.emit(record)
File "X/bitbake/lib/bb/event.py", line 778, in emit
fire(record, None)
File "X/bitbake/lib/bb/event.py", line 234, in fire
fire_ui_handlers(event, d)
File "X/bitbake/lib/bb/event.py", line 197, in fire_ui_handlers
with bb.utils.lock_timeout(_thread_lock):
File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "X/bitbake/lib/bb/utils.py", line 1888, in lock_timeout
bb.server.process.serverlog("Couldn't get the lock for 5 mins, timed out, exiting. %s" % traceback.format_stack())
Or, put in simpler terms: whilst sending an event, an unrelated warning
message happens to be triggered from asyncio:
/usr/lib/python3.10/asyncio/sslproto.py:320: ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x7f0e797d3100>
which triggers a second event, which can't be sent as we're already
in the critical section and already hold the lock.
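The re-entrancy problem can be illustrated in isolation. This is a minimal sketch, assuming the lock in question behaves like a plain non-reentrant threading.Lock; the nested_acquire helper and the short timeout are hypothetical and exist only for illustration:

```python
import threading

# A plain Lock is not reentrant: the thread that holds it cannot take it again.
lock = threading.Lock()

def nested_acquire(timeout=0.1):
    """Try to take the lock a second time while already holding it."""
    with lock:
        # Stands in for the nested event fired from inside the critical section.
        got = lock.acquire(timeout=timeout)
        if got:
            lock.release()
        return got

# The second acquire can only time out, never succeed.
second_attempt_succeeded = nested_acquire()
```

With a five minute timeout in place of 0.1 seconds, the nested call would simply block until the timeout expires, which matches the message seen at the bottom of the backtrace.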
That warning is due to the version of asyncio used on Ubuntu 22.04 with
Python 3.10 and that, combined with timing issues, explains why we don't
see it on other Python versions or distros.
We can't handle the second event as the lock is there to serialise the
events. Instead, we queue the event and then process the queue later.
Add a new version of lock_timeout which allows us to handle the situation
more gracefully.
(Bitbake rev: 82b9f42126983579da03bdbb4e3ebf07346118a7)
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 2c590ff1aff89d23b25ce808650f200013a1e6af)
Signed-off-by: Steve Sakoman <steve@sakoman.com>