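A coredump is a powerful tool for analyzing Android native exceptions and kernel exceptions. When a process hits an unrecoverable fault, the OS takes the affected memory and packages it into a core dump for offline analysis. With a coredump you can not only locate the source file and line where the exception occurred, but also debug offline, reconstructing the scene step by step until the real culprit is found. Often, however, the system dies too abruptly (or something else goes wrong) and the core dump never gets written, leaving only the leftover information in a tombstone. The problem then has to be analyzed and solved with limited debug information, and this is where addr2line from the GNU toolchain comes in: given an address and the matching library that still carries symbols, it "translates" the address into the source location of the failure. (The location it reports is often only a hint and not necessarily the root cause, especially when memory has been corrupted.)
"Tombstone" literally means a gravestone; here it vividly describes the clues a process leaves behind for debugging after it dies.
Below is the tombstone left by a crashed process, including its backtrace: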
Revision: '0'
ABI: 'arm64'
pid: 24377, tid: 24377, name: gx_fpd >>> /system/bin/gx_fpd <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
x0 0000000000000000 x1 0000000000005f39 x2 0000000000000006 x3 0000000000000000
x4 0000000000000000 x5 0000000000000001 x6 0000000000000000 x7 0000000000000000
x8 0000000000000083 x9 0000007fb4eec110 x10 0000000000000002 x11 0000000000000003
x12 0000000000000000 x13 0000000000000043 x14 0000007fcc97a768 x15 0000000000000000
x16 0000007fb4b866a8 x17 0000007fb4b48b6c x18 0000000000000002 x19 0000007fb4f670a8
x20 0000007fb4f66fe8 x21 000000000000000b x22 0000000000000006 x23 0000005582219f90
x24 0000007fcc97ac90 x25 0000007fb4e04d18 x26 0000000000000000 x27 0000000000000000
x28 0000000000000000 x29 0000007fcc97ab60 x30 0000007fb4b46308
sp 0000007fcc97ab60 pc 0000007fb4b48b74 pstate 0000000020000000
v0 2e2e2e2e2e2e2e2e2e2e2e2e2e2e2e2e v1 006370692e67756265642e6f6e6e6974
v2 636f69203a4457540000000000000031 v3 80000000000000000000000000000000
v4 00000000000000008020080280200800 v5 00000000400000000000040000000000
v6 00000000000000000000000000000000 v7 80200802802008028020080280200802
v8 00000000000000000000000000000000 v9 00000000000000000000000000000000
v10 00000000000000000000000000000000 v11 00000000000000000000000000000000
v12 00000000000000000000000000000000 v13 00000000000000000000000000000000
v14 00000000000000000000000000000000 v15 00000000000000000000000000000000
v16 40100401401004014010040140100401 v17 00000000a00aa0080000aaa880400400
v18 00000000000000008020080280200800 v19 0833083a082f08240828083c082e0832
v20 0c950c920c9a0c950c960c950c970c9a v21 000000000000000000000055822a6c18
v22 083a083e083408380834083b084f084b v23 0c950c960c960c930c970c8d0c930c9a
v24 000000000000000000000055822a6c08 v25 085908470837083f083e083f08410843
v26 0c950c930c920c940c950c960c920c97 v27 000000000000000000000055822a6bf8
v28 0862084c084e083b084608350826082e v29 0c920c960c930c950c920c970c900c98
v30 000000000000000000000055822a6be8 v31 0838083c0850085a08410851082f0846
fpsr 00000000 fpcr 00000000
backtrace:
#00 pc 000000000006ab74 /system/lib64/libc.so (tgkill+8)
#01 pc 0000000000068304 /system/lib64/libc.so (pthread_kill+68)
#02 pc 00000000000212f8 /system/lib64/libc.so (raise+28)
#03 pc 000000000001ba98 /system/lib64/libc.so (abort+60)
#04 pc 000000000002e104 /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+216)
#05 pc 0000000000004c5c /system/bin/gx_fpd (main+236)
#06 pc 0000000000019794 /system/lib64/libc.so (__libc_init+100)
#07 pc 0000000000004d78 /system/bin/gx_fpd
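Judging from the signal, signal 6 (SIGABRT), the first impression may be that a NULL-pointer access was caught by the MMU and reported through the ARM data abort exception. That kind of fault, however, is normally delivered as signal 11 (SIGSEGV), whereas SIGABRT is what abort() raises, so the signal alone does not settle the question. What matters is knowing which source code each backtrace frame corresponds to, that is, translating the backtrace into source locations. The addr2line tool does exactly that.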
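One detail is worth checking before running addr2line: the pc values in the backtrace are offsets inside each library, not the absolute addresses shown in the register dump. The sketch below is only a sanity check on the numbers in this tombstone. It assumes that frame #00 corresponds to the pc register, that x30 holds the return address into pthread_kill, and that the unwinder reports the call instruction (return address minus 4) for frame #01.

```python
# Values copied from the tombstone above.
ABS_PC = 0x0000007FB4B48B74   # "pc" in the register dump
ABS_LR = 0x0000007FB4B46308   # "x30" (link register) in the register dump
FRAME0_REL_PC = 0x6AB74       # backtrace frame #00, libc.so (tgkill+8)
FRAME1_REL_PC = 0x68304       # backtrace frame #01, libc.so (pthread_kill+68)


def load_base(abs_addr: int, rel_pc: int) -> int:
    """Return the load base implied by an absolute address and its offset."""
    return abs_addr - rel_pc


base = load_base(ABS_PC, FRAME0_REL_PC)
print(hex(base))            # implied load base of libc.so in this process
print(hex(ABS_LR - base))   # library-relative offset of the address in x30
```

The implied base is page-aligned, and the x30 offset sits 4 bytes after the frame #01 pc, which is consistent with the assumptions above. It is the library-relative values from the backtrace that get passed to addr2line.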
Usage is as follows (be sure to use the library under the symbols directory, which still carries its symbols):
addr2line -e symbols/system/lib64/xxx.so -f -C <addr>
./aarch64-linux-android-addr2line -e symbols/system/lib64/libc.so 000000000006ab74
bionic/libc/arch-arm64/syscalls/tgkill.S:9
./aarch64-linux-android-addr2line -e symbols/system/lib64/libc.so 0000000000068304
bionic/libc/bionic/pthread_kill.cpp:45 (discriminator 1)
./aarch64-linux-android-addr2line -e symbols/system/lib64/libc.so 00000000000212f8
bionic/libc/bionic/raise.cpp:34 (discriminator 1)
./aarch64-linux-android-addr2line -e symbols/system/lib64/libc.so 000000000001ba98
bionic/libc/bionic/abort.cpp:47
./aarch64-linux-android-addr2line -e symbols/system/lib64/libbinder.so 000000000002e104
frameworks/native/libs/binder/IPCThreadState.cpp:608
After translation, the backtrace looks like this:
backtrace:
#00 pc 000000000006ab74 /system/lib64/libc.so (tgkill+8) tgkill.S:9
#01 pc 0000000000068304 /system/lib64/libc.so (pthread_kill+68) pthread_kill.cpp:45
#02 pc 00000000000212f8 /system/lib64/libc.so (raise+28) raise.cpp:34
#03 pc 000000000001ba98 /system/lib64/libc.so (abort+60) abort.cpp:47
#04 pc 000000000002e104 /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+216) IPCThreadState.cpp:608
#05 pc 0000000000004c5c /system/bin/gx_fpd (main+236)
#06 pc 0000000000019794 /system/lib64/libc.so (__libc_init+100)
#07 pc 0000000000004d78 /system/bin/gx_fpd
Note that gx_fpd is a third-party binary that ships without symbols, so its frames cannot be resolved to a source location.
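Running addr2line by hand for every frame gets tedious, so the lookups can be scripted. The following is a minimal sketch, not a finished tool: it parses backtrace lines in the format shown above and builds one addr2line command per frame. The function names, the default symbols root, and the assumption that each unstripped file lives at symbols plus its on-device path are illustrative choices based on the commands in this article.

```python
import re
import shlex

# Matches tombstone frames such as:
#   "#04 pc 000000000002e104 /system/lib64/libbinder.so (android::...+216)"
FRAME_RE = re.compile(r"#(\d+)\s+pc\s+([0-9a-fA-F]+)\s+(\S+)")


def parse_frames(text):
    """Return (frame_number, pc_hex, library_path) for every backtrace frame."""
    return [(int(n), pc, lib) for n, pc, lib in FRAME_RE.findall(text)]


def addr2line_command(pc, lib, symbols_root="symbols",
                      tool="aarch64-linux-android-addr2line"):
    """Build the addr2line argument list for one frame.

    Assumes the unstripped file sits at <symbols_root><on-device path>,
    e.g. symbols/system/lib64/libc.so.
    """
    return [tool, "-e", symbols_root + lib, "-f", "-C", pc]


if __name__ == "__main__":
    sample = """\
#03 pc 000000000001ba98 /system/lib64/libc.so (abort+60)
#04 pc 000000000002e104 /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+216)
#05 pc 0000000000004c5c /system/bin/gx_fpd (main+236)
"""
    for number, pc, lib in parse_frames(sample):
        print(f"#{number:02d}", shlex.join(addr2line_command(pc, lib)))
```

Each printed command can be pasted into a shell, or the argument list can be handed to subprocess.run to collect the output. Frames from binaries without symbols, such as gx_fpd here, will still not resolve.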
Now look at the code where the abort was raised, IPCThreadState.cpp:608:
void IPCThreadState::joinThreadPool(bool isMain)
{
    LOG_THREADPOOL("**** THREAD %p (PID %d) IS JOINING THE THREAD POOL\n", (void*)pthread_self(), getpid());
    mOut.writeInt32(isMain ? BC_ENTER_LOOPER : BC_REGISTER_LOOPER);
    // This thread may have been spawned by a thread that was in the background
    // scheduling group, so first we will make sure it is in the foreground
    // one to avoid performing an initial transaction in the background.
    set_sched_policy(mMyThreadId, SP_FOREGROUND);
    status_t result;
    do {
        processPendingDerefs();
        // now get the next command to be processed, waiting if necessary
        result = getAndExecuteCommand();
        if (result < NO_ERROR && result != TIMED_OUT && result != -ECONNREFUSED && result != -EBADF) {
            ALOGE("getAndExecuteCommand(fd=%d) returned unexpected error %d, aborting",
                  mProcess->mDriverFD, result);
            abort();  // <======= LINE 608
        }
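The code above shows that this abort was not caused by a NULL pointer. It was added deliberately to trap behavior the program does not expect. The next step is to work out why result ended up with a value that falls into this trap. Since this is core binder IPC code, taking the analysis further requires a solid understanding of how binder works and close familiarity with its code.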
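To make the trap condition concrete, here is a small sketch that mirrors the if statement from joinThreadPool. It is an illustration of the logic only. It assumes that NO_ERROR is 0 and that TIMED_OUT equals -ETIMEDOUT, as in Android's status_t definitions, and it uses the host's errno values, which may differ from those on the device. The -EINVAL case is just a hypothetical example of an unexpected error.

```python
import errno

NO_ERROR = 0
TIMED_OUT = -errno.ETIMEDOUT  # assumption: mirrors Android's status_t value


def would_abort(result: int) -> bool:
    """Mirror the check that guards abort() in joinThreadPool."""
    return (result < NO_ERROR
            and result != TIMED_OUT
            and result != -errno.ECONNREFUSED
            and result != -errno.EBADF)


cases = [
    ("NO_ERROR", NO_ERROR),
    ("TIMED_OUT", TIMED_OUT),
    ("-ECONNREFUSED", -errno.ECONNREFUSED),
    ("-EBADF", -errno.EBADF),
    ("-EINVAL", -errno.EINVAL),  # hypothetical unexpected error
]
for name, value in cases:
    print(f"{name:>14} -> abort: {would_abort(value)}")
```

Only a negative result outside the three tolerated values reaches abort(), so the error code printed by the ALOGE line just before it is the first thing to look for in the log.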