VM core dump analysis (abnormal system shutdown)

1. Log analysis in crash

Running the log command shows the message BUG: unable to handle kernel NULL pointer dereference at 0000000000000008, as below.

crash> log

(..omitted..)

cal.start_seq 4014915650 != tcph seq 4014915916 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; lo
cal.start_seq 4014915650 != tcph seq 4014915916 | tb_tcpv6_conn.c:927
+ BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffa019dc6a>] dsa_slim_input+0x7a/0xc90 [dsa_filter]
PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/online
CPU 0
Modules linked in: gsch(U) redirfs(U) nfs lockd fscache auth_rpcgss nfs_acl sunrpc
ipv6 dsa_filter(P)(U) ppdev parport_pc parport microcode xen_netfront sg i2c_piix4
i2c_core ext4 jbd2 mbcache sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix
 dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

(..omitted..)

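The log around the BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 message is shown below. It begins with a page allocation failure in an httpd process (Pid 64616); the Oops itself is reported for the httpd process with Pid 99088.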

+ httpd: page allocation failure. order:5, mode:0x20
+ Pid: 64616, comm: httpd Tainted: P           --------------   2.6.32-504.el6.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff8113438a>] ? __alloc_pages_nodemask+0x74a/0x8d0
 [<ffffffffa01a3de0>] ? stateful_tcp_filter+0x870/0x11b0 [dsa_filter]
 [<ffffffffa01a4765>] ? stateful_process+0x45/0xc0 [dsa_filter]
 [<ffffffff81173332>] ? kmem_getpages+0x62/0x170
 [<ffffffff81173f4a>] ? fallback_alloc+0x1ba/0x270
 [<ffffffff8117399f>] ? cache_grow+0x2cf/0x320
 [<ffffffff81173cc9>] ? ____cache_alloc_node+0x99/0x160
 [<ffffffff81451ea2>] ? pskb_expand_head+0x62/0x280
 [<ffffffff81174a99>] ? __kmalloc+0x189/0x220
 [<ffffffff81451ea2>] ? pskb_expand_head+0x62/0x280
 [<ffffffff8145278a>] ? __pskb_pull_tail+0x2aa/0x360
 [<ffffffffa01e3b89>] ? lin_nf_packet_wrapper.clone.0+0x49/0x3d0 [dsa_filter]
 [<ffffffff8149cdd0>] ? ip_finish_output+0x0/0x310
 [<ffffffffa012eb4f>] ? xennet_start_xmit+0x5ef/0x7cc [xen_netfront]
 [<ffffffffa01e40bc>] ? lin_nf_packet_wrapper_all.clone.1+0x1ac/0x1e0 [dsa_filter]
 [<ffffffff8149cdd0>] ? ip_finish_output+0x0/0x310
 [<ffffffffa01e4111>] ? lin_nf_packet_wrapper_inet+0x21/0x30 [dsa_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff8149cdd0>] ? ip_finish_output+0x0/0x310
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff8149cdd0>] ? ip_finish_output+0x0/0x310
 [<ffffffff8149d184>] ? ip_output+0xa4/0xc0
 [<ffffffffa01a4765>] ? stateful_process+0x45/0xc0 [dsa_filter]
 [<ffffffff8149c475>] ? ip_local_out+0x25/0x30
 [<ffffffff8149c970>] ? ip_queue_xmit+0x190/0x420
 [<ffffffffa0190047>] ? core_pkt_hook+0x267/0x8b0 [dsa_filter]
 [<ffffffff814b2024>] ? tcp_transmit_skb+0x4b4/0x8b0
 [<ffffffff814b456a>] ? tcp_write_xmit+0x1da/0xa90
 [<ffffffff814b5150>] ? __tcp_push_pending_frames+0x30/0xe0
 [<ffffffff814ac733>] ? tcp_data_snd_check+0x33/0x100
 [<ffffffff814b03c1>] ? tcp_rcv_established+0x391/0x7e0
 [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
 [<ffffffffa01e4ad0>] ? lin_pkt_get_frame_header+0x0/0x5d0 [dsa_filter]
 [<ffffffffa01e4730>] ? lin_pkt_get_length+0x0/0x20 [dsa_filter]
 [<ffffffffa01e4990>] ? lin_pkt_read_start+0x0/0x140 [dsa_filter]
 [<ffffffff814b8893>] ? tcp_v4_do_rcv+0x2e3/0x490
 [<ffffffff814ba1a2>] ? tcp_v4_rcv+0x522/0x900
 [<ffffffffa01e4111>] ? lin_nf_packet_wrapper_inet+0x21/0x30 [dsa_filter]
 [<ffffffff81496ded>] ? ip_local_deliver_finish+0xdd/0x2d0
 [<ffffffff81497078>] ? ip_local_deliver+0x98/0xa0
 [<ffffffff8149653d>] ? ip_rcv_finish+0x12d/0x440
 [<ffffffff81496ac5>] ? ip_rcv+0x275/0x350
 [<ffffffff8145c88b>] ? __netif_receive_skb+0x4ab/0x750
 [<ffffffff81460588>] ? netif_receive_skb+0x58/0x60
[<ffffffffa012da96>] ? xennet_poll+0xba6/0xd50 [xen_netfront]
 [<ffffffff81462083>] ? net_rx_action+0x103/0x2f0
 [<ffffffff810eaac2>] ? handle_IRQ_event+0x92/0x170
 [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
 [<ffffffff810b034a>] ? tick_program_event+0x2a/0x30
 [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
 [<ffffffff8100fc15>] ? do_softirq+0x65/0xa0
 [<ffffffff8107d765>] ? irq_exit+0x85/0x90
 [<ffffffff813239b5>] ? xen_evtchn_do_upcall+0x35/0x50
 [<ffffffff8100c433>] ? xen_hvm_callback_vector+0x13/0x20
 <EOI>
             net.tcp/1  | SYN invalid retransmit in(remote set=1) ; remote.start_seq 3256175445 != tcph seq 2388075115 | tb_tcpv6_conn.c:867
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937195753 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937195753 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937195753 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937195815 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937195829 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937195836 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937195855 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937195949 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937195949 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937195949 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937196033 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937196122 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937196221 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937196221 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937196472 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937196576 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937196673 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937196673 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937196768 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937196779 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 937195752 != tcph seq 937196869 | tb_tcpv6_conn.c:927
net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230081 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230081 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230081 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230143 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230157 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230164 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230183 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230277 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230277 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230361 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230451 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230550 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230550 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230801 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501230905 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501231002 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 1501230080 != tcph seq 1501231002 | tb_tcpv6_conn.c:927

(..omitted..)

             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 2699821124 != tcph seq 2699824124 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 2699821124 != tcph seq 2699824210 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 2699821124 != tcph seq 2699824308 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 2699821124 != tcph seq 2699824308 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 2699821124 != tcph seq 2699824444 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 2699821124 != tcph seq 2699824444 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 2699821124 != tcph seq 2699824580 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 2699821124 != tcph seq 2699824580 | tb_tcpv6_conn.c:927
             net.tcp/1  | SYN invalid retransmit in(remote set=1) ; remote.start_seq 3897748963 != tcph seq 3498325270 | tb_tcpv6_conn.c:867
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915651 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915651 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915651 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915713 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915731 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915738 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915757 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915905 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915905 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915916 | tb_tcpv6_conn.c:927
             net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915916 | tb_tcpv6_conn.c:927
+ BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
+ IP: [<ffffffffa019dc6a>] dsa_slim_input+0x7a/0xc90 [dsa_filter]
PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/online
CPU 0
Modules linked in: gsch(U) redirfs(U) nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 dsa_filter(P)(U) ppdev parport_pc parport microcode xen_netfront sg i2c_piix4 i2c_core ext4 jbd2 mbcache sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 99088, comm: httpd Tainted: P           ---------------    2.6.32-504.el6.x86_64 #1 Xen HVM domU
RIP: 0010:[<ffffffffa019dc6a>]  [<ffffffffa019dc6a>] dsa_slim_input+0x7a/0xc90 [dsa_filter]

The meaning of IP: [<ffffffffa019dc6a>] dsa_slim_input+0x7a/0xc90 [dsa_filter] is as follows.

Item            Meaning
IP              Instruction Pointer
Crash function  dsa_slim_input
Crash offset    0x7a (decimal 122)
Size            0xc90 (decimal 3216), the total length of the function in bytes
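
The fields in the table can also be pulled out of the IP line mechanically. The following is a minimal Python sketch, not part of the crash utility; the regular expression and the name parse_ip_line are assumptions made for illustration. It also checks that IP minus offset gives the start address of the function.

```python
import re

# Matches "[<address>] symbol+0xOFFSET/0xSIZE [module]" as printed in an oops.
IP_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def parse_ip_line(line):
    """Split an oops IP line into address, symbol, offset, size and module."""
    m = IP_RE.search(line)
    if m is None:
        raise ValueError("no symbol+offset/size field found")
    addr = int(m.group("addr"), 16)
    offset = int(m.group("offset"), 16)
    return {
        "address": addr,
        "symbol": m.group("symbol"),
        "offset": offset,                    # bytes from the start of the function
        "size": int(m.group("size"), 16),    # total length of the function
        "module": m.group("module"),
        "function_start": addr - offset,
    }

info = parse_ip_line("IP: [<ffffffffa019dc6a>] dsa_slim_input+0x7a/0xc90 [dsa_filter]")
print(info["symbol"], info["offset"], info["size"], hex(info["function_start"]))
```

The computed function start can be compared with the first address printed by dis dsa_slim_input.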

Disassemble the instruction at offset 122 of the dsa_slim_input function.

crash> dis dsa_slim_input+122
0xffffffffa019dc6a <dsa_slim_input+122>:        sub    0x8(%rdi),%r11d
crash>

So BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 occurred while executing sub 0x8(%rdi),%r11d. This instruction alone does not show why the NULL pointer dereference happened, so 1-(1) disassembles the whole dsa_slim_input function.

1-(1). Disassembling the dsa_slim_input function

crash> dis 0xffffffff8100ad40
0xffffffff8100ad40 <mcount>:    retq

Disassemble the dsa_slim_input function.

crash> dis dsa_slim_input
0xffffffffa019dbf0 <dsa_slim_input>:    push   %rbp
0xffffffffa019dbf1 <dsa_slim_input+1>:  mov    %rsp,%rbp ----> Set up the stack frame.
0xffffffffa019dbf4 <dsa_slim_input+4>:  push   %r15 ----> rbx, rbp, and r12-r15 are callee-saved (nonvolatile) in the System V AMD64 ABI used by Linux, so they are saved before use.
0xffffffffa019dbf6 <dsa_slim_input+6>:  push   %r14 ----> r14 is a register; its current value is saved on the stack.
0xffffffffa019dbf8 <dsa_slim_input+8>:  push   %r13
0xffffffffa019dbfa <dsa_slim_input+10>: push   %r12
0xffffffffa019dbfc <dsa_slim_input+12>: push   %rbx
0xffffffffa019dbfd <dsa_slim_input+13>: sub    $0x58,%rsp ----> 0x58 is decimal 88; reserves 88 bytes on the stack.
0xffffffffa019dc01 <dsa_slim_input+17>: callq  0xffffffff8100ad40 <mcount>  ----> mcount (the function-tracing hook) consists of a single retq here, so the call returns immediately.
0xffffffffa019dc06 <dsa_slim_input+22>: mov    %rsi,-0x48(%rbp) ----> Stores the value of rsi at the address rbp - 72 bytes (0x48). In the System V AMD64 ABI, rdi carries the 1st function argument and rsi the 2nd.
0xffffffffa019dc0a <dsa_slim_input+26>: mov    0xa8(%rsi),%rax ----> Loads the value located 168 bytes (0xa8) past the address held in rsi into rax.
0xffffffffa019dc11 <dsa_slim_input+33>: mov    %rdi,%r12  ----> Copies the value of rdi into r12.
0xffffffffa019dc14 <dsa_slim_input+36>: mov    0x8(%rax),%r13d ----> r13d is the lower 32 bits of the r13 register.
0xffffffffa019dc18 <dsa_slim_input+40>: mov    0x4(%rax),%ebx
0xffffffffa019dc1b <dsa_slim_input+43>: bswap  %r13d ----> bswap reverses the byte order (little endian -> big endian, and vice versa).
0xffffffffa019dc1e <dsa_slim_input+46>: bswap  %ebx
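0xffffffffa019dc20 <dsa_slim_input+48>: testb  $0x2,0xd(%rax) ----> testb ANDs 0x2 with the byte at rax + 0xd and sets the flags; the result itself is discarded.
+ 0xffffffffa019dc24 <dsa_slim_input+52>: je     0xffffffffa019dc60 ----> je jumps when the zero flag is set, i.e. when the AND result of testb $0x2,0xd(%rax) is zero (bit 0x2 is clear). In that case execution jumps to 0xffffffffa019dc60.
(The lines marked with "-" below do not need to be interpreted.)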
- 0xffffffffa019dc26 <dsa_slim_input+54>: movzbl 0x54(%rdi),%eax
- 0xffffffffa019dc2a <dsa_slim_input+58>: test   $0x1,%al
- 0xffffffffa019dc2c <dsa_slim_input+60>: jne    0xffffffffa019e170
- 0xffffffffa019dc32 <dsa_slim_input+66>: add    $0x1,%ebx
- 0xffffffffa019dc35 <dsa_slim_input+69>: or     $0x1,%eax
- 0xffffffffa019dc38 <dsa_slim_input+72>: mov    %ebx,(%rdi)
- 0xffffffffa019dc3a <dsa_slim_input+74>: mov    %ebx,0x4(%rdi)
- 0xffffffffa019dc3d <dsa_slim_input+77>: mov    %al,0x54(%rdi)
- 0xffffffffa019dc40 <dsa_slim_input+80>: movl   $0x0,-0x4c(%rbp)
- 0xffffffffa019dc47 <dsa_slim_input+87>: mov    -0x4c(%rbp),%eax
- 0xffffffffa019dc4a <dsa_slim_input+90>: add    $0x58,%rsp
- 0xffffffffa019dc4e <dsa_slim_input+94>: pop    %rbx
- 0xffffffffa019dc4f <dsa_slim_input+95>: pop    %r12
- 0xffffffffa019dc51 <dsa_slim_input+97>: pop    %r13
- 0xffffffffa019dc53 <dsa_slim_input+99>: pop    %r14
- 0xffffffffa019dc55 <dsa_slim_input+101>:        pop    %r15
- 0xffffffffa019dc57 <dsa_slim_input+103>:        leaveq
- 0xffffffffa019dc58 <dsa_slim_input+104>:        retq
- 0xffffffffa019dc59 <dsa_slim_input+105>:        nopl   0x0(%rax)
+ 0xffffffffa019dc60 <dsa_slim_input+112>:        mov    -0x48(%rbp),%rax ----> Jump target of the je above.
0xffffffffa019dc64 <dsa_slim_input+116>:        mov    %r13d,%r11d
0xffffffffa019dc67 <dsa_slim_input+119>:        mov    %ebx,%r10d
+ 0xffffffffa019dc6a <dsa_slim_input+122>:        sub    0x8(%rdi),%r11d ----> Faulting instruction (unable to handle kernel NULL pointer dereference at 0000000000000008)
(The lines marked with "-" below do not need to be interpreted.)
- 0xffffffffa019dc6e <dsa_slim_input+126>:        sub    (%rdi),%r10d
- 0xffffffffa019dc71 <dsa_slim_input+129>:        cmpl   $0x2,0x5ee68(%rip)        # 0xffffffffa01fcae0
- 0xffffffffa019dc78 <dsa_slim_input+136>:        mov    0xa4(%rax),%r14d
- 0xffffffffa019dc7f <dsa_slim_input+143>:        mov    0x58(%rdi),%rax
- 0xffffffffa019dc83 <dsa_slim_input+147>:        mov    0xc30(%rax),%r15
- 0xffffffffa019dc8a <dsa_slim_input+154>:        jg     0xffffffffa019e180
- 0xffffffffa019dc90 <dsa_slim_input+160>:        cmpq   $0x0,0x8(%r15)
- 0xffffffffa019dc95 <dsa_slim_input+165>:        je     0xffffffffa019dd1f
- 0xffffffffa019dc9b <dsa_slim_input+171>:        mov    0x10(%r15),%rdx
- 0xffffffffa019dc9f <dsa_slim_input+175>:        movzwl 0xda(%rdx),%eax
- 0xffffffffa019dca6 <dsa_slim_input+182>:        mov    %rax,%rcx
- 0xffffffffa019dca9 <dsa_slim_input+185>:        add    $0x1,%eax
- 0xffffffffa019dcac <dsa_slim_input+188>:        and    $0x1f,%ecx
- 0xffffffffa019dcaf <dsa_slim_input+191>:        mov    %ax,0xda(%rdx)
- 0xffffffffa019dcb6 <dsa_slim_input+198>:        mov    %rcx,%rax
- 0xffffffffa019dcb9 <dsa_slim_input+201>:        shl    $0x5,%rax
- 0xffffffffa019dcbd <dsa_slim_input+205>:        lea    0x4d0(%rdx,%rax,1),%rsi
- 0xffffffffa019dcc5 <dsa_slim_input+213>:        mov    0x10(%r15),%rax
- 0xffffffffa019dcc9 <dsa_slim_input+217>:        movzwl 0xd8(%rax),%eax
- 0xffffffffa019dcd0 <dsa_slim_input+224>:        mov    %ebx,0x10(%rsi)
- 0xffffffffa019dcd3 <dsa_slim_input+227>:        shl    $0x14,%eax
- 0xffffffffa019dcd6 <dsa_slim_input+230>:        or     $0x10003,%eax
- 0xffffffffa019dcdb <dsa_slim_input+235>:        mov    %eax,0xc(%rsi)
- 0xffffffffa019dcde <dsa_slim_input+238>:        mov    %rcx,%rax
- 0xffffffffa019dce1 <dsa_slim_input+241>:        add    $0x27,%rcx
- 0xffffffffa019dce5 <dsa_slim_input+245>:        shl    $0x5,%rax
- 0xffffffffa019dce9 <dsa_slim_input+249>:        shl    $0x5,%rcx

(..omitted..)
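
The testb/je pair at +48 and +52 is easy to misread, because je looks at the zero flag rather than at the AND result directly. A minimal Python sketch of the flag logic follows; the helper names and the sample byte values are hypothetical and only illustrate the rule.

```python
def zf_after_test(mask, value):
    """Model x86 `test`: AND the operands, discard the result, set ZF if it is 0."""
    return (mask & value) == 0

def je_taken(zf):
    """`je` (same opcode as `jz`) jumps when ZF is set."""
    return zf

# Hypothetical values standing in for the byte at 0xd(%rax).
for flags_byte in (0x10, 0x02, 0x12):
    zf = zf_after_test(0x02, flags_byte)
    path = "jump to +112" if je_taken(zf) else "fall through to +54"
    print(hex(flags_byte), path)
```

Only the first sample value has bit 0x2 clear, so only that one takes the jump to +112.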

1-(2). Conclusion

The abnormal shutdown of K's VM at 20:18 on June 16, 2016 is presumed to have been caused by a NULL pointer dereference in the dsa_slim_input function of Deep Security, Trend Micro's integrated server security solution.


crash> dis dsa_slim_input+122
0xffffffffa019dc6a <dsa_slim_input+122>:        sub    0x8(%rdi),%r11d
crash>

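sub 0x8(%rdi),%r11d means: lower 32 bits of r11 -= the data stored at the address rdi + 8 (AT&T syntax, source operand first).

Item         Operand     Note
source       0x8(%rdi)   memory operand: the 32-bit value at address rdi + 8
destination  %r11d       lower 32 bits of the r11 register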

That is, the 32-bit value stored at the address rdi + 8 is subtracted from the lower 32 bits of r11 (%r11d), and the result is stored back in %r11d.
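
The arithmetic of this instruction can be modelled in a few lines of Python. This is only a sketch of the register semantics; the function name and the operand values are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sub_mem_r11d(r11, mem_value):
    """Model `sub mem32,%r11d`: 32-bit subtract with wraparound.

    Writing a 32-bit register on x86-64 clears bits 63..32 of the full
    register, so the returned value is also the new 64-bit r11.
    """
    return ((r11 & MASK32) - (mem_value & MASK32)) & MASK32

# Hypothetical operand values, chosen only to show the arithmetic.
print(hex(sub_mem_r11d(0x1000, 0x0FF0)))   # 0x10
print(hex(sub_mem_r11d(0x10, 0x20)))       # 0xfffffff0 (wraps around)
```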

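However, rdi itself holds NULL (0), so the effective address rdi + 0x8 is 0x8, which is not mapped. Reading it raises BUG: unable to handle kernel NULL pointer dereference at 0000000000000008.

0000000000000008 is the faulting address: the value of rdi (0) plus the offset 8 (0x8).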
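
The relation between the faulting address and rdi is plain address arithmetic. As a small Python sketch (the helper names are hypothetical), the base register value can be recovered from the address in the BUG line and the displacement in the instruction:

```python
MASK64 = (1 << 64) - 1

def effective_address(base, disp):
    """Address accessed by a `disp(%base)` operand, wrapped to 64 bits."""
    return (base + disp) & MASK64

def base_from_fault(fault_addr, disp):
    """Recover the base register value from the faulting address."""
    return (fault_addr - disp) & MASK64

fault_addr = 0x0000000000000008   # from the BUG line
disp = 0x8                        # from sub 0x8(%rdi),%r11d
rdi = base_from_fault(fault_addr, disp)
print(hex(rdi))                   # 0x0
```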

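1) Root cause analysis

On this path, none of the instructions between function entry and +122 modify rdi, so at the faulting instruction rdi still holds the first argument of dsa_slim_input. The caller, or some external factor, therefore appears to have passed a NULL pointer. -> That is, when execution reached sub 0x8(%rdi),%r11d, rdi was NULL and 0x8(%rdi) referred to address 0x8.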

Because the memory read required by the sub instruction could not be performed, the kernel printed BUG: unable to handle kernel NULL pointer dereference at 0000000000000008.
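
The "Oops: 0000" field in the log is the x86 page-fault error code, and it supports the reading above. The following Python sketch decodes the standard bits; the function and key names are assumptions made for illustration.

```python
def decode_page_fault_error(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x01),       # 0: no page mapped at the address
        "write": bool(code & 0x02),              # 0: the access was a read
        "user_mode": bool(code & 0x04),          # 0: the access came from kernel mode
        "reserved_bit": bool(code & 0x08),
        "instruction_fetch": bool(code & 0x10),
    }

bits = decode_page_fault_error(int("0000", 16))   # from "Oops: 0000 [#1] SMP"
print(bits)
```

All bits are clear, i.e. a read from kernel mode to an address with no page mapped, which is what a load from address 0x8 by sub 0x8(%rdi),%r11d would produce.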

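Following the execution flow up to the BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 message, copying rdi into r12 caused no problem (0xffffffffa019dc11 : mov %rdi,%r12). This is expected: a register-to-register mov does not access memory, so a NULL value in rdi faults only when it is first dereferenced.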

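2) How to resolve the problem

  • dsa_slim_input needs to check for a NULL pointer before dereferencing its first argument (rdi).
  • The actual program source must be examined to find out why the pointer in rdi is NULL when dsa_slim_input reaches this instruction. (Suspected bug)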

3) Execution flow of the dsa_slim_input function

This is the execution flow from the entry of dsa_slim_input up to the instruction that raised BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 (sub 0x8(%rdi),%r11d).

Step   Instruction(s)                 Description
1      push %rbp; mov %rsp,%rbp       Create the stack frame (dsa_slim_input has been called).
2      e.g. push %r15                 Save register values on the stack.
3      mov (using rsi and rdi)        Copy values.
4      bswap                          Change the byte order (little endian <-> big endian). Data on the network is sent in big-endian byte order, which appears to be why the byte order is swapped.
5      testb $0x2,0xd(%rax); je       AND the constant 2 with the byte at 0xd(%rax); je jumps to 0xffffffffa019dc60 when the result is zero (bit clear). The jump was taken in this crash. ANDing a constant 2 with a field suggests a check for a specific kind of packet. If rax points to a TCP header (an assumption), bit 0x2 of the byte at offset 0xd is the SYN flag.
5-(1)  mov -0x48(%rbp),%rax           move
5-(2)  mov %r13d,%r11d                move
5-(3)  mov %ebx,%r10d                 move
5-(4)  sub 0x8(%rdi),%r11d            BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 occurs
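
The last steps of this flow can be replayed in a toy model to see where the failure appears. The Python sketch below is purely illustrative: the memory dictionary, the Fault exception and the sample register values are hypothetical, and only the instructions relevant to rdi are modelled.

```python
class Fault(Exception):
    """Raised when the toy model touches an unmapped address."""

def load32(memory, addr):
    # memory is a dict {address: 32-bit value}; every other address is unmapped.
    if addr not in memory:
        raise Fault(f"unable to handle access at {addr:016x}")
    return memory[addr]

def replay(rdi, memory, r13d, ebx):
    regs = {"rdi": rdi}
    regs["r12"] = regs["rdi"]     # mov %rdi,%r12   : register copy, no memory access
    regs["r11d"] = r13d           # mov %r13d,%r11d
    regs["r10d"] = ebx            # mov %ebx,%r10d
    value = load32(memory, regs["rdi"] + 0x8)   # memory read of sub 0x8(%rdi),%r11d
    regs["r11d"] = (regs["r11d"] - value) & 0xFFFFFFFF
    return regs

# Hypothetical healthy call: rdi points at mapped memory.
print(replay(0x1000, {0x1008: 5}, r13d=7, ebx=3)["r11d"])

# rdi == 0, as in the crash: the register copy succeeds, the load faults.
try:
    replay(0, {}, r13d=7, ebx=3)
except Fault as exc:
    print(exc)
```

In the model, as in the dump, a NULL rdi passes through the register copy unnoticed and fails only at the first memory access through it.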

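4) Function of dsa_slim_input

The function of dsa_slim_input is guessed to be as follows.

Step   Function                                                                                                                                  Note
1      Converts fields of packets received by the K server from network byte order (big endian) to host byte order (little endian on x86_64).
2      Identifies a specific kind of packet with a bit operation.                                                                                Enters if..else
2-(1)  Runs the corresponding instructions when the packet-identifying condition is true.
2-(2)  Runs the corresponding instructions when the packet-identifying condition is false.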
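
Step 1 can be illustrated with a short Python sketch. It assumes a 32-bit field and borrows one sequence number from the log purely as sample data; bswap32 is a hypothetical helper that mimics the x86 bswap instruction.

```python
import struct

def bswap32(value):
    """Reverse the four bytes of a 32-bit value, like x86 `bswap`."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")

seq = 4014915650                       # sample number borrowed from the log
wire = struct.pack("!I", seq)          # network byte order (big endian)
raw = struct.unpack("<I", wire)[0]     # what a little-endian CPU loads from memory

print(raw == seq)                      # False: the loaded bytes are reversed
print(bswap32(raw) == seq)             # True: bswap restores the value
```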

Judging from the execution flow and function of dsa_slim_input, the log below, seen with the log command in the crash shell, appears to come from the Deep Security feature of DSA (Deep Security Agent), not from Heartbeat.

(..omitted..)
net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915757 | tb_tcpv6_conn.c:927
net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915905 | tb_tcpv6_conn.c:927
net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915905 | tb_tcpv6_conn.c:927
net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915916 | tb_tcpv6_conn.c:927
net.tcp/1  | Unexpected pkt out in establising_1 out(local set=1) ; local.start_seq 4014915650 != tcph seq 4014915916 | tb_tcpv6_conn.c:927
(..omitted..)

The Deep Security version should be identified and the manual for that version checked. The manual mentions IPv6. Whether Deep Security provides a Java-based service also needs to be checked.