Well, this is my situation. We have an Ubuntu 14.04 server for compiling AOSP, with Oracle JDK 1.6.0_45 installed, and around five or six people use it. To reduce compile time, we usually run builds with 30 or more parallel jobs. But here comes the problem: sometimes this causes a kernel panic.
[294338.010590] ------------[ cut here ]------------
[294338.011468] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[294338.013235] invalid opcode: 0000 [#1] SMP
[294338.014037] Modules linked in: hid_generic usbhid hid quota_v2 quota_tree bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw serio_raw hpwdt sb_edac gf128mul hpilo gpio_ich glue_helper ioatdma edac_core acpi_power_meter ablk_helper cryptd ipmi_si dca lpc_ich tpm_infineon mac_hid lp parport tg3 ptp psmouse hpsa pps_core
[294338.024056] CPU: 13 PID: 19564 Comm: java Not tainted 3.13.0-24-generic #47-Ubuntu
[294338.025475] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 12/20/2013
[294338.026716] task: ffff88010abd17f0 ti: ffff88015cd40000 task.ti: ffff88015cd40000
[294338.028119] RIP: 0010:[] [] handle_mm_fault+0xe61/0xf10
[294338.029728] RSP: 0000:ffff88015cd41d98 EFLAGS: 00010246
[294338.030724] RAX: 0000000000000100 RBX: 0000000791000000 RCX: 000000000000001c
[294338.032063] RDX: ffff88010abd17f0 RSI: 0000000000000000 RDI: 000000010f4009e6
[294338.036109] RBP: ffff88015cd41e20 R08: 0000000000000000 R09: 00000000000000a9
[294338.037721] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880b4507d440
[294338.039059] R13: ffff880b846c6900 R14: ffff880fe3a61500 R15: 0000000000000080
[294338.087244] FS: 00002b999fffa700(0000) GS:ffff88203f860000(0000) knlGS:0000000000000000
[294338.188145] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[294338.254032] CR2: 00000000e0036000 CR3: 0000000b8c13d000 CR4: 00000000001407e0
[294338.368533] Stack:
[294338.425737] 0000000000000001 ffff88015cd41db0 ffff88015cd41f20 ffff88015cd41dd0
[294338.520319] 00002b99ecdb59a0 0000000700000080 0000000000000148 ffffea006c1926c0
[294338.626938] 0000001b0649b867 ffffea00040a3530 ffff8800000000a9 0000000000000006
[294338.723231] Call Trace:
[294338.771398] [] __do_page_fault+0x184/0x560
[294338.820300] [] ? acct_account_cputime+0x1c/0x20
[294338.862860] [] ? account_user_time+0x8b/0xa0
[294338.908198] [] ? vtime_account_user+0x54/0x60
[294338.955022] [] do_page_fault+0x1a/0x70
[294339.003222] [] page_fault+0x28/0x30
[294339.050179] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
[294339.204017] RIP [] handle_mm_fault+0xe61/0xf10
[294339.255863] RSP
[294339.377075] ------------[ cut here ]------------
[294339.377175] ---[ end trace e09489d0a574e658 ]---
[294339.470018] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[294339.509694] invalid opcode: 0000 [#2] SMP
[294339.553485] Modules linked in: hid_generic usbhid hid quota_v2 quota_tree bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw serio_raw hpwdt sb_edac gf128mul hpilo gpio_ich glue_helper ioatdma edac_core acpi_power_meter ablk_helper cryptd ipmi_si dca lpc_ich tpm_infineon mac_hid lp parport tg3 ptp psmouse hpsa pps_core
[294339.735363] CPU: 5 PID: 19567 Comm: java Tainted: G D 3.13.0-24-generic #47-Ubuntu
[294339.791441] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 12/20/2013
[294339.821393] task: ffff880fdadb17f0 ti: ffff8807cd2d2000 task.ti: ffff8807cd2d2000
[294339.880945] RIP: 0010:[] [] handle_mm_fault+0xe61/0xf10
[294339.938758] RSP: 0000:ffff8807cd2d3d98 EFLAGS: 00010246
[294339.963947] RAX: 0000000000000100 RBX: 00000007908178e8 RCX: 000000000000002b
[294340.021388] RDX: ffff880fdadb17f0 RSI: 0000000000000000 RDI: 000000010ac009e6
[294340.078229] RBP: ffff8807cd2d3e20 R08: 0000000000000000 R09: 00000000000000a9
[294340.133115] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880b4507d420
[294340.192382] R13: ffff880b846c6900 R14: ffff880fe3a61500 R15: 0000000000000080
[294340.247471] FS: 00002b99a7302700(0000) GS:ffff880fffaa0000(0000) knlGS:0000000000000000
[294340.312602] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[294340.343338] CR2: 00007fd89036b000 CR3: 0000000b8c13d000 CR4: 00000000001407e0
[294340.401079] Stack:
[294340.433014] 0000000000000001 ffff8807cd2d3db0 ffffffff8109a780 ffff8807cd2d3dd0
[294340.495806] ffffffff810d7ad6 0000000000000001 ffffffff81f1ddd8 ffff8807cd2d3e78
[294340.560030] ffffffff810d983d ffff8807cd2d3e48 00000000000000a9 00000001ffffffff
[294340.616200] Call Trace:
[294340.644656] [] ? wake_up_state+0x10/0x20
[294340.673884] [] ? wake_futex+0x66/0x90
[294340.702571] [] ? futex_wake_op+0x4ed/0x620
[294340.728706] [] __do_page_fault+0x184/0x560
[294340.754071] [] ? acct_account_cputime+0x1c/0x20
[294340.780144] [] ? account_user_time+0x8b/0xa0
[294340.809561] [] ? vtime_account_user+0x54/0x60
[294340.836378] [] do_page_fault+0x1a/0x70
[294340.864213] [] page_fault+0x28/0x30
[294340.889126] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
[294340.971126] RIP [] handle_mm_fault+0xe61/0xf10
[294340.997886] RSP
[294341.024224] ---[ end trace e09489d0a574e659 ]---
The messages above are the dmesg output from when the kernel panic happened. At that point the server still appeared to be in a normal state from a user's point of view (unbelievable). Running the top command gave the following output:
357 root 39 19 0 0 0 D 0.0 0.0 0:17.45 khugepaged
2228 root 20 0 4372 696 528 D 0.0 0.0 0:00.01 pidof
19517 daiwei 20 0 4773688 861100 9740 D 0.0 0.7 262:13.12 java
42508 thor 20 0 17192 1400 812 D 0.0 0.0 0:00.00 w
We found that the kernel thread khugepaged was in uninterruptible sleep (state D in the top output), which is not normal.
After disabling transparent huge pages, everything works fine, and the problem has never happened again.
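To spot stuck processes like these without scrolling through top, something along the following lines may help. It is only a sketch: the helper name filter_d_state is made up for illustration, and it assumes a procps-style ps that accepts the -eo option.

```shell
#!/bin/sh
# Hypothetical helper: keep the header line plus every row whose
# STAT column (third field) starts with "D" (uninterruptible sleep).
filter_d_state() {
    awk 'NR == 1 || $3 ~ /^D/'
}

# Print PID, owner, state and command name for all processes,
# then keep only the ones stuck in state D.
ps -eo pid,user,stat,comm | filter_d_state
```

A process that shows up here only briefly is usually just waiting on I/O; the worrying case is one that stays in state D across repeated runs, as khugepaged did on our server.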
$ echo "never" | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
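To confirm that the change took effect, the same sysfs file can be read back; the kernel marks the active mode with square brackets, for example "always madvise [never]". The snippet below is a small sketch of that check. The function name thp_active_mode is just an illustrative helper, not part of any tool.

```shell
#!/bin/sh
# Same control file that the command above writes to.
THP_FILE=/sys/kernel/mm/transparent_hugepage/enabled

# Hypothetical helper: print the word inside the square brackets,
# e.g. "never" for the input line "always madvise [never]".
thp_active_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

if [ -r "$THP_FILE" ]; then
    echo "active THP mode: $(thp_active_mode < "$THP_FILE")"
else
    echo "THP control file not found; the kernel may be built without THP"
fi
```

Note that a value written to sysfs this way lasts only until the next reboot. To keep it, the write has to be repeated at boot time, or the kernel can be started with the transparent_hugepage=never boot parameter.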