...
08:00:00 1 0 0 98
08:10:00 3 1 0 97
08:20:00 2 1 0 97
08:30:00 3 1 0 97
08:40:00 3 1 0 97
08:50:00 2 1 0 97
09:00:00 2 1 0 97
09:10:00 2 1 0 97
09:20:00 2 1 0 97
09:30:00 2 1 0 98
09:40:00 2 0 0 98
09:50:00 2 0 0 98
10:00:00 1 0 0 98
10:10:00 2 0 0 98
10:20:00 1 0 0 98
10:30:01 1 0 0 98
10:40:00 1 0 0 99
10:50:00 1 0 0 99
11:00:00 1 0 0 99
11:10:00 1 0 0 99
11:20:00 1 0 0 99
11:30:00 1 0 0 99
11:40:00 1 0 0 99
11:50:00 1 0 0 99
12:00:00 1 0 0 99
...
Such workload is the same for many days.
But from time to time I noticed strange behaviour - cpu becomes very busy (again: for many days):
...
06:10:00 54 1 0 45
06:20:00 54 2 0 45
06:30:00 53 1 0 46
06:40:00 52 1 0 47
06:50:00 52 1 0 47
07:00:00 52 1 0 47
07:10:00 52 1 0 47
07:20:00 52 1 0 47
07:30:00 52 1 0 47
07:40:00 52 1 0 46
07:50:00 52 1 0 46
08:00:00 53 1 0 46
...
As you can see even in the night ! I didn't expected that. And type of workload doesn't justify such behaviour (I have also installed the latest java patch: 1.5.0_11)
I was courious and run prstat to see which application might do that (I was almost sure that the java app is guilty but to be sure):PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWPIndeed, the application used most of cpu time. This time prstat -mL -p 4498:
4498 przemol 964M 804M cpu0 0 10 62:18:01 51% java/81
26850 root 7600K 3796K sleep 59 0 0:00:00 0.1% sshd/1
26861 root 1076K 900K sleep 59 0 0:00:00 0.0% sh/1
26866 root 2832K 2608K cpu1 59 0 0:00:00 0.0% prstat/1
136 daemon 3932K 1704K sleep 59 0 0:00:01 0.0% kcfd/3
26865 root 2452K 1680K sleep 49 0 0:00:00 0.0% bash/1
26853 przemol 7372K 2000K sleep 59 0 0:00:00 0.0% sshd/1
26855 przemol 1160K 932K sleep 59 0 0:00:00 0.0% ksh/1
...
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPIDWhat caught my eye was the LCK column. The application spent significant amount of time waiting for locks. I remember that in Solaris 10 plockstat can reveal user-level lock statistics:
4498 przemol 48 0.0 0.0 0.0 0.0 21 31 0.6 77 165 77 1 java/5897
4498 przemol 31 0.0 0.0 0.0 0.0 20 49 0.4 103 88 96 4 java/6948
4498 przemol 19 0.0 0.0 0.0 0.0 80 0.5 0.0 40 58 178 0 java/4
4498 przemol 1.5 0.2 0.0 0.0 0.0 0.0 98 0.0 93 4 304 0 java/7629
4498 przemol 1.0 0.1 0.0 0.0 0.0 99 0.0 0.1 2K 3 2K 0 java/3
4498 przemol 0.9 0.1 0.0 0.0 0.2 99 0.0 0.0 511 1 521 0 java/2
4498 przemol 0.0 0.0 0.0 0.0 0.0 0.9 99 0.0 152 1 89 0 java/16
4498 przemol 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 3 0 51 0 java/7634
4498 przemol 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 24 0 35 0 java/7
4498 przemol 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 3 0 3 0 java/17
[...]
bash-3.00# plockstat -n 20 -A -e 600 -p `pgrep java`
[...]
Count nsec Lock Caller
-------------------------------------------------------------------------------
574 30515 libc.so.1`libc_malloc_lock libjvm.so`__1cCosEfree6Fpv_v_+0x18
[...]
Mutex hold
Count nsec Lock Caller
-------------------------------------------------------------------------------
1 16340 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
1 15707 libdvmti.so`.XAKUgFjQxPCE3Rm.data libdvmti.so`class_create_entry+0xf7
144 15211 0xd02b8900 libjvm.so`__1cCosTget_native_priority6F
690 14849 0xd02ac1c0 libjvm.so`__1cCosTget_native_priority6F
1 14822 0xd02b4000 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 14364 libc.so.1`libc_malloc_lock libjava_crw_demo.so`method_write_all+0x
543 14317 0xd02b8240 libjvm.so`__1cCosTget_native_priority6F
10107 14224 libc.so.1`libc_malloc_lock libnet.so`Java_java_net_SocketInputStre
1 13958 0xd02b42c0 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13940 0xd02b4180 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13809 0xd02b4280 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13804 libdvmti.so`.XAKUgFjPxPCEnMm.data+0x8 libdvmti.so`cbClassFileLoadHook+0x3e
1 13800 0xd02b3f80 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13697 0xd02b40c0 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13679 0xd02b4080 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13677 0xd02b4100 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13661 0xd02b4240 libjvm.so`__1cCosPpd_start_thread6FpnGT
6 13655 libc.so.1`_uberdata libc.so.1`_thrp_create+0xc8
1 13644 0xd02b4200 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13601 libc.so.1`libc_malloc_lock libnsl.so.1`free_entry+0x1d
[...]
Mutex spin
Count nsec Lock Caller
-------------------------------------------------------------------------------
1 13180 libc.so.1`libc_malloc_lock libjava.so`JNU_GetStringPlatformChars+0
1 13053 libc.so.1`libc_malloc_lock libc.so.1`dgettext+0x5b
493 12796 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
Mutex hold
Count nsec Lock Caller
-------------------------------------------------------------------------------
408 12779 libc.so.1`_uberdata+0x3c0 0xffffffff
10107 12549 libc.so.1`libc_malloc_lock libnet.so`Java_java_net_SocketInputStre
18 12461 libnsl.so.1`netpp_mutex libnsl.so.1`setnetconfig+0x3e
1 12429 0xd02b3fc0 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 12388 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
1 12255 libc.so.1`libc_malloc_lock libjava_crw_demo.so`java_crw_demo+0x437
1 12142 libc.so.1`_uberdata+0x440 0xffffffff
Mutex spin
Count nsec Lock Caller
-------------------------------------------------------------------------------
2 12115 libc.so.1`libc_malloc_lock libjava.so`JNU_ReleaseStringPlatformCha
1 12038 libc.so.1`libc_malloc_lock libnsl.so.1`netconfig_dup+0x1e
Mutex hold
Count nsec Lock Caller
-------------------------------------------------------------------------------
440 11918 libc.so.1`_door_lock libnsl.so.1`_door_gethostbyname_r+0x7e
Mutex spin
Count nsec Lock Caller
-------------------------------------------------------------------------------
1 11892 libc.so.1`libc_malloc_lock libmanagement.so`Java_com_sun_managemen
Mutex hold
Count nsec Lock Caller
-------------------------------------------------------------------------------
1 11887 0xd02b4300 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 11830 0xd02b3f40 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 11789 0xd02b4000 libc.so.1`thr_create+0x42
1 11403 0xd02b41c0 libjvm.so`__1cCosPpd_start_thread6FpnGT
2 10895 libc.so.1`_uberdata+0x40 0xd0582481
1 10830 0xd02b3fc0 0
252 10440 libc.so.1`libc_malloc_lock libjava.so`JNU_GetStringPlatformChars+0
18 10396 libnsl.so.1`netpp_mutex libnsl.so.1`getnetconfigent+0x42
[...]
Mutex hold
Count nsec Lock Caller
-------------------------------------------------------------------------------
2 9967 libc.so.1`libc_malloc_lock libjvm.so`__1cCosEfree6Fpv_v_+0x18
238 9828 libc.so.1`_door_lock libnsl.so.1`_door_gethostbyaddr_r+0x73
220 9575 libnsl.so.1`netpp_mutex libnsl.so.1`getnetconfigent+0x42
1 9491 libc.so.1`_uberdata+0xac0 libjvm.so`__1cDhpiEread6FipvI_I_+0xa8
Mutex spin
Count nsec Lock Caller
-------------------------------------------------------------------------------
2 9386 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
Mutex hold
Count nsec Lock Caller
-------------------------------------------------------------------------------
1 9351 libc.so.1`libc_malloc_lock libdvmti.so`class_create_entry+0x107
2 9240 libc.so.1`libc_malloc_lock libc.so.1`dgettext+0x5b
2 9167 libc.so.1`libc_malloc_lock libjava_crw_demo.so`method_init+0x89
148 8997 libc.so.1`libc_malloc_lock libmanagement.so`Java_com_sun_managemen
309 8928 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
1 8862 libc.so.1`libc_malloc_lock libnsl.so.1`free_entry+0x1d
1 8834 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
1 8765 0xd02b39c0 0
3536 8747 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
1 8727 libc.so.1`libc_malloc_lock libnsl.so.1`free_entry+0x1d
1 8496 0x8071a50 libdvmti.so`hash_lookup+0x63
1 8489 0xd02b3fc0 libjvm.so`__1cG_start6Fpv_0_+0x6a
1 8380 0xd02b3ec0 libjvm.so`__1cG_start6Fpv_0_+0x6a
1 8380 0xd02b3d80 0
148 8357 libc.so.1`libc_malloc_lock libmanagement.so`Java_com_sun_managemen
1 8334 0xd02b4140 libjvm.so`__1cG_start6Fpv_0_+0x6a
1 8321 0xd02b4200 libjvm.so`__1cG_start6Fpv_0_+0x6a
1 8308 0xd02b40c0 libc.so.1`thr_create+0x42
1 8305 0xd02b4240 libjvm.so`__1cG_start6Fpv_0_+0x6a
1 8280 0xd02b4080 libjvm.so`__1cG_start6Fpv_0_+0x6a
Mutex spin
Count nsec Lock Caller
-------------------------------------------------------------------------------
3 7443 libc.so.1`libc_malloc_lock libjava.so`JNU_ReleaseStringPlatformCha
Mutex hold
Count nsec Lock Caller
-------------------------------------------------------------------------------
2 7302 libc.so.1`_uberdata+0x40 0xd0582481
148 7291 libc.so.1`libc_malloc_lock libmanagement.so`Java_com_sun_managemen
2 7248 libc.so.1`_uberdata+0xe00 libc.so.1`setlocale+0x2b
1 7200 0xd02b2a80 0
2 7119 libc.so.1`libc_malloc_lock libjava_crw_demo.so`method_init+0x133
1 7101 0xd02b4040 0
91058 7057 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
1 6940 0x8071a50 libdvmti.so`hash_add+0x70
1 6935 libc.so.1`libc_malloc_lock libdvmti.so`class_set_method_count+0x6e
148 6922 libc.so.1`libc_malloc_lock libmanagement.so`Java_com_sun_managemen
19 6894 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
2 6888 libc.so.1`libc_malloc_lock libc.so.1`dgettext+0x5b
17 6846 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
8 6831 libc.so.1`_uberdata+0x3c0 0xc9c0f8b3
309 6818 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
1 6809 0xd02b3e80 libc.so.1`_thrp_create+0xc8
1 6772 0xd02ac4c0 libjvm.so`__1cCosTget_native_priority6F
1 6742 0xd02b3680 0
1 6735 0xd02b3fc0 libc.so.1`thr_create+0x42
11 6734 libc.so.1`_uberdata+0x3c0 0
(BTW: I have installed java dtrace probes from https://solaris10-dtrace-vm-agents.dev.java.net/ That explains libdvmti in 'Caller' columns)
Add the winner is: libc_malloc_lock ! As Adam Leventhal said: "In multi-threaded apps, lock contention can be huge performance killer" (http://blogs.sun.com/ahl/entry/baby_s_first_dtrace). Because malloc on Solaris performs well only in single-threaded applications tomorrow I will try to run the java program using another memory allocator, mtmalloc which was designed for multi-threaded apps (you will find comparison of memory allocators here http://developers.sun.com/solaris/articles/multiproc/multiproc.html):LD_PRELOAD=/usr/lib/libmtmalloc.so; export LD_PRELOAD
Because the problem with locks comes after a few days of running I will have to wait again.
But two questions remains:
Do you imagine doing the same analysis without dtrace ?
Because the problem with locks comes after a few days of running I will have to wait again.
But two questions remains:
- what is causing (which java operations) so big memory pressure (frequent allocations/deallocations) ?
- is it a bug in jvm or jvm is doing really hard work caused by the java app ?
Do you imagine doing the same analysis without dtrace ?