2007-10-03

Sunsolve: A deeper look into vmstat statistics using DTrace and mdb

There is an excellent document released two days ago at Sunsolve:
A deeper look into vmstat statistics using DTrace and mdb
The document describes how to drill down into vmstat statistics using DTrace and mdb. There are so many fantastic DTrace and mdb commands that it is really worth reading (like any other DTrace documentation ;-) )
I especially like the mdb command to see swapped-out processes:

mdb -k << EOF
::walk thread myvar|::print kthread_t t_schedflag|::grep .==0|::eval p_user.u_comm
EOF
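
To give a flavour of this kind of drill-down (my own one-liner, not taken from the document): when vmstat shows a high 'sy' column, the first thing I usually check is which processes are issuing the system calls:

dtrace -n 'syscall:::entry { @[execname] = count(); }'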

Excellent document!

One should also note that there has been a similar document for a long time: "Using DTrace to understand mpstat and vmstat output"

PS. What a pity that this page is only for customers with a Sunsolve account. It should be available for everyone, especially for Linux advocates ;-)

2007-08-07

ZFS vs VxFS vs UFS on SCSI array

Back from holidays. Yes. :-)
I wish they were longer ... But back to reality.

While ZFS is renowned for its preference for individual disks rather than RAID arrays, I wanted to test how it works on such an array. The reason for the test is that RAID arrays are widespread in datacenters, and it is rather difficult to find server rooms where plain disks (without HW RAID) dominate. At the moment I could test only a 3310 (configured as RAID10) connected to a V880 running Solaris 10 11/06. There was one SCSI cable between them (so as not to favour VxDMP). The test was executed using Filebench with the OLTP workload:

set $runtime=60
set $iosize=8k
set $nshadows=200
set $ndbwriters=10
set $usermode=20000
set $filesize=5g
set $memperthread=1m
set $workingset=0
set $cached=0
set $logfilesize=10m
set $nfiles=10
set $nlogfiles=1
set $directio=1

The settings were used to simulate an Oracle workload with directio and 8k IO.
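
For reference, driving this workload boils down to loading the OLTP personality in Filebench's interactive shell, overriding the variables listed above and running it; roughly like this (a sketch - the target directory is an assumption and the exact variable names depend on the oltp.f personality shipped with your Filebench version):

filebench> load oltp
filebench> set $dir=/mnt/testfs
filebench> set $filesize=5g
filebench> set $iosize=8k
filebench> set $directio=1
filebench> run 60
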
Below are the results (number of IOs per second):


I wasn't surprised that ZFS is slower than the others, but I was astonished that the difference is so huge. On the other hand, I must admit that ZFS is still quite young and its performance is constantly improving.
There are also some links which might help in understanding ZFS behaviour with OLTP workloads:
http://blogs.sun.com/roch/entry/zfs_and_oltp
http://blogs.sun.com/realneel/entry/zfs_and_databases

Update:
I have changed the title of the chart to make it a bit more readable. :-)

2007-06-15

ZFS vs ( VxFS | EXT3 | NTFS )

There is an interesting entry at http://blogs.sun.com/Peerapong/entry/solaris_zfs_performance
about a performance comparison of ZFS versus VxFS, ext3 and NTFS. While in almost all workloads ZFS is better than or at least comparable to those three, in one of the most common environments - OLTP with non-cached, 8 kB IO (like Oracle on a filesystem mounted without buffering, aka directio) - VxFS is still almost three times faster than ZFS.

2007-06-06

DTrace & Postgresql - io tuning

We have one application based on a PostgreSQL database. Recently we encountered some performance problems when uploading larger amounts of data into the database. To find out what was going on, I wanted to compare the normal utilization of the server to its state while we are loading the additional data and see the differences. One of the things to watch was IO. To see what the server was doing at the moment, I ran a simple DTrace one-liner:

time dtrace -wn 'io:::start {@[args[2]->fi_pathname, args[0]->b_flags & B_READ ? "R" : "W"] = count ();}' -n 'END {trunc(@,3)}'

dtrace: description 'io:::start ' matched 6 probes
dtrace: description 'END' matched 1 probe
dtrace: allowing destructive actions
^C
CPU ID FUNCTION:NAME
3 2 :END

/export/home/postgres/nowa_baza/base/16399/33743 R 4816
/export/home/postgres/nowa_baza/pg_xlog/000000010000000200000068 W 5436
/export/home/postgres/nowa_baza/base/16399/40932 R 138166

real 12m36.483s
user 0m0.601s
sys 0m0.477s


At the moment the server was really idle, so such an amount of reads from a database file was astounding! I told our DBA about it and he answered "Wait a moment ...". After a minute I ran the same DTrace one-liner again and:

dtrace: description 'io:::start ' matched 6 probes
dtrace: description 'END' matched 1 probe
dtrace: allowing destructive actions
^C
CPU ID FUNCTION:NAME
3 2 :END

/export/home/postgres/nowa_baza/pg_xlog/000000010000000200000067 W 3005
/export/home/postgres/nowa_baza/base/16399/33743 R 3812
/export/home/postgres/nowa_baza/pg_xlog/000000010000000200000068 W 4226

The reads from nowa_baza/base/16399/40932 disappeared! I asked our DBA again what he had done and he answered "Just refreshed database statistics."
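
If I have to repeat this exercise, the same aggregation can live in a small script that also sums the bytes transferred per file, which makes before/after comparisons easier (a sketch built from the same io provider arguments):

#!/usr/sbin/dtrace -s

io:::start
{
        /* count IOs and sum transferred bytes per file and direction */
        @iocount[args[2]->fi_pathname, args[0]->b_flags & B_READ ? "R" : "W"] = count();
        @iobytes[args[2]->fi_pathname, args[0]->b_flags & B_READ ? "R" : "W"] = sum(args[0]->b_bcount);
}

END
{
        trunc(@iocount, 3);
        trunc(@iobytes, 3);
}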

You know: just daily use of DTrace :-)



2007-05-21

dtrace & java: part 3

My last struggles with the Java application led me to the following observations (again from the very beginning):

PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
27607 inkaso 502M 436M sleep 59 0 10:59:30 5.4% java/91
517 root 3916K 3080K sleep 59 0 0:02:09 0.1% nscd/26
18198 root 2848K 2732K cpu1 59 0 0:00:00 0.0% prstat/1
18202 root 964K 676K sleep 59 0 0:00:00 0.0% sleep/1
[...]

prstat -mL -p 27607

PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
27607 inkaso 3.7 0.0 0.0 0.0 0.0 4.4 92 0.1 163 4 101 2 java/5142
27607 inkaso 0.8 0.0 0.0 0.0 0.0 99 0.0 0.0 7 2 15 0 java/4
27607 inkaso 0.1 0.0 0.0 0.0 0.0 0.2 100 0.0 6 0 127 0 java/5676
27607 inkaso 0.1 0.0 0.0 0.0 0.0 0.0 100 0.0 8 1 72 0 java/5679
27607 inkaso 0.1 0.0 0.0 0.0 0.0 0.0 100 0.0 4 0 52 0 java/5662
[...]

Thread id = 5142 is:

bash-3.00# grep 5142 nohup.out
S, 5142, 192.168.210.13-Connection-5122, 2007-05-21--07:24:16
S, 5162, 192.168.209.216-Connection-5142, 2007-05-21--07:33:21

bash-3.00# netstat -a|grep 192.168.210.13
inkaso220.6400 192.168.210.135.50835 5840 0 49640 0 ESTABLISHED
inkaso220.6400 192.168.210.13.51108 2144 0 49312 0 CLOSE_WAIT

It means, again, that some thread (here 5142) is still doing some work instead of being dead already. The state of the network connection is CLOSE_WAIT, and according to a Sunsolve document, an "indefinite CLOSE_WAIT state normally indicates some application level bug.".
And more:

bash-3.00# while true
> do
> date
> netstat -a|grep 192.168.210.13
> sleep 30
> done
Mon May 21 09:12:18 CEST 2007
inkaso220.6400 192.168.210.139.50235 5840 0 49640 0 ESTABLISHED
inkaso220.6400 192.168.210.135.50835 5840 0 49640 0 ESTABLISHED
inkaso220.6400 192.168.210.13.51108 2144 0 49312 0 CLOSE_WAIT
inkaso220.6400 192.168.210.131.59018 5840 0 49640 0 ESTABLISHED
Mon May 21 09:12:49 CEST 2007
inkaso220.6400 192.168.210.139.50235 5840 0 49640 0 ESTABLISHED
inkaso220.6400 192.168.210.135.50835 5840 0 49640 0 ESTABLISHED
inkaso220.6400 192.168.210.13.51108 2144 0 49312 0 CLOSE_WAIT
inkaso220.6400 192.168.210.131.59018 5840 0 49640 0 ESTABLISHED
Mon May 21 09:13:19 CEST 2007
inkaso220.6400 192.168.210.139.50235 5840 0 49640 0 ESTABLISHED
inkaso220.6400 192.168.210.135.50835 5840 0 49640 0 ESTABLISHED
inkaso220.6400 192.168.210.13.51108 2144 0 49312 0 CLOSE_WAIT
inkaso220.6400 192.168.210.131.59018 5840 0 49640 0 ESTABLISHED
...
Mon May 21 10:21:07 CEST 2007
inkaso220.6400 192.168.210.135.50835 5840 0 49640 0 ESTABLISHED
inkaso220.6400 192.168.210.13.51108 2144 0 49312 0 CLOSE_WAIT
Mon May 21 10:21:37 CEST 2007
inkaso220.6400 192.168.210.135.50835 5840 0 49640 0 ESTABLISHED
inkaso220.6400 192.168.210.13.51108 2144 0 49312 0 CLOSE_WAIT
...

The changes which our Java programmer made last time have not helped at all. Meanwhile a new software developer has taken over the application (the previous one has been assigned to another project) and now I am going to pass all my observations on to him.

2007-05-15

How to change zone configuration?

I always forget how to change the configuration of an already configured zone. I hope from now on I won't forget it.

How to remove an fs already defined/assigned to a zone:

bash-3.00# zonecfg -z urticaS
zonecfg:urticaS> info
zonepath: /zones/urticaS
autoboot: true
pool:
inherit-pkg-dir:
dir: /lib
inherit-pkg-dir:
dir: /platform
inherit-pkg-dir:
dir: /sbin
inherit-pkg-dir:
dir: /usr
fs:
dir: /d8
special: /vxfs/sowa-urticaS-vol
raw not specified
type: lofs
options: []
net:
address: 192.168.220.113
physical: ce3
zonecfg:urticaS> remove fs dir=/d8
zonecfg:urticaS> info
zonepath: /zones/urticaS
autoboot: true
pool:
inherit-pkg-dir:
dir: /lib
inherit-pkg-dir:
dir: /platform
inherit-pkg-dir:
dir: /sbin
inherit-pkg-dir:
dir: /usr
net:
address: 192.168.220.113
physical: ce3
zonecfg:urticaS> verify
zonecfg:urticaS> commit
zonecfg:urticaS> exit

How to change an fs already defined/assigned to a zone:

bash-3.00# zonecfg -z obs1
zonecfg:obs1> select fs dir=/d8
zonecfg:obs1:fs> info
fs:
dir: /d8
special: /vxfs/drozd-obs1-vol
raw not specified
type: lofs
options: []
zonecfg:obs1:fs> set special=/vxfs/wilga3-obs1-vol
zonecfg:obs1:fs> info
fs:
dir: /d8
special: /vxfs/wilga3-obs1-vol
raw not specified
type: lofs
options: []
zonecfg:obs1:fs> end
zonecfg:obs1> verify
zonecfg:obs1> commit
zonecfg:obs1> exit
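
And for completeness (so I really don't forget), adding an fs to a zone is just the reverse of the removal above; a sketch using the same paths as in the first example:

bash-3.00# zonecfg -z urticaS
zonecfg:urticaS> add fs
zonecfg:urticaS:fs> set dir=/d8
zonecfg:urticaS:fs> set special=/vxfs/sowa-urticaS-vol
zonecfg:urticaS:fs> set type=lofs
zonecfg:urticaS:fs> end
zonecfg:urticaS> verify
zonecfg:urticaS> commit
zonecfg:urticaS> exit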


Just for myself ... :-)

dtrace & java: part 2

Do you remember my last problem with the Java application which behaves strangely? Changing the memory allocator from the standard malloc to mtmalloc has not changed a lot. There is still high CPU pressure.
I thought for a while about how to change my approach to catch what is wrong with the application. Because prstat can show which thread is using the CPU, I started with it again:

PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
28123 przemol 596M 522M sleep 59 0 15:42:15 17% java/172
15973 root 7600K 3876K sleep 59 0 0:00:00 0.1% sshd/1
15986 przemol 2840K 2612K cpu1 59 0 0:00:00 0.0% prstat/1
126 daemon 3916K 2044K sleep 59 0 0:00:00 0.0% kcfd/3
15978 przemol 1160K 932K sleep 49 0 0:00:00 0.0% ksh/1
15976 przemol 7372K 1928K sleep 59 0 0:00:00 0.0% sshd/1


And again, which Java threads are using the CPU the most:

PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
28123 przemol 12 0.0 0.0 0.0 0.0 4.8 83 0.1 152 18 83 0 java/5069
28123 przemol 2.2 0.0 0.0 0.0 0.0 98 0.0 0.0 7 4 26 0 java/4
28123 przemol 0.2 0.0 0.0 0.0 0.0 0.3 99 0.0 35 0 162 0 java/5364
28123 przemol 0.1 0.0 0.0 0.0 0.0 0.1 100 0.1 6 0 70 0 java/5359
28123 przemol 0.1 0.0 0.0 0.0 0.0 0.1 100 0.0 6 0 56 0 java/5351
28123 przemol 0.1 0.0 0.0 0.0 0.0 0.1 100 0.0 5 0 56 0 java/5343
28123 przemol 0.1 0.0 0.0 0.0 0.0 0.1 100 0.0 5 0 57 0 java/5355
28123 przemol 0.1 0.0 0.0 0.0 0.0 0.1 100 0.0 4 0 70 0 java/5354
28123 przemol 0.1 0.0 0.0 0.0 0.0 0.1 100 0.0 7 0 53 0 java/5331
28123 przemol 0.1 0.0 0.0 0.0 0.0 0.1 100 0.0 10 0 60 0 java/5347
28123 przemol 0.1 0.0 0.0 0.0 0.0 0.0 100 0.0 3 0 40 0 java/5325


As we can see, Java thread 5069 is generating the CPU pressure. But what is the thread doing? As I mentioned before, the application is a sort of interface between mobile devices and a database: whenever an end user commits an operation on a mobile device, it is sent to the Java application and the application commits it to the database. Every device is served by one dedicated Java thread. Our Java programmer designed the application so that each thread gets a meaningful name when it is created, e.g. 192.168.1.1-Connection-Number. So I have the thread name (the application writes all its activity into separate logs tied to the thread name) and the thread id (tid - from prstat); the only problem was how to connect them together. Because the Java DTrace probes don't allow me (do they?) to trace thread names during all operations (except the thread-start and thread-end probes), I had to watch all thread-start and thread-end probes during all the activity. To do so I wrote a simple DTrace script (to be more precise: two scripts) which allowed me to watch the whole life (start, end and name) of all threads. The first one, istart.d:

#!/usr/sbin/dtrace -ws

proc:::exec-success
/zonename == "zone1" && execname == "java"/
/* there is only one java process in zone1 */
{
printf ("Start: pid=%d\n", pid);
system ("/root/d/itrace2.d %d", pid);
}

And itrace2.d:

#!/usr/sbin/dtrace -wqs

dvm$1:::thread-start
{
printf ("S, %10d, %35s, ",tid, copyinstr(arg0));
system("/root/print_date");
}

dvm$1:::thread-end
{
printf ("E, %10d, ", tid);
system ("/root/print_date");
}

And a simple print_date one-line script:

/usr/bin/date +%Y-%m-%d--%H:%M:%S
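
(As a side note, the external date call could probably be avoided altogether: DTrace's printf() understands the %Y conversion for walltimestamp, so the thread-start probe could be written roughly like this, without any destructive system() actions - a sketch:)

dvm$1:::thread-start
{
        /* %Y prints a nanosecond timestamp as a human-readable date */
        printf("S, %10d, %35s, %Y\n", tid, copyinstr(arg0), walltimestamp);
}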

I ran it with 'nohup istart.d &'. And the result is:

bash-3.00# head -30 nohup.out
dtrace: script './istart.d.old' matched 1 probe
dtrace: allowing destructive actions
CPU ID FUNCTION:NAME
1 554 exec_common:exec-success Start: pid=28123
S, 13, Timer-0, 2007-05-10--06:43:21
S, 14, RMI TCP Accept-0, 2007-05-10--06:43:21
S, 15, RMI TCP Accept-8000, 2007-05-10--06:43:21
E, 1, 2007-05-10--06:43:21
S, 1, DestroyJavaVM, 2007-05-10--06:43:21
S, 17, Connection Controller, 2007-05-10--06:43:21
S, 19, Server, 2007-05-10--06:43:21
S, 18, Statystyka watkow, 2007-05-10--06:43:21
S, 1, main, 2007-05-10--06:43:21
S, 20, 192.168.210.109-Connection-0, 2007-05-10--06:43:42
S, 21, 192.168.209.74-Connection-1, 2007-05-10--06:43:47
S, 22, 192.168.210.116-Connection-2, 2007-05-10--06:43:53
S, 23, 192.168.209.125-Connection-3, 2007-05-10--06:44:06
S, 24, 192.168.210.11-Connection-4, 2007-05-10--06:44:30
S, 25, 192.168.209.12-Connection-5, 2007-05-10--06:45:08
S, 26, 192.168.210.72-Connection-6, 2007-05-10--06:45:36
S, 27, 192.168.209.159-Connection-7, 2007-05-10--06:45:44
S, 28, 192.168.210.93-Connection-8, 2007-05-10--06:45:55
S, 29, 192.168.209.238-Connection-9, 2007-05-10--06:46:03
E, 23, 2007-05-10--06:46:50
E, 20, 2007-05-10--06:47:02
S, 30, 192.168.209.116-Connection-10, 2007-05-10--06:47:10
S, 31, 192.168.210.5-Connection-11, 2007-05-10--06:47:34
E, 22, 2007-05-10--06:47:49
S, 32, 192.168.209.210-Connection-12, 2007-05-10--06:49:05
S, 33, 192.168.209.55-Connection-13, 2007-05-10--06:50:11
...

The first column says whether the thread is starting (S) or finishing its life (E). The second column gives the thread id and the third column the thread name. Thus I can see exactly when each thread started and ended. For example:

bash-3.00# grep " 20," nohup.out
S, 20, 192.168.210.109-Connection-0, 2007-05-10--06:43:42
E, 20, 2007-05-10--06:47:02

which means that thread no. 20 was alive for about 3-4 minutes. In my case thread id = 5069 is the most interesting:

bash-3.00# grep " 5069" nohup.out
S, 5069, 192.168.210.185-Connection-5049, 2007-05-14--21:45:20

It started yesterday and is still alive. But the most valuable information is the thread name! Now I can look at the logs related to this thread:

bash-3.00# tail log_192.168.210.185_May14
May 14, 2007 6:57:02 PM 192.168.210.185-Connection-4695 Zamykam polaczenia do bazy TC
May 14, 2007 6:57:02 PM 192.168.210.185-Connection-4695 Zamykam polaczenia do bazy WONLOK
May 14, 2007 6:57:15 PM May 14, 2007 6:57:15 PM Polaczenie z IP:192.168.210.185
May 14, 2007 6:57:15 PM 192.168.210.185-Connection-4999 IP = 192.168.210.185
May 14, 2007 6:57:15 PM May 14, 2007 6:57:15 PM 192.168.210.185 Metoda waitForCommand(Hashtable pCommands,,) ROZPOCZYNA DZIAŁANIE
May 14, 2007 6:57:15 PM pCommands.size() 21
May 14, 2007 9:45:19 PM May 14, 2007 9:45:19 PM Polaczenie z IP:192.168.210.185
May 14, 2007 9:45:19 PM 192.168.210.185-Connection-5049 IP = 192.168.210.185
May 14, 2007 9:45:19 PM May 14, 2007 9:45:19 PM 192.168.210.185 Metoda waitForCommand(Hashtable pCommands,,) ROZPOCZYNA DZIAŁANIE
May 14, 2007 9:45:19 PM pCommands.size() 21

According to the author of the application, the above logs indicate normal activity (the Polish messages just say that database connections are being closed, that a connection from the IP has arrived and that the waitForCommand method is starting). So the thread is just waiting for activity on the mobile device side. But:

bash-3.00# netstat -a|grep 192.168.210.185
bash-3.00#

gives nothing! There is no ESTABLISHED state (the application exchanges data with the devices using TCP/IP and sockets). Whenever I check any other thread, e.g. 5778, I get:

bash-3.00# grep " 5778" nohup.out
S, 5778, 192.168.209.162-Connection-5758, 2007-05-15--10:24:07

and

bash-3.00# netstat -a |grep 192.168.209.162
inkaso220.6400 192.168.209.162.50639 2144 28 49312 0 ESTABLISHED

So each "normal" thread has TCP/IP socket in an ESTABLISHED state. But the thread 5069 has nothing ! It seems that the mobile device has already disconnected but the application doesn't register that. I have shown that to the author and she mentioned that she doesn't check timeouts in TCP/IP sockets because there was such requirement during design (the mobile devices are connected using GPRS and it is cheaper (!) to hold already established connection for many hours then initiate a new one). She admitted that there is a line with setSoTimeout but it had been commented out. She agreed to uncomment it and see if it changes anything tomorrow after restart.

2007-04-03

Linux Fedora 6 vs Solaris 10 (apache+php)

Recently we were doing basic internal performance tests of the Apache (+PHP) webserver on Fedora and Solaris 10. Both systems (working in 64-bit mode) were installed on two identical, small servers (Intel CPU 3.2 GHz, 512 MB RAM). While this configuration is not exactly popular in big datacenters, we wanted to compare both OSes in performance terms. At the beginning there was no OS tuning and both servers had the same httpd.conf. On Solaris 10, CoolStack 1.1 was installed and configured. The result (reqs/sec):



A few details:
  • almost all Apache modules were removed from httpd.conf
  • PHP was used with APC
  • the command we used: "ab -c 50 -n 100000 http://192.168.1.1/test.php"
  • size of (file1+file2+file3+file4+file5) = 5 KB
  • test.php is simple:


<?php
require ('file1');
echo "Status OK\n";
$ala = 0;
$ola = "";
for ($i=0; $i<500; $i++) {
    $ala += $i;
    $ola .= "dsagfd5y32432";
}
require ('file2');
$ola = md5($ola.$ala);
require ('file3');
require ('file4');
require ('file5');
?>

While Fedora is a little bit faster than Solaris 10, there is one reason why I would choose the latter. Yes, you guessed it: DTrace. But let's start from the very beginning. Just a simple vmstat 1 to see the whole picture:

kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr cd -- -- -- in sy cs us sy id
...
60 0 30 469784 73808 1 454 0 0 0 0 0 0 0 0 0 10933 49181 1204 55 45 0
56 0 30 465332 72860 1 685 0 0 0 0 0 0 0 0 0 11040 48967 1143 55 45 0
55 0 30 461044 72232 0 0 0 0 0 0 0 0 0 0 0 10942 49198 1084 55 45 0
53 0 30 461044 72228 1 392 0 0 0 0 0 0 0 0 0 11137 49314 1347 55 45 0
57 0 30 451020 71284 1 340 0 0 0 0 0 0 0 0 0 11083 49109 1169 55 45 0
60 0 30 444872 70656 2 814 0 0 0 0 0 0 0 0 0 10893 48968 1141 55 45 0
57 0 30 444436 70128 0 0 0 0 0 0 0 0 0 0 0 11061 49538 1173 55 45 0
59 0 30 440296 69704 2 497 0 0 0 0 0 0 0 0 0 11023 48569 1559 55 45 0
60 0 30 436140 69152 1 628 0 0 0 0 0 0 0 0 0 11031 49290 1369 55 45 0
...

The CPU is saturated, but the time is divided between user space and system calls. I wondered why almost half of the time was spent in syscalls:

bash-3.00# dtrace -n 'syscall::: {@[execname,probefunc] = count();}'
...
httpd fchdir 20000
httpd writev 20002
httpd stat64 20002
httpd shutdown 20002
httpd getsockname 20002
httpd pollsys 20031
httpd accept 20032
httpd lwp_sigmask 20124
httpd sigaction 20186
httpd open 20248
httpd setitimer 40000
httpd fcntl 40194
httpd close 40444
httpd read 60516
httpd getcwd 100000
httpd resolvepath 100062
httpd gtime 120000
httpd semsys 480000

No surprise that httpd is doing the hardest job. But the syscalls! semsys?!?! Why? Let's see:

bash-3.00# dtrace -n 'syscall::semsys:entry /execname == "httpd"/ {@[ustack()] = count();}' -n 'END {trunc(@,5)}'
dtrace: description 'syscall::semsys:entry ' matched 1 probe
dtrace: description 'END ' matched 1 probe
^C
CPU ID FUNCTION:NAME
0 2 :END


libc.so.1`syscall+0x13
apc.so`apc_sem_unlock+0x35
apc.so`apc_cache_release+0x3e
apc.so`apc_deactivate+0x37
apc.so`apc_request_shutdown+0x16
apc.so`zm_deactivate_apc+0x29
libphp5.so`module_registry_cleanup+0x21
libphp5.so`zend_hash_apply+0x3d
libphp5.so`zend_deactivate_modules+0x6c
libphp5.so`php_request_shutdown+0x2fd
libphp5.so`php_handler+0x3a6
httpd`ap_invoke_handler+0xee
httpd`ap_process_request+0x18d
httpd`0x8084011
httpd`ap_process_connection+0x76
httpd`0x808a4d1
httpd`0x808a78a
httpd`ap_mpm_run+0xc69
httpd`main+0x4ea
httpd`_start+0x80
1626

libc.so.1`syscall+0x13
apc.so`apc_sem_lock+0x35
apc.so`apc_cache_release+0x32
apc.so`apc_deactivate+0x37
apc.so`apc_request_shutdown+0x16
apc.so`zm_deactivate_apc+0x29
libphp5.so`module_registry_cleanup+0x21
libphp5.so`zend_hash_apply+0x3d
libphp5.so`zend_deactivate_modules+0x6c
libphp5.so`php_request_shutdown+0x2fd
libphp5.so`php_handler+0x3a6
httpd`ap_invoke_handler+0xee
httpd`ap_process_request+0x18d
httpd`0x8084011
httpd`ap_process_connection+0x76
httpd`0x808a4d1
httpd`0x808a78a
httpd`ap_mpm_run+0xc69
httpd`main+0x4ea
httpd`_start+0x80
1650

libc.so.1`syscall+0x13
apc.so`apc_sem_unlock+0x35
apc.so`apc_cache_release+0x3e
apc.so`apc_deactivate+0x37
apc.so`apc_request_shutdown+0x16
apc.so`zm_deactivate_apc+0x29
libphp5.so`module_registry_cleanup+0x21
libphp5.so`zend_hash_apply+0x3d
libphp5.so`zend_deactivate_modules+0x6c
libphp5.so`php_request_shutdown+0x2fd
libphp5.so`php_handler+0x3a6
httpd`ap_invoke_handler+0xee
httpd`ap_process_request+0x18d
httpd`0x8084011
httpd`ap_process_connection+0x76
httpd`0x808a4d1
httpd`0x808a78a
httpd`ap_mpm_run+0xc69
httpd`main+0x4ea
httpd`_start+0x80
1650

libc.so.1`syscall+0x13
apc.so`apc_sem_lock+0x35
apc.so`apc_cache_release+0x32
apc.so`apc_deactivate+0x37
apc.so`apc_request_shutdown+0x16
apc.so`zm_deactivate_apc+0x29
libphp5.so`module_registry_cleanup+0x21
libphp5.so`zend_hash_apply+0x3d
libphp5.so`zend_deactivate_modules+0x6c
libphp5.so`php_request_shutdown+0x2fd
libphp5.so`php_handler+0x3a6
httpd`ap_invoke_handler+0xee
httpd`ap_process_request+0x18d
httpd`0x8084011
httpd`ap_process_connection+0x76
httpd`0x808a4d1
httpd`0x808a78a
httpd`ap_mpm_run+0xc69
httpd`main+0x4ea
httpd`_start+0x80
1650

libc.so.1`syscall+0x13
apc.so`apc_sem_unlock+0x35
apc.so`apc_cache_release+0x3e
apc.so`apc_deactivate+0x37
apc.so`apc_request_shutdown+0x16
apc.so`zm_deactivate_apc+0x29
libphp5.so`module_registry_cleanup+0x21
libphp5.so`zend_hash_apply+0x3d
libphp5.so`zend_deactivate_modules+0x6c
libphp5.so`php_request_shutdown+0x2fd
libphp5.so`php_handler+0x3a6
httpd`ap_invoke_handler+0xee
httpd`ap_process_request+0x18d
httpd`0x8084011
httpd`ap_process_connection+0x76
httpd`0x808a4d1
httpd`0x808a78a
httpd`ap_mpm_run+0xc69
httpd`main+0x4ea
httpd`_start+0x80
1650

Finally, I wanted to see which type of semsys call it was:

bash-3.00# dtrace -n 'syscall::semsys:entry /execname == "httpd"/ {@[arg0] = count();}'
dtrace: description 'syscall::semsys:entry ' matched 1 probe
^C

2 240000

According to the Solaris Dynamic Tracing Guide, "the system calls related to System V semaphores (semctl(2), semget(2), semids(2), semop(2), and semtimedop(2)) are implemented as suboperations of a single system call, semsys." Here "2" means SEMOP, which is basically a semop call.
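
A natural follow-up, which I did not capture here, would be to check how much time httpd actually spends inside semsys; a one-liner sketch along these lines:

dtrace -n 'syscall::semsys:entry /execname == "httpd"/ { self->ts = timestamp; }' \
       -n 'syscall::semsys:return /self->ts/ { @["ns in semsys"] = sum(timestamp - self->ts); self->ts = 0; }'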

During the tests I showed the above DTrace statistics to our PHP guru (he is a sort of "nothing-but-Fedora" guy). He was really, really surprised!
I wasn't :-)

2007-03-16

Real life example of dtrace usage

If you are looking for a simple example of how one DTrace script can help with performance problems, just look at this link: http://blogs.sun.com/portalperformance/entry/hire_dtrace_detective_to_expose
Just six lines of DTrace script.
The simplicity of the example is astounding.

2007-03-02

Running apache with limited privileges (as non-root)

There is an interesting document, "Limiting Service Privileges in the Solaris 10 Operating System", which describes how to start services with limited privileges: http://www.sun.com/blueprints/0505/819-2680.pdf
and http://www.sun.com/software/solaris/howtoguides/s10securityhowto.jsp#5
The examples use the Apache webserver and show how to start it as a non-root user.
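
The blueprint itself does it through SMF method credentials, but a quick ad-hoc equivalent is running the daemon under ppriv with a reduced privilege set; roughly like this (a sketch - treat the httpd path and the exact privilege list as assumptions for your setup):

ppriv -e -s A=basic,net_privaddr /usr/apache2/bin/httpd -k start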

2007-02-15

dtrace & java: part 1

We have an application written in Java which is a sort of proxy between mobile devices and our database. Every time an end user commits a transaction on the mobile device, it is sent to the application which, on behalf of the device, commits it in the database. It works very well and doesn't overload the server. In fact the CPU is idle almost all the time (sar output; the columns after the timestamp are %usr, %sys, %wio and %idle):

...
08:00:00 1 0 0 98
08:10:00 3 1 0 97
08:20:00 2 1 0 97
08:30:00 3 1 0 97
08:40:00 3 1 0 97
08:50:00 2 1 0 97
09:00:00 2 1 0 97
09:10:00 2 1 0 97
09:20:00 2 1 0 97
09:30:00 2 1 0 98
09:40:00 2 0 0 98
09:50:00 2 0 0 98
10:00:00 1 0 0 98
10:10:00 2 0 0 98
10:20:00 1 0 0 98
10:30:01 1 0 0 98
10:40:00 1 0 0 99
10:50:00 1 0 0 99
11:00:00 1 0 0 99
11:10:00 1 0 0 99
11:20:00 1 0 0 99
11:30:00 1 0 0 99
11:40:00 1 0 0 99
11:50:00 1 0 0 99
12:00:00 1 0 0 99
...
The workload looks like this for many days at a time.
But from time to time I noticed strange behaviour - the CPU becomes very busy (again: for many days):
...
06:10:00 54 1 0 45
06:20:00 54 2 0 45
06:30:00 53 1 0 46
06:40:00 52 1 0 47
06:50:00 52 1 0 47
07:00:00 52 1 0 47
07:10:00 52 1 0 47
07:20:00 52 1 0 47
07:30:00 52 1 0 47
07:40:00 52 1 0 46
07:50:00 52 1 0 46
08:00:00 53 1 0 46

...
As you can see, even at night! I didn't expect that, and the type of workload doesn't justify such behaviour (I have also installed the latest Java patch: 1.5.0_11).
I was curious and ran prstat to see which application might be doing that (I was almost sure that the Java app was guilty, but just to be sure):
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
4498 przemol 964M 804M cpu0 0 10 62:18:01 51% java/81
26850 root 7600K 3796K sleep 59 0 0:00:00 0.1% sshd/1
26861 root 1076K 900K sleep 59 0 0:00:00 0.0% sh/1
26866 root 2832K 2608K cpu1 59 0 0:00:00 0.0% prstat/1
136 daemon 3932K 1704K sleep 59 0 0:00:01 0.0% kcfd/3
26865 root 2452K 1680K sleep 49 0 0:00:00 0.0% bash/1
26853 przemol 7372K 2000K sleep 59 0 0:00:00 0.0% sshd/1
26855 przemol 1160K 932K sleep 59 0 0:00:00 0.0% ksh/1

...
Indeed, the application used most of the CPU time. This time prstat -mL -p 4498:
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
4498 przemol 48 0.0 0.0 0.0 0.0 21 31 0.6 77 165 77 1 java/5897
4498 przemol 31 0.0 0.0 0.0 0.0 20 49 0.4 103 88 96 4 java/6948
4498 przemol 19 0.0 0.0 0.0 0.0 80 0.5 0.0 40 58 178 0 java/4
4498 przemol 1.5 0.2 0.0 0.0 0.0 0.0 98 0.0 93 4 304 0 java/7629
4498 przemol 1.0 0.1 0.0 0.0 0.0 99 0.0 0.1 2K 3 2K 0 java/3
4498 przemol 0.9 0.1 0.0 0.0 0.2 99 0.0 0.0 511 1 521 0 java/2
4498 przemol 0.0 0.0 0.0 0.0 0.0 0.9 99 0.0 152 1 89 0 java/16
4498 przemol 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 3 0 51 0 java/7634
4498 przemol 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 24 0 35 0 java/7
4498 przemol 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 3 0 3 0 java/17
[...]

What caught my eye was the LCK column. The application spent a significant amount of time waiting for locks. I remembered that in Solaris 10 plockstat can reveal user-level lock statistics:
bash-3.00# plockstat -n 20 -A -e 600 -p `pgrep java`

[...]

Count nsec Lock Caller
-------------------------------------------------------------------------------
574 30515 libc.so.1`libc_malloc_lock libjvm.so`__1cCosEfree6Fpv_v_+0x18

[...]

Mutex hold

Count nsec Lock Caller
-------------------------------------------------------------------------------
1 16340 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
1 15707 libdvmti.so`.XAKUgFjQxPCE3Rm.data libdvmti.so`class_create_entry+0xf7
144 15211 0xd02b8900 libjvm.so`__1cCosTget_native_priority6F
690 14849 0xd02ac1c0 libjvm.so`__1cCosTget_native_priority6F
1 14822 0xd02b4000 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 14364 libc.so.1`libc_malloc_lock libjava_crw_demo.so`method_write_all+0x
543 14317 0xd02b8240 libjvm.so`__1cCosTget_native_priority6F
10107 14224 libc.so.1`libc_malloc_lock libnet.so`Java_java_net_SocketInputStre
1 13958 0xd02b42c0 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13940 0xd02b4180 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13809 0xd02b4280 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13804 libdvmti.so`.XAKUgFjPxPCEnMm.data+0x8 libdvmti.so`cbClassFileLoadHook+0x3e
1 13800 0xd02b3f80 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13697 0xd02b40c0 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13679 0xd02b4080 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13677 0xd02b4100 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13661 0xd02b4240 libjvm.so`__1cCosPpd_start_thread6FpnGT
6 13655 libc.so.1`_uberdata libc.so.1`_thrp_create+0xc8
1 13644 0xd02b4200 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 13601 libc.so.1`libc_malloc_lock libnsl.so.1`free_entry+0x1d

[...]

Mutex spin

Count nsec Lock Caller
-------------------------------------------------------------------------------
1 13180 libc.so.1`libc_malloc_lock libjava.so`JNU_GetStringPlatformChars+0
1 13053 libc.so.1`libc_malloc_lock libc.so.1`dgettext+0x5b
493 12796 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28

Mutex hold

Count nsec Lock Caller
-------------------------------------------------------------------------------
408 12779 libc.so.1`_uberdata+0x3c0 0xffffffff
10107 12549 libc.so.1`libc_malloc_lock libnet.so`Java_java_net_SocketInputStre
18 12461 libnsl.so.1`netpp_mutex libnsl.so.1`setnetconfig+0x3e
1 12429 0xd02b3fc0 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 12388 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
1 12255 libc.so.1`libc_malloc_lock libjava_crw_demo.so`java_crw_demo+0x437
1 12142 libc.so.1`_uberdata+0x440 0xffffffff

Mutex spin

Count nsec Lock Caller
-------------------------------------------------------------------------------
2 12115 libc.so.1`libc_malloc_lock libjava.so`JNU_ReleaseStringPlatformCha
1 12038 libc.so.1`libc_malloc_lock libnsl.so.1`netconfig_dup+0x1e

Mutex hold

Count nsec Lock Caller
-------------------------------------------------------------------------------
440 11918 libc.so.1`_door_lock libnsl.so.1`_door_gethostbyname_r+0x7e

Mutex spin

Count nsec Lock Caller
-------------------------------------------------------------------------------
1 11892 libc.so.1`libc_malloc_lock libmanagement.so`Java_com_sun_managemen

Mutex hold

Count nsec Lock Caller
-------------------------------------------------------------------------------
1 11887 0xd02b4300 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 11830 0xd02b3f40 libjvm.so`__1cCosPpd_start_thread6FpnGT
1 11789 0xd02b4000 libc.so.1`thr_create+0x42
1 11403 0xd02b41c0 libjvm.so`__1cCosPpd_start_thread6FpnGT
2 10895 libc.so.1`_uberdata+0x40 0xd0582481
1 10830 0xd02b3fc0 0
252 10440 libc.so.1`libc_malloc_lock libjava.so`JNU_GetStringPlatformChars+0
18 10396 libnsl.so.1`netpp_mutex libnsl.so.1`getnetconfigent+0x42

[...]

Mutex hold

Count nsec Lock Caller
-------------------------------------------------------------------------------
2 9967 libc.so.1`libc_malloc_lock libjvm.so`__1cCosEfree6Fpv_v_+0x18
238 9828 libc.so.1`_door_lock libnsl.so.1`_door_gethostbyaddr_r+0x73
220 9575 libnsl.so.1`netpp_mutex libnsl.so.1`getnetconfigent+0x42
1 9491 libc.so.1`_uberdata+0xac0 libjvm.so`__1cDhpiEread6FipvI_I_+0xa8

Mutex spin

Count nsec Lock Caller
-------------------------------------------------------------------------------
2 9386 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28

Mutex hold

Count nsec Lock Caller
-------------------------------------------------------------------------------
1 9351 libc.so.1`libc_malloc_lock libdvmti.so`class_create_entry+0x107
2 9240 libc.so.1`libc_malloc_lock libc.so.1`dgettext+0x5b
2 9167 libc.so.1`libc_malloc_lock libjava_crw_demo.so`method_init+0x89
148 8997 libc.so.1`libc_malloc_lock libmanagement.so`Java_com_sun_managemen
309 8928 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
1 8862 libc.so.1`libc_malloc_lock libnsl.so.1`free_entry+0x1d
1 8834 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
1 8765 0xd02b39c0 0
3536 8747 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
1 8727 libc.so.1`libc_malloc_lock libnsl.so.1`free_entry+0x1d
1 8496 0x8071a50 libdvmti.so`hash_lookup+0x63
1 8489 0xd02b3fc0 libjvm.so`__1cG_start6Fpv_0_+0x6a
1 8380 0xd02b3ec0 libjvm.so`__1cG_start6Fpv_0_+0x6a
1 8380 0xd02b3d80 0
148 8357 libc.so.1`libc_malloc_lock libmanagement.so`Java_com_sun_managemen
1 8334 0xd02b4140 libjvm.so`__1cG_start6Fpv_0_+0x6a
1 8321 0xd02b4200 libjvm.so`__1cG_start6Fpv_0_+0x6a
1 8308 0xd02b40c0 libc.so.1`thr_create+0x42
1 8305 0xd02b4240 libjvm.so`__1cG_start6Fpv_0_+0x6a
1 8280 0xd02b4080 libjvm.so`__1cG_start6Fpv_0_+0x6a

Mutex spin

Count nsec Lock Caller
-------------------------------------------------------------------------------
3 7443 libc.so.1`libc_malloc_lock libjava.so`JNU_ReleaseStringPlatformCha

Mutex hold

Count nsec Lock Caller
-------------------------------------------------------------------------------
2 7302 libc.so.1`_uberdata+0x40 0xd0582481
148 7291 libc.so.1`libc_malloc_lock libmanagement.so`Java_com_sun_managemen
2 7248 libc.so.1`_uberdata+0xe00 libc.so.1`setlocale+0x2b
1 7200 0xd02b2a80 0
2 7119 libc.so.1`libc_malloc_lock libjava_crw_demo.so`method_init+0x133
1 7101 0xd02b4040 0
91058 7057 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
1 6940 0x8071a50 libdvmti.so`hash_add+0x70
1 6935 libc.so.1`libc_malloc_lock libdvmti.so`class_set_method_count+0x6e
148 6922 libc.so.1`libc_malloc_lock libmanagement.so`Java_com_sun_managemen
19 6894 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
2 6888 libc.so.1`libc_malloc_lock libc.so.1`dgettext+0x5b
17 6846 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
8 6831 libc.so.1`_uberdata+0x3c0 0xc9c0f8b3
309 6818 libc.so.1`libc_malloc_lock libjvm.so`__1cCosGmalloc6FI_pv_+0x28
1 6809 0xd02b3e80 libc.so.1`_thrp_create+0xc8
1 6772 0xd02ac4c0 libjvm.so`__1cCosTget_native_priority6F
1 6742 0xd02b3680 0
1 6735 0xd02b3fc0 libc.so.1`thr_create+0x42
11 6734 libc.so.1`_uberdata+0x3c0 0

(BTW: I have installed the Java DTrace probes from https://solaris10-dtrace-vm-agents.dev.java.net/ - that explains the libdvmti entries in the 'Caller' column.)

And the winner is: libc_malloc_lock! As Adam Leventhal said, "In multi-threaded apps, lock contention can be huge performance killer" (http://blogs.sun.com/ahl/entry/baby_s_first_dtrace). Because the default malloc on Solaris performs well only in single-threaded applications, tomorrow I will try to run the Java program with another memory allocator, mtmalloc, which was designed for multi-threaded apps (you will find a comparison of memory allocators at http://developers.sun.com/solaris/articles/multiproc/multiproc.html):

LD_PRELOAD=/usr/lib/libmtmalloc.so; export LD_PRELOAD
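
While waiting for the effect, it is also worth confirming how often each thread actually calls malloc(); a quick pid-provider sketch against the running JVM (assuming a single java process, as here):

dtrace -n 'pid$target:libc:malloc:entry { @[tid] = count(); }' -p `pgrep java`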

Because the problem with locks shows up only after a few days of running, I will have to wait again.
But two questions remain:
  • what (which Java operations) is causing such big memory pressure (frequent allocations/deallocations)?
  • is it a bug in the JVM, or is the JVM really doing hard work driven by the Java app?
Because Java internals are beyond my knowledge, I have asked our Java guru for help.

Can you imagine doing the same analysis without DTrace?



2007-02-14

Just start

My name is Przemysław Bąk (nickname: przemol) and I am a UNIX sysadmin in Lodz, Poland. I finally decided to write a blog. Areas of interest:
  • Performance tuning
  • Solaris (especially Solaris 10)
  • DTrace
  • Unix in general
  • Storage
  • ...
The future is unknown ... ;-)