2008-02-17

Solaris IO tuning: monitoring disk queue - lack of knowledge

A few months ago we faced some performance problems on one of our servers. There is One Very Well Known Big Database (OVWKBD - complicated, isn't it ? ;-) ) running on it. One end user reported us that there are some hangs during office hours. We (me and one of ours DBA who is responsible for the OVWKBD) were surprised since it had never happened before (or, as I assume, nobody had told us before) and began investigation. After a few days our DBA pointed out that it may be related to redo logs writing (which were, as the rest of database, on SAN-attached disk array). In fact he was sure that it is the problem but when I asked for any proof he didn't deliver any. He insisted on changing the array configuration but since I don't like to proceed blindly, I was going to monitor io (disk) queue(s) before any significant change. I set /etc/system according to docs and the ssd_max_throttle variable is the key setting. But still wanted to be more aware what is going on at the disk queue level and setup ssd_max_throttle according to _real_ needs. When I started hunting high and low I realized that the disk queue (which is after all one of the key areas of IO tuning) is poorly documented so I began desperately seeking any knowledge. Neither Sunsolve nor docs.sun.com helped me. One page, http://wikis.sun.com/display/StorageDev/The+Solaris+OS+Queue+Throttle, gave me a tiny piece of information but still ...

Oh, God ! I don't believe that there is no docs about it !

I tried Opensolaris mailing lists but still no really useful information. In one e-mail, Robert Miłkowski, suggested to use scsi.d DTrace script. I had used it before but it hadn't helped me. Maybe I wasn't able to read it properly ? Well, at least I could try it again:

00000.868602160 fp2:-> 0x2a WRITE(10) address 2287:76, lba 0x00897a89, len 0x000002, control 0x00 timeout 60 CDBP 6002064db9c RDBMS(25751) cdb(10) 2a0000897a8900000200
00000.890551520 fp2:-> 0x28 READ(10) address 2287:76, lba 0x00a877f5, len 0x000001, control 0x00 timeout 60 CDBP 3000e37cde4 RDBMS(23974) cdb(10) 280000a877f500000100
00000.584069360 fp2:-> 0x2a WRITE(10) address 2287:66, lba 0x019c01f8, len 0x00013d, control 0x00 timeout 60 CDBP 60026a1e4d4 RDBMS(18244) cdb(10) 2a00019c01f800013d00
00000.239667600 fp2:-> 0x2a WRITE(10) address 2287:18, lba 0x00d93c20, len 0x000010, control 0x00 timeout 60 CDBP 300270fadec RDBMS(25753) cdb(10) 2a0000d93c2000001000
00000.958698480 fp2:-> 0x2a WRITE(10) address 2287:08, lba 0x00001f10, len 0x000010, control 0x00 timeout 60 CDBP 30025654084 sched(0) cdb(10) 2a0000001f1000001000
00000.240042160 fp2:<- 0x2a WRITE(10) address 2287:16, lba 0x01d4889c, len 0x000010, control 0x00 timeout 60 CDBP 60019209b84, reason 0x0 (COMPLETED) state 0x1f Time 820us
00000.240213360 fp2:<- 0x2a WRITE(10) address 2287:15, lba 0x0274b920, len 0x000010, control 0x00 timeout 60 CDBP 6002064c4f4, reason 0x0 (COMPLETED) state 0x1f Time 912us
00000.240311200 fp2:<- 0x2a WRITE(10) address 2287:18, lba 0x00d93c20, len 0x000010, control 0x00 timeout 60 CDBP 300270fadec, reason 0x0 (COMPLETED) state 0x1f Time 730us
00000.585352960 fp2:<- 0x2a WRITE(10) address 2287:67, lba 0x004a5ef8, len 0x00013d, control 0x00 timeout 60 CDBP 3003bb1add4, reason 0x0 (COMPLETED) state 0x1f Time 1390us
00000.586121680 fp2:<- 0x2a WRITE(10) address 2287:66, lba 0x019c01f8, len 0x00013d, control 0x00 timeout 60 CDBP 60026a1e4d4, reason 0x0 (COMPLETED) state 0x1f Time 2136us
00000.868869200 fp2:<- 0x2a WRITE(10) address 2287:17, lba 0x005ca80b, len 0x000002, control 0x00 timeout 60 CDBP 30053138df4, reason 0x0 (COMPLETED) state 0x1f Time 404us
00000.869025920 fp2:<- 0x2a WRITE(10) address 2287:76, lba 0x00897a89, len 0x000002, control 0x00 timeout 60 CDBP 6002064db9c, reason 0x0 (COMPLETED) state 0x1f Time 501us
00000.889036480 fp2:-> 0x28 READ(10) address 2287:76, lba 0x00a879d9, len 0x000001, control 0x00 timeout 60 CDBP 6002064db9c RDBMS(23974) cdb(10) 280000a879d900000100
00000.889377200 fp2:<- 0x28 READ(10) address 2287:76, lba 0x00a879d9, len 0x000001, control 0x00 timeout 60 CDBP 6002064db9c, reason 0x0 (COMPLETED) state 0x1f Time 409us
00000.890777520 fp2:<- 0x28 READ(10) address 2287:76, lba 0x00a877f5, len 0x000001, control 0x00 timeout 60 CDBP 3000e37cde4, reason 0x0 (COMPLETED) state 0x1f Time 267us
00000.959244800 fp2:<- 0x2a WRITE(10) address 2287:08, lba 0x00001f10, len 0x000010, control 0x00 timeout 60 CDBP 30025654084, reason 0x0 (COMPLETED) state 0x1f Time 642us
00000.239373680 fp2:-> 0x2a WRITE(10) address 2287:16, lba 0x01d4889c, len 0x000010, control 0x00 timeout 60 CDBP 60019209b84 RDBMS(25753) cdb(10) 2a0001d4889c00001000
00000.868509120 fp2:-> 0x2a WRITE(10) address 2287:17, lba 0x005ca80b, len 0x000002, control 0x00 timeout 60 CDBP 30053138df4 RDBMS(25751) cdb(10) 2a00005ca80b00000200
00000.239401200 fp2:-> 0x2a WRITE(10) address 2287:15, lba 0x0274b920, len 0x000010, control 0x00 timeout 60 CDBP 6002064c4f4 RDBMS(25753) cdb(10) 2a000274b92000001000
00000.584010640 fp2:-> 0x2a WRITE(10) address 2287:67, lba 0x004a5ef8, len 0x00013d, control 0x00 timeout 60 CDBP 3003bb1add4 RDBMS(18244) cdb(10) 2a00004a5ef800013d00

Still no joy :-(. I become pessimist about finding any answer to my questions.
Next day I couldn't stop thinking about it and become down in the dumps ...

Suddenly, wait a minute ! Have I checked who wrote the scsi.d script ?!?!? No ! Let's quickly find out ! Maybe this is the way I can find any answer ?!?!?! The begin of the script says:

...
/*
* Chris.Gerhard@sun.com
* Joel.Buckley@sun.com
*/

#pragma ident "@(#)scsi.d 1.12 07/03/16 SMI"
...


Hope got back. ;-)

I know that hope often blinks at a fool but you understand me ? Yes, you do ! Thanks ! ;-)

Let's see if these guys would help me. I sent them an e-mail without (well, almost ...) any belief that it would work ... and a few hours later Chris answered ! I still didn't believe it while reading his e-mail ! To make it not so easy, Chris offered a "deal": if I describe him my problem he will answer it via his blog. And that is how
http://blogs.sun.com/chrisg/entry/latency_bubble_in_your_io
was born ...
More such deals ! ;-)

2 comments:

Anonymous said...

Hi, I can’t understand how to add your site in my rss reader. Can you Help me, please :)

Przemysław Bąk (przemol) said...

Copy/paste the following link to your RSS reader:
http://przemol.blogspot.com/feeds/posts/default