2008-09-30
Solaris support for Rock processor
While looking for some information I came across the following link :-)
2008-07-31
swat & vdbench - excellent couple
There has been an announcement about SWAT - another tool helpful in benchmarking tests. Together with Vdbench they make almost a perfect couple for playing with storage benchmarking.
2008-06-08
vdbench - disk I/O workload generator
There are several tools for testing filesystem performance, like Iozone, Bonnie and FileBench. There is also one that is not widely known, called vdbench. It is a disk I/O workload generator written by a Sun employee, Henk Vandenbergh, and used internally at Sun and by its customers.
How does it differ from the other tools? We'll see ... :-)
Vdbench, besides the classic CLI, also has a GUI which simplifies its usage. However, I am going to show how to use it via the CLI. After downloading it, you need to install it. The installer has an optional -t parameter which specifies the target directory:
kruk:/tmp/vdbench# ./install_unix.sh -t /opt/vdbench
Sun Microsystems, Inc. ("Sun") ENTITLEMENT for SOFTWARE
Licensee/Company: Entity receiving Software.
Efective Date: Date Sun delivers the Software to You.
Software: Sun StorageTek vdbench 4.07
[...]
Please contact Sun Microsystems, Inc. 4150 Network Circle, Santa
Clara, California 95054 if you have questions.
Accept User License Agreement (yes/no): yes
08:51:47.887 plog(): execute(): tar -tf /tmp/vdbench/vdbench407.tar
********************************************************************************
[...]
Tool will expire on: sobota, kwiecień 25 2009, 23:27:38
********************************************************************************
Tool installation to /opt/vdbench successful
Now it's time to write a parameter file describing the needed workload. There are some examples in the vdbench directory (example*), but let's begin with the following simple one:
sd=sd1,lun=/vdb/file1
sd=sd2,lun=/vdb/file2
sd=sd3,lun=/vdb/file3
sd=sd4,lun=/vdb/file4
wd=rg-1,sd=sd*,rdpct=70,rhpct=0,whpct=0,xfersize=8k,seekpct=70
rd=rd_rg-1,wd=rg-1,interval=1,iorate=max,elapsed=30,forthreads=(64)
How to interpret that file? For example:
sd - Storage Definition (where the I/O is going to/from - storage, disks, files)
wd - Workload Definition (the precise definition of the workload) - some explanations below (a variant parameter file illustrating these knobs follows the list):
- rdpct - read percentage; 70 means that 70% of the I/O operations are reads and the remaining 30% are writes
- xfersize - size of each I/O
- seekpct - percentage of random seeks
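To get a feel for these knobs, here is a hypothetical variation of the same parameter file (my own sketch, not one of the shipped examples): it turns the four files into a purely sequential, large-block, read-only workload - the kind of profile you would use for a DSS-style test rather than OLTP:
sd=sd1,lun=/vdb/file1
sd=sd2,lun=/vdb/file2
sd=sd3,lun=/vdb/file3
sd=sd4,lun=/vdb/file4
* 100% reads, 1 MB transfers, no random seeks (seekpct=0 means sequential)
wd=seq-1,sd=sd*,rdpct=100,rhpct=0,whpct=0,xfersize=1024k,seekpct=0
* report every second for 60 seconds, 8 threads per storage definition
rd=rd_seq-1,wd=seq-1,interval=1,iorate=max,elapsed=60,forthreads=(8)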
Of course we need to create the needed files:
kruk:/root# cd /opt/vdbench/
kruk:/opt/vdbench# mkdir /vdb
kruk:/opt/vdbench# mkfile 100m /vdb/file1
kruk:/opt/vdbench# mkfile 100m /vdb/file2
kruk:/opt/vdbench# mkfile 100m /vdb/file3
kruk:/opt/vdbench# mkfile 100m /vdb/file4
What I like about this tool is its ability to show IOPS for each second of the test, which gives an excellent view of the tested environment. Let's see an example:
kruk:/opt/vdbench# ./vdbench -f my-parm.cfg
[...]
interval i/o MB/sec bytes read resp resp resp cpu% cpu%
rate 1024**2 i/o pct time max stddev sys+usr sys
14:50:41.208 1 1770,85 13,83 8192 79,48 104,913 774,921 170,206 20,8 10,5
14:50:42.097 2 1600,90 12,51 8192 67,65 159,721 961,760 225,067 20,0 11,8
14:50:43.103 3 1302,75 10,18 8192 68,05 184,123 1439,223 262,792 13,7 8,0
14:50:44.085 4 1112,86 8,69 8192 68,20 219,451 1954,038 315,391 12,6 7,1
14:50:45.080 5 1210,84 9,46 8192 68,83 220,502 1511,942 322,902 12,0 7,8
14:50:46.078 6 1192,25 9,31 8192 70,59 213,559 1474,794 318,486 11,7 7,2
14:50:47.081 7 899,99 7,03 8192 70,48 253,603 1654,079 378,854 10,3 6,0
14:50:48.058 8 1251,91 9,78 8192 71,13 219,671 1831,191 340,373 11,8 7,5
14:50:49.049 9 1004,77 7,85 8192 68,77 251,295 1668,598 364,461 10,0 6,7
14:50:50.049 10 1124,07 8,78 8192 68,28 229,389 1804,713 329,042 10,5 6,5
14:50:51.047 11 1099,94 8,59 8192 68,47 236,588 1699,419 344,097 10,0 6,7
14:50:52.040 12 629,53 4,92 8192 69,45 265,482 1742,241 374,407 7,7 4,7
14:50:53.043 13 1042,71 8,15 8192 69,05 344,049 2308,431 532,435 8,8 6,0
14:50:54.042 14 1452,97 11,35 8192 68,11 174,344 2086,119 251,800 11,8 8,8
14:50:55.075 15 1175,48 9,18 8192 69,67 212,452 1504,912 312,161 9,8 6,8
14:50:56.046 16 1048,33 8,19 8192 68,92 227,462 1595,952 325,352 10,8 7,0
14:50:57.047 17 881,26 6,88 8192 68,19 264,582 1160,291 365,083 9,2 5,7
14:50:58.068 18 1023,81 8,00 8192 71,98 282,999 1757,541 435,796 12,3 6,5
14:50:59.056 19 1076,79 8,41 8192 71,47 218,663 1339,287 339,072 11,8 7,8
14:51:00.044 20 1177,03 9,20 8192 68,99 231,724 1469,797 339,442 11,2 7,2
14:51:01.037 21 1136,34 8,88 8192 72,33 221,309 1807,646 343,556 10,3 6,8
14:51:02.043 22 655,37 5,12 8192 69,91 279,581 1286,680 402,433 6,5 4,5
14:51:03.040 23 1022,76 7,99 8192 70,30 336,784 1886,443 511,220 9,8 6,3
14:51:04.047 24 1322,93 10,34 8192 70,74 198,405 1732,970 295,461 11,8 8,0
14:51:05.049 25 1028,04 8,03 8192 68,67 228,957 1436,083 321,115 9,7 6,7
14:51:06.039 26 1075,99 8,41 8192 69,78 241,907 1642,574 349,677 9,5 6,3
14:51:07.046 27 908,38 7,10 8192 70,23 248,514 1661,609 363,362 9,0 6,0
14:51:08.044 28 827,90 6,47 8192 70,35 331,738 1898,630 486,697 7,8 5,3
14:51:09.046 29 1208,92 9,44 8192 68,49 225,420 1581,279 321,723 10,9 7,2
14:51:10.042 30 1025,04 8,01 8192 70,23 239,669 1502,131 347,456 9,3 6,5
14:51:10.055 avg_2-30 1086,79 8,49 8192 69,51 234,502 2308,431 353,760 10,7 6,9
iostat during the run:
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
395.0 178.0 3160.0 1424.0 0.0 2.6 0.0 4.6 0 92 c1
395.0 178.0 3160.0 1424.0 0.0 2.6 0.0 4.6 1 92 c1t50060E8000444540d4
394.0 173.0 3152.1 1384.1 0.0 3.3 0.0 5.7 0 99 c3
394.0 173.0 3152.2 1384.1 0.0 3.3 0.0 5.7 1 99 c3t50060E8000444542d4
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
484.1 188.0 3872.8 1504.3 0.0 3.0 0.0 4.5 0 98 c1
484.1 188.0 3872.8 1504.3 0.0 3.0 0.0 4.5 1 98 c1t50060E8000444540d4
418.1 177.0 3344.7 1416.3 0.0 3.0 0.0 5.0 0 97 c3
418.1 177.0 3344.6 1416.3 0.0 3.0 0.0 5.0 1 97 c3t50060E8000444542d4
In OLTP environments the i/o rate column is one of the most important, while in the DSS world MB/sec probably matters more. "OK", you say, "you have a stable I/O rate. Nothing really interesting." Are you sure? Can you show me another non-commercial tool which shows the same thing, the I/O rate for each second? How about this example:
interval i/o MB/sec bytes read resp resp resp
rate 1024**2 i/o pct time max stddev
15:04:15.746 1 27558,72 215,30 8192 69,72 1,089 338,684 10,493
15:04:16.465 2 42955,58 335,59 8192 69,66 ,219 903,236 5,504
15:04:17.170 3 45265,34 353,64 8192 69,63 ,132 384,352 3,121
15:04:17.692 4 41596,08 324,97 8192 69,99 ,119 148,012 2,091
15:04:18.322 5 38974,35 304,49 8192 69,98 ,210 570,820 5,815
15:04:19.385 6 22774,35 177,92 8192 69,32 1,830 1040,743 22,121
15:04:20.565 7 28552,81 223,07 8192 69,48 1,651 1539,927 28,605
15:04:21.079 8 38379,22 299,84 8192 69,30 ,181 340,999 3,756
15:04:22.159 9 40825,13 318,95 8192 70,07 ,134 319,363 2,655
15:04:23.187 10 37296,62 291,38 8192 69,91 ,161 186,869 2,589
15:04:24.334 11 23630,10 184,61 8192 70,29 ,894 352,045 6,997
15:04:25.087 12 24689,14 192,88 8192 70,14 ,192 604,120 7,659
15:04:26.334 13 33136,76 258,88 8192 70,22 ,094 66,819 ,870
15:04:27.046 14 40521,35 316,57 8192 69,89 ,128 348,961 2,395
15:04:29.072 15 35763,25 279,40 8192 69,60 ,394 264,676 4,571
15:04:29.290 16 28473,31 222,45 8192 69,65 1,433 730,685 14,482
15:04:30.206 17 31505,71 246,14 8192 69,62 ,979 914,484 15,763
15:04:31.259 18 40196,17 314,03 8192 69,67 ,125 234,910 1,692
15:04:32.116 19 35458,59 277,02 8192 69,77 ,116 138,638 1,540
15:04:33.249 20 43064,86 336,44 8192 69,97 ,128 390,286 2,813
15:04:34.234 21 24776,40 193,57 8192 70,12 1,947 231,994 12,754
15:04:35.554 22 35134,31 274,49 8192 69,80 ,451 703,214 6,932
15:04:36.261 23 33361,52 260,64 8192 69,50 ,444 898,570 11,638
15:04:37.557 24 38474,73 300,58 8192 70,05 ,225 322,527 4,602
15:04:38.234 25 41275,28 322,46 8192 69,74 ,170 206,164 2,687
15:04:39.097 26 22927,25 179,12 8192 70,44 1,345 665,988 11,805
15:04:40.258 27 32228,28 251,78 8192 70,14 ,540 752,539 9,812
15:04:41.296 28 39111,23 305,56 8192 70,15 ,196 271,594 3,369
15:04:42.200 29 41695,78 325,75 8192 69,74 ,210 282,547 3,677
15:04:43.110 30 46167,28 360,68 8192 69,80 ,156 356,567 2,871
15:04:43.131 avg_2-30 35532,33 277,60 8192 69,84 ,420 1539,927 8,438
And iostat:
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 827.7 0.0 105947.0 0.0 34.9 0.0 42.2 0 100 c1
0.0 827.7 0.0 105944.1 0.0 34.9 0.0 42.2 1 100 c1t50060E8000444540d3
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 439.1 0.0 49615.6 0.0 12.1 0.0 27.5 0 45 c1
0.0 439.1 0.0 49615.6 0.0 12.1 0.0 27.5 1 45 c1t50060E8000444540d3
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 919.3 0.0 116434.0 0.0 26.9 0.0 29.3 0 85 c1
0.0 919.3 0.0 116434.2 0.0 26.9 0.0 29.3 1 85 c1t50060E8000444540d3
Quite interesting, isn't it? "What did you change to get such a high I/O rate?", you might ask. Hmmm, I am not sure if I want to reveal it ... ;-)
Seriously: the first test was on the Veritas File System, while the second one was on ZFS. Does it mean that ZFS is faster than VxFS? I don't think so. But using vdbench you can watch the two different natures of these filesystems. If you look closer at the ZFS run you will see that there are no reads from the filesystem at all (iostat shows r/s = 0), even though the vdbench parameter "rdpct=70" forces it to generate 70% reads and 30% writes.
It is because of ZFS's intensive use of kernel memory: the reads are served from the ZFS cache in RAM and never reach the disks.
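If you want to confirm where those reads go, one quick check - assuming a Solaris release that exposes the ZFS ARC counters via kstat, which Solaris 10 does - is to watch the ARC statistics while the test runs:
# the ZFS ARC (in-kernel cache) statistics: if 'hits' keeps growing while
# iostat shows no reads, the 70% reads are being served from RAM
kstat -m zfs -n arcstats | egrep 'hits|misses|size'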
Another example: have a look at the first vdbench output. It is interesting that the I/O rate goes up and down. I tracked it down earlier and found that the hardware array on which I had put the filesystems shows some bizarre behaviour with RAID-5.
Anyway, you can see that even without any sophisticated (and expensive) tool you are able to bring to light new, performance-related information about your storage and filesystems.
Vdbench was released to the public a few weeks ago, so you can download it over here.
2008-02-17
Solaris IO tuning: monitoring disk queue - lack of knowledge
A few months ago we faced some performance problems on one of our servers. There is One Very Well Known Big Database (OVWKBD - complicated, isn't it? ;-) ) running on it. An end user reported some hangs during office hours. We (me and one of our DBAs, the one responsible for the OVWKBD) were surprised, since it had never happened before (or, as I assume, nobody had told us before), and began an investigation. After a few days our DBA pointed out that it might be related to redo log writes (the redo logs were, like the rest of the database, on a SAN-attached disk array). In fact he was sure this was the problem, but when I asked for any proof he didn't deliver any. He insisted on changing the array configuration, but since I don't like to proceed blindly, I wanted to monitor the I/O (disk) queue(s) before any significant change. I had set /etc/system according to the docs, where the ssd_max_throttle variable is the key setting, but I still wanted to understand better what is going on at the disk queue level and to set ssd_max_throttle according to the _real_ needs. When I started hunting high and low I realized that the disk queue (which is, after all, one of the key areas of I/O tuning) is poorly documented, so I began desperately seeking any knowledge. Neither Sunsolve nor docs.sun.com helped me. One page, http://wikis.sun.com/display/StorageDev/The+Solaris+OS+Queue+Throttle, gave me a tiny piece of information, but still ...
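For reference, the throttle itself is only a one-line /etc/system setting (plus a reboot); the value below is purely illustrative, since the right number depends on the array and on how many LUNs share a port:
* /etc/system: cap the number of commands outstanding to each ssd LUN
set ssd:ssd_max_throttle=20
What I could not find documented was how to tell whether that queue actually fills up. The closest standard view is iostat, where the two queues show up as separate columns:
# 'actv' = commands already sent to the device, 'wait' = commands held back
# in the driver queue - which is where the throttle bites
iostat -xnz 1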
Oh, God! I can't believe that there are no docs about it!
I tried the OpenSolaris mailing lists, but still no really useful information. In one e-mail Robert Miłkowski suggested using the scsi.d DTrace script. I had used it before but it hadn't helped me. Maybe I wasn't able to read its output properly? Well, at least I could try it again:
00000.868602160 fp2:-> 0x2a WRITE(10) address 2287:76, lba 0x00897a89, len 0x000002, control 0x00 timeout 60 CDBP 6002064db9c RDBMS(25751) cdb(10) 2a0000897a8900000200
00000.890551520 fp2:-> 0x28 READ(10) address 2287:76, lba 0x00a877f5, len 0x000001, control 0x00 timeout 60 CDBP 3000e37cde4 RDBMS(23974) cdb(10) 280000a877f500000100
00000.584069360 fp2:-> 0x2a WRITE(10) address 2287:66, lba 0x019c01f8, len 0x00013d, control 0x00 timeout 60 CDBP 60026a1e4d4 RDBMS(18244) cdb(10) 2a00019c01f800013d00
00000.239667600 fp2:-> 0x2a WRITE(10) address 2287:18, lba 0x00d93c20, len 0x000010, control 0x00 timeout 60 CDBP 300270fadec RDBMS(25753) cdb(10) 2a0000d93c2000001000
00000.958698480 fp2:-> 0x2a WRITE(10) address 2287:08, lba 0x00001f10, len 0x000010, control 0x00 timeout 60 CDBP 30025654084 sched(0) cdb(10) 2a0000001f1000001000
00000.240042160 fp2:<- 0x2a WRITE(10) address 2287:16, lba 0x01d4889c, len 0x000010, control 0x00 timeout 60 CDBP 60019209b84, reason 0x0 (COMPLETED) state 0x1f Time 820us
00000.240213360 fp2:<- 0x2a WRITE(10) address 2287:15, lba 0x0274b920, len 0x000010, control 0x00 timeout 60 CDBP 6002064c4f4, reason 0x0 (COMPLETED) state 0x1f Time 912us
00000.240311200 fp2:<- 0x2a WRITE(10) address 2287:18, lba 0x00d93c20, len 0x000010, control 0x00 timeout 60 CDBP 300270fadec, reason 0x0 (COMPLETED) state 0x1f Time 730us
00000.585352960 fp2:<- 0x2a WRITE(10) address 2287:67, lba 0x004a5ef8, len 0x00013d, control 0x00 timeout 60 CDBP 3003bb1add4, reason 0x0 (COMPLETED) state 0x1f Time 1390us
00000.586121680 fp2:<- 0x2a WRITE(10) address 2287:66, lba 0x019c01f8, len 0x00013d, control 0x00 timeout 60 CDBP 60026a1e4d4, reason 0x0 (COMPLETED) state 0x1f Time 2136us
00000.868869200 fp2:<- 0x2a WRITE(10) address 2287:17, lba 0x005ca80b, len 0x000002, control 0x00 timeout 60 CDBP 30053138df4, reason 0x0 (COMPLETED) state 0x1f Time 404us
00000.869025920 fp2:<- 0x2a WRITE(10) address 2287:76, lba 0x00897a89, len 0x000002, control 0x00 timeout 60 CDBP 6002064db9c, reason 0x0 (COMPLETED) state 0x1f Time 501us
00000.889036480 fp2:-> 0x28 READ(10) address 2287:76, lba 0x00a879d9, len 0x000001, control 0x00 timeout 60 CDBP 6002064db9c RDBMS(23974) cdb(10) 280000a879d900000100
00000.889377200 fp2:<- 0x28 READ(10) address 2287:76, lba 0x00a879d9, len 0x000001, control 0x00 timeout 60 CDBP 6002064db9c, reason 0x0 (COMPLETED) state 0x1f Time 409us
00000.890777520 fp2:<- 0x28 READ(10) address 2287:76, lba 0x00a877f5, len 0x000001, control 0x00 timeout 60 CDBP 3000e37cde4, reason 0x0 (COMPLETED) state 0x1f Time 267us
00000.959244800 fp2:<- 0x2a WRITE(10) address 2287:08, lba 0x00001f10, len 0x000010, control 0x00 timeout 60 CDBP 30025654084, reason 0x0 (COMPLETED) state 0x1f Time 642us
00000.239373680 fp2:-> 0x2a WRITE(10) address 2287:16, lba 0x01d4889c, len 0x000010, control 0x00 timeout 60 CDBP 60019209b84 RDBMS(25753) cdb(10) 2a0001d4889c00001000
00000.868509120 fp2:-> 0x2a WRITE(10) address 2287:17, lba 0x005ca80b, len 0x000002, control 0x00 timeout 60 CDBP 30053138df4 RDBMS(25751) cdb(10) 2a00005ca80b00000200
00000.239401200 fp2:-> 0x2a WRITE(10) address 2287:15, lba 0x0274b920, len 0x000010, control 0x00 timeout 60 CDBP 6002064c4f4 RDBMS(25753) cdb(10) 2a000274b92000001000
00000.584010640 fp2:-> 0x2a WRITE(10) address 2287:67, lba 0x004a5ef8, len 0x00013d, control 0x00 timeout 60 CDBP 3003bb1add4 RDBMS(18244) cdb(10) 2a00004a5ef800013d00
Still no joy :-(. I became pessimistic about finding any answer to my questions.
The next day I couldn't stop thinking about it and became down in the dumps ...
Suddenly: wait a minute! Had I checked who wrote the scsi.d script?! No! Let's quickly find out! Maybe this is the way to find an answer! The beginning of the script says:
...
/*
* Chris.Gerhard@sun.com
* Joel.Buckley@sun.com
*/
#pragma ident "@(#)scsi.d 1.12 07/03/16 SMI"
...
Hope came back. ;-)
I know that hope often blinks at a fool, but you understand me, don't you? Yes, you do! Thanks! ;-)
Let's see if these guys could help me. I sent them an e-mail without (well, almost ...) any belief that it would work ... and a few hours later Chris answered! I still couldn't believe it while reading his e-mail! To make it not too easy, Chris offered a "deal": if I described my problem to him, he would answer it via his blog. And that is how
http://blogs.sun.com/chrisg/entry/latency_bubble_in_your_io
was born ...
More such deals ! ;-)
2008-02-16
ZFS vs VxFS vs UFS on x4500 (Thumper - JBOD)
A few months ago I compared the performance of the above filesystems using filebench. Since then a few things have changed:
- Solaris 10 8/07 is available now (compared to Solaris 10 11/06 used during the previous test). Thanks to the fabulous pca tool all (really all!) patches were installed.
- a new filebench 1.1 has been released
- Veritas Storage Foundation Basic v5.0 for x64 is available (the last available version used to be 4.1)
I decided to test RAID 1+0 under an OLTP (8k, uncached) workload. Since the x4500 has 48 SATA disks, I divided them into 3 sets, one for each filesystem: VxFS/VxVM, ZFS and UFS. The Hard Drive Monitor Utility (HD Tool) can draw an ASCII map of the internal drive layout:
---------------------SunFireX4500------Rear----------------------------
36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47:
c5t3 c5t7 c4t3 c4t7 c7t3 c7t7 c6t3 c6t7 c1t3 c1t7 c0t3 c0t7 <-VxFS
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35:
c5t2 c5t6 c4t2 c4t6 c7t2 c7t6 c6t2 c6t6 c1t2 c1t6 c0t2 c0t6 <- UFS
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23:
c5t1 c5t5 c4t1 c4t5 c7t1 c7t5 c6t1 c6t5 c1t1 c1t5 c0t1 c0t5 <- ZFS
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11:
c5t0 c5t4 c4t0 c4t4 c7t0 c7t4 c6t0 c6t4 c1t0 c1t4 c0t0 c0t4
^b+ ^b+ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
-------*-----------*-SunFireX4500--*---Front-----*-----------*----------
Each filesystem was mounted in its own directory. The following filebench configuration was used in the tests:
DEFAULTS {
runtime = 60;
dir = "directory where each filesystem was mounted";
stats = /tmp;
filesystem = "zfs|ufs|vxfs";
description = "oltp zfs|ufs|vxfs";
}
CONFIG oltp_8k_uncached {
personality = oltp;
function = generic;
cached = 0;
directio = 1;
iosize = 8k;
nshadows = 200;
ndbwriters = 10;
usermode = 20000;
filesize = 5g;
nfiles = 10;
memperthread = 1m;
workingset = 0;
}
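For the record, this DEFAULTS/CONFIG file is the profile (.prof) format used by the filebench wrapper script. If I remember the 1.1 profile driver correctly - treat the exact path and invocation as my assumption, not something taken from the original post - the profile is saved as oltp_8k_uncached.prof and run by name:
# assumes filebench 1.1 installed under /opt/filebench and the profile saved
# as oltp_8k_uncached.prof in the current directory
/opt/filebench/bin/filebench oltp_8k_uncached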
Below are the results:
A few observations:
- compared to the previous benchmark we can see big improvements in the ZFS area (but of course the change of environment, a JBOD instead of a SCSI array, can influence the results)
- VxFS is still the winner, but its typical RAID 1+0 configuration is not faster than ZFS; only the 6-column configuration beats ZFS
All the filesystem configurations are below:
VxVM/VxFS 2-cols
Disk group: vxgroup
DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE
dg vxgroup default default 105000 1201598836.80.mickey
dm c0t3d0 c0t3d0 auto 65532 976691152 -
dm c0t7d0 c0t7d0 auto 65532 976691152 -
dm c1t3d0 c1t3d0 auto 65532 976691152 -
dm c1t7d0 c1t7d0 auto 65532 976691152 -
dm c4t3d0 c4t3d0 auto 65532 976691152 -
dm c4t7d0 c4t7d0 auto 65532 976691152 -
dm c5t3d0 c5t3d0 auto 65532 976691152 -
dm c5t7d0 c5t7d0 auto 65532 976691152 -
dm c6t3d0 c6t3d0 auto 65532 976691152 -
dm c6t7d0 c6t7d0 auto 65532 976691152 -
dm c7t3d0 c7t3d0 auto 65532 976691152 -
dm c7t7d0 c7t7d0 auto 65532 976691152 -
v vx-vol - ENABLED ACTIVE 5860145152 SELECT vx-vol-03 fsgen
pl vx-vol-03 vx-vol ENABLED ACTIVE 5860145152 STRIPE 2/32 RW
sv vx-vol-S01 vx-vol-03 vx-vol-L01 1 976691152 0/0 2/2 ENA
sv vx-vol-S02 vx-vol-03 vx-vol-L02 1 976691152 0/976691152 2/2 ENA
sv vx-vol-S03 vx-vol-03 vx-vol-L03 1 976690272 0/1953382304 2/2 ENA
sv vx-vol-S04 vx-vol-03 vx-vol-L04 1 976691152 1/0 2/2 ENA
sv vx-vol-S05 vx-vol-03 vx-vol-L05 1 976691152 1/976691152 2/2 ENA
sv vx-vol-S06 vx-vol-03 vx-vol-L06 1 976690272 1/1953382304 2/2 ENA
v vx-vol-L01 - ENABLED ACTIVE 976691152 SELECT - fsgen
pl vx-vol-P01 vx-vol-L01 ENABLED ACTIVE 976691152 CONCAT - RW
sd c0t3d0-02 vx-vol-P01 c0t3d0 0 976691152 0 c0t3d0 ENA
pl vx-vol-P02 vx-vol-L01 ENABLED ACTIVE 976691152 CONCAT - RW
sd c1t3d0-02 vx-vol-P02 c1t3d0 0 976691152 0 c1t3d0 ENA
v vx-vol-L02 - ENABLED ACTIVE 976691152 SELECT - fsgen
pl vx-vol-P03 vx-vol-L02 ENABLED ACTIVE 976691152 CONCAT - RW
sd c4t3d0-02 vx-vol-P03 c4t3d0 0 976691152 0 c4t3d0 ENA
pl vx-vol-P04 vx-vol-L02 ENABLED ACTIVE 976691152 CONCAT - RW
sd c5t3d0-02 vx-vol-P04 c5t3d0 0 976691152 0 c5t3d0 ENA
v vx-vol-L03 - ENABLED ACTIVE 976690272 SELECT - fsgen
pl vx-vol-P05 vx-vol-L03 ENABLED ACTIVE 976690272 CONCAT - RW
sd c6t3d0-02 vx-vol-P05 c6t3d0 0 976690272 0 c6t3d0 ENA
pl vx-vol-P06 vx-vol-L03 ENABLED ACTIVE 976690272 CONCAT - RW
sd c7t3d0-02 vx-vol-P06 c7t3d0 0 976690272 0 c7t3d0 ENA
v vx-vol-L04 - ENABLED ACTIVE 976691152 SELECT - fsgen
pl vx-vol-P07 vx-vol-L04 ENABLED ACTIVE 976691152 CONCAT - RW
sd c0t7d0-02 vx-vol-P07 c0t7d0 0 976691152 0 c0t7d0 ENA
pl vx-vol-P08 vx-vol-L04 ENABLED ACTIVE 976691152 CONCAT - RW
sd c1t7d0-02 vx-vol-P08 c1t7d0 0 976691152 0 c1t7d0 ENA
v vx-vol-L05 - ENABLED ACTIVE 976691152 SELECT - fsgen
pl vx-vol-P09 vx-vol-L05 ENABLED ACTIVE 976691152 CONCAT - RW
sd c4t7d0-02 vx-vol-P09 c4t7d0 0 976691152 0 c4t7d0 ENA
pl vx-vol-P10 vx-vol-L05 ENABLED ACTIVE 976691152 CONCAT - RW
sd c5t7d0-02 vx-vol-P10 c5t7d0 0 976691152 0 c5t7d0 ENA
v vx-vol-L06 - ENABLED ACTIVE 976690272 SELECT - fsgen
pl vx-vol-P11 vx-vol-L06 ENABLED ACTIVE 976690272 CONCAT - RW
sd c6t7d0-02 vx-vol-P11 c6t7d0 0 976690272 0 c6t7d0 ENA
pl vx-vol-P12 vx-vol-L06 ENABLED ACTIVE 976690272 CONCAT - RW
sd c7t7d0-02 vx-vol-P12 c7t7d0 0 976690272 0 c7t7d0 ENA
VxVM/VxFS 6-cols
Disk group: vxgroup
DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE
dg vxgroup default default 105000 1201598836.80.mickey
dm c0t3d0 c0t3d0 auto 65532 976691152 -
dm c0t7d0 c0t7d0 auto 65532 976691152 -
dm c1t3d0 c1t3d0 auto 65532 976691152 -
dm c1t7d0 c1t7d0 auto 65532 976691152 -
dm c4t3d0 c4t3d0 auto 65532 976691152 -
dm c4t7d0 c4t7d0 auto 65532 976691152 -
dm c5t3d0 c5t3d0 auto 65532 976691152 -
dm c5t7d0 c5t7d0 auto 65532 976691152 -
dm c6t3d0 c6t3d0 auto 65532 976691152 -
dm c6t7d0 c6t7d0 auto 65532 976691152 -
dm c7t3d0 c7t3d0 auto 65532 976691152 -
dm c7t7d0 c7t7d0 auto 65532 976691152 -
v vx-vol - ENABLED ACTIVE 5860145152 SELECT vx-vol-03 fsgen
pl vx-vol-03 vx-vol ENABLED ACTIVE 5860145280 STRIPE 6/32 RW
sv vx-vol-S01 vx-vol-03 vx-vol-L01 1 976690880 0/0 2/2 ENA
sv vx-vol-S02 vx-vol-03 vx-vol-L02 1 976690880 1/0 2/2 ENA
sv vx-vol-S03 vx-vol-03 vx-vol-L03 1 976690880 2/0 2/2 ENA
sv vx-vol-S04 vx-vol-03 vx-vol-L04 1 976690880 3/0 2/2 ENA
sv vx-vol-S05 vx-vol-03 vx-vol-L05 1 976690880 4/0 2/2 ENA
sv vx-vol-S06 vx-vol-03 vx-vol-L06 1 976690880 5/0 2/2 ENA
v vx-vol-L01 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P01 vx-vol-L01 ENABLED ACTIVE 976690880 CONCAT - RW
sd c0t3d0-02 vx-vol-P01 c0t3d0 0 976690880 0 c0t3d0 ENA
pl vx-vol-P02 vx-vol-L01 ENABLED ACTIVE 976690880 CONCAT - RW
sd c5t3d0-02 vx-vol-P02 c5t3d0 0 976690880 0 c5t3d0 ENA
v vx-vol-L02 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P03 vx-vol-L02 ENABLED ACTIVE 976690880 CONCAT - RW
sd c0t7d0-02 vx-vol-P03 c0t7d0 0 976690880 0 c0t7d0 ENA
pl vx-vol-P04 vx-vol-L02 ENABLED ACTIVE 976690880 CONCAT - RW
sd c5t7d0-02 vx-vol-P04 c5t7d0 0 976690880 0 c5t7d0 ENA
v vx-vol-L03 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P05 vx-vol-L03 ENABLED ACTIVE 976690880 CONCAT - RW
sd c1t3d0-02 vx-vol-P05 c1t3d0 0 976690880 0 c1t3d0 ENA
pl vx-vol-P06 vx-vol-L03 ENABLED ACTIVE 976690880 CONCAT - RW
sd c6t3d0-02 vx-vol-P06 c6t3d0 0 976690880 0 c6t3d0 ENA
v vx-vol-L04 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P07 vx-vol-L04 ENABLED ACTIVE 976690880 CONCAT - RW
sd c1t7d0-02 vx-vol-P07 c1t7d0 0 976690880 0 c1t7d0 ENA
pl vx-vol-P08 vx-vol-L04 ENABLED ACTIVE 976690880 CONCAT - RW
sd c6t7d0-02 vx-vol-P08 c6t7d0 0 976690880 0 c6t7d0 ENA
v vx-vol-L05 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P09 vx-vol-L05 ENABLED ACTIVE 976690880 CONCAT - RW
sd c4t3d0-02 vx-vol-P09 c4t3d0 0 976690880 0 c4t3d0 ENA
pl vx-vol-P10 vx-vol-L05 ENABLED ACTIVE 976690880 CONCAT - RW
sd c7t3d0-02 vx-vol-P10 c7t3d0 0 976690880 0 c7t3d0 ENA
v vx-vol-L06 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P11 vx-vol-L06 ENABLED ACTIVE 976690880 CONCAT - RW
sd c4t7d0-02 vx-vol-P11 c4t7d0 0 976690880 0 c4t7d0 ENA
pl vx-vol-P12 vx-vol-L06 ENABLED ACTIVE 976690880 CONCAT - RW
sd c7t7d0-02 vx-vol-P12 c7t7d0 0 976690880 0 c7t7d0 ENA
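The vxprint listing above can be folded back into a single vxassist command. A sketch, reconstructed from the output rather than copied from history - the 2-column variant differs only in ncol:
# 6-column striped-mirror (layered) volume, 32-block stripe unit,
# size taken from the vxprint 'v' line; VxFS created on top of it
vxassist -g vxgroup make vx-vol 5860145152 layout=stripe-mirror ncol=6 stripeunit=32
mkfs -F vxfs /dev/vx/rdsk/vxgroup/vx-vol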
UFS
d300: Mirror
Submirror 0: d100
State: Okay
Submirror 1: d200
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 5860524032 blocks (2.7 TB)
d100: Submirror of d300
State: Okay
Size: 5860524032 blocks (2.7 TB)
Stripe 0: (interlace: 32 blocks)
Device Start Block Dbase State Reloc Hot Spare
c5t2d0s0 0 No Okay Yes
c5t6d0s0 0 No Okay Yes
c4t2d0s0 0 No Okay Yes
c4t6d0s0 0 No Okay Yes
c7t2d0s0 0 No Okay Yes
c7t6d0s0 0 No Okay Yes
d200: Submirror of d300
State: Okay
Size: 5860524032 blocks (2.7 TB)
Stripe 0: (interlace: 32 blocks)
Device Start Block Dbase State Reloc Hot Spare
c6t2d0s0 0 No Okay Yes
c6t6d0s0 0 No Okay Yes
c1t2d0s0 0 No Okay Yes
c1t6d0s0 0 No Okay Yes
c0t2d0s0 0 No Okay Yes
c0t6d0s0 0 No Okay Yes
d30: Mirror
Submirror 0: d31
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 12289725 blocks (5.9 GB)
d31: Submirror of d30
State: Okay
Size: 12289725 blocks (5.9 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c5t0d0s5 0 No Okay Yes
d20: Mirror
Submirror 0: d21
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 4096575 blocks (2.0 GB)
d21: Submirror of d20
State: Okay
Size: 4096575 blocks (2.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c5t0d0s1 0 No Okay Yes
d10: Mirror
Submirror 0: d11
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 22539195 blocks (10 GB)
d11: Submirror of d10
State: Okay
Size: 22539195 blocks (10 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c5t0d0s0 0 No Okay Yes
Device Relocation Information:
Device Reloc Device ID
c6t2d0 Yes id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXV85H
c6t6d0 Yes id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXR0HH
c1t2d0 Yes id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHWP5AF
c1t6d0 Yes id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXSRKH
c0t2d0 Yes id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHWP74F
c0t6d0 Yes id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXRGUH
c5t2d0 Yes id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXUN7H
c5t6d0 Yes id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXR35H
c4t2d0 Yes id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXT7LH
c4t6d0 Yes id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXR0JH
c7t2d0 Yes id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXTYTH
c7t6d0 Yes id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHWP7KF
c5t0d0 Yes id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXTZBH
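The SVM/UFS side can be reconstructed the same way from the metastat output (again a sketch, not necessarily the exact commands used): each submirror is a 6-disk stripe with a 32-block interlace, and the two stripes are mirrored:
# two 6-disk stripes, 32-block interlace
metainit d100 1 6 c5t2d0s0 c5t6d0s0 c4t2d0s0 c4t6d0s0 c7t2d0s0 c7t6d0s0 -i 32b
metainit d200 1 6 c6t2d0s0 c6t6d0s0 c1t2d0s0 c1t6d0s0 c0t2d0s0 c0t6d0s0 -i 32b
# one-way mirror on d100, then attach d200 as the second submirror
metainit d300 -m d100
metattach d300 d200
# UFS on top
newfs /dev/md/rdsk/d300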
ZFS
pool: pool
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
pool ONLINE 0 0 0
mirror ONLINE 0 0 0
c5t1d0 ONLINE 0 0 0
c6t1d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c5t5d0 ONLINE 0 0 0
c6t5d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c4t1d0 ONLINE 0 0 0
c1t1d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c4t5d0 ONLINE 0 0 0
c1t5d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c7t1d0 ONLINE 0 0 0
c0t1d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c7t5d0 ONLINE 0 0 0
c0t5d0 ONLINE 0 0 0
errors: No known data errors
bash-3.00# zfs get all pool/test
NAME PROPERTY VALUE SOURCE
pool/test type filesystem -
pool/test creation Tue Jan 29 11:02 2008 -
pool/test used 50.1G -
pool/test available 2.63T -
pool/test referenced 50.1G -
pool/test compressratio 1.00x -
pool/test mounted yes -
pool/test quota none default
pool/test reservation none default
pool/test recordsize 8K local
pool/test mountpoint /test/zfs local
pool/test sharenfs off default
pool/test checksum on default
pool/test compression off default
pool/test atime on default
pool/test devices on default
pool/test exec on default
pool/test setuid on default
pool/test readonly off default
pool/test zoned off default
pool/test snapdir hidden default
pool/test aclmode groupmask default
pool/test aclinherit secure default
pool/test canmount on default
pool/test shareiscsi off default
pool/test xattr on default
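For completeness, a matching sketch for the ZFS side, reconstructed from the zpool status and zfs get output above (not the exact commands used at the time):
# six 2-way mirrors striped together - the ZFS equivalent of RAID 1+0
zpool create pool mirror c5t1d0 c6t1d0 mirror c5t5d0 c6t5d0 \
  mirror c4t1d0 c1t1d0 mirror c4t5d0 c1t5d0 \
  mirror c7t1d0 c0t1d0 mirror c7t5d0 c0t5d0
# 8k recordsize to match the OLTP I/O size, mounted where filebench expects it
zfs create pool/test
zfs set recordsize=8k pool/test
zfs set mountpoint=/test/zfs pool/test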
2008-02-11
Live Upgrade - problem with ludelete
A few weeks ago I was trying to do a Live Upgrade from Solaris 10 11/06 to 8/07. It went quite well until I decided to delete the old BE:
bash-3.00# uname -a
SunOS mickey 5.10 Generic_127112-07 i86pc i386 i86pc
bash-3.00# cat /etc/release
Solaris 10 8/07 s10x_u4wos_12b X86
Copyright 2007 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 16 August 2007
bash-3.00# lustatus
Boot Environment Is Active Active Can Copy
Name Complete Now On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
11-06 yes no no yes -
8-07 yes yes yes no -
bash-3.00# ludelete 11-06
The boot environment <11-06> contains the GRUB menu.
Attempting to relocate the GRUB menu.
/usr/sbin/ludelete: lulib_relocate_grub_slice: not found
ERROR: Cannot relocate the GRUB menu in boot environment <11-06>.
ERROR: Cannot delete boot environment <11-06>.
Unable to delete boot environment.
The only useful solution I found was at http://tech.groups.yahoo.com/group/solarisx86/message/44111
Juergen Keil proposed using the lulib from OpenSolaris. Because I didn't have an OpenSolaris DVD, I asked Juergen for a copy of lulib and he sent me one.
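The replacement itself is just swapping one file. A sketch, assuming the standard Live Upgrade location under /usr/lib/lu (the backup copy and the name of the received file are my own additions):
# keep the original around, then drop in the lulib received from Juergen
cp -p /usr/lib/lu/lulib /usr/lib/lu/lulib.orig
cp /tmp/lulib.opensolaris /usr/lib/lu/lulib
With the new lulib in place, ludelete worked: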
bash-3.00# lustatus
Boot Environment Is Active Active Can Copy
Name Complete Now On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
11-06 yes no no yes -
8-07 yes yes yes no -
bash-3.00# ludelete 11-06
Determining the devices to be marked free.
Updating boot environment configuration database.
Updating boot environment description database on all BEs.
Updating all boot environment configuration databases.
Updating GRUB menu default setting
Boot environment <11-06> deleted.
bash-3.00# lustatus
Boot Environment Is Active Active Can Copy
Name Complete Now On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
8-07 yes yes yes no -
Thanks Juergen !
2008-02-08
Jarod Jenson is coming back ...
Jarod Jenson, the famous Texas Ranger, the first DTrace user outside of Sun and the author of the Java DVM provider, has changed companies and now, after a period of silence, is back with his blog!