2009-07-14

Filesystem Cache Optimization Strategies

There is an excellent series of blog entries by Brad Diggs about the Solaris filesystem caches used by UFS and ZFS. It is divided into the following parts:

Introduction
Why Use Filesystem Cache
Solaris Filesystem Caches: segmap cache
Solaris Filesystem Caches: Vnode Page Mapping cache
Solaris Filesystem Caches: ZFS Adaptive Replacement Cache (ARC)
Solaris Filesystem Caches: Memory Contention
How Filesystem Cache Can Improve Performance
Establish A Safe Ceiling
UFS Default Caching
ZFS Default Caching
Optimized ZFS Filesystem Caching
Tuning ZFS Cache
Unlock The Governors
Avoid Diluting The Filesystem Cache
Match Data Access Patterns
Consider Disabling vdev Caching
Minimize Application Data Pagesize
Match Average I/O Block Sizes
Consider The Pros and Cons of Cache Flushes
Prime The Filesystem Cache

Really worth reading!

How to change the compression algorithm used in ZFS

If you want to change the way ZFS compresses your data, you can follow this example:

# zfs create pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  on        inherited from pool
# zfs set compression=gzip pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  gzip      local
# zfs set compression=lzjb pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  lzjb      local
# zfs set compression=gzip-9 pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  gzip-9    local
# zfs set compression=on pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  on        local
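The speed-versus-ratio tradeoff behind these settings (lzjb is fast but modest, gzip-9 is tight but CPU-hungry) can be felt with plain gzip on any box. A quick sketch - the numbers are only illustrative, and the gzip levels merely stand in for the ZFS algorithms:

```shell
# Generate ~2 MB of repetitive text, then compress it at a fast,
# a default and a maximum gzip level and compare the sizes.
yes "some fairly repetitive log line" | head -c 2097152 > /tmp/sample.txt

for level in 1 6 9; do
    gzip -c -$level /tmp/sample.txt > /tmp/sample.$level.gz
    echo "level $level: $(wc -c < /tmp/sample.$level.gz) bytes"
done
```

The higher levels shave off some bytes but burn noticeably more CPU time, which is exactly the decision you make when choosing between lzjb and gzip-9 for a dataset.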

2009-07-08

VxFS (Veritas File System) - how to check and resize the intent log size

How to query the current size of the intent log:

# fsadm -F vxfs -L /vxfs/my-vol
UX:vxfs fsadm: INFO: V-3-25669: logsize=16384 blocks, logvol=""


How to resize it:

# fsadm -F vxfs -o logsize=32768 /vxfs/my-vol
# fsadm -F vxfs -L /vxfs/my-vol
UX:vxfs fsadm: INFO: V-3-25669: logsize=32768 blocks, logvol=""


Basically, with a bigger intent log, recovery time is proportionally longer and the file system may consume more system resources (such as memory) during normal operation. On the other hand, VxFS generally performs better with larger log sizes.

2009-07-03

Linux - how to turn on the framebuffer during the boot process

Just add 'vga=some_number' to the kernel line in the menu.lst file (if you use GRUB - does anybody still use LILO? ;-) ). Example values for some_number:

1280x1024x64k (vga=794)
1280x1024x256 (vga=775)
1024x768x64k (vga=791)
1024x768x32k (vga=790)
1024x768x256 (vga=773)
800x600x64k (vga=788)
800x600x32k (vga=787)
800x600x256 (vga=771)
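For example, a complete menu.lst entry enabling 1024x768 with 64k colours could look like this (the kernel version and root device are placeholders - adjust them to your system):

```
title Debian GNU/Linux
root (hd0,0)
kernel /boot/vmlinuz-2.6.26-2-686 root=/dev/sda1 ro vga=791
initrd /boot/initrd.img-2.6.26-2-686
```

No GRUB reinstall is needed after editing menu.lst; the setting takes effect on the next boot.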

Debian - how to "clone" packages to another server

This is perhaps my first blog entry related to Linux :-). Anyway, I have just come across the problem of moving an installed Debian (Lenny) system to another server. While rsync is a perfect solution for copying all the personal files, I would like to avoid using it to copy all the Debian binaries. There is another, more reliable way. On the source system just run:

host1:~# dpkg --get-selections > /tmp/dpkg.txt
host1:~# head /tmp/dpkg.txt
a2ps install
acpi install
acpi-support install
acpi-support-base install
acpid install
adduser install
adobe-flashplugin install
adobereader-enu install
akregator install
alien install

Transfer this file (dpkg.txt) to the target system (with a minimal installation already in place) and run:

host2:~# dpkg --set-selections < /tmp/dpkg.txt
host2:~# apt-get -u dselect-upgrade

And voila! - apt-get will download and install all the requested packages.

2009-07-01

Presentation at a storage conference

I got the opportunity to present at the Storage GigaCon conference in Warsaw about a month ago. My presentation covered a bit of an "introduction" to storage (since mine was the first talk) and later "switched" to storage performance benchmarking. Of course it would be difficult not to mention Vdbench and SWAT :-)

I wrote it in a mix of Polish and English (I know this is not a recommended way of writing presentations ...) so you can read it at least partially.

Optimizing PostgreSQL application performance with Solaris dynamic tracing

There is an excellent BluePrints article about DTrace probes in PostgreSQL. You can download it over here. Using DTrace you can get amazing information about the internals of PostgreSQL. For instance, with this DTrace script (example taken from the article):

# cat query_load.d
#!/usr/sbin/dtrace -qs

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

postgresql*:::query-start
{
        self->query = copyinstr(arg0);
        self->pid = pid;
}

postgresql*:::query-done
{
        @queries[pid, self->query] = count();
}

dtrace:::END
{
        printf("%5s %s %s\n", "PID", "COUNT", "QUERY");
        printa("%6d %@5d %s\n", @queries);
}

we get:

PID COUNT QUERY
1221 154 UPDATE tellers SET tbalance = tbalance + -487 WHERE tid = 25;
1221 204 UPDATE tellers SET tbalance = tbalance + 1051 WHERE tid = 42;
1220 215 UPDATE accounts SET abalance = abalance + -4302 WHERE aid = 144958;
1220 227 UPDATE accounts SET abalance = abalance + 2641 WHERE aid = 441283;


Isn't it amazing?

I don't know if Larry Ellison is aware of DTrace, but I wish I had the same DTrace probes in Oracle ...

2009-05-19

Veritas Volume Manager - display DMP IO statistics

If you are a Veritas Volume Manager user, you can use DMP to configure path failover and load balancing across many links (data paths) to storage. vxdmpadm is the command for all these tasks. Just recently I had to check whether data really flows along the configured links. You can achieve this with the same vxdmpadm.

To enable the gathering of statistics use this command:

vxdmpadm iostat start

To reset the IO counters:

vxdmpadm iostat reset

To display the current statistics:

vxdmpadm -z iostat show all interval=60
cpu usage = 3037899us    per cpu memory = 32768b
                             OPERATIONS         BLOCKS        AVG TIME(ms)
PATHNAME                  READS  WRITES    READS   WRITES    READS  WRITES
c2t50060E80102C7770d16s2     99   43566     1584  1419358     0.55    0.03
c4t50060E80102C7772d16s2    101   37584     1616  1366880     0.74    0.17
c2t50060E80102C7770d21s2   2192    1623    87424    29072     0.11    0.04
c4t50060E80102C7772d21s2   1041    1059    46016    19840     0.16    0.24
...

The first output shows cumulative statistics since server boot. Subsequent listings show current data:

c2t50060E80102C7770d21s2 1 2 16 32 0.39 0.03
c4t50060E80102C7772d21s2 5 0 80 0 0.72 0.00
c2t50060E80102C7770d57s2 2 4 32 112 0.50 0.19
c4t50060E80102C7772d57s2 4 2 64 80 0.03 0.02
...


To disable the gathering of statistics use this command:

vxdmpadm iostat stop
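vxdmpadm prints per-path counters, so checking whether DMP really balances the load between two HBAs is a matter of summing them. A small sketch (my own post-processing, not part of VxVM; it assumes the column layout shown above, with read/write operation counts in fields 2 and 3):

```shell
# Sample lines in the 'vxdmpadm iostat show' format from above.
stats='
c2t50060E80102C7770d16s2 99 43566 1584 1419358 0.55 0.03
c4t50060E80102C7772d16s2 101 37584 1616 1366880 0.74 0.17
c2t50060E80102C7770d21s2 2192 1623 87424 29072 0.11 0.04
c4t50060E80102C7772d21s2 1041 1059 46016 19840 0.16 0.24
'

# Sum read/write operations per controller (the c2/c4 prefix).
echo "$stats" | awk '
/^c/ {
    ctrl = substr($1, 1, 2)
    reads[ctrl]  += $2
    writes[ctrl] += $3
}
END {
    for (c in reads)
        printf "%s: reads=%d writes=%d\n", c, reads[c], writes[c]
}'
```

In real life you would pipe the live `vxdmpadm iostat show all` output through the same awk instead of a here-string.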

2009-02-16

DTrace CPU Performance Counter provider

Jonathan Haslam described the DTrace CPU Performance Counter provider, which "... gives you the ability to profile your system by many different types of processor related events; the list of events is processor specific and usually quite large but typically includes events such as cycles executed, instructions executed, cache misses, TLB misses and many more ...". All the information is here, here and here.

2009-02-05

Optimizing MySQL Database Application Performance with Solaris Dynamic Tracing (DTrace)

There is a fantastic white paper on how taking advantage of Solaris Dynamic Tracing (DTrace) probes can help simplify MySQL database application tuning. The contents of the white paper:

  • Introduction
  • Approaching MySQL Database Application Tuning
  • The Advantages of Solaris Dynamic Tracing
  • Simplifying and Speeding Performance Tuning Efforts
  • Analyzing Query Loads
  • Probing the Cost of File Sort Operations
  • Profiling the Use of Stored Procedures
  • Observing Slave Queries
  • Optimizing Use of the MySQL Database Query Cache
  • Putting it all Together
  • For More Information
  • About the Author
  • Related Resources
  • Ordering Sun Documents
  • Accessing Sun Documentation Online

It is downloadable over here.

2009-01-27

Parallel bzip2 in a CMT (multicore) environment

We have a lot of archive logs from our Oracle databases. To save storage space we compress them. Of course this (single-threaded) process takes quite a long time on CMT-based CPUs. To use the power of these CPUs we should parallelize all tasks where possible. Pbzip2 befriends us here - it is a parallel implementation of bzip2. An example:

rudy:/tmp/l# ls -alh
total 4194336
drwxr-xr-x 2 root root 233 Jan 27 16:35 .
drwxrwxrwt 3 root sys 286 Jan 27 16:30 ..
-rw-r----- 1 root root 1.0G Jan 27 16:35 1
-rw-r----- 1 root root 1.0G Jan 27 15:41 2
rudy:/tmp/l# time bzip2 1

real 13m4.397s
user 12m59.987s
sys 0m4.310s
rudy:/tmp/l# time /opt/csw/bin/pbzip2 2

real 0m34.842s
user 34m46.095s
sys 0m7.972s
rudy:/tmp/l# time /opt/csw/bin/pbzip2 -l 2

real 0m41.907s
user 28m52.137s
sys 0m6.302s


We can see that the difference is tremendous!!!

The '-l' parameter limits the maximum number of processors to use (calculated from the load average). This server is quite idle, so a heavier workload might lengthen the run time.
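Pbzip2's trick is simple: split the input into blocks, compress each block on a separate CPU, and concatenate the resulting bzip2 streams - a concatenation which standard bunzip2 happily decompresses. The same idea can be sketched with stock tools (an illustration only, not a replacement for pbzip2):

```shell
# Work in a scratch directory with ~1 MB of compressible input.
cd "$(mktemp -d)"
yes "archive log data" | head -c 1048576 > big.dat

# Split into 256 KB chunks, compress each chunk in the background,
# then glue the per-chunk bzip2 streams back together.
split -b 262144 big.dat chunk.
for c in chunk.*; do
    bzip2 "$c" &
done
wait
cat chunk.*.bz2 > big.dat.bz2

# Plain bunzip2 decompresses the concatenated streams just fine.
bunzip2 -c big.dat.bz2 > check.dat
cmp big.dat check.dat && echo OK
```

The per-chunk compression ratio is slightly worse than one big stream (bzip2 blocks cannot span chunks), which is the same tradeoff pbzip2 makes.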

2008-07-31

swat & vdbench - an excellent couple

There has been an announcement of SWAT - another tool helpful in performing benchmarking tests. Together with Vdbench we get an almost perfect couple for playing with storage benchmarking.

2008-06-08

vdbench - disk I/O workload generator

There are several tools for testing filesystem performance, such as Iozone, Bonnie and FileBench. There is also one that is not widely known, called vdbench. It is a disk I/O workload generator written by Sun employee Henk Vandenbergh and used internally at Sun and by its customers.

How does it differ from the other tools? We'll see ... :-)

Vdbench, besides the classical CLI, also has a GUI which simplifies its usage; however, I am going to show how to use it via the CLI. After downloading, one needs to install it. The installer has an optional -t parameter for specifying the target directory:

kruk:/tmp/vdbench# ./install_unix.sh -t /opt/vdbench
Sun Microsystems, Inc. ("Sun") ENTITLEMENT for SOFTWARE

Licensee/Company: Entity receiving Software.

Efective Date: Date Sun delivers the Software to You.

Software: Sun StorageTek vdbench 4.07

[...]

Please contact Sun Microsystems, Inc. 4150 Network Circle, Santa
Clara, California 95054 if you have questions.




Accept User License Agreement (yes/no): yes
08:51:47.887 plog(): execute(): tar -tf /tmp/vdbench/vdbench407.tar

********************************************************************************
[...]

Tool will expire on: sobota, kwiecień 25 2009, 23:27:38


********************************************************************************



Tool installation to /opt/vdbench successful

Now it's time to write a parameter file describing the needed workload. There are some examples within the vdbench directory (example*), but let's begin with the following simple one:

sd=sd1,lun=/vdb/file1
sd=sd2,lun=/vdb/file2
sd=sd3,lun=/vdb/file3
sd=sd4,lun=/vdb/file4

wd=rg-1,sd=sd*,rdpct=70,rhpct=0,whpct=0,xfersize=8k,seekpct=70

rd=rd_rg-1,wd=rg-1,interval=1,iorate=max,elapsed=30,forthreads=(64)

How to interpret that file?
sd - Storage Definition (where the I/O goes to/from - storage, disks, files)
wd - Workload Definition (the precise definition of the workload) - some explanations:
  • rdpct - read percentage; 70 means that 70% of the I/Os are reads and the remaining 30% are writes
  • xfersize - size of each I/O
  • seekpct - percentage of random seeks
rd - Run Definition (in general: which Workload Definition to run, and for how long)
Of course we need to create the needed files:

kruk:/root# cd /opt/vdbench/
kruk:/opt/vdbench# mkdir /vdb
kruk:/opt/vdbench# mkfile 100m /vdb/file1
kruk:/opt/vdbench# mkfile 100m /vdb/file2
kruk:/opt/vdbench# mkfile 100m /vdb/file3
kruk:/opt/vdbench# mkfile 100m /vdb/file4

What I like about this tool is its ability to show IOPS for each second of the test, which gives an excellent view of the tested environment. Let's see an example:

kruk:/opt/vdbench# ./vdbench -f my-parm.cfg
[...]
interval i/o MB/sec bytes read resp resp resp cpu% cpu%
rate 1024**2 i/o pct time max stddev sys+usr sys
14:50:41.208 1 1770,85 13,83 8192 79,48 104,913 774,921 170,206 20,8 10,5
14:50:42.097 2 1600,90 12,51 8192 67,65 159,721 961,760 225,067 20,0 11,8
14:50:43.103 3 1302,75 10,18 8192 68,05 184,123 1439,223 262,792 13,7 8,0
14:50:44.085 4 1112,86 8,69 8192 68,20 219,451 1954,038 315,391 12,6 7,1
14:50:45.080 5 1210,84 9,46 8192 68,83 220,502 1511,942 322,902 12,0 7,8
14:50:46.078 6 1192,25 9,31 8192 70,59 213,559 1474,794 318,486 11,7 7,2
14:50:47.081 7 899,99 7,03 8192 70,48 253,603 1654,079 378,854 10,3 6,0
14:50:48.058 8 1251,91 9,78 8192 71,13 219,671 1831,191 340,373 11,8 7,5
14:50:49.049 9 1004,77 7,85 8192 68,77 251,295 1668,598 364,461 10,0 6,7
14:50:50.049 10 1124,07 8,78 8192 68,28 229,389 1804,713 329,042 10,5 6,5
14:50:51.047 11 1099,94 8,59 8192 68,47 236,588 1699,419 344,097 10,0 6,7
14:50:52.040 12 629,53 4,92 8192 69,45 265,482 1742,241 374,407 7,7 4,7
14:50:53.043 13 1042,71 8,15 8192 69,05 344,049 2308,431 532,435 8,8 6,0
14:50:54.042 14 1452,97 11,35 8192 68,11 174,344 2086,119 251,800 11,8 8,8
14:50:55.075 15 1175,48 9,18 8192 69,67 212,452 1504,912 312,161 9,8 6,8
14:50:56.046 16 1048,33 8,19 8192 68,92 227,462 1595,952 325,352 10,8 7,0
14:50:57.047 17 881,26 6,88 8192 68,19 264,582 1160,291 365,083 9,2 5,7
14:50:58.068 18 1023,81 8,00 8192 71,98 282,999 1757,541 435,796 12,3 6,5
14:50:59.056 19 1076,79 8,41 8192 71,47 218,663 1339,287 339,072 11,8 7,8
14:51:00.044 20 1177,03 9,20 8192 68,99 231,724 1469,797 339,442 11,2 7,2
14:51:01.037 21 1136,34 8,88 8192 72,33 221,309 1807,646 343,556 10,3 6,8
14:51:02.043 22 655,37 5,12 8192 69,91 279,581 1286,680 402,433 6,5 4,5
14:51:03.040 23 1022,76 7,99 8192 70,30 336,784 1886,443 511,220 9,8 6,3
14:51:04.047 24 1322,93 10,34 8192 70,74 198,405 1732,970 295,461 11,8 8,0
14:51:05.049 25 1028,04 8,03 8192 68,67 228,957 1436,083 321,115 9,7 6,7
14:51:06.039 26 1075,99 8,41 8192 69,78 241,907 1642,574 349,677 9,5 6,3
14:51:07.046 27 908,38 7,10 8192 70,23 248,514 1661,609 363,362 9,0 6,0
14:51:08.044 28 827,90 6,47 8192 70,35 331,738 1898,630 486,697 7,8 5,3
14:51:09.046 29 1208,92 9,44 8192 68,49 225,420 1581,279 321,723 10,9 7,2
14:51:10.042 30 1025,04 8,01 8192 70,23 239,669 1502,131 347,456 9,3 6,5
14:51:10.055 avg_2-30 1086,79 8,49 8192 69,51 234,502 2308,431 353,760 10,7 6,9

iostat during the run:

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
395.0 178.0 3160.0 1424.0 0.0 2.6 0.0 4.6 0 92 c1
395.0 178.0 3160.0 1424.0 0.0 2.6 0.0 4.6 1 92 c1t50060E8000444540d4
394.0 173.0 3152.1 1384.1 0.0 3.3 0.0 5.7 0 99 c3
394.0 173.0 3152.2 1384.1 0.0 3.3 0.0 5.7 1 99 c3t50060E8000444542d4
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
484.1 188.0 3872.8 1504.3 0.0 3.0 0.0 4.5 0 98 c1
484.1 188.0 3872.8 1504.3 0.0 3.0 0.0 4.5 1 98 c1t50060E8000444540d4
418.1 177.0 3344.7 1416.3 0.0 3.0 0.0 5.0 0 97 c3
418.1 177.0 3344.6 1416.3 0.0 3.0 0.0 5.0 1 97 c3t50060E8000444542d4

In OLTP environments the i/o rate column is one of the most important, while in the DSS world MB/sec probably matters more. "Ok", you say, "you have a stable io rate. Nothing really interesting." Are you sure? Can you show me another non-commercial tool which shows the same: the io rate in each second? How about this example:

interval i/o MB/sec bytes read resp resp resp
rate 1024**2 i/o pct time max stddev
15:04:15.746 1 27558,72 215,30 8192 69,72 1,089 338,684 10,493
15:04:16.465 2 42955,58 335,59 8192 69,66 ,219 903,236 5,504
15:04:17.170 3 45265,34 353,64 8192 69,63 ,132 384,352 3,121
15:04:17.692 4 41596,08 324,97 8192 69,99 ,119 148,012 2,091
15:04:18.322 5 38974,35 304,49 8192 69,98 ,210 570,820 5,815
15:04:19.385 6 22774,35 177,92 8192 69,32 1,830 1040,743 22,121
15:04:20.565 7 28552,81 223,07 8192 69,48 1,651 1539,927 28,605
15:04:21.079 8 38379,22 299,84 8192 69,30 ,181 340,999 3,756
15:04:22.159 9 40825,13 318,95 8192 70,07 ,134 319,363 2,655
15:04:23.187 10 37296,62 291,38 8192 69,91 ,161 186,869 2,589
15:04:24.334 11 23630,10 184,61 8192 70,29 ,894 352,045 6,997
15:04:25.087 12 24689,14 192,88 8192 70,14 ,192 604,120 7,659
15:04:26.334 13 33136,76 258,88 8192 70,22 ,094 66,819 ,870
15:04:27.046 14 40521,35 316,57 8192 69,89 ,128 348,961 2,395
15:04:29.072 15 35763,25 279,40 8192 69,60 ,394 264,676 4,571
15:04:29.290 16 28473,31 222,45 8192 69,65 1,433 730,685 14,482
15:04:30.206 17 31505,71 246,14 8192 69,62 ,979 914,484 15,763
15:04:31.259 18 40196,17 314,03 8192 69,67 ,125 234,910 1,692
15:04:32.116 19 35458,59 277,02 8192 69,77 ,116 138,638 1,540
15:04:33.249 20 43064,86 336,44 8192 69,97 ,128 390,286 2,813
15:04:34.234 21 24776,40 193,57 8192 70,12 1,947 231,994 12,754
15:04:35.554 22 35134,31 274,49 8192 69,80 ,451 703,214 6,932
15:04:36.261 23 33361,52 260,64 8192 69,50 ,444 898,570 11,638
15:04:37.557 24 38474,73 300,58 8192 70,05 ,225 322,527 4,602
15:04:38.234 25 41275,28 322,46 8192 69,74 ,170 206,164 2,687
15:04:39.097 26 22927,25 179,12 8192 70,44 1,345 665,988 11,805
15:04:40.258 27 32228,28 251,78 8192 70,14 ,540 752,539 9,812
15:04:41.296 28 39111,23 305,56 8192 70,15 ,196 271,594 3,369
15:04:42.200 29 41695,78 325,75 8192 69,74 ,210 282,547 3,677
15:04:43.110 30 46167,28 360,68 8192 69,80 ,156 356,567 2,871
15:04:43.131 avg_2-30 35532,33 277,60 8192 69,84 ,420 1539,927 8,438

And iostat:

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 827.7 0.0 105947.0 0.0 34.9 0.0 42.2 0 100 c1
0.0 827.7 0.0 105944.1 0.0 34.9 0.0 42.2 1 100 c1t50060E8000444540d3
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 439.1 0.0 49615.6 0.0 12.1 0.0 27.5 0 45 c1
0.0 439.1 0.0 49615.6 0.0 12.1 0.0 27.5 1 45 c1t50060E8000444540d3
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 919.3 0.0 116434.0 0.0 26.9 0.0 29.3 0 85 c1
0.0 919.3 0.0 116434.2 0.0 26.9 0.0 29.3 1 85 c1t50060E8000444540d3

Quite interesting, isn't it? "What did you change to get such a high io rate?", you might ask. Hmmm, I am not sure if I want to reveal ... ;-)

Seriously: the first test was run on the Veritas File System while the second one was on ZFS. Does it mean that ZFS is faster than VxFS? I don't think so. But using vdbench you can observe the different natures of the two filesystems. If you look closer at the ZFS run you will see that there are no reads from the filesystem, even though the vdbench parameter "rdpct=70" forces 70% reads and 30% writes. Why? Because of the intensive kernel memory usage in ZFS - the reads are satisfied from its cache.

Another example: have a look at the first vdbench output. It is interesting that the io rate goes up and down. I tracked this down earlier and found that the hardware array the filesystems were placed on exhibits some bizarre behaviour with RAID-5.

Anyway, you can see that without any sophisticated (and expensive) tool you are able to bring to light new, performance-related information about your storage and filesystems.

Vdbench was released to the public a few weeks ago, so you can download it over here.

2008-02-17

Solaris IO tuning: monitoring disk queue - lack of knowledge

A few months ago we faced some performance problems on one of our servers. There is One Very Well Known Big Database (OVWKBD - complicated, isn't it? ;-) ) running on it. One end user reported to us that there were some hangs during office hours. We (me and one of our DBAs, who is responsible for the OVWKBD) were surprised, since it had never happened before (or, as I assume, nobody had told us before), and began an investigation.

After a few days our DBA pointed out that it might be related to redo log writing (the redo logs were, like the rest of the database, on a SAN-attached disk array). In fact he was sure that this was the problem, but when I asked for any proof he didn't deliver any. He insisted on changing the array configuration, but since I don't like to proceed blindly, I wanted to monitor the io (disk) queue(s) before any significant change. I had set /etc/system according to the docs, and the ssd_max_throttle variable is the key setting there. But I still wanted to better understand what is going on at the disk queue level and to set ssd_max_throttle according to _real_ needs.

When I started hunting high and low I realized that the disk queue (which is, after all, one of the key areas of IO tuning) is poorly documented, so I began desperately seeking any knowledge. Neither Sunsolve nor docs.sun.com helped me. One page, http://wikis.sun.com/display/StorageDev/The+Solaris+OS+Queue+Throttle, gave me a tiny piece of information but still ...
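For reference, the throttle is set in /etc/system; an entry like the one below caps the number of outstanding commands queued per LUN by the ssd driver (the value 20 is only a commonly quoted example from array vendors' docs, not a recommendation, and a reboot is required for it to take effect):

```
* /etc/system: limit outstanding commands per LUN (ssd driver)
set ssd:ssd_max_throttle=20
```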

Oh, God! I can't believe that there are no docs about it!

I tried the OpenSolaris mailing lists but still got no really useful information. In one e-mail Robert Miłkowski suggested using the scsi.d DTrace script. I had used it before but it hadn't helped me. Maybe I wasn't able to read its output properly? Well, at least I could try it again:

00000.868602160 fp2:-> 0x2a WRITE(10) address 2287:76, lba 0x00897a89, len 0x000002, control 0x00 timeout 60 CDBP 6002064db9c RDBMS(25751) cdb(10) 2a0000897a8900000200
00000.890551520 fp2:-> 0x28 READ(10) address 2287:76, lba 0x00a877f5, len 0x000001, control 0x00 timeout 60 CDBP 3000e37cde4 RDBMS(23974) cdb(10) 280000a877f500000100
00000.584069360 fp2:-> 0x2a WRITE(10) address 2287:66, lba 0x019c01f8, len 0x00013d, control 0x00 timeout 60 CDBP 60026a1e4d4 RDBMS(18244) cdb(10) 2a00019c01f800013d00
00000.239667600 fp2:-> 0x2a WRITE(10) address 2287:18, lba 0x00d93c20, len 0x000010, control 0x00 timeout 60 CDBP 300270fadec RDBMS(25753) cdb(10) 2a0000d93c2000001000
00000.958698480 fp2:-> 0x2a WRITE(10) address 2287:08, lba 0x00001f10, len 0x000010, control 0x00 timeout 60 CDBP 30025654084 sched(0) cdb(10) 2a0000001f1000001000
00000.240042160 fp2:<- 0x2a WRITE(10) address 2287:16, lba 0x01d4889c, len 0x000010, control 0x00 timeout 60 CDBP 60019209b84, reason 0x0 (COMPLETED) state 0x1f Time 820us
00000.240213360 fp2:<- 0x2a WRITE(10) address 2287:15, lba 0x0274b920, len 0x000010, control 0x00 timeout 60 CDBP 6002064c4f4, reason 0x0 (COMPLETED) state 0x1f Time 912us
00000.240311200 fp2:<- 0x2a WRITE(10) address 2287:18, lba 0x00d93c20, len 0x000010, control 0x00 timeout 60 CDBP 300270fadec, reason 0x0 (COMPLETED) state 0x1f Time 730us
00000.585352960 fp2:<- 0x2a WRITE(10) address 2287:67, lba 0x004a5ef8, len 0x00013d, control 0x00 timeout 60 CDBP 3003bb1add4, reason 0x0 (COMPLETED) state 0x1f Time 1390us
00000.586121680 fp2:<- 0x2a WRITE(10) address 2287:66, lba 0x019c01f8, len 0x00013d, control 0x00 timeout 60 CDBP 60026a1e4d4, reason 0x0 (COMPLETED) state 0x1f Time 2136us
00000.868869200 fp2:<- 0x2a WRITE(10) address 2287:17, lba 0x005ca80b, len 0x000002, control 0x00 timeout 60 CDBP 30053138df4, reason 0x0 (COMPLETED) state 0x1f Time 404us
00000.869025920 fp2:<- 0x2a WRITE(10) address 2287:76, lba 0x00897a89, len 0x000002, control 0x00 timeout 60 CDBP 6002064db9c, reason 0x0 (COMPLETED) state 0x1f Time 501us
00000.889036480 fp2:-> 0x28 READ(10) address 2287:76, lba 0x00a879d9, len 0x000001, control 0x00 timeout 60 CDBP 6002064db9c RDBMS(23974) cdb(10) 280000a879d900000100
00000.889377200 fp2:<- 0x28 READ(10) address 2287:76, lba 0x00a879d9, len 0x000001, control 0x00 timeout 60 CDBP 6002064db9c, reason 0x0 (COMPLETED) state 0x1f Time 409us
00000.890777520 fp2:<- 0x28 READ(10) address 2287:76, lba 0x00a877f5, len 0x000001, control 0x00 timeout 60 CDBP 3000e37cde4, reason 0x0 (COMPLETED) state 0x1f Time 267us
00000.959244800 fp2:<- 0x2a WRITE(10) address 2287:08, lba 0x00001f10, len 0x000010, control 0x00 timeout 60 CDBP 30025654084, reason 0x0 (COMPLETED) state 0x1f Time 642us
00000.239373680 fp2:-> 0x2a WRITE(10) address 2287:16, lba 0x01d4889c, len 0x000010, control 0x00 timeout 60 CDBP 60019209b84 RDBMS(25753) cdb(10) 2a0001d4889c00001000
00000.868509120 fp2:-> 0x2a WRITE(10) address 2287:17, lba 0x005ca80b, len 0x000002, control 0x00 timeout 60 CDBP 30053138df4 RDBMS(25751) cdb(10) 2a00005ca80b00000200
00000.239401200 fp2:-> 0x2a WRITE(10) address 2287:15, lba 0x0274b920, len 0x000010, control 0x00 timeout 60 CDBP 6002064c4f4 RDBMS(25753) cdb(10) 2a000274b92000001000
00000.584010640 fp2:-> 0x2a WRITE(10) address 2287:67, lba 0x004a5ef8, len 0x00013d, control 0x00 timeout 60 CDBP 3003bb1add4 RDBMS(18244) cdb(10) 2a00004a5ef800013d00

Still no joy :-(. I became pessimistic about finding any answer to my questions.
The next day I couldn't stop thinking about it and got down in the dumps ...

Suddenly - wait a minute! Have I checked who wrote the scsi.d script?! No! Let's quickly find out! Maybe this is how I can find an answer! The beginning of the script says:

...
/*
* Chris.Gerhard@sun.com
* Joel.Buckley@sun.com
*/

#pragma ident "@(#)scsi.d 1.12 07/03/16 SMI"
...


Hope got back. ;-)

I know that hope often blinks at a fool, but you understand me, right? Yes, you do! Thanks! ;-)

Let's see if these guys could help me. I sent them an e-mail without (well, almost ...) any belief that it would work ... and a few hours later Chris answered! I still couldn't believe it while reading his e-mail! To make it not so easy, Chris offered a "deal": if I described my problem to him, he would answer it via his blog. And that is how
http://blogs.sun.com/chrisg/entry/latency_bubble_in_your_io
was born ...
More such deals! ;-)

2008-02-16

ZFS vs VxFS vs UFS on x4500 (Thumper - JBOD)

A few months ago I compared the performance of the above filesystems using filebench. Since then a few things have changed:
  • Solaris 10 8/07 is now available (compared to Solaris 10 11/06 used during the previous test). Thanks to the fabulous pca tool all (really all!) patches were installed.
  • a new filebench 1.1 has been released
  • there is Veritas Storage Foundation Basic 5.0 for x64 (the last available version used to be 4.1)
A few words about the last item: VSF Basic is a free version of the commercial VSF but, according to the Symantec site, it is limited to 4 user-data volumes, and/or 4 user-data file systems, and/or 2 processor sockets in a single physical system. So the x4500 (aka Thumper) is within the limitations.
I decided to test RAID 1+0 under an OLTP (8k, non-cached) workload. Since the x4500 has 48 SATA disks, I divided them into 3 sets, one for each filesystem: VxFS/VxVM, ZFS and UFS. The Hard Drive Monitor Utility (HD Tool) can draw an ASCII map of the internal drive layout:

---------------------SunFireX4500------Rear----------------------------

36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47:
c5t3 c5t7 c4t3 c4t7 c7t3 c7t7 c6t3 c6t7 c1t3 c1t7 c0t3 c0t7 <-VxFS
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35:
c5t2 c5t6 c4t2 c4t6 c7t2 c7t6 c6t2 c6t6 c1t2 c1t6 c0t2 c0t6 <- UFS
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23:
c5t1 c5t5 c4t1 c4t5 c7t1 c7t5 c6t1 c6t5 c1t1 c1t5 c0t1 c0t5 <- ZFS
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11:
c5t0 c5t4 c4t0 c4t4 c7t0 c7t4 c6t0 c6t4 c1t0 c1t4 c0t0 c0t4
^b+ ^b+ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
-------*-----------*-SunFireX4500--*---Front-----*-----------*----------

Each filesystem was mounted in its own directory. The following filebench configuration was used for testing:

DEFAULTS {
        runtime = 60;
        dir = "directory where each filesystem was mounted";
        stats = /tmp;
        filesystem = "zfs|ufs|vxfs";
        description = "oltp zfs|ufs|vxfs";
}

CONFIG oltp_8k_uncached {
        personality = oltp;
        function = generic;
        cached = 0;
        directio = 1;
        iosize = 8k;
        nshadows = 200;
        ndbwriters = 10;
        usermode = 20000;
        filesize = 5g;
        nfiles = 10;
        memperthread = 1m;
        workingset = 0;
}

Below are the results:



A few observations:
  • compared to the previous benchmark we can see big improvements in the ZFS area (though of course the change of environment - a JBOD instead of a SCSI array - can influence the results)
  • VxFS is still the winner, but its typical RAID 1+0 configuration is not faster than ZFS; only the 6-column configuration beats ZFS

All the filesystem configurations are below:
VxVM/VxFS 2-cols

Disk group: vxgroup

DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE

dg vxgroup default default 105000 1201598836.80.mickey

dm c0t3d0 c0t3d0 auto 65532 976691152 -
dm c0t7d0 c0t7d0 auto 65532 976691152 -
dm c1t3d0 c1t3d0 auto 65532 976691152 -
dm c1t7d0 c1t7d0 auto 65532 976691152 -
dm c4t3d0 c4t3d0 auto 65532 976691152 -
dm c4t7d0 c4t7d0 auto 65532 976691152 -
dm c5t3d0 c5t3d0 auto 65532 976691152 -
dm c5t7d0 c5t7d0 auto 65532 976691152 -
dm c6t3d0 c6t3d0 auto 65532 976691152 -
dm c6t7d0 c6t7d0 auto 65532 976691152 -
dm c7t3d0 c7t3d0 auto 65532 976691152 -
dm c7t7d0 c7t7d0 auto 65532 976691152 -

v vx-vol - ENABLED ACTIVE 5860145152 SELECT vx-vol-03 fsgen
pl vx-vol-03 vx-vol ENABLED ACTIVE 5860145152 STRIPE 2/32 RW
sv vx-vol-S01 vx-vol-03 vx-vol-L01 1 976691152 0/0 2/2 ENA
sv vx-vol-S02 vx-vol-03 vx-vol-L02 1 976691152 0/976691152 2/2 ENA
sv vx-vol-S03 vx-vol-03 vx-vol-L03 1 976690272 0/1953382304 2/2 ENA
sv vx-vol-S04 vx-vol-03 vx-vol-L04 1 976691152 1/0 2/2 ENA
sv vx-vol-S05 vx-vol-03 vx-vol-L05 1 976691152 1/976691152 2/2 ENA
sv vx-vol-S06 vx-vol-03 vx-vol-L06 1 976690272 1/1953382304 2/2 ENA

v vx-vol-L01 - ENABLED ACTIVE 976691152 SELECT - fsgen
pl vx-vol-P01 vx-vol-L01 ENABLED ACTIVE 976691152 CONCAT - RW
sd c0t3d0-02 vx-vol-P01 c0t3d0 0 976691152 0 c0t3d0 ENA
pl vx-vol-P02 vx-vol-L01 ENABLED ACTIVE 976691152 CONCAT - RW
sd c1t3d0-02 vx-vol-P02 c1t3d0 0 976691152 0 c1t3d0 ENA

v vx-vol-L02 - ENABLED ACTIVE 976691152 SELECT - fsgen
pl vx-vol-P03 vx-vol-L02 ENABLED ACTIVE 976691152 CONCAT - RW
sd c4t3d0-02 vx-vol-P03 c4t3d0 0 976691152 0 c4t3d0 ENA
pl vx-vol-P04 vx-vol-L02 ENABLED ACTIVE 976691152 CONCAT - RW
sd c5t3d0-02 vx-vol-P04 c5t3d0 0 976691152 0 c5t3d0 ENA

v vx-vol-L03 - ENABLED ACTIVE 976690272 SELECT - fsgen
pl vx-vol-P05 vx-vol-L03 ENABLED ACTIVE 976690272 CONCAT - RW
sd c6t3d0-02 vx-vol-P05 c6t3d0 0 976690272 0 c6t3d0 ENA
pl vx-vol-P06 vx-vol-L03 ENABLED ACTIVE 976690272 CONCAT - RW
sd c7t3d0-02 vx-vol-P06 c7t3d0 0 976690272 0 c7t3d0 ENA

v vx-vol-L04 - ENABLED ACTIVE 976691152 SELECT - fsgen
pl vx-vol-P07 vx-vol-L04 ENABLED ACTIVE 976691152 CONCAT - RW
sd c0t7d0-02 vx-vol-P07 c0t7d0 0 976691152 0 c0t7d0 ENA
pl vx-vol-P08 vx-vol-L04 ENABLED ACTIVE 976691152 CONCAT - RW
sd c1t7d0-02 vx-vol-P08 c1t7d0 0 976691152 0 c1t7d0 ENA

v vx-vol-L05 - ENABLED ACTIVE 976691152 SELECT - fsgen
pl vx-vol-P09 vx-vol-L05 ENABLED ACTIVE 976691152 CONCAT - RW
sd c4t7d0-02 vx-vol-P09 c4t7d0 0 976691152 0 c4t7d0 ENA
pl vx-vol-P10 vx-vol-L05 ENABLED ACTIVE 976691152 CONCAT - RW
sd c5t7d0-02 vx-vol-P10 c5t7d0 0 976691152 0 c5t7d0 ENA

v vx-vol-L06 - ENABLED ACTIVE 976690272 SELECT - fsgen
pl vx-vol-P11 vx-vol-L06 ENABLED ACTIVE 976690272 CONCAT - RW
sd c6t7d0-02 vx-vol-P11 c6t7d0 0 976690272 0 c6t7d0 ENA
pl vx-vol-P12 vx-vol-L06 ENABLED ACTIVE 976690272 CONCAT - RW
sd c7t7d0-02 vx-vol-P12 c7t7d0 0 976690272 0 c7t7d0 ENA


VxVM/VxFS 6-cols

Disk group: vxgroup

DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE

dg vxgroup default default 105000 1201598836.80.mickey

dm c0t3d0 c0t3d0 auto 65532 976691152 -
dm c0t7d0 c0t7d0 auto 65532 976691152 -
dm c1t3d0 c1t3d0 auto 65532 976691152 -
dm c1t7d0 c1t7d0 auto 65532 976691152 -
dm c4t3d0 c4t3d0 auto 65532 976691152 -
dm c4t7d0 c4t7d0 auto 65532 976691152 -
dm c5t3d0 c5t3d0 auto 65532 976691152 -
dm c5t7d0 c5t7d0 auto 65532 976691152 -
dm c6t3d0 c6t3d0 auto 65532 976691152 -
dm c6t7d0 c6t7d0 auto 65532 976691152 -
dm c7t3d0 c7t3d0 auto 65532 976691152 -
dm c7t7d0 c7t7d0 auto 65532 976691152 -

v vx-vol - ENABLED ACTIVE 5860145152 SELECT vx-vol-03 fsgen
pl vx-vol-03 vx-vol ENABLED ACTIVE 5860145280 STRIPE 6/32 RW
sv vx-vol-S01 vx-vol-03 vx-vol-L01 1 976690880 0/0 2/2 ENA
sv vx-vol-S02 vx-vol-03 vx-vol-L02 1 976690880 1/0 2/2 ENA
sv vx-vol-S03 vx-vol-03 vx-vol-L03 1 976690880 2/0 2/2 ENA
sv vx-vol-S04 vx-vol-03 vx-vol-L04 1 976690880 3/0 2/2 ENA
sv vx-vol-S05 vx-vol-03 vx-vol-L05 1 976690880 4/0 2/2 ENA
sv vx-vol-S06 vx-vol-03 vx-vol-L06 1 976690880 5/0 2/2 ENA

v vx-vol-L01 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P01 vx-vol-L01 ENABLED ACTIVE 976690880 CONCAT - RW
sd c0t3d0-02 vx-vol-P01 c0t3d0 0 976690880 0 c0t3d0 ENA
pl vx-vol-P02 vx-vol-L01 ENABLED ACTIVE 976690880 CONCAT - RW
sd c5t3d0-02 vx-vol-P02 c5t3d0 0 976690880 0 c5t3d0 ENA

v vx-vol-L02 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P03 vx-vol-L02 ENABLED ACTIVE 976690880 CONCAT - RW
sd c0t7d0-02 vx-vol-P03 c0t7d0 0 976690880 0 c0t7d0 ENA
pl vx-vol-P04 vx-vol-L02 ENABLED ACTIVE 976690880 CONCAT - RW
sd c5t7d0-02 vx-vol-P04 c5t7d0 0 976690880 0 c5t7d0 ENA

v vx-vol-L03 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P05 vx-vol-L03 ENABLED ACTIVE 976690880 CONCAT - RW
sd c1t3d0-02 vx-vol-P05 c1t3d0 0 976690880 0 c1t3d0 ENA
pl vx-vol-P06 vx-vol-L03 ENABLED ACTIVE 976690880 CONCAT - RW
sd c6t3d0-02 vx-vol-P06 c6t3d0 0 976690880 0 c6t3d0 ENA

v vx-vol-L04 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P07 vx-vol-L04 ENABLED ACTIVE 976690880 CONCAT - RW
sd c1t7d0-02 vx-vol-P07 c1t7d0 0 976690880 0 c1t7d0 ENA
pl vx-vol-P08 vx-vol-L04 ENABLED ACTIVE 976690880 CONCAT - RW
sd c6t7d0-02 vx-vol-P08 c6t7d0 0 976690880 0 c6t7d0 ENA

v vx-vol-L05 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P09 vx-vol-L05 ENABLED ACTIVE 976690880 CONCAT - RW
sd c4t3d0-02 vx-vol-P09 c4t3d0 0 976690880 0 c4t3d0 ENA
pl vx-vol-P10 vx-vol-L05 ENABLED ACTIVE 976690880 CONCAT - RW
sd c7t3d0-02 vx-vol-P10 c7t3d0 0 976690880 0 c7t3d0 ENA

v vx-vol-L06 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P11 vx-vol-L06 ENABLED ACTIVE 976690880 CONCAT - RW
sd c4t7d0-02 vx-vol-P11 c4t7d0 0 976690880 0 c4t7d0 ENA
pl vx-vol-P12 vx-vol-L06 ENABLED ACTIVE 976690880 CONCAT - RW
sd c7t7d0-02 vx-vol-P12 c7t7d0 0 976690880 0 c7t7d0 ENA
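A quick sanity check of the layered layout above (a sketch; sizes are 512-byte sectors copied from the vxprint listing): the striped plex is exactly six subvolume columns long, and the volume itself is trimmed slightly below the plex size.

```python
# Sizes in 512-byte sectors, taken from the vxprint listing above.
SUBVOL_LEN = 976_690_880      # each mirrored subvolume vx-vol-L01..L06
NCOLS = 6                     # plex vx-vol-03 is STRIPE 6/32 (6 columns)
PLEX_LEN = 5_860_145_280      # plex vx-vol-03
VOL_LEN = 5_860_145_152       # volume vx-vol

# The striped plex spans all six columns exactly.
assert SUBVOL_LEN * NCOLS == PLEX_LEN

# The usable volume is rounded down slightly from the plex length.
print(PLEX_LEN - VOL_LEN, "sectors of slack")   # 128 sectors
print(round(VOL_LEN * 512 / 2**40, 2), "TiB")   # ~2.73 TiB
```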


UFS

d300: Mirror
Submirror 0: d100
State: Okay
Submirror 1: d200
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 5860524032 blocks (2.7 TB)

d100: Submirror of d300
State: Okay
Size: 5860524032 blocks (2.7 TB)
Stripe 0: (interlace: 32 blocks)
Device     Start Block  Dbase  State  Reloc  Hot Spare
c5t2d0s0   0            No     Okay   Yes
c5t6d0s0   0            No     Okay   Yes
c4t2d0s0   0            No     Okay   Yes
c4t6d0s0   0            No     Okay   Yes
c7t2d0s0   0            No     Okay   Yes
c7t6d0s0   0            No     Okay   Yes


d200: Submirror of d300
State: Okay
Size: 5860524032 blocks (2.7 TB)
Stripe 0: (interlace: 32 blocks)
Device     Start Block  Dbase  State  Reloc  Hot Spare
c6t2d0s0   0            No     Okay   Yes
c6t6d0s0   0            No     Okay   Yes
c1t2d0s0   0            No     Okay   Yes
c1t6d0s0   0            No     Okay   Yes
c0t2d0s0   0            No     Okay   Yes
c0t6d0s0   0            No     Okay   Yes

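metastat reports sizes in 512-byte disk blocks and rounds the human-readable figure to binary terabytes; a quick conversion (numbers taken from the d300 output above) confirms the "2.7 TB":

```python
BLOCK_SIZE = 512               # metastat sizes are 512-byte blocks
D300_BLOCKS = 5_860_524_032    # "Size: 5860524032 blocks (2.7 TB)"

tib = D300_BLOCKS * BLOCK_SIZE / 2**40
print(f"{tib:.1f} TB")         # prints "2.7 TB", matching metastat
```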

d30: Mirror
Submirror 0: d31
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 12289725 blocks (5.9 GB)

d31: Submirror of d30
State: Okay
Size: 12289725 blocks (5.9 GB)
Stripe 0:
Device     Start Block  Dbase  State  Reloc  Hot Spare
c5t0d0s5   0            No     Okay   Yes


d20: Mirror
Submirror 0: d21
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 4096575 blocks (2.0 GB)

d21: Submirror of d20
State: Okay
Size: 4096575 blocks (2.0 GB)
Stripe 0:
Device     Start Block  Dbase  State  Reloc  Hot Spare
c5t0d0s1   0            No     Okay   Yes


d10: Mirror
Submirror 0: d11
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 22539195 blocks (10 GB)

d11: Submirror of d10
State: Okay
Size: 22539195 blocks (10 GB)
Stripe 0:
Device     Start Block  Dbase  State  Reloc  Hot Spare
c5t0d0s0   0            No     Okay   Yes


Device Relocation Information:
Device  Reloc  Device ID
c6t2d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXV85H
c6t6d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXR0HH
c1t2d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHWP5AF
c1t6d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXSRKH
c0t2d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHWP74F
c0t6d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXRGUH
c5t2d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXUN7H
c5t6d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXR35H
c4t2d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXT7LH
c4t6d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXR0JH
c7t2d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXTYTH
c7t6d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHWP7KF
c5t0d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXTZBH


ZFS

pool: pool
state: ONLINE
scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
pool        ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c5t1d0  ONLINE       0     0     0
    c6t1d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c5t5d0  ONLINE       0     0     0
    c6t5d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c4t1d0  ONLINE       0     0     0
    c1t1d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c4t5d0  ONLINE       0     0     0
    c1t5d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c7t1d0  ONLINE       0     0     0
    c0t1d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c7t5d0  ONLINE       0     0     0
    c0t5d0  ONLINE       0     0     0

errors: No known data errors
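With six two-way mirrors, usable capacity is the sum of the six mirror vdevs, i.e. six single disks. Assuming the nominal 500 GB of the Hitachi HDS7250S drives listed in the metastat section (an assumption; labels and metadata are ignored), the numbers line up with what the zfs get output that follows reports:

```python
DISK_BYTES = 500e9   # nominal HDS7250S capacity (assumption)
MIRRORS = 6          # six two-way mirrors -> six disks of usable space

usable_tib = MIRRORS * DISK_BYTES / 2**40
print(f"~{usable_tib:.2f} TiB usable")   # ~2.73 TiB
# zfs get reports used 50.1G + available 2.63T (~2.68T) on pool/test;
# the rest goes to labels, metadata and per-vdev reserved space.
```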

bash-3.00# zfs get all pool/test
NAME       PROPERTY       VALUE                  SOURCE
pool/test  type           filesystem             -
pool/test  creation       Tue Jan 29 11:02 2008  -
pool/test  used           50.1G                  -
pool/test  available      2.63T                  -
pool/test  referenced     50.1G                  -
pool/test  compressratio  1.00x                  -
pool/test  mounted        yes                    -
pool/test  quota          none                   default
pool/test  reservation    none                   default
pool/test  recordsize     8K                     local
pool/test  mountpoint     /test/zfs              local
pool/test  sharenfs       off                    default
pool/test  checksum       on                     default
pool/test  compression    off                    default
pool/test  atime          on                     default
pool/test  devices        on                     default
pool/test  exec           on                     default
pool/test  setuid         on                     default
pool/test  readonly       off                    default
pool/test  zoned          off                    default
pool/test  snapdir        hidden                 default
pool/test  aclmode        groupmask              default
pool/test  aclinherit     secure                 default
pool/test  canmount       on                     default
pool/test  shareiscsi     off                    default
pool/test  xattr          on                     default

2008-02-11

Live Upgrade - problem with ludelete

A few weeks ago I was doing a Live Upgrade from Solaris 10 11/06 to 8/07. It went quite well until I tried to delete the old boot environment (BE):

bash-3.00# uname -a
SunOS mickey 5.10 Generic_127112-07 i86pc i386 i86pc
bash-3.00# cat /etc/release
Solaris 10 8/07 s10x_u4wos_12b X86
Copyright 2007 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 16 August 2007
bash-3.00# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
11-06                      yes      no     no        yes    -
8-07                       yes      yes    yes       no     -
bash-3.00# ludelete 11-06
The boot environment <11-06> contains the GRUB menu.
Attempting to relocate the GRUB menu.
/usr/sbin/ludelete: lulib_relocate_grub_slice: not found
ERROR: Cannot relocate the GRUB menu in boot environment <11-06>.
ERROR: Cannot delete boot environment <11-06>.
Unable to delete boot environment.

The only useful solution I found was at http://tech.groups.yahoo.com/group/solarisx86/message/44111 where Juergen Keil proposed using the lulib script from OpenSolaris. Because I didn't have any OpenSolaris DVD, I asked Juergen for a copy of lulib and he sent me one. After replacing the original:
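The replacement itself is a single-file swap, something like the following (the lulib path under /usr/lib/lu and the name of the copied file are assumptions; keep a backup of the original):

```shell
bash-3.00# cp -p /usr/lib/lu/lulib /usr/lib/lu/lulib.orig
bash-3.00# cp /tmp/lulib.opensolaris /usr/lib/lu/lulib
bash-3.00# chmod 555 /usr/lib/lu/lulib
```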

bash-3.00# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
11-06                      yes      no     no        yes    -
8-07                       yes      yes    yes       no     -
bash-3.00# ludelete 11-06
Determining the devices to be marked free.
Updating boot environment configuration database.
Updating boot environment description database on all BEs.
Updating all boot environment configuration databases.
Updating GRUB menu default setting
Boot environment <11-06> deleted.
bash-3.00# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
8-07                       yes      yes    yes       no     -


Thanks Juergen !

2008-02-08

Jarod Jenson is coming back ...

Jarod Jenson, the famous Texas Ranger, the first DTrace user outside of Sun and author of the Java DVM provider, has changed companies and now, after a long silence, is back with his blog!

2007-10-03

Sunsolve: A deeper look into vmstat statistics using DTrace and mdb

There is an excellent document, released two days ago at Sunsolve:
A deeper look into vmstat statistics using DTrace and mdb
The document describes how to drill down into vmstat statistics using DTrace and mdb. It contains so many fantastic DTrace and mdb commands that it is really worth reading (like any other DTrace documentation ;-) )
I especially like the mdb command to list swapped-out processes:

mdb -k << EOF
::walk thread myvar|::print kthread_t t_schedflag|::grep .==0|::eval p_user.u_comm
EOF
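For watching the same activity live, a DTrace one-liner against the vminfo provider can show which processes trigger swap-ins (probe name taken from the vminfo provider documentation, not from the Sunsolve paper itself):

```shell
# count swap-ins per process until Ctrl-C
dtrace -n 'vminfo:::swapin { @[execname, pid] = count(); }'
```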

Excellent document !

One should also note that a similar document, "Using DTrace to understand mpstat and vmstat output", has been available for a long time.

PS. What a pity that this page is available only to customers with a Sunsolve account. It should be available to everyone, especially Linux advocates ;-)