2009-07-14

Filesystem Cache Optimization Strategies

There is an excellent series of blog entries by Brad Diggs about the Solaris filesystem caches used by UFS and ZFS. It is divided into the following parts:

Introduction
Why Use Filesystem Cache
Solaris Filesystem Caches: segmap cache
Solaris Filesystem Caches: Vnode Page Mapping cache
Solaris Filesystem Caches: ZFS Adaptive Replacement Cache (ARC)
Solaris Filesystem Caches: Memory Contention
How Filesystem Cache Can Improve Performance
Establish A Safe Ceiling
UFS Default Caching
ZFS Default Caching
Optimized ZFS Filesystem Caching
Tuning ZFS Cache
Unlock The Governors
Avoid Diluting The Filesystem Cache
Match Data Access Patterns
Consider Disabling vdev Caching
Minimize Application Data Pagesize
Match Average I/O Block Sizes
Consider The Pros and Cons of Cache Flushes
Prime The Filesystem Cache

Really worth reading!

How to change the compression algorithm used in ZFS

If you want to change the way ZFS compresses your data, you can follow this example:

# zfs create pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  on        inherited from pool
# zfs set compression=gzip pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  gzip      local
# zfs set compression=lzjb pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  lzjb      local
# zfs set compression=gzip-9 pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  gzip-9    local
# zfs set compression=on pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  on        local
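The speed-versus-ratio tradeoff behind these settings (lzjb is fast but modest, gzip-9 is tight but CPU-hungry) can be felt with plain gzip on any box. A quick sketch - the numbers are only illustrative, and the gzip levels merely stand in for the ZFS algorithms:

```shell
# Generate ~2 MB of repetitive text, then compress it at a fast,
# a default and a maximum gzip level and compare the sizes.
yes "some fairly repetitive log line" | head -c 2097152 > /tmp/sample.txt

for level in 1 6 9; do
    gzip -c -$level /tmp/sample.txt > /tmp/sample.$level.gz
    echo "level $level: $(wc -c < /tmp/sample.$level.gz) bytes"
done
```

The higher levels shave off some bytes but burn noticeably more CPU time, which is exactly the decision you make when choosing between lzjb and gzip-9 for a dataset.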

2009-07-08

VxFS (Veritas File System) - how to check and resize the intent log size

How to query the current size of the intent log:

# fsadm -F vxfs -L /vxfs/my-vol
UX:vxfs fsadm: INFO: V-3-25669: logsize=16384 blocks, logvol=""


How to resize it:

# fsadm -F vxfs -o logsize=32768 /vxfs/my-vol
# fsadm -F vxfs -L /vxfs/my-vol
UX:vxfs fsadm: INFO: V-3-25669: logsize=32768 blocks, logvol=""


Basically, with a bigger intent log, recovery time is proportionally longer and the file system may consume more system resources (such as memory) during normal operation. On the other hand, VxFS generally performs better with larger log sizes.

2009-07-03

Linux - how to turn on the framebuffer during the boot process

Just add 'vga=some_number' to the kernel line in the menu.lst file (if you use GRUB - does anybody still use LILO? ;-) ). Example values for some_number:

1280x1024x64k (vga=794)
1280x1024x256 (vga=775)
1024x768x64k (vga=791)
1024x768x32k (vga=790)
1024x768x256 (vga=773)
800x600x64k (vga=788)
800x600x32k (vga=787)
800x600x256 (vga=771)
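For example, a complete menu.lst entry enabling 1024x768 with 64k colours could look like this (the kernel version and root device are placeholders - adjust them to your system):

```
title Debian GNU/Linux
root (hd0,0)
kernel /boot/vmlinuz-2.6.26-2-686 root=/dev/sda1 ro vga=791
initrd /boot/initrd.img-2.6.26-2-686
```

No GRUB reinstall is needed after editing menu.lst; the setting takes effect on the next boot.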

Debian - how to "clone" packages to another server

This is perhaps my first blog entry related to Linux :-). Anyway, I have just come across the problem of moving an installed Debian (Lenny) system to another server. While rsync is a perfect solution for copying all the personal files, I would like to avoid using it to copy all the Debian binaries. There is another, more reliable way. On the source system just run:

host1:~# dpkg --get-selections > /tmp/dpkg.txt
host1:~# head /tmp/dpkg.txt
a2ps install
acpi install
acpi-support install
acpi-support-base install
acpid install
adduser install
adobe-flashplugin install
adobereader-enu install
akregator install
alien install

Transfer this file (dpkg.txt) to the target system (with a minimal installation already in place) and run:

host2:~# dpkg --set-selections < /tmp/dpkg.txt
host2:~# apt-get -u dselect-upgrade

And voila! - apt-get will download and install all the requested packages.

2009-07-01

Presentation at a storage conference

I got the opportunity to present at the Storage GigaCon conference in Warsaw about a month ago. My presentation covered a bit of an "introduction" to storage (since mine was the first talk) and later "switched" to storage performance benchmarking. Of course it would be difficult not to mention Vdbench and SWAT :-)

I wrote it in a mix of Polish and English (I know this is not a recommended way of writing presentations ...) so you can read it at least partially.

Optimizing PostgreSQL application performance with Solaris dynamic tracing

There is an excellent BluePrints article about DTrace probes in PostgreSQL. You can download it over here. Using DTrace you can get amazing information about the internals of PostgreSQL. For instance, with this DTrace script (example taken from the article):

# cat query_load.d
#!/usr/sbin/dtrace -qs

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

postgresql*:::query-start
{
        self->query = copyinstr(arg0);
        self->pid = pid;
}

postgresql*:::query-done
{
        @queries[pid, self->query] = count();
}

dtrace:::END
{
        printf("%5s %s %s\n", "PID", "COUNT", "QUERY");
        printa("%6d %@5d %s\n", @queries);
}

we get:

PID COUNT QUERY
1221 154 UPDATE tellers SET tbalance = tbalance + -487 WHERE tid = 25;
1221 204 UPDATE tellers SET tbalance = tbalance + 1051 WHERE tid = 42;
1220 215 UPDATE accounts SET abalance = abalance + -4302 WHERE aid = 144958;
1220 227 UPDATE accounts SET abalance = abalance + 2641 WHERE aid = 441283;


Isn't it amazing?

I don't know if Larry Ellison is aware of DTrace, but I wish I had the same DTrace probes in Oracle ...

2009-05-19

Veritas Volume Manager - display DMP IO statistics

If you are a Veritas Volume Manager user, you can use DMP to configure path failover and load balancing across many links (data paths) to storage. vxdmpadm is the command for all these tasks. Just recently I had to check whether data really flows along the configured links. You can achieve this with the same vxdmpadm.

To enable the gathering of statistics use this command:

vxdmpadm iostat start

To reset the IO counters:

vxdmpadm iostat reset

To display the current statistics:

vxdmpadm -z iostat show all interval=60
cpu usage = 3037899us    per cpu memory = 32768b
                             OPERATIONS         BLOCKS        AVG TIME(ms)
PATHNAME                  READS  WRITES    READS   WRITES    READS  WRITES
c2t50060E80102C7770d16s2     99   43566     1584  1419358     0.55    0.03
c4t50060E80102C7772d16s2    101   37584     1616  1366880     0.74    0.17
c2t50060E80102C7770d21s2   2192    1623    87424    29072     0.11    0.04
c4t50060E80102C7772d21s2   1041    1059    46016    19840     0.16    0.24
...

The first output shows cumulative statistics since server boot. Subsequent listings show current data:

c2t50060E80102C7770d21s2 1 2 16 32 0.39 0.03
c4t50060E80102C7772d21s2 5 0 80 0 0.72 0.00
c2t50060E80102C7770d57s2 2 4 32 112 0.50 0.19
c4t50060E80102C7772d57s2 4 2 64 80 0.03 0.02
...


To disable the gathering of statistics use this command:

vxdmpadm iostat stop
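vxdmpadm prints per-path counters, so checking whether DMP really balances the load between two HBAs is a matter of summing them. A small sketch (my own post-processing, not part of VxVM; it assumes the column layout shown above, with read/write operation counts in fields 2 and 3):

```shell
# Sample lines in the 'vxdmpadm iostat show' format from above.
stats='
c2t50060E80102C7770d16s2 99 43566 1584 1419358 0.55 0.03
c4t50060E80102C7772d16s2 101 37584 1616 1366880 0.74 0.17
c2t50060E80102C7770d21s2 2192 1623 87424 29072 0.11 0.04
c4t50060E80102C7772d21s2 1041 1059 46016 19840 0.16 0.24
'

# Sum read/write operations per controller (the c2/c4 prefix).
echo "$stats" | awk '
/^c/ {
    ctrl = substr($1, 1, 2)
    reads[ctrl]  += $2
    writes[ctrl] += $3
}
END {
    for (c in reads)
        printf "%s: reads=%d writes=%d\n", c, reads[c], writes[c]
}'
```

In real life you would pipe the live `vxdmpadm iostat show all` output through the same awk instead of a here-string.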

2009-02-16

DTrace CPU Performance Counter provider

Jonathan Haslam described the DTrace CPU Performance Counter provider, which "... gives you the ability to profile your system by many different types of processor related events; the list of events is processor specific and usually quite large but typically includes events such as cycles executed, instructions executed, cache misses, TLB misses and many more ...". All the information is here, here and here.

2009-02-05

Optimizing MySQL Database Application Performance with Solaris Dynamic Tracing (DTrace)

There is a fantastic white paper on how taking advantage of Solaris Dynamic Tracing (DTrace) probes can help simplify MySQL database application tuning. The contents of the white paper:

  • Introduction
  • Approaching MySQL Database Application Tuning
  • The Advantages of Solaris Dynamic Tracing
  • Simplifying and Speeding Performance Tuning Efforts
  • Analyzing Query Loads
  • Probing the Cost of File Sort Operations
  • Profiling the Use of Stored Procedures
  • Observing Slave Queries
  • Optimizing Use of the MySQL Database Query Cache
  • Putting it all Together
  • For More Information
  • About the Author
  • Related Resources
  • Ordering Sun Documents
  • Accessing Sun Documentation Online

It is downloadable over here.

2009-01-27

Parallel bzip2 in a CMT (multicore) environment

We have a lot of archive logs from our Oracle databases. To save storage space we compress them. Of course this (single-threaded) process takes quite a long time on CMT-based CPUs. To use the power of these CPUs we should parallelize all tasks where possible. Pbzip2 befriends us here - it is a parallel implementation of bzip2. An example:

rudy:/tmp/l# ls -alh
total 4194336
drwxr-xr-x 2 root root 233 Jan 27 16:35 .
drwxrwxrwt 3 root sys 286 Jan 27 16:30 ..
-rw-r----- 1 root root 1.0G Jan 27 16:35 1
-rw-r----- 1 root root 1.0G Jan 27 15:41 2
rudy:/tmp/l# time bzip2 1

real 13m4.397s
user 12m59.987s
sys 0m4.310s
rudy:/tmp/l# time /opt/csw/bin/pbzip2 2

real 0m34.842s
user 34m46.095s
sys 0m7.972s
rudy:/tmp/l# time /opt/csw/bin/pbzip2 -l 2

real 0m41.907s
user 28m52.137s
sys 0m6.302s


We can see that the difference is tremendous!!!

The '-l' parameter limits the maximum number of processors to use (calculated from the load average). This server is quite idle, so a heavier workload might lengthen the run time.
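Pbzip2's trick is simple: split the input into blocks, compress each block on a separate CPU, and concatenate the resulting bzip2 streams - a concatenation which standard bunzip2 happily decompresses. The same idea can be sketched with stock tools (an illustration only, not a replacement for pbzip2):

```shell
# Work in a scratch directory with ~1 MB of compressible input.
cd "$(mktemp -d)"
yes "archive log data" | head -c 1048576 > big.dat

# Split into 256 KB chunks, compress each chunk in the background,
# then glue the per-chunk bzip2 streams back together.
split -b 262144 big.dat chunk.
for c in chunk.*; do
    bzip2 "$c" &
done
wait
cat chunk.*.bz2 > big.dat.bz2

# Plain bunzip2 decompresses the concatenated streams just fine.
bunzip2 -c big.dat.bz2 > check.dat
cmp big.dat check.dat && echo OK
```

The per-chunk compression ratio is slightly worse than one big stream (bzip2 blocks cannot span chunks), which is the same tradeoff pbzip2 makes.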

2008-07-31

swat & vdbench - an excellent couple

There has been an announcement of SWAT - another tool helpful in performing benchmarking tests. Together with Vdbench we get an almost perfect couple for playing with storage benchmarking.

2008-06-08

vdbench - disk I/O workload generator

There are several tools for testing filesystem performance, such as Iozone, Bonnie and FileBench. There is also one that is not widely known, called vdbench. It is a disk I/O workload generator written by Sun employee Henk Vandenbergh and used internally at Sun and by its customers.

How does it differ from the other tools? We'll see ... :-)

Vdbench, besides the classical CLI, also has a GUI which simplifies its usage; however, I am going to show how to use it via the CLI. After downloading, one needs to install it. The installer has an optional -t parameter for specifying the target directory:

kruk:/tmp/vdbench# ./install_unix.sh -t /opt/vdbench
Sun Microsystems, Inc. ("Sun") ENTITLEMENT for SOFTWARE

Licensee/Company: Entity receiving Software.

Efective Date: Date Sun delivers the Software to You.

Software: Sun StorageTek vdbench 4.07

[...]

Please contact Sun Microsystems, Inc. 4150 Network Circle, Santa
Clara, California 95054 if you have questions.




Accept User License Agreement (yes/no): yes
08:51:47.887 plog(): execute(): tar -tf /tmp/vdbench/vdbench407.tar

********************************************************************************
[...]

Tool will expire on: sobota, kwiecień 25 2009, 23:27:38


********************************************************************************



Tool installation to /opt/vdbench successful

Now it's time to write a parameter file describing the needed workload. There are some examples within the vdbench directory (example*), but let's begin with the following simple one:

sd=sd1,lun=/vdb/file1
sd=sd2,lun=/vdb/file2
sd=sd3,lun=/vdb/file3
sd=sd4,lun=/vdb/file4

wd=rg-1,sd=sd*,rdpct=70,rhpct=0,whpct=0,xfersize=8k,seekpct=70

rd=rd_rg-1,wd=rg-1,interval=1,iorate=max,elapsed=30,forthreads=(64)

How to interpret that file?
sd - Storage Definition (where the I/O goes to/from - storage, disks, files)
wd - Workload Definition (the precise definition of the workload) - some explanations:
  • rdpct - read percentage; 70 means that 70% of the I/Os are reads and the remaining 30% are writes
  • xfersize - size of each I/O
  • seekpct - percentage of random seeks
rd - Run Definition (in general: which Workload Definition to run, and for how long)
Of course we need to create the needed files:

kruk:/root# cd /opt/vdbench/
kruk:/opt/vdbench# mkdir /vdb
kruk:/opt/vdbench# mkfile 100m /vdb/file1
kruk:/opt/vdbench# mkfile 100m /vdb/file2
kruk:/opt/vdbench# mkfile 100m /vdb/file3
kruk:/opt/vdbench# mkfile 100m /vdb/file4

What I like about this tool is its ability to show IOPS for each second of the test, which gives an excellent view of the tested environment. Let's see an example:

kruk:/opt/vdbench# ./vdbench -f my-parm.cfg
[...]
interval i/o MB/sec bytes read resp resp resp cpu% cpu%
rate 1024**2 i/o pct time max stddev sys+usr sys
14:50:41.208 1 1770,85 13,83 8192 79,48 104,913 774,921 170,206 20,8 10,5
14:50:42.097 2 1600,90 12,51 8192 67,65 159,721 961,760 225,067 20,0 11,8
14:50:43.103 3 1302,75 10,18 8192 68,05 184,123 1439,223 262,792 13,7 8,0
14:50:44.085 4 1112,86 8,69 8192 68,20 219,451 1954,038 315,391 12,6 7,1
14:50:45.080 5 1210,84 9,46 8192 68,83 220,502 1511,942 322,902 12,0 7,8
14:50:46.078 6 1192,25 9,31 8192 70,59 213,559 1474,794 318,486 11,7 7,2
14:50:47.081 7 899,99 7,03 8192 70,48 253,603 1654,079 378,854 10,3 6,0
14:50:48.058 8 1251,91 9,78 8192 71,13 219,671 1831,191 340,373 11,8 7,5
14:50:49.049 9 1004,77 7,85 8192 68,77 251,295 1668,598 364,461 10,0 6,7
14:50:50.049 10 1124,07 8,78 8192 68,28 229,389 1804,713 329,042 10,5 6,5
14:50:51.047 11 1099,94 8,59 8192 68,47 236,588 1699,419 344,097 10,0 6,7
14:50:52.040 12 629,53 4,92 8192 69,45 265,482 1742,241 374,407 7,7 4,7
14:50:53.043 13 1042,71 8,15 8192 69,05 344,049 2308,431 532,435 8,8 6,0
14:50:54.042 14 1452,97 11,35 8192 68,11 174,344 2086,119 251,800 11,8 8,8
14:50:55.075 15 1175,48 9,18 8192 69,67 212,452 1504,912 312,161 9,8 6,8
14:50:56.046 16 1048,33 8,19 8192 68,92 227,462 1595,952 325,352 10,8 7,0
14:50:57.047 17 881,26 6,88 8192 68,19 264,582 1160,291 365,083 9,2 5,7
14:50:58.068 18 1023,81 8,00 8192 71,98 282,999 1757,541 435,796 12,3 6,5
14:50:59.056 19 1076,79 8,41 8192 71,47 218,663 1339,287 339,072 11,8 7,8
14:51:00.044 20 1177,03 9,20 8192 68,99 231,724 1469,797 339,442 11,2 7,2
14:51:01.037 21 1136,34 8,88 8192 72,33 221,309 1807,646 343,556 10,3 6,8
14:51:02.043 22 655,37 5,12 8192 69,91 279,581 1286,680 402,433 6,5 4,5
14:51:03.040 23 1022,76 7,99 8192 70,30 336,784 1886,443 511,220 9,8 6,3
14:51:04.047 24 1322,93 10,34 8192 70,74 198,405 1732,970 295,461 11,8 8,0
14:51:05.049 25 1028,04 8,03 8192 68,67 228,957 1436,083 321,115 9,7 6,7
14:51:06.039 26 1075,99 8,41 8192 69,78 241,907 1642,574 349,677 9,5 6,3
14:51:07.046 27 908,38 7,10 8192 70,23 248,514 1661,609 363,362 9,0 6,0
14:51:08.044 28 827,90 6,47 8192 70,35 331,738 1898,630 486,697 7,8 5,3
14:51:09.046 29 1208,92 9,44 8192 68,49 225,420 1581,279 321,723 10,9 7,2
14:51:10.042 30 1025,04 8,01 8192 70,23 239,669 1502,131 347,456 9,3 6,5
14:51:10.055 avg_2-30 1086,79 8,49 8192 69,51 234,502 2308,431 353,760 10,7 6,9

iostat during the run:

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
395.0 178.0 3160.0 1424.0 0.0 2.6 0.0 4.6 0 92 c1
395.0 178.0 3160.0 1424.0 0.0 2.6 0.0 4.6 1 92 c1t50060E8000444540d4
394.0 173.0 3152.1 1384.1 0.0 3.3 0.0 5.7 0 99 c3
394.0 173.0 3152.2 1384.1 0.0 3.3 0.0 5.7 1 99 c3t50060E8000444542d4
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
484.1 188.0 3872.8 1504.3 0.0 3.0 0.0 4.5 0 98 c1
484.1 188.0 3872.8 1504.3 0.0 3.0 0.0 4.5 1 98 c1t50060E8000444540d4
418.1 177.0 3344.7 1416.3 0.0 3.0 0.0 5.0 0 97 c3
418.1 177.0 3344.6 1416.3 0.0 3.0 0.0 5.0 1 97 c3t50060E8000444542d4

In OLTP environments the i/o rate column is one of the most important, while in the DSS world MB/sec probably matters more. "Ok", you say, "you have a stable io rate. Nothing really interesting." Are you sure? Can you show me another non-commercial tool which shows the same: the io rate in each second? How about this example:

interval i/o MB/sec bytes read resp resp resp
rate 1024**2 i/o pct time max stddev
15:04:15.746 1 27558,72 215,30 8192 69,72 1,089 338,684 10,493
15:04:16.465 2 42955,58 335,59 8192 69,66 ,219 903,236 5,504
15:04:17.170 3 45265,34 353,64 8192 69,63 ,132 384,352 3,121
15:04:17.692 4 41596,08 324,97 8192 69,99 ,119 148,012 2,091
15:04:18.322 5 38974,35 304,49 8192 69,98 ,210 570,820 5,815
15:04:19.385 6 22774,35 177,92 8192 69,32 1,830 1040,743 22,121
15:04:20.565 7 28552,81 223,07 8192 69,48 1,651 1539,927 28,605
15:04:21.079 8 38379,22 299,84 8192 69,30 ,181 340,999 3,756
15:04:22.159 9 40825,13 318,95 8192 70,07 ,134 319,363 2,655
15:04:23.187 10 37296,62 291,38 8192 69,91 ,161 186,869 2,589
15:04:24.334 11 23630,10 184,61 8192 70,29 ,894 352,045 6,997
15:04:25.087 12 24689,14 192,88 8192 70,14 ,192 604,120 7,659
15:04:26.334 13 33136,76 258,88 8192 70,22 ,094 66,819 ,870
15:04:27.046 14 40521,35 316,57 8192 69,89 ,128 348,961 2,395
15:04:29.072 15 35763,25 279,40 8192 69,60 ,394 264,676 4,571
15:04:29.290 16 28473,31 222,45 8192 69,65 1,433 730,685 14,482
15:04:30.206 17 31505,71 246,14 8192 69,62 ,979 914,484 15,763
15:04:31.259 18 40196,17 314,03 8192 69,67 ,125 234,910 1,692
15:04:32.116 19 35458,59 277,02 8192 69,77 ,116 138,638 1,540
15:04:33.249 20 43064,86 336,44 8192 69,97 ,128 390,286 2,813
15:04:34.234 21 24776,40 193,57 8192 70,12 1,947 231,994 12,754
15:04:35.554 22 35134,31 274,49 8192 69,80 ,451 703,214 6,932
15:04:36.261 23 33361,52 260,64 8192 69,50 ,444 898,570 11,638
15:04:37.557 24 38474,73 300,58 8192 70,05 ,225 322,527 4,602
15:04:38.234 25 41275,28 322,46 8192 69,74 ,170 206,164 2,687
15:04:39.097 26 22927,25 179,12 8192 70,44 1,345 665,988 11,805
15:04:40.258 27 32228,28 251,78 8192 70,14 ,540 752,539 9,812
15:04:41.296 28 39111,23 305,56 8192 70,15 ,196 271,594 3,369
15:04:42.200 29 41695,78 325,75 8192 69,74 ,210 282,547 3,677
15:04:43.110 30 46167,28 360,68 8192 69,80 ,156 356,567 2,871
15:04:43.131 avg_2-30 35532,33 277,60 8192 69,84 ,420 1539,927 8,438

And iostat:

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 827.7 0.0 105947.0 0.0 34.9 0.0 42.2 0 100 c1
0.0 827.7 0.0 105944.1 0.0 34.9 0.0 42.2 1 100 c1t50060E8000444540d3
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 439.1 0.0 49615.6 0.0 12.1 0.0 27.5 0 45 c1
0.0 439.1 0.0 49615.6 0.0 12.1 0.0 27.5 1 45 c1t50060E8000444540d3
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 919.3 0.0 116434.0 0.0 26.9 0.0 29.3 0 85 c1
0.0 919.3 0.0 116434.2 0.0 26.9 0.0 29.3 1 85 c1t50060E8000444540d3

Quite interesting, isn't it? "What did you change to get such a high io rate?", you might ask. Hmmm, I am not sure if I want to reveal ... ;-)

Seriously: the first test was run on the Veritas File System while the second one was on ZFS. Does it mean that ZFS is faster than VxFS? I don't think so. But using vdbench you can observe the different natures of the two filesystems. If you look closer at the ZFS run you will see that there are no reads from the filesystem, even though the vdbench parameter "rdpct=70" forces 70% reads and 30% writes. Why? Because of the intensive kernel memory usage in ZFS - the reads are satisfied from its cache.

Another example: have a look at the first vdbench output. It is interesting that the io rate goes up and down. I tracked this down earlier and found that the hardware array the filesystems were placed on exhibits some bizarre behaviour with RAID-5.

Anyway, you can see that without any sophisticated (and expensive) tool you are able to bring to light new, performance-related information about your storage and filesystems.

Vdbench was released to the public a few weeks ago, so you can download it over here.

2008-02-17

Solaris IO tuning: monitoring disk queue - lack of knowledge

A few months ago we faced some performance problems on one of our servers. There is One Very Well Known Big Database (OVWKBD - complicated, isn't it? ;-) ) running on it. One end user reported to us that there were some hangs during office hours. We (me and one of our DBAs, who is responsible for the OVWKBD) were surprised, since it had never happened before (or, as I assume, nobody had told us before), and began an investigation.

After a few days our DBA pointed out that it might be related to redo log writing (the redo logs were, like the rest of the database, on a SAN-attached disk array). In fact he was sure that this was the problem, but when I asked for any proof he didn't deliver any. He insisted on changing the array configuration, but since I don't like to proceed blindly, I wanted to monitor the io (disk) queue(s) before any significant change. I had set /etc/system according to the docs, and the ssd_max_throttle variable is the key setting there. But I still wanted to better understand what is going on at the disk queue level and to set ssd_max_throttle according to _real_ needs.

When I started hunting high and low I realized that the disk queue (which is, after all, one of the key areas of IO tuning) is poorly documented, so I began desperately seeking any knowledge. Neither Sunsolve nor docs.sun.com helped me. One page, http://wikis.sun.com/display/StorageDev/The+Solaris+OS+Queue+Throttle, gave me a tiny piece of information but still ...
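For reference, the throttle is set in /etc/system; an entry like the one below caps the number of outstanding commands queued per LUN by the ssd driver (the value 20 is only a commonly quoted example from array vendors' docs, not a recommendation, and a reboot is required for it to take effect):

```
* /etc/system: limit outstanding commands per LUN (ssd driver)
set ssd:ssd_max_throttle=20
```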

Oh, God! I can't believe that there are no docs about it!

I tried the OpenSolaris mailing lists but still got no really useful information. In one e-mail Robert Miłkowski suggested using the scsi.d DTrace script. I had used it before but it hadn't helped me. Maybe I wasn't able to read its output properly? Well, at least I could try it again:

00000.868602160 fp2:-> 0x2a WRITE(10) address 2287:76, lba 0x00897a89, len 0x000002, control 0x00 timeout 60 CDBP 6002064db9c RDBMS(25751) cdb(10) 2a0000897a8900000200
00000.890551520 fp2:-> 0x28 READ(10) address 2287:76, lba 0x00a877f5, len 0x000001, control 0x00 timeout 60 CDBP 3000e37cde4 RDBMS(23974) cdb(10) 280000a877f500000100
00000.584069360 fp2:-> 0x2a WRITE(10) address 2287:66, lba 0x019c01f8, len 0x00013d, control 0x00 timeout 60 CDBP 60026a1e4d4 RDBMS(18244) cdb(10) 2a00019c01f800013d00
00000.239667600 fp2:-> 0x2a WRITE(10) address 2287:18, lba 0x00d93c20, len 0x000010, control 0x00 timeout 60 CDBP 300270fadec RDBMS(25753) cdb(10) 2a0000d93c2000001000
00000.958698480 fp2:-> 0x2a WRITE(10) address 2287:08, lba 0x00001f10, len 0x000010, control 0x00 timeout 60 CDBP 30025654084 sched(0) cdb(10) 2a0000001f1000001000
00000.240042160 fp2:<- 0x2a WRITE(10) address 2287:16, lba 0x01d4889c, len 0x000010, control 0x00 timeout 60 CDBP 60019209b84, reason 0x0 (COMPLETED) state 0x1f Time 820us
00000.240213360 fp2:<- 0x2a WRITE(10) address 2287:15, lba 0x0274b920, len 0x000010, control 0x00 timeout 60 CDBP 6002064c4f4, reason 0x0 (COMPLETED) state 0x1f Time 912us
00000.240311200 fp2:<- 0x2a WRITE(10) address 2287:18, lba 0x00d93c20, len 0x000010, control 0x00 timeout 60 CDBP 300270fadec, reason 0x0 (COMPLETED) state 0x1f Time 730us
00000.585352960 fp2:<- 0x2a WRITE(10) address 2287:67, lba 0x004a5ef8, len 0x00013d, control 0x00 timeout 60 CDBP 3003bb1add4, reason 0x0 (COMPLETED) state 0x1f Time 1390us
00000.586121680 fp2:<- 0x2a WRITE(10) address 2287:66, lba 0x019c01f8, len 0x00013d, control 0x00 timeout 60 CDBP 60026a1e4d4, reason 0x0 (COMPLETED) state 0x1f Time 2136us
00000.868869200 fp2:<- 0x2a WRITE(10) address 2287:17, lba 0x005ca80b, len 0x000002, control 0x00 timeout 60 CDBP 30053138df4, reason 0x0 (COMPLETED) state 0x1f Time 404us
00000.869025920 fp2:<- 0x2a WRITE(10) address 2287:76, lba 0x00897a89, len 0x000002, control 0x00 timeout 60 CDBP 6002064db9c, reason 0x0 (COMPLETED) state 0x1f Time 501us
00000.889036480 fp2:-> 0x28 READ(10) address 2287:76, lba 0x00a879d9, len 0x000001, control 0x00 timeout 60 CDBP 6002064db9c RDBMS(23974) cdb(10) 280000a879d900000100
00000.889377200 fp2:<- 0x28 READ(10) address 2287:76, lba 0x00a879d9, len 0x000001, control 0x00 timeout 60 CDBP 6002064db9c, reason 0x0 (COMPLETED) state 0x1f Time 409us
00000.890777520 fp2:<- 0x28 READ(10) address 2287:76, lba 0x00a877f5, len 0x000001, control 0x00 timeout 60 CDBP 3000e37cde4, reason 0x0 (COMPLETED) state 0x1f Time 267us
00000.959244800 fp2:<- 0x2a WRITE(10) address 2287:08, lba 0x00001f10, len 0x000010, control 0x00 timeout 60 CDBP 30025654084, reason 0x0 (COMPLETED) state 0x1f Time 642us
00000.239373680 fp2:-> 0x2a WRITE(10) address 2287:16, lba 0x01d4889c, len 0x000010, control 0x00 timeout 60 CDBP 60019209b84 RDBMS(25753) cdb(10) 2a0001d4889c00001000
00000.868509120 fp2:-> 0x2a WRITE(10) address 2287:17, lba 0x005ca80b, len 0x000002, control 0x00 timeout 60 CDBP 30053138df4 RDBMS(25751) cdb(10) 2a00005ca80b00000200
00000.239401200 fp2:-> 0x2a WRITE(10) address 2287:15, lba 0x0274b920, len 0x000010, control 0x00 timeout 60 CDBP 6002064c4f4 RDBMS(25753) cdb(10) 2a000274b92000001000
00000.584010640 fp2:-> 0x2a WRITE(10) address 2287:67, lba 0x004a5ef8, len 0x00013d, control 0x00 timeout 60 CDBP 3003bb1add4 RDBMS(18244) cdb(10) 2a00004a5ef800013d00

Still no joy :-(. I became pessimistic about finding any answer to my questions.
The next day I couldn't stop thinking about it and got down in the dumps ...

Suddenly - wait a minute! Have I checked who wrote the scsi.d script?! No! Let's quickly find out! Maybe this is how I can find an answer! The beginning of the script says:

...
/*
* Chris.Gerhard@sun.com
* Joel.Buckley@sun.com
*/

#pragma ident "@(#)scsi.d 1.12 07/03/16 SMI"
...


Hope got back. ;-)

I know that hope often blinks at a fool, but you understand me, right? Yes, you do! Thanks! ;-)

Let's see if these guys could help me. I sent them an e-mail without (well, almost ...) any belief that it would work ... and a few hours later Chris answered! I still couldn't believe it while reading his e-mail! To make it not so easy, Chris offered a "deal": if I described my problem to him, he would answer it via his blog. And that is how
http://blogs.sun.com/chrisg/entry/latency_bubble_in_your_io
was born ...
More such deals! ;-)

2008-02-16

ZFS vs VxFS vs UFS on x4500 (Thumper - JBOD)

A few months ago I compared the performance of the above filesystems using filebench. Since then a few things have changed:
  • Solaris 10 8/07 is now available (compared to Solaris 10 11/06 used during the previous test). Thanks to the fabulous pca tool all (really all!) patches were installed.
  • a new filebench 1.1 has been released
  • there is Veritas Storage Foundation Basic 5.0 for x64 (the last available version used to be 4.1)
A few words about the last item: VSF Basic is a free version of the commercial VSF but, according to the Symantec site, it is limited to 4 user-data volumes, and/or 4 user-data file systems, and/or 2 processor sockets in a single physical system. So the x4500 (aka Thumper) is within the limitations.
I decided to test RAID 1+0 under an OLTP (8k, non-cached) workload. Since the x4500 has 48 SATA disks, I divided them into 3 sets, one for each filesystem: VxFS/VxVM, ZFS and UFS. The Hard Drive Monitor Utility (HD Tool) can draw an ASCII map of the internal drive layout:

---------------------SunFireX4500------Rear----------------------------

36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47:
c5t3 c5t7 c4t3 c4t7 c7t3 c7t7 c6t3 c6t7 c1t3 c1t7 c0t3 c0t7 <-VxFS
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35:
c5t2 c5t6 c4t2 c4t6 c7t2 c7t6 c6t2 c6t6 c1t2 c1t6 c0t2 c0t6 <- UFS
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23:
c5t1 c5t5 c4t1 c4t5 c7t1 c7t5 c6t1 c6t5 c1t1 c1t5 c0t1 c0t5 <- ZFS
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11:
c5t0 c5t4 c4t0 c4t4 c7t0 c7t4 c6t0 c6t4 c1t0 c1t4 c0t0 c0t4
^b+ ^b+ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
-------*-----------*-SunFireX4500--*---Front-----*-----------*----------

Each filesystem was mounted in its own directory. The following filebench configuration was used for testing:

DEFAULTS {
        runtime = 60;
        dir = "directory where each filesystem was mounted";
        stats = /tmp;
        filesystem = "zfs|ufs|vxfs";
        description = "oltp zfs|ufs|vxfs";
}

CONFIG oltp_8k_uncached {
        personality = oltp;
        function = generic;
        cached = 0;
        directio = 1;
        iosize = 8k;
        nshadows = 200;
        ndbwriters = 10;
        usermode = 20000;
        filesize = 5g;
        nfiles = 10;
        memperthread = 1m;
        workingset = 0;
}

Below are the results:



A few observations:
  • compared to the previous benchmark we can see big improvements in the ZFS area (though of course the change of environment - a JBOD instead of a SCSI array - can influence the results)
  • VxFS is still the winner, but its typical RAID 1+0 configuration is not faster than ZFS; only the 6-column configuration beats ZFS

All the filesystem configurations are below:
VxVM/VxFS 2-cols

Disk group: vxgroup

DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE

dg vxgroup default default 105000 1201598836.80.mickey

dm c0t3d0 c0t3d0 auto 65532 976691152 -
dm c0t7d0 c0t7d0 auto 65532 976691152 -
dm c1t3d0 c1t3d0 auto 65532 976691152 -
dm c1t7d0 c1t7d0 auto 65532 976691152 -
dm c4t3d0 c4t3d0 auto 65532 976691152 -
dm c4t7d0 c4t7d0 auto 65532 976691152 -
dm c5t3d0 c5t3d0 auto 65532 976691152 -
dm c5t7d0 c5t7d0 auto 65532 976691152 -
dm c6t3d0 c6t3d0 auto 65532 976691152 -
dm c6t7d0 c6t7d0 auto 65532 976691152 -
dm c7t3d0 c7t3d0 auto 65532 976691152 -
dm c7t7d0 c7t7d0 auto 65532 976691152 -

v vx-vol - ENABLED ACTIVE 5860145152 SELECT vx-vol-03 fsgen
pl vx-vol-03 vx-vol ENABLED ACTIVE 5860145152 STRIPE 2/32 RW
sv vx-vol-S01 vx-vol-03 vx-vol-L01 1 976691152 0/0 2/2 ENA
sv vx-vol-S02 vx-vol-03 vx-vol-L02 1 976691152 0/976691152 2/2 ENA
sv vx-vol-S03 vx-vol-03 vx-vol-L03 1 976690272 0/1953382304 2/2 ENA
sv vx-vol-S04 vx-vol-03 vx-vol-L04 1 976691152 1/0 2/2 ENA
sv vx-vol-S05 vx-vol-03 vx-vol-L05 1 976691152 1/976691152 2/2 ENA
sv vx-vol-S06 vx-vol-03 vx-vol-L06 1 976690272 1/1953382304 2/2 ENA

v vx-vol-L01 - ENABLED ACTIVE 976691152 SELECT - fsgen
pl vx-vol-P01 vx-vol-L01 ENABLED ACTIVE 976691152 CONCAT - RW
sd c0t3d0-02 vx-vol-P01 c0t3d0 0 976691152 0 c0t3d0 ENA
pl vx-vol-P02 vx-vol-L01 ENABLED ACTIVE 976691152 CONCAT - RW
sd c1t3d0-02 vx-vol-P02 c1t3d0 0 976691152 0 c1t3d0 ENA

v vx-vol-L02 - ENABLED ACTIVE 976691152 SELECT - fsgen
pl vx-vol-P03 vx-vol-L02 ENABLED ACTIVE 976691152 CONCAT - RW
sd c4t3d0-02 vx-vol-P03 c4t3d0 0 976691152 0 c4t3d0 ENA
pl vx-vol-P04 vx-vol-L02 ENABLED ACTIVE 976691152 CONCAT - RW
sd c5t3d0-02 vx-vol-P04 c5t3d0 0 976691152 0 c5t3d0 ENA

v vx-vol-L03 - ENABLED ACTIVE 976690272 SELECT - fsgen
pl vx-vol-P05 vx-vol-L03 ENABLED ACTIVE 976690272 CONCAT - RW
sd c6t3d0-02 vx-vol-P05 c6t3d0 0 976690272 0 c6t3d0 ENA
pl vx-vol-P06 vx-vol-L03 ENABLED ACTIVE 976690272 CONCAT - RW
sd c7t3d0-02 vx-vol-P06 c7t3d0 0 976690272 0 c7t3d0 ENA

v vx-vol-L04 - ENABLED ACTIVE 976691152 SELECT - fsgen
pl vx-vol-P07 vx-vol-L04 ENABLED ACTIVE 976691152 CONCAT - RW
sd c0t7d0-02 vx-vol-P07 c0t7d0 0 976691152 0 c0t7d0 ENA
pl vx-vol-P08 vx-vol-L04 ENABLED ACTIVE 976691152 CONCAT - RW
sd c1t7d0-02 vx-vol-P08 c1t7d0 0 976691152 0 c1t7d0 ENA

v vx-vol-L05 - ENABLED ACTIVE 976691152 SELECT - fsgen
pl vx-vol-P09 vx-vol-L05 ENABLED ACTIVE 976691152 CONCAT - RW
sd c4t7d0-02 vx-vol-P09 c4t7d0 0 976691152 0 c4t7d0 ENA
pl vx-vol-P10 vx-vol-L05 ENABLED ACTIVE 976691152 CONCAT - RW
sd c5t7d0-02 vx-vol-P10 c5t7d0 0 976691152 0 c5t7d0 ENA

v vx-vol-L06 - ENABLED ACTIVE 976690272 SELECT - fsgen
pl vx-vol-P11 vx-vol-L06 ENABLED ACTIVE 976690272 CONCAT - RW
sd c6t7d0-02 vx-vol-P11 c6t7d0 0 976690272 0 c6t7d0 ENA
pl vx-vol-P12 vx-vol-L06 ENABLED ACTIVE 976690272 CONCAT - RW
sd c7t7d0-02 vx-vol-P12 c7t7d0 0 976690272 0 c7t7d0 ENA


VxVM/VxFS 6-cols

Disk group: vxgroup

DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE

dg vxgroup default default 105000 1201598836.80.mickey

dm c0t3d0 c0t3d0 auto 65532 976691152 -
dm c0t7d0 c0t7d0 auto 65532 976691152 -
dm c1t3d0 c1t3d0 auto 65532 976691152 -
dm c1t7d0 c1t7d0 auto 65532 976691152 -
dm c4t3d0 c4t3d0 auto 65532 976691152 -
dm c4t7d0 c4t7d0 auto 65532 976691152 -
dm c5t3d0 c5t3d0 auto 65532 976691152 -
dm c5t7d0 c5t7d0 auto 65532 976691152 -
dm c6t3d0 c6t3d0 auto 65532 976691152 -
dm c6t7d0 c6t7d0 auto 65532 976691152 -
dm c7t3d0 c7t3d0 auto 65532 976691152 -
dm c7t7d0 c7t7d0 auto 65532 976691152 -

v vx-vol - ENABLED ACTIVE 5860145152 SELECT vx-vol-03 fsgen
pl vx-vol-03 vx-vol ENABLED ACTIVE 5860145280 STRIPE 6/32 RW
sv vx-vol-S01 vx-vol-03 vx-vol-L01 1 976690880 0/0 2/2 ENA
sv vx-vol-S02 vx-vol-03 vx-vol-L02 1 976690880 1/0 2/2 ENA
sv vx-vol-S03 vx-vol-03 vx-vol-L03 1 976690880 2/0 2/2 ENA
sv vx-vol-S04 vx-vol-03 vx-vol-L04 1 976690880 3/0 2/2 ENA
sv vx-vol-S05 vx-vol-03 vx-vol-L05 1 976690880 4/0 2/2 ENA
sv vx-vol-S06 vx-vol-03 vx-vol-L06 1 976690880 5/0 2/2 ENA

v vx-vol-L01 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P01 vx-vol-L01 ENABLED ACTIVE 976690880 CONCAT - RW
sd c0t3d0-02 vx-vol-P01 c0t3d0 0 976690880 0 c0t3d0 ENA
pl vx-vol-P02 vx-vol-L01 ENABLED ACTIVE 976690880 CONCAT - RW
sd c5t3d0-02 vx-vol-P02 c5t3d0 0 976690880 0 c5t3d0 ENA

v vx-vol-L02 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P03 vx-vol-L02 ENABLED ACTIVE 976690880 CONCAT - RW
sd c0t7d0-02 vx-vol-P03 c0t7d0 0 976690880 0 c0t7d0 ENA
pl vx-vol-P04 vx-vol-L02 ENABLED ACTIVE 976690880 CONCAT - RW
sd c5t7d0-02 vx-vol-P04 c5t7d0 0 976690880 0 c5t7d0 ENA

v vx-vol-L03 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P05 vx-vol-L03 ENABLED ACTIVE 976690880 CONCAT - RW
sd c1t3d0-02 vx-vol-P05 c1t3d0 0 976690880 0 c1t3d0 ENA
pl vx-vol-P06 vx-vol-L03 ENABLED ACTIVE 976690880 CONCAT - RW
sd c6t3d0-02 vx-vol-P06 c6t3d0 0 976690880 0 c6t3d0 ENA

v vx-vol-L04 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P07 vx-vol-L04 ENABLED ACTIVE 976690880 CONCAT - RW
sd c1t7d0-02 vx-vol-P07 c1t7d0 0 976690880 0 c1t7d0 ENA
pl vx-vol-P08 vx-vol-L04 ENABLED ACTIVE 976690880 CONCAT - RW
sd c6t7d0-02 vx-vol-P08 c6t7d0 0 976690880 0 c6t7d0 ENA

v vx-vol-L05 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P09 vx-vol-L05 ENABLED ACTIVE 976690880 CONCAT - RW
sd c4t3d0-02 vx-vol-P09 c4t3d0 0 976690880 0 c4t3d0 ENA
pl vx-vol-P10 vx-vol-L05 ENABLED ACTIVE 976690880 CONCAT - RW
sd c7t3d0-02 vx-vol-P10 c7t3d0 0 976690880 0 c7t3d0 ENA

v vx-vol-L06 - ENABLED ACTIVE 976690880 SELECT - fsgen
pl vx-vol-P11 vx-vol-L06 ENABLED ACTIVE 976690880 CONCAT - RW
sd c4t7d0-02 vx-vol-P11 c4t7d0 0 976690880 0 c4t7d0 ENA
pl vx-vol-P12 vx-vol-L06 ENABLED ACTIVE 976690880 CONCAT - RW
sd c7t7d0-02 vx-vol-P12 c7t7d0 0 976690880 0 c7t7d0 ENA
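A quick sanity check of the layered layout above (a sketch; sizes are 512-byte sectors copied from the vxprint listing): the striped plex is exactly six subvolume columns long, and the volume itself is trimmed slightly below the plex size.

```python
# Sizes in 512-byte sectors, taken from the vxprint listing above.
SUBVOL_LEN = 976_690_880      # each mirrored subvolume vx-vol-L01..L06
NCOLS = 6                     # plex vx-vol-03 is STRIPE 6/32 (6 columns)
PLEX_LEN = 5_860_145_280      # plex vx-vol-03
VOL_LEN = 5_860_145_152       # volume vx-vol

# The striped plex spans all six columns exactly.
assert SUBVOL_LEN * NCOLS == PLEX_LEN

# The usable volume is rounded down slightly from the plex length.
print(PLEX_LEN - VOL_LEN, "sectors of slack")   # 128 sectors
print(round(VOL_LEN * 512 / 2**40, 2), "TiB")   # ~2.73 TiB
```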


UFS

d300: Mirror
Submirror 0: d100
State: Okay
Submirror 1: d200
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 5860524032 blocks (2.7 TB)

d100: Submirror of d300
State: Okay
Size: 5860524032 blocks (2.7 TB)
Stripe 0: (interlace: 32 blocks)
Device     Start Block  Dbase  State  Reloc  Hot Spare
c5t2d0s0   0            No     Okay   Yes
c5t6d0s0   0            No     Okay   Yes
c4t2d0s0   0            No     Okay   Yes
c4t6d0s0   0            No     Okay   Yes
c7t2d0s0   0            No     Okay   Yes
c7t6d0s0   0            No     Okay   Yes


d200: Submirror of d300
State: Okay
Size: 5860524032 blocks (2.7 TB)
Stripe 0: (interlace: 32 blocks)
Device     Start Block  Dbase  State  Reloc  Hot Spare
c6t2d0s0   0            No     Okay   Yes
c6t6d0s0   0            No     Okay   Yes
c1t2d0s0   0            No     Okay   Yes
c1t6d0s0   0            No     Okay   Yes
c0t2d0s0   0            No     Okay   Yes
c0t6d0s0   0            No     Okay   Yes

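metastat reports sizes in 512-byte disk blocks and rounds the human-readable figure to binary terabytes; a quick conversion (numbers taken from the d300 output above) confirms the "2.7 TB":

```python
BLOCK_SIZE = 512               # metastat sizes are 512-byte blocks
D300_BLOCKS = 5_860_524_032    # "Size: 5860524032 blocks (2.7 TB)"

tib = D300_BLOCKS * BLOCK_SIZE / 2**40
print(f"{tib:.1f} TB")         # prints "2.7 TB", matching metastat
```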

d30: Mirror
Submirror 0: d31
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 12289725 blocks (5.9 GB)

d31: Submirror of d30
State: Okay
Size: 12289725 blocks (5.9 GB)
Stripe 0:
Device     Start Block  Dbase  State  Reloc  Hot Spare
c5t0d0s5   0            No     Okay   Yes


d20: Mirror
Submirror 0: d21
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 4096575 blocks (2.0 GB)

d21: Submirror of d20
State: Okay
Size: 4096575 blocks (2.0 GB)
Stripe 0:
Device     Start Block  Dbase  State  Reloc  Hot Spare
c5t0d0s1   0            No     Okay   Yes


d10: Mirror
Submirror 0: d11
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 22539195 blocks (10 GB)

d11: Submirror of d10
State: Okay
Size: 22539195 blocks (10 GB)
Stripe 0:
Device     Start Block  Dbase  State  Reloc  Hot Spare
c5t0d0s0   0            No     Okay   Yes


Device Relocation Information:
Device  Reloc  Device ID
c6t2d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXV85H
c6t6d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXR0HH
c1t2d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHWP5AF
c1t6d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXSRKH
c0t2d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHWP74F
c0t6d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXRGUH
c5t2d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXUN7H
c5t6d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXR35H
c4t2d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXT7LH
c4t6d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXR0JH
c7t2d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXTYTH
c7t6d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHWP7KF
c5t0d0  Yes    id1,sd@SATA_____HITACHI_HDS7250S______KRVN67ZBHXTZBH


ZFS

pool: pool
state: ONLINE
scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
pool        ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c5t1d0  ONLINE       0     0     0
    c6t1d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c5t5d0  ONLINE       0     0     0
    c6t5d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c4t1d0  ONLINE       0     0     0
    c1t1d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c4t5d0  ONLINE       0     0     0
    c1t5d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c7t1d0  ONLINE       0     0     0
    c0t1d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c7t5d0  ONLINE       0     0     0
    c0t5d0  ONLINE       0     0     0

errors: No known data errors
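With six two-way mirrors, usable capacity is the sum of the six mirror vdevs, i.e. six single disks. Assuming the nominal 500 GB of the Hitachi HDS7250S drives listed in the metastat section (an assumption; labels and metadata are ignored), the numbers line up with what the zfs get output that follows reports:

```python
DISK_BYTES = 500e9   # nominal HDS7250S capacity (assumption)
MIRRORS = 6          # six two-way mirrors -> six disks of usable space

usable_tib = MIRRORS * DISK_BYTES / 2**40
print(f"~{usable_tib:.2f} TiB usable")   # ~2.73 TiB
# zfs get reports used 50.1G + available 2.63T (~2.68T) on pool/test;
# the rest goes to labels, metadata and per-vdev reserved space.
```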

bash-3.00# zfs get all pool/test
NAME       PROPERTY       VALUE                  SOURCE
pool/test  type           filesystem             -
pool/test  creation       Tue Jan 29 11:02 2008  -
pool/test  used           50.1G                  -
pool/test  available      2.63T                  -
pool/test  referenced     50.1G                  -
pool/test  compressratio  1.00x                  -
pool/test  mounted        yes                    -
pool/test  quota          none                   default
pool/test  reservation    none                   default
pool/test  recordsize     8K                     local
pool/test  mountpoint     /test/zfs              local
pool/test  sharenfs       off                    default
pool/test  checksum       on                     default
pool/test  compression    off                    default
pool/test  atime          on                     default
pool/test  devices        on                     default
pool/test  exec           on                     default
pool/test  setuid         on                     default
pool/test  readonly       off                    default
pool/test  zoned          off                    default
pool/test  snapdir        hidden                 default
pool/test  aclmode        groupmask              default
pool/test  aclinherit     secure                 default
pool/test  canmount       on                     default
pool/test  shareiscsi     off                    default
pool/test  xattr          on                     default

2008-02-11

Live Upgrade - problem with ludelete

A few weeks ago I was doing a Live Upgrade from Solaris 10 11/06 to 8/07. It went quite well until I tried to delete the old boot environment (BE):

bash-3.00# uname -a
SunOS mickey 5.10 Generic_127112-07 i86pc i386 i86pc
bash-3.00# cat /etc/release
Solaris 10 8/07 s10x_u4wos_12b X86
Copyright 2007 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 16 August 2007
bash-3.00# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
11-06                      yes      no     no        yes    -
8-07                       yes      yes    yes       no     -
bash-3.00# ludelete 11-06
The boot environment <11-06> contains the GRUB menu.
Attempting to relocate the GRUB menu.
/usr/sbin/ludelete: lulib_relocate_grub_slice: not found
ERROR: Cannot relocate the GRUB menu in boot environment <11-06>.
ERROR: Cannot delete boot environment <11-06>.
Unable to delete boot environment.

The only useful solution I found was at http://tech.groups.yahoo.com/group/solarisx86/message/44111 where Juergen Keil proposed using the lulib script from OpenSolaris. Because I didn't have any OpenSolaris DVD, I asked Juergen for a copy of lulib and he sent me one. After replacing the original:
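The replacement itself is a single-file swap, something like the following (the lulib path under /usr/lib/lu and the name of the copied file are assumptions; keep a backup of the original):

```shell
bash-3.00# cp -p /usr/lib/lu/lulib /usr/lib/lu/lulib.orig
bash-3.00# cp /tmp/lulib.opensolaris /usr/lib/lu/lulib
bash-3.00# chmod 555 /usr/lib/lu/lulib
```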

bash-3.00# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
11-06                      yes      no     no        yes    -
8-07                       yes      yes    yes       no     -
bash-3.00# ludelete 11-06
Determining the devices to be marked free.
Updating boot environment configuration database.
Updating boot environment description database on all BEs.
Updating all boot environment configuration databases.
Updating GRUB menu default setting
Boot environment <11-06> deleted.
bash-3.00# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
8-07                       yes      yes    yes       no     -


Thanks Juergen !

2008-02-08

Jarod Jenson is coming back ...

Jarod Jenson, the famous Texas Ranger, the first DTrace user outside of Sun and author of the Java DVM provider, has changed companies and now, after a long silence, is back with his blog!

2007-10-03

Sunsolve: A deeper look into vmstat statistics using DTrace and mdb

There is an excellent document, released two days ago at Sunsolve:
A deeper look into vmstat statistics using DTrace and mdb
The document describes how to drill down into vmstat statistics using DTrace and mdb. It contains so many fantastic DTrace and mdb commands that it is really worth reading (like any other DTrace documentation ;-) )
I especially like the mdb command to list swapped-out processes:

mdb -k << EOF
::walk thread myvar|::print kthread_t t_schedflag|::grep .==0|::eval p_user.u_comm
EOF
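For watching the same activity live, a DTrace one-liner against the vminfo provider can show which processes trigger swap-ins (probe name taken from the vminfo provider documentation, not from the Sunsolve paper itself):

```shell
# count swap-ins per process until Ctrl-C
dtrace -n 'vminfo:::swapin { @[execname, pid] = count(); }'
```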

Excellent document !

One should also note that a similar document, "Using DTrace to understand mpstat and vmstat output", has been available for a long time.

PS. What a pity that this page is available only to customers with a Sunsolve account. It should be available to everyone, especially Linux advocates ;-)