2009-10-12

Flash archive ERROR: Archive found, but is not a valid Flash archive

Just recently I was doing another Solaris 10 server installation using a Flash archive. This time I encountered the following error:
ERROR: Archive found, but is not a valid Flash archive

I was a bit surprised because I have done so many flar installations before ... Anyway, a little Googling and I found out where the problem lies - I had to remove the file
/var/sadm/system/admin/.platform

This file remembers which platforms (sun4u, sun4v, etc.) the flash archive was prepared for, and it turned out that the new server's platform was not included in it. Moving the file aside and then recreating the archive fixes the error:
mv /var/sadm/system/admin/.platform /var/sadm/system/admin/.platform.old
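
Recreating the archive is then just another flarcreate run; a minimal sketch (the archive name and output path below are placeholders - in practice you would add your usual exclusions etc.):

# flarcreate -n my-golden-image -c /export/flash/my-golden-image.flar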

A more detailed description can be found on Sunsolve.

2009-10-06

How to lock Oracle SGA memory in RAM within a zone

A month ago I had a discussion with our DBA about how to make Oracle SGA memory ineligible for swapping/paging (we had such an accident). The conclusion was to use the lock_sga parameter. A few days ago the DBA told me that Oracle support claimed the lock_sga parameter is not supported on Solaris. I didn't believe it and checked (you know, Google etc. ;-) ). After a while it turned out that locking shared memory in RAM is done by the syscall shmctl(id, SHM_LOCK), which can be called only by root! Once again Oracle support didn't do its homework ;-) But the question remains: can we do it somehow? There are solutions where you write a program in C and call shmctl on the requested shared memory segments, but I don't find that an elegant solution. At first I had no idea how to approach the problem. After a while my memory started working and recalled that Solaris privileges might help.
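
For illustration only, the C approach mentioned above would look roughly like this (a minimal sketch; the shmid would be taken from the output of ipcs -m, and it has to run as root - which is exactly what we want to avoid):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Lock an existing shared memory segment in RAM: ./shmlock <shmid> */
int main(int argc, char **argv)
{
        int shmid;

        if (argc != 2) {
                fprintf(stderr, "usage: %s shmid\n", argv[0]);
                return 1;
        }
        shmid = atoi(argv[1]);

        /* SHM_LOCK pins the segment's pages in physical memory */
        if (shmctl(shmid, SHM_LOCK, NULL) == -1) {
                perror("shmctl(SHM_LOCK)");
                return 1;
        }
        (void) printf("segment %d locked in RAM\n", shmid);
        return 0;
}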
Below is an example of how to lock the Oracle SGA in memory inside a Solaris zone.
We need to fulfill two requirements:
  • give the zone the privilege to lock memory in RAM
  • give the oracle user the privilege to lock that memory as a non-root user

The first requirement is achieved with the zone's limitpriv setting:

...
set limitpriv=default,proc_lock_memory
...
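
In practice this is done via zonecfg; a sketch (the zone name is a placeholder, and the zone needs to be rebooted to pick up the new privilege set):

global# zonecfg -z myzone
zonecfg:myzone> set limitpriv=default,proc_lock_memory
zonecfg:myzone> verify
zonecfg:myzone> commit
zonecfg:myzone> exit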

But this alone doesn't let us lock the SGA:

server:sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.8.0 - Production on Tue Oct 6 15:08:17 2009

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Connected to an idle instance.

SQL> startup
ORA-27126: unable to lock shared memory segment in core
SVR4 Error: 1: Not owner

The oracle user is not yet able to use it.

The above - the zone privilege - was just the first step. Now it is time for the second: we need to let the oracle user make the shmctl call. Solaris RBAC comes to the rescue. Run the following inside the zone:

usermod -K defaultpriv=basic,proc_lock_memory oracle

Please remember that you need to log out of and back into the oracle account after the usermod command! Otherwise it won't work!

Let's check that it has been saved:

# grep oracle /etc/user_attr
oracle::::type=normal;defaultpriv=basic,proc_lock_memory

Now:

server:sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.8.0 - Production on Tue Oct 6 15:12:56 2009

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 1696565248 bytes
Fixed Size                   731136 bytes
Variable Size             620756992 bytes
Database Buffers         1073741824 bytes
Redo Buffers                1335296 bytes
Database mounted.
Database opened.
SQL> show parameter lock_sga

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga                             boolean     TRUE

Easy, isn't it ? :-)

2009-10-05

How to change hostid in Veritas Volume Manager

# vxdctl hostid servername
# cat /etc/vx/volboot
volboot 3.1 0.3 80
hostid servername
hostguid {50b73d9c-1dd2-11b2-a839-002128045b8e}
end

2009-08-24

vdbench - IOPS vs number of files on vxfs

While benchmarking storage I found an interesting dependency between the number of files and IOPS on the Veritas File System (VxFS). Below is the configuration file for vdbench:

*sd=sd1,lun=/vxfs/ams-vol/file1
*sd=sd2,lun=/vxfs/ams-vol/file2
*sd=sd3,lun=/vxfs/ams-vol/file3
*sd=sd4,lun=/vxfs/ams-vol/file4
*sd=sd5,lun=/vxfs/ams-vol/file5
*sd=sd6,lun=/vxfs/ams-vol/file6
*sd=sd7,lun=/vxfs/ams-vol/file7
*sd=sd8,lun=/vxfs/ams-vol/file8
*sd=sd9,lun=/vxfs/ams-vol/file9
*sd=sd10,lun=/vxfs/ams-vol/file10
wd=rg-1,sd=sd*,rdpct=70,rhpct=0,whpct=0,xfersize=8k,seekpct=100
rd=rd_rg-1,wd=rg-1,interval=1,iorate=max,elapsed=10,forthreads=(64)

All files were created using 'mkfile 100m '. The leading '*' is vdbench's comment character, so each time vdbench was run I enabled one more file by uncommenting its sd line.
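
Creating all ten files is a one-liner (a sketch, matching the paths used in the config above):

# for i in 1 2 3 4 5 6 7 8 9 10; do mkfile 100m /vxfs/ams-vol/file$i; done

E.g. the very first run (just with file1) gave the following result: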

gawron:/opt/SUNWvdbench# ./vdbench -f oltp_vxfs.cfg


Vdbench distribution: vdbench501
For documentation, see 'vdbench.pdf'.

12:35:30.755 input argument scanned: '-foltp_vxfs.cfg'
12:35:31.387 Starting slave: /opt/SUNWvdbench/vdbench SlaveJvm -m 192.168.220.10 -n localhost-10-090824-12.35.30.332 -l localhost-0 -p 5570
12:35:32.740 All slaves are now connected
12:35:40.015 Starting RD=rd_rg-1; I/O rate: Uncontrolled MAX; Elapsed=10; For loops: threads=64

sie 24, 2009      interval        i/o   MB/sec   bytes   read     resp     resp     resp     cpu%   cpu%
                                 rate  1024**2     i/o    pct     time      max   stddev  sys+usr    sys
12:35:41.315             1    5235,00    40,90    8192  70,24   11,191  244,182   19,690     12,1    9,8
12:35:42.081             2    5160,00    40,31    8192  69,65   11,962  149,638   20,754     12,1    9,2
12:35:43.093             3    5239,00    40,93    8192  70,30   12,331  157,169   21,972     11,2    9,5
12:35:44.070             4    5507,00    43,02    8192  70,35   11,606  160,935   20,391     10,5    8,9
12:35:45.082             5    5532,00    43,22    8192  69,99   11,682  151,976   20,245      9,5    8,3
12:35:46.062             6    5428,00    42,41    8192  69,95   11,661  156,968   20,409      9,7    8,2
12:35:47.054             7    5403,00    42,21    8192  69,57   11,807  165,627   20,287      9,3    8,2
12:35:48.061             8    5550,00    43,36    8192  70,07   11,543  138,574   20,030      9,6    8,3
12:35:49.068             9    5355,00    41,84    8192  69,02   11,923  156,828   20,487      9,4    8,1
12:35:50.051            10    5511,00    43,05    8192  69,75   11,572  140,281   20,265      9,7    8,4
12:35:50.074      avg_2-10    5409,44    42,26    8192  69,85   11,783  165,627   20,535     10,1    8,6
12:35:50.770 Vdbench execution completed successfully. Output directory: /opt/SUNWvdbench/output
12:35:50.808 Slave localhost-0 terminated


Next run - with two files enabled:

gawron:/opt/SUNWvdbench# ./vdbench -f oltp_vxfs.cfg


Vdbench distribution: vdbench501
For documentation, see 'vdbench.pdf'.

12:37:09.932 input argument scanned: '-foltp_vxfs.cfg'
12:37:10.668 Starting slave: /opt/SUNWvdbench/vdbench SlaveJvm -m 192.168.220.10 -n localhost-10-090824-12.37.09.697 -l localhost-0 -p 5570
12:37:10.669 Starting slave: /opt/SUNWvdbench/vdbench SlaveJvm -m 192.168.220.10 -n localhost-11-090824-12.37.09.697 -l localhost-1 -p 5570
12:37:11.950 All slaves are now connected
12:37:20.013 Starting RD=rd_rg-1; I/O rate: Uncontrolled MAX; Elapsed=10; For loops: threads=64

sie 24, 2009      interval        i/o   MB/sec   bytes   read     resp     resp     resp     cpu%   cpu%
                                 rate  1024**2     i/o    pct     time      max   stddev  sys+usr    sys
12:37:21.276             1    9237,00    72,16    8192  70,39   12,284  238,102   22,018     19,9   15,4
12:37:22.097             2    9527,00    74,43    8192  69,62   12,842  181,819   22,161     21,0   17,1
12:37:23.101             3   10201,00    79,70    8192  70,03   12,565  233,502   21,898     19,7   16,6
12:37:24.095             4   10271,00    80,24    8192  69,72   12,585  204,812   21,623     19,0   16,7
12:37:25.087             5   10230,00    79,92    8192  70,82   12,371  184,102   21,539     19,4   16,8
12:37:26.066             6   10334,00    80,73    8192  69,89   12,368  161,469   21,197     19,0   16,5
12:37:27.076             7   10107,00    78,96    8192  69,84   12,663  180,650   21,783     18,1   16,1
12:37:28.070             8   10285,00    80,35    8192  69,92   12,420  168,582   21,336     18,5   16,2
12:37:29.057             9   10044,00    78,47    8192  69,28   12,754  204,018   21,862     18,1   15,9
12:37:30.064            10   10194,00    79,64    8192  69,39   12,521  174,452   21,434     18,3   16,2
12:37:30.080      avg_2-10   10132,56    79,16    8192  69,84   12,563  233,502   21,644     19,0   16,4
12:37:30.904 Slave localhost-1 terminated
12:37:30.930 Vdbench execution completed successfully. Output directory: /opt/SUNWvdbench/output
12:37:30.988 Slave localhost-0 terminated

You can see that we doubled the IOPS. Below is a chart showing what happens as we add more files:


I am not sure why this dependency exists. Perhaps the VxFS single-writer lock is involved.
More to come :-)

2009-08-14

DTrace - jstack/ustack string table overflows

Just recently, while watching some Java programs with DTrace, I got the following messages:

...
dtrace: 7 jstack()/ustack() string table overflows
dtrace: 13 jstack()/ustack() string table overflows
dtrace: 9 jstack()/ustack() string table overflows
dtrace: 8 jstack()/ustack() string table overflows
dtrace: 13 jstack()/ustack() string table overflows
dtrace: 17 jstack()/ustack() string table overflows
dtrace: 8 jstack()/ustack() string table overflows
dtrace: 10 jstack()/ustack() string table overflows

To eliminate this problem you can use:

dtrace -x jstackstrsize=1024
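
For example (a hypothetical one-liner; the probe and predicate are only illustrative - any script using jstack()/ustack() can take the option):

# dtrace -x jstackstrsize=1024 -n 'profile-97 /execname == "java"/ { @[jstack()] = count(); }'

The same can be set inside a D script with: #pragma D option jstackstrsize=1024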

2009-07-14

Filesystem Cache Optimization Strategies

There is an excellent blog entry by Brad Diggs about the Solaris filesystem caches used by UFS and ZFS. It is divided into the following parts:

Introduction
Why Use Filesystem Cache
Solaris Filesystem Caches: segmap cache
Solaris Filesystem Caches: Vnode Page Mapping cache
Solaris Filesystem Caches: ZFS Adaptive Replacement Cache (ARC)
Solaris Filesystem Caches: Memory Contention
How Filesystem Cache Can Improve Performance
Establish A Safe Ceiling
UFS Default Caching
ZFS Default Caching
Optimized ZFS Filesystem Caching
Tuning ZFS Cache
Unlock The Governors
Avoid Diluting The Filesystem Cache
Match Data Access Patterns
Consider Disabling vdev Caching
Minimize Application Data Pagesize
Match Average I/O Block Sizes
Consider The Pros and Cons of Cache Flushes
Prime The Filesystem Cache

Really worth reading !
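
As a taste of the "Establish A Safe Ceiling" part: on Solaris 10 the usual way to cap the ZFS ARC is the zfs_arc_max tunable in /etc/system; a sketch (the 4 GB value is just an example - size it for your workload; a reboot is required):

* Cap the ZFS ARC at 4 GB (value in bytes)
set zfs:zfs_arc_max = 4294967296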

How to change compression algorithm used in zfs

If you want to change the way ZFS compresses your data, you can follow this example. Note that changing the property affects only data written from that point on; existing blocks keep whatever compression they were written with:

# zfs create pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  on        inherited from pool
# zfs set compression=gzip pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  gzip      local
# zfs set compression=lzjb pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  lzjb      local
# zfs set compression=gzip-9 pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  gzip-9    local
# zfs set compression=on pool/testfs
# zfs get compression pool/testfs
NAME         PROPERTY     VALUE     SOURCE
pool/testfs  compression  on        local

2009-07-08

VxFS (Veritas File System) - how to check and resize the intent log size

How to query the current size of the intent log:

# fsadm -F vxfs -L /vxfs/my-vol
UX:vxfs fsadm: INFO: V-3-25669: logsize=16384 blocks, logvol=""


How to resize it:

# fsadm -F vxfs -o logsize=32768 /vxfs/my-vol
# fsadm -F vxfs -L /vxfs/my-vol
UX:vxfs fsadm: INFO: V-3-25669: logsize=32768 blocks, logvol=""


Basically, with a bigger intent log, recovery time is proportionally longer and the file system may consume more system resources (such as memory) during normal operation. On the other hand, VxFS performs better with larger log sizes.

2009-07-03

Linux - how to turn on framebuffer during boot process

Just add 'vga=some_number' to the kernel line in the menu.lst file (if you use grub - does anybody still use lilo? ;-) ). Example values for some_number:

1280x1024x64k (vga=794)
1280x1024x256 (vga=775)
1024x768x64k (vga=791)
1024x768x32k (vga=790)
1024x768x256 (vga=773)
800x600x64k (vga=788)
800x600x32k (vga=787)
800x600x256 (vga=771)
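
A sketch of a complete menu.lst entry (the kernel version and root device are placeholders - keep your existing values and just append the vga parameter):

title   Debian GNU/Linux
root    (hd0,0)
kernel  /boot/vmlinuz-2.6.26-2-686 root=/dev/sda1 ro vga=791
initrd  /boot/initrd.img-2.6.26-2-686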

Debian - how to "clone" packages to another server

This is perhaps my first blog entry related to Linux :-). Anyway, I have just come across the problem of moving one installed Debian (Lenny) to another server. While rsync is a perfect solution for copying all the personal files, I would like to avoid using it to copy all the Debian binaries. There is another, more reliable way. On the source Debian just run:

host1:~# dpkg --get-selections > /tmp/dpkg.txt
host1:~# head /tmp/dpkg.txt
a2ps                                            install
acpi                                            install
acpi-support                                    install
acpi-support-base                               install
acpid                                           install
adduser                                         install
adobe-flashplugin                               install
adobereader-enu                                 install
akregator                                       install
alien                                           install

Transfer this file (dpkg.txt) to the target system (already installed as a minimal system) and run:

host2:~# dpkg --set-selections < /tmp/dpkg.txt
host2:~# apt-get -u dselect-upgrade

Et voila! apt-get will download and install all the requested packages.

2009-07-01

Presentation during a Storage conference

I got the opportunity to present at the Storage GigaCon conference in Warsaw about a month ago. My presentation covered a bit of an "introduction" to storage (since my presentation was the first one) and later "switched" to storage performance benchmarking. Of course it would be difficult not to mention Vdbench and SWAT :-)

I wrote it in a mix of Polish and English (I know that is not a recommended way of writing presentations ...) so you can read it at least partially.

Optimizing Postgresql application performance with Solaris dynamic tracing

There is an excellent BluePrints article about the DTrace probes in PostgreSQL. You can download it over here. Using DTrace you can get amazing information about the internals of PostgreSQL. For instance (example taken from the article), with this DTrace script:

# cat query_load.d
#!/usr/sbin/dtrace -qs

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

postgresql*:::query-start
{
        self->query = copyinstr(arg0);
        self->pid = pid;
}

postgresql*:::query-done
{
        @queries[pid, self->query] = count();
}

dtrace:::END
{
        printf("%5s %s %s\n", "PID", "COUNT", "QUERY");
        printa("%6d %@5d %s\n", @queries);
}

running it (after chmod +x query_load.d) against a working database, we get:

PID COUNT QUERY
1221 154 UPDATE tellers SET tbalance = tbalance + -487 WHERE tid = 25;
1221 204 UPDATE tellers SET tbalance = tbalance + 1051 WHERE tid = 42;
1220 215 UPDATE accounts SET abalance = abalance + -4302 WHERE aid = 144958;
1220 227 UPDATE accounts SET abalance = abalance + 2641 WHERE aid = 441283;


Isn't it amazing ?

I don't know if Larry Ellison is aware of DTrace but I wish I had the same DTrace probes in Oracle ...

2009-05-19

Veritas Volume Manager - display DMP IO statistics

If you are a Veritas Volume Manager user, you can use DMP to configure path failover and load balancing across many links (data paths) to the storage. vxdmpadm is the command for all these tasks. Just recently I had to check whether data actually flows along the configured links. You can achieve this with the same vxdmpadm.

To enable the gathering of statistics use this command:

vxdmpadm iostat start

To reset the IO counters:

vxdmpadm iostat reset

To display the current statistics:

vxdmpadm -z iostat show all interval=60
cpu usage = 3037899us    per cpu memory = 32768b
                              OPERATIONS          BLOCKS       AVG TIME(ms)
PATHNAME                     READS   WRITES     READS    WRITES  READS  WRITES
c2t50060E80102C7770d16s2        99    43566      1584   1419358   0.55    0.03
c4t50060E80102C7772d16s2       101    37584      1616   1366880   0.74    0.17
c2t50060E80102C7770d21s2      2192     1623     87424     29072   0.11    0.04
c4t50060E80102C7772d21s2      1041     1059     46016     19840   0.16    0.24
...

The first output shows cumulative statistics since server boot. Subsequent listings show data for the current interval:

c2t50060E80102C7770d21s2         1        2        16        32   0.39    0.03
c4t50060E80102C7772d21s2         5        0        80         0   0.72    0.00
c2t50060E80102C7770d57s2         2        4        32       112   0.50    0.19
c4t50060E80102C7772d57s2         4        2        64        80   0.03    0.02
...


To disable the gathering of statistics use this command:

vxdmpadm iostat stop

2009-02-16

DTrace CPU Performance Counter provider

Jonathan Haslam described the DTrace CPU Performance Counter (cpc) provider, which "... gives you the ability to profile your system by many different types of processor related events; the list of events is processor specific and usually quite large but typically includes events such as cycles executed, instructions executed, cache misses, TLB misses and many more ...". All the information is here, here and here.
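
A hedged taste of the probe format (event names are processor-specific; PAPI_tot_ins is one of the generic event names, the mode is user/kernel/all, and the trailing number is the overflow count after which the probe fires):

# dtrace -n 'cpc:::PAPI_tot_ins-user-10000 { @[execname] = count(); }'

This samples every 10000 retired user-mode instructions and shows which processes execute the most.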

2009-02-05

Optimizing MySQL Database Application Performance with Solaris Dynamic Tracing (DTrace)

There is a fantastic white paper on how taking advantage of Solaris Dynamic Tracing (DTrace) probes can help simplify MySQL database application tuning. The contents of this white paper:

  • Introduction
  • Approaching MySQL Database Application Tuning
  • The Advantages of Solaris Dynamic Tracing
  • Simplifying and Speeding Performance Tuning Efforts
  • Analyzing Query Loads
  • Probing the Cost of File Sort Operations
  • Profiling the Use of Stored Procedures
  • Observing Slave Queries
  • Optimizing Use of the MySQL Database Query Cache
  • Putting it all Together
  • For More Information
  • About the Author
  • Related Resources
  • Ordering Sun Documents
  • Accessing Sun Documentation Online

It is downloadable over here.
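
By analogy with the PostgreSQL script above, a query-counting sketch could look like this (my assumption, not something copied from the paper: it requires a MySQL build that ships the query-start/query-done DTrace probes, with arg0 being the query text):

#!/usr/sbin/dtrace -qs

mysql*:::query-start
{
        self->query = copyinstr(arg0);
}

mysql*:::query-done
/self->query != 0/
{
        @queries[self->query] = count();
        self->query = 0;
}

dtrace:::END
{
        printa("%@5d %s\n", @queries);
}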

2009-01-27

Parallel bzip2 in a CMT (multicore) environment

We have a lot of archive logs from our Oracle databases. To save storage space we compress them. Of course this single-threaded process takes quite a long time on CMT-based CPUs. To use the power of these CPUs we should parallelize all tasks (if possible). pbzip2 comes to our aid - it is a parallel implementation of bzip2. Here is an example:

rudy:/tmp/l# ls -alh
total 4194336
drwxr-xr-x   2 root  root   233 Jan 27 16:35 .
drwxrwxrwt   3 root  sys    286 Jan 27 16:30 ..
-rw-r-----   1 root  root  1.0G Jan 27 16:35 1
-rw-r-----   1 root  root  1.0G Jan 27 15:41 2
rudy:/tmp/l# time bzip2 1

real 13m4.397s
user 12m59.987s
sys 0m4.310s
rudy:/tmp/l# time /opt/csw/bin/pbzip2 2

real 0m34.842s
user 34m46.095s
sys 0m7.972s
rudy:/tmp/l# time /opt/csw/bin/pbzip2 -l 2

real 0m41.907s
user 28m52.137s
sys 0m6.302s


We can see that the difference is tremendous !!!

The '-l' parameter determines the maximum number of processors to use (the calculation is based on load average). This server is quite idle, so a heavier workload might lengthen the time.
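
If you prefer to cap the worker count explicitly rather than rely on load average, pbzip2 also takes -p# (a sketch; the value 8 and the file name are arbitrary):

rudy:/tmp/l# /opt/csw/bin/pbzip2 -p8 somefile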