Monday, November 28, 2011

Yahoo System Engineer: Required Skills and Job Description

##
1. Linux/FreeBSD
2. Apache
3. MySQL
4. PHP
5. Nagios
6. MRTG
7. Perl
8. Shell Script
##

A. Function as a technical generalist responsible for the overall health and performance of our many web applications and cloud services

B. Develop tools in scripting languages (shell script, Perl, Python, etc.) to automate the deployment, administration, and monitoring of large-scale Linux and FreeBSD environments across global locations.

C. Gain deep application-level knowledge of the systems and contribute to their overall design

D. Work with development teams to harden, enhance, and document our systems, establish processes, and generally improve their operability

E. Assist in the configuration/build-out of new deployments to facilitate our constant growth

F. Participate in a global on-call pager rotation for support

Common Tech Skills for Search Cloud Service Engineering

Boot Process and Configs

Areas

    BIOS, POST
    Remote on/off Machines / Remote Boot
    MBR, GRUB, LILO.
    Kernel device probing, root mount
    init process, run levels
    Configs of Apache, DNS, network, etc.

Perl

    How a hash and/or hashing function works.
    Sorting of arrays and hashes.
    Recursive functions
    Plus: Advanced Perl
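The three Perl topics above are language-neutral ideas. To keep a single language across this post, the following minimal sketch shows them in bash instead of Perl. It covers a djb2-style string hash reduced to a bucket index, sorting an associative array by value, and a recursive function. It assumes bash 4.3 or later, which it needs for associative arrays and namerefs. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# djb2 string hash: h = h * 33 + c for each character, kept to 32 bits.
djb2() {
  local s=$1 h=5381 i c
  for (( i = 0; i < ${#s}; i++ )); do
    printf -v c '%d' "'${s:i:1}"      # numeric code of the i-th character
    h=$(( (h * 33 + c) & 0xFFFFFFFF ))
  done
  echo "$h"
}

# A hash table picks a bucket by reducing the hash modulo the table size.
bucket_of() {
  local key=$1 nbuckets=$2
  echo $(( $(djb2 "$key") % nbuckets ))
}

# Print "key value" pairs of an associative array, sorted by numeric value.
# Takes the array's name and reads it through a nameref (bash 4.3+).
sort_by_value() {
  local -n _arr=$1
  local k
  for k in "${!_arr[@]}"; do
    printf '%s %s\n' "$k" "${_arr[$k]}"
  done | sort -k2,2n
}

# Recursive factorial: each call handles n and delegates n - 1.
factorial() {
  local n=$1
  if (( n <= 1 )); then
    echo 1
  else
    echo $(( n * $(factorial $(( n - 1 ))) ))
  fi
}
```

For example, a word-count table held in `declare -A hits` can be printed in ascending order with `sort_by_value hits`.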

Shell Scripting

Areas

    Bash / Commands
    Write an init script.
    Awk, sed, regular expressions.
    Pipe, redirect
    If, loop, case, …
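The following minimal sketch combines several of the shell topics above: a case statement, a while loop, awk, sed, and pipes, applied to Apache access-log lines. It assumes the common/combined log format with an ordinary three-part request line, which puts the status code in the ninth whitespace-separated field. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Map an HTTP status code to a class with a case statement.
status_class() {
  case "$1" in
    2[0-9][0-9]) echo "success" ;;
    3[0-9][0-9]) echo "redirect" ;;
    4[0-9][0-9]) echo "client_error" ;;
    5[0-9][0-9]) echo "server_error" ;;
    *)           echo "unknown" ;;
  esac
}

# Remove the first query string on each line with sed (ERE: a literal "?"
# followed by anything up to the next space or double quote).
strip_query() {
  sed -E 's/\?[^ "]*//'
}

# Read log lines on stdin, extract field 9 with awk, classify each code in a
# while loop, count classes with sort | uniq -c, then print "class count".
summarize_statuses() {
  local code
  awk '{ print $9 }' | while read -r code; do
    status_class "$code"
  done | sort | uniq -c | awk '{ print $2, $1 }'
}
```

A typical invocation would be `summarize_statuses < access_log`, where access_log stands for whatever log file the server writes.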

Monitoring, Alerting and Nagios

Areas

    NRPE
    RTTS/Uranus
    Plus: Cacti, ganglia, cricket, etc.
    Plus: Gomez or Keynote or other such tools
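NRPE runs Nagios plugins on the monitored host. A plugin reports its result through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and one line of output. That line may optionally carry performance data after a "|". The following minimal sketch applies those conventions to a disk-usage check. It reads the Use% column from POSIX-format df -P output. The function names and thresholds are hypothetical.

```shell
#!/usr/bin/env bash
# Nagios plugin exit codes.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# evaluate_usage USED WARN CRIT: print one status line with perfdata and
# return the matching Nagios exit code. All values are whole percentages.
evaluate_usage() {
  local used=$1 warn=$2 crit=$3
  local perf="used=${used}%;${warn};${crit};0;100"
  if (( used >= crit )); then
    echo "DISK CRITICAL - ${used}% used | ${perf}"
    return "$STATE_CRITICAL"
  elif (( used >= warn )); then
    echo "DISK WARNING - ${used}% used | ${perf}"
    return "$STATE_WARNING"
  fi
  echo "DISK OK - ${used}% used | ${perf}"
  return "$STATE_OK"
}

# check_disk_usage MOUNT WARN CRIT: take Use% for MOUNT from df -P output.
check_disk_usage() {
  local used
  used=$(df -P -- "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  if [[ -z "$used" ]]; then
    echo "DISK UNKNOWN - cannot read usage for $1"
    return "$STATE_UNKNOWN"
  fi
  evaluate_usage "$used" "$2" "$3"
}
```

For example, `check_disk_usage / 80 90; echo $?` prints the status line followed by the exit code that Nagios would see.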

Troubleshooting

Areas

    df shows a partition is full; you delete a file, but df still shows it as full. What's going on?
    Web performance is slow. What do you do? (Network, system, app, DB, etc. troubleshooting.)
    Disk is "slow". What might you look at? iostat, dmesg (to see if the disk is dying), etc.
    Network is slow. What do you look at? duplex, saturation, etc.
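For the df question, the usual answer is that some process still holds the deleted file open. rm only removes the directory entry. The inode's blocks stay allocated until the last open file descriptor is closed, so df keeps counting them even though du, which walks directory entries, no longer sees the file. On Linux, /proc/<pid>/fd shows such files with a "(deleted)" suffix, and lsof +L1 lists open files whose link count has dropped to zero. The following minimal sketch reproduces the situation. It is Linux-specific, needs bash 4+ for $BASHPID, and uses hypothetical function names.

```shell
#!/usr/bin/env bash
# list_deleted_open_files PID: print "fd -> target" for every open file
# descriptor of PID whose file has been unlinked (Linux /proc only).
list_deleted_open_files() {
  local pid=$1 fd target
  for fd in /proc/"$pid"/fd/*; do
    target=$(readlink "$fd") || continue   # fd may have closed meanwhile
    case "$target" in
      *' (deleted)') printf '%s -> %s\n' "${fd##*/}" "$target" ;;
    esac
  done
}

# Reproduce the "df still full after rm" situation with a 1 MiB file.
demo_deleted_but_open() {
  local tmp
  tmp=$(mktemp)
  exec 3>"$tmp"                   # hold the file open on fd 3
  head -c 1048576 /dev/zero >&3   # write 1 MiB through that descriptor
  rm -f -- "$tmp"                 # the name is gone, the blocks are not
  list_deleted_open_files "$BASHPID"
  exec 3>&-                       # closing the last fd frees the blocks
}
```

Pointing list_deleted_open_files at the PID of a suspect daemon, for example one still writing to a log file that was rotated away, shows which descriptors are pinning the space.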

Unix Internals

Areas

    fork()
    Paging, swapping. How to tell what's occurring (vmstat, etc.). Ill effects of paging.
    How typical FFS-based filesystems work: cylinder groups, inodes, data blocks, indirect blocks, superblocks.
    Log-, extent-, and journal-based filesystems.
    inodes and dirents.
    symlinks and hardlinks.
    The /proc filesystem.
    Block devices vs. raw devices.
    FIFOs, regular files, etc.
    sysctl and other tunables
    IPC
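The inode, hardlink, and symlink items above can be checked directly. A hardlink is a second directory entry for the same inode, so both names report the same inode number and the link count rises to 2. A symlink is a separate inode that merely stores a path. The following minimal sketch uses the POSIX ls -i and ls -l options, so it should behave the same on Linux and FreeBSD. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# inode_of PATH: inode number of PATH itself (-d: symlinks are not followed).
inode_of() { ls -di -- "$1" | awk '{ print $1 }'; }

# link_count PATH: number of directory entries (hard links) for the inode.
link_count() { ls -ld -- "$1" | awk '{ print $2 }'; }

demo_links() {
  local dir
  dir=$(mktemp -d)
  echo data > "$dir/orig"
  ln "$dir/orig" "$dir/hard"    # same inode, link count becomes 2
  ln -s orig "$dir/soft"        # new inode that stores the path "orig"
  echo "orig inode: $(inode_of "$dir/orig")  links: $(link_count "$dir/orig")"
  echo "hard inode: $(inode_of "$dir/hard")"
  echo "soft inode: $(inode_of "$dir/soft")"
  rm "$dir/orig"                # the data survives through "hard"...
  cat "$dir/hard"
  [ -e "$dir/soft" ] || echo "soft is now dangling"   # ...but "soft" breaks
  rm -r "$dir"
}
```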

Networking/DNS

Areas

    How DNS works. Recursive vs. iterative queries. Zones. Record types.
    Socket, bind, etc.
    nsswitch.conf, /etc/hosts, resolv.conf.
    Network performance tuning.
    SACK and interface aggregation.
    Deeper network topics like 802.1Q, core/aggregation|distribution/access switch infrastructure, BGP, OSPF, etc.
    Load-balancing techniques such as DSR, SNAT, etc.
    Health checking
    TCP packets, IP packets, Ethernet frames
    ARP and RARP, switches, routers.
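For the nsswitch.conf / /etc/hosts / resolv.conf item, the hosts: line in /etc/nsswitch.conf sets the order in which library lookups consult their sources. "files" means /etc/hosts, and "dns" means the nameservers listed in /etc/resolv.conf. getent hosts NAME exercises that path, whereas dig and nslookup query DNS servers directly and ignore /etc/hosts. The following minimal sketch only parses the two files. Both paths are optional arguments, so the sketch can be tried on copies. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# nss_hosts_order [FILE]: print the sources listed on the "hosts:" line.
nss_hosts_order() {
  awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print }' "${1:-/etc/nsswitch.conf}"
}

# resolv_nameservers [FILE]: print each nameserver the resolver will try.
resolv_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Comparing the output of getent hosts localhost with dig localhost shows the difference between the two paths.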

Hardware, RAID, misc.

Areas

    RAID levels.
    Difference between 32-bit and 64-bit architectures.
    Software RAID vs. Hardware RAID.
    SAN vs. NAS.
    …
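For the RAID-levels item, one quick way to compare levels is usable capacity for n identical disks of size s:

- RAID 0 stripes with no redundancy (n x s).
- RAID 1 mirrors (s).
- RAID 5 spends one disk's worth on parity ((n-1) x s).
- RAID 6 spends two ((n-2) x s).
- RAID 10 mirrors pairs ((n/2) x s).

The following minimal sketch encodes those rules with the commonly cited minimum disk counts. It ignores hot spares and controller overhead, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# raid_usable LEVEL DISKS SIZE: usable capacity, in the same unit as SIZE,
# for DISKS identical disks. Prints an error and returns 1 on bad input.
raid_usable() {
  local level=$1 n=$2 size=$3 min
  case "$level" in
    0|1)  min=2 ;;
    5)    min=3 ;;
    6|10) min=4 ;;
    *)    echo "unsupported RAID level: $level" >&2; return 1 ;;
  esac
  if (( n < min )) || { [[ $level == 10 ]] && (( n % 2 != 0 )); }; then
    echo "RAID $level needs at least $min disks (RAID 10: an even number)" >&2
    return 1
  fi
  case "$level" in
    0)  echo $(( n * size )) ;;        # striping, no redundancy
    1)  echo "$size" ;;                # every disk holds the same copy
    5)  echo $(( (n - 1) * size )) ;;  # one disk's worth of parity
    6)  echo $(( (n - 2) * size )) ;;  # two disks' worth of parity
    10) echo $(( n / 2 * size )) ;;    # striped mirror pairs
  esac
}
```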

Infrastructure and Architecture

Areas

    A typical three tier web site architecture
    Concerns when designing an architecture
    GSLB (akadns, brooklyn), failover, BCP.
    Capacity planning
    …
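At its simplest, capacity planning is arithmetic:

1. Divide the peak load by what one server sustains at a target utilization.
2. Round up.
3. Add spare servers so that losing one (N+1) still leaves enough capacity.

The following minimal sketch does this with integer math only. The parameters and example numbers are hypothetical. Real plans also have to account for growth and per-site failover.

```shell
#!/usr/bin/env bash
# servers_needed PEAK_QPS QPS_PER_SERVER TARGET_UTIL_PCT SPARES
# Computes ceil(PEAK / (PER_SERVER * UTIL / 100)) + SPARES.
servers_needed() {
  local peak=$1 per_server=$2 util_pct=$3 spares=$4
  local capacity=$(( per_server * util_pct ))               # usable QPS x 100
  local base=$(( (peak * 100 + capacity - 1) / capacity ))  # ceiling division
  echo $(( base + spares ))
}
```

With a hypothetical peak of 9000 QPS, 1000 QPS per server, a 70% utilization target, and one spare, servers_needed 9000 1000 70 1 gives 14: 13 servers to carry the peak plus one for N+1.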

Sunday, November 27, 2011

A problem I ran into when adjusting GlassFish's process pool number on Linux

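On Linux, the default limit on the number of files each process can open is 1024.
With its default process pool number of 5, GlassFish already has more than 500 files open,
so raising the process pool number to 500 pushes it past 1024 immediately, and the error message below appears.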

##
[#|2011-11-26T21:39:51.917+0800|SEVERE|glassfish3.0.1|grizzly|_ThreadID=18;_ThreadName=Thread-14;|doSelect IOException
java.io.IOException: Too many open files
       at sun.nio.ch.IOUtil.initPipe(Native Method)
       at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
       at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
       at java.nio.channels.Selector.open(Selector.java:209)
       at com.sun.grizzly.util.Utils.openSelector(Utils.java:100)
       at com.sun.grizzly.TCPSelectorHandler.initSelector(TCPSelectorHandler.java:399)
       at com.sun.grizzly.TCPSelectorHandler.preSelect(TCPSelectorHandler.java:379)
       at com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:183)
       at com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:130)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
       at java.lang.Thread.run(Thread.java:662)
|#]

##

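The fix is to raise the per-process open-file limit first.
Edit /etc/security/limits.conf and add the entries below. pam_limits applies this file to every new login session, so the higher limit also persists across reboots.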

##
* soft nofile 65536
* hard nofile 65536
##

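Then raise the soft limit of the current shell immediately with the command below. It only affects this shell and the processes started from it, so GlassFish has to be restarted from that shell. The value also cannot exceed the hard limit, which ulimit -Hn shows.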

##

ulimit -Sn 65536
##
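Linux exposes both the limit and the current usage of a running process, such as the GlassFish JVM, in /proc. /proc/<pid>/limits has a "Max open files" row (soft limit, then hard limit), and /proc/<pid>/fd has one entry per open descriptor. The following minimal sketch reads both, which shows whether the new limit took effect and how close the process is to it. Finding GlassFish's PID (for example with ps or jps) is left out. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# soft_nofile PID / hard_nofile PID: the "Max open files" limits of PID.
soft_nofile() { awk '/^Max open files/ { print $4 }' /proc/"$1"/limits; }
hard_nofile() { awk '/^Max open files/ { print $5 }' /proc/"$1"/limits; }

# open_fd_count PID: number of file descriptors PID currently has open.
open_fd_count() { ls -1 /proc/"$1"/fd | wc -l | tr -d ' '; }

# fd_report PID: one-line summary comparing usage against the limits.
fd_report() {
  printf 'pid %s: %s open, soft limit %s, hard limit %s\n' \
    "$1" "$(open_fd_count "$1")" "$(soft_nofile "$1")" "$(hard_nofile "$1")"
}
```

Run it as the same user as GlassFish, or as root. Otherwise /proc/<pid>/fd of another user's process is not readable.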

References:
http://www.cyberciti.biz/faq/howto-linux-get-list-of-open-files/
http://www.linuxidc.com/Linux/2010-04/25283.htm


Wednesday, November 16, 2011

Don't expect perfection; instead, figure out which flaws don't matter

Someone online shared a line they came across in Moneyball: "Don't expect perfection; instead, figure out which flaws don't matter."