NFC Tags and Nexus 4 Woes

For those who prefer fortune cookies to novels: “If you want a happy life, only buy NFC Tag Types 1, 2, 3, or 4. MIFARE Classic NFC tags don’t talk with all devices.”

A while back, when I was first researching NFC and NFC tags, I was a little disappointed to realize that typical NFC tags use a proprietary protocol called MIFARE Classic on top of the standard base NFC protocol.

"It uses an NXP proprietary security protocol...This means, only devices
with an NXP nfc controller chip can read/write these tags."

-- http://en.wikipedia.org/wiki/MIFARE#MIFARE_Classic

Well, it turns out we now have our first casualty of war: unlike previous devices such as the Nexus 7 and Galaxy Nexus, the Nexus 4 (and Nexus 10) do not use an NXP NFC controller chip. Instead, they use a Broadcom NFC controller chip, which doesn’t support NXP’s proprietary protocol. I’ve also heard that the NFC chips used in both Windows Phone 8 and BlackBerry devices do not support MIFARE Classic either.

Most other NFC communication seems to be pretty standardized. Android Beam relies on pure NFC Forum protocols with an Android-specific protocol on top, and it still works. PayPass also works through Google Wallet, as it’s based on existing, published EMV smartcard standards. I have successfully used both with my Nexus 4. However, if I want to use NFC tags with my Nexus 4, I need to stick with “standard” NFC Forum Type 1-4 tags, which are part of the official NFC specification.

Unfortunately, MIFARE family tags are much more common and might even be a little cheaper. They also tend to include more storage space: a 1K MIFARE Classic tag has 752 bytes of usable storage, whereas a typical NFC Forum Type 1 or 2 tag offers 96-144 bytes. However, the standard NFC Forum protocol supports tags up to 4096 bytes, and I have seen Topaz Type 1 tags as large as 512 bytes for sale.

Searching Amazon, I can find a lot of NFC tags, but it’s harder to reliably weed out the MIFARE Classic tags from the results. I’ve read that any tags claiming to be specifically for Windows Phone 8 are safe, since WP8 doesn’t support MIFARE Classic, but something in me just doesn’t like that solution. I have switched to purchasing my NFC tags from tagstand: any of their tags that don’t say MIFARE are standard NFC Forum tags, and I’ve successfully used their tags with my Nexus 4. Andy Tags also seems to be a good place to look, but I have not yet tried them out.

Tags labeled NTAG203, Topaz, DESFire, and FeliCa are standard NFC Forum tags and should work. Despite the branding, MIFARE Ultralight and MIFARE Ultralight C are also NFC Forum Type 2 compliant and should work; unlike MIFARE Classic, they don’t rely on NXP’s proprietary protocol. Some of the well-known proprietary MIFARE Classic tags to avoid include Samsung TecTiles, Tags for Droid, and the Movaluate tag.

Diagnosing a slow server

Our server at Alzatex has been acting slow for a while, but I had been unable to find a cause until now. With what I found out today, I’m beginning to think that we just have an old, slow hard drive in our RAID6. What do you think?

$ sudo hdparm -tT --direct /dev/sd[abcd]
/dev/sda:
 Timing O_DIRECT cached reads:   420 MB in  2.00 seconds = 209.87 MB/sec
 Timing O_DIRECT disk reads:  318 MB in  3.01 seconds = 105.60 MB/sec

/dev/sdb:
 Timing O_DIRECT cached reads:   492 MB in  2.00 seconds = 245.53 MB/sec
 Timing O_DIRECT disk reads:  268 MB in  3.10 seconds =  86.40 MB/sec

/dev/sdc:
 Timing O_DIRECT cached reads:   408 MB in  2.01 seconds = 203.34 MB/sec
 Timing O_DIRECT disk reads:  146 MB in  3.12 seconds =  46.76 MB/sec

/dev/sdd:
 Timing O_DIRECT cached reads:   478 MB in  2.01 seconds = 238.25 MB/sec
 Timing O_DIRECT disk reads:  272 MB in  3.01 seconds =  90.50 MB/sec

sdc’s cached read speed looks OK, but its raw disk read is less than half that of the next slowest drive. Next up, I tried looking at the S.M.A.R.T. attributes of each drive. I’ve trimmed the output to only the most interesting attributes.

$ sudo smartctl -A /dev/sda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   240   236   021    Pre-fail  Always       -       1000
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       23
  9 Power_On_Hours          0x0032   082   082   000    Old_age   Always       -       13603
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0

$ sudo smartctl -A /dev/sdb
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   239   236   021    Pre-fail  Always       -       1016
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       30
  9 Power_On_Hours          0x0032   067   067   000    Old_age   Always       -       24607
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0

$ sudo smartctl -A /dev/sdc
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       2
  3 Spin_Up_Time            0x0003   225   224   021    Pre-fail  Always       -       5741
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       48
  9 Power_On_Hours          0x0032   042   042   000    Old_age   Always       -       42601
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       2

$ sudo smartctl -A /dev/sdd
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   239   238   021    Pre-fail  Always       -       1025
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       37
  9 Power_On_Hours          0x0032   064   064   000    Old_age   Always       -       26475
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0

First, a quick review of S.M.A.R.T. attributes for the uninitiated. RAW_VALUE is the actual measured value of the attribute, such as degrees Celsius or hours powered on, whereas VALUE is the attribute normalized to a scale of 1-253. The normalized value allows easy analysis without much regard to the meaning of the attribute: 1 represents the worst case and 253 the best. RAW_VALUE, on the other hand, often increases as things get worse, as with a count of bad sectors. WORST is the lowest recorded VALUE. There are two flavors of attributes, pre-fail and old-age. When a pre-fail attribute’s VALUE crosses the manufacturer-defined threshold (THRESH), it means failure is imminent. Old-age attributes just indicate wear and tear and don’t usually have a meaningful threshold.
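Since a lower VALUE is always worse, it’s easy to scan for trouble mechanically. Here’s a quick one-liner, just a sketch that assumes smartctl’s usual column layout (VALUE and THRESH in the fourth and sixth fields), which flags any attribute whose normalized value has fallen to its threshold:

$ sudo smartctl -A /dev/sdc | awk '$1 ~ /^[0-9]+$/ && $6+0 > 0 && $4+0 <= $6+0 { print "ATTENTION:", $2, "VALUE", $4, "has reached THRESH", $6 }'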

After reviewing the S.M.A.R.T. attributes, there are definitely issues with sdc. It is the only drive showing a non-zero Raw_Read_Error_Rate as well as a non-zero Offline_Uncorrectable. Its Power_On_Hours, which really just indicates age, is nearly double that of the second-oldest drive. That’s not necessarily a problem, but what really worries me is that the Spin_Up_Time is much higher than on the other drives. The Start_Stop_Count is also the highest on sdc.

Through the normal wear and tear of a hard drive, sectors can go bad and become unusable. Modern hard drives normally keep a number of hidden, unused sectors reserved for this situation. When the drive controller detects that it can no longer reliably update a sector, it remaps that sector’s address to one of the unused reserved sectors instead. When a sector goes bad but there are no free reserved sectors left for remapping, you get an uncorrectable sector. So when Offline_Uncorrectable goes above zero, it means the drive has accumulated more bad sectors than the manufacturer originally reserved spares for, and those sectors can no longer be used.
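The remapping-related counters are worth reading together. On most drives the relevant attribute names are Reallocated_Sector_Ct (attribute 5), Current_Pending_Sector (197), and Offline_Uncorrectable (198), though names vary a bit by vendor; to pull just those rows:

$ sudo smartctl -A /dev/sdc | egrep 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'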

Currently, none of the attributes have crossed a failure threshold, and, interestingly, attributes 1 and 198 don’t appear to be reflected in the VALUE field. I also run regular, nightly S.M.A.R.T. tests on all drives, and no issues have been reported by them. There is also nothing in the system logs about any recent errors or warnings; the only symptom is that the server is, overall, a little sluggish. Here’s a snippet of the self-test log as retrieved from the hard drive itself:

$ sudo smartctl -l selftest /dev/sdc
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
...
# 8  Short offline       Completed without error       00%     42414         -
# 9  Extended offline    Completed without error       00%     42395         -
...

If there were any failed attributes or tests, I would normally get an email about it as well as have the event logged by syslog. I have the smartd daemon installed and configured to run regular tests with something similar to this:

$ cat /etc/smartd.conf
/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root
/dev/sdb -a -o on -S on -s (S/../.././02|L/../../6/03) -m root
/dev/sdc -a -o on -S on -s (S/../.././02|L/../../6/03) -m root
/dev/sdd -a -o on -S on -s (S/../.././02|L/../../6/03) -m root

This says to enable S.M.A.R.T. support (with automatic offline data collection and attribute autosave) and monitor hard drives /dev/sd[a-d], run a short test every night at 2 AM, and run an extended (long) test every Saturday at 3 AM (in smartd’s schedule regex, day-of-week 6 is Saturday). If there are any warnings or errors, report via syslog and send an email to root. I normally have most daemons send critical mail like this to root and make root an alias that redirects the email to all the primary system administrators.
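The alias itself is just an ordinary entry in /etc/aliases (the addresses below are placeholders), followed by a run of newaliases to rebuild the alias database:

$ cat /etc/aliases
# placeholder addresses: list the real sysadmins here
root: alice@example.com, bob@example.com
$ sudo newaliases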

Well, next up, I plan to take a look at running a full-fledged benchmark on the drives and replacing any drives that don’t quite meet expectations.
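In the meantime, a plain sequential read with dd should roughly reproduce the hdparm numbers. This sketch reads 4 GiB straight off the raw device with the page cache bypassed; since it only reads, it’s safe to run on a live array:

$ sudo dd if=/dev/sdc of=/dev/null bs=1M count=4096 iflag=direct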

Monitoring Dæmons with CFEngine 3

I’ve been looking for a nice, simple method for verifying that all key services and dæmons are running on my UNIX servers. It’s pretty rare for a service to die, but, when it does, I want it restarted as soon as possible and I want to be notified about it! I’ve looked at process supervisors like daemontools and runit, which are designed to run continuously and monitor the dæmons they start, but they tend to require a little more effort to maintain than I like. Normally, the sysadmin has to write a start-up script that sets up the environment and starts the dæmon, and that script must also somehow coerce the dæmon to skip its normal double-fork so it stays attached to its parent process and foreground terminal. I mainly just want a way to check for a process that’s not running and run the appropriate command to restart it.
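For a sense of the extra effort involved, here’s roughly what a minimal runit run script for OpenSSH looks like (just a sketch; paths vary by distribution):

#!/bin/sh
# runit expects the dæmon to stay in the foreground;
# sshd's -D flag prevents the usual detach and double-fork.
exec /usr/sbin/sshd -D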

CFEngine 3 seems to be a little closer to what I want and after reading through the excellent Learning CFEngine 3 book from O’Reilly, I think I’ve finally figured out the right recipe. All I want to do is to specify a process to look for and, if that process is not running, to specify a command to run that will restart the process. I would also like a report if a process ever needs to be restarted since that normally represents an abnormal event. Here’s the basic configuration I have to monitor a few services:

  any::
    "services[ssh][name]"           string => "OpenSSH";
    "services[ssh][process]"        string => "/usr/sbin/sshd";
    "services[ssh][restart]"        string => "/usr/sbin/invoke-rc.d ssh restart";

  web|dns::
    "services[ldap][name]"          string => "OpenLDAP";
    "services[ldap][process]"       string => "/usr/sbin/slapd";
    "services[ldap][restart]"       string => "/usr/sbin/invoke-rc.d slapd restart";
    "services[bind][name]"          string => "BIND";
    "services[bind][process]"       string => "/usr/sbin/named";
    "services[bind][restart]"       string => "/usr/sbin/invoke-rc.d bind9 restart";

  web::
    "services[apache][name]"        string => "Apache";
    "services[apache][process]"     string => "/usr/sbin/apache2";
    "services[apache][restart]"     string => "/usr/sbin/invoke-rc.d apache2 restart";

And that’s it! The above says that every host must be running OpenSSH, servers web and dns must be running OpenLDAP and BIND, and server web must also be running Apache. It also gives the name of the process to look for and the command necessary to restart the service if that process isn’t running. I just repeat the same three lines for each service I want to monitor and place the correct classes in front to select which servers will run those services. Here’s the full file to run this:

body common control
{
    bundlesequence => { "services" };
    inputs => { "cfengine_stdlib.cf" };
}

bundle agent services
{
vars:
  any::
    "services[ssh][name]"           string => "OpenSSH";
    "services[ssh][process]"        string => "/usr/sbin/sshd";
    "services[ssh][restart]"        string => "/usr/sbin/invoke-rc.d ssh restart";

  web|dns::
    "services[ldap][name]"          string => "OpenLDAP";
    "services[ldap][process]"       string => "/usr/sbin/slapd";
    "services[ldap][restart]"       string => "/usr/sbin/invoke-rc.d slapd restart";
    "services[bind][name]"          string => "BIND";
    "services[bind][process]"       string => "/usr/sbin/named";
    "services[bind][restart]"       string => "/usr/sbin/invoke-rc.d bind9 restart";

  web::
    "services[apache][name]"        string => "Apache";
    "services[apache][process]"     string => "/usr/sbin/apache2";
    "services[apache][restart]"     string => "/usr/sbin/invoke-rc.d apache2 restart";

  any::
    "services" slist => getindices("services");

processes:
    "$(services[$(services)][process])"
        restart_class => "service_$(services)_restart";

commands:
    "$(services[$(services)][restart])"
        classes => if_notkept("service_$(services)_failed"),
        ifvarclass => "service_$(services)_restart";

reports:
    "$(services[$(services)][name]) is not running, restarting..."
        ifvarclass => "service_$(services)_restart";

    "$(services[$(services)][name]) failed to start!"
        ifvarclass => "service_$(services)_failed";
}

All the configuration of each service takes place entirely in the vars: section.  The processes: section takes care of searching for each process from the services array and declaring a class if the process was not found.  The commands: section runs the appropriate command to restart the service if the corresponding class was set by processes:.  Normally, CFEngine is silent, but the reports: section will generate output that goes into an email and log file if the process needed to be restarted and if there were any errors restarting the process.
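Adding another service to monitor is just the same three lines under the appropriate class. For example, a hypothetical NTP entry (assuming the dæmon lives at /usr/sbin/ntpd and the init script is named ntp) would look like:

  any::
    # hypothetical example entry; adjust paths and init script name to taste
    "services[ntp][name]"           string => "NTP";
    "services[ntp][process]"        string => "/usr/sbin/ntpd";
    "services[ntp][restart]"        string => "/usr/sbin/invoke-rc.d ntp restart";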

This solution isn’t perfect. Unlike a real process supervisor, CFEngine does not get immediate notification when a process dies, much less know the cause of death, but it does offer reasonable response time. My example also doesn’t limit how often or how many times it will attempt to restart a dæmon, though CFEngine does have some built-in rate limiting. Overall, I’ve found this solution simple to maintain and scale, and it’s now live and running on my servers.
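As a postscript on the rate-limiting point: if restart attempts ever needed throttling, one option would be to add an action body to the commands: promise. This is just a sketch using if_elapsed from the standard library (its argument is in minutes):

commands:
    "$(services[$(services)][restart])"
        action => if_elapsed("60"),    # attempt a restart at most once per hour
        classes => if_notkept("service_$(services)_failed"),
        ifvarclass => "service_$(services)_restart";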