# Yushan status

## Why is yushan (a.k.a. lalashan) slow?

It may be slow only for you, or only for you in a particular browser session, or a particular preview session. Or it may be slow for a particular project.

If you started a slow job in a particular preview session, and you know what you did wrong, it may work to bail on the preview session. Copy your edits, open a new version of the page, and start editing again (without doing whatever you did to make it too slow).

If you have an error on a complicated page, WW will try to make all of your targets, hitting the error over and over and wasting your time. Use the "disable make" directive to stop making after the file you are focused on, like this:

__DISABLE_MAKE__

If it is doing something slow in the main part of your project (running or merging), you really do have to wait before working on that project.

Yushan itself could also be bogged down, but right now it is distributing jobs to different nodes, so this doesn't seem likely. Or one of our file systems might be full. There is some diagnostic information below.

## top

Top shows a snapshot of the load each processor thinks it's under. Yushan dumps reports for each node every 15 seconds.

Here is the current summary of the report:

top.summary
05:05:16
00: 0.53, 0.39, 0.41
01: 0.04, 0.07, 0.06
02: 0.00, 0.02, 0.05
03: 0.00, 0.02, 0.05
04: 0.00, 0.01, 0.05
05: 0.07, 0.06, 0.06
06: 0.12, 0.08, 0.06

Swap
00: 1048572 total,       80 free,  1048492 used.  3292752 avail Mem
01: 1048572 total,   999500 free,    49072 used. 14572460 avail Mem
02: 1048572 total,   996588 free,    51984 used. 14570252 avail Mem
03: 1048572 total,   994360 free,    54212 used. 14568972 avail Mem
04: 1048572 total,   997716 free,    50856 used. 14564092 avail Mem
05: 1048572 total,   989172 free,    59400 used. 14561432 avail Mem
06: 1048572 total,  1007348 free,    41224 used. 14550484 avail Mem
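
In the snapshot shown here, node 00 has only 80 of 1048572 swap units free, which is the kind of thing worth spotting quickly. The following is a minimal sketch, not part of the actual machinery, that reads the summary on standard input and prints the nodes whose free swap falls below a percentage of the total. The function name `low_swap_nodes` and the 10% default are assumptions made for illustration.

```shell
# Hypothetical helper: print node ids whose free swap is below PCT percent
# of total swap (PCT defaults to 10, an arbitrary illustrative threshold).
# Expects lines shaped like the Swap section of top.summary:
#   00: 1048572 total,       80 free,  1048492 used.  3292752 avail Mem
low_swap_nodes() {
    pct="${1:-10}"
    awk -v pct="$pct" '$3 == "total," && $5 == "free," {
        node = $1
        sub(/:$/, "", node)          # "00:" -> "00"
        if ($2 > 0 && $4 * 100 < $2 * pct) print node
    }'
}

# Usage (assuming top.summary is in the current directory):
#   low_swap_nodes < top.summary
#   low_swap_nodes 25 < top.summary
```

Load-average lines are ignored because they do not have `total,` and `free,` in the third and fifth fields.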

## df

df shows whether our disks are full. Every value in the Use% column should be below 90%; if one is not, please tell us about it.

df.out
Filesystem      1K-blocks       Used  Available Use% Mounted on
/dev/sda1        20511312    3398616   16047736  18% /
devtmpfs          8120224          0    8120224   0% /dev
tmpfs             8131412          0    8131412   0% /dev/shm
tmpfs             8131412          0    8131412   0% /sys/fs/cgroup
tmpfs             8131412     827872    7303540  11% /run
n0:/1          4227416064 3126241280 1101158400  74% /1
tmpfs             1626284          0    1626284   0% /run/user/10032

Simply clicking your browser's reload button may not update the snapshots above, because the wiki caches its pages.
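
The 90% rule for the df listing can also be checked mechanically. This is a minimal sketch under stated assumptions: the function name `full_filesystems` is hypothetical, and it assumes the standard six-column df layout shown above, with one filesystem per line and no spaces in mount points.

```shell
# Hypothetical helper: print "Use% mountpoint" for every filesystem whose
# usage is at or above LIMIT percent (LIMIT defaults to 90).
# Reads df-style output on standard input and skips the header line.
full_filesystems() {
    limit="${1:-90}"
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # "74%" -> "74"
        if (use + 0 >= limit + 0) print $5, $6
    }'
}

# Usage:
#   df | full_filesystems
#   full_filesystems 80 < df.out
```

No output means every filesystem is under the limit. Running `df -P` instead of plain `df` keeps each filesystem on a single line even when device names are long.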

## Machinery

How we make these files

Makefile
df.out: /proc/uptime
	df > $@

top.report:
	/bin/ln -s /home/etc/status/$@

top.summary: top.report top.pl
	$(PUSH)

### Offline

We generate top.report from yushan, because that's the machine that's supposed to poll all the nodes, but it's not running the working wiki. We have a directory /home/dushoff/bin/cron/status that takes files from here. Root uses cron to "make repeat" in that directory once every minute.

root.sh
#!/bin/sh
top -bn1 | head
/usr/local/bin/pdsh -a top -bn1

Makefile.offline
update:
	getprojectfile --filename=root.sh --wikiname=projects --project=Yushan_status
	getprojectfile --filename=Makefile.offline --wikiname=projects --project=Yushan_status outfile=Makefile

run:
	./root.sh > /home/etc/status/top.report
	chmod -R a+rX /home/etc/

repeat:
	$(MAKE) run; sleep 13
	$(MAKE) run; sleep 13
	$(MAKE) run; sleep 13
	$(MAKE) run

### Summary

This is the Perl script that makes the summary:

top.pl
use strict;
use 5.10.0;

my %stats;
my $time;

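while (<>){
	chomp;
	# Lines from the compute nodes carry a "yushan-nNN:" prefix; the head node is "00"
	my $node = "00";
	$node = $1 if s/yushan-n(..)://;

	# The first line read is top's header; keep just its time stamp
	unless (defined $time){
		$time = $_;
		$time =~ s/[^-]*-\s*//;
		$time =~ s/up.*//;
	}

	# Load line: keep only the three averages
	# (match pattern assumed from top's "load average:" header text)
	if (s/.*load average:\s*//){
		$stats{$node}->{load} = $_;
	}
	# Swap line: keep everything after "Swap:"
	if (s/.*Swap:\s*//){
		$stats{$node}->{swap} = $_;
	}
}

say "$time";
say "Load averages";
foreach (sort keys %stats){
	say "$_: $stats{$_}->{load}";
}

# Section header assumed from the "Swap" line in the summary listing
say "";
say "Swap";
foreach (sort keys %stats){
	say "$_: $stats{$_}->{swap}";
}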