Overview

This wiki will soon be retired. For up-to-date lab documentation, please visit: https://pages.github.berkeley.edu/demography/computing-lab

The PopSci/Demography Lab

The PopSci/Demography Lab is available for use both on site and remotely by all PopSci affiliates and their research personnel, including graduate and undergraduate assistants. At the core of the lab are several Unix servers with enough processing power and data storage to support projects with very large data sets, intensive computing demands, or large memory needs. These servers include three high-RAM machines (up to 1 TB), used for projects that must hold entire datasets in memory; two compute servers with 30-70 CPU cores; and a variety of smaller virtual machines used for specific services. All servers and workstations have access to shared network disk storage with local and cloud backup. Users access the system remotely through an SSH shell, through NoMachine’s NX service (an enhanced X11-based remote desktop application), or through a browser (RStudio). All of the systems present the user with an identical desktop and a unified file system.

The NX remote desktop can be accessed from any user's personal computer after installing a small free program (OS X, Windows, Linux, Android, iOS). It has been used successfully by scholars logging in from Europe, Asia and Latin America as well as by those on the other side of campus. Users can therefore run each project on the appropriate platform: demanding projects can run in a parallel processing environment, while smaller projects, spreadsheets and word processing can be done on the smaller servers or on personal laptops or workstations. This configuration minimizes maintenance costs and maximizes peak load capacity. An RStudio server and a Shiny server run on the network, allowing efficient development and deployment of R-based applications.

All servers and workstations run a recent Ubuntu release, patched continuously for security issues. This allows for nearly seamless interaction between the workstations and the servers. From the user’s standpoint, the entire network consists of a single unified file system of over 50 TB, and the X11 windowing system allows server-resident applications to be controlled from a workstation as if they were running locally. Key features of the Centers’ computing lab include:

  • The lab’s design stresses shareability of resources. Concentrating computing power in a small number of servers maximizes peak load capacity. The unified file system allows computing tasks to be performed on the most appropriate platform: file editing, email, graphics and small statistical jobs can be done on workstations, while large jobs can be shifted to a server without any need to change code. The lab's design also makes it easy to use one’s personal computer for editing while using a server for data analysis. Cloud storage/sync programs such as Dropbox allow users to read statistical results from Stata or R directly into spreadsheet programs running on their laptop computers.
  • The lab is reliably available, with servers now colocated in the campus data center at Warren Hall. Since the technical staff are also users, things that do not work are noticed and fixed quickly.
  • Users are well trained. All graduate students in Demography take a one-semester course that covers basic use of the PopSci/Demography Lab. In addition, a body of documentation explaining how to do most useful things is maintained on the web. Non-proprietary software introduced in D-Lab instruction to graduate students is also available within the Computing Lab.
  • All data and software are accessible remotely. Researchers working off site may use the system in almost the same manner as if they were working locally.

Confidential Data: As genomic and other personally identifiable data sources become available, security and confidentiality have become both more important and more burdensome for researchers. The lab provides a rare combination of flexibility and computing power that allows us to meet the security requirements of agencies such as dbGaP, while allowing researchers to employ significant computing power with considerable convenience. The features of the lab that allow us to meet these requirements follow from the private network agreement that we negotiated with Berkeley's central computing authorities (IST). Under this agreement, the Computing Lab directly controls:
    1. a locked rack holding our servers, in the secure campus data center;
    2. our network switching equipment; and
    3. our firewall.

This arrangement allows us to control physical access to our computing equipment, to isolate internal parts of our network, and to selectively block external connections. The ability to perform these functions without first securing bureaucratic clearance from other parts of campus gives us both speed and flexibility in meeting data security requirements. That we are able to do so while still allowing researchers to use powerful servers without necessarily being present on campus has proved very appealing.

The Director of Computing, Joshua Quan, is available for consultation on both technical computing issues and substantive demographic and statistical issues.