Stata

ARC Stata User Guide

Stata is a statistical software package used for data analysis, visualization, and automation. We offer both Stata/SE (Standard Edition) and Stata/MP (Multiprocessor Edition) on the HPC cluster. Stata/MP allows for parallel processing and is recommended for computationally intensive tasks.

Licensing Notice

Stata is licensed software. Every job that uses Stata must request the appropriate license, so make sure your batch script requests one before you submit it.

Available licenses include:

stata-mp@slurmdb
stata-se@slurmdb

Jobs that use Stata/MP must request a Stata/MP license in the sbatch script with the directive #SBATCH --license=stata-mp@slurmdb:1. Jobs that use Stata/SE request the stata-se@slurmdb license in the same way.
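Before submitting, it can help to see how many license seats are currently free. The sketch below is one possible way to do that, not an ARC-provided tool: the helper name free_licenses is hypothetical, and it assumes that the output of scontrol show licenses contains a LicenseName= field followed by a Free= field for each license, which may differ between Slurm versions.

```shell
# Hypothetical helper: print the number of free seats for one license.
# Reads the text produced by `scontrol show licenses` on standard input.
free_licenses() {
    awk -v name="$1" '
        # Track whether the current record is the license we asked for.
        /LicenseName=/ { hit = ($0 ~ ("LicenseName=" name "( |$)")) }
        # Within that record, print the number that follows "Free=".
        hit && match($0, /Free=[0-9]+/) {
            print substr($0, RSTART + 5, RLENGTH - 5)
            exit
        }
    '
}

# On a login node (assumes scontrol is on your PATH):
#   scontrol show licenses | free_licenses stata-mp@slurmdb
```

If the command prints 0, the job will wait in the queue until a license is released.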

Using Stata

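Stata is available on the cluster as a software module. Load the module for the edition you need, replacing {se,mp} with either se or mp (for example, module load stata-mp). Type only one edition: if the braces are entered literally, bash expands them and passes both module names to the module command.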

module load stata-{se,mp}

This module will set up the necessary environment variables and make the Stata executables available in your path.
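As a minimal sketch, the choice of edition can be wrapped in a small shell function so that exactly one module name is ever passed to module load. The function name stata_module is hypothetical; the module names are the two given above.

```shell
# Hypothetical helper: map an edition (se or mp) to its module name.
stata_module() {
    case "$1" in
        se|mp) echo "stata-$1" ;;
        *)     echo "unknown Stata edition: $1 (expected se or mp)" >&2
               return 1 ;;
    esac
}

# On the cluster:
#   module load "$(stata_module mp)"
```

Rejecting anything other than se or mp catches typos before the job reaches the queue.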

Submitting a Stata/MP Job

Stata jobs must be submitted through Slurm. The example batch script below runs a Stata/MP job on a compute node:

#!/bin/bash
#SBATCH --job-name=StataJob
#SBATCH --mail-user=<your email address>
#SBATCH --mail-type=BEGIN,END
#SBATCH --cpus-per-task=4
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=8000m
#SBATCH --time=02:00:00
#SBATCH --account=example
#SBATCH --partition=standard
#SBATCH --output=/home/%u/%x-%j.log
#SBATCH --license=stata-mp@slurmdb:1

module load stata-mp
stata-mp -b do myscript.do
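With the -b option, Stata writes its output to a log file named after the do-file (myscript.log for the script above) in the working directory. Because Stata's exit status may not reflect an error inside the do-file, one option is to scan that log for Stata return codes, which appear as lines of the form r(601);. The helper below is a sketch of that idea; the name check_stata_log is hypothetical.

```shell
# Hypothetical helper: fail if a Stata batch log contains an error code.
# $1: path to the log file written by `stata-mp -b do ...`
check_stata_log() {
    if grep -Eq '^r\([0-9]+\);' "$1"; then
        echo "Stata reported an error in $1:" >&2
        # Show the matching lines with their line numbers.
        grep -En '^r\([0-9]+\);' "$1" >&2
        return 1
    fi
}

# At the end of the batch script, after the stata-mp line:
#   check_stata_log myscript.log || exit 1
```

Exiting non-zero makes Slurm record the job as failed, which is easier to notice than an error buried in the log.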

Choosing Between Stata/SE and Stata/MP

  • Use stata-se for single-threaded execution:

    stata-se -b do myscript.do
  • Use stata-mp for multi-threaded execution (recommended for large datasets):

    stata-mp -b do myscript.do

    You can control the number of CPU threads used by Stata/MP using the following command within your Stata script:

    set processors 4

    Set this number to match the CPUs allocated in your Slurm script (--cpus-per-task=4 in the example above).
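To avoid keeping the thread count in two places, the batch script can generate a small wrapper do-file from the SLURM_CPUS_PER_TASK environment variable, which Slurm sets when --cpus-per-task is specified. This is a sketch only: the function name write_wrapper and the wrapper file name run_mp.do are hypothetical.

```shell
# Hypothetical helper: write a wrapper do-file that sets the Stata/MP
# thread count from the Slurm allocation and then runs the real do-file.
# $1: wrapper do-file to create, $2: do-file to run
write_wrapper() {
    ncpu="${SLURM_CPUS_PER_TASK:-1}"   # fall back to 1 outside a Slurm job
    printf 'set processors %s\ndo "%s"\n' "$ncpu" "$2" > "$1"
}

# In the batch script, after loading the stata-mp module:
#   write_wrapper run_mp.do myscript.do
#   stata-mp -b do run_mp.do
```

With this approach the batch log is named after the wrapper (run_mp.log) rather than after myscript.do.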

Example Stata Script

Below is an example myscript.do file:

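clear all                             // start from an empty session
set more off                          // do not pause long output
use mydata.dta                        // load the dataset
regress y x1 x2 x3                    // regress y on x1, x2, and x3
* outreg2 is a community-contributed command; if it is missing, install it first (ssc install outreg2)
outreg2 using results.xlsx, replace
exit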

This script loads a dataset, runs a regression, and saves the results to an Excel file.

For more details on Stata commands, refer to the Stata Documentation.

Version Control

The ARC software team routinely applies patch updates to both Stata/SE and Stata/MP. These updates most often change ado-files and documentation files; updates to the Stata executable are less frequent, and JVM updates are the least common. When a bug is discovered, all affected files are updated as part of the fix. Updates never change the major version number (e.g., Stata 17 does not become Stata 18), but a minor update may increment the version (e.g., to Stata 16.1). Details on fixes are available in the "Whatsnew" documentation:

  • Command: help whatsnew
  • Online: Stata Whatsnew

To ensure updates do not disrupt existing workflows, users should use the version command in their do-files to specify the desired Stata behavior, such as:

version 18.0

If an update changes Stata's behavior, the update itself cannot be reverted, but a do-file that declares an earlier version with the version command keeps the old behavior.
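Since this protection only applies to do-files that actually contain a version statement, a quick check before submitting can be useful. The sketch below uses a hypothetical helper, has_version_line, that looks for a line beginning with version followed by a number.

```shell
# Hypothetical helper: succeed if the do-file declares a Stata version.
# $1: path to the do-file
has_version_line() {
    grep -Eq '^[[:space:]]*version[[:space:]]+[0-9]+(\.[0-9]+)?' "$1"
}

# Before submitting:
#   has_version_line myscript.do || echo "myscript.do has no version line" >&2
```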

Last Updated: Thursday, February 20, 2025