Translation Syntax

Loading the Dataset

SPSS
GET FILE="/path/to/file.sav".
STATA
use "/path/to/file.dta"
SAS
LIBNAME myFolder "P:\QAC\QAC201\study name";
data new; set myFolder.filename;
R
load ("/path/to/file.Rdata")
myData_orig <- objectName
# If reading the data in from a tab-delimited text file instead:
myData_orig <- read.table(file = "/path/to/file.txt", sep = "\t", header = TRUE)

Selecting Variables

SPSS
* add this as a subcommand of SAVE OUTFILE; the outfile will contain only the specified variables.
/KEEP VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8.
STATA
use VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 ///
using "P:\QAC\qac201\Studies\study name\filename", clear
SAS
* put this code inside a data step;
KEEP VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8;
R
var.keep <- c("VAR1", "VAR2", "VAR3", "VAR4", "VAR5", "VAR6", "VAR7", "VAR8")
myData <- myData_orig[ ,var.keep]

Saving the Dataset

SPSS
SAVE OUTFILE= "/path/to/outfile.sav".
STATA
save "/path/to/outfile.dta"
SAS
LIBNAME myFolder "/path/to/outFolder";
data myFolder.outfile; set tempfile; by unique_id;
R
# write as csv
write.table(myData, file = "/path/to/outfile.csv", sep = ",", row.names = FALSE)
# write as .RData
save(file = "/path/to/outfile.RData", myData)

Sorting the Data

SPSS
SORT CASES BY var.
STATA
sort var
SAS
proc sort; by var;
R
myData <- myData[order(myData$var, decreasing = FALSE), ]

Data Management
Logical Operators

         Equal      Greater or equal   Less or equal   Greater    Less       Not equal
SPSS     EQ or =    >= or GE           <= or LE        > or GT    < or LT    ~= or NE
STATA    ==         >=                 <=              >          <          !=
SAS      EQ or =    >= or GE           <= or LE        > or GT    < or LT    ^= or NE
R        ==         >=                 <=              >          <          !=

Selecting Observations
When using large data sets, it is often necessary to subset the data so that you include only those observations that can help answer your particular research question. In these cases, you may want to select your own sample from within the survey’s sampling frame. For example, if you are interested in identifying demographic predictors of depression among Type II diabetes patients, you would subset the data to subjects who report Type II diabetes.

SPSS
*must be added as a command option.
/SELECT=diabetes2 EQ 1
STATA
// create a subset from the data
keep if diabetes2==1
// if running a procedure on a subset of the data (format: procedure [arguments] if [condition]). for example,
reg height weight if diabetes2==1
SAS
* inside the data step;
if diabetes2=1;
R
# create a subset of the data
myDataSubset <- myData[myData$diabetes2 == 1, ]
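
One detail worth checking when subsetting in R is how missing values in the selection variable behave. The sketch below uses a small made-up data frame (the `id` column and all values are hypothetical) to compare plain bracket indexing with two alternatives that drop rows where the condition is NA.

```r
# Hypothetical data: diabetes2 is missing for the third subject
myData <- data.frame(id = 1:5, diabetes2 = c(1, 0, NA, 1, 0))

# Plain logical indexing keeps a row of NAs where the condition is NA
sub_bracket <- myData[myData$diabetes2 == 1, ]

# which() returns only the positions where the condition is TRUE
sub_which <- myData[which(myData$diabetes2 == 1), ]

# subset() also drops rows where the condition is NA
sub_subset <- subset(myData, diabetes2 == 1)

print(nrow(sub_bracket))  # 3
print(nrow(sub_which))    # 2
print(nrow(sub_subset))   # 2
```

If the selection variable has no missing values, all three forms return the same rows.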

Missing Data
Often, you must define the response categories that represent missing data. For example, if the number 9 is used to represent a missing value, you must either designate in your program that this value represents missingness or else you must recode the variable into a missing data character that your statistical software recognizes. If you do not, the 9 will be treated as a real/meaningful value and will be included in each of your analyses.

SPSS
RECODE VAR1 (9=SYSMIS).
STATA
replace VAR1=. if VAR1==9
SAS
* inside the data step;
if VAR1=9 then VAR1=.;
R
myData$VAR1[myData$VAR1 == 9 ] <- NA
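
To see why this matters, the following R sketch uses a small made-up vector (the values are illustrative and not taken from any dataset) in which 9 stands for a missing response on a 1 to 5 scale. It compares the mean before and after recoding.

```r
# Hypothetical responses on a 1-5 scale; 9 is the code for "missing"
VAR1 <- c(1, 2, 9, 4, 9, 5)

# Treating 9 as a real value inflates the mean
mean_with_code <- mean(VAR1)

# Recode 9 to NA, then tell mean() to skip missing values
VAR1_clean <- VAR1
VAR1_clean[VAR1_clean == 9] <- NA
mean_clean <- mean(VAR1_clean, na.rm = TRUE)

print(mean_with_code)  # 5
print(mean_clean)      # 3
```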

Converting String to Numeric Variable
Before running statistical analyses in most software packages, make sure that every variable has numeric response categories rather than “string” or “character” categories (i.e. response categories stored as strings of characters and/or symbols). Variables with string responses must therefore be recoded into numeric values. These numeric values are known as dummy codes because they carry no direct numeric meaning.

SPSS
RECODE TREE ('Maple'=1) ('Oak'=2) INTO TREE_N.
STATA
generate TREE_N=.
replace TREE_N=1 if TREE=="Maple"
replace TREE_N=2 if TREE=="Oak"
// OR
encode TREE, gen(TREE_N)
SAS
* inside the data step;
if TREE='Maple' then TREE_N=1;
else if TREE= 'Oak' then TREE_N=2;
R
# Usually not necessary in R.
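
When numeric codes are wanted in R anyway, one possible approach is to convert the string variable to a factor and then extract the underlying integer codes. This is a minimal sketch on made-up data; the column names `TREE_F` and `TREE_N` are assumptions chosen for illustration.

```r
# Hypothetical string variable (made-up values)
myData <- data.frame(TREE = c("Maple", "Oak", "Oak", "Maple"),
                     stringsAsFactors = FALSE)

# State the category order explicitly so that Maple = 1 and Oak = 2
myData$TREE_F <- factor(myData$TREE, levels = c("Maple", "Oak"))

# Integer codes underlying the factor
myData$TREE_N <- as.integer(myData$TREE_F)

print(myData)
```

Listing the levels explicitly avoids relying on alphabetical ordering, which is what `factor()` uses by default.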

Collapsing Responses within a Variable
If a variable has many response categories, it can be difficult to interpret the statistical analyses in which it is used. Alternatively, there may be too few subjects or observations identified by one or more response categories to allow for a successful analysis. In these cases, you would need to collapse across categories. For example, if you have the following categories for geographic region, you may want to collapse some of these categories:
Region: New England=1, Middle Atlantic=2, East North Central=3, West North Central=4, South Atlantic=5, East South Central=6, West South Central=7, Mountain=8, Pacific=9.
New_Region: East=1, West=2.

SPSS
COMPUTE new_region=2.
IF (region=1|region=2|region=3|region=5|region=6) new_region=1.
STATA
generate new_region =2
replace new_region=1 if region==1|region==2|region==3|region==5|region==6
// OR
recode region (1/3 5 6=1) (4 7/9=2), gen(new_region)
SAS
* inside the data step;
if region=1 or region=2 or region=3 or region=5 or region=6 then new_region=1;
else if region=4 or region=7 or region=8 or region=9 then new_region=2;
R
# create the new variable first, then fill it in by condition
myData$new_region <- NA
condition1 <- myData$region == 1|myData$region == 2|myData$region == 3|myData$region == 5|myData$region == 6
myData$new_region[condition1] <- 1
condition2 <- myData$region == 4|myData$region == 7|myData$region == 8|myData$region == 9
myData$new_region[condition2] <- 2
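
The same East/West recode can be written more compactly with `%in%`, which also makes it easy to check the result against the original codes. The sketch below is one possible way to do this on made-up data; the vectors `east` and `west` are hypothetical helper names.

```r
# Hypothetical region codes 1-9, plus one missing value
myData <- data.frame(region = c(1, 4, 5, 9, 2, NA, 7))

east <- c(1, 2, 3, 5, 6)
west <- c(4, 7, 8, 9)

# Start from NA so that unmatched or missing regions stay missing
myData$new_region <- NA
myData$new_region[myData$region %in% east] <- 1
myData$new_region[myData$region %in% west] <- 2

# Cross-tabulate old against new codes to check the recode
print(table(myData$region, myData$new_region, useNA = "ifany"))
```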

Collapsing Responses Across Variables
In many cases, you will want to combine multiple variables into one. For example, while NESARC assesses several individual anxiety disorders, I may be interested in anxiety more generally. In this case I would create a general anxiety variable in which those individuals who received a diagnosis of social phobia, generalized anxiety disorder, specific phobia, panic disorder, agoraphobia, or obsessive compulsive disorder would be coded “yes” and those who were free from all of these diagnoses would be coded “no”.

SPSS
IF (socphob=1|gad=1|specphob=1|panic=1|agora=1|ocd=1) anxiety=1.
RECODE anxiety (SYSMIS=0).
STATA
gen anxiety=1 if socphob==1|gad==1|specphob==1|panic==1|agora==1|ocd==1
replace anxiety=0 if anxiety==.
SAS
* inside the data step;
if socphob=1 or gad=1 or specphob=1 or panic=1 or agora=1 or ocd=1 then anxiety=1;
else anxiety=0;
R
myData$anxiety <- NA
myData$anxiety[myData$socphob == 0&myData$gad==0&myData$specphob == 0&myData$panic == 0&myData$agora==0&myData$ocd == 0] <- 0
myData$anxiety[myData$socphob == 1|myData$gad==1|myData$specphob == 1|myData$panic == 1|myData$agora==1|myData$ocd == 1] <- 1
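
With many diagnosis variables, the long chains of `|` and `&` become hard to read. A possible alternative, sketched here on made-up data with only three of the diagnoses, counts "yes" and "no" answers per row with `rowSums()`. The helper names `dx`, `any_yes` and `all_no` are assumptions for illustration.

```r
# Hypothetical diagnoses (1 = yes, 0 = no); the last subject has one missing
myData <- data.frame(socphob = c(0, 1, 0, NA),
                     gad     = c(0, 0, 0, 0),
                     panic   = c(0, 0, 1, 0))
dx <- c("socphob", "gad", "panic")

# TRUE if at least one diagnosis is coded 1
any_yes <- rowSums(myData[, dx] == 1, na.rm = TRUE) > 0
# TRUE only if every diagnosis is observed and coded 0
all_no <- rowSums(myData[, dx] == 0, na.rm = TRUE) == length(dx)

myData$anxiety <- NA
myData$anxiety[all_no] <- 0
myData$anxiety[any_yes] <- 1

print(myData)
```

Under this rule, a subject with no positive diagnosis but one missing diagnosis stays NA rather than being coded "no".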

Creating Index or Score
If you are working with a number of items that represent a single construct, it may be useful to create a composite variable/score. For example, I want to use a list of nicotine dependence symptoms meant to address the presence or absence of nicotine dependence (e.g. tolerance, withdrawal, craving, etc.). Rather than using a dichotomous variable (i.e. nicotine dependence present/absent), I want to examine the construct as a dimensional scale (i.e. number of nicotine dependence symptoms). In this case, I would want to recode each symptom variable so that yes=1 and no=0 and then sum the items so that they represent one composite score.

SPSS
COMPUTE nd_sum=SUM(nd_symptom1, nd_symptom2, nd_symptom3, nd_symptom4).
STATA
egen nd_sum=rowtotal(nd_symptom1 nd_symptom2 nd_symptom3 nd_symptom4)
SAS
* inside the data step;
nd_sum=sum(of nd_symptom1 nd_symptom2 nd_symptom3 nd_symptom4);
R
myData$nd_sum <- myData$nd_symptom1+myData$nd_symptom2+myData$nd_symptom3+myData$nd_symptom4
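
Missing items are handled differently across these snippets: the SPSS and SAS `sum` functions add up the non-missing items, Stata's row total treats missing items as zero, and adding columns with `+` in R returns NA whenever any item is missing. The sketch below, on made-up symptom data, shows both behaviors in R using `rowSums()`; the names `nd_sum_strict` and `nd_sum_avail` are hypothetical.

```r
# Hypothetical symptom items (1 = yes, 0 = no); subject 3 has one missing item
myData <- data.frame(nd_symptom1 = c(1, 0, 1),
                     nd_symptom2 = c(1, 0, NA),
                     nd_symptom3 = c(0, 0, 1),
                     nd_symptom4 = c(1, 0, 1))
items <- c("nd_symptom1", "nd_symptom2", "nd_symptom3", "nd_symptom4")

# Strict score: NA if any item is missing (same result as adding with +)
myData$nd_sum_strict <- rowSums(myData[, items])

# Lenient score: sum of the items that were answered
myData$nd_sum_avail <- rowSums(myData[, items], na.rm = TRUE)

print(myData[, c("nd_sum_strict", "nd_sum_avail")])
```

Which rule is appropriate depends on the scale; the point is to choose one deliberately.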

Labeling Variables
Given the often cryptic names that variables are given, it can sometimes be useful to label them.

SPSS
VARIABLE LABELS VAR1 'label'.
STATA
label variable VAR1 "label"
SAS
* inside the data step;
LABEL VAR1='label';
R
# no built-in label tags for variables
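
Although base R has no dedicated variable-label facility, a free-form attribute can serve the same purpose. This is a minimal sketch on made-up data; the attribute name "label" is a convention rather than a base R feature, and base R output will not display it on its own.

```r
# Hypothetical variable with a cryptic name
myData <- data.frame(VAR1 = c(1, 2, 3))

# Attach a descriptive label as an attribute of the column
attr(myData$VAR1, "label") <- "Hypothetical description of VAR1"

# Retrieve the label later
print(attr(myData$VAR1, "label"))
```

Attributes set this way can be dropped by some operations that rebuild the column, so it is worth re-checking the label after heavy data management.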

Renaming Variables
Given the often cryptic names that variables are given, it can sometimes be useful to give a variable a new name (something that is easier for you to remember or recognize).

SPSS
* renames the variable in place.
RENAME VARIABLES (VAR1=newvarname).
STATA
rename VAR1 newvarname
SAS
* inside the data step;
RENAME VAR1=newvarname;
R
names(myData)[names(myData)== "VAR1"] <- "newvarname"

Labeling Variable Responses/Values
Because nominal and ordinal variables have, or are given, numeric response values (i.e. dummy codes), it is useful to label those values so that the labels are displayed in your output.

SPSS
VALUE LABELS VAR1 0 'value0label' 1 'value1label' 2 'value2label' 3 'value3label'.
STATA
label define labelName 0 "value0label" 1 "value1label" 2 "value2label" 3 "value3label"
label values VAR1 labelName
SAS
* Set up format before the data step;
proc format; VALUE FORMATNAME 0="value0label" 1="value1label" 2="value2label" 3="value3label";
data myData; set myData;
* other data management procedures;
format VAR1 FORMATNAME.;
run;
R
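# value labels are stored as factor levels, so convert the variable to a factor first
myData$VAR1 <- factor(myData$VAR1)
# get order of the values
levels(myData$VAR1)
# input the labels in the same order as the values were printed above
levels(myData$VAR1) <- c("value0label", "value1label", "value2label", "value3label")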
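
An alternative that does not depend on the printed order of the levels is to state the values and their labels together when creating the factor. The sketch below uses made-up data and writes the result to a new, hypothetical column `VAR1_F` so that the numeric codes are kept.

```r
# Hypothetical numeric codes 0-3
myData <- data.frame(VAR1 = c(0, 2, 1, 3, 0))

# Pair each value with its label explicitly
myData$VAR1_F <- factor(myData$VAR1,
                        levels = c(0, 1, 2, 3),
                        labels = c("value0label", "value1label",
                                   "value2label", "value3label"))

# Frequencies are now reported with the labels
print(table(myData$VAR1_F))
```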

Univariate Analysis
Categorical Variables (frequency)

SPSS
FREQUENCIES VARIABLES=CategVar1 CategVar2 CategVar3
/ORDER=ANALYSIS.
STATA
tab1 CategVar1 CategVar2 CategVar3
SAS
proc freq; tables CategVar1 CategVar2 CategVar3;
R
library(descr) # install library if needed
freq(as.ordered(myData$CategVar1))
freq(as.ordered(myData$CategVar2))
freq(as.ordered(myData$CategVar3))

Categorical Variables (Plot)

SPSS
FREQUENCIES VARIABLES=CategVar1 CategVar2 CategVar3
/BARCHART FREQ
/ORDER=ANALYSIS.
STATA
graph bar, over(CategVar)
SAS
proc gchart;
    VBAR CategVar/ Discrete type=PCT Width=30;
R
library(ggplot2)
ggplot(data=myData)+
    geom_bar(aes(x=CategVar))+
    ggtitle("Descriptive Title")

Quantitative Variables (mean, sd, etc)

SPSS
DESCRIPTIVES VARIABLES=QuantVar1 QuantVar2 QuantVar3
/STATISTICS=MEAN STDDEV.
STATA
summarize QuantVar1 QuantVar2 QuantVar3
SAS
proc means; var QuantVar1 QuantVar2 QuantVar3;
R
# Repeat for each variable
summary(myData$QuantVar1)
mean(myData$QuantVar1, na.rm = TRUE)
sd(myData$QuantVar1, na.rm = TRUE)

Quantitative Variables (Plot)

SPSS
GRAPH
/HISTOGRAM=QuantVar.
STATA
histogram QuantVar
SAS
proc GCHART; VBAR QuantVar;
R
ggplot(data=myData)+
    geom_histogram(aes(x=QuantVar))+
    ggtitle("Descriptive Title Here")

Bivariate Analysis
Categorical-Categorical (crosstabs)

SPSS

CROSSTABS
/TABLES=CategResponseVar by CategExplanatoryVar
/CELLS COUNT ROW COLUMN TOTAL.

STATA

tab CategResponseVar CategExplanatoryVar, column

SAS
* numbers;
proc freq; tables CategResponseVar*CategExplanatoryVar;

R
# numbers
tab1 <- table(myData$CategResponseVar, myData$CategExplanatoryVar)
tab1_colProp <- prop.table(tab1, 2) # column proportions
tab1_rowProp <- prop.table(tab1, 1) # row proportions
tab1_cellProp <- prop.table(tab1) # cell proportions

Categorical-Categorical (Plot)

SPSS
* visualization: use GUI point-and-click.
STATA

// visualization to show frequencies
ssc install catplot
catplot CategResponseVar CategExplanatoryVar

// visualization to show percents
catplot CategResponseVar CategExplanatoryVar, percent

SAS
proc GCHART; vbar CategExplanatoryVar / subgroup = CategResponseVar;
R
# Data Preparation Work for Graph
tab1 <- table(myData$CategExplanatoryVar, myData$CategResponseVar)
tab1_rowProp <- prop.table(tab1, 1) # row proportions

library(reshape2)
graph_data<-melt(data.frame(tab1_rowProp))
names(graph_data)<-c("ExplanatoryVarName","ResponseVarName", "Freq","Proportion")

#Visualization

ggplot(data=graph_data) +
    geom_bar(aes(x=ExplanatoryVarName, y=Proportion, fill=ResponseVarName),
        position="fill", stat="identity") +
    ylab("Proportion of Subjects at each Response Level within each group") +
    ggtitle("Informative Title Here")

Categorical-Quantitative (means by group)

SPSS
* numbers.
MEANS TABLES= QuantResponseVar by CategExplanatoryVar
/CELLS MEAN COUNT STDDEV.
STATA

bys CategExplanatoryVar: su QuantResponseVar

SAS
proc sort; by CategExplanatoryVar;
proc means; var QuantResponseVar;
by CategExplanatoryVar;


R
# Numerical Summaries
by(myData$QuantResponseVar, myData$CategExplanatoryVar, mean, na.rm = TRUE)
by(myData$QuantResponseVar, myData$CategExplanatoryVar, sd, na.rm = TRUE)
by(myData$QuantResponseVar, myData$CategExplanatoryVar, length)

Categorical-Quantitative (Plot)

SPSS
* visualization: use GUI point-and-click.
STATA
graph box QuantResponseVar, over(CategExplanatoryVar)
SAS
proc gchart; vbar CategExplanatoryVar /discrete type=mean sumvar=QuantResponseVar;


R

# Option 1: Bar plot
ggplot(data=myData)+
    stat_summary(aes(x=CategExplanatoryVar, y=QuantResponseVar),
        fun=mean, geom="bar")

# Option 2: Boxplot
ggplot(data=myData)+
    geom_boxplot(aes(x=CategExplanatoryVar, y=QuantResponseVar))+
    ggtitle("Descriptive Title Here")

Quantitative-Quantitative (plot)

SPSS
* visualization.
GRAPH
/scatterplot(bivar)=QuantExplanatoryVar with QuantResponseVar.
STATA
// visualization
twoway (scatter QuantResponseVar QuantExplanatoryVar) (lfit QuantResponseVar QuantExplanatoryVar)
SAS
* visualization;
proc gplot; plot QuantResponseVar*QuantExplanatoryVar;
R
ggplot(data=myData)+
    geom_point(aes(x=QuantExplanatoryVar, y=QuantResponseVar))+
    geom_smooth(aes(x=QuantExplanatoryVar, y=QuantResponseVar),
        method="lm")

Multivariate (bivariate, by subpopulation (third variable – categorical))
Categorical-Categorical (crosstabs) by Third Variable

SPSS
CROSSTABS
/TABLES=CategResponseVar BY CategExplanatoryVar BY CategThirdVar.
STATA
bys CategThirdVar: tab CategResponseVar CategExplanatoryVar, column
SAS
proc sort; by CategThirdVar;
proc freq; tables CategResponseVar*CategExplanatoryVar;
by CategThirdVar;

R
# numbers
tab1 <- ftable(myData$CategResponseVar, myData$CategExplanatoryVar, myData$CategThirdVar)
tab1_rowProp <- prop.table(tab1, 1)
tab1_colProp <- prop.table(tab1, 2)
tab1_cellProp <- prop.table(tab1)

Categorical-Categorical (Plot) by Third Variable

SPSS
* visualization: use GUI point-and-click.
STATA
// visualization to show percents
ssc install catplot
catplot CategResponseVar CategExplanatoryVar, percent over(CategThirdVar)
SAS
proc gchart; vbar CategExplanatoryVar /discrete type=mean sumvar=CategResponseVar;
by CategThirdVar;
R
# Data Preparation for Visualization
tab1 <- ftable(myData$CategExplanatoryVar, myData$CategThirdVar, myData$CategResponseVar)
tab1_rowProp <- prop.table(tab1, 1)
library(reshape2)
graph_data<-melt(data.frame(tab1_rowProp))
names(graph_data)<-c("ExplanatoryVarName",
                "ThirdVarName","ResponseVarName","Freq","Proportion")

# Plot
ggplot(data=graph_data) +
    geom_bar(aes(x=ExplanatoryVarName, y=Proportion, fill=ResponseVarName),
    position="fill", stat="identity") +
    facet_wrap( ~ThirdVarName) +
    ylab("Proportion of Subjects at each Response Level within each group") +
    ggtitle("Informative Title Here")

Categorical-Quantitative (means by group) by Third Variable

SPSS
MEANS TABLES= QuantResponseVar BY CategExplanatoryVar BY CategThirdVar
/CELLS MEAN COUNT STDDEV.
STATA
bys CategExplanatoryVar CategThirdVar: su QuantResponseVar
SAS
proc sort; by CategExplanatoryVar CategThirdVar;
proc means; var QuantResponseVar;
by CategExplanatoryVar CategThirdVar;
R
ftable(by(myData$QuantResponseVar, list(myData$CategExplanatoryVar,
        myData$CategThirdVar), mean, na.rm = TRUE))

Categorical-Quantitative (Plot) by Third Variable

SPSS
* visualization: use GUI point-and-click.
STATA
graph box QuantResponseVar, over(CategExplanatoryVar) over(CategThirdVar)
SAS
proc sort; by CategExplanatoryVar CategThirdVar;
proc sgplot;
    vbox QuantResponseVar / category=CategExplanatoryVar group=CategThirdVar;
    xaxis label="Description of Category Variable";
    keylegend / title="Description of Group Variable"; run;
R
ggplot(data=myData)+
    geom_boxplot(aes(x=CategExplanatoryVar, y=QuantResponseVar))+
    facet_grid(.~CategThirdVar)+
    ggtitle("Descriptive Title Here")

Quantitative-Quantitative (Plot) by Third Variable

SPSS
* numbers.
SORT CASES BY CategThirdVar.
SPLIT FILE LAYERED BY CategThirdVar.
CORRELATIONS
/VARIABLES=QuantResponseVar QuantExplanatoryVar
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.
SPLIT FILE OFF.
* visualization.
SORT CASES BY CategThirdVar.
SPLIT FILE LAYERED BY CategThirdVar.
GRAPH
/SCATTERPLOT(BIVAR)=QuantExplanatoryVar WITH QuantResponseVar
/MISSING=LISTWISE.
SPLIT FILE OFF.
STATA
// visualization
twoway (scatter QuantResponseVar QuantExplanatoryVar) (lfit QuantResponseVar QuantExplanatoryVar), by(CategThirdVar)
SAS
* visualization;
proc sort; by CategThirdVar;
proc sgscatter; plot QuantResponseVar*QuantExplanatoryVar/
group=CategThirdVar reg=(clm degree=1) grid; run;
R
ggplot(data=myData)+
    geom_point(aes(x=QuantExplanatoryVar, y=QuantResponseVar))+
    geom_smooth(aes(x=QuantExplanatoryVar, y=QuantResponseVar), method="lm")+
    facet_grid(. ~ CategThirdVar)

Hypothesis Testing
Categorical-Categorical (chi-square)

SPSS
CROSSTABS
/TABLES= CategResponseVar by CategExplanatoryVar
/STATISTICS=CHISQ.
STATA
tab CategResponseVar CategExplanatoryVar, chi2 row col
// Post-hoc test of which explanatory levels vary (suppose we are interested in comparing levels 1 and 3 below):
tab CategResponseVar CategExplanatoryVar if CategExplanatoryVar==1|CategExplanatoryVar==3, chi2
//or check Pearson residuals
ssc install tab_chi
tabchi CategResponseVar CategExplanatoryVar, pearson
SAS
proc freq; tables CategResponseVar*CategExplanatoryVar/ chisq;
R
myChi <- chisq.test(myData$CategResponseVar, myData$CategExplanatoryVar)
myChi
myChi$observed # for actual, observed cell counts
prop.table(myChi$observed, 2) # for column percentages
prop.table(myChi$observed, 1) # for row percentages
## Post-hoc test of which explanatory levels vary.
library(fifer)
myChi <- chisq.test(myData$CategResponseVar, myData$CategExplanatoryVar)
observed_table <- myChi$observed
chisq.post.hoc(observed_table, popsInRows=FALSE, control="bonferroni")
## Or check Pearson Residuals
myChi$residuals

Quantitative-Categorical (anova)

SPSS
UNIANOVA QuantResponseVar BY CategExplanatoryVar.
* for post-hoc test add the following options to the UNIANOVA command.
UNIANOVA QuantResponseVar BY CategExplanatoryVar
/POSTHOC=CategExplanatoryVar (TUKEY)
/PRINT=ETASQ DESCRIPTIVE.
STATA
oneway QuantResponseVar CategExplanatoryVar, tabulate
// for post-hoc test add the `sidak` option to oneway command
oneway QuantResponseVar CategExplanatoryVar, tabulate sidak
SAS
proc anova; class CategExplanatoryVar;
model QuantResponseVar = CategExplanatoryVar; means CategExplanatoryVar;
* for post-hoc test add the `duncan` option to proc anova command;
proc anova; class CategExplanatoryVar;
model QuantResponseVar = CategExplanatoryVar; means CategExplanatoryVar /duncan;
R
myAnovaResults <- aov(QuantResponseVar ~ CategExplanatoryVar, data = myData)
summary(myAnovaResults)
# for post-hoc test
myAnovaResults <- aov(QuantResponseVar ~ CategExplanatoryVar, data = myData)
TukeyHSD(myAnovaResults)

Quantitative-Quantitative (pearson correlation)

SPSS
CORRELATIONS
/VARIABLES= QuantResponseVar QuantExplanatoryVar
/STATISTICS DESCRIPTIVES.
STATA
corr QuantResponseVar QuantExplanatoryVar
//OR
pwcorr QuantResponseVar QuantExplanatoryVar, sig
SAS
proc corr; var QuantResponseVar QuantExplanatoryVar;
R
cor.test(myData$QuantResponseVar, myData$QuantExplanatoryVar)

Moderation by a third variable
Categorical-Categorical (chi-square)

SPSS
CROSSTABS
/TABLES = CategResponseVar by CategExplanatoryVar by CategThirdVar
/CELLS = COUNT ROW
/STATISTICS = CHISQ.
STATA
bys CategThirdVar: tab CategResponseVar CategExplanatoryVar, chi2 row
SAS
proc sort; by CategThirdVar;
proc freq; tables CategResponseVar*CategExplanatoryVar/chisq;
by CategThirdVar;
R
by(myData,
   myData$CategThirdVar,
   function(x) {
     myChi <- chisq.test(x$CategResponseVar, x$CategExplanatoryVar)
     list(myChi, myChi$observed, prop.table(myChi$observed, 2))
   })

Quantitative-Categorical (anova)
Note: the following code snippets have the post-hoc options built-in

SPSS
SORT CASES BY CategThirdVar.
SPLIT FILE LAYERED BY CategThirdVar.
ONEWAY QuantResponseVar BY CategExplanatoryVar
/STATISTICS DESCRIPTIVES
/POSTHOC = BONFERRONI ALPHA (0.05).
SPLIT FILE OFF.
STATA
bys CategThirdVar: oneway QuantResponseVar CategExplanatoryVar, tab sidak
SAS
proc sort; by CategThirdVar;
proc anova; class CategExplanatoryVar;
model QuantResponseVar=CategExplanatoryVar;
means CategExplanatoryVar /duncan;
by CategThirdVar;
R
by(myData,
   myData$CategThirdVar,
   function(x) {
     fit <- aov(QuantResponseVar ~ CategExplanatoryVar, data = x)
     list(summary(fit), TukeyHSD(fit))
   })

Quantitative-Quantitative (pearson correlation)

SPSS
SORT CASES BY CategThirdVar.
SPLIT FILE LAYERED BY CategThirdVar.
CORRELATIONS
/VARIABLES= QuantResponseVar QuantExplanatoryVar
/STATISTICS DESCRIPTIVES.
SPLIT FILE OFF.
STATA
bys CategThirdVar: corr QuantResponseVar QuantExplanatoryVar
//OR
bys CategThirdVar: pwcorr QuantResponseVar QuantExplanatoryVar, sig
SAS
proc sort; by CategThirdVar;
proc corr; var QuantResponseVar QuantExplanatoryVar;
by CategThirdVar;

R
by(myData,
myData$CategThirdVar,
function(x) cor.test(x$QuantResponseVar, x$QuantExplanatoryVar))

Regression
Simple

SPSS
* note if explanatory var is categorical, make sure that the variable is type `nominal`.
REGRESSION
/DEPENDENT QuantResponseVar
/METHOD ENTER ExplanatoryVar.
STATA
//if explanatory var is quantitative
reg QuantResponseVar QuantExplanatoryVar
//if explanatory var is categorical
reg QuantResponseVar i.CategExplanatoryVar
SAS
* if explanatory var is quantitative;
proc glm;
model QuantResponseVar=QuantExplanatoryVar /solution;
* if explanatory var is categorical;
proc glm; class CategExplanatoryVar;
model QuantResponseVar=CategExplanatoryVar /solution;
R
# if explanatory var is quantitative
my.lm <- lm(QuantResponseVar ~ QuantExplanatoryVar, data = myData)
summary(my.lm)
# if explanatory var is categorical
my.lm <- lm(QuantResponseVar ~ factor(CategExplanatoryVar), data = myData)
summary(my.lm)

Logistic

SPSS
* note if explanatory var is categorical, make sure that the variable is type `nominal`.
LOGISTIC REGRESSION BinaryResponseVar with ExplanatoryVar ThirdVar1 ThirdVar2.
STATA
// for all categorical predictors, add `i.` before the variable name (e.g. i.race)
logistic BinaryResponseVar ExplanatoryVar ThirdVar1 ThirdVar2
SAS
* list all categorical variables in the model under the class subcommand (e.g. CategThirdVar);
proc logistic;
class CategThirdVar;
model BinaryResponseVar(ref="referenceGroup") = ExplanatoryVar CategThirdVar QuantThirdVar;
R
# if a categorical variable is encoded as numeric, wrap it in factor() (e.g. factor(ExplanatoryVar3) )
my.logreg <- glm(BinaryResponseVar ~ ExplanatoryVar1 + ExplanatoryVar2 +
                ExplanatoryVar3, data = myData, family = "binomial")
summary(my.logreg) # for p-values
exp(my.logreg$coefficients) # for odds ratios
exp(confint(my.logreg)) # for confidence intervals on the odds ratios

Multiple regression

SPSS
* note if explanatory var is categorical, make sure that the variable is type `nominal`.
REGRESSION
/DEPENDENT QuantResponseVar
/METHOD ENTER ExplanatoryVar ExtraVar1 ExtraVar2.
STATA
//if a predictor var is categorical, add `i.`.
reg QuantResponseVar i.CategExplanatoryVar i.CategExtraVar1 QuantExtraVar2
//to incorporate a moderator (statistical interaction term) in your model add `#` between the two terms
// add `i.` for categorical terms in the interaction and `c.` for quantitative terms in the interaction.
reg QuantResponseVar i.CategExplanatoryVar i.CategExtraVar1 QuantExtraVar2 i.CategExplanatoryVar#c.QuantExtraVar2
SAS
* if a predictor var is categorical, add to `class`;
proc glm;
class CategExplanatoryVar;
model QuantResponseVar=CategExplanatoryVar ExtraVar1 /solution;
R
# if a predictor var is categorical, wrap the var with factor() (e.g. factor(ExplanatoryVar3) )
my.lm <- lm(QuantResponseVar ~ ExplanatoryVar1 + ExplanatoryVar2 +
                ExplanatoryVar3, data = myData)
summary(my.lm)

# to incorporate a statistical interaction between two of your explanatory variables
my.lm <- lm(QuantResponseVar ~ ExplanatoryVar1 + ExplanatoryVar2 +
                ExplanatoryVar1*ExplanatoryVar2, data = myData)
summary(my.lm)