Resources for Programming and manipulating financial datasets
This page provides program routines (mostly in SAS) that I have used, however it is neither a primer on SAS programming nor is it a manual. Rather, it provides specific links to set of tools that could help in manipulating or computing/creating datasets that are used in finance and accounting research. I provide the links from publicly available sources and also upload some of the programs that I might have written from scratch or edited from an existing program. I cannot guarantee the veracity of such programs. However, I make sure to test the programs or the datasets before I update here. I hope these resources would be helpful . Should you find this page useful or have any comments please feel free to email me
Difference in shares outstanding between CRSP and COMPUSTAT.
CRSP and COMPUSTAT are the most commonly used databases by a financial researcher. There is often confusion between variables under either of these dtatabases
This article provides a detailed description of calculating market value of equity with CRSP and Compustat database.
Fixed Effects regression
Everyday life of a researcher in corporate finance and asset pricing involves estimating fixed effects regressions on panel datasets. Here is a great resource by Gormley and Masta (RFS, 2014)on how to estimate the fixed effects regressions. They also provide SAS/STATA commands and a detailed review (PhD lecture slides). See also this interesting article.
Wondering how to interpret the results from 2SLS/3SLS , IV regressions ?
Here is a nice article by STATA corp, explaining how to overcome the missing/negative R
Probability of Informed Trading: Estimating PIN
Prof. Stephen Brown's website provides a detailed description along with PIN measures and SAS programs to compute PIN using TAQ data. To know more about the basic PIN (by Easely et al. 1996) and PINs computed using the Venter and DeJong model (Venter, J.H., de Jongh, D., 2006. Extending the ekop model to estimate the probability of informed trading. Studies in Economics and Econometrics 30, 25-39)
Computing number of business segments using COMPUSTAT
Ever wondered how to compute the number of Business segments of North American firms as in Fich and Shivdasani (2006) ? Here is a great video by WRDS, in addition this SAS macro will help you to compute the number of business segments a firm has on an annual basis. For convenience I have copied the SAS macro from WRDS.US repository.
/*
Macro: nSegments.sas
Created by: Joost Impink
Date: March 2014
Macro to count the number of segments (either industry segments or geo segments)
Variables required:
dsin dataset in (needs to have gkvey, fyear and datadate)
dsout dataset out
type 'ind' (default) or 'geo'
unique 'true' (default) or 'false', count unique segments only
(segments with the same industry count as 1 segment)
only relevant for industry segments
sicsreq 'true' (default) or 'false'; if true: require industry sic code (sics)
no be nonempty
var name of variable to create
segmerged location (libname + dataset name) of wrds_seg_merged
e.g. segments.wrds_segmerged
Dependencies:
- the macro requires local access to wrds_segmerged (Compustat) in a local libray
- %runquit macro
%macro runquit;
; run; quit;
%if &syserr. ne 0 %then %do;
%abort cancel ;
%end;
%mend runquit;
*/
%macro
nSegments(dsin=, dsout=, type=
"ind"
, unique=
"true"
, sicsreq=
"true"
, var=, segmerged=);
/* Vars needed from dsin */
data
seg1 (
keep
= gvkey fyear datadate);
set
&dsin
;
%runquit
;
/* Select segments segments */
proc
sql;
create table seg2 as
select
a.gvkey, a.fyear, b.SICS1 as sics
from
seg1 a,
segments.wrds_segmerged b
where
a.gvkey = b.gvkey
/* use segment info of year of 10-K (not later years) */
and a.datadate = b.datadate
and a.datadate = b.srcdate
/* operating or business segments */
%if
&type
eq
"ind"
%then
%do
;
and b.stype
IN
(
"BUSSEG"
,
"OPSEG"
)
%end
;
/* geo segments */
%else
%do
;
and b.stype =
"GEOSEG"
%end
;
/* drop segments with missing sics1 ? */
%if
&sicsreq
eq
"true"
%then
%do
;
and b.SICS1 ne
""
%end
;
and b.SALES ne .
and b.SALES > 0
;
%runquit
;
/* Unique industries? (only relevant for type "ind")*/
%if
&type
eq
"ind"
and
&unique
eq
"true"
%then
%do
;
/* Drop segments with same industry */
proc
sort
data
=seg2 nodupkey;
by
gvkey fyear sics;
%runquit
;
%end
;
/* Count #segments */
proc
sql;
create table seg3 as
select distinct gvkey, fyear, count(*) as numSegs from seg2
group
by
gvkey, fyear
;
%runquit
;
/* Create output dataset */
proc
sql;
create table
&dsout
as
select a.*, b.numSegs as
&var
from
&dsin
a
LEFT
JOIN
seg3 b
ON
a.gvkey = b.gvkey
and a.fyear = b.fyear;
%runquit
;
/* Cleanup */
proc
datasets library=work;
delete
seg1 - seg3 ;
%runquit
;
%mend
;
Computing firm and industry level market to book ratio using Compustat
Market to book ratio is commonly applied in accounting and finance research. This article by WRDS provides a detailed guide and a SAS program to compute annual market to book ratio for US firms.