How to make open-ended do loops in Stata - FreeEconHelp.com, Learning Economics... Solved!

11/2/11

This post shows how to write an open-ended loop in a Stata do-file, meaning a while loop that keeps running until a condition is met rather than for a fixed number of passes. It was written in Stata 11, but it should work in most versions of Stata. The dataset for this sample code consists of hourly observations of auctions. It includes the offer price, the seller name and the quantity of the item offered.

The same auction shows up each hour until the item is sold or the auction expires after 48 hours. I am trying to measure how long a given group (batch) of auctions lasts, e.g. four separate auctions for 20 bags of sugar at $20 offered by joe.

The problem: the same person may post a new auction with the same offer price and quantity just as the original batch of auctions expires, so two different batches look like one. This means I must identify the batches that show up for more than 48 hours and split them. The sample code below does this. It could be made more efficient, but it worked for me and hopefully it will help you.

NOTE: Commands are delimited with ; which requires the line #delimit ; at the top of the do-file.
Comments begin with * and end with ;
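
The semicolon convention only takes effect once the do-file switches delimiters. A minimal sketch of that switch follows. The display lines are illustrative only and are not part of the original code, and #delimit works in do-files but not at the interactive command line.

```stata
* switch the command delimiter from end-of-line to semicolon
#delimit ;
display "commands now end at the semicolon"
    " and may span several lines";
* switch back to the default end-of-line delimiter;
#delimit cr
display "back to one command per line"
```

Everything between the two #delimit lines follows the convention used in the rest of this post.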


*this line converts the two numeric variables buyamount and quantity into the new string variables buy and q;
tostring buyamount quantity, gen(buy q);
*this line joins the seller name and the two new string variables into a single batch id;
gen batch1 = seller+buy+q;
*the next two lines drop the temporary string variables to keep things clean;
drop buy;
drop q;
* this sorts the data by time and batch and keeps only the first observation for each batch in each time period;
bysort t batch1: drop if _n>1;
* here the ordering is changed to look at the time periods by batch;
sort batch1 t;
* this generates a variable n that expresses t in hours, assuming t is a Stata datetime in milliseconds, so n=1 means 01jan1960 01:00:00 and n=2 means 01jan1960 02:00:00;
gen n = hours(t);
* this generates a variable holding the maximum value of n for each batch;
bysort batch1: gen T = n[_N];
* this generates a variable counting the hours between the current time period and the maximum value of n, i.e. 12 hours, 11 hours, 10 hours, 9 hours and so on;
gen z = T-n;
*this creates a copy of the batch id that the following steps modify;
gen batch2 = batch1;
* this creates a new batch id for observations that fall more than 48 hours before the last observation of their batch, e.g. if joe202 runs from n=1 to n=60 then the observations with n below 12 become ijoe202,;
* while the observations from n=12 to n=60 stay joe202;
replace batch2 = "i"+batch1 if z>48;
*cleaning up by dropping unnecessary variables;
drop T;
drop z;
*generates two new variables for use in the loop. i must start at a value that satisfies the while condition (49 or more), otherwise the loop is skipped entirely;
gen i2 = 0;
gen i = 100;
*this sets the condition for continuing the loop. the opening brace marks the beginning of the loop and must be part of the same command as while;
while i>=49 {;
*i2 is recreated in each pass of the loop, so it is dropped at the start of each pass and generated again further down;
drop i2;
*this repeats the process above, now using batch2 so that the new ids are split again if they still run too long;
sort batch2 t;
bysort batch2: gen T = n[_N];
gen z = T-n;
replace batch2 = "i"+batch2 if z>48;
drop T;
*this records the largest z found in this pass. if it is greater than 48 then some observations were just relabeled and another pass is needed;
*a new variable has to be defined rather than altering i directly, because egen can only create a variable and cannot replace an existing one;
egen i2 = max(z);
drop z;
*this copies the result into i so that the while condition is tested against the latest maximum;
replace i = i2;
*indicates the end of the loop;
};
*cleaning up the extra variables;
drop i2;
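
As one possible refinement, the loop flag can live in a local macro filled from the results of summarize, which avoids creating and dropping the helper variables i and i2. This is a sketch only, not the code I originally ran. The toy data (one hypothetical batch observed for 120 hours) and the macro name more are assumptions made for illustration, and the block uses Stata's default end-of-line delimiter rather than the semicolon.

```stata
* Sketch only: the toy data and the macro name more are illustrative assumptions
clear
set obs 120
* one hypothetical batch observed hourly for 120 hours
gen str20 batch2 = "joe202"
gen n = _n

* local macro used as the loop flag instead of the variables i and i2
local more = 1
while `more' {
    * hours between each observation and the last observation of its batch
    bysort batch2 (n): gen z = n[_N] - n
    * summarize with meanonly leaves the largest z in r(max)
    summarize z, meanonly
    local more = (r(max) > 48)
    * relabel observations more than 48 hours before the end of their batch
    replace batch2 = "i" + batch2 if z > 48
    drop z
}
* list the resulting batch ids and how many hourly observations each has
tabulate batch2
```

On this toy data the loop should stop after three passes, leaving the ids joe202, ijoe202 and iijoe202. The relabeling rule (z greater than 48) is the same as in the code above. Only the bookkeeping for the stopping condition differs.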


Created by Michael Morrison