These are actually a version of the notes I made up for a group seminar at ICTS. Just posting it here for anyone interested, or looking for more detailed documentation of the astroML correlation functions.
What is BAO and BAO Peak?
Say there is a big box far away from you. You know the actual dimensions of the box. You can then calculate the distance to the box by knowing its apparent angular size.
A similar trick can be applied in cosmology to measure distances to galaxies, but there is one slight caveat  spacetime is curved. This curvature acts like a lens, distorting the images produced. What that means is  if we notice that the image appears small, it could be far away, or it could be that the lens distorted the image to make it look small when it actually is nearby.
We then need a distance ruler which is not affected by such effects, which means we need an object of known size for a particular redshift $z$, or a population of objects whose redshift variation of size we know. Baryon Acoustic Oscillations (BAO) accomplish both these things for us. One could calculate, given a certain redshift, the correlation function between galaxies at that redshift, and we would expect there to be a peak at some length scales. Similarly, we can calculate the correlation function in the redshift space, and expect to find a peak.
The above figure shows the 2point correlation function in the redshift space (redshift converted to comoving distance).
But why do we expect to see a BAO peak at all? Imagine an overdense region in the early universe. This region will pull matter towards it due to its gravity, but there will also be an outward pressure due to radiationmatter interaction. This competition creates oscillations like the sound waves produced in air, and these oscillations propagate outwards.
But after a point, electrons and protons combine to form the Hydrogen atom, and the photons are ‘decoupled’ from the matter. This causes the photons to freestream, and eventually reach us after millions and bliions of years as the cosmic microwave background (CMB). This means that a fair portion of matter is left at a fixed radius. This excess is what we detect as the BAO peak.
The LandySzalay Estimator for the twopoint angular correlation function
tl;dr  Stephen Landy and Alexander Szalay in their 1993 paper said that the best estimator for 2pacf is
$w(\theta) = \frac{DD2DR+RR}{RR}$.
The 2pacf $w(\theta)$ is the relative probability of finding a pair of galaxies separated by a certain angular distance with respect to that for an uniform distribution. The estimators for for $w(\theta)$ are generally built out of $DD$, $DR$ and $RR$, all suitably normalized with regard to the geometry from which data is taken.
There are lot of calculations mentioned in the paper, like calculating variance and bias of the estimator and comparing it to other estimators. But the takeaway formula from the paper to implement code is the following :
$1 + w(\theta) = d  2x + 2$
where $d = \frac{2DD}{G_p(\theta)n(n1)} $ and $x = \frac{DR}{G_p(\theta)nn_r}$
Correlation Functions in astroML
Here, I describe various functions in astroML.correlation
. astroML.correlation
(source code here contains correlation functions written with astronomy/cosmology applications in mind. The package astroML
contains a lot of tools that can be used for machine learning applications in astronomy.
two_point(data, bins, method='standard', data_R=None, random_state=None)
This is the primary function of the module in a sense. All the other functions involve calls of this function, so understanding this is the key.
Lets describe each argument that the function takes :

data
 The dataset for which correlation function $w(s)$ has to be found.data
can be a onedimensional or twodimensional array. Ifdata
is a onedimensional array, it is converted into a twodimensional array by adding a redundant axis usingnumpy.newaxis
. Also,n_samples, n_features = data.shape

bins
 The explicit values of the bins over which $w(s)$ has to be calculated.bins
has to be a onedimensional array. 
method
 Optional argument. Can take valuestandard
which will use estimator $\frac{DD}{RR}  1$, orlandyszalay
which will use the LandySzalay estimator $\frac{DD2DR+RR}{RR}$. We shall usemethod=landyszalay
for all our purposes. 
data_R
 Optional argument. The random dataset to be used in the estimator. Thedata_R
should have samen_features
asdata
. If not given, the program chooses a version ofdata
shuffled along all but one axis asdata_R
. 
random_state
 Optional argument. The seed of the random number generator to be used throughout the program.
The program returns array corr
containing the estimate of the correlation function.
The following is the alogrithmic flow of the program :

Check for consistency conditions of
data
,method
,bins
, anddata_R
. 
Define
factor = len(data_R) * 1. / len(data)
. This will be useful to normalize the estimator later. 
Create the
KDTree
data structure (present inscikitlearn
) to representdata
anddata_R
. This data structure allows for fast computation of correlation functions. 
KDTree
data structure has an attribute calledtwo_point_correlation(data_1,bins)
which provides the correlation between theKDTree
data and the datasetdata_1
for the distances inbins
. This means if we use theKDTree
data structure corresponding todata
and putdata_1 = data
, we will getcounts_DD
(not equal to $DD$; explained below) to be used in the estimator. Similarly, if we usedata_1 = data_R
, we getcounts_DR
, and so on. 
An important point to note is that
two_point_correlation(data_1,bins)
returns the total number of galaxy pairs lying within a sphere of radius equal to the distance given inbins
. This is not what we need  we would like to have the number of galaxy pairs lying within a spherical shell of some thickness around the distance value.
To get $DD$ we just take np.diff(counts_DD)
and normalize it by the total number of pairs. With N_D = len(data)
and N_R = len(data_R)
, we assign
$DD$ = np.diff(counts_DD)
$RR$ = np.diff(counts_RR)
$DR$ = np.diff(counts_DR)
As $RR$ appears in the denominator, we find all places where $RR$ is zero and set it to 1
to avoid problems. Later we will set all such locations in the final correlation function array to numpy.nan
.
 The correlation function is now calculated by the formula (LandySzalay Estimator)
corr = (factor ** 2 * DD  2 * factor * DR + RR) / RR
bootstrap_two_point(data, bins, method='standard',Nbootstraps=10, return_bootstraps=False, random_state=None)
Bootstraping is a process used to identify errors in the estimation of a particular quantity. It is pretty straightforward  we use one realization of the quantity to be measured as the baseline, and then take the standard deviation (with delta degrees of freedom = 1) over multiple realizations (or ‘bootstraps’) of the quantity as the errorbars on the baseline.
The two extra arguments in this function as compared to two_point
are both optional arguments.

Nbootstraps
gives the the number of realizations over which the error will be calculated.Nbootstraps
$\ge$ 2. 
return_bootstraps
takes valueTrue
orFalse
and decides whether the function should return all the bootstraps.
The function returns corr
, corr_err
and all the bootstraps (given condition).
two_point_angular(ra, dec, bins, method='standard', random_state=None)
This is an implementation of the angular twopoint correlation function in astroML, which merely converts angular coordinates into 3D (x,y,z)
coordinates, and passes this as data
into two_point
. The algorithmic flow of this function is as follows :

As opposed to
data
intwo_point
,two_point_angular
takes onedimensional arrays of Right Ascensionra
and Declinationdec
as input.ra
anddec
is converted into euclidean(x,y,z)
using thera_dec_to_xyz(ra,dec)
function. The output of this function is used asdata

Notice that there is no argument here which corresponds to
data_R
intwo_point
. That is because the program generates uniform distributions of RAra_R
in range[min(ra), max(ra)]
Decdec_R
in th e range[(min(dec), max(dec)]
. The size ofra_R
anddec_R
is twice that of thera
(ordec
).data_R
is constructed usingra_dec_to_xyz(ra_R,dec_R)
. 
The angular
bins
entered are also transformed into euclidean distances. The function then returns the follwing
return two_point(data, bins_transform, method=method, data_R=data_R, random_state=rng)
bootstrap_two_point_angular(ra, dec, bins, method='standard',Nbootstraps=10,random_state=None)
This is defined in a similar fashion as the other bootstrap procedure, with the sole exception that, rather than considering a single realization of the estimator as the baseline, we take the baseline to be the mean of all realizations.
My Comments

The
factor
intwo_point
is a bit fishy. It could be that this is causing an apparent problem in the normalization. 
I am not very comfortable with the
ddof = 1
used in the numpy.std, but at first glance, that seems to be a property of the distribution/error