Here are some points I like to mention about the sample code below:

- Currently Accelerator does not support random number generation. Since I want to time only the execution done on GPU, I put the code for normal random number generation outside the main pricing function (i.e.,
**European_Option_Price**), as what I did in my previous posts on Monte Carlo. Inside the the pricing function, I try to put every computation on GPU. - As I still use the Box Muller transform, to avoid the issue of generating a normal random number of infinity, I apply the simple transform (i.e., Y = 1-U[0,1)) mentioned in the blog post. Well, if you want to try, you can choose not to apply the transform and set the seed to 388, and then see what your GPU will return. I tried that for fun.
- Why I set grid_size = 3163 is because the number of simulations will be the square of grid_size, which is slightly over the simulation size (10 millions) I used in my previous posts on Monte Carlo. Easier for me to compare.
- Obviously, using the API reduces F#'s expressiveness, but with some custom syntactic sugars, like overloaded operators, I believe that it can be mitigated. However, I don't bother to do that at this moment.

**option_price = 10.117122**

CPU time = 234ms

Absolute time = 433ms

Wow, in terms of absolute time, this GPU version is almost 4 times faster than my previous champion, the Array version. Though we fix the seed to 1, the reason why the option price given by the GPU version is different from the price by the Array version is mainly because of the simple transform I mentioned above. If you don't apply the transform, you will see both versions produce "almost" the same option price (some very minor difference is still there, due to slightly different number of simulations).

My laptop is 4 years old and equiped with only a

Why NVIDIA? because I like to try the following:

- Actually the latest release of Accelerator contains a CUDA target.
- NVIDIA claims their cuRAND library can generate random numbers at light speed.

////////////////////////////////////////////////////////////

open System

open System.Diagnostics

open Microsoft.ParallelArrays

type PA = Microsoft.ParallelArrays.ParallelArrays

type FPA = Microsoft.ParallelArrays.FloatParallelArray

let grid_size = 3163

let shape = [| grid_size; grid_size |]

let dxTarget = new DX9Target()

let rnd = new Random(1)

let next_double_pair () =

(1.0 - rnd.NextDouble(), 1.0 - rnd.NextDouble())

//let rnd = new Random(388)

//let next_double_pair () = (rnd.NextDouble(), rnd.NextDouble())

let box_muller_transform (u1,u2) =

sqrt(-2.0*log(u1)) * cos(2.0*Math.PI*u2)

let generator _ _ =

next_double_pair() |> box_muller_transform |> (float32)

let random_normals = Array2D.init grid_size grid_size generator

let European_Option_Price K S0 r vol T =
let FPA_zero = new FPA(float32 0.0, shape)

let FPA_N = new FPA(float32 (grid_size * grid_size), [|1|])

let FPA_discount = new FPA(float32 (exp(-r*T)), [|1|])

let FPA_K = new FPA(float32 K,shape)

let FPA_S0 = new FPA(float32 S0,shape)

let FPA_mean =

new FPA(float32 ((r - 0.5*(vol**2.0))*T), shape)

let FPA_sigma = new FPA(float32 (vol*sqrt(T)), shape)

let FPA_exp = new FPA(float32 (exp 1.0), shape)

let FPA_normals = new FPA(random_normals)

let FPA_S_T =

FPA_S0 *

PA.Pow(FPA_exp, (FPA_mean + FPA_sigma * FPA_normals))

let FPA_payouts = PA.Max(FPA_S_T - FPA_K, FPA_zero)

let FPA_price = FPA_discount * PA.Sum(FPA_payouts) / FPA_N

let result = dxTarget.ToArray1D(FPA_price)

(float) result.[0]

let S0 = 50.0

let K = 40.0

let r = 0.01

let vol = 0.2

let T = 0.25

let price() =

let price = European_Option_Price K S0 r vol T

printfn "option_price = %f" price

let time f =

let proc = Process.GetCurrentProcess()

let cpu_time_stamp = proc.TotalProcessorTime

let timer = new Stopwatch()

timer.Start()

try

f()

finally

let cpu_time = (proc.TotalProcessorTime -

cpu_time_stamp).TotalMilliseconds

printfn "CPU time = %dms" (int64 cpu_time)

printfn "Absolute time = %dms" timer.ElapsedMilliseconds

time price