jaredpar's WebLog
Comparing Continuations in C# and F# Part 3: Double wrong
Is it better to be wrong once or to be right then think you’re wrong but find out you were right but wrong about being wrong? Besides the obvious be right the first time, it’s certainly an educational experience.
Here’s the original sample:
let FoldRight combine (sequence:seq<'a>) acc = use e = sequence.GetEnumerator() let rec inner cont = match e.MoveNext() with | true -> let cur = e.Current inner (fun racc -> cont (combine cur racc)) | false -> cont acc inner (fun x -> x )Brian McNamara pointed out I wasn’t considering all of the call sites for this sample. In addition to the recursive call to “inner” and the initial inner call, there is the actual recursive invocation of the of the continuations. Effectively the “inner” function is building up a list of list of lambdas which call the combine function. The output of the combine function is simply passed into the next lambda in the list. The last lambda in the list is the identity lambda and returns the final call to combine. This value is the actual value returned from the initial invocation “cont acc”. Lambdas are methods under the hood. Without a tail instruction, this chain of lambda calls will just as easily overflow the stack.
Digging deeper into the compiled F# code we can view this call and indeed it is done with tail recursion.
.method public virtual instance !V Invoke(!U racc) cil managed { .maxstack 8 L_0000: nop L_0001: ldarg.0 L_0002: ldfld class [FSharp.Core]Microsoft.FSharp.Core.FastFunc`2 Program/clo@9::cont L_0007: ldarg.0 L_0008: ldfld class [FSharp.Core]Microsoft.FSharp.Core.FastFunc`2> Program/clo@9::combine L_000d: ldarg.0 L_000e: ldfld !2 Program/clo@9::cur L_0013: ldarg.1 L_0014: call !!0 [FSharp.Core]Microsoft.FSharp.Core.FastFunc`2::InvokeFast2(class [FSharp.Core]Microsoft.FSharp.Core.FastFunc`2>, !0, !1) L_0019: tail L_001b: callvirt instance !1 [FSharp.Core]Microsoft.FSharp.Core.FastFunc`2::Invoke(!0) L_0020: ret }The below code more accurately resembles the equivalent C# code that is generated for the above F# sample (thanks Brian!).
public static TAcc Inner<TSource, TAcc>( this IEnumerator<TSource> e, Func<TAcc, TSource, TAcc> combine, TAcc start, Func<TAcc, TAcc> cont) { while (e.MoveNext()) { var cur = e.Current; Func<TAcc, TAcc> innerCont = cont; cont = (x) => /*need .tail here */innerCont(combine(x, cur)); } return cont(start); } public static TAcc FoldRight<TSource, TAcc>( this IEnumerable<TSource> enumerable, Func<TAcc, TSource, TAcc> combine, TAcc start) { using (var e = enumerable.GetEnumerator()) { return Inner(e, combine, start, (x) => x); } }Previous Entries
Comparing Continuations in C# and F# Part 2
In my last post I went over the differences between using a continuation in F# and C#. As it turns out I was right about the limits and symptoms but wrong about the reason.
The F# code does indeed generate tail calls for part of the continuation. However this is only a very small portion of the actual code and is in fact only generated for the call in the empty case. I misread this function to be the call for the overall continuation. Instead it is the function for the entire “inner” lambda.
So why does F# perform differently than C# in this scenario?
Andrew Kennedy pointed out that F# will actually transform the “inner” function into a loop. In affect the code generated looks like the following.
TypeFunc func = this._self3; while (true) { if (!this.e.MoveNext()) { break; } A cur = this.e.Current; cont = new Program.clo@9<U V, A ,>(this.combine, cont, cur); } return cont.Invoke(this.acc);The actual transformation into a loop is what is preventing F# from overflowing the stack here. Iteration incurs no stack overhead in this case.
Even more interesting is that the tail opcode is quite simply ignored when dealing with un-trusted code. It therefore cannot be relied on to generate performant code in all scenarios.
Properly Incrementing an IntPtr
Just as native pointer types are moved around with pointer arithmetic in native code, it can also be useful to move IntPtr types around in managed code. Say for instance there is an IntPtr available which points to a native array of Dog instances. To access the values of that array individually requires pointer arithmetic of a fashion.
For the most part this is a straight forward operation if the underlying native memory is understood. Say there is an array of Dog instances with length 10 and the Dog structure has a size of 8. The total amount of memory will be 80 bytes and with a valid dog instance being available at every 8 bytes. So if the start address is 1000 then 1000,1008,1016 and so on will point to a valid instance.
The native size of any data structure can be calculated via Marshal.SizeOf(tyepof(Dog)) [1]. With a pointer to the start of the array, the Nth Dog instance can be accessed with a pointer of address = (N*sizeof(Dog))+startAddress
The address of a pointer can be accessed by 1 of 2 functions
- .ToIn32()
- .ToInt64()
Unless you are writing an application that will every only run on a 32 bit system, don’t use method #1 (even then still don’t). Native pointer addresses vary in size depending on version of the OS a program is running on. 64 bit systems have a much larger address size (long vs int). Consequently calling .ToInt32 on a 64bit system will truncate the actual address to a valid that is no longer valid. This will eventually lead to a random error PInvoke’ing a function that is difficult to track down.
Instead use .ToInt64(). This method is safe on both 32 and 64 bit systems. Additionally constructing an IntPtr instance with either value is safe. The class knows what version of windows it’s executing on and will adjust the size in a safe way [2].
In many of my projects I define a class similar to the following to take care of this automatically.
public static class IntPtrExtensions { public static IntPtr Increment(this IntPtr ptr, int cbSize) { return new IntPtr(ptr.ToInt64() + cbSize); } public static IntPtr Increment<T>(this IntPtr ptr) { return ptr.Increment(Marshal.SizeOf(typeof(T))); } public static T ElementAt<T>(this IntPtr ptr, int index) { var offset = Marshal.SizeOf(typeof(T))*index; var offsetPtr = ptr.Increment(offset); return (T)Marshal.PtrToStructure(offsetPtr, typeof(T)); } }[1] It’s highly advisable to not calculate this value yourself.
[2] See the post “Is IntPtr(long) truncating?” for more details
Comparing Continuations in F# and C#
Lately I’ve been playing quite a bit with F#. I have several hobby projects I’m working on that take up a bit of my time. But when I’m not playing around with F# I’m exploring ways to apply certain functional patterns to actual coding on the job and/or porting to my functional library: RantPack.
Recently I’ve been playing around with continuations in F#. I thought this was a great topic to do a F# comparison with other languages. In this case C#.
Let’s examine a classic use of continuations: a right fold on a list. For a detailed explanation of fold right and the use of a continuation I suggest taking a look at Brian's discussion. If you’re unfamiliar with continuations I highly suggest that you take a look at this post as Brian gives a great breakdown of continuations and their uses.
Here's a quick refresher on continuations by example. Fold right is an operation which reduces a sequence of elements into a single element by processing the list from right to left. It’s similar to the LINQ Aggregate method except Aggregate operations left to right.
For this post we’ll be writing FoldRight against a sequence. I chose sequence vs. the traditional list because it’s present in both languages (F# = seq<’a>, C# = IEnumerable<T>). It is possible to other F# data structures in C# but the comparison is cleaner when using a type that is native to both languages.
Sequences are a left to right data structure so processing it right to left is not natural. After all, with a sequence all the developer has is current and whether or not there is a next element. Processing the list in a right to left fashion can be done by such acts as reversing the list, or doing a head recursive call. Both have their detractors.
Continuations are a different way to process the list. Instead of processing the list directly we process the list a single element at a time building up a continuation along the way. For each element a lambda expression is generated representing the work needed to be done for that element. The value calculated within the lambda will then be passed to the lambda calculated for the previous element. Once we hit the end of the list, we essentially have a chain of lambda expressions which process each element in the list in reverse order. All that is needed is to call the final lambda with the starting value and we will effectively process the list in reverse order.
Simple enough? Lets take a look at the code.
F# Code
let FoldRight combine (sequence:seq<'a>) acc = use e = sequence.GetEnumerator() let rec inner cont = match e.MoveNext() with | true -> let cur = e.Current inner (fun racc -> cont (combine cur racc)) | false -> cont acc inner (fun x -> x )C# Code
public static TAcc FoldRight<TSource, TAcc>( this IEnumerable<TSource> enumerable, Func<TAcc, TSource, TAcc> combine, TAcc start) { using (var e = enumerable.GetEnumerator()) { Func<Func<TAcc, TAcc>, TAcc> inner = null; inner = (cont) => { if (e.MoveNext()) { var cur = e.Current; Func<TAcc, TAcc> innerCont = (x) => cont(combine(x, cur)); return inner(innerCont); } else { return cont(start); } }; return inner(x => x); } }My immediate reaction to the two samples is the conciseness of the F# code. This is a not a criticism of C# though. F# is designed to be a concise language and it’s delivery on that goal is evident in this sample.
What makes the big difference here is the type inference power of F#. In the C# sample there are 6 explicit types listed in the code sample. The F# sample only has a single type listed. The compiler is able to infer and/or generate the rest of the signatures. F# also requires less explicit generic parameters 1 vs. 2 in C#.
The next big difference I see is the awkward way in which the inner lambda expression must be declared in C#. The lambda expression is called recursively in order to setup the continuation. In order to do that in C# the lambda must be declared and defined in separate expressions. Otherwise, a self reference of “inner” inside the body of “inner” will generate a used before defined warning from the compiler.
The IL
Examining the full IL of both functions would take several blog posts. Not to mention that trying to read disassembled F# much less IL, is like trying to read disassembled C++. An interesting exercise but a bit time consuming.
I did want to focus a bit on one portion of the generated IL. There is a very significant difference in the way the recursive call to the “inner” lambda is made.
F#
L_0032: ldarg.1 L_0033: ldarg.0 L_0034: ldfld !0 Test/clo@6T::acc L_0039: tail L_003b: callvirt instance !1 [FSharp.Core]Microsoft.FSharp.Core.FastFunc`2::Invoke(!0)C#
L_0077: ldarg.0 L_0078: ldfld class [System.Core]System.Func`2, !1> ConsoleApplication1.Extensions/<>c__DisplayClass8::inner L_007d: ldloc.0 L_007e: callvirt instance !1 [System.Core]System.Func`2, !TAcc>::Invoke(!0) L_0083: stloc.3In both cases the first 3 lines are building up the 2 parameters necessary for the recursive lambda call. The closures are structured somewhat differently but the same basic operation is being done.
The key difference between the languages is F#’s use of the tail opcode. This opcode tells the CLR the to call the next method in a tail recursive fashion. This causes the CLR to remove the current method frame from the stack before the next method is called. Because the method is removed from the stack, the recursive call takes up no additional stack space. This is true no matter how many times the function is called.
The C# IL does not have this opcode. So the recursive call will happen with the current method on the frame. With a big enough sequence this will cause the process to run out of stack space and generate a StackOverflowException. This creates a limitation on the number of elements the C# sample can process.
Limits
I explored the limits of both samples on my home laptop. I generated a simple example to sum the sequence with the fold right. Note: For a sum of ints, a fold left is just as good, but it serves fine for this sample.
F#
let sum = FoldRight (fun x y -> x + y) [1..1000000] 0 printfn "%d" sumC#
var source = Enumerable.Range(1, 9397); var result = source.FoldRight((x, y) => x + y, 0); Console.WriteLine("{0}", result);The C# sample can process a maximum size of 9397 elements. After that I encounter a stackoverflow exception. The F# sample however can easily process 1,000,000 elements.
Closing note. This not meant to be a post criticizing C#. It’s meant to be a general comparison of the same technique in two managed languages. This is a scenario that is far less likely to occur in a C# program. In an F# program it’s quite simply an expectation and hence the F# compiler is optimized for this scenario.
Dereference a double IntPtr
A common PInvoke question is how to deal with a double pointer. More specifically, how can one dereference an IntPtr to another pointer without using unsafe code?
Dereferencing a double pointer is done the same way a dereference to any other structure is done: Marshal.PtrToStructure. PtrToStructure is used to transform a native pointer, in the form of an IntPtr, into a managed version of the native data structure the native pointer points to.
In the case of a double pointer, the native data structure the pointer points to is just another native pointer. The managed equivalent is the IntPtr (or UIntPtr) class.
For Example, say we had the following native data signature
void GetDoublePointer(int** ppData)This function returns a pointer to a pointer that points to an int. (Pointer->Pointer->int). We can then use the following C# code to access the final int value.
[DllImport("PInvokeSample.dll")] static extern void GetDoublePointer(IntPtr doublePtr); static void Main(string[] args) { var ptr = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(IntPtr))); try { GetDoublePointer(ptr); var deref1 = (IntPtr)Marshal.PtrToStructure(ptr, typeof(IntPtr)); var deref2 = (int)Marshal.PtrToStructure(deref1, typeof(int)); Console.WriteLine(deref2); } finally { Marshal.FreeHGlobal(ptr); } }PInvoke and COM objects
I've occasionally found the need to mix COM interop and PInvoke. For certain scenarios it's just easier to code up a PInvoke declaration and function. It's perfectly legal to include COM objects in these scenarios provided the appropriate Marshal attributes are added to the signature.
The easiest way to accomplish scenario is to have the native signature only expose IUnknown instances. On the managed side use an object declaration annotated with MarshalAs(UnmanagedType.IUnknown). Example:
[DllImport("SomeDll.dll")] [return: MarshalAs(UnmanagedType.IUnknown)] public static extern object GetSomeComObject();One item to remember though is how to managed the ref counting in this scenario. In any case where a COM object is considered to be coming out of the PInvoke signature, the CLR will assume that it has an obligation to call IUnknown::Release() at some point in the future. The corresponding native code must take this into account and appropriately AddRef() the object.
This includes any scenario, as displayed above, where the COM object is the actual return value of the function [1].
[1] Got bit by this last week.


Theme by