Watson crash in RIL_GetCellTowerInfo()

Tuesday, December 15, 2009


I've banged my head against the keyboard for far too long on this problem, so hopefully this post will help preserve someone else's forehead/keyboard.

On the Bing client for Windows Mobile 6, we were getting random Watson crashes (the error window with the Send/Don't Send buttons) and eventually narrowed it down to the RIL_GetCellTowerInfo() call. The probability of a single call to that function causing a crash was about 0.2% and seemed to happen totally randomly. What were we doing wrong?

After narrowing down the cause of the crash by manually running the app over and over, inspecting dumps, etc., I came up with the following code that would crash the phone every time I ran it. It'd die on average around the 500th iteration, sometimes on the first few iterations, and never made it past about 2000. RIL (radio interface layer) talks to drivers underneath it which are of course hardware-specific, but this code crashed every phone I tried it on.

I found another person who was having the same issue as me, but there was no solution. The only answer was a link to a blog which had code that looks just like what I was already using, so that didn't help either.

So, here's the code that crashes. Briefly, what it does is (in a loop):

  1. Call RIL_Initialize(), passing it some parameters including the CellTowerInfoCallback which RIL_GetCellTowerInfo() will later call.
  2. If it succeeds (I've never seen it fail on a real phone), call RIL_GetCellTowerInfo(), which returns a status code immediately and spawns a thread to do the callback.
  3. Wait for RIL_GetCellTowerInfo() to invoke the CellTowerInfoCallback. Sometimes this times out, but that is no big deal and doesn't break anything.
  4. Call RIL_Deinitialize().

using System;
using System.Runtime.InteropServices;
using System.Threading;

static class Program {
  public delegate void RILRESULTCALLBACK(uint dwCode, IntPtr hrCmdID, 
    IntPtr lpData, uint cbData, uint dwParam);
  
  public delegate void RILNOTIFYCALLBACK(uint dwCode, IntPtr lpData, 
    uint cbData, uint dwParam);

  [DllImport("ril.dll")]
  private static extern uint RIL_GetCellTowerInfo(IntPtr rilHandle);

  [DllImport("ril.dll")]
  private static extern uint RIL_Initialize(
    uint rilPortIndex, 
    RILRESULTCALLBACK resultCallback, 
    RILNOTIFYCALLBACK notifyCallback,
    uint notificationClasses,
    uint userParam,
    out IntPtr rilHandle
  );

  [DllImport("ril.dll")]
  private static extern uint RIL_Deinitialize(IntPtr rilHandle);

  private static readonly AutoResetEvent CallbackWaitHandle = new AutoResetEvent(false);

  [MTAThread]
  public static void Main(string[] args) {
    for (int i = 0;; i++) {
      IntPtr ril;
      uint result = RIL_Initialize(1, CellTowerInfoCallback, null, 0, 0, out ril);
      if (result == 0) {
        result = RIL_GetCellTowerInfo(ril);
        if ((int)result >= 0) { // cast so a negative (failed) HRESULT is detectable; a uint is always >= 0
          if (CallbackWaitHandle.WaitOne(3000, false))
            Console.WriteLine("Successfully waited");
        }
      }
      RIL_Deinitialize(ril);
    }
  }

  private static void CellTowerInfoCallback(uint code, IntPtr commandId,
      IntPtr commandData, uint commandDataLength, uint userParam) {
    // normally, you'd do stuff with the data passed here and then set the signal
    CallbackWaitHandle.Set();
  }
}

To me, and quite a few other devs, it looked like we weren't doing anything wrong. I had come straight out of college, where I used primarily Java and some C/C++ but never both simultaneously, so it was easy to fall into what is probably a common trap when bridging the managed/unmanaged divide. The problem is that passing CellTowerInfoCallback to RIL_Initialize() creates a delegate object that nothing on the managed side references once the call returns, so the garbage collector is free to collect it right after RIL_Initialize() is called. When RIL_GetCellTowerInfo() then tries to invoke the callback, it fails and throws an uncatchable exception (since it's thrown from a different thread), which is then handled by Watson.

The fix? Tell the garbage collector not to touch that delegate. Everything works great if we write the code like this:

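public static void Main(string[] args) {
  for (int i = 0;; i++) {
    IntPtr ril;
    // Main is static, so the method is referenced without "this".
    RILRESULTCALLBACK func = CellTowerInfoCallback;
    // The handle keeps the delegate alive (and in place) until cb.Free().
    GCHandle cb = GCHandle.Alloc(func, GCHandleType.Pinned);
    uint result = RIL_Initialize(1, (RILRESULTCALLBACK)cb.Target, null, 0, 0, out ril);
    if (result == 0) {
      result = RIL_GetCellTowerInfo(ril);
      if ((int)result >= 0) { // cast so a negative (failed) HRESULT is detectable
        if (CallbackWaitHandle.WaitOne(3000, false))
          Console.WriteLine("Successfully waited");
      }
    }
    RIL_Deinitialize(ril);
    // Release the handle only after RIL has been shut down.
    cb.Free();
  }
}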
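
Another way to keep the delegate alive, sketched here as an assumption and not tested on a device, is to store it in a static field instead of a GCHandle. A static field is a GC root, so the delegate stays reachable for as long as the process runs. The names RilSketch, QueryCellTowerOnce and Succeeded are hypothetical, and the P/Invoke return types are declared as int on the assumption that RIL returns HRESULT values where a negative value means failure.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical sketch: RilSketch, QueryCellTowerOnce and Succeeded are
// illustrative names, not part of the original program.
static class RilSketch {
  public delegate void RILRESULTCALLBACK(uint dwCode, IntPtr hrCmdID,
    IntPtr lpData, uint cbData, uint dwParam);

  public delegate void RILNOTIFYCALLBACK(uint dwCode, IntPtr lpData,
    uint cbData, uint dwParam);

  // Assumption: return types are int (not uint) so that negative HRESULTs
  // can be recognised as failures.
  [DllImport("ril.dll")]
  private static extern int RIL_GetCellTowerInfo(IntPtr rilHandle);

  [DllImport("ril.dll")]
  private static extern int RIL_Initialize(uint rilPortIndex,
    RILRESULTCALLBACK resultCallback, RILNOTIFYCALLBACK notifyCallback,
    uint notificationClasses, uint userParam, out IntPtr rilHandle);

  [DllImport("ril.dll")]
  private static extern int RIL_Deinitialize(IntPtr rilHandle);

  public static readonly AutoResetEvent CallbackWaitHandle =
    new AutoResetEvent(false);

  // The static field roots the delegate, so the GC cannot collect it
  // while native code may still call back into it.
  public static readonly RILRESULTCALLBACK ResultCallback =
    new RILRESULTCALLBACK(CellTowerInfoCallback);

  // HRESULT convention: zero or positive is success, negative is failure.
  public static bool Succeeded(int hresult) {
    return hresult >= 0;
  }

  // One initialize / query / wait / deinitialize cycle.
  // Returns true only if the callback fired before the timeout.
  public static bool QueryCellTowerOnce(int timeoutMs) {
    IntPtr ril;
    if (RIL_Initialize(1, ResultCallback, null, 0, 0, out ril) != 0)
      return false;
    try {
      if (!Succeeded(RIL_GetCellTowerInfo(ril)))
        return false;
      return CallbackWaitHandle.WaitOne(timeoutMs, false);
    } finally {
      RIL_Deinitialize(ril);
    }
  }

  private static void CellTowerInfoCallback(uint code, IntPtr commandId,
      IntPtr commandData, uint commandDataLength, uint userParam) {
    // Real code would read the cell tower data here before signalling.
    CallbackWaitHandle.Set();
  }
}
```

Calling RilSketch.QueryCellTowerOnce(3000) in a loop from Main would mirror the loop above. The trade-off is that the delegate is never released, which is usually acceptable for a single long-lived callback; the GCHandle version is the better fit when the callback's lifetime needs to be bounded.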

Tags: crash, ril, watson, wince, winmo | Posted at 21:03 | Comments (6)


Comments

charles on Thursday, December 17, 2009 at 01:50

maybe you shoulda thrown a

randomize()

in there somewhere? i dunno, i remember doing that in CS1371 a lot lol.

Markus Korbel on Friday, April 16, 2010 at 06:53

Thank you sooo much for this information, this problem was starting to drive me crazy! But I was really lucky to "stumble" over your post here since I was searching for WinCE501bException in regards to RIL_GetCellTowerInfo. Hope that google and co will index this comment and help others find this info!!!

Thx again!

Kornelije Petak on Tuesday, May 25, 2010 at 22:57

I'd like to thank you for posting this, it's been a time saver.

I've had the same problem and just couldn't figure it out. The thing was so random and I couldn't grab an exception, it was so frustrating.

Thanks again.

Sergey Abaev on Wednesday, May 26, 2010 at 02:16

GC collected callback method when PInvoking this method from C#
When using this method from managed code (C#,VB.NET, etc.) make sure to tell the garbage collector not to collect the callback method. This results in un-catchable WinCE501bExceptions that crash the application.

Use the following example when initializing RIL:

RIL.RILRESULTCALLBACK callback = this.CellTowerData;
GCHandle cb = GCHandle.Alloc(callback, GCHandleType.Pinned);

IntPtr result = RIL.RIL_Initialize(1, (RIL.RILRESULTCALLBACK)cb.Target, null, 0, 0, out rilRef);

Sergey Abaev on Wednesday, May 26, 2010 at 02:18

http://msdn.microsoft.com/en-us/library/aa923065.aspx

Sergey Abaev on Wednesday, May 26, 2010 at 02:18

http://msdn.microsoft.com/en-us/library/dd938890.aspx
