Thanks again for making this, MuddyMole. 
Did some digging and modified your code to work with other LUT sizes:
Code for 4096px X 64px (64,64,64) LUT:
Code:
// Source image and LUT strip (64 tiles of 64x64 side by side: 4096x64)
sampler2D bkd : register(s1);
sampler2D lut : register(s2) = sampler_state {
    MinFilter = Linear;
    MagFilter = Linear;
};
float4 ps_main(in float2 In : TEXCOORD0) : COLOR0 {
    float4 imgColor = tex2D( bkd, In );
    // Red picks the column inside a tile, green the row (+0.5 lands on texel centers)
    float red = ( imgColor.r * 63 + 0.5 ) / 4096;
    float green = ( imgColor.g * 63 + 0.5 ) / 64;
    // Blue picks the two neighbouring tiles to blend between
    float blueA = floor( imgColor.b * 63 ) / 64;
    float blueB = ceil( imgColor.b * 63 ) / 64;
    float4 colorA = tex2D( lut, float2( blueA + red, green ));
    float4 colorB = tex2D( lut, float2( blueB + red, green ));
    // Blend weight: fractional position between the two tiles
    float lerpAB = frac( imgColor.b * 63 );
    float4 colorOut = lerp( colorA, colorB, lerpAB );
    return colorOut;
}
technique tech_main {
    pass P0 {
        PixelShader = compile ps_2_0 ps_main();
    }
}

As far as the rest of the code goes, I'm still a bit lost lol.
I did the pass-through test as you did and did not observe errors:

And here is a test grade using a LUT I had lying around:

I also tried my hand at adding an intensity value to the shader and XML, but failed miserably.
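On the intensity idea, one possible approach (a sketch only, untested in MMF) is to blend the original pixel with the graded result. fIntensity is a made-up parameter name (0 = untouched image, 1 = full grade). It would still need a matching parameter entry in the effect's XML, which is left out here because the exact format isn't confirmed. The sketch uses the 4096 X 64 constants and takes the blend weight between the two blue tiles as frac() of the scaled blue value:

```hlsl
sampler2D bkd : register(s1);
sampler2D lut : register(s2) = sampler_state {
    MinFilter = Linear;
    MagFilter = Linear;
};

// Hypothetical parameter (name is an assumption): 0 = original image, 1 = fully graded
float fIntensity;

float4 ps_main(in float2 In : TEXCOORD0) : COLOR0 {
    float4 imgColor = tex2D( bkd, In );

    // Same lookup as the 4096x64 shader
    float red   = ( imgColor.r * 63 + 0.5 ) / 4096;
    float green = ( imgColor.g * 63 + 0.5 ) / 64;
    float blue  = imgColor.b * 63;
    float blueA = floor( blue ) / 64;
    float blueB = ceil( blue ) / 64;
    float4 colorA = tex2D( lut, float2( blueA + red, green ));
    float4 colorB = tex2D( lut, float2( blueB + red, green ));
    float4 graded = lerp( colorA, colorB, frac( blue ));

    // Mix ungraded and graded colors; saturate() clamps the parameter to 0..1
    return lerp( imgColor, graded, saturate( fIntensity ));
}

technique tech_main {
    pass P0 {
        PixelShader = compile ps_2_0 ps_main();
    }
}
```

At fIntensity = 1 this returns the same color as the plain LUT lookup, and at 0 it returns the source pixel unchanged.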
-------------------------------------
For reference for other users, here is the code for a 1024px X 32px (32,32,32) LUT:
Code:
// Source image and LUT strip (32 tiles of 32x32 side by side: 1024x32)
sampler2D bkd : register(s1);
sampler2D lut : register(s2) = sampler_state {
    MinFilter = Linear;
    MagFilter = Linear;
};
float4 ps_main(in float2 In : TEXCOORD0) : COLOR0 {
    float4 imgColor = tex2D( bkd, In );
    // Red picks the column inside a tile, green the row (+0.5 lands on texel centers)
    float red = ( imgColor.r * 31 + 0.5 ) / 1024;
    float green = ( imgColor.g * 31 + 0.5 ) / 32;
    // Blue picks the two neighbouring tiles to blend between
    float blueA = floor( imgColor.b * 31 ) / 32;
    float blueB = ceil( imgColor.b * 31 ) / 32;
    float4 colorA = tex2D( lut, float2( blueA + red, green ));
    float4 colorB = tex2D( lut, float2( blueB + red, green ));
    // Blend weight: fractional position between the two tiles
    float lerpAB = frac( imgColor.b * 31 );
    float4 colorOut = lerp( colorA, colorB, lerpAB );
    return colorOut;
}
technique tech_main {
    pass P0 {
        PixelShader = compile ps_2_0 ps_main();
    }
}

-------------------------------------
And here is the code for a 256px X 16px (16,16,16) LUT:
Code:
// Source image and LUT strip (16 tiles of 16x16 side by side: 256x16)
sampler2D bkd : register(s1);
sampler2D lut : register(s2) = sampler_state {
    MinFilter = Linear;
    MagFilter = Linear;
};
float4 ps_main(in float2 In : TEXCOORD0) : COLOR0 {
    float4 imgColor = tex2D( bkd, In );
    // Red picks the column inside a tile, green the row (+0.5 lands on texel centers)
    float red = ( imgColor.r * 15 + 0.5 ) / 256;
    float green = ( imgColor.g * 15 + 0.5 ) / 16;
    // Blue picks the two neighbouring tiles to blend between
    float blueA = floor( imgColor.b * 15 ) / 16;
    float blueB = ceil( imgColor.b * 15 ) / 16;
    float4 colorA = tex2D( lut, float2( blueA + red, green ));
    float4 colorB = tex2D( lut, float2( blueB + red, green ));
    // Blend weight: fractional position between the two tiles
    float lerpAB = frac( imgColor.b * 15 );
    float4 colorOut = lerp( colorA, colorB, lerpAB );
    return colorOut;
}
technique tech_main {
    pass P0 {
        PixelShader = compile ps_2_0 ps_main();
    }
}

The benefit of using a larger LUT size is greater accuracy across complex color changes. For most users, the 256 X 16 will be good enough, as will MuddyMole's 289 X 17 LUT (which should be slightly more accurate).
As far as I know, the 4096 X 64 is a near-perfect translation, while the smaller sizes store fewer colors and rely far more heavily on interpolation between them.
As for performance differences, I have no idea. I don't have a project large enough to test with, and even if I did, my computer is overkill for MMF anyway, so I doubt I'd see much of a drop.
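For anyone who wants a size not listed here: the three shaders differ only in their constants (size minus 1, size, and size squared), so they can be folded into one file with a single value to change. This is a sketch, not a tested drop-in. LUT_SIZE is a name invented for this example, and it assumes the LUT image is a strip of LUT_SIZE square tiles laid side by side, like the ones above. The blend weight between the two blue tiles is taken as frac() of the scaled blue value.

```hlsl
// Hypothetical constant (not in the shaders above):
// 64 for a 4096x64 LUT, 32 for 1024x32, 16 for 256x16
#define LUT_SIZE 64.0

sampler2D bkd : register(s1);
sampler2D lut : register(s2) = sampler_state {
    MinFilter = Linear;
    MagFilter = Linear;
};

float4 ps_main(in float2 In : TEXCOORD0) : COLOR0 {
    float4 imgColor = tex2D( bkd, In );
    float maxIndex = LUT_SIZE - 1.0;  // highest tile/texel index, e.g. 63

    // Red: column inside a tile. Green: row. +0.5 lands on texel centers.
    float red   = ( imgColor.r * maxIndex + 0.5 ) / ( LUT_SIZE * LUT_SIZE );
    float green = ( imgColor.g * maxIndex + 0.5 ) / LUT_SIZE;

    // Blue: the two neighbouring tiles and the fractional position between them
    float blue  = imgColor.b * maxIndex;
    float blueA = floor( blue ) / LUT_SIZE;
    float blueB = ceil( blue ) / LUT_SIZE;

    float4 colorA = tex2D( lut, float2( blueA + red, green ));
    float4 colorB = tex2D( lut, float2( blueB + red, green ));
    return lerp( colorA, colorB, frac( blue ));
}

technique tech_main {
    pass P0 {
        PixelShader = compile ps_2_0 ps_main();
    }
}
```

Setting LUT_SIZE to 17.0 should in principle cover a 289 X 17 strip as well, provided it follows the same tile layout.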