Tuesday, October 6, 2009

Slow XDocument.Load

I recently started using LINQ to XML with XHTML web services. The process was fairly straightforward, but I soon hit a pretty big snag: parsing the web service's response was, for some reason, taking over 8 seconds! After some digging I narrowed it down to the XDocument.Load method. So, for my fellow developers who have experienced the incredibly slow XDocument.Load, this is for you.

I researched the issue and discovered that XDocument and XmlDocument can slow down during parsing when the document references one or more DTDs, since the default resolver goes out and fetches them on every load. Disabling DTDs wasn't an option for me, so I played around a little and found that caching the DTDs locally gave me the performance I was looking for. Loading went from taking a little over 8 seconds to a little under half a second (that's a bit of a difference).

When I took a look at the directory where I was putting these cached files I was surprised to find 41 files (DTDs, files the DTDs reference, etc.)!

The code for this solution is as follows:
// I hereby place this code in the public domain.
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Xml;

public class CachedXmlResolver : XmlUrlResolver
{
    // Maps each entity URI to the path of its locally cached copy.
    public static Dictionary<Uri, String> cache = new Dictionary<Uri, String>();
    public const string Prefix = /* Directory for storing the cached files. */;

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // Fall back to the default resolver when the entity can't be cached.
        if (!cache.ContainsKey(absoluteUri) && !doCache(absoluteUri))
            return base.GetEntity(absoluteUri, role, ofObjectToReturn);

        return new FileStream(cache[absoluteUri], FileMode.Open, FileAccess.Read, FileShare.Read);
    }

    private void addCache(Uri uri, string path)
    {
        // The indexer adds the entry or overwrites an existing one.
        cache[uri] = path;
    }

    // Downloads the entity into the cache directory (unless it is already
    // there) and records it. Returns false if the entity could not be cached.
    private bool doCache(Uri absoluteUri)
    {
        try
        {
            string strUri = absoluteUri.ToString();

            if (strUri.StartsWith("http://"))
            {
                int lindex = strUri.LastIndexOf('/');
                string fname = Prefix + strUri.Substring(lindex + 1);

                if (!File.Exists(fname))
                {
                    WebRequest request = WebRequest.Create(absoluteUri);

                    // 'using' closes the response and both streams even if the copy fails.
                    using (WebResponse response = request.GetResponse())
                    using (Stream str = response.GetResponseStream())
                    using (Stream outs = new FileStream(fname, FileMode.Create, FileAccess.Write))
                    {
                        byte[] buffer = new byte[8192];
                        int count;

                        while ((count = str.Read(buffer, 0, buffer.Length)) > 0)
                            outs.Write(buffer, 0, count);
                    }
                }

                addCache(absoluteUri, fname);
                return true;
            }
        }
        catch { } // any failure falls through to the default resolver

        return false;
    }
}

Using the fix is really simple:

XmlReaderSettings settings = new XmlReaderSettings();
settings.XmlResolver = new CachedXmlResolver();

XmlReader reader = XmlReader.Create(/* WEB SERVICE RESPONSE STREAM HERE */, settings);
XDocument xml = XDocument.Load(reader);

NOTE: I had to manually cache (and hard-code the 'addCache' calls in the constructor) for the following:
  • "-/W3C/DTD XHTML 1.0 Strict/EN" -> "xhtml1-strict.dtd"
  • "-/W3C XHTML 1.0 Transitional/EN" -> "xhtml1-transitional.dtd"
  • "-/W3C/DTD XHTML 1.0 Transitional/EN" -> "xhtml1-transitional.dtd"
  • "-/W3C XHTML 1.0 Frameset/EN" -> "xhtml1-frameset.dtd"
  • "-/W3C/DTD XHTML 1.1/EN" -> "xhtml11.dtd"
  • "-/W3C/DTD XHTML 1.1/xhtml11-model-1.mod" -> "xhtml11-model-1.mod"
If you want to know more about 'XmlResolver' you can always check MSDN. Hope this helps someone!

BC Time Fix Update

I recently switched web hosts and I forgot to update the links to my solution to fixing the annoying Boot Camp time problem (BC Time Fix). Instead of hosting it on my own site I have put it on Google Code so you can now download it from there.

NOTE: I'm having trouble compiling the wrapper application under Snow Leopard; I'll upload it once I get it compiling. The Python script has been uploaded however and can be found under the 'Downloads' tab.

Monday, October 5, 2009

VFP Interim

My VFP tutorials have been on hold. Since people may find these helpful in the meantime, I am posting the code for vector dot & cross products. (I didn't actually compile this code; I just pulled it out of the tutorial projects and threw it into a text file. It should be close enough, however, that you can figure it out.) From what I remember these offer a decent speed increase (especially when grouping VFP_SET_VECTOR_LENGTH... instead of having it in each function).

Here's a review of dot & cross product if you need it:
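As a quick refresher in code, here is a minimal plain C sketch of both products. The Vec3 type and the names dot3 and cross3 are made up for this example and are not part of the VFP code further down.

```c
/* Hypothetical 3-component vector type, used only for this sketch. */
typedef struct { float x, y, z; } Vec3;

/* Dot product: a . b = ax*bx + ay*by + az*bz (the result is a scalar). */
static float dot3(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Cross product: a x b = <ay*bz - az*by, az*bx - ax*bz, ax*by - ay*bx>
   (the result is a vector perpendicular to both inputs). */
static Vec3 cross3(Vec3 a, Vec3 b) {
    Vec3 r;
    r.x = a.y * b.z - a.z * b.y;
    r.y = a.z * b.x - a.x * b.z;
    r.z = a.x * b.y - a.y * b.x;
    return r;
}
```

For example, with a = <1, 2, 3> and b = <4, 5, 6>, dot3 gives 32 and cross3 gives <-3, 6, -3>.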




The code!
/*
Copyright (c) 2009, Zach Griswold
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the Zach Griswold nor the names of its contributors may be
used to endorse or promote products derived from this software without specific prior written
permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

// this is so we can make sure VFP code is compiled ONLY for the iPhone
// device, not the simulator (as it uses x86 code)
//
// we also define a few macros to make writing VFP code cleaner
#if defined(__APPLE__)
#include <TargetConditionals.h>

#if (TARGET_OS_IPHONE == 1) && (TARGET_IPHONE_SIMULATOR == 0)
#define ITOUCH_VFP_ENABLED 1

#define VFP_CODE_BEGIN() __asm__ volatile (
#define VFP_CODE_END() );
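// "memory" tells the compiler that the asm reads and writes memory through its pointer operands
#define VFP_CODE_DEFAULT_END() : "r0", "memory" );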

#define VFP_SET_VECTOR_LENGTH(vec_length) \
"fmrx r0, fpscr \n\t" \
"bic r0, r0, #0x370000 \n\t" \
"orr r0, r0, #0x000" #vec_length "0000 \n\t" \
"fmxr fpscr, r0 \n\t"

#define VFP_SET_VECTOR_LENGTH_ZERO() \
"fmrx r0, fpscr \n\t" \
"bic r0, r0, #0x370000 \n\t" \
"fmxr fpscr, r0 \n\t"
#endif
#endif

typedef struct {
union
{
float m_floats[3];
struct { float x; float y; float z; };
};
} __attribute__((packed)) Vector3;

// NOTE: for possible performance increases, you could
// remove the VFP_SET_VECTOR_LENGTH(...) calls from each
// function and instead group calls to each function between
// VFP_SET_VECTOR_LENGTH(...) macros

// vector dot (scalar) product
float vec3dot(const Vector3 *v1, const Vector3 *v2) {
#if ITOUCH_VFP_ENABLED
float ret;

VFP_CODE_BEGIN()

VFP_SET_VECTOR_LENGTH(2)

// operands are numbered outputs first: %0 = ret, %1 = v1, %2 = v2
"fldmias %1, {s8-s10} \n\t"
"fldmias %2, {s12-s14} \n\t"

"fmuls s0, s8, s12 \n\t"
"fmacs s0, s9, s13 \n\t"
"fmacs s0, s10, s14 \n\t"

"fmrs %0, s0 \n\t"

VFP_SET_VECTOR_LENGTH_ZERO()

: "=r"(ret)
: "r"(v1), "r"(v2)

VFP_CODE_DEFAULT_END()

return ret;
#else
return v1->m_floats[0] * v2->m_floats[0] + v1->m_floats[1] * v2->m_floats[1] + v1->m_floats[2] * v2->m_floats[2];
#endif
}

// vector cross product
// NOTE: vOut should not point to the same thing as v1 or v2
void vec3cross(const Vector3 *v1, const Vector3 *v2, Vector3 *vOut) {
#if ITOUCH_VFP_ENABLED
VFP_CODE_BEGIN()

VFP_SET_VECTOR_LENGTH(2)

// load v1 into s0-s2 and v2 into s4-s6
"fldmias %0, {s0-s2} \n\t"
"fldmias %1, {s4-s6} \n\t"

// get v1 into the form of <Ay, Az, Ax>
// and store into s8-s10
"fcpys s8, s1 \n\t"
"fcpys s9, s2 \n\t"
"fcpys s10, s0 \n\t"

// get v2 into the form of <Bz, Bx, By>
// and store into s16-s18
"fcpys s16, s6 \n\t"
"fcpys s17, s4 \n\t"
"fcpys s18, s5 \n\t"

// get <AyBz, AzBx, AxBy> and store it into s8-s10
"fmuls s8, s8, s16 \n\t"

// get v1 into the form of <Az, Ax, Ay>
// and store it into s16-s18
"fcpys s16, s2 \n\t"
"fcpys s17, s0 \n\t"
"fcpys s18, s1 \n\t"

// get v2 into the form of <By, Bz, Bx>
// and store it into s24-s26
"fcpys s24, s5 \n\t"
"fcpys s25, s6 \n\t"
"fcpys s26, s4 \n\t"

// compute <AyBz - AzBy, AzBx - AxBz, AxBy - AyBx>
// and store it into s8-s10
"fnmacs s8, s16, s24 \n\t"

"fstmias %2, {s8-s10} \n\t"

VFP_SET_VECTOR_LENGTH_ZERO()

: // no outputs; the result is written through the vOut pointer
: "r"(v1), "r"(v2), "r"(vOut) // %0 = v1, %1 = v2, %2 = vOut

VFP_CODE_DEFAULT_END()
#else
vOut->m_floats[0] = v1->m_floats[1] * v2->m_floats[2] - v1->m_floats[2] * v2->m_floats[1];
vOut->m_floats[1] = v1->m_floats[2] * v2->m_floats[0] - v1->m_floats[0] * v2->m_floats[2];
vOut->m_floats[2] = v1->m_floats[0] * v2->m_floats[1] - v1->m_floats[1] * v2->m_floats[0];
#endif
}
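
If you want to poke at what VFP_SET_VECTOR_LENGTH does to the FPSCR without a device handy, the same bit manipulation can be sketched in plain C. This is only a model of the macro under my reading of the FPSCR layout (LEN in bits 16-18, STRIDE in bits 20-21); the function and constant names are invented for the example.

```c
#include <stdint.h>

/* Bits cleared by "bic r0, r0, #0x370000": LEN (16-18) and STRIDE (20-21). */
#define FPSCR_LEN_STRIDE_MASK 0x00370000u

/* Mirrors VFP_SET_VECTOR_LENGTH(len_field). Note that len_field is the raw
   LEN value, i.e. the vector length minus one, and must be in 0..7. */
static uint32_t fpscr_with_len_field(uint32_t fpscr, unsigned len_field) {
    fpscr &= ~FPSCR_LEN_STRIDE_MASK;      /* bic r0, r0, #0x370000  */
    fpscr |= (uint32_t)len_field << 16;   /* orr r0, r0, #0x000N0000 */
    return fpscr;
}

/* Vector length that a given FPSCR value selects (LEN field plus one). */
static unsigned fpscr_vector_length(uint32_t fpscr) {
    return ((fpscr >> 16) & 0x7u) + 1u;
}
```

So VFP_SET_VECTOR_LENGTH(2), as used in both functions above, selects a vector length of three, and VFP_SET_VECTOR_LENGTH_ZERO() puts the unit back into a vector length of one.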

Thursday, June 25, 2009

Running SQL Server Management Studio Under a Different Windows Account

I recently needed to run SQL Server Management Studio under Windows credentials other than those of the account I was logged in as. So, I fired up SQL Server Management Studio and, to my horror, I couldn't change the account used for Windows authentication.

After lots of searching I found a fix: the 'runas.exe' application. (I'm using Vista and haven't tried this on anything else. Also, double check that the path to Management Studio is correct; yours might be different.) Here are the steps:

  1. Make sure you have logged into the computer with the desired account before (I don't know if this step is needed or not, but it's for good measure)
  2. Open a command prompt via "Run as administrator"
  3. Type the following and press enter: C:\Windows\System32\runas.exe /user:YourDomain\TheUserAccount "C:\Program Files\Microsoft SQL Server\90\Tools\Binn\VSShell\Common7\IDE\SqlWb.exe"
  4. You will be prompted for your password; enter it (it won't display, but your keystrokes are being registered) and press enter again
  5. SQL Server Management Studio should now start up and you will see it using the account you specified for Windows authentication!

Saturday, May 9, 2009

Boot Camp Crap (Part 2)

Today it's time to fix two more issues that plague Boot Camp users (both are really simple though). The two problems are: when in OS X, you can't access your Windows NTFS partition (if it's over 32 GB I believe, which mine is), and in Windows, audio can get crackly (on my mid-2008 MacBook Pro at least).

It gets old running my Boot Camp partition within VMware Fusion whenever I want to transfer files over to OS X from my Vista partition. It turns out there is a fast, freely available driver that allows for mounting all NTFS drives! It's called NTFS-3G (it's the NTFS driver used in lots of Linux distros) and has a Mac port that you can find here. Download it, install it, love it.

The audio problem in Windows is a simple driver fix too. On the MacBook Pro the audio hardware is made by Realtek, so we just need a better Realtek driver! You can get the Vista one (supposedly works on Windows 7, I'll find out today after my upgrade) here.

Monday, April 27, 2009

iTouch VFP Programming (Part 1) - Intro to the VFP

First off, sorry for the lack of updates. I've been busy with school, work, trying to get my own iPhone apps out the door, etc. I have also written the majority of the content for the series I'm starting today: iTouch VFP Programming.

The iTouch (iPod Touch & iPhone) uses the ARM1176JZF-S processor, which contains a SIMD coprocessor called the VFP (which stands for Vector Floating Point). If you don't know what SIMD is, read the previous Wikipedia link. This series will focus on coding for the VFP, as well as areas where we might use it (such as the vector cross and dot product). For this series I'm going to assume you're fairly comfortable coding for the iTouch, know the difference between single and double precision floating point numbers, know a little ARM assembly, and have a physical iTouch that you use for development (we can't use our VFP code in the simulator, sorry).

CAUTION: code for the VFP will not work if our program is compiled with THUMB support. We have two options to remedy this: we can switch between THUMB and non-THUMB as needed (messy, and it adds extra overhead), or we can disable compiling for THUMB altogether (preferred; besides, if you're looking at SIMD programming you are going to want floating point performance, so you probably don't want THUMB support anyway). If you don't know how to disable THUMB support, look at this link.

Here in part 1 of the series, we are going to look over the format of VFP instructions as well as the VFP registers. To get your feet wet, take a look at this quick reference card (this quick reference will be your holy grail while coding for the VFP, just FYI...).


VFP Instructions


Instructions for the VFP generally come in the form of "f(operation)(s/d) destination, source(s)". As you can see, each instruction is prefixed with an f, then the name of the instruction, then a single letter specifying whether the instruction is working in single or double precision. For example, the instruction to copy the number in single precision register S5 to single precision register S0 would be "fcpys s0, s5". In some cases the source and the destination are swapped, which is why I said VFP instructions generally come in this format. (We will look over registers after we look at instructions, so when I talk about registers here don't fret!)



Since the VFP is a SIMD unit, it performs operations on more than one piece of data at a time. As the programmer, we are responsible for informing the VFP unit about the number of data components that it will be operating on at once (anywhere from one to four); this is known as the "vector length" (we will go over this process later in the series). Extending our operation above, if we wanted to copy the contents of registers S24-S27 into registers S8-S11, we would inform the VFP unit to use a vector length of four, and then execute the instruction "fcpys s8, s24". As you can see, a vector length of four allows us to operate on a set of four registers at once (four destination registers and four source registers). We won't be coding SIMD operations until later in the series, but as we take a general look at VFP instructions and registers I figure it's best to start thinking about how it works conceptually now.


VFP Registers

The VFP unit has 32 single precision registers (S0-S31), and 16 double precision registers (D0-D15). The double precision registers overlap with the single precision registers, meaning D0 is comprised of S0 and S1, D1 is comprised of S2 and S3, and so on and so forth. Take a look at the image below for a visual representation of this.
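
The overlap can also be modelled in a few lines of C with a union. This is purely an illustration on the host machine, assuming 4-byte floats and 8-byte doubles; VfpRegs and first_single_of_double are hypothetical names, not anything the VFP or the SDK provides.

```c
/* Hypothetical model of the VFP register file: 32 single-precision slots
   that share their storage with 16 double-precision slots. */
typedef union {
    float  s[32];  /* S0-S31 */
    double d[16];  /* D0-D15: d[n] occupies the same bytes as s[2n], s[2n+1] */
} VfpRegs;

/* Index of the first single-precision register that Dn overlaps;
   the second one is simply the next index. */
static int first_single_of_double(int n) {
    return 2 * n;
}
```

In this model, writing to d[1] changes the bytes seen through s[2] and s[3], which is the same thing that happens to S2 and S3 when D1 is written.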


As you can see, the figure above shows the relationship between the double (D0-D15) and single (S0-S31) precision registers in the VFP unit. The figure shows much more however. As you can see the registers have been placed into four groups. In the VFP, each of these four groups forms a "circulating bank of registers." This means that if a SIMD instruction were to start in a group and move past the edge of that group, the instruction would wrap around to the beginning of that group. For example, if we were going to copy the number 0 into four registers at once starting at S13 by informing the VFP we are using a vector length of four, we would copy the number 0 into S13, S14, S15, and S8. You need to be careful and take this into consideration when writing code for the VFP.
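
To make the wrap-around concrete, here is a small C sketch that works out which register each element of a vector operation lands in. It assumes banks of eight single precision registers (S0-S7, S8-S15, S16-S23, S24-S31) and consecutive registers; the names are invented for the example.

```c
/* Number of single-precision registers in each bank. */
#define VFP_BANK_SIZE 8

/* Returns the register used by element i of a vector operation that starts
   at single-precision register `start`. The result never leaves the bank
   that `start` belongs to: stepping past the bank's last register wraps
   around to the bank's first register. */
static int vfp_vector_register(int start, int i) {
    int bank_base = (start / VFP_BANK_SIZE) * VFP_BANK_SIZE;
    return bank_base + (start - bank_base + i) % VFP_BANK_SIZE;
}
```

Calling vfp_vector_register(13, i) for i = 0..3 yields 13, 14, 15 and 8, matching the S13, S14, S15, S8 example above.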

You may have also noticed the "S" and "V"s next to the different register banks. The VFP has special rules on register usage and SIMD operations, and the rules are as follows. If the destination of a VFP instruction is S0-S7, the operation is treated with a vector length of one regardless of the vector length we told the VFP to use; this is a "scalar" operation. Thus, S0-S7 can also be referred to as "the scalar bank." If the destination register is not S0-S7, but one of the source registers is, that instruction will not move that source to the next register(s) during the instruction. For example, if we are using a vector length of four and execute the instruction "fcpys s8, s0", we would not copy S0-S3 into S8-S11; we would copy only S0 into each of S8-S11. This is called a "mixed" operation. If neither the destination nor the source registers are from the scalar bank, then the VFP will respect the vector length it has been told to use; this is a "vector" operation. This may seem confusing right now, but we will cover these again later in the series.
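
These three cases can be simulated in plain C for the "fcpys" instruction. Treat this as a rough sketch of the rules as described above rather than a model of the real hardware: it only handles a single source register, assumes consecutive registers that wrap inside their bank of eight, and uses made-up names throughout.

```c
typedef enum { VFP_SCALAR, VFP_MIXED, VFP_VECTOR } VfpOpKind;

/* S0-S7 form the scalar bank. */
static int in_scalar_bank(int reg) {
    return reg >= 0 && reg <= 7;
}

/* Classifies an operation by its destination and source registers. */
static VfpOpKind vfp_classify(int dest, int src) {
    if (in_scalar_bank(dest)) return VFP_SCALAR; /* vector length ignored */
    if (in_scalar_bank(src))  return VFP_MIXED;  /* source does not advance */
    return VFP_VECTOR;                           /* both sides advance */
}

/* Register for element i, wrapping inside the bank of eight that holds start. */
static int bank_wrap(int start, int i) {
    int base = (start / 8) * 8;
    return base + (start - base + i) % 8;
}

/* Simulates "fcpys s<dest>, s<src>" on an array standing in for S0-S31. */
static void sim_fcpys(float regs[32], int dest, int src, int vec_len) {
    VfpOpKind kind = vfp_classify(dest, src);
    int count = (kind == VFP_SCALAR) ? 1 : vec_len;
    int i;
    for (i = 0; i < count; i++) {
        int s = (kind == VFP_VECTOR) ? bank_wrap(src, i) : src;
        regs[bank_wrap(dest, i)] = regs[s];
    }
}
```

With a vector length of four, sim_fcpys(regs, 8, 0, 4) copies S0 into all of S8-S11 (mixed), sim_fcpys(regs, 8, 24, 4) copies S24-S27 into S8-S11 (vector), and sim_fcpys(regs, 0, 5, 4) copies only S5 into S0 (scalar).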


Conclusion

So far we have gone over the general format of a VFP instruction, and the VFP registers. We have seen that VFP instructions are prefixed with an "f", then comes the operation name, and then finally whether or not the instruction is single or double precision (this series only covers single precision operations). We saw that the VFP contains 32 single precision registers (S0-S31) and 16 double precision registers (D0-D15), that the double precision registers overlap single precision registers, and that the registers form "circulating banks." In part 2 of the series, we will get our feet wet and start looking at scalar operations with the VFP.

Wednesday, March 4, 2009

Time Machine Online Backups

I absolutely love Time Machine. The simplicity of such a powerful backup system is fantastic; I've really grown accustomed to the pretty UI and using Quick Look to find the files I want. I currently have a 500 GB external drive that I use for Time Machine on my MacBook Pro. This is a great solution, but only for the very short periods of time that I actually spend at my desk. I find myself going upwards of 4 days without a backup sometimes, which begins to defeat the purpose of backing up my files in the first place. Every time I look at how many days it has been since my last backup I cringe and wish I had a way to have Time Machine back up my files online (I could use a Time Capsule, but online storage is available everywhere, not just on my network, and is cheaper).

There are currently lots of online backup solutions. Two popular options are Carbonite (though their Mac support isn't quite finished yet) and Mozy, both of which provide unlimited backup storage and bandwidth, for $49.95 a year and $54.45 a year (non-commercial) respectively. Neither of these options, however, really matched what I wanted. I wanted multiple backups of the same file over time so that I could "go back in time" like I was used to with Time Machine to find the file(s) I wanted; I didn't want just the most recent version of each file backed up and that's it. I also really wanted to be able to use Time Machine with the service if at all possible, but I knew that would be asking too much. Or is it...?

After some research I have finally found a way to back up my files online using Time Machine (though this feature is technically not supported by Time Machine, and by using it you run an increased risk of destroying your data at any time)! The process is pretty simple: we set up Time Machine to allow it to use a network drive, set up an Amazon S3 account, get JungleDisk (which can also do backups itself like Mozy or Carbonite, but we won't be using it for this purpose) so we can use our S3 account as a network drive, and then tell Time Machine to use our S3 network drive. (Side note: if you'd rather just use a network drive instead of online storage, you don't have to read on.)

Before we continue, I need to reiterate that doing this is not supported by Time Machine! Though I and others have yet to have any problems, you do run an increased chance of losing all the data that you back up. So with this said, follow this guide at your own risk. If something goes wrong it's your fault, not mine!

As I said, the first step we have to take is enabling network drives in Time Machine. To do this, open up the Terminal application (found in the Utilities folder within your Applications folder) and type the following EXACTLY, press enter, then quit the Terminal:

defaults write com.apple.systempreferences TMShowUnsupportedNetworkVolumes 1


The next step is to set up an Amazon S3 account. Amazon S3 is a pay-as-you-go web storage service offered by Amazon, which means we only have to pay for the storage and bandwidth that we actually use. Rather than explaining how to do that here, we will jump to the next step (you'll see why). Download and install JungleDisk for the Macintosh and follow the simple onscreen instructions, including setting up an S3 account.

If all goes according to plan you should now have a network drive mounted on your desktop, congratulations! The last step is to open up the Time Machine settings under System Preferences and tell Time Machine to use our S3 network drive (pictured below). Voila, we now have Time Machine working with online backups!