Friday, June 1, 2007

XPe tip #50: Smallest Windows XP(e) based thin client image

Another post in the newsgroup led me to an investigation, as a result of which I came up with what is probably the smallest networked XPe image that includes a fully functioning RDP client.

Basically, I suggested starting with an image that includes the following:
- Your hardware + HAL (only hardware you plan to support in the image)

- NTFS FS/File format. The latter is zero-size but may bring in more components as dependencies; it is not actually required if you don't plan to format disks as NTFS under XPe.
NTFS lets you compress the file system and thus minimize disk space usage.

- Basic: NT Loader/Nt Detect/Windows API - GDI, Kernel, User, Advanced/Shell32 API/Ole32, OleAut32/Language support/Winsock/Common Controls and dialogs/etc.

- Terminal Server Client

- Windows RAM Disk Driver (this is for RAM or Remote Boot support if you plan to use it)

- Winlogon or Minlogon (big difference in image footprint and significant difference in boot time)

- CMD shell (useful for debugging purposes). You can remove this shell later when you are done with the image config, or replace it with your own custom shell or the MSTSC.exe (RDP client) running as the shell.

If you do the above with Minlogon your image size can be less than 96Mb (uncompressed). With Winlogon it will grow by ~20-30Mb. These numbers are rough; I took them from some of my demo images that have a working RDP client. The actual image size may vary depending on other system specs. I often optimize the image size further in TD and post-FBA, even when Minlogon is used. Then 96Mb can go down to 64Mb or less (with RDP still working).

Perhaps the most sensitive area is hardware support. If you use TAP output to create a platform macro component, make sure to disable as many of its component dependencies as possible, based on your final target device specs. I typically use the Selector Prototype component as the prototype for all my platform macros, and it helps me disable unnecessary hardware support in TD.

Often, the challenge is to get networking to work properly in the MS Client network environment without blowing up the image size with lots of network components that have overlapping dependency chains.

So, continuing down the path I just described, I got my VPC Minlogon image, with a fully capable RDP client running and tested, down to ~38Mb uncompressed / ~28Mb compressed.
Plus you'd want to add 2-3Mb more for the Terminal Services dynamic cache that is created at run time.
Note: on a real device the image would be smaller.

As you can plainly see, the image is small enough to fit on the cheapest 32Mb USB stick (yes, it is possible to boot the image off a USB 2.0 device). Just imagine - you carry a stick that you can boot on any Intel/AMD based machine, and it gives you Terminal Server Client access to your network - cool!
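To make the size budget concrete, here is a rough back-of-the-envelope check in Python using the figures quoted above. It treats the stick's nominal capacity as fully usable, which is an assumption; file system overhead on a real stick will eat into the margin.

```python
# Rough size budget, using the figures from the post (all values in Mb as quoted).
COMPRESSED_IMAGE = 28   # compressed VPC Minlogon image
TS_CACHE = (2, 3)       # Terminal Services cache created at run time (low, high)
STICK_CAPACITY = 32     # nominal stick capacity; assumed fully usable here

def headroom(image, cache, capacity):
    """Return the space left on the device after the image and the cache."""
    return capacity - (image + cache)

best = headroom(COMPRESSED_IMAGE, TS_CACHE[0], STICK_CAPACITY)
worst = headroom(COMPRESSED_IMAGE, TS_CACHE[1], STICK_CAPACITY)
print(f"headroom: {worst}-{best}Mb")
```

So the compressed image plus the cache fits, but only just.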

XPe tip #49: Why is HORM restricted to EWF-RAM?

There was an interesting question posted to the newsgroup recently that I attempted to answer. I thought I'd add those ideas to my blog as well.

The question was why HORM (a feature of XPe SP2) is limited to EWF-RAM mode. HORM stands for Hibernate-Once-Resume-Many, an embedded enabling feature.
In FP2007, a couple of new commands/functions were added to the ewfmgr console tool (as well as to the EWF API) that are directly related to HORM and basically allow you to activate and deactivate the feature at run time.

You will also find a lot of internals on how HORM works in this thread, where Slobodan and I discussed HORM feature implementations way before it was called HORM :-)

Now back to the original question. Why are we limited to the RAM overlay when HORM is used?
I guess the main "limitation" of EWF Disk mode vs. EWF RAM Reg is that by default the overlay data is "persistent" in Disk mode.
The major reason for using EWF RAM for HORM scenarios is that the changes you make while working with the image are redirected to RAM and get lost on reboot. This way whatever the OS does to the hibernation file (hiberfil.sys) is lost as well, and you can always go back to the original state of the device - this is basically the main purpose of HORM.
With EWF Disk the changes are redirected to Disk overlay (another hidden partition on the disk) and EWF will pick them up at the next boot until you clean the overlay or commit the changes (there is also a minor difference in behavior depending on what restore level you selected for the disk mode but it is irrelevant to this discussion).

So, imagine the following scenario:

1) HORM is enabled. Hibernation file holds a valid state of the system saved at some point of time.

2) EWF is enabled too. Disk overlay is used. Initially the cache is *clean* (not really possible with EWF Disk mode when you enable HORM but just for the sake of this explanation).

3) You boot the image and it boots fast from the hiberfil.sys file.

4) You work. FS (actually disk level) writes are redirected to Disk overlay, no files on the protected partition are actually changed on the disk including the hiberfil.sys.

5) You reboot the system (gracefully or not doesn't matter since you are EWF protected).

6) The system boots from the same (old!) hiberfil.sys file. This means it will restore all the driver and application states to whatever states they had in memory at the moment the golden image was hibernated. EWF driver also has some (old) state there where it remembers what was the Disk overlay content at that time, i.e. it thinks the overlay is *clean* (see step#2).
However, your actual Disk Overlay is not clean since it was modified in an earlier OS session (see step#4). Now you get a discrepancy between the disk overlay and its state in RAM (EWF state machine). This will likely lead to an exception within the EWF driver and, since EWF is a kernel driver, to a BSOD.
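The scenario above can be sketched as a toy model in Python. This is not how the EWF driver is implemented; the class and function names are made up for illustration, and the "state" is reduced to the set of overlay sectors the driver believes are in use. It only shows why a resumed (stale) in-memory state disagrees with a persistent overlay but not with a RAM overlay.

```python
class ToyOverlay:
    """Overlay storage: maps sector numbers to redirected data."""
    def __init__(self):
        self.sectors = {}

class ToyEwfDriver:
    """In-memory driver state: the overlay sectors the driver knows about."""
    def __init__(self, overlay):
        self.overlay = overlay
        self.known_sectors = set(overlay.sectors)

    def write(self, sector, data):
        # A disk write is redirected to the overlay and tracked in RAM.
        self.overlay.sectors[sector] = data
        self.known_sectors.add(sector)

    def is_consistent(self):
        return self.known_sectors == set(self.overlay.sectors)

def hibernate(driver):
    """Capture the driver's RAM state, as hiberfil.sys would."""
    return frozenset(driver.known_sectors)

def resume(snapshot, overlay):
    """Restore the driver from the snapshot; the RAM state is NOT re-read from disk."""
    driver = ToyEwfDriver(overlay)
    driver.known_sectors = set(snapshot)
    return driver

def run_scenario(persistent_overlay):
    overlay = ToyOverlay()                      # step 2: clean overlay
    golden = hibernate(ToyEwfDriver(overlay))   # step 1: golden hiberfil.sys
    first = resume(golden, overlay)             # step 3: fast boot
    first.write(100, b"log entry")              # step 4: writes go to the overlay
    if not persistent_overlay:
        overlay = ToyOverlay()                  # RAM overlay is lost on reboot
    second = resume(golden, overlay)            # steps 5-6: same old hiberfil.sys
    return second.is_consistent()

print("Disk overlay consistent:", run_scenario(persistent_overlay=True))
print("RAM overlay consistent: ", run_scenario(persistent_overlay=False))
```

In the toy model the Disk overlay case ends up inconsistent while the RAM overlay case does not, which mirrors the discrepancy described in step 6.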

The situation is even worse if we get real and don't assume the *clean* initial state of the EWF overlay. Since EWF was already enabled at the time we hibernated the image (it must be enabled), you already have some changes to the disk (often unintentional, such as logs, registry hive changes, etc.) that were redirected to the overlay.

Another issue could be that if you need to commit the changes (and stop the HORM) you won't be able to do that. The actual commit in Disk overlay mode occurs at the next boot time after you issued the commit command. At boot EWF driver reads all the changes off the hidden data partition and applies them to the disk. Obviously, this happens way later than the OS (the loader and the kernel) loads the RAM image off the hiberfil.sys (i.e. restores the system from the hibernation file). Since the hiberfil.sys file is the old one the EWF driver there (its state machine) doesn't know the commit command was issued and you will never get out of the loop.

The only way to test the above scenario is to use one of the "hacks" Slobodan and I discussed in the newsgroup (see the link above). Otherwise, HORM will be disabled, as reported by ewfmgr. I also discovered that hibernation won't work at all while EWF Disk mode is enabled on (protecting) the system partition that holds the hiberfil.sys file.

The actual bug, in my opinion, is in the latest Enhanced Write Filter component UI in TD. There Microsoft added a checkbox for HORM - a very nice feature - but neglected to disable the option when the settings are changed to use the Disk overlay.

XPe tip #48: Get rid of System Tray icons and remap the UI

There are some 3rd party applications (and even Microsoft applications like the Bluetooth systray agent) that install icons in the SysTray area - the lower-right corner of the Desktop screen. Some of these applications rely on the presence of that icon to provide access to their UI. Unfortunately, there is often no other way to reach it.

When you get rid of the Explorer shell (which makes sense on some embedded devices that run a custom shell app) you remove the SysTray ("Notification Area") as well, since it is a part of the Explorer application. More precisely, it is a part of the TaskBar feature of the Explorer Desktop that runs as the system shell.

So, the question is how to get rid of the SysTray (the Explorer) and yet still have a way to access such an application's UI. There is no easy answer.

Here is what I'd suggest (posted my thoughts in this newsgroup thread).

1) The best approach would be to implement the missing Explorer features within your own custom shell application. This is really hard for some features and requires a lot of debugging on XP/XPe to understand how those features work and how to invoke and integrate them into your own application.
Just to encourage you, I've done this for some of the Explorer features such as AutoPlay, Taskbar, Volume Manager, etc.

2) How do 3rd party applications install their icons in the System Tray area?
It basically comes down to one API function - Shell_NotifyIcon - that applications use to add icons to the SysTray area and to receive notifications from the shell about user clicks on those icons through the application window procedures (when you call that API you register your window to receive the events). Shell_NotifyIcon is exported by shell32.dll. Assuming you don't use shell32.dll in your image that is supposed to run a custom shell, you can create your own version of shell32.dll that exposes (exports) that function only. Used as a shim, it will give you full control over what that application does in the UI and when.

3) Another way, much simpler, is to find out what messages are sent to the application window procedure when user clicks on the application specific icon in SysTray area or selects an item on the appropriate menu shown by the left-[right-]click on the SysTray icon. Knowing those messages and their parameters (likely WM_COMMAND messages) you can always implement them and send them out from your own shell to the 3rd party application being investigated. This way you can "remap" all the functions of the application icon in SysTray to your own UI.
The best way to watch for the messages is to start with the Microsoft Spy++ tool. The next step is to subclass the window (via global hooks in Win32) and watch for the messages through debug output.
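As a minimal sketch of approach 3, here is how a custom shell could replay a menu command to the application, written in Python with ctypes for brevity. The command IDs and the action names are hypothetical placeholders; the real values are whatever you observe in Spy++ for the application you are investigating. WM_COMMAND, FindWindowW and PostMessageW are the standard Win32 names; looking the window up by title is an assumption (a hidden tray window may only be findable by class name).

```python
import ctypes
import sys

WM_COMMAND = 0x0111  # standard Win32 message ID

def make_wparam(low, high):
    """Pack two 16-bit values the way the Win32 MAKEWPARAM macro does."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)

# Hypothetical command IDs; replace with the ones captured in Spy++.
TRAY_MENU_COMMANDS = {
    "open_settings": 40001,
    "exit": 40002,
}

def build_command(action):
    """Return the (msg, wParam, lParam) triple for a tray menu selection."""
    command_id = TRAY_MENU_COMMANDS[action]
    # HIWORD(wParam) == 0 means "from a menu"; lParam == 0 means no control handle.
    return WM_COMMAND, make_wparam(command_id, 0), 0

def send_to_app(window_title, action):
    """Post the command to the application's window (Windows only)."""
    if sys.platform != "win32":
        raise OSError("PostMessageW is only available on Windows")
    user32 = ctypes.windll.user32
    hwnd = user32.FindWindowW(None, window_title)
    if not hwnd:
        return False
    msg, wparam, lparam = build_command(action)
    return bool(user32.PostMessageW(hwnd, msg, wparam, lparam))

print(build_command("exit"))
```

Your shell would call something like send_to_app with the target window and one of the mapped actions from its own buttons or menus.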


While working on demo images that I deploy on VPC (Virtual PC 2004-2007) I noticed an interesting thing that makes perfect sense, though.

I use the newest feature of Virtual Server 2005 R2 (and I love the feature) - Virtual Disk Mounting. That basically requires the Microsoft Virtual Server SCSI disk device driver.
What I noticed is that if I leave the VHD (virtual disk image) mounted in the OS on my development machine and try to boot that image in VPC, I get the UNMOUNTABLE_BOOT_DISK BSOD right away.

This makes sense since the volume is not available for writes from within the VPC as the host OS locks the VHD image with the new disk device driver in the chain.

The tip is simple - don't do that! Don't mount the VHD and try to boot from it in VPC.

XPe tip #46: Another EWF related BSOD: UNMOUNTABLE_BOOT_DISK

Just discovered that if you use EWF Disk Overlay mode and set the disk overlay size (in TD) to something very small (by default it is only 1024Kb) you are going to get the UNMOUNTABLE_BOOT_DISK BSOD (Blue Screen of Death) right after the second reboot after you enabled the EWF.

The reason is, probably, that when you enable EWF and reboot (for the change to take effect) the OS starts redirecting all disk writes to the Disk overlay. It tries to, but the overlay (the hidden partition that holds the Disk overlay) is obviously too small to hold all the changes, even unintentional ones like underlying system logs, changes to registry hives, etc. Somehow EWF doesn't give you any notification about it (and that may make sense on headless embedded devices, btw). At the next boot the disk writes saved in the overlay are validated and this leads to the BSOD.

It is actually easy to prove that it is EWF (not a corrupted FS, etc.) that causes the BSOD. Just open the system registry hive of the crashed image offline (from another OS installation such as XP Pro or WinPE) and remove EWF from the list of volume upper filters for the disk driver. You will be able to boot the "patched" image just fine.
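The edit itself amounts to dropping one entry from a multi-string list. Here is a small sketch of that step in Python, operating on a plain list rather than on a real hive. The filter name "EWF" and the sample neighbour entry "VolSnap" are assumptions for illustration; check what your own image's upper filters value actually contains before changing it.

```python
def remove_filter(upper_filters, name="EWF"):
    """Return the filter list without `name` (case-insensitive match)."""
    return [f for f in upper_filters if f.lower() != name.lower()]

# Sample data only; the real list comes from the offline system hive.
before = ["EWF", "VolSnap"]
after = remove_filter(before)
print(before, "->", after)
```

The remaining entries keep their order, which matters because filter drivers are layered in the order they are listed.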

Another way to prove that is to delete the EWF hidden partition that holds the broken overlay. Obviously, EWF will become useless after that but you can always get it back to the default working state by issuing the following command "rundll32 ewfdll.dll, ConfigureEwf" that will re-create and initialize the EWF config (overlay) partition on the disk.