Citrix IMA and Zone Data Collector Communication
Summary
The following text is written to assist Citrix customers in understanding how IMA traffic relates to Zone Data Collector (ZDC) elections in a Presentation Server 4.0 environment. Detailed information on IMA traffic is provided to help you understand the specific communication processes Citrix uses in zone server-to-server communications. The data was obtained from information provided by Citrix Engineering and Citrix Technical Support.
To illustrate these concepts, this document uses a fictional company named Miami Inc.
Case/Customer
A large Citrix customer, Miami Inc, is working to establish an understanding of how zone elections work in its environment, and to know what to examine should troubleshooting IMA zone elections become necessary. The customer needs to understand not only how zones should be set up, but also how communication among member servers and data collectors works across multiple zones during normal farm operations.
Miami Inc has approximately 12,000 Citrix users connected at any given time. The users access the farm from multiple global locations, some from the Corporate LAN and others via the Citrix Access Gateway SSL VPN.
Case Study Outline
IMA Communication – Traffic Fundamentals
The following information addresses questions about IMA basics.
How does IMA traffic get sent and processed amongst Citrix servers?
Most customers understand that IMA traffic travels over port 2512, but very few, Miami Inc included, understand how that traffic is handed off to machines for processing by IMA.
Due to the stringent security requirements Miami Inc has around data traversal amongst networks, they need to understand how data is transported amongst servers in a zone.
Citrix has what we call a transport “function” that is responsible for getting packets of information from one host to another. The transport component is relatively small and does not actually care about the data it is transporting. It is a small set of functions for setting up bindings to hosts and subsequently sending packets to those hosts.
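To make this concrete, below is a minimal Python sketch of such a transport layer. The Transport class and its bind_to_host/send_packet functions are hypothetical stand-ins for illustration, not Citrix APIs.

import socket

IMA_PORT = 2512  # default IMA port

class Transport:
    """Illustrative transport: binds to hosts and ships opaque packets."""

    def __init__(self):
        self.bindings = {}  # (host, port) -> connected socket

    def bind_to_host(self, host, port=IMA_PORT):
        # Set up a binding (modeled here as a TCP connection) to the host.
        if (host, port) not in self.bindings:
            self.bindings[(host, port)] = socket.create_connection((host, port))
        return self.bindings[(host, port)]

    def send_packet(self, host, payload, port=IMA_PORT):
        # The transport does not interpret the payload; it only delivers it.
        self.bind_to_host(host, port).sendall(payload)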
How does IMA know who the hosts in a farm are to ensure communication requests are from approved sources?
A set of functions that we refer to as the Host Resolver component is responsible for providing information about all of the hosts in the farm. It provides APIs for enumerating hosts, setting/getting a host’s zone, and mapping between the various ways used to refer to a remote host. Hosts may be identified by name (a simple UNICODE string), by HOSTID (a unique integer representing a host), or by host binding (HBINDING).
While this is good information, Miami Inc needs greater detail for its internal security review, so below we explain more about how the actual connections are made.
Mapping connections for traffic sent between servers (hosts) in the farm
Various parts of the IMA system use different specifiers to refer to remote hosts. These types of specifiers include:
Host Names – Used by user interface components to refer to hosts. A host name is used in conjunction with a port specifier (typically the default IMA port, 2512) in order to create a binding with the Transport component detailed above. Every host has a definitive name that it determines itself when joining the farm.
Host ID – This is an integer used mostly by subsystems to refer to hosts.
As mentioned above, any time a message is sent to a remote host, it needs to have a host binding for that host.
The host resolver maintains two mapping hash tables for quick translations.
The host resolver’s main data structure is the HOST_RECORD, which contains a host’s name, zone name, IMA port, Management Console port, host ID, version, and ranking information. The ranking information is used by the Zone Manager, which is described below, when electing a zone master.
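To illustrate the shape of these structures, here is a minimal Python sketch of a host record and a resolver holding the two hash tables; all names are illustrative, not the actual IMA data structures.

from dataclasses import dataclass

@dataclass
class HostRecord:
    # Fields mirror the HOST_RECORD described above; names are illustrative.
    host_name: str   # definitive name, set when the host joins the farm
    zone_name: str
    ima_port: int    # typically 2512
    admin_port: int  # Management Console port, typically 2513
    host_id: int     # unique integer representing the host
    version: int
    ranking: int     # used by the Zone Manager when electing a zone master

class HostResolver:
    """Keeps two hash tables so either identifier translates quickly."""

    def __init__(self):
        self.by_name = {}  # host name -> HostRecord
        self.by_id = {}    # host ID   -> HostRecord

    def add_host(self, rec):
        self.by_name[rec.host_name] = rec
        self.by_id[rec.host_id] = rec

    def resolve(self, key):
        # Accepts either a host name or a HOSTID.
        return self.by_name.get(key) or self.by_id.get(key)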
Connection State Information
A binding attempt is always in one of three states:
• Connecting
• Active
• Closing
When an outgoing connection is created, it is first placed in the CONNECTING state. This is a temporary state that is quickly changed to WAIT_BIND_REQUEST as the connection waits for a bind request to come back from the remote host. Once a BIND_REQUEST is received, the original host sends a BIND_RESPONSE packet and moves into the WAIT_BIND_COMMIT state. Once the BIND_COMMIT packet is received from the remote host, the connection is fully initialized and moves into the ACTIVE state.
The case of handling an incoming connection is similar. The connection is first placed into CONNECTING temporarily. A BIND_REQUEST packet is sent to the connecting client, and the local host moves to WAIT_BIND_RESPONSE. Once the BIND_RESPONSE comes back from the other host, the local host sends a BIND_COMMIT and moves into the ACTIVE state.
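The following Python sketch models the two handshake sequences just described. The message and state names match the text; the recv/send callbacks and function names are illustrative only.

from enum import Enum, auto

class BindState(Enum):
    CONNECTING = auto()
    WAIT_BIND_REQUEST = auto()
    WAIT_BIND_RESPONSE = auto()
    WAIT_BIND_COMMIT = auto()
    ACTIVE = auto()
    CLOSING = auto()

def outgoing_handshake(recv, send):
    # Outgoing side: wait for BIND_REQUEST, answer with BIND_RESPONSE,
    # then wait for BIND_COMMIT before becoming ACTIVE.
    state = BindState.CONNECTING
    state = BindState.WAIT_BIND_REQUEST
    assert recv() == "BIND_REQUEST"
    send("BIND_RESPONSE")
    state = BindState.WAIT_BIND_COMMIT
    assert recv() == "BIND_COMMIT"
    return BindState.ACTIVE

def incoming_handshake(recv, send):
    # Incoming side: send BIND_REQUEST, wait for BIND_RESPONSE,
    # then send BIND_COMMIT and become ACTIVE.
    state = BindState.CONNECTING
    send("BIND_REQUEST")
    state = BindState.WAIT_BIND_RESPONSE
    assert recv() == "BIND_RESPONSE"
    send("BIND_COMMIT")
    return BindState.ACTIVE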
How many connections to servers in the farm can IMA process/keep at one time?
While there is no single answer to this, there is a registry setting that, by default, limits the Host Resolver to keeping only 512 open connections to hosts. This is very important in large farm design, and it can be adjusted.
The connections a ZDC holds to hosts in its zone do not last forever and can be torn down and re-established. For farm performance, it is important to take steps in the zone to limit how often this teardown/setup process occurs, and raising the registry setting alleviates it in zones with more than 512 hosts. The registry setting is:
HKEY_LOCAL_MACHINE\Software\Citrix\IMA\Runtime\MaxHostAddressCacheEntries
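As a sketch only, an administrator could raise this value with Python’s winreg module; the value 1024 below is purely illustrative, not a Citrix sizing recommendation.

import winreg

KEY_PATH = r"Software\Citrix\IMA\Runtime"

# Raise the Host Resolver connection limit on a ZDC. The value 1024 is an
# illustrative example only, not a Citrix sizing recommendation.
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                    winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "MaxHostAddressCacheEntries", 0,
                      winreg.REG_DWORD, 1024)
# The IMA Service typically needs a restart before such a change takes effect.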
When Miami Inc designs its global farm, the ZDC setup is of the utmost importance because the number of servers in each zone will grow over time to very high levels. A thorough understanding of this setting and the following information is critical.
Zone Setup and Information
What is the function of a zone?
Zones perform two functions:
• Collecting data from member servers in the zone
• Distributing changes in the zone to other servers in the farm
What is a Zone Data Collector (ZDC)?
Each zone in a Presentation Server farm has its own “traffic cop,” the ZDC. A ZDC may also be referred to as the Zone Manager. The ZDC maintains all load and session information for every server in the zone. ZDCs keep open connections to the other ZDCs in the farm for zone-to-zone communication, and changes reported by the member servers of a ZDC’s zone are immediately propagated to the other ZDCs in the farm.
How does the ZDC keep track of all of the hosts in the farm to make sure they are live?
If the ZDC does not receive an update from a member server in its zone within the configured amount of time (default 1 minute), it sends a ping (IMAPing) to the member server in question. This timeframe can be configured in:
HKEY_LOCAL_MACHINE\Software\Citrix\IMA\Runtime\KeepAliveInterval
If the ZDC does not receive an update within the configured amount of time from a peer ZDC, it does not continually ping the “lost” ZDC. Instead it waits a default of 5 minutes, which is configurable in:
HKEY_LOCAL_MACHINE\Software\Citrix\IMA\Runtime\GatewayValidationInterval
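A minimal Python sketch of the two timers just described, assuming hypothetical ima_ping and revalidate callbacks in place of the real IMAPing and gateway validation logic:

KEEP_ALIVE_INTERVAL = 60            # seconds; default is 1 minute for members
GATEWAY_VALIDATION_INTERVAL = 300   # seconds; default is 5 minutes for peer ZDCs

def check_member_server(member, last_update, now, ima_ping):
    # A member server that misses its update window is pinged immediately.
    if now - last_update > KEEP_ALIVE_INTERVAL:
        ima_ping(member)

def check_peer_zdc(peer, last_update, now, revalidate):
    # A "lost" peer ZDC is not pinged continually; the ZDC simply waits
    # out the validation interval before checking on the peer again.
    if now - last_update > GATEWAY_VALIDATION_INTERVAL:
        revalidate(peer)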
How does the ZDC ensure the servers it communicates with are in the farm and authorized to trade information?
There are several layers of security used in this process, including those that exist in the Transport and Host Resolver functions. One of the most important checks a ZDC performs before allowing a server to communicate within the farm is called a magic number check. Magic numbers are set the first time a server joins a farm.
If a server in the farm has a different magic number than the ZDC expects, the server can come to believe that it is in its own farm and declare itself a data collector, causing two data collectors to exist in a single zone and triggering further zone elections.
Is there a setting for when the member servers in a zone update the Data Collector?
All updates a member server has are sent to the ZDC as soon as they are generated. The following sequence shows how both inter- and intra-zone IMA communications occur in an idle farm.
Most IMA traffic is a result of the generation of events. When a client connects, disconnects, logs off, and so on, the member server must update its load, license count, and session information to the data collector in its zone. The data collector in turn must replicate this information to all the other data collectors in the farm.
1. The client asks the data collector to resolve the published application to the IP address of the least-loaded server in the farm.
2. The client then connects to the least-loaded server returned by the data collector.
3. The member server then updates its load, licensing, and connected session information to the data collector for its zone.
4. The data collector then forwards this information to all the other data collectors in the farm.
Important: Notice that nowhere in this communication sequence is there any communication to the data store. Connections are independent of the data store and can occur when the data store is not available. Connection performance is not affected by a busy data store.
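As a conceptual sketch only (not Citrix code), the fan-out described above can be modeled in Python as follows; note that no step touches the data store.

class DataCollector:
    def __init__(self):
        self.peers = []       # the other ZDCs in the farm
        self.zone_state = {}  # server name -> latest dynamic data

    def receive_update(self, server, update):
        # Called by a member server in this zone; the data store is
        # never involved in this path.
        self.zone_state[server] = update
        for peer in self.peers:
            peer.replicate(server, update)

    def replicate(self, server, update):
        # Called by a peer ZDC relaying a change from its own zone.
        self.zone_state[server] = update

class MemberServer:
    def __init__(self, name, zdc):
        self.name, self.zdc = name, zdc

    def on_session_event(self, update):
        # Any logon, logoff, disconnect, or load change is pushed to the
        # zone's data collector as soon as it is generated.
        self.zdc.receive_update(self.name, update)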
Election Process in Detail
What is meant by a Zone Data Collector election?
Should the ZDC for a zone become unavailable for any reason, another server in the zone can take over the role in its place. The process of taking over this role is known as an election. How these elections take place is very important in Presentation Server farm design, especially in large environments like Miami Inc’s. Miami Inc has a globally distributed Citrix environment, where farm communication is heavily reliant on zone setup.
What server is the “boss,” and how is that determined?
Server Administrators must choose the Zone Data Collector strategy carefully during farm design. There are many variables associated with this process that are outside the scope of this document. When an election needs to occur in a zone, the winner of the election is determined using the following criteria:
• Highest Presentation Server version first (should always be 1)
• Highest rank (as configured in the Management Console)
• Highest Host ID number (a Host ID is just a number – every server has a unique ID)
If you want to see a server’s Host ID number and version, you can run the queryhr.exe utility (with no parameters). You’ll get output that looks like this:
C:\>QueryHR.exe
---- Showing Hosts for "10.8.4.0" ----
Host 1:
-----------------------------
Zone Name: 10.8.4.0
Host Name: FTLDTERRYDU02
Admin Port: 2513
Ima Port: 2512
Host ID: 8022
Master Ranking: 1
Master Version: 1
-----------------------------
--- Show Host Records Completed ---
New Data Collector Election Process
When a communication failure occurs between a member server and the data collector for its zone, or between data collectors, the election process begins in the zone. Here are some examples of how ZDC elections can be triggered and a high-level summary of the election process. A detailed description of this process and the associated functions appears further below in this document.
1. The existing data collector for Zone 1 has an unplanned failure, for example, a RAID controller fails, causing the server to blue screen. (If the server is shut down gracefully, it triggers the election process before going down.)
2. The servers in the zone recognize the data collector has gone down and start the election process.
3. The member servers in the zone then send all of their information to the new data collector for the zone. The amount of data sent is a function of each server’s number of sessions, disconnected sessions, and applications.
4. In turn the new data collector replicates this information to all other data collectors in the farm.
Important: The data collector election process is not dependent on the data store.
Note: If the data collector goes down, sessions connected to other servers in the farm are unaffected.
Misconception: “If a data collector goes down, there is a single point of failure.”
Actual: The data collector election process is triggered automatically without administrative intervention. Existing as well as incoming users are not affected by the election process, as a new data collector is elected almost instantaneously. Data collector elections are not dependent on the data store.
Detailed Election Process:
As we know, each server in the zone has a ranking that is assigned to it. This ranking is configurable such that the servers in a zone can be ranked by an administrator in terms of which server is most desired to serve as the zone master. “Ties” between servers with the same administrative ranking are broken by using the HOST IDs assigned to the servers; the higher the host ID, the higher-ranked the host.
The process that occurs when an election situation begins is as follows:
1. When a server comes on-line, or fails to contact the previously-elected zone master, it starts an election by sending an ELECT_MASTER message to each of the hosts in the zone that are ranked higher than it.
2. When a server receives an ELECT_MASTER message, it replies to the sender with an ELECT_MASTER_ACK message. This ACK informs the sender that the receiving host will take over the responsibility of electing a new master. If the receiving host is not already in an election, it will continue the election by sending an ELECT_MASTER message to all of the hosts that are ranked higher than itself.
3. If a server does not receive any ELECT_MASTER_ACK messages from the higher-ranked hosts to which it sent ELECT_MASTER, it will assume that it is the highest ranked host that is alive, and will then send a DECLARE_MASTER message to all other hosts in the zone.
4. When a server that has previously sent an ELECT_MASTER message to the higher-ranked host(s) in the zone receives an ELECT_MASTER_ACK from at least one of those hosts, it enters a wait state, waiting for the receipt of a DECLARE_MASTER from another host. If a configurable timeout expires before this DECLARE_MASTER is received, the host will increase its timeout and begin the election again.
At the conclusion of the election, each host will have received a DECLARE_MASTER message from the new zone master.
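The following simplified, single-threaded Python sketch models this election. The message strings match the text; the host fields and the send callback are illustrative, and the real implementation is asynchronous.

def election_key(host):
    # Criteria from above: highest version first, then the rank assigned
    # in the Management Console, then host ID as the final tie-breaker.
    return (host.version, host.ranking, host.host_id)

def start_election(me, zone_hosts, send):
    # Send ELECT_MASTER to every host ranked higher than this one.
    higher = [h for h in zone_hosts if election_key(h) > election_key(me)]
    acked = False
    for host in higher:
        if send(host, "ELECT_MASTER") == "ELECT_MASTER_ACK":
            acked = True  # that host now owns the election
    if not acked:
        # No live higher-ranked host answered: assume this host is the
        # highest-ranked host alive and declare it the zone master.
        for host in zone_hosts:
            send(host, "DECLARE_MASTER")
        return me
    # Otherwise wait for a DECLARE_MASTER from the eventual winner; on
    # timeout, increase the timeout and restart the election (not shown).
    return None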
What happens if a server incorrectly believes a new ZDC has won (false winner)?
Once the two ZDCs “fix” themselves through ZDC-to-ZDC communication, establishing which is the proper ZDC, a direct communication is sent to the affected member server(s) notifying them of the correct ZDC to use.
Supporting data:
• Any state change on server (logon/logoff, disconnect/reconnect, load change) triggers a dynamic data update.
• Member server notifies its DC of the change.
• In turn, the member server’s DC notifies ALL other DCs of the change.
Communication Events:
• Member server to zone DC heartbeat check.
• Key: HKEY_LOCAL_MACHINE\Software\Citrix\IMA\Runtime\KeepAliveInterval
• Default value: 60000 milliseconds (REG_DWORD: 0xEA60)
What happens if a server believes it is the new ZDC but the PZDC is still alive and has not resigned?
There are two ZDCs for a short period of time; however, our code ensures that the ZDCs communicate with each other and announce the true ZDC to all member servers in the farm once the election process has run its course. Provided the original server does not have a lower preference level than the “new” ZDC, it will almost always remain the ZDC and in turn broadcast its status to all servers in the farm.