IEEE-1588 Software Design and Multi-core

From a high level, an IEEE-1588 software design for multi-core implementations consists of three parts: time-stamping hardware, protocol-parsing software, and a filtering and clock-correction algorithm. The prior article mainly dealt with how the time-stamps are exchanged between a master and its various slaves, while the next article will deal with the related algorithms. This section deals with how all of this is tied together into a complete system and solution that achieves synchronization.

In order to keep operating-system and stack latencies from adding up, it is beneficial to time-stamp packets as close to the hardware as possible, i.e. when a PTP message leaves the master or when it reaches the slave. This requires a special time-stamping circuit with a Real Time Clock (RTC) in the hardware. This hardware either has to understand the protocol (so as to insert the RTC value at the correct offset in the message), or must maintain a buffer or queue from which the protocol-parsing software can pick the time-stamp for each received or transmitted message.

Secondly, because of the above requirement, the hardware needs to distinguish PTP messages from normal packets, since only PTP packets need to be time-stamped. Packets classified this way are separated from data traffic and can be switched directly to the 1588 protocol-parsing software, avoiding a second, slower classification at the software level.

Contrary to Sync-E, software plays the major role in IEEE 1588 while hardware lends a helping hand. The major responsibilities of the software are:

  1. Setting up the system based on the capability of the slave and advertisement by the master.
  2. Selecting the master clock by processing the Announce messages.
  3. Forming and maintaining the sets of {t1, t2, t3, t4} time-stamps. These time-stamps, as we have seen before, come from different kinds of messages (t1, t2 from the Sync exchange; t3, t4 from the Delay-Request/Delay-Response exchange). The messages carry unique sequence numbers so that the correct association can be formed. The parser also has to handle the Follow-up time-stamps if the master is a two-step clock. In real-world scenarios, packet loss or re-ordering will leave sets incomplete, and the software has to handle this gracefully: after a programmed time-out, the remaining time-stamps of an incomplete set are discarded or re-arranged.
  4. Processing the collected time-stamps and running the clock-correction algorithm to correct the DCO (Digitally Controlled Oscillator).
  5. Switching the hardware between locked, holdover and free-running states.
  6. Providing a PTP system management and monitoring interface to the user through CLIs and APIs.
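Responsibility 3 above — filing time-stamps under their sequence number and timing out incomplete sets — can be sketched in a few lines. This is an illustrative sketch only; the class and method names are assumptions, not part of any real PTP stack:

```python
import time

class TimestampSets:
    """Files t1..t4 under the PTP sequence number; drops stale sets."""

    def __init__(self, timeout_s=1.0):
        self.timeout_s = timeout_s
        self.pending = {}  # seq_id -> partially filled set + birth time

    def record(self, seq_id, name, value):
        """name is one of 't1', 't2', 't3', 't4'; returns the full set
        once all four time-stamps for this sequence number are in."""
        entry = self.pending.setdefault(seq_id, {"born": time.monotonic()})
        entry[name] = value
        if all(k in entry for k in ("t1", "t2", "t3", "t4")):
            del self.pending[seq_id]
            return {k: entry[k] for k in ("t1", "t2", "t3", "t4")}
        return None

    def expire(self):
        """Discard sets left incomplete by packet loss or re-ordering."""
        now = time.monotonic()
        stale = [s for s, e in self.pending.items()
                 if now - e["born"] > self.timeout_s]
        for s in stale:
            del self.pending[s]
        return stale
```

A completed set would then be handed over to the clock-correction algorithm of responsibility 4.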

The 1588 protocol stack is the full suite of message parsers and generators for all the message types defined in the 1588 PTP standard. The clock-correction algorithms, on the other hand, only require the extracted time-stamps to calculate corrections for the digitally controlled oscillator. These two parts can therefore run as separate threads, each doing its own job with dedication and exchanging the necessary information through standard interfaces.
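As a sketch of this split, the parser thread can feed completed time-stamp sets to the servo thread through a queue that serves as the standard interface between them. The queue convention and the servo formula below are illustrative assumptions:

```python
import queue
import threading

sets_q = queue.Queue()          # standard interface between the two threads

def parser_thread(raw_sets):
    # In a real stack this loop would parse PTP messages; here we just
    # forward pre-built {t1, t2, t3, t4} sets to the servo.
    for ts in raw_sets:
        sets_q.put(ts)
    sets_q.put(None)            # sentinel: no more data

corrections = []

def servo_thread():
    while True:
        ts = sets_q.get()
        if ts is None:
            break
        # Offset of slave w.r.t. master (standard two-way arithmetic):
        offset = ((ts["t2"] - ts["t1"]) - (ts["t4"] - ts["t3"])) / 2
        corrections.append(-offset)   # DCO correction opposes the offset

demo = [{"t1": 0, "t2": 15, "t3": 20, "t4": 25}]
p = threading.Thread(target=parser_thread, args=(demo,))
s = threading.Thread(target=servo_thread)
p.start(); s.start(); p.join(); s.join()
```

Because the queue decouples the two threads, they can be pinned to different cores without either side blocking the other.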

(c) AVChrono 2021, All Rights Reserved

IEEE-1588 synchronization

Continuing the time-synchronization article series: conceptually, IEEE-1588 synchronization is as simple as asking somebody the time and having him/her reply after looking at his/her watch. You would then correct your watch accordingly. This is simply:

Time to be set = Time told + some delay
But this simplicity just ends here.

Let us look at some of the issues we would face in the above scheme:

  1. To be very precise, you might want to calculate the exact delay it took him/her to tell you the time.
  2. In addition compensate for the time it would take you to set your own watch accordingly.
  3. What if this person is 4 km away and the fastest way (s)he can tell you the time is by writing it on a piece of paper and sending it through a homing pigeon? (No other source available)
  4. For calculating above delays whose time would you consider? Your own watch is not working as expected and you of course cannot borrow that person’s watch and there is no other reference.
  5. You definitely have no way of knowing when this carrier started flying towards you or if he lost his way in between, once or more, and took some breaks too.
  6. You probably want to send your own pet homing pigeon to fly to him/her and back, to get an idea of the flying latency by dividing the round-trip time in half.
  7. And if the delay you finally approximate is significantly more than the preciseness with which you want to set the time in your watch? Is it even possible?
  8. Maybe you plan to average out these delays over several trips to get a better approximation.
  9. Would your calculations be generic enough to give you the correct result, if this homing pigeon were to be replaced with a homing-duck or on the contrary if this person can tell you the time directly by calling on your phone?
  10. Wouldn’t your averaging process get skewed if you got no intimation of this abrupt change in carrier? Is there a way to know looking at the time-stamp? What if it is just a gradual but continuous slip?
  11. How would you conclude that your time is finally set and accurate w.r.t. that person? There is no second/third reference.
  12. Would you set your watch only once or correct it at every instance of the pigeon’s flight?
  13. Would you benefit by employing more than one flying carrier?

These questions are not an exaggeration of the situation, because in computer networks the IEEE 1588 synchronization protocol aims to provide nanosecond accuracy even when the packets carrying the time-stamp data have millisecond latencies and sudden, random variations in delay — PDV (Packet Delay Variation) — caused by congestion, link failures and dynamic route changes, all of which remain invisible to the communicating nodes at the two ends.

Let us now see how IEEE 1588v2 (version 2) works and accomplishes synchronization. The next figure is what you would commonly see no matter where you start reading about this technology because it depicts the idea very neatly:

IEEE-1588 synchronization

(Source: RTA Automation)

There are various interesting things to note in this figure:

  1. There are two time-lines (Master on the left and Slave on the right-hand side) that depict the time in the respective clocks at any instant.
  2. The Slave is a system that wants to synchronize its time with the Master clock. The Master may be any system that is itself a very precise clock or is being synchronized from somewhere else.
  3. The Slave clock (as you can see on top) starts from a garbage value (30), which is unimportant and irrelevant, as it has to correct this time anyway.
  4. It is also unimportant “when” the slave wants to synchronize with master. Slave system can boot-up at any time and wish to synchronize no matter what time it is.
  5. In IEEE-1588 synchronization, a master system periodically sends “Sync” messages to the interested slaves. These are a type of PTP (Precision Time Protocol) message, sent over the IP protocol, containing the time-stamp (t1) at which the packet left the master (like the moment the homing pigeon starts flying).
  6. This Sync message reaches the slave system at time t2. Note that t1 and t2 are not directly comparable, as they come from independently running clocks, at least one of which (the slave) starts from a garbage value. The only thing common between them is that both measure the same physical quantity, i.e. TIME. Also, as we have discussed before, it is not wise to say the slave is behind or ahead of the master until we know for sure that the slave is not running faster or slower than the master; otherwise, no matter how precisely we correct the slave clock, it will drift off.
  7. Once the Sync message has arrived, the slave has two time-stamps, t1 and t2. The slave could set its own time equal to t1, but that would not account for the network delay (the time it took the pigeon to fly to you); and a mere subtraction of the two gives no meaningful information, because the clocks started at different, unrelated and unknown times.
  8. In order to estimate this network delay, the slave sends a “Delay-Request” message back to the master at time t3, and the master replies with a “Delay-Response” message carrying t4, the time at which the request arrived at the master. This gives the slave four time-stamps — the set {t1, t2, t3, t4} — from which to calculate the delay between master and slave.
  9. There is a concept of “Follow-up” message, but I will defer it for a later discussion.
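From the complete set {t1, t2, t3, t4}, and assuming the two directions of the path have equal delay (the protocol's central assumption), the slave can solve for both its clock offset and the mean path delay. A minimal sketch of this standard two-way arithmetic, with example values chosen to mimic the figure:

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Standard two-way time-transfer arithmetic.

    t2 - t1 = delay + offset     (master -> slave direction)
    t4 - t3 = delay - offset     (slave -> master direction)
    Assuming equal delay in both directions, solve the two equations.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    mean_delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, mean_delay

# Example: slave clock runs 10 units ahead of the master, true path delay 5.
off, dly = ptp_offset_and_delay(t1=100, t2=115, t3=120, t4=115)
# off == 10.0, dly == 5.0
```

If the path is asymmetric, the error in the computed offset is half the difference between the two one-way delays — which is exactly why PDV matters so much.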


Synchronous Ethernet (SyncE)

In simple terms, Synchronous Ethernet extends the use of a PLL (Phase locked loop) clock to transmit data. At a very crude level, this, and only this, is the whole conceptual working of Synchronous Ethernet.

Synchronous Ethernet Functionality
Synchronous Ethernet Functionality (Picture Credit: http://www.oscilloquartz.com and Andre's blog)

At the physical layer, two Ethernet peer nodes are already synchronized through a PLL at the RX (receiving) end. A PLL locks by means of a negative feedback loop — just the way we tune a guitar: listening to a tuning fork, plucking the string, comparing the sound and correcting the tension. In Ethernet, the receiving node monitors the incoming bits, compares their alignment and timing with its own, corrects its local oscillator and locks to the source. But the story usually ends there: the extracted clock is used only to receive the data correctly, by aligning the local clock to the incoming bits’ rise and fall times. When that same extracted clock is also used to send data out, it is called Synchronous Ethernet.
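The negative-feedback idea can be illustrated with a toy loop: at each step the receiver compares the incoming rate with its own and corrects a fraction of the difference. This is a gross simplification of a real Sync-E PLL; the numbers and gain are purely illustrative:

```python
def lock_to_reference(f_ref, f_local, gain=0.4, steps=50):
    """Toy negative-feedback lock loop (illustrative only): each step,
    compare the incoming rate with the local clock and correct a
    fraction of the difference, like re-tensioning a guitar string."""
    for _ in range(steps):
        error = f_ref - f_local      # compare: incoming bits vs local clock
        f_local += gain * error      # correct: nudge the local oscillator
    return f_local

# A receiver whose free-running clock is 100 Hz off a 10 MHz line rate:
locked = lock_to_reference(f_ref=10_000_000.0, f_local=9_999_900.0)
# residual error shrinks geometrically, here to well under a millihertz
```

The residual error decays by a factor of (1 − gain) per iteration, which is why the loop "locks" rather than oscillating, provided the gain stays below 1.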

This post continues the series of posts on Synchronization in Telecommunication Networks.

Sync-E Hardware Expectations

The Sync-E concept, although straightforward, puts a lot of requirements on the hardware for proper functioning. A few of these are:

  • Jitter/Wander tolerance, filtering and transfer
  • Reference Monitoring
  • Detection of disconnection, with switch-over or holdover
  • Holdover stability
  • Hitless reference switching
  • Continuous averaging of locked reference
  • Support for active and backup Timing Card and hitless switching
  • Support for 25, 125 and 156.25 MHz rates, translation, etc.

Adhering to the standards, Synchronous Ethernet can provide better than 4.6 ppm frequency accuracy across the network, versus conventional Ethernet, where free-running clocks are only specified to 100 ppm between peer nodes.
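To see what these figures mean in practice, a frequency offset in parts-per-million translates directly into worst-case time drift over elapsed time:

```python
def drift_per_day(ppm):
    """Worst-case accumulated time error per day for a clock whose
    frequency is off by `ppm` parts-per-million."""
    seconds_per_day = 24 * 3600
    return ppm * 1e-6 * seconds_per_day   # seconds of drift per day

synce_drift = drift_per_day(4.6)    # Sync-E worst case: ~0.4 s/day
free_drift  = drift_per_day(100)    # free-running Ethernet: ~8.6 s/day
```

Note that Sync-E, once locked, tracks its reference; the 4.6 ppm bound applies to the free-running/holdover case.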

Sync-E Software Requirements

Given these requirements, the hardware can do its job efficiently; the software’s function in Synchronous Ethernet is more of a helping hand that lets the hardware work effectively. The ESMC (Ethernet Synchronization Messaging Channel) protocol is designed to communicate the Quality Level (QL) of the clock to the participating nodes. Based on this information, the nodes can decide on the best source to lock to, thus forming a clock tree and hierarchy. The Sync-E standard (ITU-T G.8264) provides the rules to decipher the SSM (Synchronization Status Message) QLs.


The software standard defines the following:

  • Message encapsulation and priority
  • Quality level encoding (to inter-operate with SONET/SDH clock quality levels)
  • Best reference selection and Fail-over Switching
  • Handling bad PDUs (protocol data units) and preventing SSM floods
  • Distinguishing & handling events and information
  • Ten pps & five second rule: Ethernet’s slow-protocol PDUs have an upper limit of 10 pps, and Sync-E follows this. Also, if a system does not receive ESMC packets for 5 seconds, it should switch over to a different clock source.
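The five-second rule in particular is easy to express in code. Here is a sketch of the per-reference watchdog a node might keep; the class and method names are illustrative assumptions, not the G.8264 wire format:

```python
import time

class EsmcWatchdog:
    """Tracks the last ESMC arrival per clock reference; after 5 s of
    silence a reference is declared failed, triggering switch-over."""

    TIMEOUT_S = 5.0

    def __init__(self):
        self.last_seen = {}   # reference name -> monotonic arrival time

    def on_esmc(self, ref):
        """Called whenever an ESMC PDU arrives on `ref`."""
        self.last_seen[ref] = time.monotonic()

    def failed_refs(self, now=None):
        """References silent for longer than the 5-second limit."""
        now = time.monotonic() if now is None else now
        return [r for r, t in self.last_seen.items()
                if now - t > self.TIMEOUT_S]
```

A real implementation would feed `failed_refs` into the best-reference selection logic listed above.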


Synchronization in Networks

(Continuing from the previous post: "Clock Synchronization in Telecommunication Networks"...)

Like layers in a network, time synchronization consists of two basic layers:

  • Phase synchronization; and
  • Frequency Synchronization

(Read more about Synchronous / Asynchronous / Isochronous / Plesiochronous)

It is extremely easy to grasp this concept by considering the day-to-day questions:

  1. Why do you always make me wait?

It means the other person’s watch is behind (or yours is ahead of) the agreed-upon time. 4 o’clock in the other person’s watch might translate to 4:10 pm in yours. This is a phase error and requires phase synchronization.

  2. Is my watch/clock faster?

Faster does not only mean ahead. You may set a clock behind a standard clock, but eventually it will run ahead of it. This is a frequency error, and it points to the paradoxical question, “Is my one second shorter than your one second?” Paradoxical, because the label 1 second is by definition exactly 1 second; the rest manifests as error in the device.

Applying these concepts to electronics, networking equipment and computer systems, we come to the first question: “Why can’t oscillators run at 125 MHz when they are designed to run at 125 MHz?” This is a physical and practical limitation on the achievable accuracy (accuracy, as we defined before, is an expectation, explicit or implicit). It is easy to say 125 MHz, but what is practically being asked for is a 125.000000000 MHz oscillator — the more digits after the decimal place, the more precision is being demanded. Routers and computers cannot be equipped with cesium clocks, as a cesium clock is a whole laboratory setup in itself and not just a small crystal.

Figure: Cesium Clock
(Source: National Institute of Standards and Technology)

Later we will see how clock jitter and wander define the quality of a clock. Higher quality comes at a premium price, while equipment manufacturers are under constant pressure to reduce the BOM (Bill of Materials). Thus, a typical oscillator would specify that it can run at 10 MHz +/- 1 Hz (that is, 9.999999 to 10.000001 MHz).


An error of 1 Hz in 10 MHz means that the uncertainty of the crystal is 1 in 10,000,000.
That is, 1 part per 10 million = 0.1 ppm,
or 100 ppb (parts per billion).
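The arithmetic above is just a ratio and can be checked directly:

```python
def ratio_to_ppm_ppb(error_hz, nominal_hz):
    """Express a frequency error as a dimension-less proportion."""
    ratio = error_hz / nominal_hz
    return ratio * 1e6, ratio * 1e9   # (ppm, ppb)

# 1 Hz of error on a 10 MHz crystal:
ppm, ppb = ratio_to_ppm_ppb(error_hz=1, nominal_hz=10_000_000)
# ppm ~= 0.1, ppb ~= 100
```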

“Parts-per” notation is a very important dimension-less quantity: it denotes a pure proportion, and it is important because it lets you compare many different, unrelated objects and services. The above accuracy is analogous (in order of magnitude) to the following:

A packaging plant packages 10 million chocolates, and only 1 of them turns out to be bad.

One thing to emphasize is that this error (+/- 1 Hz) is not just two numbers, one positive and one negative, but infinite variations between those two extremes, depending on heat, crystal stability, aging, etc. Thus we have a range of XOs (crystal oscillators) sporting different technologies to choose from for specific needs: TCXO (Temperature Compensated), OCXO (Oven Controlled), TCOCXO (Temperature Compensated Oven Controlled) and OCVCXO (Oven Controlled Voltage Controlled) oscillators. [Oscillator details are not covered in this paper.]

Distribution of Time

A better-quality oscillator provides better stability and less wander — akin to saying that you would have to set the time less often than you would with a cheaper clock. However, if you have the opportunity to set the time more frequently, you might as well use the cheaper clock.

This, in simple words, is the whole idea of distribution of time, which brings us to the second question: “Does it matter whether you wear a Tissot/Rado or a Rolex, if you forget to set the time in either?” We can now equip a router, network node or computer system with a cheaper oscillator, because we can run a synchronization protocol that “measures” and “corrects” the error of the local RTC and free-running clocks at every minute, second or sub-second interval.

A very stable clock is required only when you want to set it once, refer to it many times later, and expect it to be correct every time. Functionally, a more economical watch can emulate this if it corrects itself every minute without your intervention, perhaps through a GPS receiver or the Network Time Protocol. What needs to be ensured then is that the clock is stable for just one minute, as we are going to correct it after that period anyway. With data traffic growing exponentially, the per-bit transfer cost is plummeting, giving us the means to carry more control traffic in-band. This changes the economics of deploying synchronization protocols: they not only offset the cost of precise oscillators but also make systems more flexible.

Before entering a mission, soldiers sync up their wrist watches; similarly, network nodes sync up with each other automatically at defined, periodic intervals, based on some derived or defined hierarchy. The person with the most accurate time — the proud owner of an expensive time-keeping device — dictates it to others, who propagate it further down the line to their peers. However, just as information turns into gossip as it passes through more participants, the quality of the time information degrades as it traverses different nodes.

Let us consider this time propagation as a broadcast service on radio. Consider three separate channels on your radio:

  • Channel-1: Frequency Broadcast
    Free-running ticks at regular intervals.
  • Channel 2: Phase Broadcast
    A periodic pulse or heart-beat after a defined set of ticks.
  • Channel 3: ToD (Time of Day) Broadcast
    RTC (Real Time Clock) value at regular intervals.

Let us now map various available synchronization solutions to these channels:

  1. Common clock source: Depending on hardware design, all 3 channels are possible.
  2. GPS clock distribution: Usually channel 3. Thus, any system that is receiving the GPS signals would receive the same time-of-day periodically and can align and correct its own clock. There is a risk of wandering off on signal interruptions, depending on local oscillator quality.
  3. Analog phase-locked loop: Usually channel 1 or 2. The receiver is unaware of the time-of-day on the source system, but due to the frequency adjustments the participating nodes are syntonized (i.e. they remain frequency-locked to each other). This is the level at which Sync-E (Synchronous Ethernet) operates.
  4. Adaptive clock recovery: Depending on the defined packet-rate and actual arrival rate, the receiver can adjust its own clock. This is analogous to channel-1.
  5. Time-stamp in packet: Each packet contains an RTC time-stamp, i.e. the time-of-day information. Since an RTC time-stamp contains both time and tick information, it is a super-set of all three channels. This is where 1588 operates.

And here is the ready-reference matrix for this:

                 Frequency   Phase   ToD
  Common Clock       ✓         ✓      ✓
  GPS                                 ✓
  Sync-E             ✓         ✓
  Adaptive           ✓
  IEEE 1588          ✓         ✓      ✓


Time Synchronization in Telecommunication Networks

Clock synchronization across network nodes is the problem at hand. Instead of relying on elements “keeping” their own sense of time, a “distribution-of-time” approach is discussed here. The two promising solutions, Synchronous Ethernet and IEEE 1588, are presented in their approach, requirements and design, as an interesting comparison as well as a “combination” to deal with the challenge. Mandatory Sync-E hardware expectations and advanced 1588 concepts are covered. Ideas regarding modularization, extensions and software algorithmic requirements are individually defined. In addition, a new way of looking at IEEE 1588 software partitioning — as a multi-threaded application or on a multi-core system — is presented.

In addition, the synchronization achieved is measurable both qualitatively and quantitatively using various tools and methods that are enumerated.

The changing scenario

It is a fact that service providers have huge investments in legacy TDM, SONET/SDH and ATM networks and equipment; what is also true is that they are cashing in on the disruptive Ethernet technology. The reason is that as traffic grows exponentially, along with customer demands (speed, bandwidth, choice, service and reliability) and newer applications, service providers are looking to newer and better technology to retain their profit margins (despite competition and lower end-user prices) by moving to IP/MPLS, Ethernet, TDM circuit emulation and mobile backhaul.

This move, from a time-synchronized to an asynchronous medium, although financially beneficial, is detrimental to applications and services that rely fundamentally on the accuracy of time. These applications were designed for a network with a small, precisely measurable transmission delay and a significantly lower delay variation, both of which are absent in Ethernet. A typical service level would be:

Frame delay             < 10 ms
Frame delay variation   < 2 ms
Frame error rate        < 0.0001%
Service disruption      < 50 ms
Network availability    > 99.99%
Mean time to repair     < 2 h

Table: Service levels for mobile backhaul services
(Source: ADVA optical networking)

Ironically, this changing scenario is the reason that we are discussing synchronization in an asynchronous network.

Timing is Fundamental

Time, and its perceived accuracy, depends upon what use we put it to. Flight delays are not measured in seconds; getting your car repaired may take an extra ten minutes; a delay of a day in construction may not be a matter of any escalation or complaint. We have an implicit margin and a different expectation for each of them. Reduction of these delays, through efficient processes and technology, is sometimes taken as a measure of progress, and it is worth noting that we have come a long way from pendulums to atomic clocks. The smallest time interval measured to date is about 20 attoseconds (10⁻¹⁸ s), while the theoretically derived lower limit of time measurement is about 10⁻⁴⁴ seconds, known as the Planck time (tₚ). As we keep overcoming various physical limitations, this gap will keep decreasing.


For computing machines, needless to say, one second is a long interval. Unlike us humans, they can talk to each other in nanoseconds and feel annoyed by microsecond delays. To put this into perspective: a nanosecond (10⁻⁹ s) compared with 1 second is, in terms of length, analogous to measuring the height of an average human being to the accuracy of the size of a virus or a DNA helix.

Other analogies (in order of magnitude) for comparing one nanosecond to one second are:

  • 1 paisa in 1 crore rupees
    (or 1 cent in 10 million dollars)
  • Speed of an electron versus a snail.
  • Nerve-cell potential versus a lightning strike.

Time is one of the most accurately measured quantities and, considering this, it is really a big achievement to say that a cesium clock has an uncertainty of 5.1 × 10⁻¹⁶ (an error of 1 second in about 60 million years). We refer to these clocks as “Stratum-0” clock sources.
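That headline figure can be sanity-checked: a fractional uncertainty of 5.1 × 10⁻¹⁶ accumulates one second of error only after 1/(5.1 × 10⁻¹⁶) seconds have elapsed:

```python
# Back-of-the-envelope check of the cesium-clock figure quoted above.
uncertainty = 5.1e-16                       # fractional frequency uncertainty
seconds_per_year = 365.25 * 24 * 3600
years_per_second_of_error = (1 / uncertainty) / seconds_per_year
# on the order of 6e7, i.e. tens of millions of years — matching the
# "1 second in 60 million years" figure quoted above
```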

