Performance Analysis in Real Time Systems

Real time systems, by definition, have time constraints. When developing a real time or embedded system, it is usually necessary to validate and measure the performance of the system.

What are some ways to measure performance?

Measuring the overall performance of some part of the system

A straightforward way to get a performance measurement, if using an operating system with a time or date function, such as Linux, is to put a call to time or date at the start and end of a test program that exercises the system in some known way. For example, in a networking system:

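#include <time.h>

#define NUM_PACKETS 100000

time_t start = time(NULL);             /* time() has one-second granularity */
for ( int i = 0; i < NUM_PACKETS; i++ )
    transmit_packet(buffer);           /* routine under test */
time_t end = time(NULL);

int packets_per_second = (int)( NUM_PACKETS / ( end - start ) );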

In this example, be sure to account for the fact that the granularity of time() is one second. The number of test loops has to be large enough that the run lasts many seconds; otherwise the result is meaningless, and a run shorter than one second makes the divisor zero. You may also want to remove the effect of the test loop itself, by making a measurement with transmit_packet() commented out and subtracting that time from the time measured with transmit_packet() present. Be sure the compiler does not optimize the loop away when transmit_packet() is commented out.
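The same measurement can be sketched with a finer-grained clock. The following is a minimal sketch, assuming a POSIX system where clock_gettime() and CLOCK_MONOTONIC are available. The transmit_packet() stub, the sink variable, and the helper names are hypothetical stand-ins for illustration. The loop is timed with and without the call so that the loop overhead can be subtracted.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime() on strict compilers */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NUM_PACKETS 100000

/* volatile so the compiler cannot optimize the baseline loop away */
static volatile uint32_t sink;

/* Hypothetical stand-in for the real transmit routine. */
static void transmit_packet(const char *buffer)
{
    sink += (uint32_t)buffer[0];
}

/* Nanoseconds between two clock readings. */
static int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}

/* Time NUM_PACKETS iterations; with_call = 0 measures the bare loop. */
static int64_t time_loop_ns(int with_call)
{
    static const char buffer[64] = "test packet";
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < NUM_PACKETS; i++) {
        sink = (uint32_t)i;            /* keeps the loop alive in both variants */
        if (with_call)
            transmit_packet(buffer);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ns(start, end);
}

/* Packets per second from a packet count and a net elapsed time. */
static double packets_per_second(long packets, int64_t net_ns)
{
    assert(net_ns > 0);
    return (double)packets * 1e9 / (double)net_ns;
}
```

A caller would compute the net time as time_loop_ns(1) - time_loop_ns(0) and, after checking that it is positive, pass it to packets_per_second(NUM_PACKETS, net). With a stub this small the net time can be lost in measurement noise, which is another reason to keep the loop count large.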

Also remember that in a multitasking system, other tasks may be running along with the one you are trying to measure. It may be necessary to disable task switching, or to set up a special software load, in order to get the measurement you want.

Refinement using a logic analyzer

If the performance found in a test similar to the above is not satisfactory, the next step is to try to determine where the bottleneck is. Where is the code spending most of the time? This information will enable you to decide what part of the code to try to refine and improve.

If you have a logic analyzer available, you can use it to analyze performance by adding a bit of code at key locations that generates an entry in the analyzer trace.

One way to do this is to set the logic analyzer up to capture and store accesses to a known memory location. Start the analyzer, run the test program, stop the analyzer, and view the trace. In the example shown below, the data at each trace entry will tell you where in the program the trace entry originated. The time elapsed between trace entries will tell you how much time was spent in the various sections of your code.

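/* logic analyzer trace */
#define TRACE_LOCATION  0xff001000
void trace(int trace_number)
{
    /* volatile so that the write is always performed and is visible on the bus */
    *(volatile unsigned long *)TRACE_LOCATION = trace_number;
}

/* transmit packet routine */
void transmit_packet(char *buffer)
{
	trace(1);

	char * tx_buf = get_next_tmt_buffer();
	trace(2);

	copy_packet_data_to_tmt_buffer(buffer, tx_buf);
	trace(3);
}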

This technique can be used repeatedly to drill down into the code to find bottlenecks and potential areas to improve. In the above example, if you find that the copy_packet_data_to_tmt_buffer() routine is consuming a large fraction of the total time, add trace entries within that routine to determine where the delays are occurring. Perhaps you will find that the copy routine is copying data one byte at a time, rather than using 32-bit words when it can. Or you may find that copying packet data is not a good idea in the first place, and redesign the code so that it does not have to do a copy.
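As an illustration of the byte versus word point, here is a minimal sketch of a copy routine that moves 32-bit words when both pointers are 4-byte aligned and falls back to bytes otherwise. The name copy_packet_words() is hypothetical, and the sketch assumes the buffers may legitimately be accessed as 32-bit words on the target. In practice the platform's memcpy() is usually already optimized along these lines, so measure before replacing it.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes, using 32-bit words when both pointers are 4-byte aligned. */
static void copy_packet_words(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        uint32_t *dw = (uint32_t *)d;
        const uint32_t *sw = (const uint32_t *)s;

        /* one load and one store per four bytes */
        for (; len >= 4; len -= 4)
            *dw++ = *sw++;

        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* remaining tail bytes, or the whole buffer if it was unaligned */
    while (len-- > 0)
        *d++ = *s++;
}
```

Placing a trace entry before and after each variant of the copy shows directly whether the word-wise version is worth having on a given target.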

Using an in-memory trace buffer

If a logic analyzer is not available, a similar technique is to create a circular trace buffer in memory and store timestamped trace entries as the code runs.

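/* in-memory timing trace */
#include <time.h>
#define TRACE_LOCATION  0xff001000   /* must be RAM reserved for the trace buffer */
#define NUM_TRACE_ENTRIES  128
static unsigned long *ptr_trace_entry = (unsigned long *)TRACE_LOCATION;
void trace(int trace_number)
{
    /* each entry is two words: the trace number, then the time */
    *ptr_trace_entry++ = trace_number;
    *ptr_trace_entry++ = (unsigned long)time(NULL);

    /* wrap to the start when the end of the buffer is reached */
    if (ptr_trace_entry >= (unsigned long *)TRACE_LOCATION + NUM_TRACE_ENTRIES * 2)
        ptr_trace_entry = (unsigned long *)TRACE_LOCATION;
}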

Use this in-memory trace routine in your code as shown above for the logic analyzer example. After the code runs, use a debugger or printf routine to dump out the memory at TRACE_LOCATION. The time at which each trace entry was reached is stored at the location following the trace entry number.

The above technique can be refined in many ways.

If your system has a hardware counter driven by the system clock signal, reading and storing it at the trace points will provide finer granularity than a function like time(). Most real time embedded systems have such a counter, which drives the ‘tick’ clock used for task scheduling. In VxWorks, the tick count is returned by tickGet(). Typically the tick runs at 60 ticks per second, a period of 16.67 ms. For resolution finer than tickGet() can provide, the underlying hardware counter can be read directly. In VxWorks this can be done using vxTimeBaseGet().
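Raw tick or counter values are easier to read once converted to time units. The helpers below are a small sketch with hypothetical names. The tick rate is passed in rather than assumed, and counter_to_u64() assumes the hardware counter is delivered as two 32-bit halves, which is an assumption about the target, not something every system provides.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick count to microseconds for a given tick rate. */
static uint64_t ticks_to_us(uint64_t ticks, uint32_t ticks_per_second)
{
    assert(ticks_per_second > 0);
    return ticks * 1000000ULL / ticks_per_second;
}

/* Combine the upper and lower 32-bit halves of a counter into 64 bits. */
static uint64_t counter_to_u64(uint32_t upper, uint32_t lower)
{
    return ((uint64_t)upper << 32) | lower;
}
```

At 60 ticks per second, ticks_to_us(1, 60) gives 16666, which matches the 16.67 ms tick period. The same conversion applies to a hardware counter once its frequency on the target is known.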

Another refinement to the memory trace routine would be to store the delta time between entries, since typically you want to measure the elapsed time between entries rather than the absolute time of the entry.
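When the timestamps come from a free-running counter, the delta can be computed with plain unsigned subtraction. This is a minimal sketch assuming a 32-bit counter. The function name is illustrative.

```c
#include <stdint.h>

/* Elapsed counts between two readings of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so the result is still correct
 * if the counter wrapped once between the two readings. */
static uint32_t trace_delta(uint32_t previous, uint32_t current)
{
    return current - previous;
}
```

The result is only valid if the counter wraps at most once between entries, so the counter rate and the spacing of the trace points need to be chosen with that in mind.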

Also, since the memory trace routine uses a circular buffer, you may want to write an extra flag word, such as -1, just after the most recent entry so that you can easily locate the last entry in the buffer when you dump it.

To make things easier, you can code a small dump routine that will parse the trace buffer and generate a more human readable output to a console.
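The flag word and the dump routine can be sketched together. The version below is a minimal sketch that keeps the buffer in an ordinary static array so it can run on a host. The names trace_at() and trace_dump() are hypothetical, the timestamp is passed in as a parameter instead of being read from a clock, and trace numbers are assumed to be nonzero and never equal to the flag value.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_TRACE_ENTRIES 8            /* small so the wrap is easy to see */
#define TRACE_END_MARKER  0xFFFFFFFFu  /* the -1 flag word */

struct trace_entry {
    uint32_t number;      /* which trace point was hit */
    uint32_t timestamp;   /* counter value when it was hit */
};

static struct trace_entry trace_buf[NUM_TRACE_ENTRIES];
static unsigned trace_next;            /* slot the next entry will use */

/* Store one entry, then flag the following slot as the end of the data. */
static void trace_at(uint32_t number, uint32_t timestamp)
{
    trace_buf[trace_next].number = number;
    trace_buf[trace_next].timestamp = timestamp;
    trace_next = (trace_next + 1) % NUM_TRACE_ENTRIES;
    trace_buf[trace_next].number = TRACE_END_MARKER;
}

/* Print entries oldest first with the delta from the previous entry.
 * Returns the number of entries printed. */
static int trace_dump(FILE *out)
{
    unsigned marker = 0;
    int printed = 0;
    uint32_t previous = 0;

    /* the slot holding the marker sits just after the newest entry */
    while (marker < NUM_TRACE_ENTRIES &&
           trace_buf[marker].number != TRACE_END_MARKER)
        marker++;
    if (marker == NUM_TRACE_ENTRIES)
        return 0;                      /* no marker: nothing traced yet */

    for (unsigned i = 1; i < NUM_TRACE_ENTRIES; i++) {
        const struct trace_entry *e =
            &trace_buf[(marker + i) % NUM_TRACE_ENTRIES];
        if (e->number == 0)
            continue;                  /* slot never written (buffer not full) */
        fprintf(out, "trace %lu  time %lu  delta %lu\n",
                (unsigned long)e->number, (unsigned long)e->timestamp,
                (unsigned long)(printed ? e->timestamp - previous : 0));
        previous = e->timestamp;
        printed++;
    }
    return printed;
}
```

Because the flag word occupies one slot, the buffer holds at most NUM_TRACE_ENTRIES - 1 entries at a time. On a target, trace_at() would read the timestamp from the tick or hardware counter instead of taking it as an argument.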

Development System Tools

The above basic methods do not require specialized tools from your software development system. However, many software development tools have support for extracting timing measurements from a running real time system. Any available tools should be examined for applicability to your situation.

Using Your Imagination

The above techniques are starting points for measuring timing in a real time system. It is often the case that a certain amount of creativity is required to determine how your system is operating. Hopefully this note will provide a jumping off point for your search.

Please contact us to see how we can help you with your project!
