AXIAL TOMOGRAHIC RECONSTRUCTION

Computed axial tomographic reconstruction using back-projection is presented as a challenging real-world application illustrating the performance obtainable with Alacron's FastSeries architecture compared to standard cluster architectures. We will calculate the performance of the latest microprocessors from Analog Devices, Intel, Philips Semiconductors, and Texas Instruments.The important properties of the selected processors are presented in the table below. Cluster mode designs reach bus saturation very quickly to the point where adding additional processors will not improve the rate at which calculations can be performed. The table assumes that the processors are isolated from each other when they are performing the calculations. This assumption is valid because most multiprocessor co-processor boards have isolated or local memory designs due to scalability issues. The only non-local memory design is the "native" PIII-450 design, added for comparison purposes.

SPECIFICATION

Intel PIII-450

Philips TM1300

TI C6701

ADI 21160

Architecture

CISC

VLIW

VLIW

VLIW

FPU

Yes

Yes

Yes

Yes

MFLOPs (Peak)

250

720

1000

600

MIPS (Peak)

450

900

1336

100

MOPS (Peak)

900

4860

1336

800

Memory Bus Bandwidth (MB/s)
800
570
332
400
1K FP cfft (µsec)
300
106
108
90
1K 16 bit cfft (µsec)
191
63
108
90
1F FP dot product (µsec)
7.38
2.84
3.07
5.12
16 x 16 MACs (MMAC/s)
2.32
1.42
3.07
5.32
8 x 8 MACs (MMAC/s)
9.34
6.55
7.11
11.80
512x512x8 bit Conv 3x3 (msec)
5.34
2.62
7.11
11.80
512x512x512x 8 bit Erosion/Dilation (msec)
7.34
1.42
3.62
3.93
"Glue" Logic Cost ($/CPU)
$150
$3
$65
$39
CPU Price ($)
$300
$100
$150
$100

 

Description of the Application

The reconstruction region is a square array of 1024 x 1024 pixels centered on the axis of rotation of the x-ray head (Figure 1). Data is collected from the x-ray head as radial scans, with 4 channels of 4096 values for each of 360 angles. The data collected occupies 4*4096*360*2 = 11,796,480 bytes. The data is moved into memory at an average rate of 5 megabytes per second (Figure 2).

Click to Enlarge

Figure 1: Relation of the region of reconstruction to the x-ray source and detector array.

Click to Enlarge

Figure 2: Data collection.

The following steps reconstruct the pixel densities:

  1. Calculation of the density estimate from the output of the four detectors at the end of each beam path.
  2. Back-projection. Conversion of the fan shaped polar form of the data to the square array of pixels in the reconstruction region is accomplished by summing the density along each ray into each pixel the ray encounters. Linear interpolation is used when the ray passes between two pixels (Figure 3).

Click to Enlarge

Figure 3: Back Projection.

Four channels of data are used to estimate the actual density along the beam path to compensate for the normal beam hardening that occurs as x-rays pass through an object.

The value summed into each pixel is:

Click to Enlarge

where Dbeam is the density measured for the current beam being processed and alpha is determined from the geometry of the beam.

  1. Application of a 2D FFT to the image array.
  2. Application of a weighting factor of the distance of each pixel from the axis of rotation. This compensates for the greater number of rays passing through pixels closer to the axis of rotation.
  3. Application of an inverse 2D FFT to the whole array
    Steps 2 through 5 effectively apply a convolution to the data to correct for the weighting error generated by the beam geometry, giving a good approximation to the density distribution of the object positioned in the reconstruction area 1.

Algorithmic Complexity

The number of operations performed in each step can be counted as follows. These are summarized in Table 1.

Density estimation from detector data: A fifth order polynomial is applied to the output of each the four detectors at the end of each beam path, and the result of these four polynomials are summed to give the density estimate. This step requires 4*(5 Multiples + 5 Adds) plus three adds per beam path, or 81,920 MACs (multiply accumulates) and 12,288 adds. For 360 angles, the total number of MACs is 29,491,200, and the total number of adds is 4,423,680. Additionally, each detector has an array of 24 coefficients and 4 values that must be read from memory, followed by a write of the density estimate. This results in 41,287,680 reads from memory and 1,474,560 writes to memory to process all 360 angles.

Back-projection: For each angle, one third of the beam path pass into the top and out from the bottom of the reconstruction region, encountering 2 pixels for each of the 1024 lines it crosses. The weights for each beam path is pre-computed, and so each of the 2048 pixels on these 1366 beams requires one MAC, or a total of 2,797,568 MACs. The other two thirds of the beam paths access from 3 to 2048 pixels depending on how far up the left or right side of the reconstruction region the ray path lies. In total, these beam paths access 2,300,323 pixels, requiring 2,300,323 MACs. For all beams then, 5,097,891 MACs are needed. Each MAC requires a pixel value to be read from memory, a table value to be read from memory, and a final write of the new pixel value back to memory. For all of the 360 angles, the back-projection step required 1,835,240,760 MACs, 3,670,481,520 reads from memory, and 1,835,240,760 writes to memory.

2D FFT: A 1024 x 1024 2D real FFT is composed of 2048 real 1024 point FFTs, 1024 along the rows and 1024 along the columns. Two real signals can be processed concurrently with one complex 1024 point FFT plus 2,044 additional additions. Two real FFTs can be performed by this method in 460 microseconds on a single 40 MHz SHARC 2106x processor, averaging to 230 microseconds for a 1024 point real FFT. The whole 2D FFT thus requires 0.470 seconds plus, plus 0.013 seconds required to read and write the whole array twice from memory, or 0.483 seconds on a single processor.

Pixel weighting by radial position: A pre-computed table of radius values is read with the image data, multiplied and written back to memory. This step requires 2,097,152 reads from memory, 1,048,576 multiplies, and 1,048,576 writes to memory.

Inverse 2D FFT: The 2D inverse FFT requires the same number of operations as required by the 2D FFT in step 3.

Step

MACs

Adds

Reads

Writes

1

29,491,200

4,423,000

41,287,000

1,474,000

2

1,835,240,000

0

3,670,481,000

1,835,240,000

3

28,800,000*

 

2,097,000

2,097,000

4

1,048,000

 

2,097,000

1,048,000

5

28,800,000*

 

2,097,000

2,097,000

Total

1,923,380,000

4,423,000

3,718,060,000

1,841,958,000

Table 1: Operations required.
*Based on averaged 230 µs FFT time.

Parallelization of the Algorithm

In order to improve performance, the computation will be divided between processors by splitting the reconstruction region into horizontal stripes. The first processor will get the first N rows (N = 1024 / number of processors), and each processor will get succeeding rows. The algorithm is inherently linear allowing this natural division between processors.

Memory Requirements

This algorithm uses several pre-computed tables. Each detector has a table of 24 calibration coefficients for a total of 98,304 four-byte floating point numbers. The pixel radius value table (distance from the rotation axis) requires 131,072 floats, taking advantage of symmetries. Again taking advantage of symmetries the ray table requires 57,351,273 floats. The reconstructed image requires 1,048,576 floats, and the input data requires 11,796,480 bytes. The total memory requirement is therefore under 256 megabytes.

Total (Elapsed) is the time the routine takes to run on one processor with overlapping I/O and compute.

Processor

Total

Time (Mac)

Time (Add)

Time (read)

Time (write)

TM1300-180

38.88

2.67

0.006

26.00

12.88

TI-C6701

85.18

5.79

0.013

57.27

27.91

ADSP21160

55.60

9.62

0.022

37.18

18.42

PIII-450-MMX

27.78

8.55

0.009

18.58

9.20

Processor Cost vs. Reconstruction Time

The illustration below shows the cost/performance analysis for the four processors.

Click to Register to Download the application note in pdf format

To Top of Page

news sales support downloads systems pmc cards camera support software applications frame grabbers about us search site map privacy legal home search site map privacy legal home site map search news sales support downloads systems pmc cards camera support software applications frame grabbers about us