2018 Brazilian Technology Symposium

# Memory mitigation techniques in space environments

Alexsander Deucher CTI Renato Archer Núcleo de Concepção de sistemas de Hardware Campinas, Brasil btx@hotmail.com

Saulo Finco CTI Renato Archer Núcleo de Concepção de sistemas de Hardware Campinas, Brasil saulo.finco@cti.gov.br

*Abstract*—Ionizing radiation events that cause failures and affect the reliability of electronic devices and structures are widely discussed in the research and development of radiation-tolerant integrated circuits. Several protection techniques can be employed to mitigate such failures, such as hardened cells, robust digital design, redundancy, and analog layout techniques. In this context, this paper addresses a design technique that combines Triple Modular Redundancy (TMR) and Hamming coding with hardened cells from the Design Against Radiation Effects (DARE) 180 library to protect Dual-Port RAM memories. The Single Event Effect (SEE) validation tests were performed at the Institute of Physics of the University of São Paulo (IFUSP).

*Keywords*—Memory, Redundancy, Single Event Effect.

# I. INTRODUCTION

Technological advances and the demand for high-performance devices able to store large amounts of information point to strong competition in the market for flexible memories, since a memory is a semiconductor device prone to irreversible damage that affects the functionality of the whole circuit [1].

Memory devices are used in many areas and systems, such as industrial, military, aerospace, telecommunications and automotive applications, among others. Miniaturization and reduced power consumption are essential design requirements; on the other hand, they raise the issue of the sensitivity of these memories, mainly in environments exposed to radiation.

Many works have contributed to understanding the behavior of devices in this hostile environment, in order to increase their robustness to the effects of cosmic radiation. Although these studies are restricted to radiation in space environments, many medical and land-based military systems should also be on the list of concerns for this kind of exposure.

Figure 1 shows the radiation to which the Earth is subject in space, atmospheric and terrestrial terms [2].

Ângela Alves dos Santos CTI Renato Archer Núcleo de Concepção de sistemas de Hardware Campinas, Brasil angela.aads@gmail.com

Antônio da Costa Telles CTI Renato Archer Núcleo de Concepção de sistemas de Hardware Campinas, Brasil antonio.telles@cti.gov.br



Fig. 1: Interaction of radioactive events with planet Earth [3].

For a component to be qualified as radiation tolerant, it must be specified according to the types of particles that may reach it, their energy, the flux or fluence (corresponding to the dose rate or the cumulative dose, respectively), the type of material employed, the technology, the nature of the component and its behavior when interacting with radiation [4].

The higher the reliability of a device, the lower its susceptibility to failure. Among the techniques against radiation effects, the present work presents ways to increase memory robustness by combining TMR and Hamming coding for data recovery with hardened cells from the DARE180 library.

# II. SINGLE EVENT EFFECT

Radiation effects are classified into two categories: Single Event Effects (SEE) and Total Ionizing Dose (TID); TID is not addressed in this paper. A single particle can reach regions at the transistor level. SEE are further classified into Single Event Transient (SET), Single Event Upset (SEU) and Single Event Latchup (SEL). A SET is a voltage spike or transient current that can result in a bit flip in memory. A SEU is a non-destructive bit flip in a memory element. A SEL may cause permanent failure due to high current flow [5].

# A. Radiation Environment

In a radiation environment, changes can be observed in memory devices; these vary with exposure time, energy, the material of the component structure and the type of ion involved. The effects of these environments can be classified as permanent or transient, and some cause destructive changes.

The main sources of radiation are cosmic rays, solar flares and radiation fields. These produce ionizing radiation, that is, radiation with enough energy to change an atom or a molecule by leaving it charged. Cosmic rays produce ions that cause SEE; solar flares produce ions and protons that cause Displacement Damage; and radiation fields produce protons and electrons, which are related to the ionizing dose [6].

# B. Destructive Events

Many currently developed memories are susceptible to failure from SEL events in space environments. This event is destructive: parasitic elements in the circuit are triggered, the operating current is exceeded and a short circuit results. Latchup is basically the creation of a low-impedance path between power supply and ground in the CMOS structure, which means that these devices are more sensitive to short pulses because their response time is not fast enough. One way to prevent this event is to isolate the NMOS and PMOS channels with a trench oxide, or to use the guard rings available in the DARE180 library.

# C. Non-destructive Events

Non-destructive events are the SEU and SET effects induced by heavy ions. These effects are temporary, i.e., the memory returns to its normal state once the effect of the particle has decayed.

The memories most susceptible to SEU are SRAMs and DRAMs, while EEPROMs and EPROMs are more susceptible to SET events. Figure 2 shows a neutron striking a transistor in the bulk region.



\*Source: Actel Corporation

Fig. 2: High-energy neutron reaching the transistor.

The affected areas are the memory cells and the control logic: bit-flips occur in the sequential structures and transients in the combinational structures. A bit-flip occurs when a transistor state that stores the value "0" is reversed to "1", or vice versa, which can randomly change the program stored in memory. A transient arises when charged particles induce current pulses in combinational logic or in clock lines.

# III. MEMORY DEVICES

A memory is a storage device classified in terms of access, volatility and data transition. The access class covers the access mode, sequential or random, as well as the access time. Volatility indicates whether the memory is volatile, that is, whether the recorded data are lost when the power supply is interrupted. The data transition refers to the type of interface the memory provides: some memories are read-only, while others can be read and written. There is also a classification by storage type, static (data remain recorded) or dynamic (data must be rewritten periodically) [7]. Figure 3 shows the classification of the two main memory types that are the basis for all other existing memories.



\*Source: Created by the author

Fig. 3: Memory classification.

The memory device used in this article is a dual-port Random Access Memory (RAM) with 256 addresses. Its features include simultaneous write and read access, a dedicated port for reading data and a dedicated port for writing data. Each port has its own data bus and address bus. Figure 4 identifies the input and output signals of this memory.

| Port 1 signals | Block | Port 2 signals |
|----------------|-------|----------------|
| CLK1           |       | CLK2           |
| ADDR1          |       | ADDR2          |
| DATA1          | DPRAM | DATA2          |
| EN1            |       | EN2            |
| WEN1           |       | WEN2           |
| PSEN1          |       | PSEN2          |
| Q1             |       | Q2             |

\*Source: Created by the author

Fig. 4: DPRAM signals

CLK1 and CLK2 are the clock signals. ADDR1 and ADDR2 are the 8-bit addresses. DATA1 and DATA2 are the data signals, each a 48-bit bus. EN1 and EN2 are the enable signals, and WEN1 and WEN2 are the write-enable signals. PSEN1 and PSEN2 enable memory reading. Q1 and Q2 are the output signals.

This type of memory is used mainly for low-volume data storage and for designs in which two systems synchronously operate on and retrieve shared data [8].
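The signal description above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name `DualPortRAM`, the active-high polarity of EN, WEN and PSEN, and the write-before-read ordering on the same clock edge are hypothetical choices, since the paper does not specify them. Only the 8-bit address and 48-bit data widths come from the text, and each port is modeled with the full signal set listed for it.

```python
class DualPortRAM:
    """Behavioral sketch of a dual-port RAM (assumed semantics, not the DARE180 macro)."""

    DEPTH = 256   # 8-bit address bus -> 256 words
    WIDTH = 48    # 48-bit data bus

    def __init__(self):
        self.cells = [0] * self.DEPTH
        self.q = {1: 0, 2: 0}  # output registers Q1 and Q2

    def clock(self, port, addr, data=0, en=True, wen=False, psen=False):
        """Apply one clock edge on port 1 or 2 and return that port's Q output."""
        if not en:
            return self.q[port]          # port disabled: output holds its value
        if not 0 <= addr < self.DEPTH:
            raise ValueError("address out of range")
        if wen:                          # assumed active-high write enable
            self.cells[addr] = data & ((1 << self.WIDTH) - 1)
        if psen:                         # assumed active-high read enable
            self.q[port] = self.cells[addr]
        return self.q[port]


ram = DualPortRAM()
ram.clock(1, addr=0x10, data=0xABCDEF, wen=True)   # write through port 1
value = ram.clock(2, addr=0x10, psen=True)         # read back through port 2
```

Because the two ports share the same cell array, a value written through one port is visible to a read issued on the other.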

# IV. MITIGATION TECHNIQUES

Error correction and detection codes are widely used in the area of memory protection. Some of the best-known schemes are Hamming coding, TMR and Reed-Solomon coding [9].

# A. Hamming Coding

Introduced in 1950 by Richard W. Hamming, Hamming coding enables the correction of one error and the detection of up to two errors through the use of parity bits. It is widely used to correct data affected by soft errors, which are errors caused by bit flips in the circuit [12].
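To make the single-error-correcting, double-error-detecting behavior concrete, the following is a minimal software sketch of an extended Hamming code. It is not the encoder implemented in the circuit: the function names `hamming_encode` and `hamming_decode`, the bit ordering and the 16-bit example word are illustrative assumptions.

```python
def hamming_encode(data_bits):
    """Encode a list of 0/1 data bits into an extended Hamming codeword.

    Index 0 holds the overall parity bit, power-of-two indices hold the
    Hamming parity bits and all other indices hold the data bits.
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:          # smallest r with 2^r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * (n + 1)
    source = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two: data position
            code[pos] = next(source)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity                 # makes the parity of group p even
    code[0] = sum(code[1:]) % 2          # overall parity over the whole word
    return code


def hamming_decode(code):
    """Return (data_bits, status); status is 'ok', 'corrected',
    'double_error' or 'uncorrectable'."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos              # XOR of the positions of all set bits
    overall = sum(code) % 2              # 0 when the overall parity holds
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        code[syndrome] ^= 1              # single error (index 0 = overall parity bit)
        status = "corrected"
    elif overall == 1:
        status = "uncorrectable"         # syndrome points outside the codeword
    else:
        status = "double_error"          # parity holds but syndrome is non-zero
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, status


word = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
codeword = hamming_encode(word)
codeword[7] ^= 1                         # simulate a single bit flip
recovered, status = hamming_decode(codeword)
```

With more than two flipped bits the decoder may silently return wrong data, which is why a second layer of protection such as voting is useful.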

# B. TMR

TMR is another technique used in space applications. It stores the same datum three times (in redundant modules) and passes the outputs through a voting mechanism. In the architecture proposed by von Neumann, TMR uses majority voting: the voter module receives the outputs of the three memory modules and selects the majority output [13].
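Majority voting over three copies reduces to a simple Boolean expression applied to each bit. The sketch below shows it on integer words; the function name `majority_vote` and the example values are illustrative, not taken from the design.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by
    at least two of the three input words."""
    return (a & b) | (a & c) | (b & c)


stored = 0xBEEF
copy_a = stored
copy_b = stored ^ 0x0010   # one copy suffers a bit flip
copy_c = stored
voted = majority_vote(copy_a, copy_b, copy_c)
```

Because the vote is taken bit by bit, the output is still correct when different copies are corrupted in different bit positions; it fails only when the same bit is wrong in two copies.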

# C. Reed-Solomon

Published in 1960 by Irving Reed and Gus Solomon, this encoding adds redundant information to the signal so that the receiving module can detect and correct errors arising from transmission. One of its most relevant applications is in digital systems for space, due to its effectiveness [14].

# V. THE DESIGN AGAINST RADIATION EFFECTS

The DARE library provides radiation-hardened cells for the United Microelectronics Corporation (UMC) process. The cells use special design techniques such as dedicated transistor layouts, Enclosed Layout Transistors (ELT) and guard rings. The ELT reduces radiation-induced leakage current within the transistor, while the guard rings reduce leakage current between devices. The DARE180 technology provides 6 CMOS metal layers, flip-flops hardened against SEU, and Single- and Dual-Port Random Access Memories. The 180 nm (nanometer) figure refers to the nominal minimum feature size of the process, roughly the transistor channel length between source and drain; the nanometer is the unit used to express dimensions within an Integrated Circuit (IC) [10].

# VI. METHODOLOGY

The protection architecture for the memories combines the Hamming error-mitigation technique with TMR. The 32-bit data word is split into two 16-bit parts, identified as the Most Significant Bits (MSB) and the Least Significant Bits (LSB). Each part is Hamming coded, which improves reliability and enables the correction of up to two errors per word, one in each part. Using two TMR voters, one for the MSB and one for the LSB values, ensures that even if one of the voters fails permanently, the other will keep retrieving two data bytes. Figure 5 shows the general architecture of the coding scheme.



Fig. 5: Technical coding diagram

The data path during operation is as follows:

- The data to be encoded, consisting of 36 bits + 12 bits of padding, enters the Double Encoder, which is a Hamming encoder.
- The encoded data is transmitted to the memories and recorded.
- Between the time data is recorded in memory and the time it is read, the data may be corrupted.
- When a read occurs, the data from each memory passes through a decoder, which corrects a single-bit error and detects up to two erroneous bits, but behaves unpredictably with more than two wrong bits.
- The decoded data, together with the information on whether it contains an error, enters a majority voter. If the decoded values disagree, the voter selects the value most likely to be correct.
- If no reliable datum can be selected, the voter outputs "0" in all bits, together with a signal (Error\_MSB or Error\_LSB) that indicates whether the datum is valid.
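The last two steps of the data path can be sketched as a voter that receives, from each of the three decoders, a value and a validity flag. The selection policy below is an assumption made for illustration, since the paper does not detail how the voter ranks candidates: two agreeing valid values win, a single valid value is accepted on its own, and anything else yields an all-zero output with the error flag raised. The name `vote_with_flags` is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def vote_with_flags(decoded: List[Tuple[int, bool]]) -> Tuple[int, bool]:
    """decoded holds one (value, valid) pair per decoder.

    Returns (value, error). When error is True the value is forced to 0,
    mirroring the all-zero output accompanied by Error_MSB or Error_LSB.
    """
    valid_values = [value for value, valid in decoded if valid]
    if len(valid_values) == 1:
        return valid_values[0], False        # only one trustworthy copy left
    if valid_values:
        value, count = Counter(valid_values).most_common(1)[0]
        if count >= 2:
            return value, False              # at least two valid copies agree
    return 0, True                           # no reliable datum could be selected


result = vote_with_flags([(0x1234, True), (0x1234, True), (0x1235, False)])
```

One such voter would be instantiated for the MSB half and another for the LSB half, each driving its own error signal.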

During test operation it is possible to bypass the Double Encoder block, which contains the Hamming coding. This feature is used to write data with incorrect parity into memory, as a way to verify the correct operation of the Decoder.

It is also possible to disable a specific memory that has suffered damage.

It is also possible to bypass the Decoder block. When the bypass is enabled, the output is the 12 parity bits and the 36 data bits; this mechanism can be used to check the encoder output and to check the decoder.

The VOTER is responsible for the voting operation. Like the other blocks, it can be disabled, in which case one of the decoders is selected instead of voting. This feature can be used to check each memory or each decoder individually.

# VII. RESULTS

After the digital design flow [11], the GDS file was sent to IMEC for the layout and packaging of the IC.

Figure 6 shows the position of the memories in the digital layout.



Fig. 6: Digital memory layout.

The PELLETRON particle accelerator at IFUSP was used to perform the SEE radiation tests, in which the device receives an ion beam that simulates cosmic radiation. The effect of the irradiated particles is measured by data acquisition systems and software that count the errors found in the memories. Figure 7 shows the device packaged in a 120-lead Ceramic Quad FlatPack (CQFP). To control the reading and writing of memory data, a printed circuit board (PCB) was developed that integrates the DE10 Field Programmable Gate Array (FPGA) kit, which runs the software instructions and tests.



Fig. 7: Device with FPGA kit.

The experimental setup to be irradiated is assembled in the chamber, and all communication is done through cables connected to a DB50 connector. Figure 8 shows the board mounted inside the vacuum chamber for irradiation.



Fig. 8: Device inside the IFUSP chamber.

# A. Test Setup

All tests performed at USP must be scheduled, and the test plan must be sent to those responsible. The test setup was assembled according to Figure 9.





Outside the radiation chamber, a notebook provides remote access to the notebook in the control room, because no one is allowed inside the radiation room during the tests. The sequence of tests is as follows:

- Fill the memory module with a known data pattern.
- Turn on the ion beam and wait for a certain time.
- Shut down the ion beam.
- Read the memory and compare it to the pattern filled in at the beginning.
- Count the errors found.
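The sequence above can be sketched in software as a fill, expose, read-back and count loop. In this sketch the beam exposure is replaced by a hypothetical `irradiate` function that flips randomly chosen bits, and the data pattern and number of upsets are arbitrary. Only the memory depth and width follow the 8-bit address and 48-bit data buses described earlier; the actual test software running on the FPGA kit is not described in the paper.

```python
import random

WORDS = 256   # memory depth, from the 8-bit address bus
WIDTH = 48    # word width, from the 48-bit data bus


def fill(pattern: int) -> list:
    """Step 1: fill the memory with a known data pattern."""
    return [pattern] * WORDS


def irradiate(memory: list, n_upsets: int, rng: random.Random) -> None:
    """Stand-in for steps 2 and 3: flip n_upsets randomly chosen bits in place."""
    for _ in range(n_upsets):
        addr = rng.randrange(WORDS)
        bit = rng.randrange(WIDTH)
        memory[addr] ^= 1 << bit


def count_bit_errors(memory: list, pattern: int) -> int:
    """Steps 4 and 5: read back, compare to the pattern and count differing bits."""
    return sum(bin(word ^ pattern).count("1") for word in memory)


PATTERN = 0xAAAAAAAAAAAA          # alternating 1/0 pattern, an arbitrary choice
memory = fill(PATTERN)
irradiate(memory, n_upsets=5, rng=random.Random(0))
errors = count_bit_errors(memory, PATTERN)
```

Counting differing bits rather than differing words keeps multiple upsets in the same word visible, which matters when comparing raw errors with the errors left after Hamming correction and voting.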

Energies of 40 MeV were used with fluxes between $2K/s.cm^2$ and $9K/s.cm^2$, with an exposure time of 30 seconds. The ions used were carbon ($^{12}$C), oxygen ($^{16}$O), chlorine ($^{35}$Cl) and silver ($^{107}$Ag).
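As a rough worked example, the fluence $\Phi$ received in one exposure is the flux $\phi$ multiplied by the exposure time $t$. The figures below assume that the flux is constant during the exposure and that "K" denotes $10^3$ particles; neither assumption is stated explicitly in the text.

$$
\Phi = \phi \, t, \qquad
\Phi_{\min} = 2 \times 10^{3} \times 30 = 6 \times 10^{4}\ \text{particles/cm}^2, \qquad
\Phi_{\max} = 9 \times 10^{3} \times 30 = 2.7 \times 10^{5}\ \text{particles/cm}^2
$$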

# B. Failure mode analysis of a DPRAM

The tests were performed with exposure times of 15, 30, 35, 45, 60, 75 and 95 seconds and a high particle flux, in order to produce the largest number of bit flips and determine the error recovery capacity at each stage.

The graphs show the total number of errors in memory after irradiation without any correction, the errors remaining after Hamming correction and the total errors remaining after the vote.

The graph in Figure 10 presents the errors under oxygen irradiation, with a flux of 9K/s.cm<sup>2</sup>.



Fig. 10: Errors in oxygen irradiation.

The flux of the ion beam depends on the ion species used, so the total number of particles changes. Chlorine had a lower flux, $6K/s.cm^2$, but its mass kept the number of errors close to that of oxygen, as can be seen in Figure 11.



Fig. 11: Errors in chlorine irradiation.

The test with silver had the lowest particle flux, $2K/s.cm^2$, with results similar to those of the tests with oxygen and chlorine, as shown in Figure 12.



Fig. 12: Errors in silver irradiation.

The graphs demonstrate the effectiveness of combining Hamming with TMR, which increases the reliability of the stored information.

# VIII. CONCLUSION

It can be concluded that the memory protection technique, implemented with Hamming coding and TMR, contributed positively to validating the circuit as fault-tolerant. The use of two voting mechanisms ensures that even if one voting module is hit, the other can still carry out majority voting. The use of hardened cells made it possible to observe the effects of the ions in the irradiation test, which is close to the reality of radiation environments. With the proposed architecture, the sensitivity to SEE is shown to be very low.

Together, these techniques can validate the memory device as radiation- and fault-tolerant, providing good prospects for terrestrial, flight and satellite projects. It should be kept in mind that, for each specific project, the radiation analyses must be carefully designed for optimal performance.

# ACKNOWLEDGMENT

The authors would like to thank everyone at NCSH-CTI Renato Archer, IMEC and IFUSP involved in the memory design, library availability and testing, respectively.

# REFERENCES

- [1] I. Fetahovic, M. Pejovic, and M. Vujisic, "Radiation damage in electronic memory devices," International Journal of Photoenergy, 2013.
- [2] J. L. Barth, C. S. Dyer, and E. G. Stassinopoulos, "Space, atmospheric, and terrestrial radiation environments," IEEE Transactions on Nuclear Science, vol. 50, no. 3, pp. 466–482, 2003.
- [3] L. J. Lanzerotti, D. J. Thomson, and C. G. Maclennan, "Wireless at high altitudes environmental effects on space-based assets," Bell Labs Tech. J., vol. 2, pp. 5–19, 1997.
- [4] E. M. Yoshimura, "Física das Radiações: interação da radiação com a matéria," vol. III, no. I, pp. 57–67, 2009.
- [5] A. Holmes-Siedle and L. Adams, Handbook of Radiation Effects, 2nd ed. OUP Oxford, 2002.
- [6] E. G. Stassinopoulos and J. P. Raymond, "The space radiation environment for electronics," Proceedings of the IEEE, vol. 76, no. 11, pp. 1423–1442, Nov. 1988.
- [7] M. Zwolinski, Digital System Design with SystemVerilog. Prentice Hall, 2010.
- [8] D. R. Gonzales, "Interface multi-processor using devices with dual-port RAM," Microelectronics Journal, vol. 16, no. 3, pp. 5–12, 1985.
- [9] A. K. Mohammad and X. Chen, Digital Design: Basic Concepts and Principles, 1st ed. CRC Press, 2007.
- [10] S. Redant et al., "The Design Against Radiation Effects (DARE) library."
- [11] I. A. Grout, Integrated Circuit Test Engineering: Modern Techniques. Springer, 2006, pp. 50–52.
- [12] R. W. Hamming, "Error detecting and error correcting codes," Bell System Tech. J., vol. 29, pp. 147–160, 1950.
- [13] M. H. Rahman, S. Rafique, and M. S. Alam, "A fault tolerant voter circuit for triple modular redundant system," Journal of Electrical and Electronic Engineering, vol. 5, no. 5, pp. 156–166, Oct. 2017.
- [14] S. B. Wicker and V. K. Bhargava, "An introduction to Reed-Solomon codes," in Reed-Solomon Codes and Their Applications. New York: IEEE Press, 1994, pp. 1–15.