WeiminWangKolmostar/fx3-state-machine

Kolmostar's GPIF State Machine Design

The FX3 is a powerful USB 3.0 peripheral controller that allows a host to communicate with a wide range of peripherals over a USB 3.0 interface.

The FX3 has a proprietary interface called GPIF II (the second-generation General Programmable Interface), which can be used to interface with external devices such as an FPGA, an ASIC, or any other device that exposes a parallel data bus.

For GPIF II, Infineon officially provides a synchronous slave FIFO solution to support high-throughput requirements.

However, the slave FIFO solution is not suitable for our application. We designed a new GPIF state machine to achieve high-throughput half-duplex data transfer between the USB host and our ASIC.

We think that our design is also a general solution that is better than the synchronous slave FIFO solution in some cases.

Overview

An overview of the data path is shown below:

The GPIF functions as a synchronous slave interface. The data bus is 32 bits wide and bidirectional. The kasic uses the OE signal to send control signals to the GPIF state machine.

To transfer data from the kasic to USB, we build a many-to-one auto DMA channel from 2 GPIF sockets to the USB socket. To transfer data from USB to the kasic, we build a one-to-many auto DMA channel from the USB socket to 2 GPIF sockets. Socket switching on the GPIF side is controlled by the GPIF state machine.

Note: The KASIC is an ASIC designed by us (Kolmostar). The OE signal does not mean Output Enable here; we keep the name for historical reasons.

Functional State Diagram

operation-states.drawio.svg

In our design, the GPIF state machine implements 3 operations:

  • kasic read operation: the kasic reads data from the FX3 GPIF (USB host)
  • kasic write operation: the kasic writes data to the FX3 GPIF (USB host)
  • flush operation: commits the half-filled DMA buffer to make the data available to USB immediately

The basic idea of the GPIF state machine design is as follows:

All 3 operations (kasic-reading, kasic-writing and flush) start from an IDLE state. From the IDLE state, the kasic uses a different OE sequence to choose the operation it wants to perform. After the operation is finished, the state machine transitions back to the IDLE state.

Comparison with the synchronous slave FIFO interface

In the slave FIFO interface, FX3 socket switching is controlled by the external processor, which needs to track socket availability by monitoring the FX3's flag outputs. In our solution, socket switching is controlled by the GPIF state machine, which keeps the external processor logic simple.

With the slave FIFO interface, before reading from or writing to the GPIF, the external processor needs to wait until the FX3 is ready to transfer data by monitoring the FX3's flag outputs. This wait may be a problem in some time-critical applications. In our solution, we ensure that the FX3 is always ready to transfer data when the external processor wants to read or write. This is achieved by adding some constraints to the data transfer application.

The slave FIFO interface uses dedicated signals (SLWR, SLRD and PKEND) to choose the operation to perform, while our solution uses a single signal instead, because our ASIC has no more pins available. However, our design can easily be modified to use more signals to choose the operation if the external processor has more pins available.

Execution model of GPIF state machine

Infineon does not provide a detailed description of the execution model of the GPIF state machine. After debugging the GPIF state machine*, we summarize its execution model as follows:

The state machine has internal registers to store the following information:

  • current state
  • internal event generated as a result of action execution (such as DMA_RDY_THx)
  • value of data bus and GPIO input signals

On every clock cycle, the state machine does the following things in parallel:

  1. performs a state transition based on the internal registers' values from the previous clock cycle
  2. latches the internal events generated by the last action execution into the internal registers
  3. executes the actions of the state it has transitioned to (the state may be unchanged)
  4. latches the values of the data bus and GPIO input signals into the internal registers
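The four parallel steps can be sketched in Python as follows. This is only an illustration of the model described above, not FX3 code: the `Registers` class, the `clock` function and the toy `RUN`/`DONE` machine with its `EVT` event are all hypothetical names. The sketch shows that an event raised by an action only influences a transition two clocks later, because it has to be latched first.

```python
from dataclasses import dataclass


@dataclass
class Registers:
    state: str
    latched_inputs: tuple = (0, None)        # (OE, data bus) captured on the previous clock
    latched_events: frozenset = frozenset()  # internal events visible to transitions
    pending_events: frozenset = frozenset()  # raised by the last action, not yet latched


def clock(regs, pins, transition, action):
    """One clock of the assumed model; every step reads the old register values."""
    next_state = transition(regs.state, regs.latched_inputs, regs.latched_events)  # step 1
    raised = frozenset(action(next_state, regs.latched_inputs))                    # step 3
    return Registers(
        state=next_state,
        latched_inputs=pins,                 # step 4
        latched_events=regs.pending_events,  # step 2
        pending_events=raised,
    )


# Toy machine (hypothetical): the action of RUN raises EVT on every clock,
# and RUN is left for DONE once EVT is visible in the latched events.
def transition(state, inputs, events):
    return "DONE" if state == "RUN" and "EVT" in events else state


def action(state, inputs):
    return {"EVT"} if state == "RUN" else set()


regs = Registers(state="RUN")
trace = []
for _ in range(3):
    regs = clock(regs, (1, None), transition, action)
    trace.append(regs.state)
print(trace)  # ['RUN', 'RUN', 'DONE']
```

EVT is raised by the action on clock 1, latched on clock 2, and only used by the transition on clock 3.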

We will use an example to explain the execution model of the GPIF state machine in the following sections.

Note that this execution model may not be accurate, but it is consistent with our observations of the state machine's behavior so far.

* How we debugged the GPIF state machine:

When designing the GPIF state machine, it is helpful to know how the state machine moves from state to state.

We connect the GPIO pins of an MCU to the FX3 GPIF pins and bit-bang the MCU's GPIO pins to manually drive the clock and OE signals and to read or write the data bus. In this way, we can step the GPIF state machine one clock at a time. With the FX3's CyU3PGpifGetSMState() API, we can read the current state of the GPIF state machine after each clock cycle.

Design of the GPIF state machine

kasic -> USB data path (kasic writing operation)

The state machine for kasic -> USB data path is shown below:

In this state machine, we bind the IN_DATA action of state WRITE2 to thread 2 and the IN_DATA action of state WRITE3 to thread 3. From the IDLE state, the kasic uses the OE -> !OE -> !OE signal sequence to enter the kasic-writing operation. During the operation, the state machine switches between thread 2 and thread 3 to write data to the USB host without data loss.

To achieve continuous data transfer, we set the watermark of the threads to 1 word, which means that when the available space in the active DMA buffer of a thread is less than or equal to 1 word, the DMA_WM_THx internal event is triggered. (See the Kasic Writing Example below to learn how the watermark works.)
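As a minimal sketch of the watermark condition described above (the function name and parameters are hypothetical, and buffer sizes are counted in words):

```python
def watermark_event(buffer_words: int, words_written: int, watermark: int = 1) -> bool:
    """True once the free space in the active DMA buffer is <= the watermark."""
    return buffer_words - words_written <= watermark


# For a 3-word buffer, the event is asserted after the second word is written.
print([watermark_event(3, n) for n in range(4)])  # [False, False, True, True]
```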

In the current design, the state machine is almost symmetric, and you can find a mirror state for most states. The mirror states exist because we need to "remember" the last-written thread and the status of the last-written DMA buffer. Let us explain this in detail.

After the kasic-writing operation is finished, there are two possible situations:

  • If the active DMA buffer of the last-written thread has been filled exactly, the kasic needs to start writing to the other thread in the next kasic-writing operation.
  • If the active DMA buffer has been only partially filled, the kasic needs to start writing to the same thread in the next kasic-writing operation.

So which thread is written first in a kasic-writing operation depends on the last-written thread and the status of the last-written DMA buffer. That is why we need 2 IDLE states (and the mirror states):

  • If the kasic starts writing from state IDLE_W2, it writes to thread 2 first.
  • If the kasic starts writing from state IDLE_W3, it writes to thread 3 first.

When exiting the kasic-writing operation (the kasic asserts the !OE signal), the state machine checks whether the current DMA buffer is completely filled and transitions to the proper IDLE state.
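The bookkeeping rule above can be sketched as a small function. This is an illustration only: the function name and its parameters (including `already_filled`, the number of words left in the active buffer by an earlier write) are hypothetical, and all DMA buffers are assumed to have the same size.

```python
def next_idle_after_write(start_thread: int, buffer_words: int,
                          words_written: int, already_filled: int = 0) -> str:
    """Return the IDLE state expected after a kasic-writing operation."""
    if start_thread not in (2, 3) or words_written < 1:
        raise ValueError("start_thread must be 2 or 3 and at least one word is written")
    total = already_filled + words_written
    # Number of buffers completely filled before the last word was written.
    switches = (total - 1) // buffer_words
    last_thread = start_thread if switches % 2 == 0 else 5 - start_thread
    exactly_full = total % buffer_words == 0
    # An exactly filled buffer means the next write starts on the other thread.
    next_thread = 5 - last_thread if exactly_full else last_thread
    return f"IDLE_W{next_thread}"


print(next_idle_after_write(2, buffer_words=3, words_written=4))   # IDLE_W3
print(next_idle_after_write(2, buffer_words=10, words_written=4))  # IDLE_W2
```

The two calls correspond to the 3-word and 10-word cases in the Kasic Writing Example.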

The timing of the kasic-writing operation is as follows:

[Figure: timing diagram of the kasic-writing operation]

Pseudo-code for the kasic writing operation
def kasic_write(wds: list[int]):
    # The operation starts from an IDLE state.

    kasic.oe(1)
    kasic.clk()  # IDLE -> IDLE; latch OE

    kasic.oe(0)
    kasic.clk()  # IDLE -> OE_CHK; latch !OE

    kasic.write_word(wds[0])
    kasic.clk()  # OE_CHK -> Wn; latch !OE and data bus

    kasic.oe(1)
    for w in wds[1:]:
        kasic.write_word(w)
        kasic.clk()  # Wn/WRITE -> WRITE; action: IN_DATA; latch OE and data bus
        # IN_DATA writes the internal data register's value
        # (latched on the previous clock) to the active DMA buffer.

    kasic.oe(0)
    kasic.clk()  # WRITE -> WRITE; action: IN_DATA (last word); latch !OE

    kasic.clk()  # WRITE -> CHK; latch !OE
    kasic.clk()  # CHK -> IDLE; latch !OE
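To dry-run this sequence without hardware, a recording stub can stand in for the pin driver. The sketch below is hypothetical: `RecordingKasic` is not part of the project, it assumes the data bus is released after every clock, and `kasic_write` is restated with the driver passed in as a parameter so the block is self-contained.

```python
class RecordingKasic:
    """Hypothetical stand-in for the kasic pin driver; records (OE, data) per clock."""

    def __init__(self):
        self._oe = 0
        self._bus = None
        self.trace = []

    def oe(self, level):
        self._oe = level

    def write_word(self, word):
        self._bus = word  # driven for the next clock only

    def clk(self, cycles=1):
        for _ in range(cycles):
            self.trace.append((self._oe, self._bus))
            self._bus = None  # assumption: the bus is released after each clock


def kasic_write(kasic, wds):
    kasic.oe(1)
    kasic.clk()                # IDLE -> IDLE
    kasic.oe(0)
    kasic.clk()                # IDLE -> OE_CHK
    kasic.write_word(wds[0])
    kasic.clk()                # OE_CHK -> Wn
    kasic.oe(1)
    for w in wds[1:]:
        kasic.write_word(w)
        kasic.clk()            # IN_DATA of the previously latched word
    kasic.oe(0)
    kasic.clk(3)               # last IN_DATA, then CHK, then IDLE


k = RecordingKasic()
kasic_write(k, [0x01, 0x02, 0x03, 0x04])
for n, (oe, data) in enumerate(k.trace, start=1):
    print(n, oe, "" if data is None else hex(data))
```

For a 4-word write the stub records 9 clocks, matching the CLK, OE and DATA columns of the tables in the Kasic Writing Example.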

Kasic Writing Example

Let's say we want to write 4 words to the USB host. The timing of the kasic-writing operation is as follows:

[Figure: timing diagram of a 4-word kasic-writing operation]

When the DMA buffer size is 3 words and thread 2 is the first thread to be written, the state machine works as follows:

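| CLK | OE | DATA | State Transition     | Actions             | Latched Data         |
|-----|----|------|----------------------|---------------------|----------------------|
| 1   | 1  |      | IDLE_W2 -> IDLE_W2   |                     | OE                   |
| 2   | 0  |      | IDLE_W2 -> OE_CHK_0  |                     | !OE                  |
| 3   | 0  | 0x01 | OE_CHK_0 -> W2       |                     | !OE, 0x01            |
| 4   | 1  | 0x02 | W2 -> WRITE2         | IN_DATA to thread 2 | OE, 0x02             |
| 5   | 1  | 0x03 | WRITE2 -> WRITE2     | IN_DATA to thread 2 | OE, 0x03             |
| 6   | 1  | 0x04 | WRITE2 -> WRITE2     | IN_DATA to thread 2 | OE, DMA_WM_TH2, 0x04 |
| 7   | 0  |      | WRITE2 -> WRITE3     | IN_DATA to thread 3 | !OE                  |
| 8   | 0  |      | WRITE3 -> CHK_W3     |                     | !OE                  |
| 9   | 0  |      | CHK_W3 -> IDLE_W3    |                     | !OE                  |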

Note that:

  • When the IN_DATA action is executed, the value written to the active DMA buffer is the data bus value held in the internal registers, which was latched on the previous clock cycle. So in the 4th clock cycle, the first word 0x01 is written to the active DMA buffer of thread 2, while the second word 0x02 is driven onto the data bus. In the 7th clock cycle, the last word 0x04 is written to the DMA buffer, while there is no data on the data bus.
  • The DMA_WM_TH2 signal is asserted in the 5th clock cycle and latched into the internal register in the 6th clock cycle.

When each DMA buffer is 10 words and thread 2 is the first thread to be written, the state machine works as follows:

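| CLK | OE | DATA | State Transition     | Actions             | Latched Data |
|-----|----|------|----------------------|---------------------|--------------|
| 1   | 1  |      | IDLE_W2 -> IDLE_W2   |                     | OE           |
| 2   | 0  |      | IDLE_W2 -> OE_CHK_0  |                     | !OE          |
| 3   | 0  | 0x01 | OE_CHK_0 -> W2       |                     | !OE, 0x01    |
| 4   | 1  | 0x02 | W2 -> WRITE2         | IN_DATA to thread 2 | OE, 0x02     |
| 5   | 1  | 0x03 | WRITE2 -> WRITE2     | IN_DATA to thread 2 | OE, 0x03     |
| 6   | 1  | 0x04 | WRITE2 -> WRITE2     | IN_DATA to thread 2 | OE, 0x04     |
| 7   | 0  |      | WRITE2 -> WRITE2     | IN_DATA to thread 2 | !OE          |
| 8   | 0  |      | WRITE2 -> CHK_W2     |                     | !OE          |
| 9   | 0  |      | CHK_W2 -> IDLE_W2    |                     | !OE          |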
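The data phase of both cases (clocks 4 to 7) can be replayed with a short simulation. This is a sketch under assumptions, not the real state machine: it starts in state W2 with the first word already latched, models only the first DMA buffer of each thread, and the transition rule (switch threads when the latched DMA_WM_THx event is set) is inferred from the tables above. The function name is hypothetical.

```python
def simulate_write_data_phase(words, buffer_words, watermark=1):
    """Replay the IN_DATA clocks of a write that starts on thread 2.

    Returns (states entered on each clock, words stored per thread).
    """
    if not 1 <= len(words) <= 2 * buffer_words:
        raise ValueError("only one buffer per thread is modelled")
    buffers = {2: [], 3: []}
    state = "W2"
    latched_data = words[0]         # latched on the clock that entered W2
    latched_events = set()          # events visible to the transition logic
    pending_events = set()          # raised by the last action, latched one clock later
    bus = list(words[1:]) + [None]  # bus value driven on each following clock
    states = []
    for bus_value in bus:
        # 1. transition, using only values latched on earlier clocks
        if state in ("W2", "WRITE2"):
            state = "WRITE3" if "DMA_WM_TH2" in latched_events else "WRITE2"
        else:
            state = "WRITE2" if "DMA_WM_TH3" in latched_events else "WRITE3"
        states.append(state)
        # 2. latch the events raised by the previous action
        latched_events = pending_events
        # 3. IN_DATA: store the previously latched word in the bound thread
        thread = 2 if state == "WRITE2" else 3
        buffers[thread].append(latched_data)
        free = buffer_words - len(buffers[thread])
        pending_events = {f"DMA_WM_TH{thread}"} if free <= watermark else set()
        # 4. latch the current bus value
        latched_data = bus_value
    return states, buffers


print(simulate_write_data_phase([1, 2, 3, 4], buffer_words=3))
print(simulate_write_data_phase([1, 2, 3, 4], buffer_words=10))
```

With 3-word buffers the fourth word lands in thread 3; with 10-word buffers all four words stay in thread 2, as in the two tables.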

Flush the writing buffer (flush operation)

As we already know, a DMA buffer usually becomes available to the consumer socket only after the producer has filled it. However, when sending data from the kasic to the FX3, the last buffer written by a kasic-writing operation may not be completely filled, and in some cases we still want to make the data available to the consumer socket (the USB endpoint) immediately. So we also need a flush operation to commit the half-filled DMA buffer.

In the GPIF state machine, the COMMIT action commits the active DMA buffer of a thread to make it available to the consumer immediately. We bind the COMMIT action of state COMMIT2 to thread 2 and the COMMIT action of state COMMIT3 to thread 3. From the IDLE state, the kasic uses the OE -> OE -> !OE signal sequence to enter the flush operation.

When the COMMIT action is executed on a thread, an empty buffer is always committed:

  • If the active DMA buffer of the thread is partially or completely filled, the current buffer and the next (empty) buffer of this thread are committed.
  • If the active DMA buffer of the thread is empty, only that buffer is committed.

In our design, we always commit both thread 2 and thread 3 in a flush operation. So, in the first kasic-writing operation after a flush, the state machine should start from the thread other than the last-written thread.

For example, let's say the last write operation was performed on thread 2. When thread 2 is committed, the last-written buffer and the next empty buffer of thread 2 are committed. When thread 3 is committed, only one buffer of thread 3 is committed. So the next kasic-writing operation should start from thread 3.

We use a state machine counter to track the last-written thread. The GPIF state machine provides 3 counters (CTRL, ADDR and DATA). You can think of them as three 32-bit registers. A counter can be loaded by the LD_XXX_COUNT action and incremented or decremented by the COUNT_XXX action. Each counter can be configured with a limit value; when the counter reaches the limit, the state machine asserts the XXX_CNT_HIT signal.

In our design, we use the CTRL counter to track the last-written thread. We set the limit value of the CTRL counter to 1. The LD_CTRL_COUNT action sets the counter to 0, and the COUNT_CTRL action increments it by 1.

With our design, a counter value of 1 means the last-written thread is thread 2, and a counter value of 0 means the last-written thread is thread 3. So after executing the COMMIT action on both threads, the state machine transitions to IDLE_W2 if the counter value is 0, and to IDLE_W3 if the counter value is 1.

This design ensures that, after a flush operation, the kasic starts writing from the thread other than the last-written thread.
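The flush bookkeeping described above can be sketched as follows. The function names are hypothetical, and the sketch assumes that the active buffer of the thread that was not written last is empty, as in the example above.

```python
def buffers_committed(active_buffer_has_data: bool) -> int:
    """COMMIT always commits an empty buffer, plus the current one if it holds data."""
    return 2 if active_buffer_has_data else 1


def flush_bookkeeping(last_write_thread: int, last_buffer_has_data: bool = True):
    """Return (buffers committed per thread, IDLE state after the flush)."""
    if last_write_thread not in (2, 3):
        raise ValueError("last_write_thread must be 2 or 3")
    # CTRL counter convention from the text: 1 -> thread 2 was last, 0 -> thread 3.
    ctrl_counter = 1 if last_write_thread == 2 else 0
    committed = {
        t: buffers_committed(t == last_write_thread and last_buffer_has_data)
        for t in (2, 3)
    }
    next_idle = "IDLE_W2" if ctrl_counter == 0 else "IDLE_W3"
    return committed, next_idle


print(flush_bookkeeping(2))  # ({2: 2, 3: 1}, 'IDLE_W3')
print(flush_bookkeeping(3))  # ({2: 1, 3: 2}, 'IDLE_W2')
```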

The timing of the flush operation is as follows:

def flush():
    kasic.oe(1)
    kasic.clk()  # set_flag/idle -> idle (and latch OE)
    kasic.clk()  # idle -> oe_chk (and latch OE)
    kasic.clk()  # oe_chk -> wait (and latch OE)

    kasic.clk(30)  # 30 extra clocks to wait until the DMA is ready

    kasic.oe(0)
    # Commit both threads, then wait until the IN_DATA DMA is ready again;
    # see the section on extra limitations for details.
    kasic.clk(35)

The reason for the extra clocks in this timing is explained in the section on extra limitations below.

USB -> kasic data path (kasic reading operation)

The kasic-reading operation is similar to the kasic-writing operation. Our original design was as follows:

Note that GPIF Designer does not allow ping-pong DR_DATA across 2 threads. We bypass this limitation by manually patching the gpifconfig.h file. See patch.md for more information.

We bind the DR_DATA action of state READ0 to thread 0 and the DR_DATA action of state READ1 to thread 1.

From the IDLE state, the kasic uses the OE -> !OE -> OE signal sequence to enter the kasic-reading operation.

During the kasic-reading operation, the state machine switches between thread 0 and thread 1 to read data from the USB host without data loss.

The kasic-reading operation is also stateful:

  • If the kasic has consumed all the data in the current DMA buffer, it needs to start reading from the other thread in the next kasic-reading operation.
  • If the kasic has not consumed all the data in the current DMA buffer, it needs to start reading from the same thread in the next kasic-reading operation.

So we need 2 IDLE states:

  • If the kasic starts reading from state IDLE_R0, it reads from thread 0 first.
  • If the kasic starts reading from state IDLE_R1, it reads from thread 1 first.

When exiting the kasic-reading operation, we also need to check whether the current DMA buffer is completely consumed and transition to the proper IDLE state.

The timing of the kasic-reading operation is as follows:

def kasic_read(packet_len: int) -> list[int]:
    # packet_len is the number of words to read (at least 2, see below).
    kasic.oe(1)
    kasic.clk()  # idle/chk_r -> idle (and latch OE)

    kasic.oe(0)
    kasic.clk()  # idle -> oe_chk (and latch !OE)

    kasic.oe(1)
    kasic.clk()  # oe_chk -> Rx (and latch OE)

    wds = []
    for _ in range(packet_len - 1):
        kasic.clk()  # DR_DATA drives the next word onto the data bus
        wds.append(kasic.read_word())

    kasic.oe(0)
    kasic.clk()  # last DR_DATA, latch !OE
    wds.append(kasic.read_word())

    kasic.clk()  # read -> chk
    return wds

Unexpected behavior of kasic-reading state machine

We found unexpected behavior in the GPIF state machine.

When performing a kasic-reading operation, we expect the state machine to transition between the READ0 and READ1 states, switching the active thread when the current DMA buffer is completely consumed.

However, we found that the state transition is performed one clock cycle earlier than expected. Even stranger, after the thread switch, the first DR_DATA action in the new state is executed on the original thread! So the issue does not affect the data transfer within a single kasic-reading operation. However, it may cause the state machine to transition to the wrong IDLE state when exiting the kasic-reading operation.

For example, suppose the kasic reads from the GPIF data bus starting from IDLE_R0 and exits after reading exactly one buffer from thread 0. In this case, the expected state transitions are:

IDLE_R0 -> OE_CHK_0 -> R0 -> READ0 -> CHK_R0 -> IDLE_R1

The observed state transitions are:

IDLE_R0 -> OE_CHK_0 -> R0 -> READ0 -> READ1 -> CHK_R1 -> IDLE_R0/IDLE_R1

During this process, the state machine never executes the DR_DATA action on thread 1, but it stays in state READ1 for one cycle.

In this example, when the state machine reaches CHK_R1, if the DMA_RDY_TH1 signal is asserted, the state machine works correctly, since the expected next state is IDLE_R1. But if the signal is not asserted, the state machine transitions to IDLE_R0 and tries to use thread 0 again in the next kasic-reading operation. This corrupts the data transfer.

To avoid this issue, we updated the state machine design to account for the unexpected behavior.

The new design is even simpler than the original design. For the new state machine to work properly, the kasic must read at least two words in each kasic-reading operation.

Why this state machine works:

Suppose the kasic exits the kasic-reading operation from the READ0 state. The following cases are possible, depending on the state that preceded READ0:

  • R0 -> READ0 (READ0 was entered from the R0 state): in this case, the state machine transitions to the IDLE_R1 state after the kasic-reading operation.

  • READ1 -> READ0: there are 2 different cases here,

    • No data has been read from thread 0 since the transition to READ0: in this case, the state machine transitions to the IDLE_R0 state after the kasic-reading operation.
    • Some data has been read from thread 0 since the transition to READ0: in this case, the state machine transitions to the IDLE_R0 state after the kasic-reading operation.

There is one exception: if there is only 1 word in the DMA buffer of thread 0 before the kasic-reading operation, and the kasic reads only 1 word in that operation, the state machine exits the kasic-reading operation from the READ0 state and transitions to the IDLE_R0 state. However, the kasic should start from thread 1 in the next kasic-reading operation. So we add the constraint that the kasic must read at least 2 words in each kasic-reading operation.

There is another unexpected behavior with ping-pong DR_DATA. If only 1 word is left in the DMA buffer of a thread (say thread 0) after a kasic-reading operation, and the kasic tries to read 2 or more words in the next kasic-reading operation, the ping-pong thread switching does not work: the state machine stays in the READ0 state and keeps reading from thread 0. This eventually causes a DMA underrun error.

To avoid this issue, the kasic must never leave exactly 1 word in the DMA buffer of a thread after a kasic-reading operation. One way to achieve this is to introduce conventions on both the host driver side and the kasic side. For example, the host driver always submits DMA buffers whose size is a multiple of 2 words, and the kasic always reads a multiple of 2 words in each kasic-reading operation.
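A minimal sketch of how a test bench could check this rule: given the sizes of the buffers submitted by the host and the lengths of successive kasic reads (both in words), it reports whether any read ends with exactly one unread word in the current buffer. The function name is hypothetical.

```python
def leaves_single_word(buffer_sizes, read_lengths):
    """True if some read ends with exactly 1 unread word in the current buffer."""
    remaining = list(buffer_sizes)  # unread words per submitted buffer, in order
    idx = 0                         # index of the buffer currently being read
    for n in read_lengths:
        while n > 0:
            if idx == len(remaining):
                raise ValueError("read past the submitted data")
            take = min(n, remaining[idx])
            remaining[idx] -= take
            n -= take
            if remaining[idx] == 0:
                idx += 1            # buffer consumed, move on to the next one
        if idx < len(remaining) and remaining[idx] == 1:
            return True
    return False


print(leaves_single_word([8, 8], [7]))        # True: one word is left in buffer 0
print(leaves_single_word([8, 8], [4, 4, 8]))  # False: even sizes and even reads
```

With even buffer sizes and even read lengths the unread count in the current buffer is always even, so it can never be exactly 1.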

Merge the two data paths

We have already designed the state machines for the kasic-reading operation and the kasic-writing-with-flush operation. Now we need to merge the two state machines. In the final state machine, we expect the following behavior:

  • The kasic can use different OE signal sequences to choose the operation it wants to perform.
  • The kasic exits the current operation by asserting the !OE signal. After the operation is finished, the state machine transitions back to the IDLE state.
  • The kasic can start a new operation without resetting the GPIF state machine.
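The OE sequences given in the earlier sections can be collected in a small lookup table. The sketch below is for illustration only; the table and function names are hypothetical, and 1 stands for OE while 0 stands for !OE.

```python
# OE sequences as stated in the writing, flush and reading sections.
OPERATIONS = {
    (1, 0, 0): "kasic_write",  # OE -> !OE -> !OE
    (1, 1, 0): "flush",        # OE -> OE -> !OE
    (1, 0, 1): "kasic_read",   # OE -> !OE -> OE
}


def decode_operation(oe_sequence):
    """Return the operation selected by a 3-value OE sequence, or None."""
    return OPERATIONS.get(tuple(oe_sequence))


print(decode_operation([1, 0, 1]))  # kasic_read
print(decode_operation([0, 0, 0]))  # None
```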

To see what the final state machine should look like, let's first imagine a scenario in which the kasic performs a kasic-writing operation and then a kasic-reading operation. As we already know, the kasic-writing and kasic-reading operations are stateful. So after those two operations, the state machine might be in one of 4 different states:

  • IDLE_R0W2: in this state, a kasic-writing operation starts from thread 2, and a kasic-reading operation starts from thread 0.
  • IDLE_R0W3, IDLE_R1W2, IDLE_R1W3: similar to IDLE_R0W2.

So the final state machine should have 4 IDLE states. The final state machine is shown below:

It can be constructed by merging the two state machines we designed before. Specifically, we need to:

  • Copy 2 kasic-reading state machines to the final state machine, let's call them A and B.
  • Copy 2 kasic-writing-with-flush state machines to the final state machine, let's call them C and D.
  • Merge the state IDLE_R0 of A and state IDLE_W2 of C, and rename the state to IDLE_R0W2.
  • Merge the state IDLE_R1 of A and state IDLE_W2 of D, and rename the state to IDLE_R1W2.
  • Merge the state IDLE_R1 of B and state IDLE_W3 of D, and rename the state to IDLE_R1W3.
  • Merge the state IDLE_R0 of B and state IDLE_W3 of C, and rename the state to IDLE_R0W3.
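The merge list above can be checked mechanically. The sketch below (names are hypothetical) encodes the four merges and confirms that they produce every combination of read thread and write thread, and that each IDLE state of each copy is used exactly once.

```python
from itertools import product

# (copy, IDLE state) pairs merged together, as listed above.
MERGES = [
    (("A", "IDLE_R0"), ("C", "IDLE_W2")),
    (("A", "IDLE_R1"), ("D", "IDLE_W2")),
    (("B", "IDLE_R1"), ("D", "IDLE_W3")),
    (("B", "IDLE_R0"), ("C", "IDLE_W3")),
]


def merged_name(read_idle: str, write_idle: str) -> str:
    """IDLE_R0 + IDLE_W2 -> IDLE_R0W2"""
    return read_idle + write_idle.removeprefix("IDLE_")


names = sorted(merged_name(r[1], w[1]) for r, w in MERGES)
expected = sorted(f"IDLE_R{r}W{w}" for r, w in product((0, 1), (2, 3)))
endpoints = [end for pair in MERGES for end in pair]

print(names)
print(names == expected, len(set(endpoints)) == len(endpoints))
```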

Extra limitations of the state machine

DMA descriptor switching latency

A socket takes a finite number of clocks to switch from one DMA descriptor to another after it fills or empties a DMA buffer. The socket cannot transfer any data while this switch is in progress.

When the socket belongs to the USB block, the DMA descriptor switch does not cause data loss, since the USB protocol has a flow control mechanism. When the socket belongs to the GPIF block, we work around the issue by using multiple GPIF threads to switch the active GPIF socket in a ping-pong manner. However, after a socket is switched out, if it is switched back in before the DMA descriptor has finished loading, the socket still cannot transfer any data. In our experiments, the DMA descriptor switch takes 33 state machine clock cycles. This limitation adds the following constraints to our data transfer application:

Constraint 1: After initializing the state machine, the kasic needs to wait for at least 33 clock cycles before starting the first kasic-writing or kasic-reading operation.

We add an initialization operation for the kasic to initialize the state machine. The timing of the initialization operation is as follows:

def init():
    kasic.oe(0)
    kasic.clk(80)  # hold !OE for 80 clocks

Constraint 2: The USB producer socket should not submit a DMA buffer containing fewer than 33 words.

For example, suppose the following DMA buffers are to be read in a kasic-reading operation:

buffer0: 4k words
buffer1: 10 words
buffer2: 4k words

After completely reading buffer0 from one thread, the GPIF state machine switches to the other thread to read buffer1. However, after only 10 clock cycles, the GPIF state machine switches back to the first thread to read buffer2. At that point, the thread is not yet available to transfer any data.

This constraint is handled by the host driver.
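The example can be expressed as a small check. This sketch assumes one word is transferred per clock, that buffers alternate strictly between the two threads, and that "4k words" means 4096 words; the function name is hypothetical. Note that it only models the switch-back case from the example, while the constraint itself is stated for every buffer.

```python
SWITCH_CLOCKS = 33  # measured DMA descriptor switch time, in state machine clocks


def stalled_buffers(sizes_in_words, switch_clocks=SWITCH_CLOCKS):
    """Indices of buffers whose thread is switched back in too early.

    Buffer i is read from the same thread as buffer i-2, so the time available
    for that thread's descriptor switch is the length of buffer i-1.
    """
    return [i for i in range(2, len(sizes_in_words))
            if sizes_in_words[i - 1] < switch_clocks]


print(stalled_buffers([4096, 10, 4096]))  # [2]
print(stalled_buffers([4096, 64, 4096]))  # []
```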

Constraint 3: If a GPIF producer socket has just completely filled a buffer, we need to wait for 33 state machine clocks before performing the COMMIT action on its thread, to make sure the DMA descriptor switch has finished. So we add some extra clocks to the flush operation before the COMMIT action.

Constraint 4: After a flush operation, the kasic should wait for 33 state machine clock cycles before starting a new kasic-writing operation.

This is because the COMMIT action in the flush operation always makes the GPIF producer socket commit one or two buffers of its thread. If the kasic starts a new kasic-writing operation before the DMA descriptor switch has finished, the GPIF producer socket cannot transfer any data.

This constraint can be satisfied simply by adding 33 empty clocks (OE=0) to the flush operation after the COMMIT action.

DMA buffer lock releasing latency

In practice, we found that it takes some state machine clocks for a DMA buffer to become available to the GPIF consumer socket after the USB producer socket has submitted it. It also takes some state machine clocks for a DMA buffer to become available to the GPIF producer socket after the USB consumer socket has consumed it. The latency is about 30 clock cycles.

We guess that the latency is caused by the DMA buffer lock releasing mechanism. It might take some time for the GPIF sockets to be notified after the USB socket releases a DMA buffer lock.

The latency causes a problem in the following scenario:

Before the USB host sends data to the kasic, all DMA buffers of the USB-to-GPIF channel are empty. After the USB host sends the data, the kasic starts a kasic-reading operation. In this case, the kasic reads garbage data, since the new data is not yet available to the GPIF consumer socket.

To avoid this issue, in such a scenario the kasic should send 30 dummy clock cycles before starting the new kasic-reading operation. The initialization operation can be used to provide the dummy clock cycles.

Summary

We designed the GPIF state machine to achieve high-speed half-duplex data transfer between the FX3 and the kasic. The kasic can interact with the GPIF state machine through the following operations:

  • init(): Initialize the state machine.
  • kasic_read(n): Receive the specified number of words (at least two) from USB.
  • kasic_write(words): Send any number of words (at least one) to USB.
  • flush(): Flush the writing buffer to make the data available to USB immediately.

init() must be the first operation of the data transfer. After that, kasic_read(), kasic_write() and flush() can be performed in any order.

For the state machine to work properly, 3 rules must be followed:

  • The USB producer socket should not submit a DMA buffer containing fewer than 33 words.
    This needs to be handled on the host driver side.
  • The kasic must never leave exactly 1 word in the DMA buffer of a thread after a kasic-reading operation.
    A possible solution is to make the host driver always submit DMA buffers whose size is a multiple of 2 words, and to make the kasic always read a multiple of 2 words in each kasic-reading operation.
  • If all DMA buffers of the USB-to-GPIF channel are empty before the host sends data to the kasic, then after the USB host sends the data, the kasic should perform an init() operation before starting a new kasic-reading operation.
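The ordering and size rules can be enforced on the kasic side with a thin bookkeeping wrapper. The sketch below is hypothetical: `TransferSession` and `host_sent_data` are illustrative names, the class only records and validates calls, and it does not drive any pins.

```python
class TransferSession:
    """Hypothetical call-order checker for the four kasic operations."""

    def __init__(self):
        self.initialized = False
        self.needs_init_before_read = False
        self.log = []

    def init(self):
        self.initialized = True
        self.needs_init_before_read = False
        self.log.append("init")

    def host_sent_data(self, channel_was_empty: bool):
        # Rule 3: data that arrives in an empty channel needs dummy clocks first.
        if channel_was_empty:
            self.needs_init_before_read = True

    def _require_init(self, name):
        if not self.initialized:
            raise RuntimeError(f"{name}() called before init()")

    def kasic_read(self, n: int):
        self._require_init("kasic_read")
        if self.needs_init_before_read:
            raise RuntimeError("init() is required before reading freshly sent data")
        if n < 2:
            raise ValueError("kasic_read needs at least 2 words")
        self.log.append(f"kasic_read({n})")

    def kasic_write(self, words):
        self._require_init("kasic_write")
        if len(words) < 1:
            raise ValueError("kasic_write needs at least 1 word")
        self.log.append(f"kasic_write({len(words)} words)")

    def flush(self):
        self._require_init("flush")
        self.log.append("flush")


session = TransferSession()
session.init()
session.kasic_write([1, 2, 3])
session.flush()
session.host_sent_data(channel_was_empty=True)
session.init()  # provides the dummy clocks before the next read
session.kasic_read(4)
print(session.log)
```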
