Understanding bubble vs stall vs repeated decode/fetch

I'm really confused about the difference between bubbles, stalls, and repeated decoding/fetching. My textbook is Patterson, 3rd edition.
Example 1:

add $3, $4, $6
sub $5, $3, $2
lw $7, 100($5)
add $8, $7, $2

Solution: Click here (Using an image because it is very hard to type out what is there)
In this example/solution, FIVE bubbles are inserted into a new row in between the 3rd and 4th instructions.

Example 2:

lw $4, 100($2)
sub $6, $4, $3
add $2, $3, $5

Solution: Click here
In this example, a bubble wraps the 2nd and 3rd instruction in clock cycle 4. In clock cycle 4, I2's decode is repeated and I3's fetch is repeated.
What is the difference between examples 1 and 2? Why is a row of bubbles inserted in example 1 whereas in example 2, a bubble is inserted and decode/fetch repeats? Are they functionally the same? If they are functionally the same, is this a valid solution for example 1?

I1: IF ID EX MEM WB
I2:    IF ID EX  MEM WB
I3:       IF ID  EX  MEM WB
I4:          IF  NOP ID  EX MEM WB 

Would this also be a valid solution for example 1?

I1: IF ID EX MEM WB
I2:    IF ID EX  MEM WB
I3:       IF ID  EX  MEM WB
I4:          NOP IF  ID  EX MEM WB 

Would this be a valid solution for example 2?

I1: IF ID EX  MEM WB
I2:    IF NOP ID  EX MEM WB
I3:       NOP IF  ID EX  MEM WB

Would this also be a valid solution for example 2?

I1: IF ID EX  MEM WB
I2:    IF ID  ID  EX MEM WB
I3:       IF  IF  ID EX  MEM WB


Solution 1:[1]

I know the question is old, but I hope this helps. Stalling and bubbling are used in different contexts. Stalling means holding a pipeline register at its current inputs, so the following stages see the same values again and neither the outputs nor the register inputs change. A bubble represents a NOP instruction, which sets the register inputs to 0 (if I am not wrong).

The following example shows the difference. In the case of a jump or mispredicted branch, the instructions that follow are ones you do not yet know will be executed. For reasons of delay and bandwidth cost, stalling holds the instruction in the decode stage, whereas inserting a bubble sets you back two stages, the fetch and the decode. If the branch was predicted correctly and the pipeline was stalled, execution resumes already in the decode stage; if it was bubbled, the instruction has to start again from fetch. I think the difference is also one of cost, in adding inputs to registers.
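To make the hold-versus-NOP distinction concrete, here is a minimal Python sketch of the load-use case from Example 2 in the question. It is only an illustration under stated assumptions: a classic five-stage pipeline with forwarding, so that a `lw` followed immediately by a dependent instruction is the only hazard that stalls. The `(op, dest, sources)` tuple format and the names `simulate` and `stages_of` are made up for this example and do not come from the textbook.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
BUBBLE = "bubble"  # a NOP travelling down the pipeline


def simulate(program):
    """Return, for each clock cycle, what occupies each of the five stages.

    program is a list of (op, dest, sources) tuples (a made-up format).
    Forwarding is assumed, so the only stall is the load-use hazard.
    """
    pipe = [None] * len(STAGES)  # None = empty slot, int = instruction index
    pc = 0
    cycles = []
    while True:
        in_id, in_ex = pipe[1], pipe[2]
        load_use = (
            isinstance(in_id, int)
            and isinstance(in_ex, int)
            and program[in_ex][0] == "lw"
            and program[in_ex][1] in program[in_id][2]
        )
        if load_use:
            # Stall: IF and ID keep their instructions (fetch/decode repeat).
            # Bubble: a NOP enters EX instead of the decoded instruction.
            pipe = [pipe[0], pipe[1], BUBBLE, pipe[2], pipe[3]]
        else:
            fetched = pc if pc < len(program) else None
            if fetched is not None:
                pc += 1
            pipe = [fetched] + pipe[:4]
        if not any(isinstance(slot, int) for slot in pipe):
            break  # every real instruction has left the pipeline
        cycles.append(pipe)
    return cycles


def stages_of(cycles, index):
    """List the stage instruction `index` occupies in each cycle it is active."""
    return [STAGES[pipe.index(index)] for pipe in cycles if index in pipe]


# Example 2 from the question.
example2 = [
    ("lw", "$4", ("$2",)),
    ("sub", "$6", ("$4", "$3")),
    ("add", "$2", ("$3", "$5")),
]
trace = simulate(example2)
for i in range(len(example2)):
    print(f"I{i + 1}:", " ".join(stages_of(trace, i)))
```

Under these assumptions the `sub` row comes out as IF ID ID EX MEM WB and the `add` row as IF IF ID EX MEM WB, while the EX slot in clock cycle 4 holds the bubble. In this model the repeated decode/fetch and the inserted bubble are two views of the same one-cycle delay: the stall is what happens to the IF and ID stages, and the bubble is what moves through EX, MEM, and WB in their place.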

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 DharmanBot