Practice Set 6 - SOLUTIONS

1. Consider the following statements.
S1: The RISC processor has CPI always 1.
S2: In horizontal instruction control signals are always in encoded form.
S3: In vertical instruction control signals are always in encoded form.
S4: In terms of speed vertical instruction is slower than horizontal instruction.

Which of the above statements are true?
(a) Only S1 and S2 (b) Only S2 and S3
(c) Only S1, S3 and S4 (d) None of the above

Solution: Option (c)


S1 is true property of RISC architecture.
S2 is false, horizontal instruction control signals are always in decoded form.
S3 is true, vertical instruction control signals are always in encoded form.
S4 speed of vertical instruction is less than horizontal instruction.
[Encoded control signals need to be decoded first and then they are interpreted]

2. Computer uses addressing mode technique for __________.
(a) giving program versatility to user by providing facilities as a pointer to memory counters for
loop control
(b) reducing number of bits in the field of instruction
(c) specifying rules for modifying or interpreting address field of the instruction
(d) All of the above

Solution: Option (d)

Computer uses addressing mode technique for giving program versatility to user by providing facilities as a pointer to memory counters for loop control and to reduce number of bits in the field of instruction. Addressing modes are used in specifying rules for modifying or interpreting address field of the instruction.
So, all options are correct.

3. Which of the following is added to the page table in order to track whether a page of cache has been modified since it was read from the memory?
(a) Reference bit (b) Dirty bit
(c) Tag bit (d) Valid bit

Solution: Option (b)

Dirty bit is used to represent the status of cache whether it has been defined after copying from main memory to cache. Dirty bit = 0 shows no modification and dirty bit = 1 shows modification.

4. Match List-I with List-II and select the correct answer using the codes given below the lists:
List-I List-II
A. Pointer 1. Indirect addressing mode
B. Position independent code 2. Immediate addressing mode
C. Constant operand 3. Relative addressing mode
(a) 1 2 3
(b) 3 2 1
(c) 1 3 2
(d) 2 3 1

Solution: Option (c)

Explanation: (c)

For making use of pointer in programs, indirect addressing mode is used. Pointer stores the address of a variable and indirect addressing mode stores address of effective address in instruction.
Position independent code makes use of relocation concept which is implemented by the use of relative addressing mode which uses relocation register to set the difference of logical and physical address.

Immediate addressing mode provides the value directly in the instruction which is suitable to be used for constant operands of the program.

5. A 50 kbps device is connected to the processor. The interrupt overhead is 50 μsec. The minimum performance achieved when interrupt initiated and data transferred is used instead of programmed I/O is __________.
(a) 2.4 (b) 0.4
(c) 3 (d) 3.5

Solution: Option (b)

Explanation: (b)

1 sec → 50 kbyte
1 byte → 150𝑘=20×10−6𝑠𝑒𝑐=20 𝜇𝑠𝑒𝑐
For interrupt driven mode it takes 50 μsec
So performance achieved when interrupt driven used over programmed I/O

6. Consider the following sequence of micro-operations.

PC ‹ Vector address.
Which of the following operation performed by this sequence?
(a) Instruction fetch (b) Operand fetch
(c) Interrupt subprogram initialization (d) Conditional branch

Solution: Option (c)

PC holds the value of next instruction to be executed. We store the value of PC to MBR and value of stack pointer to MAR. Then store the value of PC which is available in MBR to location addressed by MAR.
Atleast vector address return to the PC. This can be done in interrupt subprogram initialization.

7. We have two designs P1 and P2 for a synchronous pipeline processor. P1 has 8 pipeline stages with execution time of 3 nsec, 2 nsec, 4 nsec, 8 nsec, 2 nsec, 5 nsec, 4 nsec and 1 nsec while design P2 has 5 stages each with 5 nsec each with 5 nsec execution time. How much time (in μsec) can be saved using design P2 over design P1 for executing 500 instructions? (upto 3 digit).
(a) 2.536 (b) 1.365
(c) 1.536 (d) 1.653

Solution: Option (c)
Execution time for pipeline = (k + n – 1) × tp
where k = Number of stages
n = Number of instruction
tp = Execution time = Max (all stages)
P1 = [8 + 500 – 1] × 8 = 4056
P2 = [5 + 500 – 1] × 5 = 2520
Time saved using P2 = 4056 – 2520 = 1536 nsec = 1.536 μsec

8. In which of the following addressing mode, the content of the program counter is added to the address part of the instruction to get the effective address?
(a) Indexed addressing mode (b) Implied addressing mode
(c) Relative addressing mode (d) Register addressing mode

Solution: Option (c)
In relative addressing mode content of the program counter is added to the address part of the instruction to get the effective address.

9. Suppose that a cache is 20 times faster than main memory and cache memory can be used 80% of the time. The speed-up factor that can be achieved by using the cache is __________.
(a) 2.16 (b) 3.16
(c) 4.20 (d) 4.16

Solution: Option (d)
Apply Amdhal’s law
F = 80%, S = 20, overall speed-up = [(1−𝐹)+𝐹𝑆]−1
= 4.16

10. If the last operation performed on a computer with an 8-bit word has an addition in which the two operands were 00000010 and 00000011, what would be the value of the Overflow, Sign and Half-Carry flags respectively?
(a) 0, 0, 0 (b) 0, 1, 0
(c) 1, 0, 1 (d) 0, 1, 1

Solution: Option (a)

0 0 0 0 0 0 1 0
0 0 0 0 0 0 1 1
0 0 0 0 0 1 0 1
Overflow = 0, sign = 0, Half carry = 0
Half carry indicate addition of packed decimal numbers. When carry takes out of the lower digit order, this flag is set. Auxiliary carry is also known as half carry.

11. A 4 byte long PC-relative branch instruction is fetched from memory address 51210 and while its execution, the branch is made to location 88510. What is the unsigned displacement present in the instruction? (relative value)

Solution: 369


4 byte instruction storage
Effective address = PC + Relative value
Relative value = EA – PC
= 885 – 516 = (369)10

12. Consider the following program segment:
Instruction Meaning Size (words)
I1 LOAD r0, 500 r0 ⃪ [500] 2
I2 MOV r1, r0 r1 ⃪ [r0] 1

I3 ADD ro, r1 r0 ⃪ r0 + r1 1
I4 INC r0 r0 ⃪ r0 + 1 1
I5 INC r1 r1 ⃪ r1 + 1 1
I6 ADD r0, r1 r0 ⃪ r0 + r1 1
I7 Store r1, r0 M[(r1)] ⃪ r0 2
I8 Halt Stop 1
Assume that memory is word addressable with word size 32 bits. Program is loaded into memory location (3000)10 onwards. The value of PC at the end of execution of above program is __________.

Solution: 3009

Word addressable storage
3000 – 3001
3007 – 3008
Valid program counter value after program is 3009.

13. A branch mark program is running on a 40 MHz processor. The executed program consists of 100,000 instruction executions, with the following instruction mix and clock cycle count.
Instruction Type Instruction Count Cycles per Instruction
Integer arithmetic 45000 1
Date transfer 32000 2
Floating point 15000 2
Control transfer 8000 2
The execution time in msec is __________.

Solution: 3.87 (3.86-3.88)

3.87 (3.86-3.88)
𝐶𝑃𝐼=Σ(𝐽𝑖×𝐶𝑃𝑖) 𝐼𝐶=[45000×1+32000×2+15000×2+8000×2]105
Execution time = 𝐼𝐶×𝐶𝑃𝐼𝑓=105×1.5540×106=3.87 𝑚𝑠𝑒𝑐.

14. A hypothetical control unit supports 5 groups of mutually exclusive control signals. The number of bits that can be saved using vertical approach compared to horizontal are __________.

Solution: 22
In horizontal: 1 + 5 + 7 + 15 + 8 = 36
In vertical: log 1 + log 5 + log 7 + log 15 + log 8 = 1 + 3 + 3 + 4 + 3 = 14
Total saved bits = 36 – 14 = 22

15. Consider the micro-programmed control unit which support 256 instructions, each of which on an average takes 16 micro operations. The system support 16 flag conditions and 52 control signals. If vertical microprogramming control is used in the system then total length of control word is __________ (bits/word).

Solution: 22
Number of words in control memory = 256 × 16 = 4096 words
⇒ Address field = 12 bits
16 = 24 and 52 < 26

The length of control word =
= 4 bit + 6 bit + 12 bit = 22 bits/ word

16. Consider a non-pipeline processor has clock rate of 25 MHz and CPI of 6, another processor designed with same clock rate and 8 stage instruction pipeline. If program containing 500 instructions is executed on both processors, then the speedup factor is __________.

Solution: 5.91 (5.90-5.92)
5.91 (5.90-5.92) 𝑆𝑝𝑒𝑒𝑑−𝑢𝑝(𝑆)=𝑁𝑜𝑛−𝑝𝑖𝑝𝑒𝑃𝑖𝑝𝑒𝑙𝑖𝑛𝑒=𝑛𝑡𝑛𝑘+𝑛−1
n = 500
tn = 6 (for non-pipeline)
K = 8 (for pipeline) 𝑆=500×6500+8−1=3000507=5.91

17. Consider the following statements:
S1: Compulsory miss can be reduced.
S2: Conflict miss can be reduced.
S3: Capacity miss can be reduced.
Which of the above statements are True?
(a) Only S1 (b) Only S1 and S2
(c) Only S2 and S3 (d) S1, S2 and S3

Solution: Option (d)
S2: Conflict miss are occur when too many blocks are mapped into same line or set. So by increasing the associativity i.e. increases the size of set and increases the number of sets.
S2: Compulsory miss can be reduced by increasing the line size i.e., reduce number of lines.
S3: Capacity miss can be reduced by increasing the cache memory size.
All of the three statements are true.

18. Consider the hypothetical processor which support 512 k words memory. It uses the memory mapped IO configuration. In which when 2 MSB bits of address are 1 then assigned to IO port.
How many numbers of I/O port address and memory addresses are possible in the processor respectively?
(a) 1×217, 3×217 (b) 3×217, 1×217
(c) 2×217, 1×217 (d) None of these

Solution: Option (a)
512 kW = 219 words = 19 bit address
So number of input output addresses = 1 × 217
Number of memory addresses = 3 × 217

19. Which of the following statements are True?
S1: Reference bit in page table entry used for page replacement.
S2: In hierarchical memory access organization, CPU perform read and write operation on only
level 1 memory.
S3: In simultaneous memory access organization, CPU perform read and write operation on any
level of memory.
(a) S1 and S2 only (b) S1 and S3 only
(c) S2 and S3 only (d) S1, S2 and S3

Solution: Option (d)
All the above statements are correct.
S1: Reference bit sometimes called access bit used in page table entry to show if page is replaced or not.
S2: In hierarchical memory access, CPU perform read and write operation only on level 1 memory. If miss occur then data is first transferred to level 1 then CPU access data.
S3: In simultaneous memory access, CPU perform read and write operation on any level of memory i.e. not necessary to take data first into level 1 memory than access it.

20. A computer has 32-bit instruction and 9-bit address. If there are 400 two address instructions then how many one address instructions can be formulated?
(a) 214 (b) 232 – 200
(c) 214 – 400 (d) (214 – 400) × 29

Solution: Option (d)

214 two address instructions are possible.
Here 400 two addresses are needed so (214 – 400) op-codes are free.
We can store (214 – 400) × 29 one address instructions.

21. Which of the following is true?
(a) In write through protocol, cache location and main memory location are updated
(b) In write back protocol, cache location and main memory location are updated simultaneously.
(c) Modified or dirty bits are used by write through protocol.
(d) None of these.

Solution: Option (a)
Write through protocol update cache and main memory simultaneously where write back first cache is updated and marked by dirty bit then main memory is updated.
Dirty bits are used by only write back protocol to know which cache block is updated.

22. A 16KB 4-way set associative write-back cache is organized as multiple blocks, each of size 64-bytes.The processor generates 32-bit addresses. The cache controller maintains the tag information for each cache block comprising with 1 valid bit and 1 Modified bit. As many bits as the minimum needed to identify the memory block mapped in the cache. What is the total size of memory needed at the cache controller to store meta-data (tags) for the cache?
(a) 5362 bytes (b) 5361 bytes
(c) 704 bytes (d) 176 bytes

Solution: Option (c)

 𝑇𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑏𝑙𝑜𝑐𝑘𝑠=16×210𝐵26𝐵=28 𝑇𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑠𝑒𝑡𝑠=2564=64
Total TAG size will be 22 bits
Total size of meta data will be 22×2568𝑏𝑦𝑡𝑒𝑠=704 𝑏𝑦𝑡𝑒𝑠

23. Consider a system with the main memory access time as 200 ns and cache access time as 10 ns. Hit ratio for read request is 0.8 and 80% of the memory requests are for read. If write through policy is used, then the average time considering both read and write requests is __________.

(a) 169.6 ns (b) 192.4 ns
(c) 78.4 ns (d) None of these

Solution: Option (c)


Tmemory = 200 ns Hread = 0.8
Tcache = 10 ns Hwrite = 1 (by default for write through)
fread = 80% fwrite = 20% 𝑇𝑎𝑣𝑔 𝑟𝑒𝑎𝑑=(0.8×10)+(0.2×200)
=8+40=48 𝑛𝑠 𝑇𝑎𝑣𝑔 𝑤𝑟𝑖𝑡𝑒 =1×200=200 𝑛𝑠 𝑇𝑎𝑣𝑔=𝑓𝑟𝑒𝑎𝑑×𝑇𝑎𝑣𝑔 𝑟𝑒𝑎𝑑+𝑓𝑤𝑟𝑖𝑡𝑒×𝑇𝑎𝑣𝑔 𝑤𝑟𝑖𝑡𝑒
=0.8×48+0.2×200=78.4 𝑛𝑠

24. Consider the following instructions.
I1 : R1 = 100
I2 : R1 = R2 + R4
I3 : R2 = R4 + 25
I4 : R4 = R1 + R3
I5 : R1 = R1 + 30
Calculate sum of (WAR, RAW and WAW) dependencies the above instructions.
(a) 10 (b) 12
(c) 6 (d) 8
Solution: Option (c)
Sum = (0 + 3 + 3) = 6

25. A 4-way set associative cache memory consists of 128 blocks. The main memory consists of 32768 memory blocks and each block contains 512 eight bit words. Find how many bits are needed to represent TAG, SET and WORD field respectively?

(a) 5, 9, 10 (b) 10, 6, 8
(c) 10, 9, 5 (d) 10, 5, 9

Solution: Option (d)

Main memory size = 32768 blocks
1 block = 512 words
= 32768 × 512 words
= 215 × 29 = 224 words
Main memory takes 24 bits.
Block size = 512 words = 29 words
Number of bits for block size = 9 bits.
Number of blocks in set associative = 128
Number of blocks in one set = 4
Number of sets in cache = 128/ 4 = 32 = 25
Number of bits in set offset = 5 bits
Number of TAG bits = 24 – (9+5) = 10 bits.

26. Suppose directed mapped cache with 2m lines 2p bytes per cache lines. Memory is byte addressable of 2n bytes. Compute the space required for storing tags (in bits)?
(a) 2n – (m+p) (b) 2m × (n – (m+p))
(c) 2n+m (d) (m+n) × 2m

Solution: Option (b)

Line offset = log2(2m) = m bits
Word offset = log2(2p) = p bits
Address field size = log2(2n) = n bits
Tag bits per line = n – (m+p)
Tag size = Number of cache lines = Number of tag bits per line
= 2m × (n – (m+p))

27. Consider a small two-way set-associative cache memory, consisting of 4 blocks. For choosing the block to be replaced, use the least recently used (LRU) scheme. The number of cache misses for the sequence of block addresses 18, 22, 10, 22, 18 is __________.
(a) 2 (b) 3
(c) 4 (d) 5

Solution: Option (c)
Sequence: 18, 22, 10, 22, 18.
Apply the LRU as follows:

Total number of misses = 4.

28. Consider a pipelined system with four stages: IF, ID, EX, WB. Following chart shows the clock cycles required by each instruction to compete each stage
How many clock cycles are required to complete the above instructions?
(a) 15 (b) 9
(c) 14 (d) 13

Solution: Option (a)

It requires 15 clock cycles.

29. A computer has a cache, main memory and a hard disk used for virtual memory. If referenced word is in cache, 20 ns are required to access it. If it is in main memory but not in cache 60 ns are needed to load it into cache and then reference is started again. If word is not in main memory, 12 ms are required to fetch the word from disk followed by 60 ns to copy into cache, the reference is started again. The cache hit ratio is 0.9 and main memory hit ratio is 0.6. The average time in nano seconds required to access a referenced word on this system is ___________.

Solution: 480026

There are 3 cases to consider
So average access time should be
= 0.9(20) + (0.06)(80) + (0.04)(12000080) = 480026 nsec

30. In an enhancement of a design of a CPU, the speed of a floating point unit has been increased by 30% and the speed of a fixed point unit has been increased by 20%. The overall speedup achieved if the ratio of the number of fixed point operation to floating point operations is 4 : 6 and the floating point operation used to take twice the time taken by fixed point operation in the original design (upto 2 decimal places) is __________.

Solution: 1.27 (1.26-1.28)
Let total operation be 100 and total time taken is t = 100 sec.
Floating point operations are =610×100=60
Fixed point operations are =410×100=40
Let time taken by floating point is t1 and time taken by fixed point is t2. So t1+t2=100 … (1)
Time taken by 60 floating point operation = t1
1 floating operation = 𝑡160
Similarly 1 fixed point operation takes 𝑡240
∵ 𝑡160=2𝑡240
2t1 = 6t2
From equation 1 and 2
𝑡1=6008 𝑎𝑛𝑑 𝑡2=2008
After speed-up 𝑡1′=𝑡11.3=6001.3×8=6000104
𝑆2=10078.52×𝑆1=1.27 𝑆1

31. Suppose that in 1000 memory references there are 150 misses in first level and 100 miss in second level cache. Assume that miss penalty from L2 cache to memory is 120 cycles. The hit time of L2 cache is 50 cycles. If there are 4 memory references per instruction, the average stall per instruction is ___________.

Solution: 78


4 memory references → 1 instruction
1000 memory references → ? instructions.
Number of instructions = 10004=250 [# 𝑚𝑒𝑚𝑜𝑟𝑦 𝑠𝑡𝑎𝑙𝑙𝑠# 𝐼𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛𝑠]=[# 𝑚𝑖𝑠𝑠𝑒𝑠 𝐿1# 𝐼𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛𝑠×ℎ𝑖𝑡 𝐿2]+[# 𝑚𝑖𝑠𝑠𝑒𝑠 𝐿2# 𝐼𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛𝑠×𝑚𝑖𝑠𝑠 𝑝𝑒𝑛𝑎𝑙𝑡𝑦 𝐿2]
=[30+48]=78 𝑐𝑦𝑐𝑙𝑒𝑠

32. Consider the machine with a byte addressable main memory of 216 byte, block size of 16 byte and a 2 way set associative mapped cache having 210 lines. Suppose there are two bytes in main memory i.e. first byte [E 01 F]16 and second byte [E 208]16 respectively then the difference of the set value (in decimal) between given two bytes i.e. (SET value of second byte – SET value of 1st byte) is __________.

Solution: 31

Block size = 216 byte = 4 bits
Blocks in main memory = 210
So number of sets = 21021=29⇒9 𝑏𝑖𝑡𝑠
# of bits in physical address= 216 ⇒ 16 bits
SET value1 = 000000001 = Decimal value = (1)10
SET value2 = 000100000 = Decimal value = (32)10
Difference = SET2 – SET1
= 32 – 1 = (31)10

33. Consider two cache organizations. The first one if 64 KB way associative with 64 byte block size. The second one is of the 64 KB direct mapped cache. The size of an address is 32 bits in both organizations. A 4 to 1 multiplexer has latency of 0.8 ns which k bit comparator has latency of k/5 nsec. The difference between the hit latencies of both cache organizations (i.e. associative hit latency – direct mapped hit latency) (in nsec) is ___________.

Solution: 1.2

Number of cache lines = 64 𝑘𝐵64 𝐵=1 𝑘=210
Number of sets = 21022=28
H1 = 18/5 + 0.8 ns
= 3.6 + 0.8 ns = 4.4 ns
Direct mapped cache:
This is no need of multiplexer in direct mapped cache.
So H2 = 16/ 5 nsec
= 3.2 nsec
Difference = H1 – H2
= 4.4 – 3.2 = 1.2 nsec