Thursday 2 March 2017

Lab 7: Inline Assembler

Part A - Class Lab

Building on the code from my Lab 5 post, we wrote another version of the volume-scaling algorithm that uses inline assembly, specifically the SQDMULH instruction.

Our group created the following code:

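#include <stdint.h>

void adjustVolume8(int16_t* original,float factor,int max){
        // Pin the scale factor to w16 so the DUP below can name it directly.
        register int16_t intFactor asm("w16");
        // Convert the float factor to a 16-bit fixed-point value.
        intFactor=(int)(factor*32767);

        int16_t *x;
        int16_t *loop_max = original+max;
        // 8 samples (128 bits) per iteration; assumes max is a multiple of 8.
        for(x=original ;x<loop_max ;x+=8)
        {
                __asm__ volatile(
                        "LD1 {v0.8h},[%[p]]\n"          // load 8 samples
                        "DUP v1.8h, w16\n"              // copy the factor into every lane
                        "SQDMULH v0.8h,v0.8h,v1.8h\n"   // saturating doubling multiply, high half
                        "ST1 {v0.8h},[%[p]]"            // store the results in place
                        :
                        :[p]"r"(x),"r"(intFactor)
                        :"v0","v1","memory"             // tell the compiler what the asm modifies
                        );
        }
}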
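
For comparison, here is a rough sketch of how the same loop might look with the NEON intrinsics from arm_neon.h instead of hand-written assembly. This is not what our group submitted. The function names adjust_volume_intrinsics and scale_sample are my own, and the plain C fallback (used for leftover samples, or when not building for AArch64) assumes the compiler does an arithmetic right shift on negative numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* One lane of SQDMULH in plain C: (2*a*b) >> 16, with saturation.
 * Assumes >> on a negative int32_t is an arithmetic shift. */
static int16_t scale_sample(int16_t sample, int16_t q15_factor) {
    /* The only case where doubling overflows: -32768 * -32768. */
    if (sample == INT16_MIN && q15_factor == INT16_MIN)
        return INT16_MAX;
    int32_t product = (int32_t)sample * (int32_t)q15_factor;
    return (int16_t)(product >> 15);   /* same as (2*product) >> 16 */
}

void adjust_volume_intrinsics(int16_t *samples, float factor, size_t count) {
    assert(samples != NULL || count == 0);
    int16_t q15_factor = (int16_t)(factor * 32767);
    size_t i = 0;
#if defined(__aarch64__)
    /* Same four steps as the inline assembly: DUP, LD1, SQDMULH, ST1. */
    int16x8_t vfactor = vdupq_n_s16(q15_factor);
    for (; i + 8 <= count; i += 8) {
        int16x8_t v = vld1q_s16(samples + i);
        v = vqdmulhq_s16(v, vfactor);
        vst1q_s16(samples + i, v);
    }
#endif
    /* Leftover samples (or everything, on non-AArch64 builds). */
    for (; i < count; i++)
        samples[i] = scale_sample(samples[i], q15_factor);
}
```

With intrinsics the compiler picks the registers itself, so there is no need to pin anything to w16 or to write a clobber list, and the scalar tail loop means the array length does not have to be a multiple of 8.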


For comparison, here are the timings from Lab 5 together with the new inline assembly results:

5000000 size sound file, new array, simple: 92.00
5000000 size sound file, original array, simple: 91.00              *** NAIVE VERSION ***
5000000 size sound file, new array, table: 56.00
5000000 size sound file, original array, table: 56.00               *** TABLE LOOKUP ***
5000000 size sound file, new array, int hack: 41.00
5000000 size sound file, original array, int hack: 40.00
5000000 size sound file, new array, inline assembly: 0.00
5000000 size sound file, original array, inline assembly: 4.00      *** INLINE ASSEMBLY ***


As we can see, the inline assembly version was far faster than the other versions we had created. This code essentially does the following things:

Reserve a register (w16) for our intFactor value.
Loop through the 'original' array, grabbing eight 16-bit elements at a time.
Use vector registers, which hold 128 bits at a time, so we get through the array faster.
Multiply every lane of the vector register by the same value, so that several values change simultaneously.
Store the new values back into the original array.
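
To convince myself what each of those steps does, here is a small model in portable C that treats one vector register as an array of eight 16-bit lanes. It is only a sketch: the vec8h type and the load8, dup8, sqdmulh8 and store8 helpers are names I made up to mirror LD1, DUP, SQDMULH and ST1, and the shift assumes the compiler does an arithmetic right shift on negative numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit vector register: 8 lanes of 16 bits. */
typedef struct { int16_t lane[8]; } vec8h;

/* LD1: load 8 consecutive samples into the "register". */
static vec8h load8(const int16_t *p) {
    vec8h v;
    memcpy(v.lane, p, sizeof v.lane);
    return v;
}

/* DUP: copy one scalar into every lane. */
static vec8h dup8(int16_t value) {
    vec8h v;
    for (int i = 0; i < 8; i++)
        v.lane[i] = value;
    return v;
}

/* SQDMULH: per lane, keep the high 16 bits of 2*a*b, saturating on overflow. */
static vec8h sqdmulh8(vec8h a, vec8h b) {
    vec8h r;
    for (int i = 0; i < 8; i++) {
        if (a.lane[i] == INT16_MIN && b.lane[i] == INT16_MIN) {
            r.lane[i] = INT16_MAX;          /* the one case that saturates */
        } else {
            int32_t product = (int32_t)a.lane[i] * (int32_t)b.lane[i];
            r.lane[i] = (int16_t)(product >> 15);   /* (2*product) >> 16 */
        }
    }
    return r;
}

/* ST1: write the 8 lanes back to memory. */
static void store8(int16_t *p, vec8h v) {
    memcpy(p, v.lane, sizeof v.lane);
}

/* Same shape as adjustVolume8, one 8-sample chunk per iteration. */
void adjust_volume_model(int16_t *samples, float factor, int count) {
    assert(count % 8 == 0);                 /* no tail handling, like the original */
    vec8h vfactor = dup8((int16_t)(factor * 32767));
    for (int i = 0; i < count; i += 8)
        store8(samples + i, sqdmulh8(load8(samples + i), vfactor));
}
```

The real instructions do all eight lane multiplications in one go, which is where the speed comes from. The model only shows what values end up in memory.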

Just by grabbing several array elements at a time and applying the same multiplication to all of them at once, we went from 56.00 for the table lookup version to 4.00 for the inline assembly version. That is 14x faster, or about a 93% reduction in run time!
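
To make the arithmetic explicit, here is a tiny sketch that works out the speedup from the 56.00 (table lookup) and 4.00 (inline assembly) timings above. The helper names are mine and are not part of the lab code.

```c
#include <assert.h>
#include <stdio.h>

/* How many times faster the new version is than the old one. */
static double speedup(double old_time, double new_time) {
    assert(new_time > 0.0);
    return old_time / new_time;
}

/* Percentage of the old run time that the new version eliminates. */
static double time_saved_percent(double old_time, double new_time) {
    assert(old_time > 0.0);
    return 100.0 * (old_time - new_time) / old_time;
}

/* Print both figures for the table lookup vs. inline assembly timings. */
void print_speedup(void) {
    printf("%.1fx faster, %.1f%% less time\n",
           speedup(56.0, 4.0), time_saved_percent(56.0, 4.0));
}
```

The two numbers describe the same result in different ways: a 14x speedup means the run takes 1/14 of the original time.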

From this lab I learned that it is possible to load several elements from an array at once, and that doing so is far more efficient. There are also instructions that operate on several array elements simultaneously. Before this, I wasn't aware that I didn't have to go through an array one element at a time, changing each value separately before moving on to the next. Understanding these kinds of processing techniques can definitely improve the performance of programs I write in the future, if I can find a way to apply this knowledge.



Part B - Individual Task 

After Memory Architecture
Do this part of the lab after the class on Memory Architecture.

I will have to wait a bit before doing this part I believe... 
