Parallel Matrix-Matrix Multiplication of Square Matrices using POSIX threads

  1 #include <stdio.h> 
  2 #include <stdlib.h> 
  3 #include <math.h> 
  4 #include <pthread.h> 
  6 #define NUM_THREADS 100        // Can be = 4,16,25,64,100
  8 double **A,**B,**C; 
  9 int n; 
 10 int n_bar;  
 12 struct thread_data 
 13 { 
 14     int blocki;    // Row of Block
 15     int blockj; // Column of Block
 16 }; 
 18 // Function to print matrices 
 19 void print_matrices(){ 
 20     int i,j; 
 21     printf("A MATRIX\n"); 
 22     for( i=0; i<n; i++)
 23         for( j=0; j<n; j++){
 24             printf(" %.0f", A[i][j] ); 
 25             if( j>0 && (j+1) % n == 0 ) printf("\n");
 26         }     
 27     printf("\nB MATRIX\n"); 
 28     for( i=0; i<n; i++)
 29         for( j=0; j<n; j++){
 30             printf(" %.0f", B[i][j] ); 
 31             if( j>0 && (j+1) % n == 0 ) printf("\n");
 32         }     
 33     printf("\nC MATRIX\n"); 
 34     for( i=0; i<n; i++)
 35         for(j=0; j<n; j++){
 36             printf(" %.0f", C[i][j] ); 
 37             if( j>0 && (j+1) % n == 0 ) printf("\n");
 38         }     
 39         printf("\n"); 
 40 } 
 42 //The thread will begin control in this function
 43 void *multiply( void *param ) {
 44    struct thread_data *data = param; // the structure that holds our data
 45     int i=0,j=0,k=0;
 46     int i_start = n_bar * data->blocki;    // Starting point row of underlying matrix 
 47     int j_start = n_bar * data->blockj; // Starting point col of underlying matrix 
 49    for( i=i_start; i < (i_start + n_bar) ;i++)
 50        for ( j=j_start; j < (j_start + n_bar); j++)
 51            for (k=0; k<n; k++)
 52                C[i][j]+=A[i][k]*B[k][j];
 54    //Exit the thread 
 55    pthread_exit(0); 
 56 } 
 59 int main(int argc, char *argv[]) {
 60    int i,j,k,time; 
 61    int t; 
 62    pthread_t threads[NUM_THREADS]; 
 63    pthread_attr_t attr; 
 64    struct thread_data index[NUM_THREADS]; 
 65    void *status; 
 67   pthread_attr_init(&attr); 
 68   pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE); 
 70   n = atoi(argv[1]);    // Order of matrices A, B and C (Perfect Square)
 71   n_bar = n/sqrt( NUM_THREADS ); 
 73   //Allocate and fill matrices 
 74   A = (double **)malloc(n*sizeof(double *));
 75   B = (double **)malloc(n*sizeof(double *));
 76   C = (double **)malloc(n*sizeof(double *));
 78   for(i=0;i<n;i++){
 79      A[i] = (double *)malloc(n*sizeof(double));
 80      B[i] = (double *)malloc(n*sizeof(double));
 81       C[i] = (double *)malloc(n*sizeof(double));
 82      } 
 85   for (i = 0; i<n; i++)
 86       for(j=0;j<n;j++){
 87           A[i][j] = rand() % 5 + 1;
 88           B[i][j] = rand() % 5 + 1;
 89           C[i][j] = 0.0;
 90       } 
 91    // Start timer  
 92    time=timer(); 
 93     t=0; 
 94     for(i = 0; i < n/n_bar; i++) {
 95           for(j = 0; j < n/n_bar; j++) {
 96              //Assign a row and column for each thread
 97              index[t].blocki = i; 
 98              index[t].blockj = j; 
 99              pthread_create(&threads[t], &attr, multiply, (void *)&index[t] );
100              pthread_join( threads[t], NULL);
101           } 
102        } 
104     // Print out the matrices 
105 //    print_matrices();         
107    time=timer()-time; 
108    printf("Elapsed time: %f \n",time/1000000.0);
110 pthread_exit(NULL);  
111 } 


Here is a quick tutorial on how to compute NULLABLE, FIRST and FOLLOW sets for a given Context Free Grammar (CFG). What I have below is more an informal methodology on working out the aforementioned sets.

For Nullable

1. Start by looking for all productions with only an “ε” in the Right-Hand Side of the production. Ex: X –> ε. All such non-terminals are termed NULLABLE, hence in our example “X” is NULLABLE.

2. Following the same example as in step 1, start looking for productions where nullable non-terminal X is is being used in the RHS of the production. Ex: Y –> X. Note that its important to pick up only the productions that have ONLY nullable non-terminals on the RHS and not a combination of nullable and other non-terminals. These productions have the non-teminals pointing to nullable non-terminal(s) and hence are also nullable. So in our example, Y is NULLABLE too.

3. Now for the last straw, In a production like say  “Y –> ABC”, Y is NULLABLE if and only if ALL A, B and C are NULLABLE! So ensure that you look for productions where all the non-terminals in the RHS are nullable, then you have yourself another nullable non-terminal.

Now you have a list of all the NULLABLE non-terminals.


1. Write Empty FIRST sets for all non-terminals (variables) in the LHS as we will keep filling up these empty sets as we compute FIRST!

2. Start by tackling the productions which START with a terminal. So look for productions of the form “A –> aβ” and FIRST(A) will contain “a” in it. Nice and Simple! Example: E –> (E). The RHS starts with a terminal “(“, so FIRST(E) contains “(” in it. Remember to keep adding all such terminals to your corresponding FIRST set.

3. Now lets deal with the productions that have Non-Terminals on the RHS only. So we are now looking for productions of the form “A –> Bβ”. Following our example, Say X,Y are NULLABLE and we have a production like  Z –> XYZ

3a. If B is not nullable, then FIRST(A) contains FIRST(B) (when I say contains it means you should copy over all non-terminals from one FIRST set to the other to reflect this) and you can stop processing that production. So in our example, if X was not nullable you could say FIRST(Z) contains FIRST(X) and stop processing. Now for the tricky bit.

3b. When B is nullable, We must compute FIRST(β) which is basically looking for productions of form A –> Bβ recursively and in the process we add to the relevant FIRST sets. So in our example, we would start like so

FIRST(X) contains FIRST(XYZ) = FIRST(X).

Since X is nullable, we continue this search … so FIRST(X) also contains  FIRST(YZ) = FIRST(Y).  We are not done yet!

Since Y is also nullable, we continue … and add to FIRST(X) non-terminals from FIRST(Z). That is it you are DONE!

Note that even if a non-terminal is nullable, you must copy over non-terminals from it’s FIRST set to the FIRST set of the LHS non-terminal.


1. The Follow sets will be formed by spotting non-terminals from the RHS of the production this time. So unlike FIRST sets, we focus on the non-terminals on the right hand side of the production and work backwards (well in a way).

So first step is to look for productions of the form  “A –> … Bβ”.

FOLLOW(B) contains FIRST(β). Note how its FOLLOW(B) not A! Ex: E –> E + S, then FOLLOW(E) contains FIRST(“+ S”) which is the terminal “+”. Don’t forget how to compute FIRST!

2. If β is NULLABLE, then FOLLOW(B) contains FOLLOW(A) in it. Ex:  E –> XY and Y is nullable, then FOLLOW(X) contains all terminals in FOLLOW(E). Also watch out for productions like, E –> X because it can be treated as X being followed by something nullable, and hence FOLLOW(X) contains FOLLOW(E) in our example. Think of it as something NULLABLE right behind X, because X is the very last non-terminal in the production.

Note: You must recursively look for this format of  … Bβ even in a single production to compute the FOLLOW sets of each constituent variable.

Thats it you are done computing FOLLOW sets.

Now for one final example to wrap up all the rules and show you how to apply this methodology.

Consider the grammar below

Z –> d | XYZ

Y –> c | ε

X –> Y | a

Lets start by computing the nullable set.

Y –> ε   => Y is NULLABLE.

X –> Y and Y is nullable =>  X is NULLABLE.

Z is not nullable. So now we have {X,Y} as our NULLABLE non-terminals.

Lets compute the FIRST sets now. Initially FIRST(X) = FIRST(Y) = FIRST(Z) = {}.

Lets start by looking for the productions that start with a terminal first.

X –> a  => a in FIRST(X)

Y –> c => c in FIRST(Y)

Z –> d => d in FIRST(Z)

Now lets deal with the productions that have variables in them.

X –> Y and c in FIRST(Y) => c is also in FIRST(X)

Z –> XYZ and {a,c} in FIRST(X) => {a,c} also in FIRST(Z)

Now for the NULLABLE + FIRST set computation rules,

Z –> XYZ, Nullable X and c in FIRST(Y) => c is in FIRST(Z); no change to FIRST(Z) as we already have c in it!

Z –> XYZ and Nullable (XY) => FIRST(Z) includes FIRST(Z); no change needed here.

So finally, FIRST(X) = {a,c}  FIRST(Y) = {c} and  FIRST(Z) = {a,c,d}

Finally the FOLLOW sets.

Z –> XYZ and c in FIRST(Y) (same as FIRST(YZ) ) => c in FOLLOW(X)

Z –> XYZ, Nullable Y and a,c,d in FIRST(Z) => a,c,d in FOLLOW(X)

Z –> XYZ and a,c,d in FIRST(Z) => a,c,d in FOLLOW(Y)

X –> Y and a,c,d in FOLLOW(X) => a,c,d in FOLLOW(Y)  as FOLLOW(Y) contains FOLLOW(X).

So finally, FOLLOW(X) =  FOLLOW(Y) = {a,c,d} and  FOLLOW(Z) = {}