Binding an external library into Pharo

In this post I am going to show you how to call external functions from a Pharo image. Here, we are going to use the LAPACK (Linear Algebra Package) library that is written in Fortran.

Why do we need this library?

In the Pharo AI project (https://github.com/pharo-ai), we are working on an implementation of linear regression. Currently, we are writing the logic completely in Pharo. But, linear regression can be formulated in terms of least squares and Lapack implements efficiently the minimum-norm solution to a linear least squares problem.

So we want to get the best of the two worlds: nice objects in Pharo and call some fast and low-level libraries for crunching numbers. We want to bind the routine dgelsd() that does exactly what we want.

Implementing the binding

We need to have the library already installed in our machines. As a first step, we create a subclass of FFILibrary called LapackLibrary and we need to override the methods: macLibraryName, win32LibraryName and unixLibraryName. In those methods we should return the path in which the library is installed. For MacOS, we override the macLibraryName method as follows:

LapackLibrary >> macLibraryName
	
	^ FFIMacLibraryFinder new 
		userPaths: { '/usr/local/opt/lapack/lib/' };
		findAnyLibrary: #('liblapack.dylib').

And for Windows:

LapackLibrary >> win32LibraryName

	^ FFIWindowsLibraryFinder new
		userPaths: { '/usr/local/opt/lapack/lib/' };
		findAnyLibrary: #( 'liblapack.dylib' )

For using this binding on Linux is only needed to override the remaining method. One can use the class FFIUnix32LibraryFinder or FFIUnix64LibraryFinder.

Now, we are going to create the class LapackDgelsd. We override the method ffiLibrary to return just the class LapackLibrary.

LapackDgelsd >> ffiLibrary

	^ LapackLibrary

Now we can implement the method which will eventually make the FFI call. We saw in the documentation that dgelsd receives 14 parameters, all pointers, that have different types. To make the FFI call we have to use self ffiCall: and inside put the signature of the foreign function.

The variables that are passed must be either local or instance variables. We can specify the type of each of the variables, or we can say that the type is void*. That means that the FFI library is not going to do the mapping to the correct type, but is a responsiblity of the programmer to instantiate the variables with the correct type. So, we get:

LapackDgelsd >> ffiDgelsdM: m n: n nrhs: nrhs a: a lda: lda b: b ldb: ldb s: s rcond: rcond rank: aRank work: work lwork: lwork iwork: iwork info: anInfo
 
	^ self ffiCall: #( void dgelsd_(
		void* m,
        void* n,
        void* nrhs,
    	void* a,
        void* lda,
    	void* b,
        void* ldb,
    	void* s,
    	void* rcond,
        void* aRank,
    	void* work,
        void* lwork,
    	void* iwork,
    	void* anInfo ) )

In the documentation of the routine , we see that we need integer pointers, double pointers, integer arrays, and double arrays. To use pointers in Pharo, we need to use the class FFIExternalValueHolder. It actually will create an anonymous class of the type that we need. To ease the work, we will create a helper class that will create the pointer for us.

We can name the class LapackPointerCreator and that class has to have two class variables: DoublePointerClass and IntegerPointerClass. In the initialize method of the class, we instantiate the value of the class variables to be:

LapackPointerCreator class >> initialize
	
	DoublePointerClass := FFIExternalValueHolder ofType: 'double'.
	IntegerPointerClass := FFIExternalValueHolder ofType: 'int'

And then we create the two helper methods:

LapackPointerCreator class >> doublePointer: aNumber

	^ DoublePointerClass new value: aNumber

LapackPointerCreator class >> integerPointer: aNumber

	^ IntegerPointerClass new value: aNumber

And we will also create an extension method of Collection to convert a collection into n FFI external array.

Collection >> asFFIExternalArrayOfType: aType
	"Converts collection to FFIExternalArray.
	
	Example:
	#(1 2 3) asFFIExternalArrayOfType: 'int'"
	
	| array |
	array := FFIExternalArray newType: aType size: self size.
	self withIndexDo: [ :each :index | array at: index put: each ].
	^ array

Now we are ready to make the call to the Lapack dgelsd() routine. We are going to follow this example of how to use dgelsd() with the same values. First, we need to create all the variables that we are going to pass as arguments.

numberOfRows := 4.
numberOfColumns := 5.
numberOfRightHandSides := 3.

matrixA := #( 0.12 -6.91 -3.33  3.97 -8.19
			  2.22 -8.94  3.33  7.69 -5.12
			 -6.72 -2.74 -2.26 -9.08 -4.40
		 	 -7.92 -4.71  9.96 -9.98 -3.20 ) asFFIExternalArrayOfType: 'double'.

matrixB := #( 7.30  1.33  2.68 -9.62 0.00
		  	  0.47  6.58 -1.71 -0.79 0.00
			 -6.28 -3.42  3.46  0.41 0.00 ) asFFIExternalArrayOfType: 'double'.

reciprocalConditionNumberPointer := LapackPointerCreator doublePointer: -1.

singularValuesArray := FFIExternalArray newType: 'double' size: numberOfRows.
iworkArray := FFIExternalArray newType: 'int' size: 11 * numberOfRows.

numberOfRowsPointer := LapackPointerCreator integerPointer: numberOfRows.
numberOfColumnsPointer := LapackPointerCreator integerPointer: numberOfColumns.

numberOfRightHandSidesPointer := LapackPointerCreator integerPointer:
    numberOfRightHandSides.

leadingDimensionAPointer := LapackPointerCreator integerPointer: numberOfRows.
leadingDimensionBPointer := LapackPointerCreator integerPointer: numberOfColumns.

rankPointer := LapackPointerCreator integerPointer: 0.
infoPointer := LapackPointerCreator integerPointer: 0.

The WORK array is a workspace that is used internally by the dgelsd() routine. We must allocate the memory for this array before passing it to the routine. To calculate the optimal size of the WORK array, we run the dgelsd() routine with LWORK value -1 and WORK as any pointer. After this first execution, the optimal workspace size will be written into the WORK pointer. Therefore, we will be making two FFI calls to the routine: first to calculate the optimal workspace and then to find the solution to the least squares problem.

lworkPointer := LapackPointerCreator integerPointer: -1.
workPointer := LapackPointerCreator doublePointer: 0.
	
dgelsd := LapackDgelsd new.

dgelsd
	ffiDgelsdM: numberOfRowsPointer 
	n: numberOfColumnsPointer 
	nrhs: numberOfRightHandSidesPointer
	a: matrixA 
	lda: leadingDimensionAPointer 
	b: matrixB 
	ldb: leadingDimensionBPointer 
	s: singularValuesArray
	rcond: reciprocalConditionNumberPointer 
	rank: rankPointer
	work: workPointer 
	lwork: lworkPointer
	iwork: iworkArray 
	info: infoPointer.

Now, the variable workPointer contains the value of the optimal workspace. With that information, we run the dgelsd() routine again to solve the problem for matrices A and B.

optimalWorkspace := workPointer value asInteger.
workArray := FFIExternalArray newType: 'double' size: optimalWorkspace.
workArraySizePointer := LapackPointerCreator integerPointer: optimalWorkspace.

dgelsd	
	ffiDgelsdM: numberOfRowsPointer 
	n: numberOfColumnsPointer 
	nrhs: numberOfRightHandSidesPointer
	a: matrixA 
	lda: leadingDimensionAPointer 
	b: matrixB 
	ldb: leadingDimensionBPointer 
	s: singularValuesArray
	rcond: reciprocalConditionNumberPointer 
	rank: rankPointer
	work: workArray 
	lwork: workArraySizePointer 
	iwork: iworkArray 
	info: infoPointer

The result of the computation is stored in several variables. For example, the values of matrixB have been replaced with the minimum norm solution. The effective rank is contained in the rankPointer and the S array contains the singular values of matrix A in decreasing order. The INFO variable contains an integer value that informs us whether the routine succeded or failed (0 = successful exit).

To see the value of a pointer, we can use the value message, for example, rankPointer value.

Improving the API

Make the call to the Fortran routine was a little tricky. We don’t want to give the user the responsibility of creating all of those pointers for using the not-so-nice method signature. So we will use Pharo in our favour.

Actually, we only need to set 5 values of the method, the rest can be calculated internally. What we will do is to have all the needed variables as instance variables of the class LapackDgelsd. Like:'matrixA matrixB numberOfRows numberOfColumns numberOfRightHandSides leadingDimensionA leadingDimensionB singularValues rank info reciprocalConditionNumber workArraySize minimumNormSolution iworkArray'.

Then, we will create the setters for the 5 values that we need the user to insert. Note that the user does not need to create any pointer. They pass only a Pharo Array.

LapackDgelsd >> numberOfColumns: aNumber

	numberOfColumns := aNumber

LapackDgelsd >> numberOfRows: aNumber

	numberOfRows := aNumber

LapackDgelsd >> matrixA: anArray

	matrixA := anArray asFFIExternalArrayOfType: 'double'

LapackDgelsd >> matrixB: anArray

	matrixB := anArray asFFIExternalArrayOfType: 'double'

LapackDgelsd >> numberOfRightHandSides: aNumber

	numberOfRightHandSides := aNumber

Now, as we do not want to give the responsiblity to the user to call the method twice, one for obtaining the optimal workspace and then for solving the actual equation. We will create a solving method that does all the work. We have:

LapackDgelsd >> solve

	| singularValuesArray numberOfRowsPointer numberOfColumnsPointer
      numberOfRightHandSidesPointer leadingDimensionAPointer leadingDimensionBPointer
      rankPointer infoPointer reciprocalConditionNumberPointer workArray
      workArraySizePointer |

	singularValuesArray := FFIExternalArray newType: 'double' size: numberOfRows.
	
	"iwork dimension should be at least 3*min(m,n)*nlvl + 11*min(m,n),
     where nlvl = max( 0, int( log_2( min(m,n)/(smlsiz+1) ) )+1 )
     and smlsiz = 25"
	iworkArray := FFIExternalArray newType: 'int' size: 11 * numberOfRows.
	
	numberOfRowsPointer := LapackPointerCreator integerPointer: numberOfRows.
	numberOfColumnsPointer := LapackPointerCreator integerPointer: numberOfColumns.
	numberOfRightHandSidesPointer := LapackPointerCreator 
        integerPointer: numberOfRightHandSides.
	leadingDimensionAPointer := LapackPointerCreator integerPointer: self leadingDimensionA.
	leadingDimensionBPointer := LapackPointerCreator integerPointer: self leadingDimensionB.
	rankPointer := LapackPointerCreator integerPointer: 0.
	infoPointer := LapackPointerCreator integerPointer: 0.
	reciprocalConditionNumberPointer := LapackPointerCreator 
        doublePointer: self reciprocalConditionNumber. 
	
	workArraySize := self
		findOptimalWorkspace: numberOfRowsPointer 
		n: numberOfColumnsPointer 
		nrhs: numberOfRightHandSidesPointer
		a: matrixA 
		lda: leadingDimensionAPointer 
		b: matrixB 
		ldb: leadingDimensionBPointer 
		s: singularValuesArray
		rcond: reciprocalConditionNumberPointer 
		rank: rankPointer 
		work: nil  
		iwork: iworkArray 
		info: infoPointer.

	workArray := FFIExternalArray newType: 'double' size: workArraySize.
	workArraySizePointer := LapackPointerCreator integerPointer: workArraySize.

	self
		ffiDgelsdM: numberOfRowsPointer 
		n: numberOfColumnsPointer 
		nrhs: numberOfRightHandSidesPointer
		a: matrixA 
		lda: leadingDimensionAPointer 
		b: matrixB 
		ldb: leadingDimensionBPointer 
		s: singularValuesArray
		rcond: reciprocalConditionNumberPointer 
		rank: rankPointer
		work: workArray 
		lwork: workArraySizePointer 
		iwork: iworkArray 
		info: infoPointer.

	minimumNormSolution := matrixB asArray.
	singularValues := singularValuesArray asArray.
	rank := rankPointer value.
	info := infoPointer value.

LapackDgelsd >> findOptimalWorkspace: m n: n nrhs: nrhs a: a lda: lda b: b ldb: ldb s: s rcond: rcond rank: aRank work: work iwork: iwork info: anInfo

	| lworkPtr workPtr |
	lworkPtr := LapackPointerCreator integerPointer:  -1.
	workPtr := LapackPointerCreator doublePointer:  0.

	self ffiDgelsdM: m n: n nrhs: nrhs a: a lda: lda b: b ldb: ldb s: s rcond: rcond rank: aRank work: workPtr lwork: lworkPtr iwork: iwork info: anInfo.
	
	^ workPtr value asInteger

Now, all that we have to do, as users, is:

numberOfRows := 4.
numberOfColumns := 5.
numberOfRightHandSides := 3.
matrixA := #( 0.12 -6.91 -3.33  3.97 -8.19
			  2.22 -8.94  3.33  7.69 -5.12
		     -6.72 -2.74 -2.26 -9.08 -4.40
		 	 -7.92 -4.71  9.96 -9.98 -3.20 ).
matrixB := #( 7.30  1.33  2.68 -9.62 0.00
		  	  0.47  6.58 -1.71 -0.79 0.00
		     -6.28 -3.42  3.46  0.41 0.00 ).

algorithm := LapackDgelsd new
    numberOfRows: numberOfRows;
    numberOfColumns: numberOfColumns;
    numberOfRightHandSides: numberOfRightHandSides;
    matrixA: matrixA;
    matrixB: matrixB;
    yourself.

algorithm solve.

This is definitively better, since it hides the implementation details of the underlying implementation. And for getting the solution, we only need to call these accessors:

"Info represents if the process completed with success"
algorithm info.

"The array with the solutions"
algorithm minimumNormSolution.

"The effective rank"
algorithm rank.

"And the singular values of matrix A"
algorithm singularValues.

Conclusion

As we saw, doing the binding for the first method was the hardest part. But all the next methods will be easier to implement because we can use the same infrastructure.

Here we showed a binding for a widely used linear algebra library. That will help us to speed up the mathematical computations of libraries such as PolyMath (https://github.com/PolyMathOrg/PolyMath/) and Pharo-AI (https://github.com/pharo-ai). Lapack is a huge library so we don’t want to bind all the methods, but only the ones that we need. If people from the community would like us to migrate other methods of the library, we will be happy to do it. We will work on demand. The code and instructions are available here.

Our next step is to use dgelsd() in our linear regression implementation in Pharo for benchmarking against Python and R. Because those languages use also Lapack fortran library and they are considered as industry standards.

Sebastian Jordan Montano

Why do we need this library?

Implementing the binding

Improving the API

Conclusion

Share this:

Related

Leave a comment Cancel reply