Josip Zohil, Koper, Slovenija, February, 2015 

          Asynchronous reading of a file in FoxPro 

  

Visual FoxPro (VFP) code usually runs synchronously. VFP offers several functions to read a file: usually we open a file and read it line by line or in chunks of bytes. Why would we want to solve this problem asynchronously? Synchronous execution keeps VFP busy until the job is done, and the VFP screen is frozen. An asynchronous approach, on the other hand, can perform a long-running task (reading a file) without blocking. In this article we read a file using a collection of timers, and we hide the complexity of the asynchronous file reading behind a simple interface. 

          Reading a file 

 

In VFP you can read a file in several ways: 

·        Entire file, 

·        Line by line, 

·        In chunks of bytes. 
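As a reference point, the three ways map onto standard VFP functions: FILETOSTR( ) for the entire file, FOPEN( ) with FGETS( ) for lines, and FOPEN( ) with FREAD( ) for chunks. The sketch below is synchronous, and the file name test.txt and the sizes are only illustrative. FGETS( ) returns at most 254 bytes per call by default, so the sketch passes 8192, its maximum. VFP documents a maximum string length of 16,777,184 characters, so reading the entire file into one string is only suitable for small files.

```foxpro
* Three synchronous ways to read test.txt (illustrative sketch)
Local lcAll, lnH, lcLine, lcChunk
* 1. The entire file in one string (small files only)
lcAll = Filetostr("test.txt")
lnH = Fopen("test.txt")          && 0 = read-only, buffered (default)
If lnH < 0
    ? "Cannot open test.txt"
    Return
Endif
* 2. Line by line
Do While !Feof(lnH)
    lcLine = Fgets(lnH, 8192)    && one line, up to 8192 bytes
Enddo
= Fseek(lnH, 0)                  && rewind to the beginning of the file
* 3. Chunk by chunk
Do While !Feof(lnH)
    lcChunk = Fread(lnH, 64000)  && up to 64000 bytes per call
Enddo
= Fclose(lnH)
```

Each of these loops keeps VFP busy until the whole file has been processed, which is the blocking behaviour described next.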

Using VFP functions to read a file synchronously results in a blocking application: the screen freezes, the application does not respond to events, VFP cannot render on the screen, and timer events wait in the event loop. Reading a 170 MB file and processing its content can block the application for two (2) minutes or more!  

Optimizing the reading and processing may, for example, double the speed, but even in the optimized solution the screen remains blocked for thirty (30) seconds! 

We obtain similar results by improving the hardware (processor, disk speed...). 

The main problem lies in the way our programs are written: synchronous execution. On the other hand, asynchronous code can be relatively complicated and error-prone. In dynamic programming languages and event-driven systems (like VFP) it is relatively easy to create user-defined functions (UDFs) and other constructs that run asynchronously. Programmers who use such a UDF write asynchronous code much as they write synchronous code, only through a different interface: the asynchronous complexity is hidden behind the UDF interface.  

The benefits of these asynchronous constructs are a non-blocking screen (a responsive VFP) and, often, faster execution. Sometimes there is a penalty, slower execution: VFP also executes other jobs (mouse moves, key presses...), or the operating system (OS) gives the CPU other work. In synchronous (blocking) execution VFP executes (mostly) one job, and the OS »respects« this VFP »priority«! 

As said, asynchronous code is sometimes (under ideal conditions) faster than synchronous code. This means the hardware is not the bottleneck: we can speed up reading and processing a file and unblock the VFP application at the same time. The cause of blocking is the way our programs are written. The screen is frozen for a minute or more because of badly implemented VFP programs! 

  

          Event driven reading of a file 

  

Let us look at the process of reading a file chunk by chunk (the explanation for reading line by line is similar). 

VFP execution is event driven: VFP »iterates« over its event loop (similar to a DO WHILE loop iterating over the records of a DBF file). It is a »never-ending« loop over a stream of event handlers. Each loop iteration is a »tick«. Ticks execute the functions (event handlers) in the event queue. These functions were pushed into the event loop by event emitters (input/output operations, rendering to the screen, timers...): each event pushes its corresponding event handler (a function) into the event loop. 

Reading a chunk of a file is an event. We have a sequence of such events (possibly never ending), and the event of reading a chunk has its own event handler.  

Note. In principle, we can also interleave operations: read some chunks of the file, then write lines to it, read more lines, and so on.  

When VFP reaches our code that »reads a chunk«, it puts the handler (usually a function) in the event loop. Pay attention: it does not execute this code immediately; the code is pushed into the event loop. Later, when VFP is not busy (after all other code has executed), it executes this function. The event handlers in the event loop are executed serially. The same function (e.g. a UDF) may be pushed into the event loop in various ways: by a VFP program, by a timer...  
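In a VFP program the event loop is usually entered with READ EVENTS and left with CLEAR EVENTS; the main program of this article follows the same pattern. A minimal skeleton, using a hypothetical form class loopForm, might look like this:

```foxpro
* Minimal VFP event-loop skeleton (loopForm is a hypothetical class)
Local loForm
loForm = Createobject("loopForm")
loForm.Show()
Read Events          && VFP now dispatches events: clicks, key presses, timers...
? "Event loop finished"
Return

Define Class loopForm As Form
    Caption = "Close me to end READ EVENTS"
    Procedure Destroy
        Clear Events   && makes READ EVENTS return to the program above
    Endproc
Enddefine
```

Everything the user does while the form is open is handled by event handlers called from this loop; the program itself waits on the READ EVENTS line.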

For example, suppose we have a program: 

Function blockMe()
    Fun()
    Gun()
    Iun()
Endfunc

VFP pushes the blockMe function into the event loop. When it starts executing, it runs Fun, which blocks VFP; then it executes Gun, and VFP is still blocked; then it executes Iun, and VFP is still blocked. VFP remains blocked during the execution of all three functions: blockMe stays on the VFP call stack the whole time, and VFP is unable to perform other actions. 

Suppose now that each of these functions is fired by its own timer (three timers). When the first timer fires, Fun is put in the event loop (first phase); when the second fires, Gun enters the event loop, and so on. When Fun executes (second phase) it enters the call stack; when it terminates, it leaves the call stack, and after that Gun enters the call stack, and so on. Between two visits to the call stack, VFP is able (under certain conditions) to react to other events (user actions, timer events...). 

Let us conclude:  

The function blockMe iterates over three functions; during the execution it stays on the call stack and blocks VFP. 

The execution with timers makes three separate visits to the call stack. The program »iterates« over a collection of three timers, so VFP is blocked three times, each time for a shorter period. 

From the CPU's point of view, reading a line of a file from disk is a waiting operation: the CPU waits for the disk operation to terminate (seek the line position, read the data from the disk...). During this wait, the CPU can perform other work.  
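The timer variant of blockMe can be sketched as follows. The class oneShot, its cFunc property and the delays of 1, 2 and 3 ms are assumptions made for this illustration; Fun, Gun and Iun are the example functions from above, here given a trivial body. Each timer fires once and calls one function, so each function is a separate, short visit to the call stack, and VFP can handle other events in between.

```foxpro
* Sketch: run Fun, Gun and Iun from three one-shot timers (oneShot is hypothetical)
Local loT1, loT2, loT3
loT1 = Createobject("oneShot", "Fun", 1)
loT2 = Createobject("oneShot", "Gun", 2)
loT3 = Createobject("oneShot", "Iun", 3)
Read Events   && VFP dispatches the timer events (and user events) between the calls
? "Done"
Return

Define Class oneShot As Timer
    cFunc = ""   && name of the function this timer calls
    Procedure Init(tcFunc, tnDelay)
        This.cFunc = tcFunc
        This.Interval = tnDelay   && delay in milliseconds
        This.Enabled = .T.
    Endproc
    Procedure Timer
        This.Enabled = .F.              && fire only once
        = Evaluate(This.cFunc + "()")   && one short visit to the call stack
    Endproc
Enddefine

Procedure Fun
    ? "Fun", Seconds()
Endproc

Procedure Gun
    ? "Gun", Seconds()
Endproc

Procedure Iun
    ? "Iun", Seconds()
    Clear Events   && the last function ends READ EVENTS
Endproc
```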

  

          Reading a file using a collection of timers 

  

In the preceding section we saw that we iterate over two collections: 

·        Timers and their events (each timer may fire multiple events), 

·        The file's chunks. 

There is a one-to-one correspondence between timer events and chunk reads. 

We are going to write a program that: 

·        Creates a file named test.txt of about 175 MB with generated text content (see, W.I.P.), 

·        Creates a cursor to store the results, 

·        Spawns a collection of 254 deferred objects (deferedARF), each with its own timer (mt), 

·        Creates a file object (rf) that maintains the reading state. It has a public interface: a function nextChunk and get-functions, 

·        Eventually stops the iteration (stops reading the file). 

We can look at the timer events as a sequence we iterate over. At each »iteration« the function nextChunk is called: it reads a chunk, processes its lines and eventually inserts data in a cursor. The file object (ofile) is manipulated by 254 timers in an ordered way: they access the ofile object one after the other (FIFO). 

  

          Asynchronous read of a file 

  

Clear
Set Escape On
Close Databases All
Local lni, lcs, lcs2
Private loCol, mt, notimers, tcFileName, ttt, buffSize
tcFileName = 'test.txt'
* Generate a text file of about 175 MB: each block lcs2 has 16,680,000 bytes,
* it is written once and then appended 10 times (11 blocks, 183,480,000 bytes)
lcs = Replicate("Coco jambo",80) + Chr(13) + Replicate("tatu rata",50) + Chr(13) + "tatu rata coco " + Replicate("papa ",80) + Chr(13)
lcs2 = Replicate(lcs,10000)
Strtofile(lcs2, m.tcFileName)
For lni = 1 To 10                        && use a smaller count (e.g. 2) for a quick test
    Strtofile(lcs2, m.tcFileName, 1)     && 1 = append to the file
Next
loCol = Createobject("Collection")  && a collection of deferred objects (timers)
notimers = 254    && number of timers (8, 16 or 128 also work)
buffSize = 64000  && chunk size in bytes for FREAD
mt = 0            && counter of created timers
ttt = Seconds()
OFORM = Createobject("mFORM")
OFORM.Show()
Read Events       && start the VFP event loop; CLEAR EVENTS ends it
Return

In the first part of the program we have: 

·        Some private variables, 

·        A group of commands that creates the text file test.txt, 

·        A collection object; later we populate it with deferred objects (and timers), 

·        A form with two command buttons to start and stop the asynchronous file read. 

  

From the file class rf we create an object with a read-only interface: 

Define Class rf As Custom
    Protected f_h, tcSearchText, curName, Lineno, endOfFile, fthen, Thisf
    Thisf = Null        && optional object that owns the "then" function
    fthen = ""          && optional name of the "then" function
    f_h = 0             && file handle, set in Init
    tcSearchText = ""   && text to search for (upper case)
    curName = ""        && cursor that receives the results
    fileSize = 0
    charReaded = 0      && number of characters processed so far
    Lineno = 0          && iteration index (current line number)
    endOfFile = .F.     && .T. when the whole file has been read (resolved)

    Procedure Init(f_h, tcSearchText, curName, Thisf, fthen)
        This.f_h = f_h
        This.fileSize = Fseek(This.f_h, 0, 2)  && move the pointer to EOF
        Fseek(This.f_h, 0)                     && move the pointer back to BOF
        If This.fileSize <= 0                  && is the file empty?
            = Fclose(This.f_h)                 && close the file
            ? "Error in file-open."
            Throw "Error in file-open. Empty file."
        Endif
        This.tcSearchText = tcSearchText
        This.curName = curName
        If Vartype(fthen) = "C"
            This.fthen = fthen
        Endif
        If Vartype(Thisf) = "O"
            This.Thisf = Thisf
        Endif
    Endproc

    Procedure getEOF
        Return This.endOfFile
    Endproc

    Procedure getF_h
        Return This.f_h
    Endproc

    Function setThen(F)
        If This.endOfFile   && don't allow changes after the file has been read
            Return
        Endif
        This.fthen = F
    Endfunc

    Function getThen()
        Return This.fthen   && fthen is a string property, not a method
    Endfunc

    * Iterates over the chunks of the file; called from a timer event (via BINDEVENT).
    * Each call processes one chunk and then yields to the event loop.
    Procedure nextChunk   && read the current chunk
        If This.endOfFile
            Return
        Endif
        Activate Screen
        Local al, rl, gcString, lenGc, Back, i, fs
        Local aMyArray[1]
        gcString = Fread(This.f_h, buffSize)  && read a chunk of the file
        lenGc = Len(gcString)
        al = Alines(aMyArray, gcString)       && split the string into an array of lines
        If !Feof(This.f_h)
            rl = al - 1   && not end of file: skip the last, usually partial, line
        Else
            rl = al       && end of file: include the last line too
        Endif
        For i = 1 To rl
            This.Lineno = This.Lineno + 1
            m.content = aMyArray[i]
            If This.tcSearchText $ Upper(m.content)
                m.lineno = This.Lineno
                Insert Into (This.curName) From Memvar  && uses m.lineno and m.content
            Endif
        Endfor
        If !Feof(This.f_h)   && not end of file
            Back = Len(aMyArray[rl+1])   && length of the dropped last "line"
            This.charReaded = This.charReaded + buffSize - Back   && update the state
            fs = Fseek(This.f_h, This.charReaded)  && move the pointer to the start of the dropped line
        Else
            This.charReaded = This.charReaded + lenGc   && update the state
        Endif
        If Feof(This.f_h) And Not This.endOfFile   && end of file reached
            This.endOfFile = .T.   && update the state
            = Fclose(This.f_h)     && close the file
            Activate Screen
            ? "End loop:", Seconds() - ttt, Recno()
            If Len(Trim(This.fthen)) > 0 And Vartype(This.Thisf) <> "X"  && optional then function
                Select (This.curName)   && the grid's RecordSource is the current alias
                Go Top
                Try
                    With This.Thisf   && fthen uses .grid1, which refers to this object
                        = Evaluate(This.fthen + "()")
                    Endwith
                Catch To ex
                    ? ex.Message
                Endtry
            Endif
        Endif
        DoEvents   && pause and pass control to the VFP event loop
    Endproc
Enddefine

This class has the state variables Lineno, charReaded, fileSize and endOfFile. The first is the iteration index, the second records the number of characters read, the third is a constant (the file size), and the fourth records the end of file (and terminates the iteration). 

The initial values are a file handle (f_h), a string to search for (tcSearchText) and the name of the cursor (curName) in which we insert the results. The second pair of initial values (Thisf and fthen) is optional: fthen is the name of the function to execute when the file has been read, and Thisf is the object of which this function is a member.

The procedure nextChunk is called once per timer event, and each call processes one chunk, until the end of file is reached (endOfFile=.T.). 

The command Fread(This.f_h,buffSize) reads a chunk of bytes (64,000 bytes) and puts the content in the variable gcString. The command Alines(aMyArray, gcString) splits gcString into the array aMyArray. Each array element holds a complete line, except the last one, which is usually only part of a line. We count the characters in this element, Back=Len(aMyArray[rl+1]), and subtract them from the number of read characters, This.charReaded=This.charReaded+buffSize-Back. In other words, we ignore the characters of the last, partial line; they are read again with the next chunk. The exception is the last chunk (end of file).

We iterate over the array elements. If tcSearchText occurs in a line (content), the line number and the content are inserted into the cursor. Then we move the file pointer back to the beginning of the dropped partial line.

If the end of file is reached (Feof(This.f_h)), the timer call that detects it first closes the file and prints the elapsed time ("End loop:"). Then we check whether an fthen function is present and, if so, we evaluate it.

After that DoEvents is called: it suspends the execution of the current function and passes control to the VFP event loop. When control returns, execution resumes immediately after this command.  

The other public functions of this class are accessors (get-functions). 
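The bookkeeping of the partial last line can be checked on a tiny string. The helper completeLines below is hypothetical (it is not part of the article's classes); it isolates the Back arithmetic of nextChunk: it returns the number of complete lines in a chunk and, through the by-reference parameter, the number of characters that must be read again.

```foxpro
* Hypothetical helper that isolates the "Back" arithmetic of nextChunk
Local lcChunk, lnBack, lnComplete
lcChunk = "aaa" + Chr(13) + "bbbb" + Chr(13) + "cc"  && 11 characters, "cc" is a partial line
lnBack = 0
lnComplete = completeLines(lcChunk, .F., @lnBack)
? lnComplete              && 2 complete lines: "aaa" and "bbbb"
? lnBack                  && 2: the length of the partial line "cc"
? Len(lcChunk) - lnBack   && 9: the 0-based offset where the next chunk must start
Return

Function completeLines(tcChunk, tlEOF, tnBack)
    Local laLines[1], lnLines
    lnLines = Alines(laLines, tcChunk)   && split on line breaks
    If tlEOF
        tnBack = 0                       && last chunk: every line counts
        Return lnLines
    Endif
    tnBack = Len(laLines[lnLines])       && characters of the partial last line
    Return lnLines - 1
Endfunc
```

With a chunk that starts at file offset charReaded, the next Fseek( ) target is charReaded + Len(chunk) - Back, which is what nextChunk computes with buffSize in place of Len(chunk).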

  

The timer class is an ordinary timer with a very small interval of 1 ms. Its specific feature is the mrun method, in which we can insert a filter. 

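Define Class mt As Timer
    Interval = 1
    oparent = Null   && the deferedARF object that owns this timer
    Procedure Init(oparent)
        This.oparent = oparent
        Activate Screen
        This.Enabled = .T.
        mt = mt + 1   && private counter of created timers
        ? "START", mt
    Endproc
    Procedure mrun(x)   && insert code for an eventual filter of events here
        If This.oparent.getOfileEOF()
            This.Enabled = .F.
            This.Interval = 0   && stop the timer
        Endif
        Return x
    Endproc
    Procedure Timer
        This.Enabled = .F.
        This.mrun()   && nextChunk is bound to mrun and runs after it
        If This.Interval > 0   && don't restart a timer that mrun (or Stop) has stopped
            This.Enabled = .T.
        Endif
    Endproc
Enddefine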

If the iteration reaches the end of the file (This.oparent.getOfileEOF()), we stop the timer (Interval=0). 

Note. We could pass other filter conditions to the function mrun, in the form of a function. 

When the timer event happens, the timer calls the function mrun. Later we bind this function to the nextChunk function of the file object. 

We run the timers with the smallest possible interval, 1 ms. We can »split« this interval into smaller pieces with the following technique: we push multiple timer event handlers with an interval of 1 ms onto the event loop. For example, we start a first timer T1 (for example with the command T1.Enabled=.T.), and then we start a second timer T2 in the same way. The time between the two initializations is very small (measured in nanoseconds). After 1 ms the T1 timer fires its event, and its event handler (function) is pushed onto the event loop. Immediately after that the timer T2 fires, and its event handler (callback function) is pushed onto the event loop. VFP executes the event handlers in the event loop serially: when it is not busy, it executes the T1 callback, then any other functions in the event loop, and then the T2 callback. The interval between the execution of the two event handlers may be very small (measured in nanoseconds), so two events can fire within one millisecond. With sixty-four (64) timers, sixty-four events may fire in one millisecond. 

Pushing functions onto the event loop and executing them are relatively cheap operations. 

As you see, this is not a strictly imperative style of programming: using timers and DoEvents we only partially »dictate« the order of execution, giving the VFP engine the possibility to insert other event handlers into the event loop, in »parallel« with our timer events. 
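Whether many 1 ms timers really deliver more events per unit of time can be measured on a given machine. The sketch below is a hypothetical experiment, not part of the article's program: the class countTimer, the public counter gnTicks and the count of 64 timers are assumptions, and it assumes, as the article's nextChunk does, that DoEvents lets VFP dispatch the pending timer events. It prints a machine-dependent number, so no expected value is given.

```foxpro
* Hypothetical measurement: how many Timer events do 64 one-millisecond timers deliver?
Public gnTicks
Local loTimers, lnI, lnStart
gnTicks = 0
loTimers = Createobject("Collection")
For lnI = 1 To 64
    loTimers.Add(Createobject("countTimer"))   && each timer starts enabled
Endfor
lnStart = Seconds()
Do While Seconds() - lnStart < 1
    DoEvents   && let VFP process the pending timer events
Enddo
For lnI = 1 To loTimers.Count
    loTimers.Item(lnI).Enabled = .F.   && stop all timers before reporting
Endfor
? "Timer events in about one second:", gnTicks
Return

Define Class countTimer As Timer
    Interval = 1   && the smallest interval, as in the article
    Procedure Timer
        gnTicks = gnTicks + 1
    Endproc
Enddefine
```

Running it with 1, 8 and 64 timers gives a rough idea of how far the interval can be »split« on that machine.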

  

* This class composes a file object and a timer; we spawn a collection of "deferred" objects
Define Class deferedARF As Custom
    Protected resolved, ofile
    mt = Null        && the timer of this deferred object
    resolved = .F.
    ofile = Null     && the shared file object (rf)
    Procedure Init(ofile, tickms)   && tickms: timer interval in ms
        This.ofile = ofile
        This.mt = Createobject("mt", This)
        This.mt.Interval = tickms
        * 1 = call the delegate (ofile.nextChunk) after the mrun code
        Bindevent(This.mt, "mrun", This.ofile, "nextChunk", 1)
    Endproc
    Procedure getOfileEOF
        Return This.ofile.getEOF()
    Endproc
    Function getOfileF_h()
        Return This.ofile.getF_h()
    Endfunc
    Function getResolved()
        Return This.resolved
    Endfunc
Enddefine

To an object of type deferedARF we pass the file state object (rf); the object creates its own timer (mt). We bind the two objects with BINDEVENT: when a timer event fires, the timer calls mrun, and after that nextChunk is executed. With this construction, each timer event calls nextChunk, so we iterate over the groups of the file's lines (using the timers, we iterate over the file). 

The role of this object is also to stop its timer when the end of the file (test.txt) is reached. This filter is inside the function mrun. 

The mrun function is not called when we initialize the object, but later (in certain cases never), when the timer event happens; hence the name deferred. When mrun fires, nothing useful happens by itself. When the bound function nextChunk executes, we may obtain a useful »result« (a record in a cursor). In case of errors we have no meaningful result (this article does not consider this case). The record eventually inserted in the cursor curName can be seen as a future or promise value. 
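The effect of the last BINDEVENT( ) argument can be seen in isolation. In this sketch the classes srcObj and dstObj are hypothetical stand-ins for the timer and the file object; with nFlags = 1, VFP calls the delegate method after the code of the bound method, which is why mrun can disable the timer before nextChunk runs.

```foxpro
* Sketch of the BINDEVENT flag 1 used in deferedARF (class names are hypothetical)
Local loSrc, loDst
loSrc = Createobject("srcObj")
loDst = Createobject("dstObj")
Bindevent(loSrc, "mrun", loDst, "nextChunk", 1)   && 1 = run the delegate after mrun
loSrc.mrun()     && runs mrun; the bound nextChunk runs after it
Unbindevents(loSrc, "mrun", loDst, "nextChunk")
loSrc.mrun()     && the binding is removed: only mrun runs
Return

Define Class srcObj As Custom
    Procedure mrun
        ? "mrun"
    Endproc
Enddefine

Define Class dstObj As Custom
    Procedure nextChunk
        ? "nextChunk"
    Endproc
Enddefine
```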

  

* all the complexity is inside this function
Function readAsync(filename, tcSearchText, curName, notimers, oformR, fthen)
    Local f_h, ofile, i, tickms
    tickms = 1   && timer interval: 1 ms
    Activate Screen
    ttt = Seconds()
    Select (curName)
    Delete All
    f_h = Fopen(filename, 10)   && 10 = read-only, unbuffered
    If f_h < 0
        Throw "Cannot open file " + filename
    Endif
    * rf.Init throws if the file is empty; the caller handles the error
    ofile = Createobject("rf", f_h, tcSearchText, curName, oformR, fthen)
    * remove the deferred objects (timers) left over from a previous run
    For i = loCol.Count To 1 Step -1
        loCol.Remove(i)
    Endfor
    *** Start (add) a collection of deferred objects (timers)
    For i = 1 To notimers   && spawn the timers
        loCol.Add(Createobject("deferedARF", ofile, tickms))
        loCol(i).Name = "T" + Ltrim(Str(i))
    Endfor
    Return ofile
Endfunc

The function readAsync is the interface function for reading a file asynchronously and processing its content. It creates a file object, ofile, and passes it the parameters: a file handle (a reference to the file), a text to search for and a cursor name. There is also a pair of optional parameters, oformR and fthen: fthen is the name of a function that is executed after the file has been read, and it is evaluated in the context of the object oformR.

Note. We can enrich this construction and pass the function, besides the function to execute when the iteration terminates (a then function or promise function), also a function to catch errors, for example: 

readAsync(filename,tcSearchText,curName,notimers,oformR,fthen,ferror). This variant (ferror) is not presented in this article. 

The parameter notimers is the number of timers we spawn to iterate over the file. I obtained good performance with 4-128 timers (it seems the number of timers doesn't influence the reading speed). The example in this article uses a relatively large collection of timers (254). 

In the last part of this function, we add the timers to the loCol object created in the first part of the program. 
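A caller that does not pass a then function can still observe the result through the returned file object. The sketch below assumes it runs inside the article's program (so that loCol, tcFileName and notimers exist); lcCur and loReader are illustrative names. readAsync returns immediately, and the timers fill the cursor in the background.

```foxpro
* Usage sketch: start an asynchronous read without a then function, poll it later
* Assumes the article's main program has created loCol, tcFileName and notimers
Local lcCur, loReader
lcCur = Sys(2015)                          && a unique cursor name
Create Cursor (lcCur) (Lineno i, content m)
loReader = readAsync(tcFileName, "COCO", lcCur, notimers)
* ... later, for example from another button's Click:
? loReader.getEOF(), Reccount(lcCur)       && finished yet? how many matches so far?
```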

  

Define Class mform As Form
    Add Object cmdStart As CommandButton With Left=10, Caption="Start"
    Add Object cmdStop As CommandButton With Left=200, Caption="Stop"

    Procedure cmdStart.Click
        Local ofr, curName, oformR
        curName = Sys(2015)   && a unique cursor name
        Create Cursor (curName) (Lineno i, content m)
        oformR = Thisform
        Try
            * read a file, filter using "COCO", put results in curName and use notimers timers
            ofr = readAsync(tcFileName, "COCO", curName, notimers, oformR, "fthen")
            *ofr = readAsync(tcFileName, "COCO", curName, notimers)  && without a then function
        Catch To ex
            Activate Screen
            ? ex.Message, ex.UserValue   && a poor error management!!! better pass a function
        Endtry
        Return ofr
    Endproc

    Procedure cmdStop.Click
        Local i
        If loCol.Count = 0   && nothing has been started yet
            Return
        Endif
        Activate Screen
        ? "stopped", Recno(), loCol.Count, loCol(1).getOfileF_h()
        For i = 1 To loCol.Count   && stop the timers
            loCol(i).mt.Interval = 0
        Endfor
        If loCol(1).getOfileF_h() > 0
            = Fclose(loCol(1).getOfileF_h())
        Endif
    Endproc

    Procedure Destroy
        Clear Events   && end READ EVENTS in the main program when the form closes
    Endproc
Enddefine

The mform class creates a form object with two buttons: one starts reading the file asynchronously, the other stops this iteration. 

 

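Function fthen()
    * Evaluated inside "With This.Thisf" in rf.nextChunk, so ".grid1" refers to the form
    Activate Screen
    If Vartype(.grid1) = "U"
        .AddObject("grid1", "GRID")
        .grid1.Move(2, 20)
    Endif
    .grid1.RecordSource = Alias()   && the result cursor is the current alias
    .grid1.Visible = .T.
Endfunc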

If present, this function is executed when reading the file terminates. It adds a grid to the form and binds the grid's RecordSource to the result cursor. The »value« RecordSource plays the role of a promise. This function is optional: we can read the file without this »promise« function.

  

          Tests 

  

When we run the program presented at the beginning of this article, reading takes 11-12 seconds. Reading the same file with the usual synchronous code takes 400-500 seconds (about 40 times slower), or 18 seconds (about 1.5 times slower) if the file is opened in buffered mode (FOPEN(FileName)). These measurements have limited importance, since our main goal was to unblock VFP. Still, they show that the asynchronous execution unblocks the VFP application and significantly reduces the execution time.

Note. The faster execution is also a consequence of the larger buffer size (64,000 characters).
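The article does not list the synchronous code behind these timings. A baseline of that kind might look like the following sketch; the cursor name curSync is an assumption, and this is not the program the author measured. It blocks VFP until the loop ends. Replacing Fopen("test.txt") with Fopen("test.txt", 10) gives the unbuffered mode that readAsync uses.

```foxpro
* Synchronous baseline sketch: blocks VFP until the whole file has been read
Local lnH, lcLine, lnLine, lnStart
lnStart = Seconds()
Create Cursor curSync (Lineno i, content m)
lnH = Fopen("test.txt")          && 0 = read-only, buffered (default)
If lnH < 0
    ? "Cannot open test.txt"
    Return
Endif
lnLine = 0
Do While !Feof(lnH)
    lcLine = Fgets(lnH, 8192)    && one line, up to 8192 bytes (the FGETS maximum)
    lnLine = lnLine + 1
    If "COCO" $ Upper(lcLine)
        Insert Into curSync (Lineno, content) Values (m.lnLine, m.lcLine)
    Endif
Enddo
= Fclose(lnH)
? "Synchronous read:", Seconds() - lnStart, "seconds,", Reccount("curSync"), "matches"
```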

  

          Conclusion 

  

The asynchronous version doesn't block VFP; it stays responsive during the twelve (12) second interval: in »parallel« it reads a large file and executes other user actions. 

The asynchronous read of a file has these characteristics: 

·        The function that starts the asynchronous execution is simple (a simple interface): 

readAsync(tcFileName, "COCO", curName, notimers). 

·        We use this function in a similar way as the synchronous version. 

·        There are no multithreading problems and no DLL registration. 

·        The asynchronous version may execute faster than the synchronous one.  

VFP is event driven; its event loop executes in an asynchronous way (by default). During a synchronous read of a file VFP blocks because of badly implemented programs. VFP programmers can write programs that don't freeze a VFP application for ten seconds or more, as the example in this article demonstrates.