@title;noquote@ @context;noquote@

@title@

Notes in preparation for adding IMAP to legacy bounce MailDir paradigm

New procs

For IMAP, the start of each process should not assume that a connection exists or doesn't exist. Check the connection using 'imap ping' before login. This should help recover from connection drop-outs due to intermittent or one-time connection issues.

Each scheduled event should quit in time for the next process, so that the IMAP info being processed is always nearly up-to-date. This matters in case a separate manual IMAP process is working in tandem and changing circumstances. Quitting in time is equally important because IMAP references emails by relative sequence. Two concurrent connections would likely have different and overlapping references. The overlapping references would likely cause issues, since each connection would process the duplicates as if they were not duplicates.

Variables useful while exploring new processes like forecasting and scheduling:

scan_in_active_p
(Don't use. See si_active_cs.) Answers the question: is a proc currently scanning replies?
si_active_cs
(Don't use. See si_actives_list.) The [clock seconds] of the most recently started cycle. If a cycle's poll doesn't match, it should not process any more email.
si_actives_list
A list of start clock seconds of active imap_checking_incoming procs
scan_incoming_configured_p
Is set to 0 if there is an error trying to connect. Otherwise is set to 1 by acs_mail_lite::imap_check_incoming
replies_est_next_start
Approx value of [clock seconds] next scan is expected to begin
duration_ms_list
Tracks the duration in ms of processing each email in the most recent process, appended as a list. When a new process starts processing email, the list is reset to include only the last 100 emails. That way, there are always rolling statistics for forecasting process times.
scan_in_est_dur_per_cycle_s
Estimate of duration of current cycle
scan_in_est_quit_cs
When the current cycle should quit based on [clock seconds]
scan_in_start_cs
When the current cycle started scanning based on [clock seconds]
cycle_start_cs
When the current cycle started (pre IMAP authorization etc) based on [clock seconds]
cycle_est_next_start_cs
When the next cycle is to start (pre IMAP authorization etc) based on [clock seconds]
parameter_val_changed_p
If related parameters change, performance tuning is underway. Reset statistics.
scan_in_est_dur_per_cycle_s_override
If this value is set, use it instead of the scan_in_est_dur_per_cycle_s
accumulative_delay_cycles
Number of cycles that have been skipped entirely due to an ongoing process.

Check scan_incoming_active_p when running a new cycle. Also set replies_est_next_start to clock seconds for use with time calculations later in the cycle. If a cycle is already running, wait a second and check again, until 90% of the duration has elapsed. If it is still running, log a message and quit in time for the next event.
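The wait-and-recheck step can be sketched in plain Tcl. This is a minimal illustration, not the package's code: wait_for_prior_cycle, its arguments, and the idea of passing the "is a cycle active" check as a command are all assumptions made so the sketch runs outside OpenACS.

```tcl
# Hypothetical helper: poll until the prior cycle finishes, or until 90%
# of the cycle duration has elapsed.
# is_active_cmd: a command that returns 1 while a prior cycle is running.
# Returns 1 if the new cycle may proceed, 0 if it should log and quit.
proc wait_for_prior_cycle { is_active_cmd cycle_duration_s {poll_ms 1000} } {
    set start_ms [clock milliseconds]
    set deadline_ms [expr { $start_ms + int($cycle_duration_s * 1000 * 0.9) }]
    while { [uplevel #0 $is_active_cmd] } {
        if { [clock milliseconds] >= $deadline_ms } {
            # Prior cycle is still running: the caller logs a message and quits.
            return 0
        }
        # Wait before checking again (one second by default).
        after $poll_ms
    }
    return 1
}
```

A caller would pass whatever check the system uses for an active scan, and skip the cycle when the result is 0.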

Each scheduled procedure should also use as much time as it needs up to the cut-off at the next scheduled event. Ideally, it needs to forecast if it is going to go overtime with processing of the next email, and quit just before it does.

Use duration_ms_list to determine a time adjustment for quitting before the next cycle: scan_in_est_dur_per_cycle_s + scan_replies_start_time = scan_in_est_quit_cs
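One way to use duration_ms_list for this forecast is sketched below. The proc names are hypothetical, and using the mean of the rolling window as the estimate for the next email is an assumption; the notes do not say which statistic to use.

```tcl
# Keep only the most recent max_n durations (the rolling window).
proc trim_durations { duration_ms_list {max_n 100} } {
    return [lrange $duration_ms_list end-[expr { $max_n - 1 }] end]
}

# Mean duration per email in ms; 0.0 when there is no history yet.
proc mean_duration_ms { duration_ms_list } {
    set n [llength $duration_ms_list]
    if { $n == 0 } {
        return 0.0
    }
    set total 0
    foreach d $duration_ms_list {
        set total [expr { $total + $d }]
    }
    return [expr { double($total) / $n }]
}

# Returns 1 if processing one more email is forecast to run past
# scan_in_est_quit_cs, otherwise 0.
proc should_quit_p { now_cs scan_in_est_quit_cs duration_ms_list } {
    set next_s [expr { [mean_duration_ms $duration_ms_list] / 1000.0 }]
    return [expr { $now_cs + $next_s > $scan_in_est_quit_cs }]
}
```

A mean is easily skewed by one slow email, so a real implementation might prefer a higher percentile of the window.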

And yet, predicting the duration of a future process is difficult. What if the email is 10MB and needs to be parsed, whereas all prior emails were less than 10kB? What if one of the callbacks converts a PDF into a PNG, annotates it for a web view, and takes a few minutes? What if each of the next 5 emails has callbacks that take 5 to 15 minutes to process while waiting on an external service?

The process needs to be split into at least two to handle all cases.

The first process collects incoming email and puts it into a system standard format with a minimal amount of effort, sufficient for use by callbacks. The goal of this process is to keep up with incoming email, so that all mail is available to the system at the earliest possible moment.

The second process should render a prioritized queue of imported email that has not been processed. It first prioritizes new entries, perhaps re-prioritizing any callbacks that error, or sampling and re-introducing prior errant callbacks etc., then continues to process the stack.

Using this paradigm, parallel processes could be invoked for the queue without significantly changing the paradigm.

To reduce overhead on low volume systems, these processes should be scheduled to minimize concurrent operation.

Priorities should offer 3 levels of performance. Colors designate priority to discern from other email priority schemes:

Priority is calculated based on timing and file size

 
set range [expr { $priority_max - $priority_min }]
set deviation_max [expr { $range / 2. }]
set midpoint [expr { $priority_min + $deviation_max }]
# received_cs: clock seconds of the email's received datetime
set time_priority [expr { $deviation_max * ( $received_cs - $scan_in_start_cs ) / ( 2. * $scan_in_est_dur_per_cycle_s ) }]

# size_chars: size of email in characters; max_file_upload_mb is from config.tcl
set size_priority [expr { $deviation_max * ( ( $size_chars / ( $max_file_upload_mb * 1000000. ) ) - 0.5 ) }]

set priority [expr { int( $midpoint + ( $time_priority + $size_priority ) / 2. ) }]

Average of time and file size priorities.
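A worked instance of the calculation, wrapped in a proc so it can be tried in tclsh. The proc name email_priority, the argument names received_cs and size_chars, and all the numbers in the example are illustrative assumptions, not values taken from the package.

```tcl
# Hypothetical wrapper around the priority calculation described above.
proc email_priority {
    priority_min priority_max
    received_cs scan_in_start_cs scan_in_est_dur_per_cycle_s
    size_chars max_file_upload_mb
} {
    set range [expr { $priority_max - $priority_min }]
    set deviation_max [expr { $range / 2.0 }]
    set midpoint [expr { $priority_min + $deviation_max }]
    # Timing component, relative to when the scan started.
    set time_priority [expr { $deviation_max * ($received_cs - $scan_in_start_cs)
                              / (2.0 * $scan_in_est_dur_per_cycle_s) }]
    # Size component, centered on half of the maximum upload size.
    set size_priority [expr { $deviation_max
                              * (($size_chars / ($max_file_upload_mb * 1000000.0)) - 0.5) }]
    # Average of the two components, offset from the midpoint.
    return [expr { int($midpoint + ($time_priority + $size_priority) / 2.0) }]
}

# Example: priorities from 0 to 1000, an email received 30 s before the scan
# started, a 120 s cycle estimate, a 2,500,000 character email, 10 MB limit.
# time_priority = 500 * (-30) / 240 = -62.5
# size_priority = 500 * (0.25 - 0.5) = -125
# priority      = int(500 + (-187.5) / 2) = 406
puts [email_priority 0 1000 970 1000 120 2500000 10]
```

With these inputs the email lands below the midpoint of 500, since it is both earlier than the scan start and smaller than half the size limit.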

hpri_package_ids, lpri_package_ids, hpri_party_ids, lpri_party_ids, mpri_min, mpri_max, hpri_subject_glob and lpri_subject_glob are defined in acs_mail_lite_ui, so they can be tuned without restarting the server. P.S. Code should check if the user is banned before parsing any further.

A proc should be available to recalculate existing email priorities. This means more info needs to be added to the table acs_mail_lite_from_external (including size_chars).

Import Cycle

This scheduling should be simple. Maybe check if a new process wants to take over. If so, quit.

Prioritized stack processing cycle

If the next cycle starts and the current cycle is still running, set scan_in_est_dur_per_cycle_s_override to the actual wait time the current cycle has to wait, including any prior cycle wait time, if the delays exceed one cycle (accumulative_delay_cycles).

From acs-tcl/tcl/test/ad-proc-test-procs.tcl
    # This example gets the list of procs defined for a callback (so they could be triggered one by one).
    # Implementations themselves are defined under ::callback::a_callback::impl::*
    ad_proc -callback a_callback { -arg1 arg2 } { this is a test callback } -
    set callback_procs [info commands ::callback::a_callback::*]

Each subsequent cycle moves toward renormalization by adjusting scan_in_est_dur_per_cycle_s_override toward the value of scan_in_est_dur_per_cycle_s by one replies_est_dur_per_cycle, with a minimum of scan_in_est_dur_per_cycle_s. Changes are exponential to quickly adjust to changing dynamics.

For acs_mail_lite::scan_in,

Keep track of email flags while processing.
Mark /read when reading.
Mark /replied if replying.

When quitting the current scheduled event, don't log out if all processes are not done. Also, don't log out if imaptimeout is greater than the duration to cycle_est_next_start_cs. Stay logged in for the next cycle.

Delete processed messages when done with a cycle? No. What if a message is used by a callback with a delay in processing? Move processed emails to a designated folder, named by the ProcessFolderName parameter. The designated folder may be Trash. If the parameter is empty, the default is the hostname of ad_url, ie: [util::split_location [ad_url] protoVar ProcessFolderName portVar] If the folder does not exist, create it. ProcessFolderName only needs to be checked if the name has changed.
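The fallback from an empty parameter to the site's hostname can be sketched in plain Tcl. process_folder_name is a hypothetical helper, and the regexp is a stand-in for util::split_location so that the sketch runs outside OpenACS; inside OpenACS the call quoted above would be the natural choice.

```tcl
# Hypothetical helper: return the folder name from the parameter value,
# or the hostname of the site URL when the parameter is empty.
proc process_folder_name { param_value site_url } {
    set name [string trim $param_value]
    if { $name ne "" } {
        return $name
    }
    # Extract the host from proto://host[:port][/path]
    if { [regexp {^[a-zA-Z][a-zA-Z0-9+.-]*://([^/:]+)} $site_url -> host] } {
        return $host
    }
    error "cannot determine hostname from '$site_url'"
}

puts [process_folder_name "" "https://mail.example.org:8443/"]
puts [process_folder_name "Trash" "https://mail.example.org:8443/"]
```

The first call prints mail.example.org and the second prints Trash.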

MailDir marks email as 'read' by moving from '/new' dir to '/cur' directory. ACS Mail Lite implementations should be consistent as much as possible, and so mark emails in IMAP as 'read' also.

Email attachments

Since messages are not immediately deleted, create a table of attachment url references. Remove attachments older than AttachmentLife parameter seconds. Set the default to 30 days (2592000 seconds). Unless ProcessFolderName is Trash, email attachments can be recovered from the original email in ProcessFolderName. No. Once callbacks are processed, assume any transfer of attachments has occurred, so that processed email can be purged.
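The expiry rule can be sketched as a filter over attachment references. The flat list of url and creation-time pairs is an assumed stand-in for the table of attachment url references, and expired_attachments is a hypothetical name.

```tcl
# Hypothetical helper: return the urls of attachments older than
# attachment_life_s seconds (default 30 days, per AttachmentLife).
# attachments: flat list of url / created_cs pairs, where created_cs is a
# [clock seconds] value.
proc expired_attachments { attachments now_cs {attachment_life_s 2592000} } {
    set expired [list]
    foreach {url created_cs} $attachments {
        if { $now_cs - $created_cs > $attachment_life_s } {
            lappend expired $url
        }
    }
    return $expired
}
```

A scheduled proc could call this with [clock seconds] and delete whatever it returns.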