databases - psyphi.net blog

Using historical data to predict future conditions

This article, written by Gwyn Williams G4FKH and published in RadCom December 2011 is © RSGB 2011, reproduced with permission.

BRIEF HISTORY

The RSGB 5MHz Experiment started toward the end of 2002, when the 5MHz Working Group was formed under the chairmanship of John Gould, G3WKL. The current NoVs for access to 5MHz expire on 30 June 2015, but it is possible that Ofcom and MoD will agree to a further extension. The 5MHz Experiment could then run well into the current solar cycle. On the website [1] are various ways in which information can be retrieved from the 5MHz database. In fact there are three separate parts to the database. The first two are concerned with the logging of the three propagation beacons that have been established: the first for manual logging and the second for automated logging. The third part is a compilation of propagation data for the same period. It suddenly occurred to me that the database could be used to forecast propagation.

BASIC CONCEPT

This article and the subsequent web application discuss the automatically logged data and the propagation data. The three beacons in questions are GB3RAL, GB3WES and GB3ORK situated in Maidenhead Locator squares, IO91in, IO84qn and IO89ja respectively. They transmit on a frequency of 5.290MHz every quarter of an hour for one minute each, starting on the hour. Each sequence therefore lasts for three minutes. The logging software was designed by Peter Martinez, G3PLX and is available on the web via the experiment’s main page [1]. To date, the automatic logging part of the database has accumulated over 1.3 million lines of data; the propagation part about one third of this. To perform the forecasts it is necessary to interrogate both of these datasets.
It will be appreciated that there are many stations contributing to the database and therefore there are an equal number of station setups! In order to explain the potential pitfall, I will explain my setup. I have a dipole aerial with a radio and PC. The output from the radio couples to the PC’s soundcard and the program manipulates data from the soundcard. My radio does not provide sufficient output to drive the soundcard, so I have installed an inline amplifier. I can, therefore, adjust my own recorded background/signal strength. Not taking care when analysing the data will cause these differences to become exaggerated. The G3PLX program compensates somewhat for this internally and displays the background signal level and the SNR of the station being monitored. This internal rationalisation and the forecasting program’s own scheme mitigate these differences.

I write small computer programs in the Perl language; during a fairly recent local club meeting I discussed this idea with Roger Pettett, G7TKI, a professional systems developer. My personal experience is not up to his standard and, as Roger kindly volunteered to develop the program in his own time, I quickly agreed. Roger is not au fait with propagation so the collaboration is a good one.

5MHz PROPAGATION

Before I go on to discuss the developed program and its usage, it seems expedient to first outline propagation at this frequency. Most of the propagation at this QRG is via NVIS (near vertical incidence skywave) communication. The E-layer (and probably, during some periods, the F-layer) do influence the longer distance circuits. NVIS for most practical purposes has a range of about 400km; therefore, if a station is set up in the middle of the UK, NVIS would be the dominant mode for the whole of the country. However, we do not live in an idealised world so a method is needed to perform forecasts between any two parts of the UK. To facilitate this database locators are used to correlate distances. The R₁₂ (twelve-month smoothed relative sunspot number) and the A_P (planetary A-index) are used to represent propagation conditions. R₁₂ is automatically retrieved from the internet, but it is necessary to input the A_P [2] that represents the propagation conditions under consideration. The best results within the UK will be achieved when using a 1â„2Î» dipole; the program suggests some aerial dimensions.
The ionosphere is fed by ionisation from the sun and this has a direct correlation with sunspot numbers. The ionosphere requires X amount of ionisation to allow for NVIS communication. For simplicity we can view the A_P as a measure of disturbance in the ionosphere. The higher the A_P, the worse the propagation will become.

Instead of using propagation prediction engines, eg VOACAP, we interrogate the databases to find parity (within parameters) to those conditions input by the user. A more complete introduction to NVIS communications can be found in [3].

5MHz FORECASTING PROGRAM

The completed program can be found at [4] and should be straightforward in use. As mentioned it is only necessary to input the A_P, month, year and locator squares that are to be forecast, the last input tick is for a smoothed graph output. Figure 1 is an example of the input form. It can take a little time to search through the databases to find the correct matches; when the relevant data are found a table like Table 1 is produced along with a graph, Figure 2.

The table, which can be viewed as a median for a whole month, comprises three headed columns. The first column is the hour of the day, the second an SNR (signal to noise ratio) for the hour and the third an explanatory note for the SNR. For the purposes of this dataset an SNR below 20 should be considered as unreadable. The graph is a pictorial view of the table data; the thin green line is where the signal drops out of usability.
It may not be apparent why we need a tool such as the one described, because after all we have experience at this QRG and have other prediction tools at our left hand. What these things do not provide is the ability to visualise propagation when conditions are disturbed or changing. Inputting the correct answers into the input form allows the userÂ to see what conditions will be like when the ionosphere becomes disturbed, such as from Coronal Mass Ejections and solar flares. This ability is lacking in prediction engines, but this program shows to what extent any degree of disturbance should have upon any circuit. However, there is one short-coming suffered by all forecasting tools and that is the ability to see things before they happen, ie a short- wave-fade from a large flare. This effect will last the same amount of time as the flare is visible from Earth (utilising the correct viewing techniques), from a minute or two, to perhaps an hour or so. These effects have a quick onset and an equally quick fall-off period.
Comparatively speaking and from interest the output from this program was compared with REC533, the same prediction engine used to produce the RadCom predictions (to be fair, predictions programs were never really designed for such short distances).
A survey was produced for RadCom in 2000 and REC533 was found to perform best. Comparing the output from REC533 (Table 2 Predicted SNR column) and what was actually copied using the automatic monitoring program (Table 2 Automatic Recording SNR column) showed that REC533 was only 25% accurate. When it is considered that this percentage of accuracy is only obtainable when geomagnetic conditions are quiet it shows a severe shortcoming.

Comparing this to the output from our new program (Table 2 Forecast SNR column) with an accuracy of 67% against actual monitoring, under all geomagnetic conditions, it shows the new program to be far superior to anything previously available. Comparison was carried out with the criteria of the SNR being within 10dB. In order to obtain more accurate results more data is required as well as data from a greater number of stations.
Table 3 and Figure 3 demonstrate the way in which conditions can change in reality. These clearly demonstrate the different conditions when an ionospheric disturbance is at hand. Of course it should be understood that diurnal conditions also have a bearing on the results shown in the table, ie at the very end and very beginning of the day, for some circuits no path is expected. What is happening is that the foF2 (F2 critical frequency) is dropping below the required frequency; this is when the E and F layers can accommodate communication.

SUMMARY

This simple but effective idea can prove useful in all sorts of circumstances, ie for everyday amateur use, emergency situations (RAYNET) and other occasions outside the amateur service where 5MHz is in use, especially when conditions become disturbed during ionospheric upheavals. It is the use of a database with all varieties of propagation conditions that is the winning ingredient in performing accurate forecasts. As the database builds up to a crescendoÂ in 2015, a whole solar cycle and more of information will have been logged and will provide an unsurpassed amount of quality data for this frequency.

WEBSEARCH

[1] www.rsgb.org/spectrumforum/hf/5mhz.php
[2] www.swpc.noaa.gov/ftpdir/weekly/27DO.txt
[3] Near Vertical Incidence Skywave Communication, Theory, Techniques and Validation by LTC, David M Fiedler, (NJ ARNG) (Ret) and MAJ Edward J Farmer, PE (CA SMR). Published by Worldradio Books,
[4] www.rsgb-spectrumforum.org.uk/5mhzpf

This article may also be downloaded in PDF format here:Â 5MHz Propagation Forecasting utilising the Experiment’s Database

rosetta stone — http://www.flickr.com/photos/banyan_tree/2802823634/sizes/m/

Background: I have a big table called “channel” with a few hundred thousand rows in – nothing vast, but big enough to cause some queries to run slower than I want.

Today I was fixing something else and happened to run

show full processlist;

I noticed this taking too long:

SELECT payload FROM channel  LIMIT 398800,100;

This is a query used by some web-app paging code. Stupid really – payload isn’t indexed and there’s no use of any other keys in that query. Ok – how to improve it? First of all, see what EXPLAIN says:

mysql> explain SELECT payload FROM channel  LIMIT 398800,100;
+----+-------------+---------+------+---------------+------+---------+------+--------+-------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows   | Extra |
+----+-------------+---------+------+---------------+------+---------+------+--------+-------+
|  1 | SIMPLE      | channel | ALL  | NULL          | NULL | NULL    | NULL | 721303 |       |
+----+-------------+---------+------+---------------+------+---------+------+--------+-------+
1 row in set (0.00 sec)

Ok. So the simplest way would be to limit a selection of id_channel (the primary key) then select payloads in that set. First I tried this:

SELECT payload
FROM channel
WHERE id_channel IN (
    SELECT id_channel FROM channel LIMIT 398800,100
);

Seems straightforward, right? No, not really.

ERROR 1235 (42000): This version of MySQL doesn't yet support
  'LIMIT & IN/ALL/ANY/SOME subquery'

Next!

Second attempt, using a temporary table, selecting and saving the id_channels I’m interested in then using those in the actual query:

CREATE TEMPORARY TABLE channel_tmp(
  id_channel BIGINT UNSIGNED NOT NULL PRIMARY KEY
) ENGINE=innodb;

INSERT INTO channel_tmp(id_channel)
  SELECT id_channel
  FROM channel LIMIT 398800,100;

SELECT payload
  FROM channel
  WHERE id_channel IN (
    SELECT id_channel FROM channel_tmp
  );

mysql> explain select id_channel from channel limit 398800,100;
+----+-------------+---------+-------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+-------------+
|  1 | SIMPLE      | channel | index | NULL          | PRIMARY | 8       | NULL | 722583 | Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+-------------+
1 row in set (0.00 sec)

mysql> explain select payload from channel where id_channel in (select id_channel from channel_tmp);
+----+--------------+-------------+-------------+---------------+---------+---------+------+--------+-------------+
| id | select_type  | table       | type            | poss_keys | key     | key_len | ref  | rows   | Extra       |
+----+--------------+-------------+-------------+---------------+---------+---------+------+--------+-------------+
|  1 | PRIMARY      | channel     | ALL             | NULL      | NULL    | NULL    | NULL | 722327 | Using where |
|  2 | DEP SUBQUERY | channel_tmp | unique_subquery | PRIMARY   | PRIMARY | 8       | func |      1 | Using index |
+----+--------------+-------------+-------------+---------------+---------+---------+------+--------+-------------+
2 rows in set (0.00 sec)

Let’s try a self-join doing all of the above without explicitly making a temporary table. Self-joins can be pretty powerful – neat in the right places..

mysql> explain SELECT payload
  FROM channel c1,
       (SELECT id_channel FROM channel limit 398800,100) c2
  WHERE c1.id_channel=c2.id_channel;
+----+-------------+------------+--------+---------------+---------+---------+---------------+--------+-------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref           | rows   | Extra       |
+----+-------------+------------+--------+---------------+---------+---------+---------------+--------+-------------+
|  1 | PRIMARY     | derived2   | ALL    | NULL          | NULL    | NULL    | NULL          |    100 |             |
|  1 | PRIMARY     | c1         | eq_ref | PRIMARY       | PRIMARY | 8       | c2.id_channel |      1 |             |
|  2 | DERIVED     | channel    | index  | NULL          | PRIMARY | 8       | NULL          | 721559 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+---------------+--------+-------------+
3 rows in set (0.21 sec)

This pulls out the right rows and even works around the “no limit in subselect” unsupported mysql feature but that id_channel selection in c2 still isn’t quite doing the right thing – I don’t like all the rows being returned, even if they’re coming straight out of the primary key index.

A little bit of rudimentary benchmarking appears to suggest that the self-join is the fastest, followed by the original query at approximately one order of magnitude slower and trailing a long way behind at around another four-times slower than that, the temporary table. I’m not sure how or why the temporary table performance happens to be the slowest – perhaps down to storage access, or more likely my lack of understanding. Some time I might even try the in-memory table too for comparison.

Category: databases

5MHz Propagation Forecasts utilising the Experiment’s Database