This package provides simple workflows for connecting to both the forward-facing and back-facing Postgres databases, and for processing daily collections of articles.
The main function is process_articles. It reads in the raw articles within a given date range, processes them, and stores the results in the processedarticle database.
process_articles is broadly broken down into set_up_inputs and create_processed_df.
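As a rough usage sketch, the intended call might look as follows. The exact signature of process_articles is not shown in this document, so the (start_date, end_date) argument pair is an assumption:

```julia
using Dates

# Hypothetical call -- the exact arguments of process_articles are not
# documented here; a (start_date, end_date) pair is assumed.
start_date = Date(2024, 1, 1)
end_date   = Date(2024, 1, 7)
process_articles(start_date, end_date)
```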
To use this package, the package WritePostgres.jl must first be installed:

```julia
using Pkg
Pkg.add(url="https://github.com/Baffelan/WritePostgres")
```

This package uses environment variables to access remote PostgreSQL servers.
The required environment variables are:
IOMFRNTDB="Forward_Facing_DataBase_Name"
IOMFRNTUSER="User1"
IOMFRNTPASSWORD="User1_password"
IOMFRNTHOST="Forward_Facing_Host_Address"
IOMFRNTPORT="Forward_Facing_Port"
IOMBCKDB="Back_Facing_DataBase_Name"
IOMBCKUSER="User2"
IOMBCKPASSWORD="User2_password"
IOMBCKHOST="Back_Facing_Host_Address"
IOMBCKPORT="Back_Facing_Port"
NEWSAPIKEY="API_KEY"
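A quick way to confirm the environment is configured before connecting is to check each required variable. This helper is a sketch, not part of the package:

```julia
# Sketch: verify the required environment variables are set before connecting.
required = ["IOMFRNTDB", "IOMFRNTUSER", "IOMFRNTPASSWORD", "IOMFRNTHOST", "IOMFRNTPORT",
            "IOMBCKDB", "IOMBCKUSER", "IOMBCKPASSWORD", "IOMBCKHOST", "IOMBCKPORT",
            "NEWSAPIKEY"]
missing_vars = filter(v -> !haskey(ENV, v), required)
isempty(missing_vars) || error("Missing environment variables: " * join(missing_vars, ", "))
```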
Additional environment variables when using process_todays_articles are:
IOMALIGNMENTTOKENS=["word1", "word2","word3",...]
IOMBURNINRANGE=[[start_date], [end_date]]
IOMEMBDIM="4"
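Since environment variables are plain strings, the list-valued variables above must be parsed before use. The sketch below assumes they are stored as JSON text, which is an assumption, not something stated by this document:

```julia
# Sketch, assuming the list-valued variables are stored as JSON text.
using JSON

tokens  = JSON.parse(ENV["IOMALIGNMENTTOKENS"])  # e.g. ["word1", "word2", "word3"]
burnin  = JSON.parse(ENV["IOMBURNINRANGE"])      # e.g. [[start_date], [end_date]]
emb_dim = parse(Int, ENV["IOMEMBDIM"])           # e.g. 4
```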
TODO
To write the web JSON to a file:

```julia
using JSON
using MakeWebJSON
using Dates

# Assuming `j` is the object returned by create_web_JSON:
open("user_web_example.json", "w") do f
    write(f, JSON.json(j))
end
```
Note that the only function exported by MakeWebJSON is create_web_JSON.
Given a temporal sequence of text (for example, a daily collection of articles), we want to detect times at which the relationships between words change anomalously. To do this, we create a relational network between the words (the vertices of the network). We define an edge at time $t$ between two words that co-occur at that time, giving an adjacency matrix $A^t$. We then decompose the matrix $A^t$ with a truncated singular value decomposition (SVD), writing $A^t \approx L^t {R^t}'$. We focus on detecting anomalies in the matrix of embedding vectors $L^t$. Then, for each pair of times $s$ and $t$, we compute a distance between the embeddings $L^s$ and $L^t$.
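One plausible choice of distance between two embedding matrices is the Frobenius norm of their difference; the names below are illustrative, not part of the package:

```julia
# Sketch: Frobenius distance between embedding matrices at two times.
# `Ls` and `Lt` are assumed to be the factor matrices for times s and t,
# with one row per word. (Other distances, e.g. after a Procrustes
# alignment, are equally valid choices.)
using LinearAlgebra

embedding_distance(Ls, Lt) = norm(Ls - Lt)
```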
To create a benchmark for this algorithm, we generate a set of synthetic data from an observed network, and test the algorithm's ability to detect when the synthetic data has been randomised.
In this way we hope that the synthetic data is drawn from the same distribution as the real-world process that created the original network.
To create our synthetic data, we exploit a useful feature of the SVD: it lets us readily construct a Random Dot Product Graph (RDPG). To do this we take $$ X = L^t {R^t}' $$ and treat the entries of $X$ as edge probabilities. A sample of the RDPG is then a network in which each edge is drawn independently with its probability given by $X$.
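The construction above can be sketched in a few lines. This is an illustrative implementation, not the package's own; the rank `d` and the clamping of probabilities into [0, 1] are assumptions:

```julia
# Sketch: build X = L*R' from a rank-d truncated SVD of A and draw one
# RDPG sample by independent Bernoulli trials on the entries of X.
using LinearAlgebra, Random

function rdpg_sample(A::Matrix{Float64}, d::Int; rng = Random.default_rng())
    F = svd(A)
    L = F.U[:, 1:d] * Diagonal(sqrt.(F.S[1:d]))
    R = F.V[:, 1:d] * Diagonal(sqrt.(F.S[1:d]))
    X = clamp.(L * R', 0.0, 1.0)        # entries used as edge probabilities
    return rand(rng, size(X)...) .< X   # independent Bernoulli draws
end
```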
To create anomalies in the synthetic data, we create a function XXXXX to randomly select a proportion of the vertices and perturb their associated edges. When adding noise to a network, we randomly select which vertices to perturb. We then randomly select entries of $X$ associated with those vertices and set them to a fixed value. For this type of randomization, we select vertices in a similar way to the above, but instead of setting the entry to a fixed value, we replace it with a randomly drawn one.
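One possible noising scheme is sketched below. The package's XXXXX function is not shown in this document, so the behaviour here (re-randomising the selected vertices' rows and columns of $X$) is an assumption:

```julia
# Sketch of one possible noising scheme: pick a proportion p of vertices
# and replace the edge probabilities on their rows and columns of X with
# uniform random values.
using Random

function add_noise(X::Matrix{Float64}, p::Float64; rng = Random.default_rng())
    Xn = copy(X)
    n = size(X, 1)
    chosen = randperm(rng, n)[1:round(Int, p * n)]
    for v in chosen
        Xn[v, :] .= rand(rng, n)   # re-randomise this vertex's outgoing entries
        Xn[:, v] .= rand(rng, n)   # and its incoming entries
    end
    return Xn
end
```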
At each noise level, we take 51 samples from the RDPG, giving 50 distances in total.
At every second time step we apply noise to the network.
The distribution without noise can be viewed as a baseline.
Once we have a distribution of distances for each noise level, we can measure how much each one overlaps with the baseline.
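One simple way to quantify this overlap, offered here as a sketch rather than the package's actual method, is the fraction of noisy distances falling below a high quantile of the baseline distribution:

```julia
# Sketch: fraction of noisy-sample distances that fall at or below the
# baseline's 95th percentile. Values near 1 mean heavy overlap with the
# baseline; values near 0 mean the noise is clearly detectable.
using Statistics

overlap_fraction(baseline, noisy) = mean(noisy .<= quantile(baseline, 0.95))
```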