• Score job descriptions using GPT-3.5

  • Organize job indices by date, so we can track changes over time while still caching job descriptions:
    - job descriptions in the data/jobs directory
    - indices in the data/search/YYYYMMDD directory
    - scored jobs, same as above
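A minimal sketch of path helpers for that layout (the helper names `job_desc_path` and `index_path`, the `DATA_DIR` constant, and the `index.json` filename are assumptions, not the project's actual code):

```python
from datetime import date
from pathlib import Path
from typing import Optional

DATA_DIR = Path("data")  # hypothetical root; adjust to the real layout

def job_desc_path(job_id: str) -> Path:
    # Descriptions are shared across days, so they live in one flat directory.
    return DATA_DIR / "jobs" / f"{job_id}.json"

def index_path(day: Optional[date] = None) -> Path:
    # Indices (and scored jobs) are stored per day so results can be
    # compared over time without re-fetching the cached descriptions.
    day = day or date.today()
    return DATA_DIR / "search" / day.strftime("%Y%m%d") / "index.json"
```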

  • Write page for JobSentry

  • Cache all scoring results, since scoring is the most expensive operation (in dollar terms)
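One way to sketch such a cache, keyed on a hash of the description plus the model name so a model change invalidates old scores (the names `score_job`, `score_fn`, and `SCORE_CACHE` are hypothetical, not the project's API):

```python
import hashlib
import json
from pathlib import Path

SCORE_CACHE = Path("data/scores")  # hypothetical cache location

def cache_key(job_desc: str, model: str = "gpt-3.5-turbo") -> str:
    # Key on both the description text and the model, so upgrading the
    # model invalidates old scores instead of silently reusing them.
    return hashlib.sha256(f"{model}\n{job_desc}".encode()).hexdigest()

def score_job(job_desc: str, score_fn, model: str = "gpt-3.5-turbo",
              cache_dir: Path = SCORE_CACHE) -> dict:
    # score_fn is the expensive API call; it only runs on a cache miss.
    path = cache_dir / f"{cache_key(job_desc, model)}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = score_fn(job_desc)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result
```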

  • Add simple UI using Streamlit

  • Add pros and cons, instead of a single reason

  • Fix exception when parsing a specific malformed job description

Issues

  • UI: search_dir() is cached, so it doesn’t pick up the new date directory when the UI keeps running past midnight
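One possible fix, sketched here with `functools.lru_cache` (the signature of `search_dir` is assumed): make the date an explicit argument, so it becomes part of the cache key and a run that crosses midnight gets a fresh entry. In Streamlit specifically, `st.cache_data` also accepts a `ttl` argument that would expire the stale entry.

```python
from datetime import date
from functools import lru_cache

@lru_cache(maxsize=None)
def search_dir(day: date) -> str:
    # With the date in the cache key, yesterday's cached result is not
    # returned after midnight; call sites pass date.today() explicitly.
    return f"data/search/{day.strftime('%Y%m%d')}"
```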

  • UI: a new search spends a long time in get_job_desc(), even though all the descriptions appear to be cached - possibly an issue with the recent caching change?

  • UI: small glitch: the “Analyzing with AI…” progress bar jumps backward - most likely because we also call it from an internal function with a different i/n
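One way to avoid the backward jump is to share a single monotonic tracker across the nested calls instead of letting each function compute its own i/n. A sketch (the `Progress` class and `render` callback are assumptions for illustration):

```python
class Progress:
    """Monotonic progress tracker shared across nested calls."""

    def __init__(self, render=print):
        self.value = 0.0
        self.render = render  # e.g. a Streamlit progress bar's .progress()

    def report(self, fraction: float) -> float:
        # Clamp so a nested call reporting a smaller fraction
        # cannot rewind the bar.
        self.value = max(self.value, min(fraction, 1.0))
        self.render(self.value)
        return self.value
```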

Improvements

  • Pros and cons are very accurate; the score needs tuning
  • Faster scoring with multiple API requests in flight
  • Collect stats on token usage, RPM, and TPM
  • Look into OpenAI’s Batch API
  • Find a way to keep a single notebook and also export it as an analysis.py script
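Keeping several requests in flight could be sketched with a small thread pool, since the scoring calls are I/O-bound; the pool size caps concurrency to stay under rate limits (the names `score_all` and `score_fn` are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

def score_all(job_descs, score_fn, max_workers: int = 4):
    # score_fn is the per-job API call; a small worker pool keeps several
    # requests in flight while max_workers bounds RPM pressure.
    # pool.map preserves input order in its results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_fn, job_descs))
```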