I'm diving head first into using AI agents to find up-to-date info, and I realized I need a consistent benchmark to measure results.
So this is part of a series of small tests where we start with a query and compare how different tools provide results.
The tools I chose to compare (not exhaustive):
- Perplexity - an LLM search-engine with up-to-date data from the web
- Google AI Overview - discarded because its results are hard to reproduce
- Copilot - LLM with internet connection (similar to ChatGPT with browsing enabled)
- OneQuery - AI agent to browse the web
To keep things consistent, I'm going to paste the raw results from these tools.
1) Task
Find two issues open on the pyppeteer PR
2) Results
Perplexity
The two open issues on the Pyppeteer pull request (PR) relate to functionality and compatibility concerns within the library:
1. Browser Connection Issues: One issue involves difficulties in establishing a connection to the browser when running Pyppeteer on specific platforms like PythonAnywhere. The user reports that while their code works locally, it fails on the server due to problems with the WebSocket endpoint URL, specifically when attempting to call `urlopen(url)` on a local address (`http://127.0.0.1:46574/json/version`). This indicates that the server's environment does not support socket connections properly, leading to unexpected browser closure errors.
Source
2. Deprecation Warnings in Async Code: Another issue raised pertains to the use of deprecated features in Python 3.8 and later versions. The user has encountered a DeprecationWarning when using asyncio.wait() with coroutine objects, which is set for removal in Python 3.11. They are seeking a more current implementation that adheres to best practices and avoids these warnings, suggesting a transition to using asyncio.gather() instead.
Source
AI Agent (OneQuery)
{
"issues":[
{
"date":"3 weeks ago",
"title":"can not get document object in evaluate function",
"author":"datuizhuang",
"description":"User is unable to get document object in evaluate function, receiving None as result. When using page.evaluate('document') or page.evaluate(\"document.querySelector('body')\") it returns None, but page.querySelector('body') works correctly.",
"link": "https://github.com/pyppeteer/pyppeteer/issues/486"
},
{
"date":"2 weeks ago",
"title":"Generate fingerprint",
"author":"krolaper",
"description":"Issue #487 related to fingerprint generation, though specific details are not provided in the given text.",
"link": "https://github.com/pyppeteer/pyppeteer/issues/487"
}
]
}
Copilot
Sure! Here are summaries of two open issues on the Pyppeteer GitHub repository:
1. Issue #487: "Can not get document object in evaluate function" - This issue was opened recently by a user named krolaper. The problem is related to the inability to retrieve the document object when using the `evaluate` function in Pyppeteer. This issue is currently open and awaiting a solution.
2. Issue #484: "Chrome version --> (129.0) open a blank browser window in headless mode!" - Reported by Basselfathy, this issue involves Chrome version 129.0 opening a blank browser window when running in headless mode. This problem is causing difficulties for users who rely on headless browsing for automation tasks.
Would you like more details on any of these issues?
Source
3) Compare
The tools are judged on two things: accuracy (whether they return the latest open issues) and ease of use.
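For a reproducible ground truth, one option is to ask the GitHub REST API directly. The sketch below is only an illustration of how such a baseline could be collected, not what was used for this post. It assumes the repository is `pyppeteer/pyppeteer` (taken from the issue links in the results above), and the helper names are my own. The issues endpoint also returns pull requests, which carry a `pull_request` key, so those are filtered out.

```python
import json
import urllib.request

# Assumption: the repo slug comes from the issue links in the results above.
API_URL = (
    "https://api.github.com/repos/pyppeteer/pyppeteer/issues"
    "?state=open&sort=created&direction=desc&per_page=20"
)


def latest_open_issues(items, n=2):
    """Return the n newest issues, skipping pull requests.

    `items` is the decoded JSON list from the issues endpoint, already
    sorted newest first. Pull requests appear in that list too and are
    recognisable by their `pull_request` key.
    """
    issues = [item for item in items if "pull_request" not in item]
    return [
        {"number": i["number"], "title": i["title"], "author": i["user"]["login"]}
        for i in issues[:n]
    ]


def fetch_latest_open_issues(n=2):
    """Fetch the list over the network (unauthenticated, so rate limited)."""
    request = urllib.request.Request(
        API_URL, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return latest_open_issues(json.load(response), n)
```

Calling `fetch_latest_open_issues()` returns a short list of dicts that can be compared by hand, or by script, against whatever each tool reported.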
At the time of this post, these are the current open issues:
Tool | Accuracy | Ease of use |
---|---|---|
Perplexity | ❌ - Stale issues | ✅ Simple API ✅ Returns citations ✅ Low latency |
AI Agent | ✅ - Current issues | ✅ Simple API ✅ Returns citations ⚠️ Async instead of instant, takes about a minute |
Copilot | ⚠️ - Got 1/2 issues right | ⚠️ No API ✅ Returns citations ✅ Low latency |
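The accuracy column can be made a bit more mechanical by comparing the issue numbers each tool returned against the expected ones. This is a rough sketch, assuming #486 and #487 were the two newest open issues at the time (the ones the AI agent returned, marked as current in the table). The `score` helper is hypothetical and only compares issue numbers, so a mismatched title or author, like Copilot's #487, still counts as a hit.

```python
# Assumption: the two newest open issues at the time of the post.
EXPECTED = {486, 487}

# Issue numbers each tool returned; Perplexity cited no issue numbers.
RETURNED = {
    "Perplexity": set(),
    "AI Agent": {486, 487},
    "Copilot": {487, 484},
}


def score(returned, expected=EXPECTED):
    """Count how many of the expected issues a tool returned."""
    return len(returned & expected)


for tool, numbers in RETURNED.items():
    print(f"{tool}: {score(numbers)}/{len(EXPECTED)}")
```

This prints 0/2 for Perplexity, 2/2 for the AI agent, and 1/2 for Copilot, which lines up with the table.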
It seems LLMs are great at returning instant results, but because they draw on a large and possibly stale index, it's hard to pin down current facts accurately. It might be possible to get "real time" data by prompting with extra context, such as "what are the issues in December 2024". AI agents have the upper hand here since they `browse` the web for you. But because they run asynchronously, they are easier to use in the background than as part of a data-fetching script.
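Here is a minimal sketch of that "run it in the background" pattern using `asyncio`. The `run_agent` coroutine is a hypothetical stand-in, not OneQuery's real API: it just sleeps to simulate a slow job, and the delay is shortened so the example finishes quickly.

```python
import asyncio


async def run_agent(query, delay=0.2):
    """Hypothetical stand-in for a slow agent call (about a minute in practice)."""
    await asyncio.sleep(delay)
    return {"query": query, "issues": []}  # placeholder payload


async def main():
    # Start the agent job without waiting for it.
    job = asyncio.create_task(run_agent("two open issues on pyppeteer"))

    # The rest of the script keeps working while the agent browses.
    other_work = sum(range(1000))

    # Collect the result only when it is needed, with an upper bound on the wait.
    result = await asyncio.wait_for(job, timeout=5)
    return other_work, result


other_work, result = asyncio.run(main())
print(other_work, result["query"])
```

The same shape applies if the agent exposes a "submit, then poll" interface: kick the job off early, do other work, and only block at the point where the answer is actually needed.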
This was a fun exercise as I learn more about this field, and I hope it was an interesting read! Keep in mind that the comparisons are judged from a developer's point of view, not a regular user's.