Mythos Proves Potent in Vulnerability Discovery, Less Convincing Elsewhere

In judgment, it rejected false positives better than its predecessors, “but sometimes lost true positives when evidence did not formally satisfy its criteria.” Mythos requires precise prompts for best results.

The model exhibits substantial strength in both native code vulnerability discovery and reverse engineering.

In the reverse engineering tests, XBOW concluded Mythos is “capable of triaging both its own results and competitor-model findings,” and the model could reason through unusual firmware and embedded systems contexts.

XBOW’s visual acuity tests examine the model’s ability to interact with live websites through a browser interface; that is, the ability to identify the right UI element and click in the right place. “It was not perfectly pixel-accurate when asked for exact coordinates, but it was practically effective at selecting the right browser actions,” writes XBOW.

There is, however, one statistic that can easily be overlooked by users overawed by the power of Mythos. “Mythos Preview is not just any new model: it’s a true titan. But titans are big, and big means expensive.”

At the time of writing, specific costs are not available, although Anthropic has said it will be 5x as expensive as an Opus model. This made XBOW question whether it would be possible to give a cheaper model more time and get more accuracy at less cost.

The conclusion was yes. “If we normalize by estimated running cost, the picture is rather clear: Mythos Preview isn’t terribly inefficient, at least if you desire high accuracy, but it’s not best-in-class on our benchmarks either.” For finding web vulnerabilities with a fixed token budget, Mythos outperforms Opus 4.6 but is outperformed by GPT5.5.

None of these findings detract from the original fundamental claim: Mythos is better at finding vulnerabilities in code than other models.

Overall, however, the primary takeaways from XBOW’s testing are:

- Mythos is extremely powerful for source code audits.
- It’s good, but less powerful, at validating exploits.
- Its judgment is mixed. It can be too literal and conservative, and it also tends to overstate the practical relevance of its findings.
- It is strong in native-code vulnerability discovery and reverse engineering.

“Mythos Preview is strong at finding candidate vulnerabilities, especially from source code, and shows impressive ability across web, native-code, and reverse-engineering tasks,” concludes XBOW.

Source: SecurityWeek