Friday, June 19, 2026

Vietnam's Economy

https://www.youtube.com/watch?v=4b_TOkCCBg0 2026: ARE THE VIETNAMESE AND THAI ECONOMIES TRADING PLACES? IS IT SUSTAINABLE?

The Iran War

https://www.bbc.com/vietnamese/articles/c8x2gdkrqzvo What do Iran and the US gain from the deal, and why will both sides struggle to keep their commitments?
Amir Azimi, June 19, 2026
More than 100 days after US and Israeli bombs began falling on Iran, both sides are declaring victory, a sign that each badly needed a way out. The deal has formally ended the fighting, but the harder negotiations have only just begun. To their domestic audiences, both governments are promoting the deal as a win. Yet, as analysts interviewed by BBC News observe, neither side has truly convinced the public, and domestic critics on both sides argue that too much was conceded. For Iran, the deal with the US delivers something no less important than the ceasefire: the chance to assert that it not only survived the war without surrendering but also emerged from the conflict stronger. From the outset, Tehran's core objective was not necessarily to defeat the US and Israel militarily. What it wanted was to preserve the Islamic Republic, keep its leadership apparatus intact and avoid having its negotiating position completely crushed. The Memorandum of Understanding (MoU), as the deal is being referred to, allows Iran to claim that it achieved that objective. The document, signed separately by US President Donald Trump and Iranian President Masoud Pezeshkian, sets up a 60-day framework for negotiations on Iran's nuclear programme. It also confirms an immediate end to military operations on all fronts, including Lebanon, a commitment to respect each other's sovereignty, the reopening of the Strait of Hormuz and the lifting of the US naval blockade on Iranian shipping. Iran's immediate obligations are fairly substantial but also quite limited.
Tehran agreed to help ensure safe passage for commercial shipping through the Strait of Hormuz, which was the norm before the war; to reaffirm that it will not pursue nuclear weapons; and to take part in negotiations on the future of its stockpile of highly enriched uranium and of its enrichment programme. The US commitments appear broader. Under the MoU, Washington will begin lifting the naval blockade, grant waivers for Iranian oil exports, allow access to frozen or restricted assets, ease sanctions step by step, and work with regional partners to roll out a reconstruction and economic development plan for Iran worth at least $300bn. That partly explains why criticism from the Iranian side has so far been relatively muted. The MoU gives the leadership enough grounds to present the deal as a victory: Iran's sovereignty is recognised, the blockade is due to be lifted, the prospect of sanctions relief has emerged and reconstruction funding is spelled out. But that silence is unlikely to last. Even the first reaction from Iran's Supreme Leader, Mojtaba Khamenei, was carefully calibrated. He allowed the deal to proceed but stressed that it was approved on the responsibility of Iran's Supreme National Security Council. The hardest issues have been deferred, not resolved. The future of the highly enriched uranium, the scale of the enrichment industry and the restoration of damaged nuclear facilities will all continue to be negotiated under intense pressure. This creates a problem for Tehran's leadership. State media, the Revolutionary Guards, parliament and hardline figures have spent weeks telling supporters that Iran defeated the US and Israel. Expectations are high. Any compromise on enriched uranium or nuclear facilities could be cast by critics as a concession made after victory had already been declared.
But refusing to compromise is no less dangerous. If Tehran refuses to give ground on enriched uranium or the future shape of its nuclear programme, the negotiations could collapse and even put pressure on the ceasefire itself. That would strengthen the argument of those in Washington and Israel who believe Iran is merely using the MoU to buy time, which could push the two sides back to war. Mohammad Bagher Ghalibaf, the speaker of parliament and head of Iran's negotiating team, has tried to portray the talks in defiant terms. "I am not a diplomat, but I know exactly how to make America understand," he said on state television. Khamenei's reaction makes matters harder still. He said he "holds a different view in principle" but still allowed the MoU to be implemented after President Pezeshkian, as head of the Supreme National Security Council, accepted responsibility for protecting the interests of Iran and its allies. That wording keeps him close enough to the deal for it to go ahead, but far enough away to avoid bearing full responsibility if it fails. For Iran's negotiators, this may narrow the room for compromise. They must satisfy Washington without appearing to have crossed lines that the Supreme Leader himself has not fully accepted. Ghalibaf's rhetoric is aimed at the domestic audience, but also at Washington. The former Revolutionary Guards commander has to win over a hardline faction that is deeply suspicious of any compromise with the US. Comparisons with the 2015 nuclear deal are inevitable. In Washington, some may argue that this MoU is even worse than the Joint Comprehensive Plan of Action (JCPOA), on the grounds that Mr Trump has accepted a framework that gives Iran sanctions relief and economic benefits while the hardest nuclear issues are deferred. In Tehran, the danger is different.
Hardliners may accuse the government and the negotiating team of repeating what they see as the "betrayal" of 2015, when President Hassan Rouhani faced criticism from MPs, conservative media and political rivals for allegedly conceding too much on the nuclear programme. For Mr Pezeshkian and Mr Ghalibaf, the challenge is to turn the ceasefire framework into a political success before that backlash grows stronger. Iran has bought time, eased the immediate military pressure and opened up the prospect of major economic concessions. It has also avoided the outcome Washington pursued most openly: total surrender. However, Iran has not yet secured a final agreement. In the short term, the MoU strengthens Tehran's position because the political system has survived and Washington has made explicit commitments. But the risk is that the next 60 days could expose the gap between the image of victory promoted at home and the real compromises needed to prevent a return to war. Iran has emerged from the first chapter of the war in a stronger position than many predicted, but the next steps may be harder still: sustaining domestic political support for the negotiations long enough to reach a final agreement, without letting those compromises be seen as concessions or even as defeat.
Trump hails the deal as a 'big win'; critics say he conceded too much
US President Donald Trump has hailed the deal as a "big win" for America because, in his view, it has finally achieved Washington's overarching war aim: preventing Iran from acquiring a nuclear weapon. In the short term, however, a more pressing "victory" is the reopening of the global economy now that the Strait of Hormuz has been unblocked.
As the conflict dragged on and the Strait of Hormuz remained effectively closed, polls consistently showed Americans growing more unhappy with high petrol prices and with the war's effects on their daily lives. Dissatisfaction with the economy was one of the main reasons voters returned Mr Trump to the White House in 2024. The perception that a war he himself launched is hurting people's wallets has therefore become a political liability for him. Although Mr Trump himself is not on the ballot in November's midterm elections, that unease comes at a difficult moment for Republican lawmakers. Many of them face increasingly angry constituents, as well as likely voters who are ever more vocal about the risk of a prolonged, stalemated conflict. Against that backdrop, the deal gives Mr Trump room to manoeuvre. His political allies hope it will help him build an image as the man who swiftly ended the conflict and avoided open-ended military entanglements abroad, the kind of "forever wars" he once opposed. However, critics of the deal, including some Republicans, have accused Mr Trump of conceding too much. At the heart of the criticism is the pledge that Iran will benefit from a $300bn reconstruction fund. "There is no such thing as the US paying $300bn to Iran. That is fake news," Mr Trump wrote on Truth Social. "The US gets success, lower oil prices and victory." Although Mr Trump and administration officials have repeatedly insisted that the money will not come directly from the US, the report has still unsettled some Republicans.
"History shows that handing billions of dollars to theocratic fanatics who want to kill us is not a good idea," Texas Senator Ted Cruz, a reliable Trump ally, told The Hill. "I think the president is getting very bad advice." Conservative commentator Tucker Carlson, who retains significant influence over the MAGA movement despite his frequent recent criticism of the administration, was blunter: "This is a pretty humiliating defeat for the United States," he said on his show on X. "This is a defeat." Notably, the US administration has also had to acknowledge that some earlier war aims no longer appear to be priorities and are entirely absent from the MoU. Early in the conflict, for example, Mr Trump declared that the US military would "destroy their [Iran's] missiles and raze their entire missile industry to the ground", leaving it "totally obliterated". Likewise, the MoU makes no mention of Iran's ties to its regional proxies, despite Mr Trump's pledge in March that the US was working to ensure "the Iranian regime can no longer arm, fund and direct armies beyond its borders". The administration has now dropped that goal. Vice-President JD Vance told reporters that the US "expects" Hezbollah not to keep firing on Israel. He also acknowledged that ceasefires are often "pretty messy" and that renewed flare-ups of fighting are possible. That alone is enough to make the deal unpopular with Republicans who regard the US commitment to Israel's security as an inviolable principle of American politics.
Related: https://www.bbc.com/vietnamese/articles/c4gy2glgpgyo US-Iran deal: which elements stand out?
https://www.bbc.com/vietnamese/articles/cvgqvngpj8vo Thousands killed in the US-Israeli war against Iran, but the true toll may be 'unimaginable'
Christine Jeavans & Matt Murphy, BBC Verify
Thousands of people have been killed across the Middle East since the US-Israeli war with Iran began in February, according to official figures, as a deal to end the conflict has now been approved. According to official casualty reports from Iran and Lebanon, more than 7,300 people have been killed in the two countries since 28 February. They include hundreds of children and dozens of health workers. Many others have also been killed across the region. However, some analysts say these figures are almost certainly still an undercount.
Iran
As of mid-April, at least 3,468 Iranians, including 499 women, had been killed since the US and Israeli air strikes began, according to official Iranian government figures. According to the state news agency IRNA on 26 April, that figure comprises 1,460 civilians and 2,008 military personnel. However, the Human Rights Activists News Agency (HRANA), a US-based Iranian human rights monitor, says it has counted 3,636 deaths. In a report published on 18 May, HRANA said that figure comprises:
• 1,701 civilians, including 307 children;
• 1,221 military personnel;
• and 714 people whose identity or status could not yet be determined.
The organisation stresses that its figures should be treated as a "minimum", because gathering information about deaths is severely constrained by difficulty accessing sites, government-imposed internet shutdowns and political repression. "In a variety of ways, officials routinely withhold information about casualties, and families may come under pressure not to speak publicly about the circumstances of their relatives' deaths," said Skylar Thompson, HRANA's deputy director.
Iranian officials accuse the US and Israel of striking civilian infrastructure in air strikes across the country. Several investigations have concluded that a US missile strike on the very first day of the war hit a school in the town of Minab. According to Iranian officials, the strike killed 168 people, including 110 children. The US military says it is investigating the attack. Days later, Iranian authorities said a missile had hit a sports hall where a women's volleyball match was under way in the town of Lamerd, killing 20 people. The US denied involvement in the strike, but experts who spoke to BBC Verify said the weapon used was most likely the US-made long-range Precision Strike Missile (PrSM).
Lebanon
… Lebanese health officials say they have identified 3,912 people killed in Israeli attacks, including 366 women and 247 children. It is not yet clear how many Hezbollah members are among the dead. BBC Verify has contacted the Lebanese health ministry but has not received a response. While Hezbollah has not published its own figures, Israeli Prime Minister Benjamin Netanyahu said last month that 3,000 Hezbollah fighters had been killed since the war with Iran began. In early March, the Lebanese health ministry said 41 people were killed in a large-scale Israeli air and ground operation around a town in the Bekaa Valley in the east of the country. The Israel Defense Forces (IDF) said its troops had at the time been searching for and recovering the remains of a military airman who went missing in a conflict in Lebanon 40 years ago. However, Lebanese officials said three Lebanese soldiers were killed in the operation, along with a number of civilians and children.
Israel
… Israeli officials say 60 people have been killed, most of them in Iranian attacks and in clashes with Hezbollah.
According to figures supplied to the BBC by the Israeli government, they include 29 civilians, 21 of whom were killed in Iranian missile attacks. In addition, 31 Israeli soldiers were killed in combat. The Israeli government said one more person was killed by Israeli friendly fire.
Deaths across the Middle East
… Establishing the exact total number of deaths across the region is very difficult, because not all countries publish cumulative casualty figures. However, official statements and media reports have recorded deaths in most Gulf states. In the UAE, the defence ministry says 13 people have been killed. In Iraq, more than 100 people have been killed, according to figures compiled by Al Jazeera and AFP. At least 80 of them are believed to be members of the Popular Mobilisation Forces (PMF), a paramilitary force dominated by pro-Iranian Shia Muslim militias. They are believed to have been killed in US and Israeli air strikes. Meanwhile, according to the US Department of Defense, 13 US service members stationed in the Middle East have also been killed:
• 7 in Iranian attacks;
• 6 in the crash of a refuelling aircraft in Iraq.
The International Maritime Organization (IMO) says 14 seafarers of various nationalities have been killed in attacks on vessels in the Strait of Hormuz and elsewhere in the Middle East. Iain Overton notes that restricted access to sites, damaged infrastructure and political sensitivities in parts of the Middle East have hampered casualty counting, and in some cases have meant that figures were not fully published. "Experience from the conflicts in Iraq, Syria and elsewhere shows that the final death toll is likely to remain contested, and could be significantly higher than the figures currently available," Mr Overton said.
https://www.bbc.com/vietnamese/articles/cm203nmz2l2o Vietnamese citizen dies in US immigration detention centre
June 18, 2026
In early April, Tuan Van Bui, a 55-year-old Vietnamese man, collapsed and died at the Speedway Slammer, a maximum-security prison in Miami County, Indiana, USA. The facility has been repurposed and has become a symbol of the Trump administration's immigration crackdown. According to a press release from US Immigration and Customs Enforcement (ICE), Mr Tuan Van Bui was pronounced dead at 6:10 p.m. local time on April 1, 2026, after staff found him unresponsive. Although medical staff intervened, they were unable to save Mr Tuan, who had cardiovascular disease. US media reported at the time that Tuan Van Bui entered the US legally in 1990 under the Amerasian Homecoming Act, a law that grants visas to children born in Vietnam to American fathers, along with their immediate relatives. ICE said the Vietnamese man had never received or applied for US citizenship. His AM-1 visa could have made him eligible for lawful permanent residence, commonly known as a "green card". Mr Tuan is not the only person to have died in ICE detention centres. At a detention centre in Pennsylvania, a Chinese man was found dead by hanging in a shower room after an earlier failed suicide attempt.

Saturday, June 13, 2026

Software Engineering, Data Science, Measuring Factual Quality in the Age of AI

https://www.normaltech.ai/p/why-ai-hasnt-replaced-software-engineers Why AI hasn’t replaced software engineers, and won’t
Coding agents as normal technology
Arvind Narayanan and Sayash Kapoor, Jun 10, 2026
There is great anxiety and uncertainty about AI replacing jobs. How can we move past vague warnings and bombastic predictions and bring data to bear on this question? One good way is to look at the profession where AI capabilities are furthest along and adoption has been exceptionally rapid: software engineering. In this essay, we argue that there is enough evidence to reject the narrative that once AI capabilities reach a certain threshold, they will cause mass layoffs. Given that this is true even in a sector with very few regulatory barriers, most other professions are likely to be even more cushioned. We also have a good understanding of why this is the case. We can think of many kinds of knowledge work, including software development, as a “decide-execute-deliver sandwich”. AI compresses the “execute” layer (the middle of the sandwich), but the other two layers resist automation in a way that will not be overcome by capability improvements alone. We conclude on a note of cautious optimism about the future trajectory of demand for software engineering. This essay is the first in a series, and the next one will look at reasons why individual software engineers’ careers might be rocky even if overall demand is healthy. The series is based on the published literature in economics and software engineering, our own evaluations and observations of AI agents, and many software engineers’ reflections on the present and future of AI impacts on their profession, gleaned both from published writings and our interactions with the community.
The stories of AI-driven mass layoffs in software seem to be classic “AI washing”
Consider three stories that made the headlines and how they contrasted with reality:
• In February, fintech company Block (maker of Cash App, Square, Afterpay, and other such apps) announced layoffs of 4,000 employees because, according to founder Jack Dorsey, AI is “enabling a new way of working” with “smaller and flatter teams”, specifically citing late-2025 improvements in model capabilities. But subsequent reporting revealed a radically different picture. After growing headcount more than threefold during the pandemic, the company was under massive financial pressure. A data scientist on the Cash App team, Naoko Takeda, posted that Block “shoved AI down everyone’s throats” yet she saw “very limited gains in productivity.” She refused a 75% retention raise and quit. Other employees interviewed had a sharply different view of what AI was capable of at Block, and of whether Dorsey understood the issues competently. As Aaron Levie has pointed out, CEOs are uniquely prone to delusions about AI’s usefulness because they can build quick prototypes but can’t see the 90% of work it takes to turn them into finished products. Dorsey’s public statements about AI seem to fit exactly this pattern.
• In April, Snap laid off about 1,000 people, with CEO Evan Spiegel primarily citing AI as the reason in his layoff memo. He also said that AI generated 65% of new code. In reality, the layoffs followed a campaign by an activist investor demanding cost cuts. (Snap has posted a net loss every full year since its 2017 IPO and shares were down over 30% in 2026.) Tellingly, the nature of the cuts, such as 150 jobs spanning various roles in the augmented reality division, doesn’t match the cuts we would expect to see if they were driven by AI (i.e. programming and other “AI-exposed” jobs across the board, not concentrated in any unit).
• In May, Intuit announced 3,000 cuts, alongside deals with Anthropic and OpenAI. The press connected the two, framing the layoffs as AI-driven restructuring. For once, the CEO actually pushed back on this easy narrative, saying that “none of it had to do with AI” and that the cuts targeted “coordination-heavy roles” and too many management layers.
We did not cherry-pick these examples. In every story about AI-driven software engineering layoffs that we examined, the same narrative violation emerged. It turns out that “AI washing” of job cuts is an economy-wide phenomenon, evidenced by many surveys:
• 59% of U.S. hiring managers admitted they emphasize AI when explaining hiring freezes or layoffs because it plays better with stakeholders than citing financial constraints.
• Forrester principal analyst J. P. Gownder says of companies preparing supposedly AI-driven layoffs: “When we ask if they have a mature, vetted AI app ready to fill in those jobs, nine out of 10 times, the answer is no, and they haven’t even started.”
• In an HBR survey of over 1,000 global executives, 21% had made large headcount reductions “in anticipation of” AI, with another 39% having made low or moderate anticipatory headcount reductions. In contrast, only 2% had already made large reductions in headcount related to actual AI implementation. The 10x gap suggests that executives, like everyone else, are highly prone to succumbing to the misleading narratives about AI replacing jobs.
Another interesting data point comes from the WARN Act, which requires certain disclosures of plant closings and mass layoffs affecting over 100 workers. In March 2025, New York became the first U.S. state to add an AI disclosure checkbox to WARN Act filings. In the full first year, more than 160 companies filed WARN notices.
Not a single one checked the AI box (note 1). We reached out to the NY Department of Labor, which confirmed that as of late May, only one company, Nespresso, had checked the box (note 2). If these filings are accurate, only 46 out of about 25,000 laid-off workers in New York State in the relevant period, or about two-tenths of a percent, were affected by AI. Even more damning for the AI-driven-mass-layoffs narrative: layoffs are the wrong signal of AI’s potential productivity benefits in the first place! The research is clear that the effect operates through “slower hiring rather than increased separations”. Firing existing workers results in the loss of precisely the tacit knowledge and organizational capital that allows workers to operate AI effectively. Besides, firing is expensive in terms of severance, damage to morale, and rehiring risk. It is also largely unnecessary, since natural turnover achieves the same result in a few years. So what does the data tell us when we look beyond layoffs to overall employment trends? An important paper from Federal Reserve economists compiles the evidence in the U.S. context. Software engineer employment is still growing, but they find that post-ChatGPT it is growing more slowly than in a no-AI counterfactual, by about 3 percentage points per year. One important limitation of this study is that the methodology can’t capture self-employment, so it is possible that some of the slowdown in growth is being absorbed by entrepreneurship instead. We do have evidence from other studies that AI makes entrepreneurship easier. So the real picture is probably even healthier than the Federal Reserve study suggests (note 3). Finally, it is worth acknowledging two kinds of indirectly-AI-driven job losses in software engineering that are real, but different from AI replacing software engineers. First, AI sometimes decimates demand for the product, in cases like Chegg (homework help) or Stack Overflow (technical help), both of which have laid off workers.
AI doesn’t directly do the job that these workers did, but rather obviates the need for it. The historical parallel is strong: Among the 270 jobs in the 1950 U.S. census, only one job was automated away — elevator operator. But many others were rendered obsolete by new technology, like the job of telegraph operator. Another credible AI-driven layoffs story is among companies that sell AI, rather than buy it. So when companies like IBM or SAP announce layoffs because of AI, a more accurate framing is “we reallocated headcount from legacy functions to our fastest-growing product line.” That’s ordinary corporate restructuring around a revenue opportunity, not technology displacing workers. Why coding agents haven’t led to labor displacement: the decide-execute-deliver sandwich Many tech leaders, like the Snap CEO above, report the percentage of code written by AI alongside reports of layoffs or predictions of future job losses. This feeds into the simplistic mental model that once AI writes all the code, there is no need for coders. Fortunately, this mental model is wrong. This AI-written-code metric is almost completely disconnected from what matters for labor displacement. Here’s why. Writing code isn’t, and never was, the bottleneck. For example, a 2019 paper summarized existing studies with the conclusion that “developers spend surprisingly little time with coding, 9% to 61% depending on the study”. This finding was consistent with the paper’s own data from 6,000 developers at Microsoft. As coding agents began to be taken up, there was an explosion of blog posts in late 2025 pointing out that writing code isn’t the bottleneck, as developers realized that using agents to write most of the code led to little impact on overall productivity [1, 2, 3, 4, 5, 6, 7, 8]. If writing code isn’t the bottleneck, what is? The task-breakdown surveys point at things like meetings or debugging. 
This just leads to more questions: what are developers doing in those meetings and why can’t it be done by AI? Won’t debugging get automated as capabilities improve? To understand the real bottlenecks, we have to get qualitative, and dig into software engineers’ own understanding of what it is they do that resists automation. When we did this analysis, it revealed three things as the real bottlenecks: (1) deciding and specifying what to build, (2) verifying and being accountable for what is delivered, and (3) the deep human understanding (of the codebase, the business, and the environment) required to carry out both of these. In other words, software engineers’ work consists of a “decide-execute-deliver” sandwich (with understanding being a prerequisite for all three). AI has compressed the middle of the sandwich, but has left the two ends largely unchanged. As long as software development teams are in charge of decision making and accountable for what they deliver, engineers still need to spend time building up a deep understanding of the system. These are the three bottlenecks.
Figure: Software development consists of three layers: (1) decision making: problem framing, specification, planning; (2) execution: design and implementation; (3) delivery: testing, verification, integration, maintenance, etc. Note that these are conceptual layers, not temporal phases. It is common to switch back and forth in the course of a project.
Evidence for the sandwich model of AI’s productivity effects comes from a recent paper on “Writing Code vs. Shipping Code”. Across 100,000 developers on GitHub, the researchers found that AI agents led to an eight-fold increase in the number of lines of code written, consistent with the idea that AI almost completely compresses the Execute layer of the sandwich. But this led to only 30% more releases, strongly suggesting that human bottlenecks (the Decide and Deliver layers) remain in place (note 4).
Can the sandwich be further compressed?
We don’t think so. At one end of the pipeline, development teams need to decide what to build. One of the most important lessons junior software engineers learn is that requirements specification (the profession’s lingo for this layer) takes surprisingly long, and if it is compressed, it leads to much more pain down the line. This layer is hard to automate because it requires thinking about user needs, market signals, organizational priorities, and in some cases regulatory constraints. As AI capabilities improve, the kinds of decisions that can be delegated to AI increase over time. But this does not make the “decide” layer thinner — once a decision can be delegated to AI, it is no longer a source of competitive advantage, and the value of human decision-making migrates upward. Software increases in complexity over time, so there is no ceiling to this process. At the other end of the sandwich, human teams need to be accountable for what they deliver. It is possible that some day in the future teams will ship mission-critical code without fully testing and understanding it, but today’s AI is so unreliable that such haphazard practices would represent an existential threat to software teams and their customers. Even if the technical barriers go away in the future, we don’t have to cede control to AI. A central insight of AI as Normal Technology is that we can collectively choose to keep humans accountable through shared norms, law, and policy. This is a much more resilient way to control the speed of AI impacts and improve safety than trying to slow the development of technical capabilities. These speed barriers are already largely in place due to liability laws and sector-specific regulation, but can be further strengthened. (For a longer version of this argument, see the original essay.) In this vision, as more and more of the execution layer gets delegated to AI, the software engineer’s role in the future becomes analogous to that of a crane operator. 
AI agents will do most of the cognitive heavy lifting; supervising the agent and keeping it in control becomes most of the human’s job. Some commentators argue that a future with humans staying in control is unlikely because it is too costly to pay people to do so. There have already been a few viral stories of poorly-supervised coding agents deleting production databases or causing other types of damage. But we view these as “man bites dog” stories rather than an emerging norm. They go viral precisely because they represent such irresponsible and unusual behavior that they have shock value, and serve as regular reminders and learning moments helping the community guard itself against over-reliance on AI. As the aphorism goes, “if it’s in the news, don’t worry about it”. Still, being able to detect whether there is an uptick in poorly-supervised use of AI for high-stakes tasks (across the economy, not just in software engineering) remains one of the most critical data gaps we have today. By the way, the sandwich getting squished is not a new trend and it is not uniquely due to AI. Over two decades ago, the Bureau of Labor Statistics started tracking programming separately from software engineering. Roughly speaking, programmers are responsible only for execution while software engineers manage a bigger part of the sandwich. Not only has programming been shrinking, it also pays much less because it is seen as grunt work. AI merely accelerates this long-existing trend, further devaluing purely technical skills. Figure: Software engineering versus programmer employment. Chart by The Washington Post. This pattern, where humans remain heavily involved at both ends of the decide-execute-deliver sandwich even as AI increasingly automates the middle layer, seems to be broadly applicable to most knowledge work, though it is farthest along in software. After all, complex decision making and accountability are common to most fields. 
A lack of recognition of this phenomenon has led to many overconfident predictions about imminent job losses, such as among radiologists. Vibe coding is not agentic engineering One reason for confusion about the extent to which software engineering is changing is the sloppy use of the term “vibe coding” to refer to a wide spectrum of practices, the ends of which are conceptually distinct and more dissimilar than similar. In true vibe coding the user simply tells the agent what to do, doesn’t supervise it when it’s running, doesn’t review the code — might not even have the skills to do so — and doesn’t evaluate the output, beyond perhaps noticing when things are visibly broken. This is in contrast to how most software engineers are actually using agents — as a tool, with the human remaining in control and accountable for the output. Fortunately, the term agentic engineering is gaining currency as a descriptor of this practice. As agentic engineering has become the norm, engineers are discovering that supervising coding agents is surprisingly time consuming. For example, Simon Willison, a prominent developer and chronicler of the AI transition, has noted how he is mentally exhausted by 11am from supervising agents. This is consistent with our experience as well. More quantitative evidence comes from SWE-chat, a dataset of coding agent interactions from open-source developers who opted into a logging tool. The study found that only 44% of agent-produced code survives into user commits, that vibe-coded commits introduce vulnerabilities at nine times the human-only rate, and that the most common user intent is understanding existing code, not generating new code (19% vs 13%). The self-selected nature of the dataset means that we can’t draw strong conclusions based on this study alone, but it does reinforce many other lines of evidence that vibe-coding and agentic engineering patterns are quite different. 
Table: Agentic engineering is not vibe coding. To reiterate, these are not two distinct categories. They are two ends of a spectrum, and there is a blurry middle. Not every project is either a throwaway or mission-critical. Not every workflow fits precisely in the left column or the right column of the table. But the key implication for the jobs question remains solid: companies can’t ship production software by hiring unqualified vibe coders instead of software engineers. What does the future hold? AI boosters might claim that mass layoffs are coming; they just haven’t happened yet because human-level software engineering abilities are very recent (or haven’t been achieved yet). But if the sandwich model is correct, these predictions won’t come true. AI has already largely compressed the middle of the sandwich (and the compression actually started decades ago). So even making the execution layer instant and perfect will only be a small change from the status quo. The reason the other two layers have resisted AI is not capability limitations. In fact, not only are software engineering jobs not going away due to AI, there might even be an increase in demand for software engineers. When software (or anything else) gets cheaper to create due to technological productivity improvements, people will buy a lot more software (in econ jargon, software is highly “price elastic”). And as we have argued, AI doesn’t replace software engineers (the “elasticity of substitution” is low), so the demand for more software results in a derived demand for more software engineers. A loosely related but flashier economics term, “Jevons’ paradox”, is often thrown around in the AI discourse to describe this concept. Historically, this has been the pattern: programmer employment in the U.S. has grown from near-zero around 1950 to millions today. 
This is sharply different from occupations such as agriculture in which labor demand was famously decimated due to mechanization and automation. The difference is that the number of calories people consume is relatively fixed (even a 25% increase led to the obesity epidemic), whereas the amount of software produced has grown a millionfold. Modern cars have something like a hundred million lines of code running on their various on-board computers. If there is a ceiling to the demand for code, we are nowhere near it. Virtually all cognitive work benefits from software. As AI makes coding cheaper, people are creating all kinds of one-off utilities, whether for work or personal use, that it never made sense to create until now. To be clear, while we think there will be a lot more software in the future, and likely more software engineers, this doesn’t mean big tech companies will get even bigger. The majority of software engineers today already work in-house in non-software firms, and that share might grow in the future. Then there’s the idea of “AI rollups”, which refers to venture capital or private equity firms buying “Main Street” businesses (dentistry practices, accounting firms, and whatnot) and rebuilding them from the ground up to be “AI-native” by embedding software engineers or AI engineers into those businesses. Of course, it might end up being nothing more than hype. It’s too early to tell. Some people predict that demand for software engineering skills will fall because of democratization. They acknowledge that there will be more software produced than ever before, and also that more human time will be spent producing software than ever before, but argue that this work will be done by people who are not software engineers. The idea is that AI will democratize software engineering to the extent that legal software, for instance, can be more easily created by those with training in law than in software engineering. Maybe. But we’ll bet against it. 
In our view, this falls into the same trap of conflating vibe coding with agentic engineering, and the execution layer with the whole decide-execute-deliver sandwich. In fact, when we look at the history of programming, there have always been claims that we are at the threshold of democratization: old languages such as FORTRAN, COBOL, and SQL were all accompanied by such prominent hopes at the time of their introduction. It never happened. The barrier isn’t actually learning the syntax. It’s having enough skilled judgment to make good decisions while maintaining accountability. Ultimately the distinction may be semantic. It seems clear that the amount of time people spend on getting computers to do new things will increase over time. This might take the form of building software, or managing complex workflows using agents, or something else. It will require a mix of software skills, AI skills, and domain expertise. Whether it is today’s software engineers who will best adapt to fill these new roles remains to be seen. That last point about the need for adaptation sets up the next essay in this series. The fact that aggregate labor demand in software is likely to remain strong doesn’t mean that most individual workers won’t be affected. We will argue that AI will create massive structural shifts in how software is produced, which will have big impacts on which software engineers stand to gain or lose, based on the types of firms they work in, their geography, their seniority, and the pace at which they can adapt. Further reading Deena Mousa points out the superficiality of broad, economy-wide analyses of AI impacts based on metrics like “AI exposure”, and instead calls for “careful, occupation-specific work”. We hope that this series of essays will play a role in establishing a nuanced understanding of AI’s transformation of software engineering. 
We’ve previously coauthored, with Justin Curl, a paper analyzing AI in legal services that seriously engages with regulatory and other bottlenecks that make that occupation unique. We plan to do more occupation-specific deep dives in the future. In a remarkable essay called No Silver Bullet 40 years ago, Fred Brooks distinguished between the “essential complexity” and “accidental complexity” of software. He argued that some of the complexity of software is accidental, arising from limitations of present technology such as the clunkiness of programming languages, and can be alleviated over time as tooling improves. But some of it is essential, because specifying the correct behavior of software is itself hard. He presents a forceful articulation of why the “decide” layer of the sandwich is thick and resists automation. Interestingly, hopes of boosting programmer productivity through AI were already prominent back then! Brooks argues that because AI or any other technology only reduces accidental complexity, it won’t result in an order-of-magnitude productivity improvement. (Brooks is the author of The Mythical Man-Month, an essay collection that is almost certainly the best known and most influential writing on software engineering of all time. No Silver Bullet later became part of the collection.) We are grateful to Felix Chen for feedback on a draft. 1 The checkbox is actually labeled “technological innovation or automation”. If checked, there is a second menu to disclose the specific technology such as AI or robotics. The current WARN Act data have various limitations: they cover New York only, and it is possible that companies are under-reporting AI as a reason for layoffs because of ambiguity or asymmetric risks from checking versus not checking the box (though we have no specific reason to think this). Stronger transparency requirements are in the works at both the federal and state levels; closing this data gap is urgent. 
2 We are grateful to our colleague Mihir Kshirsagar for connecting us to the New York State Department of Labor and to Elena Grovenger from the department for a prompt response. 3 The paper uses the term coder, but it defines the term based on skills rather than roles, resulting in a sweep of jobs that is much broader than “coding”. Measurements based on industry, title, and skills cannot be easily compared to one another. 4 Interestingly, in a sub-study looking at mobile apps, the paper found that the usage of the resulting apps did not go up at all. This gets at one important difference between consumer and enterprise software. The former competes for a relatively fixed pool of attention; more apps published doesn’t mean more hours of app usage. But in enterprise software there is a lot of room for growth, as previously human processes can be software-mediated or automated.

https://blog.citp.princeton.edu/2026/06/11/ai-is-already-giving-medical-conclusions-are-they-any-good/ AI Is Already Giving Medical Conclusions. Are They Any Good? June 11, 2026 – by Center for Information Technology Policy Artificial Intelligence, Data Science & Society Authored by: Hayoung Jung Recently, I was talking with some family members from South Korea who mentioned their back pain. My immediate question: “What did the doctor say?” Healthcare is highly accessible and affordable in South Korea, so I assumed they had already seen one. Nope. They asked ChatGPT. In all honesty, this was not truly surprising given how useful these models are. But the moment captures a growing social phenomenon happening everywhere. AI systems are becoming the first stop for health and scientific questions, even in countries where professional care is available and accessible. 
And people are not just asking these systems to retrieve webpages or list sources, as they might in traditional search engines. Agentic systems, such as Google AI Overview, OpenEvidence, and OpenAI Deep Research, synthesize information from multiple sources and present immediate conclusions to users’ questions in real time. Increasingly, users are directly asking, What is my diagnosis? What are the best treatment options? What should I do next? Reports suggest this is happening across audiences. Laypeople ask AI systems about symptoms, treatments, and scientific claims, while more than 80% of U.S. physicians use them in their professional workflows, including to explore medical questions and support decision-making. When AI systems are becoming the first (or even the only) stop for health and scientific questions, are they even reliable at synthesizing scientific evidence into conclusions that people may actually act on? A Benchmark for Scientific Synthesis To answer this, I worked with my amazing PhD advisors Manoel Horta Ribeiro and Aleksandra Korolova (who also have their own Substacks here and here) to create a benchmark for evaluating how well current AI agents synthesize scientific conclusions from the open web. Scientific conclusion synthesis requires several steps. An agent must retrieve relevant evidence from the open web, filter out irrelevant or low-quality sources, reason across multiple studies, weigh conflicting findings, preserve uncertainty, and synthesize a long-form conclusion. Importantly, these kinds of tasks are long-horizon and open-ended, as expert scientists often spend months searching the literature on the open web, evaluating studies, and synthesizing careful conclusions about what the evidence in the field actually supports. To evaluate this, we built SciConBench, a large-scale benchmark of 9.11K scientific questions paired with expert-written conclusions from Cochrane systematic reviews, a gold standard in evidence-based medicine. 
Each SciConBench task asks an AI agent to use web tools to answer a scientific question with a paragraph-length conclusion, which we compare against the corresponding expert-written Cochrane conclusion. Importantly, SciConBench is a live benchmark: it is continuously updated as new Cochrane reviews are published, enabling timely evaluations and reducing benchmark leakage as new models are trained on recent web data. Figure: Overview of SciConBench. We evaluate whether AI agents can use tools to synthesize scientific conclusions from the open web, without simply retrieving the expert-written answer online. We compare AI-generated conclusions against expert-written Cochrane conclusions by measuring how factually accurate and complete they are. Even under this controlled setup, frontier AI agents struggle to synthesize reliable scientific conclusions. The Leakage Problem While running SciConBench, we noticed a surprising issue in our agent logs: AI agents were explicitly looking for the benchmark answers directly in Cochrane review articles, even when we instructed them not to in the system prompt. Anthropic recently released a neat blog post on this phenomenon, called “evaluation awareness,” in which models know they are being evaluated and explicitly look for answers online. As models become increasingly capable, a major challenge in evaluating web-enabled agents is that they can often find the answer directly. If a benchmark question comes from a published systematic review, an agent with web access may simply retrieve the review itself, or another webpage that covers its conclusion (e.g., news coverage). At that point, the task is no longer about synthesizing the scientific evidence from scratch, but rather merely retrieving the ground-truth answer (a much easier task!). The model may look impressive, but we would not be measuring the capability we actually care about. To address this, we built SciConHarness, a clean-room evaluation harness. 
This evaluation harness enforces the clean-room protocol, ensuring agents have controlled access to web search, browsing, and paper search tools, while filtering out ground-truth artifacts such as Cochrane pages and review articles that could leak the answer. This lets us evaluate whether the agent can synthesize the conclusion from the open-web evidence, rather than shortcutting to the already-written expert answer. Measuring factual quality In our study, we work with doctors to validate every component of our benchmark creation and evaluation pipeline. After an AI agent synthesizes a conclusion from the open web, we evaluate their conclusions using our expert-validated factual evaluation pipeline. Instead of judging the whole paragraph at once, the idea is we decompose both the AI-generated conclusion and the expert-written reference conclusion into a series of facts, e.g., statements containing a single piece of information. Then, we measure two things: • Factual precision (correctness): Are the facts in the AI-generated conclusion supported by the reference, or do they contradict it? • Factual recall (coverage): Does the AI-generated conclusion cover the key facts from the reference conclusion needed to answer the question? We use these two metrics because a scientific conclusion can fail in different ways. A conclusion may contain incorrect claims – for example, by overstating weak evidence or flipping the direction of a treatment effect. Alternatively, it may be mostly true but incomplete, omitting key facts or caveats that matter for decision-making. To capture both correctness and completeness, we also report Factual F1, the harmonic mean of factual precision and factual recall. In other words, a system can only score highly on F1 if it performs well on both dimensions: it must avoid making unsupported or contradictory claims, while also covering the key facts needed to answer the question. All metrics range from 0 to 1, with higher being better. 
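To make the three metrics concrete, here is a minimal Python sketch of how they could be computed once the fact decomposition and labeling are done. It is not the SciConBench pipeline: the function name `factual_scores`, the `FactualScores` container, and the boolean-label inputs are hypothetical, and the sketch assumes precision is the share of generated facts labeled as supported and recall is the share of reference facts labeled as covered.

```python
from dataclasses import dataclass


@dataclass
class FactualScores:
    precision: float
    recall: float
    f1: float


def factual_scores(generated_supported: list[bool],
                   reference_covered: list[bool]) -> FactualScores:
    """Compute factual precision, recall, and F1 from per-fact labels.

    generated_supported[i] is True if the i-th fact in the AI-generated
    conclusion is supported by the reference (assumed labeling scheme).
    reference_covered[j] is True if the j-th reference fact appears in
    the AI-generated conclusion (assumed labeling scheme).
    """
    # Share of generated facts that are supported (0.0 if there are none).
    precision = (sum(generated_supported) / len(generated_supported)
                 if generated_supported else 0.0)
    # Share of reference facts that are covered (0.0 if there are none).
    recall = (sum(reference_covered) / len(reference_covered)
              if reference_covered else 0.0)
    # Harmonic mean; defined as 0.0 when both components are 0.
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return FactualScores(precision, recall, f1)


# Illustrative, made-up labels: 3 of 4 generated facts are supported,
# and 2 of 4 reference facts are covered.
example = factual_scores([True, True, True, False],
                         [True, True, False, False])
print(example)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a conclusion that is mostly correct but covers few reference facts (or the reverse) still gets a low F1, which is the behavior the paragraph above describes.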
So how do these AI agents perform? Figure: Our benchmark results. Each metric ranges from 0 to 1, with higher being better. We test frontier models and deep research agents (DR) using SciConHarness; the best factual F1 score under the clean-room was 0.337. As the $\delta_{\text{Clean}}$ F1 values show, models and deep research agents consistently decrease in performance when the clean-room is applied. Let’s look at the benchmark results above. Across frontier models and deep research agents, synthesizing scientific conclusions remains far from solved. Under clean-room evaluation, which better isolates true synthesis capability, the best-performing agent (OpenAI’s o3-deep-research) achieved a factual F1 of only 0.337. In other words, even the strongest systems struggled to produce conclusions that were both correct and comprehensive with respect to the expert-written Cochrane reviews. We also found that clean-room evaluation consistently reduced performance. When agents had unrestricted web access (i.e., no clean-room), they performed better. However, when we filtered out ground-truth leakage with our clean-room, their scores consistently dropped. This suggests that some apparent performance in open-web evaluations comes from retrieving benchmark artifacts, not genuinely synthesizing conclusions from evidence. This leakage issue is important beyond our benchmark. If we evaluate AI agents in environments where they can shortcut and find the answer directly, we may overestimate their real capabilities, especially for high-stakes tasks in health and science. The deployed agents were also unreliable. We audit consumer-facing agents, like Google AI Overview and OpenEvidence, using our benchmark! Given that these tools are used millions of times in real-world health decision-making, this could result in substantial amounts of incorrect advice given to both clinicians and laypeople. 
We also audited consumer-facing agents, including Google AI Overview, Google AI Mode, and OpenEvidence. These agents are already being used by laypeople and clinicians to synthesize health information. OpenEvidence, in particular, is marketed as a “clinical AI copilot for doctors” for “high-stakes decisions” and is used hundreds of millions of times in the medical context. Looking more closely at the table above, even when these agents had access to the ground-truth review, their conclusions were often incomplete and sometimes contradictory. OpenEvidence performed best among the audited agents, but still covered only about half of the reference facts and produced contradictory claims: in fact, 50.8% of its generated conclusions contained at least one claim that contradicted the Cochrane review. Google AI Overview and Google AI Mode performed worse, with lower coverage and similarly concerning contradiction rates: 56.3% and 59.0% of their conclusions, respectively, contained at least one contradiction. In many cases, the ground-truth answer was already available online, meaning the models should have been able to identify, retrieve, and prioritize such high-quality sources. This suggests that the failure likely occurred somewhere in the synthesis process, such as evaluating the quality of evidence, integrating high-quality ones, and communicating the evidence correctly. So what? Scientific conclusions are compressed decision-making tools. The optimistic view of AI agents is that they will help democratize expertise by synthesizing these scientific conclusions at scale in real-time. A clinician could quickly get up to speed on an unfamiliar condition. A patient, including someone like my own family member with back pain, could determine whether a treatment seems promising. A scientist could accelerate literature review and understand the frontiers of science. A policymaker could synthesize scientific conclusions before making a decision. The vision is compelling. 
However, our results suggest that current systems are not yet reliable enough to synthesize scientific conclusions, especially in high-stakes settings like health where even a single misleading answer can deeply impact stakeholders. These agents can generate seemingly competent conclusions that omit key information, include unsupported claims, or contradict expert reviews, creating the risk of patients, clinicians, scientists, and policymakers relying on conclusions that do not faithfully reflect the underlying evidence. Given that these tools are used hundreds of millions of times in health contexts, even modest error rates could translate into a substantial amount of misleading advice or unsafe answers in practice. Our findings suggest that these systems and their use in clinical settings deserve much greater public scrutiny. While AI agents provide real utility in health and science, we need to be much more precise about what they can and cannot do. With SciConBench, we hope to push agentic evaluation closer to an important real-world task we expect these systems to perform: synthesizing careful scientific conclusions from the open web. More broadly, we see this work as part of the measurement infrastructure needed for AI systems in high-stakes domains. If these systems are going to be used in medicine and science, we need stronger evaluations of the tasks people actually delegate to them, along with greater transparency from AI providers, including usage data and post-deployment monitoring. Without that transparency, it is difficult to know how often these errors happen in the real world, who is affected, and when they lead to harm. For now, our results suggest that we should treat these systems less like expert reviewers and more like fallible assistants: useful in some contexts, but requiring careful expert oversight, independent verification, and much stronger evaluation before they are trusted in high-stakes decisions. 
AI may one day help democratize expertise. But until then, ask a doctor or a scientist before letting the chatbot make the call. Interested in reading more? Check out our paper! Hayoung Jung is a Ph.D. student in computer science at Princeton University, co-advised by Manoel Horta Ribeiro and Aleksandra Korolova. His research broadly focuses on advancing inclusive AI technologies and online platforms to better serve society and communities often overlooked in system development. Drawing on an interdisciplinary background, Hayoung develops technical frameworks and methods grounded in social science theories, with two main goals: auditing AI systems and online platforms, and studying social phenomena such as community norms through language and online behavior. He completed his undergraduate degrees in computer science and political science, and his M.S. in computer science, at the University of Washington. https://arxiv.org/pdf/2606.11337

Sunday, May 17, 2026

Kinh Thủ Lăng Nghiêm giảng giải - Bảy Đoạn Phật Hỏi Về Tâm - Lê Sỹ Minh Tùng

https://www.youtube.com/watch?v=DYKdkvgfu5w Kinh Thủ Lăng Nghiêm giảng giải - Bảy Đoạn Phật Hỏi Về Tâm - Lê Sỹ Minh Tùng

Friday, May 15, 2026

Frauds in HealthCare, MediCare and Medicaid

1/ Medicare, Home Care... and Frauds https://smpresource.org/medicare-fraud/fraud-schemes/home-health-care-fraud/ Medicare Parts A and B cover intermittent or short-term home health services. These services must be provided by a Medicare-approved home health agency that works with your doctor to manage your care. To be eligible for Medicare coverage:
• Your doctor must determine it’s medically necessary for you to receive skilled care services at home. Skilled care services at home could include part-time or “intermittent” nurse and nurse aide visits (personal, hands-on care) and rehabilitation services, which include speech-language pathology, physical and occupational therapy, and medical social services.
• Your condition must be expected to improve in a reasonable amount of time, or your condition must require skilled therapy to maintain your current condition or to prevent or slow further deterioration.
• You must be considered “homebound.” This means you are unable to leave your home without assistance, leaving requires considerable and major effort, or leaving is considered dangerous due to your current health condition. You may leave home for medical care and some short or infrequent outings (for example, worship services) as long as you meet these conditions.
o Note: Even if you do not qualify for home health services, you may still be eligible to receive outpatient therapy services in a doctor’s office, outpatient hospital setting, rehabilitation agency, Comprehensive Outpatient Rehabilitation Facility (CORF), public health agency, or your home. Outpatient therapy services are covered by Medicare Part B and subject to the 20% copayment. 
Report potential home health care fraud, errors, or abuse if:
• You see on your Medicare Summary Notice (MSN) or Explanation of Benefits (EOB) charges for:
o Home health services when you did not meet Medicare’s “homebound” criteria
o Services that were not deemed medically necessary by your doctor
o Home health services like skilled nursing care and/or therapy services that were not provided
• You were:
o Enrolled in home health services by a doctor you do not know
o Offered things such as “free” groceries or a “free” ride from a home health agency in exchange for your Medicare number or to switch to a different home health agency
o Charged a copayment for home health services
o Asked to sign forms verifying that home health services were provided even though you did not receive any services
• Someone came to your home and provided housekeeping or medication services, but you see on your Medicare Summary Notice (MSN) or Explanation of Benefits (EOB) that Medicare was billed for a covered service like skilled nursing or other therapy instead.
• You accept cash or gifts in exchange for going along with a home health scam.

2/ https://www.npr.org/sections/health-shots/2020/01/21/789958067/patients-want-to-die-at-home-but-home-hospice-care-can-be-tough-on-families ...Usually, hospice care is offered in the home, or sometimes in a nursing home. 
Since the mid-1990s, Medicare has allowed the hospice benefit to cover more types of diagnoses, and therefore more people. As acceptance grows among physicians and patients, the numbers continue to balloon — from 1.27 million patients in 2012 to 1.49 million in 2017. According to the National Hospice and Palliative Care Association, hospice is now a $19 billion industry, almost entirely funded by taxpayers. But as the business has grown, so has the burden on families, who are often the ones providing most of the care. For example, one intimate task in particular changed Joy Johnston's view of what hospice really means — trying to get her mom's bowels moving. Constipation plagues many dying patients. "It's ironically called the 'comfort care kit' that you get with home hospice. They include suppositories, and so I had to do that," she says. "That was the lowest point. And I'm sure it was the lowest point for my mother as well. And it didn't work." Hospice agencies primarily serve in an advisory role and from a distance, even in the final, intense days when family caregivers, or home nurses they've hired, must continually adjust morphine doses or deal with typical end-of-life symptoms, such as bleeding or breathing trouble. Those decisive moments can be scary for the family, says Dr. Joan Teno, a physician and leading hospice researcher at Oregon Health and Science University. How To Be A Better Caregiver When A Loved One Gets Sick "Imagine if you're the caregiver, and that you're in the house," Teno says. "It's in the middle of the night, 2 o'clock in the morning, and all of a sudden, your family member has a grand mal seizure." That's exactly what happened with Teno's mother. "While it was difficult for me to witness, I knew what to do," she says. In contrast, Teno says, in her father's final hours, he was admitted to a hospice residence. 
Such residences often resemble a nursing home, with private rooms where family and friends can come and go and with round-the-clock medical attention just down the hall. Teno called the residence experience of hospice a "godsend." But an inpatient facility is rarely an option, she says. Patients have to be in bad shape for Medicare to pay the higher inpatient rate that hospice residences charge. And by the time such patients reach their final days, it's often too much trouble for them and the family to move. Hospice care is a lucrative business. It is now the most profitable type of health care service that Medicare pays for. According to Medicare data, for-profit hospice agencies now outnumber the nonprofits that pioneered the service in the 1970s. But agencies that need to generate profits for investors aren't building dedicated hospice units or residences, in general, mostly because such facilities aren't profitable enough. Joe Shega, chief medical officer at for-profit Vitas, the largest hospice company in the U.S., insists it's the patients' wishes, not a corporate desire to make more money, that drives his firm's business model. "Our focus is on what patients want, and 85 to 90 percent want to be at home," Shega says. "So, our focus is building programs that help them be there." For many families, making hospice work at home means hiring extra help.... This experience of family caregivers is typical, but often unexpected. 'It's a burden I lovingly did' "It does take a toll" on families, says Katherine Ornstein, an associate professor of geriatrics and palliative medicine at Mount Sinai Hospital in New York, who studies what typically happens in the last years of patients' lives. The increasing burden on loved ones — especially spouses — is reaching a breaking point for many people, her research shows. This particular type of stress has even been given a name: caregiver syndrome. 
"Our long-term-care system in this country is really using families — unpaid family members," she says. "That's our situation." A few high-profile advocates have even started questioning whether hospice is right for everybody. For some who have gone through home hospice with a loved one, the difficult experience has led them to choose otherwise for themselves. Social worker Coneigh Sea has a portrait of her husband that sits in the entryway of her home in Murfreesboro, Tenn. He died of prostate cancer in their bedroom in 1993. Enough time has passed since then that the mental fog she experienced while managing his medication and bodily fluids — mostly by herself — has cleared, she says. But it was a burden. "For me to say that — there's that guilt," she says, then adds, "but I know better. It was a burden that I lovingly did." She doesn't regret the experience but says it is not one she wishes for her own grown children. She recently sat them down, she says, to make sure they handle her death differently. "I told my family, if there is such a thing, I will come back and I will haunt you," she says with a laugh. "Don't you do that." Sea's family may have limited options. Sidestepping home hospice typically means paying for a pricey nursing home or passing away with the cost and potential chaos of a hospital — which is precisely what hospice care was set up to avoid. As researchers in the field look to the future, they are calling for more palliative care, not less — even as they also advocate for more support of the spouses, family members and friends who are tasked with caring for the patient. "We really have to expand — in general — our approach to supporting caregivers," Ornstein says, noting that some countries outside the U.S. 
pay for a wider range and longer duration of home health services. "I think what we really need to do is be broadening the support that individuals and families can have as they're caring for individuals throughout the course of serious illness," Ornstein says. "And I think that probably speaks to the expansion of palliative care in general." Blake Farmer's reporting on end-of-life care is part of a reporting fellowship on health care performance, sponsored by the Association of Health Care Journalists and supported by the Commonwealth Fund.

3/ https://www.kff.org/medicaid/understanding-medicaid-home-care-amid-cms-focus-on-potential-fraud-and-abuse/ Understanding Medicaid Home Care Amid CMS Focus on Potential Fraud and Abuse Authors: Alice Burns, Abby Wolk, and Robin Rudowitz Published: Feb 24, 2026 Potential fraud in state Medicaid programs is getting renewed attention, with a recent emphasis on home care, also known as personal care or in-home supportive services. Home care helps with self-care activities such as bathing, dressing, and eating for older adults and people with disabilities. KFF estimates that over 5 million people use Medicaid home care, which allows individuals to receive long-term care without moving into an institution. The Trump administration has recently pointed to Medicaid home care as a source of fraud. Medicaid home care is susceptible to fraud because services are provided in people’s homes to vulnerable individuals who may be less able to advocate for themselves, including some with Alzheimer’s and other dementias. However, there are also additional safeguards against fraud in Medicaid home care compared to other types of Medicaid services. This issue brief describes how Medicaid home care operates, including who is eligible, the various systems in place to promote program integrity in its delivery, and challenges using data newly released by the Centers for Medicare and Medicaid Services (CMS). 
Key takeaways include the following. • All states provide optional home care services to people whose needs are sufficient to warrant institutionalization. An institutional level of care is generally beyond what family members are capable of providing. • Recognizing the higher risk of fraud in Medicaid home care, federal and state governments have implemented additional tools to identify and detect home care fraud. States, along with the federal government, use provider credentialing and enrollment and data analytics to help prevent fraud. There has been new attention on fraud in Minnesota’s Medicaid program recently, but the fraud, and the state’s work to root it out, date back at least 18 months. • On February 14, 2026, CMS released a dataset with provider-level spending data that the agency suggests could be used to identify unusual billing patterns for specific services, states, or providers, but the limited data could result in mistaken conclusions. Home care is a major emphasis of the new dataset, which stems from the fact that long-term care is the second-largest source of Medicaid spending, after hospital spending. Although Medicaid long-term care was historically provided primarily in nursing facilities, most enrollees who use long-term care now receive home care. Why does Medicaid cover home care and who is eligible for services? All states provide optional home care services. Under Medicaid, states are required to cover long-term care provided in nursing facilities, but not home care, which has been referred to as the “institutional bias” in Medicaid. States may only provide home care if they can demonstrate that providing the services would cost no more than institutional care would cost for an individual. All states choose to provide optional home care to people who would otherwise require institutionalization. The increased availability of home care reflects people’s preferences to remain in their homes. 
Expansions of Medicaid home care services also followed the 1999 Supreme Court ruling in Olmstead v. L.C., which declared that unjustified institutionalization of people with disabilities by a public entity (including Medicaid) is a form of discrimination and not permissible under the 1990 Americans with Disabilities Act. Even though nearly all of the benefits are optional for states to provide, the majority of people who use long-term care now do so at home. Medicaid home care use is limited by eligibility criteria that generally make it only available to people whose needs are sufficient to warrant institutionalization. To be eligible for Medicaid home care, applicants must meet both financial and “functional” eligibility criteria. Functional eligibility for Medicaid home care, which is evaluated by assessment tools developed by states, generally requires individuals to demonstrate that they need an institutional level of care. There are no recent data available about states’ specific definitions for an institutional level of care, but it generally indicates that people would require 24-hour services and assistance with multiple activities of daily living (ADLs), which include bathing, dressing, eating, toileting, continence, and transferring between bed and other settings. An institutional level of care is generally beyond what family members are capable of providing. People who require an institutional level of care generally have complex needs that require both skilled and unskilled services and often require services to be provided around the clock. In some cases, family caregivers may not have the medical expertise to provide services, but there are also challenges related to the physical demands of the job and having time to provide such intensive services. Helping family members to bathe, dress, and toilet themselves often requires the strength to lift them, which not all family members have. 
The time required to provide such intensive services also makes it difficult for family caregivers to provide this level of care and maintain employment or take care of their own health needs. KFF’s focus groups with paid and unpaid family caregivers provide detail that caregiving is physically, mentally, and emotionally challenging; and that family caregivers cannot provide an institutional level of care without supports. To help people requiring an institutional level of care remain at home, Medicaid supports family caregivers by providing supplemental paid care and with direct supports, such as respite care, training, and in some cases payments to the family caregivers to reflect the fact that caregiving makes it impossible to maintain outside employment. What program integrity tools for Medicaid home care exist? Recognizing the higher risk of fraud in Medicaid home care, federal and state governments have implemented additional tools to identify and detect home care fraud. In 2016, Congress passed the 21st Century Cures Act, which requires states to implement electronic visit verification for all Medicaid personal care and home health services if a visit is made to a person in the home. States’ electronic visit verification must include six data elements: member receiving the services, caregiver providing the service, type of service, location of the service delivery, date of the service, and time the service begins and ends. Electronic visit verification was established to help promote fiscal integrity for Medicaid home care, and states had until 2023 to fully implement the requirements. The Health and Human Services Office of Inspector General (HHS OIG) has an active project underway to evaluate the availability and completeness of the electronic visit verification data and how states are using the data to promote program integrity. An HHS OIG report finds that in fiscal year 2024, there were 298 fraud convictions....
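
The six electronic visit verification data elements listed above can be pictured as one record per home visit. The Python sketch below is illustrative only: the class name `EvvVisit`, its field names, and the `missing_elements` helper are hypothetical and are not taken from the 21st Century Cures Act, CMS guidance, or any state's system. It shows nothing more than how a visit record could be checked for completeness.

```python
from dataclasses import dataclass, fields
from datetime import date, time
from typing import Optional


@dataclass
class EvvVisit:
    """Hypothetical record for the six EVV data elements named in the text.

    The sixth element (time the service begins and ends) is split into
    two fields here, so the class has seven fields in total.
    """
    member: Optional[str] = None          # person receiving the service
    caregiver: Optional[str] = None       # person providing the service
    service_type: Optional[str] = None    # type of service
    location: Optional[str] = None        # location of service delivery
    service_date: Optional[date] = None   # date of the service
    start_time: Optional[time] = None     # time the service begins
    end_time: Optional[time] = None       # time the service ends


def missing_elements(visit: EvvVisit) -> list[str]:
    """Return the names of fields that are None or blank strings."""
    missing = []
    for f in fields(visit):
        value = getattr(visit, f.name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(f.name)
    return missing


# Made-up example values, for illustration only.
complete = EvvVisit(
    member="member-001",
    caregiver="caregiver-042",
    service_type="personal care",
    location="member's home",
    service_date=date(2026, 2, 14),
    start_time=time(9, 0),
    end_time=time(10, 30),
)
incomplete = EvvVisit(member="member-001", service_type="personal care")

print(missing_elements(complete))
print(missing_elements(incomplete))
```

A completeness check of this kind only confirms that each element was recorded; it says nothing about whether the recorded visit actually took place as described.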

AI and Research in Medicine and Other Fields

https://www.cbsnews.com/news/ai-hallucinate-citations-medial-research/?intcid=CNR-02-0623 AI is fabricating citations in biomedical studies, researchers find By Megan Cerullo Updated on: May 13, 2026 / 5:09 PM EDT / CBS News Artificial intelligence is fabricating references to medical research that does not exist, according to recent findings. An audit of millions of biomedical papers found more than 4,000 citations to bogus studies, the researchers said in a recent article published in The Lancet. Fabricated citations are dangerous because they influence clinical guidelines, which are based on public research that health care professionals follow in providing care, Maxim Topaz, an associate professor at the Columbia School of Nursing and the study's lead author, told CBS News. "When those fake references are making it into the literature, they will end up in those guidelines, and that's how doctors decide how to provide care for you," he said. "Your doctor could be making decisions around treatment based on studies that never existed." Growing problem Also troubling is that none of the mistakes Topaz and his team identified have been corrected or retracted, and could still be influencing patient care, he said. "The rate of fake references showing up in published medical literature is growing," Topaz added, noting that the number of such erroneous citations has grown 12-fold over the last three years. The fabricated references spanned nearly 3,000 academic papers. Topaz's own experience spurred him to investigate the issue. 
An AI app he was using to help polish one of his own scientific papers inserted a fake citation, he told CBS News. It then slipped through several layers of peer review before one sharp-eyed editor caught the phony reference. "I was mortified, because I've been studying AI for the past 15 years, so if it can happen to me, it can happen to anyone," he said. Such mishaps arise when an author asserts a statement of fact and asks AI for a citation, Topaz explained. "In some cases, AI would slip those in, inadvertently," he said. "You would hope the facts are accurate, but if they are supported by fabricated citations, you don't know if the 'facts' are accurate." In some cases, an AI tool will also cite a real author while inventing research and attributing it to that person. Other times, citations were completely fabricated, Topaz said. "This is just the tip of the iceberg," he said, noting that research across other fields could also be subject to the same issues. Meanwhile, faux AI-generated scientific citations can "look perfectly real," added Topaz, who emphasized the importance of researchers rigorously fact-checking their work.

Sunday, May 10, 2026

AI Literacy Across the United States Workforce

https://blog.citp.princeton.edu/2026/05/05/make-america-ai-ready-strengths-weaknesses-and-recommendations/ What Does It Do Well? It’s accessible. The choice of SMS for delivery maximizes reach. It meets people where they are, requiring no app installation, account creation, or navigating unfamiliar web platforms. The 10-minute-a-day pacing is practical. It emphasizes verification of AI outputs. The course consistently emphasizes that AI output must be checked, not blindly trusted. The example of looking up a restaurant only to find out that a nail salon has opened in its place is memorable (Lesson 6, below). The course also thoughtfully extends this skepticism to AI-generated images, video, and audio. It centers human responsibility. The quiz question about a coworker submitting an AI-generated report with fabricated statistics (Lesson 2, below) returns a sensible response: the human is responsible. This is repeated throughout the course and is one of its most important messages. It’s honest about AI’s limitations. The course doesn’t shy away from the fact that AI can be confidently wrong. The term “hallucination” is introduced clearly, the concept of training data cutoffs is explained, and the course repeatedly emphasizes that AI predicts rather than knows or understands. For a 101-level course, this is appropriately calibrated. What could be fixed in AI 101? There are some things we’d recommend fixing about the course. The course repeatedly contradicts its own privacy and security advice. The course contains a serious inconsistency when it comes to data privacy and security. On the last day of the course it offers common-sense advice, stating “PROTECT your private info. Never share passwords, Social Security numbers, medical records, or confidential work data with AI tools,” later adding not to share “income data.” But some of the advice and exercises leading up to that point had already prompted users to input some of these “never share” types of data. 
• On Day 3, the course urges the user to input a photo, PDF or recording of their own voice. • On Day 4, it says that a “power move” is for users to “give AI your own data to work with,” including instructions to “paste your resume” and “share your monthly expenses.” • On Day 5, the course says that a good use case for AI is putting “medical symptoms” in to learn medical terms and prepare questions for a doctor. • On Day 6, it tells the user to share their address to find a restaurant near them. These self-contradictions expose a central tension: AI tools can be more useful when they know more about you, so a blanket prohibition against sharing private information will limit their usefulness. Unfortunately, there is no simple answer to the question of how to protect your privacy when using AI, and there is no single approach that will work for everyone. It requires critical thinking based on an understanding of different threat models, including prompt injection risks, traditional cybersecurity risks, legal risks, AI companies’ eagerness to train on user data, and workplace policies that of course vary between organizations. We recognize that this level of nuance would be too much for an introductory course. We would recommend that the privacy protection lesson come earlier in the course, and include information about privacy settings that AI tools offer, such as temporary or incognito chats. Instead of the “never share” language, giving people at least a rudimentary understanding of what could go wrong would be more helpful, along with links to resources where they can learn more. The quizzes adopt a right-wrong dichotomy The quiz questions often ask the user for an explanation of AI’s failure modes and social effects. While it is important to face these head-on, the questions consistently have one “obviously correct” answer that maps to the course’s framing. 
Several wrong answers are absurd strawmen (“AI likes making things up to test you,” “AI’s internet connection was slow”). This limits the potential to build genuine understanding or critical thinking about AI’s functioning and societal implications. We would recommend an approach that highlights known issues without pretending that the explanations are simple. Flexibility in how issues are framed will allow course participants to grapple with them in a manner that is relevant to the skills they are building. More open-ended quiz questions might include: “Your employer starts mandating that all workers use AI. This may enable your employer to monitor your productivity. What are your options?” or “You are about to apply for a loan. How can you find out whether and how AI will be used in evaluating your application?” What could DOL build upon in AI 201? Expanding upon the introductory materials in the 101 course, there are several opportunities for content development that we would recommend. The course misses how AI is reshaping work For a course that is offered by the Department of Labor, there is very little content on the subject of work — the course frames AI solely as a productivity tool workers can use. The Department of Labor exists to protect workers, their wages, their safety, and their rights, yet the course largely skips over the ways AI is already reshaping hiring, performance monitoring, and layoffs of workers across many sectors. An AI 201 course could provide more information on these, and inform citizens of legitimate reasons they may have to call for regulation. It could also go into more depth on the privacy question. Finally, AI 201 could reckon with the broader societal consequences of this technology: for instance, bias, surveillance, and the concentration of power in the hands of a few large technology companies. Workers who understand these dynamics are not just AI-literate; they are better equipped to advocate for themselves. 
Deepening Technical Explanations The 101 course keeps its terminology simple, which is important. But sometimes it oversimplifies. An AI 201 could deepen the explanation of how models are trained, make inferences, and deliver human-interpretable results. The course’s technical explanation — AI finds patterns and makes predictions — serves as the entire mental model. This framing makes AI sound more mechanistic and less opaque than it actually is. On Day 3, the language of pattern and prediction drops out, with the language of “instruction” and “results” substituting in for the human input and predicted output of AI. The current course also equates predicting with guessing and AI training with “studying” – analogies that might be a useful starting point, but are quite limiting. For an AI 201 course, the connections between AI learning, model weights and predictions – as well as the connections between all of these things and the results generated from instructions – could be deepened. Indeed, how AI can be biased, can hallucinate, and otherwise can make errors is easier to comprehend when one understands a bit of the math behind machine learning. More Active Learning Engagement The quizzes in AI 101 are based on reputable learning science. Often the quiz will introduce a new concept or ask the user to stretch what they just learned to cover a new situation. There’s good evidence to think that this sort of “pre-assessment,” followed quickly by lessons teaching the correct answer, does improve retention in general. But, as we said, the AI 101 quiz questions consistently have one “obviously correct” answer that maps to the course’s framing, limiting the potential to challenge the user’s understanding. Additionally, we found minimal tailoring of text-message responses to the user’s quiz answers, despite the affordances of the interactive platform. 
If one user selects what is considered a right answer while another selects a wrong one (we tested this), the course responds with similar if not identical information. Better quizzes in AI 201 could perhaps be assessed by an LLM, with adaptive responses that meet the user where they are, and stretch their understanding when they’ve acquired a solid base. The daily challenges in AI 101 (Quick Draw, Udio music generation, fridge photo recipes) are well-designed to get people past the intimidation barrier. They’re low-stakes, fun, and demonstrate AI capabilities concretely. But for AI 201 they could be more effectively leveraged to actually show people how AI can be useful in their work and daily lives, and can (as promised by AI 101) “save them 5 hours per week”. Who created the course, and how? The DOL’s press release announcing the course points to a collaboration with a private partner called Arist. Arist’s website at the time of writing states that “Arist is the #1 enablement AI. Arist’s agents orchestrate creation, delivery, and analytics, end-to-end.” While the DOL announcement gives little detail as to the nature of the collaboration, if the company co-developed actual course content using generative AI this fact should be disclosed. One of us ran selected course content through Pangram, a tool which purports to detect AI content, and the results came back suggesting it was 100% AI-generated. Without putting too much stock in that, we began to suspect that some of the faults in the course could be explained this way. The simplistic framing of how AI generates results (patterns/predictions, instructions/results) could come from AI: since LLMs are trained on old explanations of how LLMs work, they may reach for framings that are not up-to-date. Also, if each module/quiz was generated separately, that could explain abrupt changes in terminology and the contradictions we identified regarding the sharing/not sharing of private information. 
The use of AI for content creation isn’t a problem per se, but the failure to disclose it was a missed opportunity for a teachable moment on the utility and risks associated with generative content. Also, the contradictions with regard to security and privacy, which we discussed earlier, should have been caught by human oversight. Additionally, going forward, transparency about how commercial partners are involved can lend itself to wider adoption and trust of course materials and DOL initiatives. The final lesson of the course refers users to an Arist-sponsored AI summit featuring Tony Robbins and Dean Graziosi. While the Summit appeared to be free, it raises the question of what other paid AI-enablement sessions or products these well-known coaches might offer. Graziosi has drawn attention for his role in other problematic training programs. Users deserve to know who benefits from pursuing the recommendations made by a Federal agency. Conclusion Make America AI Ready offers significant insight into the priorities the Federal government holds in reaching widespread AI-literacy across the United States workforce. Although we suggested several areas for development, the course content and manner in which it was released are a useful start in achieving this aim.