Converting contig raw counts to bin raw counts

06-01, 1479 views
#!/usr/bin/env python
#########################################################
# Add contig raw read counts by bin mapping
# Written by PeiZhong in IFR of CAAS
# Optimized by ChatGPT for robustness
import argparse
import pandas as pd
parser = argparse.ArgumentParser(description='Aggregate contig raw read counts into bins')
parser.add_argument('--stb', '-m', required=True, help='Mapping file: contig to bin (TSV format)')
parser.add_argument('--raw_reads', '-r', required=True, help='Contig-level raw read count table (TSV format)')
parser.add_argument('--output_name', '-o', required=True, help='Output file name for bin-level count table (TSV)')
args = parser.parse_args()
# 1. Load contig-to-bin mapping
map_df = pd.read_csv(args.stb, sep='\t', header=None, names=["Contig", "Bin"])
# 2. Load contig-level raw count matrix
count_df = pd.read_csv(args.raw_reads, sep='\t')
# 3. Merge to add Bin info to contig count table
merged_df = pd.merge(map_df, count_df, left_on="Contig", right_on="GeneID", how='inner')
# 4. Aggregate counts by bin (sum across contigs in the same bin)
bin_counts = merged_df.drop(columns=["Contig", "GeneID"]).groupby("Bin").sum()
# 5. Save as TSV
bin_counts.to_csv(args.output_name, sep='\t')
print(f"Bin-level count matrix saved to: {args.output_name}")
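
To make the merge and aggregation steps concrete, here is a minimal sketch that runs the same two pandas calls on in-memory data instead of files. The contig names, bin names, and counts are made up for illustration; only the column names `Contig`, `Bin`, and `GeneID` follow the script above.

```python
import pandas as pd

# Hypothetical toy data mirroring the two input files (all values are made up).
map_df = pd.DataFrame({
    "Contig": ["c1", "c2", "c3"],
    "Bin": ["binA.fa", "binA.fa", "binB.fa"],
})
count_df = pd.DataFrame({
    "GeneID": ["*", "c1", "c2", "c3", "c4"],
    "S1": [0, 2, 3, 5, 7],
    "S2": [0, 1, 0, 4, 9],
})

# Same inner merge as the script: rows without a bin ("*" and "c4") are dropped.
merged = pd.merge(map_df, count_df, left_on="Contig", right_on="GeneID", how="inner")

# Same aggregation as the script: sum the sample columns within each bin.
bin_counts = merged.drop(columns=["Contig", "GeneID"]).groupby("Bin").sum()
print(bin_counts)
```

In this toy case `binA.fa` receives the sum of `c1` and `c2`, while `c4` contributes to no bin because it is absent from the mapping.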

stb file (contig-to-bin mapping, two tab-separated columns, no header)

Dairy_cattle_Abomasum-1__c384	RGIG1.fa
Dairy_cattle_Abomasum-1__c1727	RGIG1.fa
Dairy_cattle_Abomasum-1__c4302	RGIG1.fa
Dairy_cattle_Abomasum-1__c6442	RGIG1.fa
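
Because the script merges on the contig name, a contig listed under more than one bin would have its counts added to each of those bins. The sketch below is one possible pre-check, not part of the original script; the helper name `find_multi_bin_contigs` is hypothetical, and it assumes the mapping has already been loaded with the `Contig` and `Bin` column names used above.

```python
import pandas as pd

def find_multi_bin_contigs(map_df):
    """Return mapping rows whose contig is assigned to more than one distinct bin."""
    # Collapse exact duplicate (Contig, Bin) rows first so they are not reported.
    unique_pairs = map_df.drop_duplicates(subset=["Contig", "Bin"])
    # keep=False flags every row of a contig that appears under several bins.
    is_ambiguous = unique_pairs.duplicated(subset="Contig", keep=False)
    return unique_pairs[is_ambiguous]

# Usage with made-up data: "c3" is listed under two bins.
example = pd.DataFrame({
    "Contig": ["c1", "c2", "c3", "c3"],
    "Bin": ["binA.fa", "binA.fa", "binA.fa", "binB.fa"],
})
print(find_multi_bin_contigs(example))
```

Note that exact duplicate lines in the stb file would also be double-counted by the merge, so calling `drop_duplicates()` on the mapping before merging may be worth considering.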

Raw reads file (contig-level count table, first column GeneID)

GeneID	ATCC_10	ATCC_11	ATCC_1	ATCC_2	ATCC_3	ATCC_4	ATCC_5	ATCC_6	ATCC_7	ATCC_8	ATCC_9	CK_10	CK_11	CK_1	CK_2	CK_3	CK_4	CK_5	CK_6	CK_7	CK_8	CK_9
*	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
Dairy_cattle_Abomasum-1__c100066	2	2	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	1	0	2	0
Dairy_cattle_Abomasum-1__c100090	0	0	0	0	0	0	0	0	1	0	0	0	2	4	3	0	0	0	0	0	2	0
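
The inner merge silently drops every row of this table that has no entry in the stb file, including the `*` row (which looks like the unmapped-read placeholder that tools such as samtools idxstats emit; that origin is an assumption, since the post does not say how the table was produced). A rough sketch for seeing what gets left out follows; the helper name `unbinned_summary` is hypothetical.

```python
import pandas as pd

def unbinned_summary(count_df, map_df, id_col="GeneID"):
    """List contig IDs with no bin and the per-sample counts they carry."""
    # A left merge keeps every count row; indicator=True marks rows without a bin.
    merged = pd.merge(count_df, map_df, left_on=id_col, right_on="Contig",
                      how="left", indicator=True)
    unbinned = merged[merged["_merge"] == "left_only"]
    sample_cols = [c for c in count_df.columns if c != id_col]
    return unbinned[id_col].tolist(), unbinned[sample_cols].sum()

# Usage with made-up data: "*" and "c4" are not in the mapping.
counts = pd.DataFrame({"GeneID": ["*", "c1", "c4"], "S1": [0, 2, 7], "S2": [0, 1, 9]})
mapping = pd.DataFrame({"Contig": ["c1"], "Bin": ["binA.fa"]})
ids, totals = unbinned_summary(counts, mapping)
print(ids)
print(totals)
```

Comparing these totals with the bin-level table gives a quick sense of how many reads fall on contigs that were never binned.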

Result

Bin	ATCC_10	ATCC_11	ATCC_1	ATCC_2	ATCC_3	ATCC_4	ATCC_5	ATCC_6	ATCC_7	ATCC_8	ATCC_9	CK_10	CK_11	CK_1	CK_2	CK_3	CK_4	CK_5	CK_6	CK_7	CK_8	CK_9
RGIG1.fa	53	196	106	383	227	82	117	168	210	126	96	237	225	132	146	234	129	185	162	267	125	306
RGIG1000.fa	138	143	177	151	146	109	100	116	129	133	155	143	182	126	188	144	156	144	218	183	186	168
RGIG10000.fa	76	139	98	103	192	71	107	111	170	136	116	123	176	146	177	214	161	222	204	272	212	363
RGIG10001.fa	6999	1483	1643	17601	86843	47775	4379	4506	4197	12932	3891	2968	16374	2753	2802	1354	3820	2672	5509	2798	5807	4192
RGIG10002.fa	55367	62596	48127	61821	47531	80204	54267	33811	62336	44081	63759	69962	45994	87378	78818	115251	72333	57748	78264	59453	59145	55542
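
The rows above come out in string order (`RGIG1.fa`, `RGIG1000.fa`, `RGIG10000.fa`, ...) because `groupby` sorts the bin names as text. If numeric order is preferred, one option is to reorder the table by the first number in each bin name. This is only a sketch; the helper name `sort_bins_naturally` is hypothetical, and it assumes each bin name contains a single meaningful number.

```python
import re
import pandas as pd

def sort_bins_naturally(bin_counts):
    """Reorder rows by the first integer found in each bin name."""
    def numeric_key(name):
        match = re.search(r"(\d+)", str(name))
        # Names without digits sort last; the name itself breaks ties.
        return (int(match.group(1)) if match else float("inf"), str(name))
    ordered = sorted(bin_counts.index, key=numeric_key)
    return bin_counts.loc[ordered]

# Usage with made-up counts.
table = pd.DataFrame(
    {"S1": [1, 2, 3]},
    index=pd.Index(["RGIG1.fa", "RGIG1000.fa", "RGIG2.fa"], name="Bin"),
)
print(sort_bins_naturally(table))
```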

 

 
