Numpy中的数组堆叠:UNIX和Windows的不同行为

2024-04-19 09:27:36 发布

您现在位置:Python中文网/ 问答频道 /正文

注意:这是python2.7,不是Py3

这是对先前问题的更新尝试。您要求我提供完整的代码、内容说明和示例输出文件。我会尽我所能把这个格式化好的。你知道吗

这个代码是用来从荧光“平板阅读器”获取输入文件,并将读数转换为DNA浓度和质量。然后,它生成一个按照8x12平板方案(DNA/分子工作标准)组织的输出文件。行标记为“A,B,C,…,H”,列仅标记为1-12。你知道吗

根据用户输入,需要堆叠数组以格式化输出。但是,在UNIX中堆叠数组时(打印或写入到输出文件),它们仅限于第一个字符。你知道吗

换句话说,在Windows中,如果数组中的一个数字是247.5,它将打印完整的数字。但是在UNIX环境(Linux/Ubuntu/MacOS)中,它会被截断为“2”。在Windows中,-2.7的数字通常会打印,但在UNIX中只打印为“-”。你知道吗

完整的代码可以在下面找到;注意最后一个块是代码中最相关的部分

#!/usr/bin/env python

Usage = """
plate_calc.py - version 1.0

Convert a series of plate fluorescence readings
to total DNA mass per sample and print them to 
a tab-delimited output file.

This program can take multiple files as inputs
(separated by a space) and generates a new
output file for each input file.

NOTE: 

1) Input(s) must be an exported .txt file.
2) Standards must be in columns 1 and 2, or 11 
and 12.
3) The program assumes equal volumes across wells.

Usage:

    plate_calc.py input.txt input2.txt input3.txt
"""

import sys
import numpy as np

if len(sys.argv)<2:
    print Usage
else:
#First, we want to extract the values of interest into a Numpy array
    Filelist = sys.argv[1:]
    input_DNA_vol = raw_input("Volume of sample used for AccuClear reading (uL): ")
    remainder_vol = raw_input("Remaining volume per sample (uL): ")
    orientation = raw_input("Are the standards on the LEFT (col. 1 & 2), or on the RIGHT (col. 11 and 12)? ")
    orientation = orientation.lower()
    for InfileName in Filelist:
        with open(InfileName) as Infile:
            fluor_list = []
            Linenumber = 1
            for line in Infile: #this will extract the relevant information and store as a list of lists
                if Linenumber == 5:
                    line = line.strip('\n').strip('\r').strip('\t').split('\t')
                    fluor_list.append(line[1:])
                elif Linenumber > 5 and Linenumber < 13:
                    line = line.strip('\n').strip('\r').strip('\t').split('\t')
                    fluor_list.append(line)
                Linenumber += 1
            fluor_list = [map(float, x) for x in fluor_list] #converts list items from strings to floats
            fluor_array = np.asarray(fluor_list) #this takes our list of lists and converts it to a numpy array

这部分代码(上面)从输入文件(从读板器获得)中提取感兴趣的值,并将其转换为数组。它还需要用户输入以获取用于计算和转换的信息,并确定放置标准的列。你知道吗

最后一部分将在稍后的数组堆叠时发挥作用,这就是问题行为发生的地方。你知道吗

        #Create conditional statement, depending on where the standards are, to split the array
        if orientation == "right":
            #Next, we want to average the 11th and 12th values of each of the 8 rows in our numpy array 
            stds = fluor_array[:,[10,11]] #Creates a sub-array with the standard values (last two columns, (8x2))
            data = np.delete(fluor_array,(10,11),axis=1) #Creates a sub-array with the data (first 10 columns, (8x10))

        elif orientation == "left":
            #Next, we want to average the 1st and 2nd values of each of the 8 rows in our numpy array   
            stds = fluor_array[:,[0,1]] #Creates a sub-array with the standard values (first two columns, (8x2))
            data = np.delete(fluor_array,(0,1),axis=1) #Creates a sub-array with the data (last 10 columns, (8x10))

        else:
            print "Error: answer must be 'LEFT' or 'RIGHT'"

        std_av = np.mean(stds, axis=1) #creates an array of our averaged std values

        #Then, we want to subtract the average value from row 1 (the BLANK) from each of the 8 averages (above)
        std_av_st = std_av - std_av[0]

        #Run a linear regression on the points in std_av_st against known concentration values (these data = y axis, need x axis)
        x = np.array([0.00, 0.03, 0.10, 0.30, 1.00, 3.00, 10.00, 25.00])*10 #ng/uL*10 = ng/well
        xi = np.vstack([x, np.zeros(len(x))]).T #creates new array of (x, 0) values (for the regression only); also ensures a zero-intercept (when we use (x, 1) values, the y-intercept is not forced to be zero, and the slope is slightly inflated)
        m, c = np.linalg.lstsq(xi, std_av_st)[0] # m = slope for future calculations

        #Now we want to subtract the average value from row 1 of std_av (the average BLANK value) from all data points in "data"
        data_minus_blank = data - std_av[0]

        #Now we want to divide each number in our "data" array by the value "m" derived above (to get total ng/well for each sample; y/m = x)
        ng_per_well = data_minus_blank/m

        #Now we need to account for the volume of sample put in to the AccuClear reading to calculate ng/uL
        ng_per_microliter = ng_per_well/float(input_DNA_vol)

        #Next, we multiply those values by the volume of DNA sample (variable "ng")
        ng_total = ng_per_microliter*float(remainder_vol)

        #Set number of decimal places to 1
        ng_per_microliter = np.around(ng_per_microliter, decimals=1)
        ng_total = np.around(ng_total, decimals=1)

以上代码执行必要的计算,以根据DNA“标准”的线性回归计算给定样品中DNA的浓度(ng/uL)和总质量(ng),可以在第1列和第2列(user input=“left”)或第11列和第12列(user input=“right”)。你知道吗

        #Create a row array (values A-H), and a filler array ('-') to add to existing arrays
        col = [i for i in range(1,13)]
        row = np.asarray(['A','B','C','D','E','F','G','H'])
        filler = np.array(['-','-','-','-','-','-','-','-','-','-','-','-','-','-','-','-',]).reshape((8,2))

上面的代码创建要与原始数组堆叠的数组。“filler”数组是基于用户输入的“right”或“left”(堆叠命令np.c_u[],见下图)。你知道吗

        #Create output
        Outfile = open('Total_DNA_{0}'.format(InfileName),"w")
        Outfile.write("DNA concentration (ng/uL):\n\n")
        Outfile.write("\t"+"\t".join([str(n) for n in col])+"\n")
        if orientation == "left": #Add filler to left, then add row to the left of filler
            ng_per_microliter = np.c_[filler,ng_per_microliter]
            ng_per_microliter = np.c_[row,ng_per_microliter]
            Outfile.write("\n".join(["\t".join([n for n in item]) for item in ng_per_microliter.tolist()])+"\n\n")
        elif orientation == "right": #Add rows to the left, and filler to the right
            ng_per_microliter = np.c_[row,ng_per_microliter]
            ng_per_microliter = np.c_[ng_per_microliter,filler]
            Outfile.write("\n".join(["\t".join([n for n in item]) for item in ng_per_microliter.tolist()])+"\n\n")
        Outfile.write("Total mass of DNA per sample (ng):\n\n")
        Outfile.write("\t"+"\t".join([str(n) for n in col])+"\n")
        if orientation == "left":
            ng_total = np.c_[filler,ng_total]
            ng_total = np.c_[row,ng_total]
            Outfile.write("\n".join(["\t".join([n for n in item]) for item in ng_total.tolist()]))
        elif orientation == "right":
            ng_total = np.c_[row,ng_total]
            ng_total = np.c_[ng_total,filler]
            Outfile.write("\n".join(["\t".join([n for n in item]) for item in ng_total.tolist()]))
        Outfile.close

最后,生成输出文件。这就是问题行为发生的地方。你知道吗

使用一个简单的print命令,我发现stacking命令numpy.c\[]是罪魁祸首(不是数组写入命令)。你知道吗

因此,在Windows中,numpy.c_[]似乎不会截断这些数字,但会将这些数字限制在UNIX环境中的第一个字符。

在这两种平台上,有哪些替代方案可行?如果不存在,我不介意创建一个特定于UNIX的脚本。你知道吗

谢谢大家的帮助和耐心。很抱歉没有提供所有必要的信息。你知道吗

这些图片是屏幕截图,显示了Windows的正确输出以及最终在UNIX中得到的结果(我试图为您格式化这些内容……但它们是一场噩梦)。我还提供了一个屏幕截图,当我简单地打印数组“ng\u per\u microll”和“ng\u total”时,在终端中获得的输出 UNIX OutputPrinted Arrays (UNIX Terminal)


Tags: andofthetoinfornpng
2条回答

在用户hpaulj的帮助下,我发现这不是操作系统和环境之间行为不同的问题。这很可能是因为用户拥有不同版本的numpy。你知道吗

数组的串联自动将“float64”数据类型转换为“S1”(以匹配“filler”数组('-')和“row”数组('A'、'B'等))。你知道吗

numpy的较新版本(特别是v1.12.X)似乎允许在没有这种自动转换的情况下串联数组。你知道吗

我仍然不确定在旧版本的numpy中如何解决这个问题,但是建议人们升级他们的版本以获得完整的性能应该是一件简单的事情。:)

Using a simple print command, I found that the stacking command numpy.c_[ ] is the culprit (NOT the array writing command).

So it appears that numpy.c_[ ] does not truncate these numbers in Windows, but will limit those numbers to the first character in a UNIX environment.

用简单的例子说明这些陈述。np.c_[]不应该做任何不同的事情。你知道吗

在Py3中,默认的字符串类型是unicode。和numpy 1.12

In [149]: col = [i for i in range(1,13)]
     ...: row = np.asarray(['A','B','C','D','E','F','G','H'])
     ...: filler = np.array(['-','-','-','-','-','-','-','-','-','-','-','-','-','-','-','-',]).reshape((8,2))
     ...: 
In [150]: col
Out[150]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
In [151]: "\t"+"\t".join([str(n) for n in col])+"\n"
Out[151]: '\t1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\n'
In [152]: filler
Out[152]: 
array([['-', '-'],
       ...
       ['-', '-'],
       ['-', '-']], 
      dtype='<U1')
In [153]: row
Out[153]: 
array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'], 
      dtype='<U1')
In [154]: row.shape
Out[154]: (8,)
In [155]: filler.shape
Out[155]: (8, 2)
In [159]: ng_per_microliter=np.arange(8.)+1.23
In [160]: np.c_[filler,ng_per_microliter]
Out[160]: 
array([['-', '-', '1.23'],
       ['-', '-', '2.23'],
       ['-', '-', '3.23'],
      ...
       ['-', '-', '8.23']], 
      dtype='<U32')
In [161]: np.c_[row,ng_per_microliter]
Out[161]: 
array([['A', '1.23'],
       ['B', '2.23'],
       ['C', '3.23'],
         ....
       ['H', '8.23']], 
      dtype='<U32')

在早期的numpy版本中,U1(或Py2中的S1)数组与数值的串联可能会将数据类型保留在U1。在我的例子中,它们已经扩展到U32。你知道吗

因此,如果您怀疑np.c_,则显示这些结果(如果需要,使用repr

print(repr(np.c_[row,ng_per_microliter]))

跟踪dtype。你知道吗

对于v1.12发行说明(可能更早)

The astype method now returns an error if the string dtype to cast to is not long enough in “safe” casting mode to hold the max value of integer/float array that is being casted. Previously the casting was allowed even if the result was truncated.

这可能在执行连接时起作用。你知道吗

相关问题 更多 >