Different feature dim numbers after PCA in example script? #4

@Y-SHI-MxLucid

Hi there,

I have read the preprint, and it is a very nice one.

I am trying to run the example script in the project, and I noticed something about the inputs to MultiMAP.Integration():

adata = MultiMAP.Integration([rna, atac_genes], ['X_pca', 'X_lsi'])

rna.obsm['X_pca'] has shape (4382, 50), while atac_genes.obsm['X_lsi'] has shape (3166, 49). atac_genes.obsm['X_lsi'] is the output of MultiMAP.TFIDF_LSI() in __init__.py, which in turn calls tfidf() in matrix.py:

MultiMAP.TFIDF_LSI(atac_peaks)
atac_genes.obsm['X_lsi'] = atac_peaks.obsm['X_lsi'].copy()

I later checked matrix.py, and I think the 49 dimensions might be due to the first column of the sklearn.decomposition.TruncatedSVD() output being discarded:

# n_components passed to here is 50
def tfidf(X, n_components, binarize=True, random_state=0):
    from sklearn.feature_extraction.text import TfidfTransformer
    sc_count = np.copy(X)
    if binarize:
        sc_count = np.where(sc_count < 1, sc_count, 1)
    tfidf = TfidfTransformer(norm='l2', sublinear_tf=True)
    normed_count = tfidf.fit_transform(sc_count)
    lsi = sklearn.decomposition.TruncatedSVD(n_components=n_components, random_state=random_state)
    lsi_r = lsi.fit_transform(normed_count)
    # Here↓↓↓↓
    X_lsi = lsi_r[:, 1:]
    return X_lsi
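The shape difference can be reproduced outside MultiMAP. Below is a minimal sketch on random synthetic counts, assuming only numpy and scikit-learn; `lsi_drop_first` is a hypothetical helper that mirrors the steps quoted above and is not part of the package. It also shows that requesting one extra component would leave 50 columns after the first one is dropped, which is one possible workaround rather than a confirmed fix.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfTransformer


def lsi_drop_first(X, n_components, random_state=0):
    """Hypothetical helper: binarize, TF-IDF, truncated SVD, drop component 0."""
    binarized = (X > 0).astype(np.float64)
    normed = TfidfTransformer(norm="l2", sublinear_tf=True).fit_transform(binarized)
    svd = TruncatedSVD(n_components=n_components, random_state=random_state)
    # Slicing off the first column leaves n_components - 1 columns.
    return svd.fit_transform(normed)[:, 1:]


# Synthetic cells x peaks count matrix (not real ATAC data).
rng = np.random.default_rng(0)
counts = rng.poisson(0.3, size=(200, 500))

X_lsi_49 = lsi_drop_first(counts, n_components=50)
X_lsi_50 = lsi_drop_first(counts, n_components=51)  # ask for one extra component
print(X_lsi_49.shape, X_lsi_50.shape)  # (200, 49) (200, 50)
```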

Is column #0 discarded in order to remove the first component, which is usually strongly correlated with sequencing depth? If so, the two inputs of MultiMAP.Integration() have 50 and 49 dimensions respectively. The function still runs normally and returns a result of shape (7548, 2), but is it okay to do so?

From reading the preprint, my impression was that the two datasets to be integrated should have the same number of dimensions after reduction, because the inter-dataset point distances need to be calculated. Could you please correct me if my understanding is wrong?
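One way to check the sequencing-depth hypothesis is to correlate the first LSI component with per-cell depth. The sketch below does this on synthetic binary data in which each cell has a different detection rate (used here as a stand-in for depth); the data, sizes, and variable names are all assumptions for illustration, and the same check would need to be run on the real atac_peaks matrix to say anything about MultiMAP's output.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfTransformer

rng = np.random.default_rng(0)
n_cells, n_peaks = 300, 500

# Synthetic cells whose detection rate (a proxy for depth) varies widely.
detection_rate = rng.uniform(0.02, 0.3, size=n_cells)
binary = (rng.random((n_cells, n_peaks)) < detection_rate[:, None]).astype(np.float64)
depth = binary.sum(axis=1)  # number of detected peaks per cell

normed = TfidfTransformer(norm="l2", sublinear_tf=True).fit_transform(binary)
lsi = TruncatedSVD(n_components=10, random_state=0).fit_transform(normed)

# Absolute Pearson correlation, since the sign of an SVD component is arbitrary.
corr_first = abs(np.corrcoef(lsi[:, 0], np.log1p(depth))[0, 1])
print(f"|corr(component 0, log depth)| = {corr_first:.2f}")
```

If the correlation is high on the real data as well, dropping column #0 would be consistent with removing a depth-driven component.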
